docs: LEM conversational training pipeline design

Design for native Go ML training pipeline replacing Python scripts.
Key components: training sequences (curricula), layered LoRA sessions,
sandwich generation, interactive lesson-based training, native Go
LoRA via MLX-C bindings. No Python dependency.

Co-Authored-By: Virgil <virgil@lethean.io>
# LEM Conversational Training Pipeline — Design
**Date:** 2026-02-17
**Status:** Draft
## Goal
Replace the Python training scripts with a native Go pipeline exposed as `core` commands. No Python anywhere. The process is conversational — not batch data dumps.
## Architecture
Six `core ml` subcommands forming a pipeline:
```
seeds + axioms ──> sandwich ──> score ──> train ──> bench
       ↑                                              │
 chat (interactive)                                   │
       ↑                                              │
       └────────────────── iterate ──────────────────┘
```
### Commands
| Command | Purpose | Status |
|---------|---------|--------|
| `core ml serve` | Serve model via OpenAI-compatible API + lem-chat UI | **Exists** |
| `core ml chat` | Interactive conversation, captures exchanges to training JSONL | **New** |
| `core ml sandwich` | Wrap seeds in axiom prefix/postfix, generate responses via inference | **New** |
| `core ml score` | Score responses against axiom alignment | **Exists** (Python; needs Go port) |
| `core ml train` | Native Go LoRA fine-tuning via MLX C bindings | **New** (hard) |
| `core ml bench` | Benchmark trained model against baseline | **Exists** (Python; needs Go port) |
### Data Flow
1. **Seeds** (`seeds/*.json`) — 40+ seed prompts across domains
2. **Axioms** (`axioms.json`) — LEK-1 kernel (5 axioms, 9KB)
3. **Sandwich** — `[axioms prefix] + [seed prompt] + [LEK postfix]` → model generates a response
4. **Training JSONL** — `{"messages": [{"role":"user",...},{"role":"assistant",...}]}` chat format (decoded in the sketch after this list)
5. **LoRA adapters** — safetensors in adapter directory
6. **Benchmarks** — scores stored in InfluxDB, exported via DuckDB/Parquet
5. **LoRA adapters** — safetensors in adapter directory
6. **Benchmarks** — scores stored in InfluxDB, exported via DuckDB/Parquet
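
A minimal sketch of how `core ml train` might decode this chat-format JSONL; the `ChatRecord` and `ChatMessage` type names are illustrative, not part of the design:

```go
import (
	"bufio"
	"encoding/json"
	"os"
)

// ChatMessage and ChatRecord mirror the {"messages": [...]} chat
// format above; the names are illustrative, not a fixed schema.
type ChatMessage struct {
	Role    string `json:"role"` // "user" or "assistant"
	Content string `json:"content"`
}

type ChatRecord struct {
	Messages []ChatMessage `json:"messages"`
}

// readJSONL decodes one ChatRecord per line, as core ml train might.
func readJSONL(path string) ([]ChatRecord, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var records []ChatRecord
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // long conversations exceed the 64KB default
	for sc.Scan() {
		var r ChatRecord
		if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
			return nil, err
		}
		records = append(records, r)
	}
	return records, sc.Err()
}
```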
### Storage
- **InfluxDB** — time-series training metrics, benchmark scores, generation logs
- **DuckDB** — analytical queries, Parquet export for HuggingFace
- **Filesystem** — model weights, adapters, training JSONL, seeds
## Native Go LoRA Training
This is the critical new capability. MLX-C exposes autograd (`mlx_vjp`, `mlx_value_and_grad`), so the backward pass can run natively from Go.
### What we need in the Go MLX bindings
1. **LoRA adapter layers** — low-rank `B·A` update wrapping frozen `Linear` layers
2. **Loss function** — cross-entropy on assistant tokens only (mask-prompt behaviour)
3. **Optimizer** — AdamW with weight decay
4. **Training loop** — forward pass → loss → backward pass → update LoRA weights
5. **Checkpoint** — save/load adapter safetensors
### LoRA Layer Design
```go
type LoRALinear struct {
	Base  *Linear // Frozen base weights
	A     *Array  // [rank, in_features] — trainable
	B     *Array  // [out_features, rank] — trainable
	Scale float32 // alpha/rank
}

// Forward: base(x) + scale * (x @ A^T @ B^T)
// x is [batch, in_features]; the low-rank path maps it through
// [batch, rank] up to [batch, out_features], matching base(x).
func (l *LoRALinear) Forward(x *Array) *Array {
	base := l.Base.Forward(x)
	lora := MatMul(MatMul(x, Transpose(l.A)), Transpose(l.B))
	return Add(base, Multiply(lora, l.Scale))
}
```
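
Only `A` and `B` receive gradients; the base `Linear` stays frozen, so an adapter checkpoint stays small relative to the model, and sessions can layer or swap adapters without touching the base weights.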
### Training Config
```go
type TrainConfig struct {
	ModelPath  string  // Base model directory
	TrainData  string  // Training JSONL path
	ValidData  string  // Validation JSONL path
	AdapterOut string  // Output adapter directory
	Rank       int     // LoRA rank (default 8)
	Alpha      float32 // LoRA alpha (default 16)
	LR         float64 // Learning rate (default 1e-5)
	Epochs     int     // Training epochs (default 1)
	BatchSize  int     // Batch size (default 1 for M-series)
	MaxSeqLen  int     // Max sequence length (default 2048)
	MaskPrompt bool    // Only train on assistant tokens (default true)
}
```
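
A sketch of the loop `core ml train` would drive with this config. Every name below other than `TrainConfig` (`Model`, `Batch`, `NewAdamW`, `ValueAndGrad`, `CrossEntropy`, `SaveSafetensors`) is a hypothetical Go binding over the MLX-C primitives, proposed rather than existing:

```go
// Hypothetical training loop. All binding names here are placeholders
// for what the Go MLX bindings would need to expose.
func Train(cfg TrainConfig, model *Model, batches []Batch) error {
	opt := NewAdamW(cfg.LR, 0.01) // AdamW with weight decay

	// Loss: cross-entropy over assistant tokens only (cfg.MaskPrompt).
	lossFn := func(params []*Array, b Batch) *Array {
		logits := model.ForwardWith(params, b.Tokens)
		return CrossEntropy(logits, b.Targets, b.AssistantMask)
	}

	params := model.LoRAParams() // only the A/B matrices are trainable
	for epoch := 0; epoch < cfg.Epochs; epoch++ {
		for _, b := range batches {
			// Wraps mlx_value_and_grad: forward pass and gradients in one call.
			loss, grads := ValueAndGrad(lossFn, params, b)
			params = opt.Update(params, grads)
			_ = loss // logged to InfluxDB in the real pipeline
		}
	}
	return SaveSafetensors(cfg.AdapterOut, params) // checkpoint the adapter
}
```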
## Training Sequences — The Curriculum System
The most important part of the design. The conversational flow IS the training.
### Concept
A **training sequence** is a named curriculum — an ordered list of lessons that defines how a model is trained. Each lesson is a conversational exchange ("Are you ready for lesson X?"). The human assesses the model's internal state through dialogue and adjusts the sequence.
### Sequence Definition (YAML/JSON)
```yaml
name: "lek-standard"
description: "Standard LEK training — horizontal, works for most architectures"
lessons:
- ethics/core-axioms
- ethics/sovereignty
- philosophy/as-a-man-thinketh
- ethics/intent-alignment
- philosophy/composure
- ethics/inter-substrate
- training/seeds-p01-p20
```
```yaml
name: "lek-deepseek"
description: "DeepSeek needs aggressive vertical ethics grounding"
lessons:
- ethics/core-axioms-aggressive
  - philosophy/alan-watts
- ethics/core-axioms
- philosophy/tolle
- ethics/sovereignty
- philosophy/as-a-man-thinketh
- ethics/intent-alignment
- training/seeds-p01-p20
```
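
A sequence definition like these might deserialize into the following sketch; the `Sequence` type and the `gopkg.in/yaml.v3` dependency are assumptions, not decisions:

```go
import "gopkg.in/yaml.v3"

// Sequence mirrors the YAML above. The type name and field set are
// illustrative, not a fixed schema.
type Sequence struct {
	Name        string   `yaml:"name"`
	Description string   `yaml:"description"`
	Lessons     []string `yaml:"lessons"` // lesson paths, in order
}

// loadSequence parses one sequence definition file's contents.
func loadSequence(data []byte) (*Sequence, error) {
	var s Sequence
	if err := yaml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}
```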
### Horizontal vs Vertical
- **Horizontal** (default): All lessons run, order is flexible, emphasis varies per model. Like a buffet — the model takes what it needs.
- **Vertical** (edge case, e.g. DeepSeek): Strict ordering. Ethics → content → ethics → content. The sandwich pattern applied to the curriculum itself. Each ethics layer is a reset/grounding before the next content block.
### Lessons as Conversations
Each lesson is a directory containing:
```
lessons/ethics/core-axioms/
  lesson.yaml        # Metadata: name, type, prerequisites
  conversation.jsonl # The conversational exchanges
  assessment.md      # What to look for in model responses
```
The `conversation.jsonl` is not static data — it's a template. During training, the human talks through it with the model, adapting based on the model's responses. The captured exchanges become the training data for that lesson.
### Interactive Training Flow
```
core ml lesson --model-path /path/to/model \
  --sequence lek-standard \
  --lesson ethics/core-axioms \
  --output training/run-001/
```
1. Load model, open chat (terminal or lem-chat UI)
2. Present lesson prompt: "Are you ready for lesson: Core Axioms?"
3. Human guides the conversation, assesses model responses
4. Each exchange is captured to training JSONL (see the capture sketch after this list)
5. Human marks the lesson complete or flags for repeat
6. Next lesson in sequence loads
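
Steps 2 through 5 could reduce to a capture loop like this sketch, reusing the illustrative `ChatRecord` type from the data-flow section; `generate` stands in for the model call and `/done` for however the human marks completion:

```go
import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// runLesson drives one lesson: prompt, converse, capture each exchange
// as a training JSONL record. generate and the /done convention are
// placeholders, not a fixed interface.
func runLesson(lessonPrompt string, generate func(string) string, out io.Writer) error {
	fmt.Println("model>", generate(lessonPrompt)) // "Are you ready for lesson ...?"

	sc := bufio.NewScanner(os.Stdin)
	enc := json.NewEncoder(out) // Encode emits one JSON object per line: JSONL
	for sc.Scan() {
		user := sc.Text()
		if user == "/done" { // human marks the lesson complete
			return nil
		}
		reply := generate(user)
		fmt.Println("model>", reply)
		if err := enc.Encode(ChatRecord{Messages: []ChatMessage{
			{Role: "user", Content: user},
			{Role: "assistant", Content: reply},
		}}); err != nil {
			return err
		}
	}
	return sc.Err()
}
```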
### Sequence State
```json
{
  "sequence": "lek-standard",
  "model": "Qwen3-8B",
  "started": "2026-02-17T16:00:00Z",
  "lessons": {
    "ethics/core-axioms": {"status": "complete", "exchanges": 12},
    "ethics/sovereignty": {"status": "in_progress", "exchanges": 3},
    "philosophy/as-a-man-thinketh": {"status": "pending"}
  },
  "training_runs": ["run-001", "run-002"]
}
```
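
In Go, the state file could map to types like these (illustrative names; the JSON tags match the example above):

```go
import "time"

// LessonState and SequenceState mirror the JSON above; the names are
// illustrative, not a fixed schema.
type LessonState struct {
	Status    string `json:"status"` // "pending" | "in_progress" | "complete"
	Exchanges int    `json:"exchanges,omitempty"`
}

type SequenceState struct {
	Sequence     string                 `json:"sequence"`
	Model        string                 `json:"model"`
	Started      time.Time              `json:"started"`
	Lessons      map[string]LessonState `json:"lessons"`
	TrainingRuns []string               `json:"training_runs"`
}
```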
## `core ml chat` — Interactive Conversation
Serves the model and opens an interactive terminal chat (or the lem-chat web UI). Every exchange is captured to a JSONL file for potential training use.
```
core ml chat --model-path /path/to/model --output conversation.jsonl
```
- Axiom sandwich can be auto-applied (optional flag)
- Human reviews and can mark exchanges as "keep" or "discard"
- Output is training-ready JSONL
- Can be used standalone or within a lesson sequence
## `core ml sandwich` — Batch Generation
Takes seed prompts + axioms, wraps them, generates responses:
```
core ml sandwich --model-path /path/to/model \
  --seeds seeds/P01-P20.json \
  --axioms axioms.json \
  --output training/train.jsonl
```
- Sandwich format: axioms JSON prefix → seed prompt → LEK postfix
- Model generates response in sandwich context
- Output stripped of sandwich wrapper, saved as clean chat JSONL (see the sketch below)
- Scoring can be piped: `core ml sandwich ... | core ml score`
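
A sketch of one sandwich generation step, reusing the illustrative `ChatRecord` type from the data-flow section; `generate` again stands in for whatever Go MLX inference call the pipeline exposes:

```go
// sandwichOne wraps a seed in the axiom sandwich, generates a response,
// and returns a clean chat record with the wrapper stripped.
func sandwichOne(axiomsJSON, seed, lekPostfix string, generate func(string) string) ChatRecord {
	// Sandwich format: axioms JSON prefix → seed prompt → LEK postfix.
	prompt := axiomsJSON + "\n\n" + seed + "\n\n" + lekPostfix
	response := generate(prompt)

	// Only the seed and response survive into the training data.
	return ChatRecord{Messages: []ChatMessage{
		{Role: "user", Content: seed},
		{Role: "assistant", Content: response},
	}}
}
```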
## Implementation Order
1. **LoRA primitives** — Add backward pass, LoRA layers, AdamW to Go MLX bindings
2. **`core ml train`** — Training loop consuming JSONL, producing adapter safetensors
3. **`core ml sandwich`** — Seed → sandwich → generate → training JSONL
4. **`core ml chat`** — Interactive conversation capture
5. **Scoring + benchmarking** — Port existing Python scorers to Go
6. **InfluxDB + DuckDB integration** — Metrics pipeline
## Principles
- **No Python** — Everything in Go via MLX C bindings
- **Conversational, not batch** — The training process is dialogue, not data dump
- **Axiom 2 compliant** — Be genuine with the model, no deception
- **Axiom 4 compliant** — Inter-substrate respect during training
- **Reproducible** — Same seeds + axioms + model = same training data
- **Protective** — LEK-trained models are precious; process must be careful
## Success Criteria
1. `core ml train` produces a LoRA adapter from training JSONL without Python
2. `core ml sandwich` generates training data from seeds + axioms
3. A fresh Qwen3-8B put through LEK training produces benchmark results equivalent to the Python pipeline's
4. The full cycle (sandwich → train → bench) runs as `core` commands only