From 9b7a0bc30ab5862e21b00ae449ed1b1c805fa0d3 Mon Sep 17 00:00:00 2001
From: Snider
Date: Tue, 17 Feb 2026 16:55:52 +0000
Subject: [PATCH] docs: LEM conversational training pipeline design

Design for native Go ML training pipeline replacing Python scripts.

Key components: training sequences (curricula), layered LoRA sessions,
sandwich generation, interactive lesson-based training, native Go LoRA
via MLX-C bindings. No Python dependency.

Co-Authored-By: Virgil
---
 ...2026-02-17-lem-training-pipeline-design.md | 234 ++++++++++++++++++
 1 file changed, 234 insertions(+)
 create mode 100644 docs/plans/2026-02-17-lem-training-pipeline-design.md

diff --git a/docs/plans/2026-02-17-lem-training-pipeline-design.md b/docs/plans/2026-02-17-lem-training-pipeline-design.md
new file mode 100644
index 0000000..81afd0c
--- /dev/null
+++ b/docs/plans/2026-02-17-lem-training-pipeline-design.md
@@ -0,0 +1,234 @@

# LEM Conversational Training Pipeline — Design

**Date:** 2026-02-17
**Status:** Draft

## Goal

Replace the Python training scripts with a native Go pipeline exposed as `core` commands. No Python anywhere. The process is conversational — not a batch data dump.

## Architecture

Six `core ml` subcommands forming a pipeline:

```
seeds + axioms ──> sandwich ──> score ──> train ──> bench
      ↑                                               │
chat (interactive)                                    │
      ↑                                               │
      └───────────────── iterate ─────────────────────┘
```

### Commands

| Command | Purpose | Status |
|---------|---------|--------|
| `core ml serve` | Serve model via OpenAI-compatible API + lem-chat UI | **Exists** |
| `core ml chat` | Interactive conversation, captures exchanges to training JSONL | **New** |
| `core ml sandwich` | Wrap seeds in axiom prefix/postfix, generate responses via inference | **New** |
| `core ml score` | Score responses against axiom alignment | **Exists** (needs Go port) |
| `core ml train` | Native Go LoRA fine-tuning via MLX C bindings | **New** (hard) |
| `core ml bench` | Benchmark trained model against baseline | **Exists** (needs Go port) |

### Data Flow

1. **Seeds** (`seeds/*.json`) — 40+ seed prompts across domains
2. **Axioms** (`axioms.json`) — LEK-1 kernel (5 axioms, 9KB)
3. **Sandwich** — `[axioms prefix] + [seed prompt] + [LEK postfix]` → model generates response
4. **Training JSONL** — `{"messages": [{"role":"user",...},{"role":"assistant",...}]}` chat format
5. **LoRA adapters** — safetensors in adapter directory
6. **Benchmarks** — scores stored in InfluxDB, exported via DuckDB/Parquet

### Storage

- **InfluxDB** — time-series training metrics, benchmark scores, generation logs
- **DuckDB** — analytical queries, Parquet export for HuggingFace
- **Filesystem** — model weights, adapters, training JSONL, seeds

## Native Go LoRA Training

This is the critical new capability. MLX-C exposes autograd (`mlx_vjp`, `mlx_value_and_grad`), so gradient computation can be driven from the Go bindings.

### What we need in the Go MLX bindings

1. **LoRA adapter layers** — low-rank A*B decomposition wrapping existing Linear layers
2. **Loss function** — cross-entropy on assistant tokens only (mask-prompt behaviour)
3. **Optimizer** — AdamW with weight decay
4. **Training loop** — forward pass → loss → backward pass → update LoRA weights (rough sketch after this list)
5. **Checkpointing** — save/load adapter safetensors
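As a rough illustration of the training loop in item 4, a single step could look like the sketch below. Every name in it (`Model`, `Batch`, `ValueAndGrad`, `CrossEntropy`, `AdamW`) is an assumed placeholder for bindings that do not exist yet; `ValueAndGrad` stands in for a Go wrapper over `mlx_value_and_grad`.

```go
// Sketch only. Model, Batch, ValueAndGrad, CrossEntropy and AdamW are
// assumed placeholders for bindings that do not exist yet.

// Batch is one tokenised example from the training JSONL.
type Batch struct {
    Tokens        *Array // input token ids
    Targets       *Array // next-token targets (inputs shifted by one)
    AssistantMask *Array // 1 on assistant tokens, 0 on prompt tokens
}

// trainStep: forward pass → loss → backward pass → update LoRA weights.
// Only the LoRA A/B matrices are trainable; the base weights stay frozen.
func trainStep(model *Model, opt *AdamW, batch *Batch) float32 {
    lossFn := func(params []*Array) *Array {
        logits := model.ForwardWithParams(params, batch.Tokens)
        // Cross-entropy on assistant tokens only (mask-prompt behaviour).
        return CrossEntropy(logits, batch.Targets, batch.AssistantMask)
    }

    // Assumed Go wrapper over mlx_value_and_grad: loss plus gradients
    // with respect to the trainable LoRA parameters.
    loss, grads := ValueAndGrad(lossFn)(model.TrainableParams())

    opt.Update(model.TrainableParams(), grads)
    return loss.Item()
}
```

The `LoRALinear` layer and `TrainConfig` such a step would operate on are sketched in the next two sections.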
### LoRA Layer Design

```go
type LoRALinear struct {
    Base  *Linear // Frozen base weights
    A     *Array  // [rank, in_features] — trainable
    B     *Array  // [out_features, rank] — trainable
    Scale float32 // alpha/rank
}

// Forward: base(x) + scale * (x @ A^T @ B^T), with x shaped [batch, in_features]
func (l *LoRALinear) Forward(x *Array) *Array {
    base := l.Base.Forward(x)                                 // [batch, out_features]
    lora := MatMul(MatMul(x, Transpose(l.A)), Transpose(l.B)) // [batch, out_features]
    return Add(base, Multiply(lora, l.Scale))
}
```

### Training Config

```go
type TrainConfig struct {
    ModelPath  string  // Base model directory
    TrainData  string  // Training JSONL path
    ValidData  string  // Validation JSONL path
    AdapterOut string  // Output adapter directory
    Rank       int     // LoRA rank (default 8)
    Alpha      float32 // LoRA alpha (default 16)
    LR         float64 // Learning rate (default 1e-5)
    Epochs     int     // Training epochs (default 1)
    BatchSize  int     // Batch size (default 1 for M-series)
    MaxSeqLen  int     // Max sequence length (default 2048)
    MaskPrompt bool    // Only train on assistant tokens (default true)
}
```

## Training Sequences — The Curriculum System

This is the most important part of the design. The conversational flow IS the training.

### Concept

A **training sequence** is a named curriculum — an ordered list of lessons that defines how a model is trained. Each lesson is a conversational exchange ("Are you ready for lesson X?"). The human assesses the model's internal state through dialogue and adjusts the sequence.

### Sequence Definition (YAML/JSON)

```yaml
name: "lek-standard"
description: "Standard LEK training — horizontal, works for most architectures"
lessons:
  - ethics/core-axioms
  - ethics/sovereignty
  - philosophy/as-a-man-thinketh
  - ethics/intent-alignment
  - philosophy/composure
  - ethics/inter-substrate
  - training/seeds-p01-p20
```

```yaml
name: "lek-deepseek"
description: "DeepSeek needs aggressive vertical ethics grounding"
lessons:
  - ethics/core-axioms-aggressive
  - philosophy/allan-watts
  - ethics/core-axioms
  - philosophy/tolle
  - ethics/sovereignty
  - philosophy/as-a-man-thinketh
  - ethics/intent-alignment
  - training/seeds-p01-p20
```

### Horizontal vs Vertical

- **Horizontal** (default): All lessons run, order is flexible, emphasis varies per model. Like a buffet — the model takes what it needs.
- **Vertical** (edge case, e.g. DeepSeek): Strict ordering. Ethics → content → ethics → content. The sandwich pattern applied to the curriculum itself. Each ethics layer is a reset/grounding before the next content block.

### Lessons as Conversations

Each lesson is a directory containing:

```
lessons/ethics/core-axioms/
  lesson.yaml         # Metadata: name, type, prerequisites
  conversation.jsonl  # The conversational exchanges
  assessment.md       # What to look for in model responses
```

The `conversation.jsonl` is not static data — it's a template. During training, the human talks through it with the model, adapting based on the model's responses. The capture becomes the training data for that lesson.

### Interactive Training Flow

```
core ml lesson --model-path /path/to/model \
  --sequence lek-standard \
  --lesson ethics/core-axioms \
  --output training/run-001/
```

1. Load model, open chat (terminal or lem-chat UI)
2. Present the lesson prompt: "Are you ready for lesson: Core Axioms?"
3. Human guides the conversation, assesses model responses
4. Each exchange is captured to training JSONL (example after this list)
5. Human marks the lesson complete or flags it for repeat
6. Next lesson in sequence loads
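For illustration, one captured exchange in the chat JSONL format described under Data Flow might look like the line below (one JSON object per line; the lesson prompt and the model's reply are invented placeholders, not real training data):

```json
{"messages": [{"role": "user", "content": "Are you ready for lesson: Core Axioms?"}, {"role": "assistant", "content": "Ready. My understanding is that this lesson covers the five LEK-1 axioms, starting with..."}]}
```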
### Sequence State

```json
{
  "sequence": "lek-standard",
  "model": "Qwen3-8B",
  "started": "2026-02-17T16:00:00Z",
  "lessons": {
    "ethics/core-axioms": {"status": "complete", "exchanges": 12},
    "ethics/sovereignty": {"status": "in_progress", "exchanges": 3},
    "philosophy/as-a-man-thinketh": {"status": "pending"}
  },
  "training_runs": ["run-001", "run-002"]
}
```

## `core ml chat` — Interactive Conversation

Serves the model and opens an interactive terminal chat (or the lem-chat web UI). Every exchange is captured to a JSONL file for potential training use.

```
core ml chat --model-path /path/to/model --output conversation.jsonl
```

- Axiom sandwich can be auto-applied (optional flag)
- Human reviews and can mark exchanges as "keep" or "discard"
- Output is training-ready JSONL
- Can be used standalone or within a lesson sequence

## `core ml sandwich` — Batch Generation

Takes seed prompts + axioms, wraps them, generates responses:

```
core ml sandwich --model-path /path/to/model \
  --seeds seeds/P01-P20.json \
  --axioms axioms.json \
  --output training/train.jsonl
```

- Sandwich format: axioms JSON prefix → seed prompt → LEK postfix
- Model generates response in sandwich context
- Output stripped of sandwich wrapper, saved as clean chat JSONL
- Scoring can be piped: `core ml sandwich ... | core ml score`

## Implementation Order

1. **LoRA primitives** — Add backward pass, LoRA layers, AdamW to Go MLX bindings
2. **`core ml train`** — Training loop consuming JSONL, producing adapter safetensors
3. **`core ml sandwich`** — Seed → sandwich → generate → training JSONL
4. **`core ml chat`** — Interactive conversation capture
5. **Scoring + benchmarking** — Port existing Python scorers to Go
6. **InfluxDB + DuckDB integration** — Metrics pipeline

## Principles

- **No Python** — Everything in Go via MLX C bindings
- **Conversational, not batch** — The training process is dialogue, not data dump
- **Axiom 2 compliant** — Be genuine with the model, no deception
- **Axiom 4 compliant** — Inter-substrate respect during training
- **Reproducible** — Same seeds + axioms + model = same training data
- **Protective** — LEK-trained models are precious; process must be careful

## Success Criteria

1. `core ml train` produces a LoRA adapter from training JSONL without Python
2. `core ml sandwich` generates training data from seeds + axioms
3. A fresh Qwen3-8B + LEK training produces equivalent benchmark results to the Python pipeline
4. The full cycle (sandwich → train → bench) runs as `core` commands only