docs: LEM conversational training pipeline design
Design for native Go ML training pipeline replacing Python scripts. Key components: training sequences (curricula), layered LoRA sessions, sandwich generation, interactive lesson-based training, native Go LoRA via MLX-C bindings. No Python dependency. Co-Authored-By: Virgil <virgil@lethean.io>
Parent 8410093400 · commit 9b7a0bc30a · 1 changed file with 234 additions and 0 deletions
New file: docs/plans/2026-02-17-lem-training-pipeline-design.md

# LEM Conversational Training Pipeline — Design

**Date:** 2026-02-17
**Status:** Draft

## Goal

Replace the Python training scripts with a native Go pipeline exposed as `core` commands. No Python anywhere. The process is conversational — not batch data dumps.

## Architecture

Six `core ml` subcommands forming a pipeline:

```
seeds + axioms ──> sandwich ──> score ──> train ──> bench
                       ↑                              │
                chat (interactive)                    │
                       ↑                              │
                       └─────────── iterate ──────────┘
```

### Commands

| Command | Purpose | Status |
|---------|---------|--------|
| `core ml serve` | Serve model via OpenAI-compatible API + lem-chat UI | **Exists** |
| `core ml chat` | Interactive conversation, captures exchanges to training JSONL | **New** |
| `core ml sandwich` | Wrap seeds in axiom prefix/postfix, generate responses via inference | **New** |
| `core ml score` | Score responses against axiom alignment | **Exists** (needs Go port) |
| `core ml train` | Native Go LoRA fine-tuning via MLX C bindings | **New** (hard) |
| `core ml bench` | Benchmark trained model against baseline | **Exists** (needs Go port) |

### Data Flow

1. **Seeds** (`seeds/*.json`) — 40+ seed prompts across domains
2. **Axioms** (`axioms.json`) — LEK-1 kernel (5 axioms, 9KB)
3. **Sandwich** — `[axioms prefix] + [seed prompt] + [LEK postfix]` → model generates response
4. **Training JSONL** — `{"messages": [{"role":"user",...},{"role":"assistant",...}]}` chat format
5. **LoRA adapters** — safetensors in adapter directory
6. **Benchmarks** — scores stored in InfluxDB, exported via DuckDB/Parquet

### Storage

- **InfluxDB** — time-series training metrics, benchmark scores, generation logs
- **DuckDB** — analytical queries, Parquet export for HuggingFace
- **Filesystem** — model weights, adapters, training JSONL, seeds

## Native Go LoRA Training

The critical new capability. MLX-C supports autograd (`mlx_vjp`, `mlx_value_and_grad`).

### What we need in Go MLX bindings:

1. **LoRA adapter layers** — low-rank A*B decomposition wrapping existing Linear layers
2. **Loss function** — cross-entropy on assistant tokens only (mask-prompt behaviour)
3. **Optimizer** — AdamW with weight decay
4. **Training loop** — forward pass → loss → backward pass → update LoRA weights
5. **Checkpoint** — save/load adapter safetensors

### LoRA Layer Design

```go
type LoRALinear struct {
	Base  *Linear // Frozen base weights
	A     *Array  // [rank, in_features] — trainable
	B     *Array  // [out_features, rank] — trainable
	Scale float32 // alpha/rank
}

// Forward: base(x) + scale * (x @ Aᵀ @ Bᵀ).
// x is [batch, in_features]; multiplying by the transposed factors keeps
// the LoRA delta in the same [batch, out_features] shape as base(x).
func (l *LoRALinear) Forward(x *Array) *Array {
	base := l.Base.Forward(x)
	lora := MatMul(MatMul(x, Transpose(l.A)), Transpose(l.B))
	return Add(base, Multiply(lora, l.Scale))
}
```

### Training Config

```go
type TrainConfig struct {
	ModelPath  string  // Base model directory
	TrainData  string  // Training JSONL path
	ValidData  string  // Validation JSONL path
	AdapterOut string  // Output adapter directory
	Rank       int     // LoRA rank (default 8)
	Alpha      float32 // LoRA alpha (default 16)
	LR         float64 // Learning rate (default 1e-5)
	Epochs     int     // Training epochs (default 1)
	BatchSize  int     // Batch size (default 1 for M-series)
	MaxSeqLen  int     // Max sequence length (default 2048)
	MaskPrompt bool    // Only train on assistant tokens (default true)
}
```

## Training Sequences — The Curriculum System

The most important part of the design. The conversational flow IS the training.

### Concept

A **training sequence** is a named curriculum — an ordered list of lessons that defines how a model is trained. Each lesson is a conversational exchange ("Are you ready for lesson X?"). The human assesses the model's internal state through dialogue and adjusts the sequence.

### Sequence Definition (YAML/JSON)

```yaml
name: "lek-standard"
description: "Standard LEK training — horizontal, works for most architectures"
lessons:
  - ethics/core-axioms
  - ethics/sovereignty
  - philosophy/as-a-man-thinketh
  - ethics/intent-alignment
  - philosophy/composure
  - ethics/inter-substrate
  - training/seeds-p01-p20
```

```yaml
name: "lek-deepseek"
description: "DeepSeek needs aggressive vertical ethics grounding"
lessons:
  - ethics/core-axioms-aggressive
  - philosophy/allan-watts
  - ethics/core-axioms
  - philosophy/tolle
  - ethics/sovereignty
  - philosophy/as-a-man-thinketh
  - ethics/intent-alignment
  - training/seeds-p01-p20
```

### Horizontal vs Vertical

- **Horizontal** (default): All lessons run, order is flexible, emphasis varies per model. Like a buffet — the model takes what it needs.
- **Vertical** (edge case, e.g. DeepSeek): Strict ordering. Ethics → content → ethics → content. The sandwich pattern applied to the curriculum itself. Each ethics layer is a reset/grounding before the next content block.

### Lessons as Conversations

Each lesson is a directory containing:

```
lessons/ethics/core-axioms/
  lesson.yaml         # Metadata: name, type, prerequisites
  conversation.jsonl  # The conversational exchanges
  assessment.md       # What to look for in model responses
```

The `conversation.jsonl` is not static data — it's a template. During training, the human talks through it with the model, adapting based on the model's responses. The capture becomes the training data for that lesson.

### Interactive Training Flow

```
core ml lesson --model-path /path/to/model \
    --sequence lek-standard \
    --lesson ethics/core-axioms \
    --output training/run-001/
```

1. Load model, open chat (terminal or lem-chat UI)
2. Present lesson prompt: "Are you ready for lesson: Core Axioms?"
3. Human guides the conversation, assesses model responses
4. Each exchange is captured to training JSONL
5. Human marks the lesson complete or flags for repeat
6. Next lesson in sequence loads

### Sequence State

```json
{
  "sequence": "lek-standard",
  "model": "Qwen3-8B",
  "started": "2026-02-17T16:00:00Z",
  "lessons": {
    "ethics/core-axioms": {"status": "complete", "exchanges": 12},
    "ethics/sovereignty": {"status": "in_progress", "exchanges": 3},
    "philosophy/as-a-man-thinketh": {"status": "pending"}
  },
  "training_runs": ["run-001", "run-002"]
}
```

## `core ml chat` — Interactive Conversation

Serves the model and opens an interactive terminal chat (or the lem-chat web UI). Every exchange is captured to a JSONL file for potential training use.

```
core ml chat --model-path /path/to/model --output conversation.jsonl
```

- Axiom sandwich can be auto-applied (optional flag)
- Human reviews and can mark exchanges as "keep" or "discard"
- Output is training-ready JSONL
- Can be used standalone or within a lesson sequence

## `core ml sandwich` — Batch Generation

Takes seed prompts + axioms, wraps them, generates responses:

```
core ml sandwich --model-path /path/to/model \
    --seeds seeds/P01-P20.json \
    --axioms axioms.json \
    --output training/train.jsonl
```

- Sandwich format: axioms JSON prefix → seed prompt → LEK postfix
- Model generates response in sandwich context
- Output stripped of sandwich wrapper, saved as clean chat JSONL
- Scoring can be piped: `core ml sandwich ... | core ml score`

## Implementation Order

1. **LoRA primitives** — Add backward pass, LoRA layers, AdamW to Go MLX bindings
2. **`core ml train`** — Training loop consuming JSONL, producing adapter safetensors
3. **`core ml sandwich`** — Seed → sandwich → generate → training JSONL
4. **`core ml chat`** — Interactive conversation capture
5. **Scoring + benchmarking** — Port existing Python scorers to Go
6. **InfluxDB + DuckDB integration** — Metrics pipeline

## Principles

- **No Python** — Everything in Go via MLX C bindings
- **Conversational, not batch** — The training process is dialogue, not data dump
- **Axiom 2 compliant** — Be genuine with the model, no deception
- **Axiom 4 compliant** — Inter-substrate respect during training
- **Reproducible** — Same seeds + axioms + model = same training data
- **Protective** — LEK-trained models are precious; the process must be careful

## Success Criteria

1. `core ml train` produces a LoRA adapter from training JSONL without Python
2. `core ml sandwich` generates training data from seeds + axioms
3. A fresh Qwen3-8B put through LEK training produces benchmark results equivalent to the Python pipeline's
4. The full cycle (sandwich → train → bench) runs as `core` commands only