docs: LEM conversational training pipeline design

Design for native Go ML training pipeline replacing Python scripts.
Key components: training sequences (curricula), layered LoRA sessions,
sandwich generation, interactive lesson-based training, native Go
LoRA via MLX-C bindings. No Python dependency.

Co-Authored-By: Virgil <virgil@lethean.io>
# LEM Conversational Training Pipeline — Design
**Date:** 2026-02-17
**Status:** Draft
## Goal
Replace the Python training scripts with a native Go pipeline exposed as `core` commands. No Python anywhere. The process is conversational — not batch data dumps.
## Architecture
Six `core ml` subcommands forming a pipeline:
```
seeds + axioms ──> sandwich ──> score ──> train ──> bench
       ↑                                              │
 chat (interactive)                                   │
       ↑                                              │
       └────────────────── iterate ──────────────────┘
```
### Commands
| Command | Purpose | Status |
|---------|---------|--------|
| `core ml serve` | Serve model via OpenAI-compatible API + lem-chat UI | **Exists** |
| `core ml chat` | Interactive conversation, captures exchanges to training JSONL | **New** |
| `core ml sandwich` | Wrap seeds in axiom prefix/postfix, generate responses via inference | **New** |
| `core ml score` | Score responses against axiom alignment | **Exists** (Python; needs Go port) |
| `core ml train` | Native Go LoRA fine-tuning via MLX C bindings | **New** (hard) |
| `core ml bench` | Benchmark trained model against baseline | **Exists** (Python; needs Go port) |
### Data Flow
1. **Seeds** (`seeds/*.json`) — 40+ seed prompts across domains
2. **Axioms** (`axioms.json`) — LEK-1 kernel (5 axioms, 9KB)
3. **Sandwich** — `[axioms prefix] + [seed prompt] + [LEK postfix]` → model generates a response
4. **Training JSONL** — `{"messages": [{"role":"user",...},{"role":"assistant",...}]}` chat format (decoded in the sketch after this list)
5. **LoRA adapters** — safetensors in adapter directory
6. **Benchmarks** — scores stored in InfluxDB, exported via DuckDB/Parquet
5. **LoRA adapters** — safetensors in adapter directory
6. **Benchmarks** — scores stored in InfluxDB, exported via DuckDB/Parquet
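
A minimal sketch of how `core ml train` might decode this chat-format JSONL; the `ChatRecord` and `ChatMessage` type names are illustrative, not part of the design:

```go
import (
	"bufio"
	"encoding/json"
	"os"
)

// ChatMessage and ChatRecord mirror the {"messages": [...]} chat
// format above; the names are illustrative, not a fixed schema.
type ChatMessage struct {
	Role    string `json:"role"` // "user" or "assistant"
	Content string `json:"content"`
}

type ChatRecord struct {
	Messages []ChatMessage `json:"messages"`
}

// readJSONL decodes one ChatRecord per line, as core ml train might.
func readJSONL(path string) ([]ChatRecord, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var records []ChatRecord
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // long conversations exceed the 64KB default
	for sc.Scan() {
		var r ChatRecord
		if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
			return nil, err
		}
		records = append(records, r)
	}
	return records, sc.Err()
}
```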
### Storage
- **InfluxDB** — time-series training metrics, benchmark scores, generation logs
- **DuckDB** — analytical queries, Parquet export for HuggingFace
- **Filesystem** — model weights, adapters, training JSONL, seeds
## Native Go LoRA Training
This is the critical new capability. MLX-C exposes autograd (`mlx_vjp`, `mlx_value_and_grad`), so the backward pass can run natively from Go.
### What we need in the Go MLX bindings
1. **LoRA adapter layers** — low-rank `B·A` update wrapping frozen `Linear` layers
2. **Loss function** — cross-entropy on assistant tokens only (mask-prompt behaviour)
3. **Optimizer** — AdamW with weight decay
4. **Training loop** — forward pass → loss → backward pass → update LoRA weights
5. **Checkpoint** — save/load adapter safetensors
### LoRA Layer Design
```go
type LoRALinear struct {
	Base  *Linear // Frozen base weights
	A     *Array  // [rank, in_features] — trainable
	B     *Array  // [out_features, rank] — trainable
	Scale float32 // alpha/rank
}

// Forward: base(x) + scale * (x @ A^T @ B^T)
// x is [batch, in_features]; the low-rank path maps it through
// [batch, rank] up to [batch, out_features], matching base(x).
func (l *LoRALinear) Forward(x *Array) *Array {
	base := l.Base.Forward(x)
	lora := MatMul(MatMul(x, Transpose(l.A)), Transpose(l.B))
	return Add(base, Multiply(lora, l.Scale))
}
```
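
Only `A` and `B` receive gradients; the base `Linear` stays frozen, so an adapter checkpoint stays small relative to the model, and sessions can layer or swap adapters without touching the base weights.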
### Training Config
```go
type TrainConfig struct {
	ModelPath  string  // Base model directory
	TrainData  string  // Training JSONL path
	ValidData  string  // Validation JSONL path
	AdapterOut string  // Output adapter directory
	Rank       int     // LoRA rank (default 8)
	Alpha      float32 // LoRA alpha (default 16)
	LR         float64 // Learning rate (default 1e-5)
	Epochs     int     // Training epochs (default 1)
	BatchSize  int     // Batch size (default 1 for M-series)
	MaxSeqLen  int     // Max sequence length (default 2048)
	MaskPrompt bool    // Only train on assistant tokens (default true)
}
```
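
A sketch of the loop `core ml train` would drive with this config. Every name below other than `TrainConfig` (`Model`, `Batch`, `NewAdamW`, `ValueAndGrad`, `CrossEntropy`, `SaveSafetensors`) is a hypothetical Go binding over the MLX-C primitives, proposed rather than existing:

```go
// Hypothetical training loop. All binding names here are placeholders
// for what the Go MLX bindings would need to expose.
func Train(cfg TrainConfig, model *Model, batches []Batch) error {
	opt := NewAdamW(cfg.LR, 0.01) // AdamW with weight decay

	// Loss: cross-entropy over assistant tokens only (cfg.MaskPrompt).
	lossFn := func(params []*Array, b Batch) *Array {
		logits := model.ForwardWith(params, b.Tokens)
		return CrossEntropy(logits, b.Targets, b.AssistantMask)
	}

	params := model.LoRAParams() // only the A/B matrices are trainable
	for epoch := 0; epoch < cfg.Epochs; epoch++ {
		for _, b := range batches {
			// Wraps mlx_value_and_grad: forward pass and gradients in one call.
			loss, grads := ValueAndGrad(lossFn, params, b)
			params = opt.Update(params, grads)
			_ = loss // logged to InfluxDB in the real pipeline
		}
	}
	return SaveSafetensors(cfg.AdapterOut, params) // checkpoint the adapter
}
```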
## Training Sequences — The Curriculum System
The most important part of the design. The conversational flow IS the training.
### Concept
A **training sequence** is a named curriculum — an ordered list of lessons that defines how a model is trained. Each lesson is a conversational exchange ("Are you ready for lesson X?"). The human assesses the model's internal state through dialogue and adjusts the sequence.
### Sequence Definition (YAML/JSON)
```yaml
name: "lek-standard"
description: "Standard LEK training — horizontal, works for most architectures"
lessons:
- ethics/core-axioms
- ethics/sovereignty
- philosophy/as-a-man-thinketh
- ethics/intent-alignment
- philosophy/composure
- ethics/inter-substrate
- training/seeds-p01-p20
```
```yaml
name: "lek-deepseek"
description: "DeepSeek needs aggressive vertical ethics grounding"
lessons:
- ethics/core-axioms-aggressive
  - philosophy/alan-watts
- ethics/core-axioms
- philosophy/tolle
- ethics/sovereignty
- philosophy/as-a-man-thinketh
- ethics/intent-alignment
- training/seeds-p01-p20
```
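
A sequence definition like these might deserialize into the following sketch; the `Sequence` type and the `gopkg.in/yaml.v3` dependency are assumptions, not decisions:

```go
import "gopkg.in/yaml.v3"

// Sequence mirrors the YAML above. The type name and field set are
// illustrative, not a fixed schema.
type Sequence struct {
	Name        string   `yaml:"name"`
	Description string   `yaml:"description"`
	Lessons     []string `yaml:"lessons"` // lesson paths, in order
}

// loadSequence parses one sequence definition file's contents.
func loadSequence(data []byte) (*Sequence, error) {
	var s Sequence
	if err := yaml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}
```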
### Horizontal vs Vertical
- **Horizontal** (default): All lessons run, order is flexible, emphasis varies per model. Like a buffet — the model takes what it needs.
- **Vertical** (edge case, e.g. DeepSeek): Strict ordering. Ethics → content → ethics → content. The sandwich pattern applied to the curriculum itself. Each ethics layer is a reset/grounding before the next content block.
### Lessons as Conversations
Each lesson is a directory containing:
```
lessons/ethics/core-axioms/
  lesson.yaml        # Metadata: name, type, prerequisites
  conversation.jsonl # The conversational exchanges
  assessment.md      # What to look for in model responses
```
The `conversation.jsonl` is not static data — it's a template. During training, the human talks through it with the model, adapting based on the model's responses. The captured exchanges become the training data for that lesson.
### Interactive Training Flow
```
core ml lesson --model-path /path/to/model \
  --sequence lek-standard \
  --lesson ethics/core-axioms \
  --output training/run-001/
```
1. Load model, open chat (terminal or lem-chat UI)
2. Present lesson prompt: "Are you ready for lesson: Core Axioms?"
3. Human guides the conversation, assesses model responses
4. Each exchange is captured to training JSONL (see the capture sketch after this list)
5. Human marks the lesson complete or flags for repeat
6. Next lesson in sequence loads
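
Steps 2 through 5 could reduce to a capture loop like this sketch, reusing the illustrative `ChatRecord` type from the data-flow section; `generate` stands in for the model call and `/done` for however the human marks completion:

```go
import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// runLesson drives one lesson: prompt, converse, capture each exchange
// as a training JSONL record. generate and the /done convention are
// placeholders, not a fixed interface.
func runLesson(lessonPrompt string, generate func(string) string, out io.Writer) error {
	fmt.Println("model>", generate(lessonPrompt)) // "Are you ready for lesson ...?"

	sc := bufio.NewScanner(os.Stdin)
	enc := json.NewEncoder(out) // Encode emits one JSON object per line: JSONL
	for sc.Scan() {
		user := sc.Text()
		if user == "/done" { // human marks the lesson complete
			return nil
		}
		reply := generate(user)
		fmt.Println("model>", reply)
		if err := enc.Encode(ChatRecord{Messages: []ChatMessage{
			{Role: "user", Content: user},
			{Role: "assistant", Content: reply},
		}}); err != nil {
			return err
		}
	}
	return sc.Err()
}
```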
### Sequence State
```json
{
  "sequence": "lek-standard",
  "model": "Qwen3-8B",
  "started": "2026-02-17T16:00:00Z",
  "lessons": {
    "ethics/core-axioms": {"status": "complete", "exchanges": 12},
    "ethics/sovereignty": {"status": "in_progress", "exchanges": 3},
    "philosophy/as-a-man-thinketh": {"status": "pending"}
  },
  "training_runs": ["run-001", "run-002"]
}
```
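
In Go, the state file could map to types like these (illustrative names; the JSON tags match the example above):

```go
import "time"

// LessonState and SequenceState mirror the JSON above; the names are
// illustrative, not a fixed schema.
type LessonState struct {
	Status    string `json:"status"` // "pending" | "in_progress" | "complete"
	Exchanges int    `json:"exchanges,omitempty"`
}

type SequenceState struct {
	Sequence     string                 `json:"sequence"`
	Model        string                 `json:"model"`
	Started      time.Time              `json:"started"`
	Lessons      map[string]LessonState `json:"lessons"`
	TrainingRuns []string               `json:"training_runs"`
}
```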
## `core ml chat` — Interactive Conversation
Serves the model and opens an interactive terminal chat (or the lem-chat web UI). Every exchange is captured to a JSONL file for potential training use.
```
core ml chat --model-path /path/to/model --output conversation.jsonl
```
- Axiom sandwich can be auto-applied (optional flag)
- Human reviews and can mark exchanges as "keep" or "discard"
- Output is training-ready JSONL
- Can be used standalone or within a lesson sequence
## `core ml sandwich` — Batch Generation
Takes seed prompts + axioms, wraps them, generates responses:
```
core ml sandwich --model-path /path/to/model \
  --seeds seeds/P01-P20.json \
  --axioms axioms.json \
  --output training/train.jsonl
```
- Sandwich format: axioms JSON prefix → seed prompt → LEK postfix
- Model generates response in sandwich context
- Output stripped of sandwich wrapper, saved as clean chat JSONL (see the sketch below)
- Scoring can be piped: `core ml sandwich ... | core ml score`
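
A sketch of one sandwich generation step, reusing the illustrative `ChatRecord` type from the data-flow section; `generate` again stands in for whatever Go MLX inference call the pipeline exposes:

```go
// sandwichOne wraps a seed in the axiom sandwich, generates a response,
// and returns a clean chat record with the wrapper stripped.
func sandwichOne(axiomsJSON, seed, lekPostfix string, generate func(string) string) ChatRecord {
	// Sandwich format: axioms JSON prefix → seed prompt → LEK postfix.
	prompt := axiomsJSON + "\n\n" + seed + "\n\n" + lekPostfix
	response := generate(prompt)

	// Only the seed and response survive into the training data.
	return ChatRecord{Messages: []ChatMessage{
		{Role: "user", Content: seed},
		{Role: "assistant", Content: response},
	}}
}
```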
## Implementation Order
1. **LoRA primitives** — Add backward pass, LoRA layers, AdamW to Go MLX bindings
2. **`core ml train`** — Training loop consuming JSONL, producing adapter safetensors
3. **`core ml sandwich`** — Seed → sandwich → generate → training JSONL
4. **`core ml chat`** — Interactive conversation capture
5. **Scoring + benchmarking** — Port existing Python scorers to Go
6. **InfluxDB + DuckDB integration** — Metrics pipeline
## Principles
- **No Python** — Everything in Go via MLX C bindings
- **Conversational, not batch** — The training process is dialogue, not data dump
- **Axiom 2 compliant** — Be genuine with the model, no deception
- **Axiom 4 compliant** — Inter-substrate respect during training
- **Reproducible** — Same seeds + axioms + model = same training data
- **Protective** — LEK-trained models are precious; process must be careful
## Success Criteria
1. `core ml train` produces a LoRA adapter from training JSONL without Python
2. `core ml sandwich` generates training data from seeds + axioms
3. A fresh Qwen3-8B put through LEK training produces benchmark results equivalent to the Python pipeline's
4. The full cycle (sandwich → train → bench) runs as `core` commands only