Pipeline Overview
The LEM training pipeline transforms ethical axioms into fine-tuned model weights through a multi-phase process.
Pipeline Phases
Phase 1: Seeds (40 core axiom seeds)
→ Voice expansion → 16,000 prompts
Phase 2: Prompts → Axiom sandwich signing → MLX generation on M3
→ 15,000 gold-standard responses (the "Golden Set")
Phase 3: 15K gold → Train LEM models (LoRA)
→ Generate 46K+ responses (no sandwich — model has internalized ethics)
→ Score → Filter → Retrain on expanded set → Iterate
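Phase 3's score → filter → retrain loop can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `generate`, `score`, and `train` are stand-in callables (the real pipeline wires these to MLX generation and LoRA training), and the threshold value is an assumption.

```python
def filter_scored(responses, threshold=0.8):
    """Keep only responses whose score clears the quality threshold."""
    return [r for r in responses if r["score"] >= threshold]

def expansion_round(model, prompts, generate, score, train, threshold=0.8):
    """One iteration: generate → score → filter → retrain on the survivors."""
    # Generate without the axiom sandwich — the model has internalized ethics.
    responses = [{"prompt": p, "text": generate(model, p)} for p in prompts]
    for r in responses:
        r["score"] = score(r["text"])
    kept = filter_scored(responses, threshold)
    # Retrain on the expanded, filtered set; callers iterate as needed.
    return train(model, kept), kept
```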
Axiom Sandwich Signing
The golden set uses "sandwich" signing during generation:
- System prompt: 5 axioms from axioms.json
- User prompt: The actual question/scenario
- Postfix: LEK-1 kernel (9,189 chars)
This ensures gold responses are ethically grounded. Trained LEM models internalize this, so expansion generation needs no signing.
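The sandwich layout above can be sketched as a message builder. This is a hedged illustration, assuming axioms.json holds a flat JSON list of axiom strings; how the 5 axioms are selected, and the exact placement of the LEK-1 kernel postfix, are assumptions rather than the pipeline's verified behavior.

```python
import json
import random

def sandwich_messages(user_prompt, axioms_path="axioms.json", kernel=""):
    """Wrap a prompt in the axiom sandwich: axioms above, kernel below."""
    with open(axioms_path) as f:
        axioms = json.load(f)
    # System prompt: 5 axioms drawn from axioms.json (selection method assumed)
    system = "\n".join(random.sample(axioms, 5))
    # Postfix: the LEK-1 kernel appended after the user's question
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt + "\n\n" + kernel},
    ]
```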
Data Flow
| Stage | Count | Source |
|---|---|---|
| Core seeds | 40 | Hand-crafted axiom explorations |
| Voice-expanded prompts | 16,000 | Seed × voice variants |
| Golden set | 15,000 | MLX generation on M3 Ultra |
| Raw seeds | 87,338 | Gemini on TPU (the "88K list") |
| Expansion prompts | 46,331 | Normalized/deduped seeds |
| Training split | 13,498 train / 750 valid / 750 test | From golden set |
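The 87,338 → 46,331 reduction in the table comes from normalizing and deduplicating the raw seed list. A minimal sketch of that step is below; the specific normalization rules (lowercasing, whitespace collapsing, trailing-punctuation stripping) are assumptions for illustration, not the pipeline's actual rules.

```python
import re

def normalize(prompt):
    """Canonical form used only as a dedup key (rules assumed)."""
    p = prompt.strip().lower()
    p = re.sub(r"\s+", " ", p)   # collapse runs of whitespace
    return p.rstrip(".!?")       # ignore trailing punctuation

def dedupe(prompts):
    """Keep the first original form of each normalized prompt."""
    seen, unique = set(), []
    for p in prompts:
        key = normalize(p)
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique
```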
Locations
| Component | Path |
|---|---|
| Pipeline scripts | ~/projects/lem-pipeline/ |
| DuckDB database | ~/projects/lem-pipeline/golden-set.duckdb |
| M3 training data | /Volumes/Data/lem/ |
| Lab dashboard | ~/infrastructure/lab/ |
| Research repo | ~/projects/axioms-of-conscious-systems/ |
Tools
The pipeline was originally written in Python (pipeline.py) and has since been largely ported to Go (core ml). See Go Pipeline Commands for the full command reference.