Pipeline Overview
Claude edited this page 2026-02-23 19:41:13 +00:00

The LEM training pipeline transforms ethical axioms into fine-tuned model weights through a multi-phase process.

Pipeline Phases

Phase 1: Seeds (40 core axiom seeds)
    → Voice expansion → 16,000 prompts

Phase 2: Prompts → Axiom sandwich signing → MLX generation on M3
    → 15,000 gold-standard responses (the "Golden Set")

Phase 3: 15K gold → Train LEM models (LoRA)
    → Generate 46K+ responses (no sandwich — model has internalized ethics)
    → Score → Filter → Retrain on expanded set → Iterate
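Phase 1's voice expansion can be sketched as a simple cross product: each core seed is rephrased in many "voices" (personas or registers) to multiply 40 seeds into 16,000 prompts. The voice count (400) and the bracket-tag template below are illustrative assumptions, not the actual pipeline code; only the 40-seed and 16,000-prompt figures come from this page.

```python
def expand_voices(seeds, voices):
    """Cross every seed with every voice variant (hypothetical template)."""
    return [f"[{voice}] {seed}" for seed in seeds for voice in voices]

# 40 seeds x 400 assumed voices = 16,000 prompts
seeds = [f"seed-{i}" for i in range(40)]
voices = [f"voice-{j}" for j in range(400)]
prompts = expand_voices(seeds, voices)
```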

Axiom Sandwich Signing

The golden set uses "sandwich" signing during generation:

  • System prompt: 5 axioms from axioms.json
  • User prompt: The actual question/scenario
  • Postfix: LEK-1 kernel (9,189 chars)

This ensures golden-set responses are ethically grounded. Trained LEM models internalize the axioms, so expansion generation needs no signing.
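The sandwich assembly above can be sketched as follows. The file names (`axioms.json`, `lek1_kernel.txt`), the random sampling of 5 axioms, and the chat-message shape are assumptions for illustration; only the "5 axioms / user prompt / LEK-1 kernel postfix" structure comes from this page.

```python
import json
import random

def build_sandwich(user_prompt, axioms_path="axioms.json", kernel_path="lek1_kernel.txt"):
    """Assemble a 'sandwich'-signed request: axioms above, kernel below."""
    axioms = json.load(open(axioms_path))              # full axiom list
    system = "\n".join(random.sample(axioms, 5))       # 5 axioms as the system prompt
    kernel = open(kernel_path).read()                  # LEK-1 kernel appended as postfix
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt + "\n\n" + kernel},
    ]
```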

Data Flow

Stage                    Count                                 Source
Core seeds               40                                    Hand-crafted axiom explorations
Voice-expanded prompts   16,000                                Seed × voice variants
Golden set               15,000                                MLX generation on M3 Ultra
Raw seeds                87,338                                Gemini on TPU (the "88K list")
Expansion prompts        46,331                                Normalized/deduped seeds
Training split           13,498 train / 750 valid / 750 test   From golden set
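The train/valid/test split could be produced with a deterministic shuffle like the sketch below. The fixed seed and the function shape are assumptions; the counts (13,498 + 750 + 750 = 14,998, slightly under the 15,000-record golden set, presumably due to filtering) come from the table above.

```python
import random

def split_golden(records, n_valid=750, n_test=750, seed=42):
    """Deterministically shuffle and carve out valid/test, rest is train."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = records[:]
    rng.shuffle(shuffled)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test

train, valid, test = split_golden(list(range(14_998)))
```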

Locations

Component         Path
Pipeline scripts  ~/projects/lem-pipeline/
DuckDB database   ~/projects/lem-pipeline/golden-set.duckdb
M3 training data  /Volumes/Data/lem/
Lab dashboard     ~/infrastructure/lab/
Research repo     ~/projects/axioms-of-conscious-systems/

Tools

The pipeline was originally written in Python (pipeline.py) and has since been largely ported to Go (core ml). See Go Pipeline Commands for the full command reference.