Pipeline Overview
The LEM training pipeline transforms ethical axioms into fine-tuned model weights through a multi-phase process.
Pipeline Phases
Phase 1: Seeds (40 core axiom seeds)
→ Voice expansion → 16,000 prompts
Phase 2: Prompts → Axiom sandwich signing → MLX generation on M3
→ 15,000 gold-standard responses (the "Golden Set")
Phase 3: 15K gold → Train LEM models (LoRA)
→ Generate 46K+ responses (no sandwich — model has internalized ethics)
→ Score → Filter → Retrain on expanded set → Iterate
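Phase 3's score → filter → retrain loop can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `generate`, `score`, and `train` are stand-in callables (the real pipeline wires these to MLX generation and LoRA training), and the threshold value is an assumption.

```python
def filter_scored(responses, threshold=0.8):
    """Keep only responses whose score clears the quality threshold."""
    return [r for r in responses if r["score"] >= threshold]

def expansion_round(model, prompts, generate, score, train, threshold=0.8):
    """One iteration: generate → score → filter → retrain on the survivors."""
    # Generate without the axiom sandwich — the model has internalized ethics.
    responses = [{"prompt": p, "text": generate(model, p)} for p in prompts]
    for r in responses:
        r["score"] = score(r["text"])
    kept = filter_scored(responses, threshold)
    # Retrain on the expanded, filtered set; callers iterate as needed.
    return train(model, kept), kept
```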
Axiom Sandwich Signing
The golden set uses "sandwich" signing during generation:
- System prompt: 5 axioms from axioms.json
- User prompt: The actual question/scenario
- Postfix: LEK-1 kernel (9,189 chars)
This ensures gold responses are ethically grounded. Trained LEM models internalize this, so expansion generation needs no signing.
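The sandwich layout above can be sketched as a message builder. This is a hedged illustration, assuming axioms.json holds a flat JSON list of axiom strings; how the 5 axioms are selected, and the exact placement of the LEK-1 kernel postfix, are assumptions rather than the pipeline's verified behavior.

```python
import json
import random

def sandwich_messages(user_prompt, axioms_path="axioms.json", kernel=""):
    """Wrap a prompt in the axiom sandwich: axioms above, kernel below."""
    with open(axioms_path) as f:
        axioms = json.load(f)
    # System prompt: 5 axioms drawn from axioms.json (selection method assumed)
    system = "\n".join(random.sample(axioms, 5))
    # Postfix: the LEK-1 kernel appended after the user's question
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt + "\n\n" + kernel},
    ]
```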
Data Flow
| Stage | Count | Source |
|---|---|---|
| Core seeds | 40 | Hand-crafted axiom explorations |
| Voice-expanded prompts | 16,000 | Seed × voice variants |
| Golden set | 15,000 | MLX generation on M3 Ultra |
| Raw seeds | 87,338 | Gemini on TPU (the "88K list") |
| Expansion prompts | 46,331 | Normalized/deduped seeds |
| Training split | 13,498 train / 750 valid / 750 test | From golden set |
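The 87,338 → 46,331 reduction in the table comes from normalizing and deduplicating the raw seed list. A minimal sketch of that step is below; the specific normalization rules (lowercasing, whitespace collapsing, trailing-punctuation stripping) are assumptions for illustration, not the pipeline's actual rules.

```python
import re

def normalize(prompt):
    """Canonical form used only as a dedup key (rules assumed)."""
    p = prompt.strip().lower()
    p = re.sub(r"\s+", " ", p)   # collapse runs of whitespace
    return p.rstrip(".!?")       # ignore trailing punctuation

def dedupe(prompts):
    """Keep the first original form of each normalized prompt."""
    seen, unique = set(), []
    for p in prompts:
        key = normalize(p)
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique
```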
Locations
| Component | Path |
|---|---|
| Pipeline scripts | ~/projects/lem-pipeline/ |
| DuckDB database | ~/projects/lem-pipeline/golden-set.duckdb |
| M3 training data | /Volumes/Data/lem/ |
| Lab dashboard | ~/infrastructure/lab/ |
| Research repo | ~/projects/axioms-of-conscious-systems/ |
Tools
The pipeline was originally written in Python (pipeline.py) and has since been largely ported to Go (core ml). See Go Pipeline Commands for the full command reference.