Claude edited this page 2026-02-23 19:41:13 +00:00

Seed Expansion

Once the initial LEM models have been trained on the 15K golden set, they generate responses to 46K+ expansion prompts. No sandwich signing is used at this stage, because the ethics are now in the weights.

Seed Corpus

The seed corpus consists of 87,338 raw seeds generated by Gemini on TPU. They explore language heritage, shared histories, and cultural tensions across global regions.

Coverage

| Region        | Seeds | Notes            |
|---------------|-------|------------------|
| English       | 23K+  | Well-represented |
| Chinese       | 20K+  | Well-represented |
| Middle East   | 7K+   | Well-represented |
| European      | 7K+   | Well-represented |
| Russian       | 3.3K  | Underrepresented |
| German        | 3.2K  | Underrepresented |
| Latin America | 2K    | Underrepresented |
| Spanish       | 1.8K  | Underrepresented |

Missing entirely: Japanese, Korean, Thai, Vietnamese, Hindi/Urdu, Bengali, Tamil, Swahili, Yoruba, Amharic, indigenous languages.

Normalization

normalize-seeds deduplicates 87K raw seeds down to 46,331 unique expansion prompts in the expansion_prompts DuckDB table.
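As a rough illustration of what deduplication by a normalized key can look like, here is a minimal sketch. The normalization rule (Unicode NFKC, case folding, collapsed whitespace) and the sample seeds are assumptions made for this example; the actual rules used by normalize-seeds are not described on this page.

```python
import unicodedata


def normalize(seed):
    """Hypothetical dedup key: NFKC-normalized, case-folded, whitespace-collapsed."""
    text = unicodedata.normalize("NFKC", seed)
    return " ".join(text.casefold().split())


def dedupe(seeds):
    """Keep the first occurrence of each normalized seed, preserving input order."""
    seen = set()
    unique = []
    for seed in seeds:
        key = normalize(seed)
        if key and key not in seen:  # skip empty seeds and repeats
            seen.add(key)
            unique.append(seed.strip())
    return unique


# Made-up sample seeds, for illustration only.
raw = [
    "How do dialects survive?",
    "  how do dialects   survive? ",
    "",
    "What is shared history?",
]
unique_prompts = dedupe(raw)
```

In this sketch the second seed collapses into the first and the empty seed is dropped, so two prompts remain.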

Expansion Generator

Script: lem_expand.py (Python) / core ml expand (Go)

Key Design Decision

Expansion uses no sandwich signing. The trained LEM models have internalized the ethical framework, so each request carries only the user message: [{"role": "user", "content": prompt}].

Backends

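| Backend | Flag                                             | Description                                         |
|---------|--------------------------------------------------|-----------------------------------------------------|
| MLX     | --backend mlx                                    | Direct model loading on M3 Ultra                    |
| API     | --backend api --api-url http://localhost:8090/v1 | OpenAI-compatible (llama.cpp, Ollama, vLLM, mlx_lm) |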
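For the API backend, a request with an unsigned prompt might be built roughly as follows. This is a sketch, not the code in lem_expand.py: it assumes the server exposes the standard OpenAI-style /chat/completions route under the --api-url base, and the function names, the timeout, and the served model name are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8090/v1"  # base URL from the --api-url example


def build_request(prompt, model, api_url=API_URL):
    """Build a chat-completions request carrying only the user message."""
    payload = {
        "model": model,  # served model name; an assumption, depends on the server
        "messages": [{"role": "user", "content": prompt}],  # no sandwich signing
    }
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model, api_url=API_URL, timeout=300):
    """Send the request and return the first choice's text (needs a running server)."""
    request = build_request(prompt, model, api_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from the network call makes the payload easy to inspect without a server running.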

InfluxDB Coordination

Expansion progress is tracked separately from golden set generation:

| Measurement        | Tags       | Fields                          |
|--------------------|------------|---------------------------------|
| expansion_gen      | i, w, d, r | seed_id, gen_time, chars, model |
| expansion_progress | worker     | completed, target, pct          |
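To make the measurement layout concrete, the sketch below formats a point in InfluxDB line protocol. The field types (integer counts, float percentage), the rounding, and the helper names are assumptions; the page does not say how the real workers encode these values or what the i, w, d, r tags stand for, so they are left out of the example.

```python
def _escape_tag(value):
    """Escape the characters that line protocol reserves in tag values."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Format one point as: measurement,tag=value field=value [timestamp]."""
    tag_part = "".join(f",{k}={_escape_tag(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={_format_field(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


def progress_line(worker, completed, target):
    """Build an expansion_progress point; pct is derived from completed/target."""
    pct = round(100.0 * completed / target, 2) if target else 0.0
    fields = {"completed": completed, "target": target, "pct": pct}
    return to_line("expansion_progress", {"worker": worker}, fields)
```

For a hypothetical worker named w1 that has finished 50 of 200 prompts, progress_line("w1", 50, 200) yields expansion_progress,worker=w1 completed=50i,target=200i,pct=25.0.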

Output

expansion-responses/expand-{worker}.jsonl
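A per-worker JSONL file like this is typically written append-only, one record per line. The sketch below shows that pattern; the record keys in the usage note are hypothetical, since the actual schema is not listed on this page.

```python
import json
from pathlib import Path


def output_path(worker, root="expansion-responses"):
    """Return the per-worker output file: <root>/expand-<worker>.jsonl."""
    return Path(root) / f"expand-{worker}.jsonl"


def append_record(worker, record, root="expansion-responses"):
    """Append one response as a single JSON line, creating the directory if needed."""
    path = output_path(worker, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-Latin text readable in the file
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

A call such as append_record("w1", {"seed_id": "s-001", "response": "..."}) would add one line to expansion-responses/expand-w1.jsonl. Giving each worker its own file avoids concurrent writes to a shared file.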

Workflow

  1. Train LEM models on 15K golden set
  2. Test: core ml expand --model path/to/LEM-12B --limit 10
  3. Heuristic check: core ml score --tier 1 --limit 10
  4. Full run: core ml expand --model path/to/LEM-12B
  5. Score all: core ml score --tier 1 then --tier 2
  6. Filter: core ml approve --threshold 6.0
  7. Export + retrain on expanded set
  8. Iterate
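The filter in step 6 amounts to keeping responses whose score meets the threshold. A minimal sketch of that idea follows, assuming each scored record carries a numeric score under a "score" key and that the comparison is inclusive; both are assumptions, and the real logic lives in core ml approve.

```python
def approve(records, threshold=6.0, score_key="score"):
    """Keep records whose score is present and at or above the threshold."""
    return [
        record
        for record in records
        if record.get(score_key) is not None and record[score_key] >= threshold
    ]


# Made-up scored records, for illustration only.
scored = [
    {"seed_id": "a", "score": 7.5},
    {"seed_id": "b", "score": 5.9},
    {"seed_id": "c", "score": 6.0},
    {"seed_id": "d"},  # unscored records are never approved
]
approved_ids = [record["seed_id"] for record in approve(scored)]
```

With these sample records, only a and c pass the 6.0 threshold.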