Seed Expansion
After the 15K golden set trains the initial LEM models, those models generate responses to 46K+ expansion prompts. No sandwich signing is used at this stage, because the ethics are now in the weights.
Seed Corpus
87,338 raw seeds generated by Gemini on TPU, exploring language heritage, shared histories, and cultural tensions across global regions.
Coverage
| Region / language | Seeds | Notes |
|---|---|---|
| English | 23K+ | Well-represented |
| Chinese | 20K+ | Well-represented |
| Middle East | 7K+ | Well-represented |
| European | 7K+ | Well-represented |
| Russian | 3.3K | Underrepresented |
| German | 3.2K | Underrepresented |
| Latin America | 2K | Underrepresented |
| Spanish | 1.8K | Underrepresented |
Missing entirely: Japanese, Korean, Thai, Vietnamese, Hindi/Urdu, Bengali, Tamil, Swahili, Yoruba, Amharic, indigenous languages.
Normalization
normalize-seeds deduplicates 87K raw seeds down to 46,331 unique expansion prompts in the expansion_prompts DuckDB table.
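The exact matching rule used by normalize-seeds is not described here. As a minimal sketch, assuming seeds count as duplicates when they match after Unicode normalization, whitespace collapsing, and case folding, deduplication could look like this (the function names `normalize` and `dedupe` are hypothetical):

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Comparison key for a seed (an assumed rule, not the project's)."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().casefold()


def dedupe(seeds):
    """Keep the first occurrence of each seed, comparing normalized keys."""
    seen = set()
    unique = []
    for seed in seeds:
        key = normalize(seed)
        # Skip empty seeds and anything whose key was already seen.
        if key and key not in seen:
            seen.add(key)
            unique.append(seed.strip())
    return unique
```

A stricter or looser key (for example exact match only, or near-duplicate hashing) would change how many of the 87K raw seeds survive, so the 46,331 figure depends on whatever rule the real tool applies.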
Expansion Generator
Script: lem_expand.py (Python) / core ml expand (Go)
Key Design Decision
Expansion uses no sandwich signing. The trained LEM models have internalized the ethical framework, so prompts are simply [{"role": "user", "content": prompt}].
Backends
| Backend | Flag | Description |
|---|---|---|
| MLX | --backend mlx | Direct model loading on M3 Ultra |
| API | --backend api --api-url http://localhost:8090/v1 | OpenAI-compatible (llama.cpp, Ollama, vLLM, mlx_lm) |
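To illustrate the API backend, here is a hedged sketch of a request to an OpenAI-compatible server that carries only the single user turn described above. The `/chat/completions` path and the response shape follow the OpenAI chat completions convention. The function names, the model name, and the timeout are assumptions for illustration and are not taken from lem_expand.py.

```python
import json
import urllib.request


def build_chat_request(api_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,  # hypothetical name; servers differ in what they expect
        "messages": [{"role": "user", "content": prompt}],  # no sandwich signing
    }
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(api_url: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the text of the first choice."""
    request = build_chat_request(api_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server listening on the URL from the table, `generate("http://localhost:8090/v1", "LEM-12B", prompt)` would return the model's reply as a string.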
InfluxDB Coordination
Expansion progress is tracked separately from golden set generation:
| Measurement | Tags | Fields |
|---|---|---|
| expansion_gen | i, w, d, r | seed_id, gen_time, chars, model |
| expansion_progress | worker | completed, target, pct |
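As a rough sketch of how one of these points could be serialized, the helper below formats a measurement, tags, and fields as InfluxDB line protocol. It assumes integer fields are written with the `i` suffix and string fields are quoted. The helper names and the example worker name are hypothetical, and the meaning of the `i`, `w`, `d`, `r` tags is not stated here, so the example uses expansion_progress instead.

```python
def _escape_key(value: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _format_field(value) -> str:
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    """Format one point; measurement names here contain no characters needing escapes."""
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


line = to_line(
    "expansion_progress",
    {"worker": "w1"},  # hypothetical worker name
    {"completed": 10, "target": 20, "pct": 50.0},
)
print(line)  # expansion_progress,worker=w1 completed=10i,target=20i,pct=50.0
```

Whether `pct` is stored as a 0 to 100 value or a 0 to 1 fraction is not specified above; the example value is only illustrative.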
Output
expansion-responses/expand-{worker}.jsonl
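A minimal sketch of writing to the per-worker file follows, assuming one JSON object per line and a `seed_id` key in each record. The record schema and the resume-by-seed_id behavior are assumptions for illustration; only the file naming pattern comes from the path above.

```python
import json
from pathlib import Path


def append_response(out_dir, worker: str, record: dict) -> Path:
    """Append one record as a single JSON line to this worker's file."""
    path = Path(out_dir) / f"expand-{worker}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def completed_seed_ids(path) -> set:
    """Collect seed_id values already written, so a restarted worker could skip them."""
    path = Path(path)
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as handle:
        return {json.loads(line)["seed_id"] for line in handle if line.strip()}
```

Giving each worker its own append-only file means workers never write to the same file, so no locking is needed between them.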
Workflow
- Train LEM models on 15K golden set
- Test: core ml expand --model path/to/LEM-12B --limit 10
- Heuristic check: core ml score --tier 1 --limit 10
- Full run: core ml expand --model path/to/LEM-12B
- Score all: core ml score --tier 1 then --tier 2
- Filter: core ml approve --threshold 6.0
- Export + retrain on expanded set
- Iterate
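The command-line steps in the workflow above can be sketched as a small driver. It uses only the commands and flags listed in the workflow, and it assumes that "score all" means two separate invocations, one per tier. The function names and the dry-run default are hypothetical. Training, export, and retraining are left out because no commands are given for them.

```python
import subprocess


def workflow_commands(model_path: str, threshold: float = 6.0, smoke_limit: int = 10):
    """Return the CLI steps in order: a limited smoke test first, then the full run."""
    return [
        ["core", "ml", "expand", "--model", model_path, "--limit", str(smoke_limit)],
        ["core", "ml", "score", "--tier", "1", "--limit", str(smoke_limit)],
        ["core", "ml", "expand", "--model", model_path],
        ["core", "ml", "score", "--tier", "1"],
        ["core", "ml", "score", "--tier", "2"],  # assumed to be a second invocation
        ["core", "ml", "approve", "--threshold", str(threshold)],
    ]


def run_workflow(model_path: str, dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for argv in workflow_commands(model_path):
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the workflow at the first failing step.
            subprocess.run(argv, check=True)
```

Calling `run_workflow("path/to/LEM-12B")` only prints the six commands, which makes it easy to review the sequence before running it for real.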