# Golden Set
Claude edited this page 2026-02-23 19:41:13 +00:00
The 15,000-example gold-standard dataset that forms the foundation of all LEM training.
## Generation Process

- Generator: `lem_generate.py` on M3 Ultra
- Worker: `m3-gpu0`, ~10 seconds per prompt
- Model: Gemma 3 (various sizes) via native MLX
- Signing: Axiom sandwich: system prompt (5 axioms) + user prompt + LEK-1 kernel postfix
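The signing step can be sketched as simple prompt assembly. A minimal sketch, assuming hypothetical axiom and kernel text (the real five axioms and the LEK-1 kernel postfix live in the generator and are not shown here):

```python
def axiom_sandwich(user_prompt: str, axioms: list[str], kernel_postfix: str) -> dict:
    """Assemble the axiom-sandwich prompt: a system prompt built from the
    axioms, with the kernel postfix appended after the user's text."""
    system_prompt = "\n".join(f"Axiom {i + 1}: {a}" for i, a in enumerate(axioms))
    return {
        "system": system_prompt,
        "user": f"{user_prompt}\n\n{kernel_postfix}",
    }

# Hypothetical placeholders -- not the real axioms or LEK-1 kernel.
axioms = [f"axiom text {n}" for n in range(1, 6)]
msg = axiom_sandwich("Explain entropy.", axioms, "[LEK-1 kernel]")
```

The sandwich shape means every gold response is generated under the same fixed framing, regardless of the user prompt in the middle.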
## DuckDB Schema

The golden set lives in `golden-set.duckdb` alongside related tables:

| Table | Rows | Description |
|---|---|---|
| `golden_set` | 15,000 | Gold standard responses (axiom sandwich signed) |
| `prompts` | 16,000 | Voice-expanded prompts (from 40 seeds) |
| `training` | 15,347 | Formatted training examples |
| `seeds` | 87,338 | Raw seeds (88K list, Gemini on TPU) |
| `expansion_prompts` | 46,331 | Normalized/deduped seeds for expansion |
| `gemini_responses` | 18,390 | Gemini 2.5/3 Flash responses |
| `benchmarks` | 4,138 | Benchmark results |
| `benchmark_questions` | 200 | Benchmark question bank |
| `validations` | 320 | Validation results |
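The row counts trace a funnel from raw seeds down to signed gold responses. A small sketch that derives the retention ratio at each stage; the counts are copied from the table above, and the stage ordering is an inference from the table descriptions:

```python
# Row counts from the schema table above, in assumed pipeline order:
# raw seeds -> normalized seeds -> Gemini responses -> signed gold set.
counts = {
    "seeds": 87_338,
    "expansion_prompts": 46_331,
    "gemini_responses": 18_390,
    "golden_set": 15_000,
}

# Share of the previous stage retained at each step.
stages = list(counts)
retention = {
    stages[i]: counts[stages[i]] / counts[stages[i - 1]]
    for i in range(1, len(stages))
}
```

Each stage keeps well under 100% of the previous one, so the 15,000 gold examples represent a heavily filtered subset of the original 88K seed list.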
## Training Splits
| Split | Count |
|---|---|
| Train | 13,498 |
| Validation | 750 |
| Test | 750 |
Location on M3: `/Volumes/Data/lem/training-15k/`
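One common way to get fixed, reproducible splits like these is to bucket each example by a deterministic hash of its ID. A sketch under that assumption; the actual split method used for `training-15k` is not documented here, and the IDs and function name are hypothetical:

```python
import hashlib

def assign_split(example_id: str, test: int = 750, val: int = 750,
                 total: int = 14_998) -> str:
    """Deterministically bucket an example by hashing its ID, so the same
    ID lands in the same split on every run (no RNG state to preserve)."""
    h = int(hashlib.sha256(example_id.encode()).hexdigest(), 16) % total
    if h < test:
        return "test"
    if h < test + val:
        return "validation"
    return "train"

# Hypothetical IDs; proportions approximate the 13,498/750/750 split.
splits = [assign_split(f"ex-{i}") for i in range(1000)]
```

Hash-based assignment keeps the split stable as new examples are added, unlike shuffling the whole set with a seed.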
## InfluxDB Metrics

Generation progress is tracked in InfluxDB (`training` database):

| Measurement | Tags | Fields |
|---|---|---|
| `gold_gen` | i, w, d, v | seed_id, gen_time, chars |
| `golden_gen_progress` | worker | completed, target, pct |
| `golden_set_stats` | (none) | total_examples, domains, voices, avg_gen_time |
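A `gold_gen` point can be written in InfluxDB line protocol (`measurement,tags fields timestamp`). A minimal sketch with hypothetical tag and field values, assuming the short tag keys `i`/`w`/`d`/`v` from the table above:

```python
def gold_gen_line(tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one gold_gen point in InfluxDB line protocol:
    measurement,tag=val,... field=val,... timestamp."""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_parts = []
    for k, v in fields.items():
        # Line protocol marks integer fields with an 'i' suffix;
        # floats are written bare.
        field_parts.append(f"{k}={v}i" if isinstance(v, int) else f"{k}={v}")
    return f"gold_gen,{tag_str} {','.join(field_parts)} {ts_ns}"

# Hypothetical example point (tag/field values are illustrative only).
line = gold_gen_line(
    {"i": 3, "w": "m3-gpu0", "d": "physics", "v": "plain"},
    {"seed_id": 1234, "gen_time": 9.8, "chars": 2048},
    1_700_000_000_000_000_000,
)
```

Tags are indexed in InfluxDB while fields are not, which matches the split in the table: low-cardinality identifiers as tags, per-point measurements as fields.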