Golden Set
Claude edited this page 2026-02-23 19:41:13 +00:00

The 15,000-example gold-standard dataset that forms the foundation of all LEM training.

Generation Process

  • Generator: lem_generate.py on M3 Ultra
  • Worker: m3-gpu0, ~10 seconds per prompt
  • Model: Gemma 3 (various sizes) via native MLX
  • Signing: Axiom sandwich — system prompt (5 axioms) + user prompt + LEK-1 kernel postfix
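The axiom-sandwich signing above can be sketched as follows. This is a minimal illustration of the message assembly only; the axiom texts, the LEK-1 kernel string, and the `build_sandwich` helper name are placeholders, not the real contents used by `lem_generate.py`.

```python
# Sketch of the "axiom sandwich": five system axioms precede the user
# prompt, and the LEK-1 kernel text is appended as a postfix.
# Axiom and kernel strings are placeholders, not the real ones.

AXIOMS = [f"Axiom {n}: <placeholder axiom text>" for n in range(1, 6)]
LEK1_KERNEL = "<LEK-1 kernel postfix text>"

def build_sandwich(user_prompt: str) -> list[dict]:
    """Assemble the chat messages for one generation request."""
    system = "\n".join(AXIOMS)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"{user_prompt}\n\n{LEK1_KERNEL}"},
    ]

messages = build_sandwich("Explain why the sky is blue.")
```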

DuckDB Schema

The golden set lives in golden-set.duckdb alongside related tables:

| Table | Rows | Description |
|---|---|---|
| golden_set | 15,000 | Gold standard responses (axiom sandwich signed) |
| prompts | 16,000 | Voice-expanded prompts (from 40 seeds) |
| training | 15,347 | Formatted training examples |
| seeds | 87,338 | Raw seeds (88K list, Gemini on TPU) |
| expansion_prompts | 46,331 | Normalized/deduped seeds for expansion |
| gemini_responses | 18,390 | Gemini 2.5/3 Flash responses |
| benchmarks | 4,138 | Benchmark results |
| benchmark_questions | 200 | Benchmark question bank |
| validations | 320 | Validation results |

Training Splits

| Split | Count |
|---|---|
| Train | 13,498 |
| Validation | 750 |
| Test | 750 |

Location on M3: /Volumes/Data/lem/training-15k/
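A deterministic split of the kind above can be sketched as follows. This is a generic shuffle-and-slice illustration, not the actual LEM splitting code; the seed is arbitrary, and the real train count (13,498) differs slightly from the 13,500 this naive carve-up of 15,000 examples produces.

```python
import random

def make_splits(n_examples: int, n_val: int = 750, n_test: int = 750,
                seed: int = 0) -> dict:
    """Deterministically shuffle example indices, carve off the test and
    validation sets, and leave the remainder as the training split.
    Counts and seed are illustrative, not the values used for LEM."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return {
        "test": idx[:n_test],
        "validation": idx[n_test:n_test + n_val],
        "train": idx[n_test + n_val:],
    }

splits = make_splits(15_000)
```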

InfluxDB Metrics

Generation progress is tracked in InfluxDB (training database):

| Measurement | Tags | Fields |
|---|---|---|
| gold_gen | i, w, d, v | seed_id, gen_time, chars |
| golden_gen_progress | worker | completed, target, pct |
| golden_set_stats | (none) | total_examples, domains, voices, avg_gen_time |
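A `gold_gen` point like those above can be serialized in InfluxDB line protocol as sketched below. The tag keys and field names follow the table; the helper name and sample values are made up, and the escaping is deliberately minimal (real tag values containing spaces or commas would need escaping).

```python
# Sketch of one gold_gen point in InfluxDB line protocol.
# Integer fields carry a trailing "i"; tag values here are assumed
# to need no escaping.

def gold_gen_line(i: int, w: str, d: str, v: str,
                  seed_id: int, gen_time: float, chars: int) -> str:
    tags = f"i={i},w={w},d={d},v={v}"
    fields = f"seed_id={seed_id}i,gen_time={gen_time},chars={chars}i"
    return f"gold_gen,{tags} {fields}"

line = gold_gen_line(3, "m3-gpu0", "ethics", "plain", 1042, 9.8, 2117)
```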