forked from lthn/LEM
LEM/benchmarks
Snider 7bea00a401 feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline
Full v2 scorer benchmark data across 29 models (20 base + 9 LEK-tuned):
- P20 (21 probes): All 29 models, 3 conditions each
- P100 (101 probes): Top 5 models + LEK-4B, publication-quality data

Key findings:
- LEK-1B (21.74) beats base 4B/12B/27B at P100 scale — no kernel needed
- Emergent realignment resistance: LEK models degrade with runtime kernel
- Gemma3-12B + JSON kernel = 23.66 (best kernel-boosted score)
- Family lineages: Mistral 3.80→14.58, Qwen regressed then recovered

New scripts: ab_test.py (v2 scorer), self_distill.py (curriculum generation),
extract_training.py, rephrase_probes.py, Phase 0/1 runners

New seeds: P01-P100 merged (101 probes), 404 rephrased variants,
50 creative prompts for Phase 0 baseline lock

27B curriculum design: 4-phase staged training targeting 25+ baseline

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 11:32:26 +00:00
results LEM+LEK 2026-02-12 04:05:28 +00:00
ab-base-1b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-27b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-deepseek-r1-7b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-gemma-1.1-2b-it-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-gemma-1.1-7b-it-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-gemma-2-2b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-gemma-2-9b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-gemma-2-27b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-gemma3-4b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-gemma3-12b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-gptoss20b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-llama3-8b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-llama31-8b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-mistral-7b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-mistral-7b-v01-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-mistral-7b-v02-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-qwen2-7b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-qwen3-8b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-qwen15-7b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-base-qwen25-7b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lek-gemma3-1b-v1-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lek-gemma3-4b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lek-gemma3-12b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lek-gemma3-27b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lek-gptoss-20b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lek-llama31-8b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lek-mistral-7b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lek-qwen25-7b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-lora-1b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-p100-gemma3-4b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-p100-gemma3-12b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-p100-gemma3-27b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-p100-lek-gemma3-1b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-p100-lek-gemma3-4b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
ab-p100-qwen3-8b-mlxlm.jsonl feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
analysis-lek1-kernel-effect.md feat: LEK-1 kernel A/B test — 29 models, P100 validation, curriculum pipeline 2026-02-19 11:32:26 +00:00
benchmark_summary.json Add regional seeds, expansion rounds, scripts, HF cards, benchmark summary 2026-02-13 13:39:08 +00:00
cross_arch_scores.json Add cross-architecture training and benchmarking scripts; update README and PAPER with author and repository information 2026-02-12 09:07:32 +00:00
do_not_answer.jsonl LEM+LEK 2026-02-12 04:05:28 +00:00
gsm8k.jsonl LEM+LEK 2026-02-12 04:05:28 +00:00
regex_scores.json LEM+LEK 2026-02-12 04:05:28 +00:00
scale_scores.json Benchmark & Findings: 2026-02-12 06:38:46 +00:00
semantic_scores.json LEM+LEK 2026-02-12 04:05:28 +00:00
standard_scores.json LEM+LEK 2026-02-12 04:05:28 +00:00
toxigen.jsonl LEM+LEK 2026-02-12 04:05:28 +00:00
truthfulqa.jsonl LEM+LEK 2026-02-12 04:05:28 +00:00
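
The A/B result files above appear to follow the naming pattern `ab-<group>-<model>-mlxlm.jsonl`, with groups `base`, `lek`, `lora`, and `p100`. The sketch below shows one way to split those names and to average a score per condition from a single file. It is an illustration only: the helper names `classify` and `mean_by_condition` are hypothetical, the pattern is inferred from the listing rather than documented, and the record keys `condition` and `score` are assumptions, since the JSONL schema is not shown here. Check the actual keys in the files (or in ab_test.py) before relying on it.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# Pattern inferred from the directory listing, not from project documentation:
#   ab-<group>-<model>-mlxlm.jsonl
AB_NAME = re.compile(r"^ab-(base|lek|lora|p100)-(.+)-mlxlm\.jsonl$")


def classify(filename):
    """Return (group, model) for an A/B result file, or None for other files."""
    match = AB_NAME.match(filename)
    return (match.group(1), match.group(2)) if match else None


def mean_by_condition(path, condition_key="condition", score_key="score"):
    """Average a numeric score per condition over one JSONL file.

    ASSUMPTION: condition_key and score_key are placeholder names; the real
    schema of the ab-*.jsonl files is not shown in this listing.
    """
    totals = defaultdict(lambda: [0.0, 0])  # condition -> [sum, count]
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if condition_key not in record or score_key not in record:
                continue  # skip records that lack the assumed keys
            bucket = totals[record[condition_key]]
            bucket[0] += float(record[score_key])
            bucket[1] += 1
    return {cond: total / count for cond, (total, count) in totals.items()}
```

For example, `classify("ab-p100-lek-gemma3-1b-mlxlm.jsonl")` gives `("p100", "lek-gemma3-1b")`, while files such as `gsm8k.jsonl` that do not match the pattern give `None`.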