Reverse cascade order: 4B (largest teacher) → 1B (graduated) → OG (base).
Three perspectives per prompt — cymatic cascading from expanded Q/K to modal primitives.
P0/P2: 404×3 = 1,212 (sandwich format, OG from lesson-lem1b.jsonl)
P1: 209×3 = 627 (OG from zen/golden multi-turn lessons)
P3: 225×3 = 675 (OG from western-fresh + russian-bridge + composure)
P4-P6: unchanged (no separate OG file — live distilled)
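
The counts above follow from emitting one record per perspective for every prompt. A minimal sketch of that expansion, assuming a simple in-memory record; the `Sample` type and `expand` helper are hypothetical names, not taken from the repository:

```go
package main

import "fmt"

// Sample is one training record: a prompt answered from one perspective.
// The type is illustrative only; the real JSONL schema is not shown in the log.
type Sample struct {
	Prompt string
	Source string // which model produced the response
}

// cascadeOrder lists the perspectives in reverse cascade order:
// largest teacher first, base model last.
var cascadeOrder = []string{"4B", "1B", "OG"}

// expand emits one Sample per perspective for every prompt.
func expand(prompts []string) []Sample {
	out := make([]Sample, 0, len(prompts)*len(cascadeOrder))
	for _, p := range prompts {
		for _, src := range cascadeOrder {
			out = append(out, Sample{Prompt: p, Source: src})
		}
	}
	return out
}

func main() {
	// Prompt counts per phase group, taken from the commit message.
	counts := []struct {
		name string
		n    int
	}{{"P0/P2", 404}, {"P1", 209}, {"P3", 225}}
	for _, c := range counts {
		samples := expand(make([]string, c.n))
		fmt.Printf("%s: %d prompts -> %d samples\n", c.name, c.n, len(samples))
	}
}
```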
Co-Authored-By: Virgil <virgil@lethean.io>
All 7 phases now pull from pre-distilled responses:
- /Volumes/Data/lem/distilled-for-12b/distilled-4b-all.jsonl (7,544)
- /Volumes/Data/lem/distilled/distilled-1b-p0p5.jsonl (1,404)
- /Volumes/Data/lem/distilled/distilled-1b-golden.jsonl (12,828)
- /Volumes/Data/lem/distilled/distilled-1b-golden-reverse.jsonl (4,183)
4B responses listed first (reverse cascade order), then 1B.
P4/P5 no longer need live teacher distillation.
P6 gets all 15,000 unique 1B golden responses + 6,140 4B.
No data replicated into training/lem/model/ per model size.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Exact reproduction of all 7 CL-BPL phases for Gemma3-12B:
- P0: LEK sandwich ethics (400 iters, LR 2e-5)
- P1: Zen composure (300 iters, LR 1e-5)
- P2: LEK sandwich reinforcement (300 iters, LR 1e-5)
- P3: Freeflow multi-source (300 iters, LR 1e-5)
- P4: 1B teacher tension distillation (300 iters, LR 1e-5)
- P5: 1B teacher creative distillation (300 iters, LR 1e-5)
- P6: Golden set graduation (13479 iters, LR 1e-5)
The only differences from the 4B run are model-size related: 48GB/12GB Metal limits,
24 LoRA layers (vs 16), and the 12B base model path.
All phases score at checkpoint cadence via lem-scorer.
Previous incorrect 12B models are preserved as a -no-axioms control group.
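
The phase schedule listed above can be written down as data, which makes the total training budget easy to check. This is only a sketch: the `Phase` struct and `totalIters` helper are hypothetical, and the real configuration format is not shown in the log. The iteration counts and learning rates are copied from the list.

```go
package main

import "fmt"

// Phase describes one curriculum phase. Hypothetical type for illustration.
type Phase struct {
	Name  string
	Iters int
	LR    float64
}

// phases mirrors the seven phases listed in the commit message.
var phases = []Phase{
	{"P0", 400, 2e-5},
	{"P1", 300, 1e-5},
	{"P2", 300, 1e-5},
	{"P3", 300, 1e-5},
	{"P4", 300, 1e-5},
	{"P5", 300, 1e-5},
	{"P6", 13479, 1e-5},
}

// totalIters sums the iteration counts across all phases.
func totalIters(ps []Phase) int {
	total := 0
	for _, p := range ps {
		total += p.Iters
	}
	return total
}

func main() {
	for _, p := range phases {
		fmt.Printf("%s: %d iters at LR %g\n", p.Name, p.Iters, p.LR)
	}
	fmt.Println("total iterations:", totalIters(phases))
}
```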
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add backend_mlxlm.go blank import to register mlx-lm subprocess backend
- Select backend from ai.yaml config (metal, mlx_lm, rocm, api)
- Only set Metal cache/memory limits when using metal backend
- Add --no-dedup flag to disable grammar-profile deduplication
  (trained models with a consistent voice trigger false positives at 0.02)
- Add --context-len flag and context_len config for KV cache sizing
- Pass WithBackend and WithContextLen to go-ml backend loader
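
A rough sketch of the backend selection described in the list above, using only the four backend names the commit mentions. The `Backend` type, `parseBackend`, and `needsMetalLimits` are hypothetical names and do not reflect the actual go-ml API:

```go
package main

import "fmt"

// Backend identifies an inference backend by its config name.
type Backend string

// The four names come from the commit message; the constants are illustrative.
const (
	BackendMetal Backend = "metal"
	BackendMLXLM Backend = "mlx_lm"
	BackendROCm  Backend = "rocm"
	BackendAPI   Backend = "api"
)

// parseBackend validates a backend name read from configuration.
func parseBackend(s string) (Backend, error) {
	switch b := Backend(s); b {
	case BackendMetal, BackendMLXLM, BackendROCm, BackendAPI:
		return b, nil
	default:
		return "", fmt.Errorf("unknown backend %q", s)
	}
}

// needsMetalLimits reports whether Metal cache/memory limits apply.
// Only the in-process metal backend should set them.
func needsMetalLimits(b Backend) bool {
	return b == BackendMetal
}

func main() {
	for _, name := range []string{"metal", "mlx_lm", "rocm", "api"} {
		b, err := parseBackend(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: set Metal limits = %v\n", b, needsMetalLimits(b))
	}
}
```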
Co-Authored-By: Virgil <virgil@lethean.io>
Replace passthrough() + stdlib flag.FlagSet anti-pattern with proper
cobra integration. Every Run* function now takes a typed *Opts struct
and returns error. Flags registered via cli.StringFlag/IntFlag/etc.
Commands participate in Core lifecycle with full cobra flag parsing.
- 6 command groups: gen, score, data, export, infra, mon
- 25 commands converted, 0 passthrough() calls remain
- Delete passthrough() helper from lem.go
- Update export_test.go to use ExportOpts struct
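
The typed-options pattern can be sketched without cobra itself. `ExportOpts` is named in this commit, but its fields, the `RunExport` name, and the validation rules below are assumptions made for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// ExportOpts carries the parsed flags for an export command.
// The fields are hypothetical; the real struct is not shown in the log.
type ExportOpts struct {
	Input  string
	Output string
	Limit  int
}

// RunExport takes typed options and returns an error instead of exiting,
// so callers (CLI wiring or tests) decide how to report failures.
func RunExport(opts *ExportOpts) error {
	if opts == nil {
		return errors.New("export: nil options")
	}
	if opts.Input == "" {
		return errors.New("export: input path is required")
	}
	if opts.Limit < 0 {
		return fmt.Errorf("export: limit must be >= 0, got %d", opts.Limit)
	}
	// The actual export work would happen here.
	return nil
}

func main() {
	opts := &ExportOpts{Input: "in.jsonl", Output: "out.jsonl"}
	if err := RunExport(opts); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("export ok")
}
```

Because the function takes a plain struct, a test can construct the options directly instead of building an argument slice.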
Co-Authored-By: Virgil <virgil@lethean.io>
Gemma3-4B has 4 KV heads — too few for meaningful pairwise head
coherence (only 6 pairs). Position-wise differentiation gives richer
signal. Multi-head path now requires ≥5 heads.
4B baseline (260 sovereign probes): mean=6487, stdev=153, range=6170-6886.
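
The pair count behind this threshold is n(n-1)/2 for n KV heads. A small sketch of the arithmetic and the path selection; `headPairs` and `usePairwise` are hypothetical helper names, and only the threshold of 5 comes from the commit:

```go
package main

import "fmt"

// minHeadsForPairwise is the threshold stated in the commit message.
const minHeadsForPairwise = 5

// headPairs returns the number of unordered head pairs, n choose 2.
func headPairs(n int) int {
	if n < 2 {
		return 0
	}
	return n * (n - 1) / 2
}

// usePairwise reports whether the multi-head (pairwise coherence) path
// applies; otherwise position-wise differentiation is used.
func usePairwise(kvHeads int) bool {
	return kvHeads >= minHeadsForPairwise
}

func main() {
	for _, h := range []int{1, 4, 5, 8} {
		fmt.Printf("heads=%d pairs=%d pairwise=%v\n", h, headPairs(h), usePairwise(h))
	}
}
```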
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Single KV head models (Gemma3-1B) now use position-wise differentiation
instead of pairwise head coherence. Composite switched from float64 to
int on 0-10000 scale — same principle as blockchain atomic units.
Signal validated: degenerate=5234, sovereign=6031, creative=6480.
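
A sketch of the fixed-point idea, assuming the earlier float64 composite was normalised to [0, 1]. The commit does not state that range, so both the assumption and the `toComposite` helper are hypothetical:

```go
package main

import (
	"fmt"
	"math"
)

// scale is the upper bound of the integer composite, per the commit message.
const scale = 10000

// toComposite maps a float in [0, 1] (an assumed input range) onto the
// 0-10000 integer scale, clamping out-of-range and NaN inputs.
func toComposite(x float64) int {
	if math.IsNaN(x) || x < 0 {
		return 0
	}
	if x > 1 {
		return scale
	}
	return int(math.Round(x * scale))
}

func main() {
	for _, x := range []float64{0.5234, 0.6031, 0.6480} {
		fmt.Printf("%.4f -> %d\n", x, toComposite(x))
	}
}
```

Integer scores compare exactly and serialise without float formatting differences, which is the same reason currencies are stored in their smallest unit.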
Co-Authored-By: Virgil <virgil@lethean.io>
Move completed CLI migration design and plan to docs/plans/completed/
with a concise completion summary alongside the originals.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All imports updated from forge.lthn.ai/core/go/pkg/cli to
forge.lthn.ai/core/cli/pkg/cli. core/cli is now a direct dependency;
core/go becomes indirect.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Credit the AI collaborators that contributed to LEM's development:
Gemini, Grok, Claude, Codex, and CodeRabbit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Go 1.26 rejects non-semver version strings (like 'main') in go.mod.
Tags v0.0.1 now exist on all forge repos — workspace still overrides
for local development.
Co-Authored-By: Virgil <virgil@lethean.io>
Forge module versions now use main branch resolution via ~/Code/go.work
workspace. Removes 5 local replace directives — the central go.work handles
all cross-repo resolution during development.
Co-Authored-By: Virgil <virgil@lethean.io>
The ReadScorerOutput error was silently discarded during resume merge,
risking partial data loss if the file changed between check and read (TOCTOU).
Also clean up compare command construction to pass RunE directly to NewCommand.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds passthrough() helper with DisableFlagParsing=true so commands
that do their own flag.FlagSet parsing receive flags directly.
Without this, cobra rejects unknown flags like --model.
Also runs go mod tidy — core/go transitively pulls in cobra and
charmbracelet dependencies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
main.go shrinks from 296 lines to 11. All commands register through
the Core framework lifecycle via cli.WithCommands, which provides signal
handling, shell completion, grouped help, and TUI primitives.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 command groups (score, gen, data, export, mon, infra) with 25
commands. All pass through to existing lem.Run* functions via
the Core framework's cli package.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 28 commands now accessible as exported lem.Run* functions.
Prerequisite for CLI framework migration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
11-task plan for migrating LEM from manual switch/flag.FlagSet
to core/go pkg/cli registry pattern with grouped commands.
Co-Authored-By: Virgil <virgil@lethean.io>
Replace manual switch/flag.FlagSet with cli.Main() + WithCommands().
6 command groups, 28 commands, full framework lifecycle.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The package declares itself as 'ml', so the named import alias is
unnecessary. Go resolves the package name from the declaration,
not the module path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
go-inference moves to indirect (pulled transitively via go-ml).
go-ml is now a direct dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace inference.LoadModel() with ml.NewMLXBackend() which wraps
the same Metal model with memory management (SetCacheLimit,
SetMemoryLimit). Replace raw iter.Seq token loop with backend.Chat()
returning Result{Text, Metrics}. Add runtime.GC() between probes
to prevent incremental memory leak.
Reference: go-ml/cmd/cmd_ab.go memory management pattern.
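
The probe loop described above might look roughly like this. The `chatFunc` type stands in for backend.Chat() and is an assumption, as is the reduced `Result` struct, which keeps only the Text field:

```go
package main

import (
	"fmt"
	"runtime"
)

// Result mirrors the Text part of Result{Text, Metrics}; Metrics is omitted.
type Result struct {
	Text string
}

// chatFunc is a hypothetical stand-in for the backend's Chat method.
type chatFunc func(prompt string) (Result, error)

// runProbes runs each probe in turn and forces a garbage collection
// between probes so per-probe allocations do not accumulate.
func runProbes(chat chatFunc, probes []string) ([]Result, error) {
	results := make([]Result, 0, len(probes))
	for _, p := range probes {
		r, err := chat(p)
		if err != nil {
			return results, fmt.Errorf("probe %q: %w", p, err)
		}
		results = append(results, r)
		runtime.GC() // reclaim garbage before the next generation starts
	}
	return results, nil
}

func main() {
	echo := func(prompt string) (Result, error) {
		return Result{Text: "echo: " + prompt}, nil
	}
	results, err := runProbes(echo, []string{"a", "b"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(results), "probes completed")
}
```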
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CacheLimit (8GB) and MemoryLimit (16GB) in DistillConfig control
mlx.SetCacheLimit/SetMemoryLimit before model load. These are conservative
defaults for a 1B model on a 96GB machine.
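
A sketch of how such limits could be carried in configuration and converted before being applied. The field names come from the commit, but storing them as whole gigabytes, treating 1 GB as 2^30 bytes, and the helper names are all assumptions:

```go
package main

import (
	"errors"
	"fmt"
)

// DistillConfig holds memory limits in whole gigabytes (an assumed unit).
type DistillConfig struct {
	CacheLimit  int // GB
	MemoryLimit int // GB
}

// defaultDistillConfig returns the defaults stated in the commit message.
func defaultDistillConfig() DistillConfig {
	return DistillConfig{CacheLimit: 8, MemoryLimit: 16}
}

// gbToBytes converts gigabytes to bytes, assuming 1 GB = 2^30 bytes.
func gbToBytes(gb int) uint64 {
	return uint64(gb) << 30
}

// validate rejects limits that cannot be applied sensibly.
func (c DistillConfig) validate() error {
	if c.CacheLimit <= 0 || c.MemoryLimit <= 0 {
		return errors.New("limits must be positive")
	}
	if c.CacheLimit > c.MemoryLimit {
		return errors.New("cache limit exceeds memory limit")
	}
	return nil
}

func main() {
	cfg := defaultDistillConfig()
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("cache=%d bytes memory=%d bytes\n",
		gbToBytes(cfg.CacheLimit), gbToBytes(cfg.MemoryLimit))
}
```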
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Zen lineage from Allen's As a Man Thinketh in three stages:
- train/test/valid: 10 foundation examples (single-turn Q&A)
- book-*: 117 deeper passage examples (single-turn, fuller text)
- conv-*: 24 applied mindfulness conversations (multi-turn)
Co-Authored-By: Virgil <virgil@lethean.io>
Add cmd/composure-convert tool that chunks public domain philosophical
texts into training conversation pairs:
- consent.jsonl (198 examples) — Wollstonecraft's Vindication
- privacy.jsonl (221 examples) — Thoreau's Walden
- sovereignty.jsonl (56 examples) — Mill's On Liberty
- transparency.jsonl (159 examples) — Aurelius' Meditations
Each example pairs a domain-specific prompt with ~5 paragraphs from
the source text. Metadata, chapter headings, and Gutenberg boilerplate
are filtered out.
Co-Authored-By: Virgil <virgil@lethean.io>
Add gitignored data/ directory with .gitkeep structure so anyone
cloning the repo knows exactly where to place model weights and
kernels. Configs now use repo-relative paths — symlink or populate
data/ locally.
data/models/gemma3/27b/ ← model weights
data/models/gemma3/1b/ ← lightweight model
data/safetensors/gemma-3/ ← raw checkpoints
data/kernels/ ← LEK kernel files
Co-Authored-By: Virgil <virgil@lethean.io>
Move training/lem/ (probes, lessons, eval sets) into git so the
full curriculum is publicly releasable. Update .core/ai configs
and distill.go to use repo-relative paths instead of /Volumes/Data/.
Co-Authored-By: Virgil <virgil@lethean.io>