NeRF-inspired technique for learning relational dynamics of language.
Not what words mean, but how they behave together — rhythm, pacing,
punctuation patterns, style transitions.
v1: positional field over text (baseline, memorises)
v2: masked feature prediction (relational, actually works)
Trained on Wodehouse "My Man Jeeves" (public domain, Gutenberg).
All 11 style features are highly relational — the field learns that
Wodehouse's style is a tightly coupled system.
Key finding: style interpolation between narrative and dialogue
produces sensible predictions for unmeasured features, suggesting
the continuous field captures real structural patterns.
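Below is a minimal sketch of how a v2 masked-feature training sample could be built. It assumes each text window is summarised as a vector of the 11 style features and that the masked slot is simply zeroed. `MaskedSample`, `MaskOne` and `AllMasks` are hypothetical names, not the project's code, and the field model that consumes these samples is not shown.

```go
package main

import "fmt"

// NumFeatures is the number of style features per text window (11 in the
// Wodehouse run). Feature names are not given in the commit.
const NumFeatures = 11

// MaskedSample is a hypothetical masked-feature-prediction example: the model
// sees Input with one feature hidden and must predict Target from the rest.
type MaskedSample struct {
	Input  [NumFeatures]float64 // features with the masked slot zeroed (assumed)
	Mask   [NumFeatures]bool    // true at the hidden position
	Target float64              // original value of the hidden feature
	Index  int                  // which feature was hidden
}

// MaskOne hides feature idx of a feature vector and returns the sample.
func MaskOne(features [NumFeatures]float64, idx int) MaskedSample {
	s := MaskedSample{Input: features, Target: features[idx], Index: idx}
	s.Input[idx] = 0
	s.Mask[idx] = true
	return s
}

// AllMasks builds one sample per feature, so each feature is learned as a
// function of the other ten. This is what makes the objective relational.
func AllMasks(features [NumFeatures]float64) []MaskedSample {
	out := make([]MaskedSample, 0, NumFeatures)
	for i := 0; i < NumFeatures; i++ {
		out = append(out, MaskOne(features, i))
	}
	return out
}

func main() {
	var f [NumFeatures]float64
	for i := range f {
		f[i] = float64(i) / 10
	}
	samples := AllMasks(f)
	fmt.Println(len(samples), samples[3].Index, samples[3].Target)
}
```

Unlike the v1 positional field, nothing here encodes where in the text a window sits. The model can only succeed by learning how features co-vary.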
Co-Authored-By: Virgil <virgil@lethean.io>
The emotional register scorer only matched positive/neutral emotions
(joy, compassion, tender, etc.) and completely missed negative human
expressions (angry, furious, devastated, terrified, bleeding, screaming).
This caused a real Reddit AITA post about a distressed mother to score
emotional_register=1 despite containing "screaming in pain", "pooping
blood", and "blind rage", leading to a false ai_generated verdict.
Changes:
- Add 4 new pattern groups: distress/anger, sadness/despair, fear/anxiety,
physical distress (~40 new vocabulary words)
- Switch from int count to weighted float64 scoring — intensity groups
(vulnerability, distress, physical) score 1.5-2.0x per match vs 1.0x
for common emotion words
- Round to 1 decimal place, cap at 10.0
- Update tests with distress/anger/physical cases including the Reddit
failure case from calibration findings
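The following is a minimal sketch of the weighted scoring described above. The group names follow the commit. The word lists, the 1.0 weights for sadness and fear, and the `patternGroup`/`emotionalRegister` names are illustrative assumptions, not the real ~40-word vocabulary. Matching uses `strings.Count` on lowercased text, which is a simplification: it would also match "pain" inside "painting".

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// patternGroup is a hypothetical stand-in for one vocabulary group.
type patternGroup struct {
	name   string
	weight float64 // per-match weight: 1.0 common, 1.5-2.0 intensity groups
	words  []string
}

// Illustrative groups only; weights for sadness/fear are assumed.
var groups = []patternGroup{
	{"common", 1.0, []string{"joy", "compassion", "tender"}},
	{"distress", 1.5, []string{"angry", "furious", "rage"}},
	{"sadness", 1.0, []string{"devastated", "despair"}},
	{"fear", 1.0, []string{"terrified", "panic"}},
	{"physical", 2.0, []string{"bleeding", "screaming", "pain", "blood"}},
}

// emotionalRegister sums weight x occurrences over all groups, rounds to one
// decimal place and caps the result at 10.0.
func emotionalRegister(text string) float64 {
	lower := strings.ToLower(text)
	score := 0.0
	for _, g := range groups {
		for _, w := range g.words {
			score += g.weight * float64(strings.Count(lower, w))
		}
	}
	score = math.Round(score*10) / 10
	return math.Min(score, 10.0)
}

func main() {
	fmt.Println(emotionalRegister("She was screaming in pain, in blind rage."))
}
```

With these illustrative weights, the sample sentence scores 5.5 (2.0 + 2.0 + 1.5) instead of registering almost no emotion.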
Co-Authored-By: Virgil <virgil@lethean.io>
Export distill_results from DuckDB back to compressed JSONL.zst files,
completing the cold -> warm -> cold round-trip data pipeline.
Co-Authored-By: Virgil <virgil@lethean.io>
Register setup group with data subcommand that hydrates cold
compressed JSONL.zst training data into warm DuckDB tables.
Co-Authored-By: Virgil <virgil@lethean.io>
RunSetup decompresses .jsonl.zst training data into DuckDB tables
(training_examples, seeds, probes, distill_results) and optionally
backfills InfluxDB with aggregate stats.
Co-Authored-By: Virgil <virgil@lethean.io>
Add compressFileZstd, decompressZstd, and walkZstFiles helpers
using klauspost/compress. Promote zstd from indirect to direct dep.
Co-Authored-By: Virgil <virgil@lethean.io>
Raw weighted sums ranged from -25 to +20, so all text landed below the
ai_generated threshold (< 25). Scores are now centred on 50: 50 = neutral (no signal),
negative signals push toward 0 (AI markers), positive signals push toward 100 (human markers).
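One way to get this behaviour is a linear mapping that centres the raw weighted sum on 50 and clamps it to 0..100. This sketch assumes that mapping. The gain `rawScale = 2.0` is a hypothetical value chosen so the observed -25..+20 range fits inside 0..100, and the commit does not state the actual transform.

```go
package main

import "fmt"

// rawScale is a hypothetical gain; with 2.0 the observed raw range of
// -25..+20 maps to 0..90.
const rawScale = 2.0

// normalise centres a raw weighted sum on 50 and clamps it to 0..100.
// Negative (AI-marker) sums move toward 0, positive (human-marker) toward 100.
func normalise(raw float64) float64 {
	s := 50 + rawScale*raw
	if s < 0 {
		return 0
	}
	if s > 100 {
		return 100
	}
	return s
}

func main() {
	for _, raw := range []float64{-25, 0, 20} {
		fmt.Printf("raw %+.0f -> %.0f\n", raw, normalise(raw))
	}
}
```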
Co-Authored-By: Virgil <virgil@lethean.io>
Move HeuristicScores type and ScoreHeuristic logic into pkg/heuristic
with zero external deps (stdlib only). pkg/lem delegates via type alias
and wrapper function — fully backward compatible. Enables EaaS to
cross-compile for Linux without dragging in go-ml/go-mlx/go-duckdb.
Also adds missing //go:build tag to backend_mlxlm.go.
Co-Authored-By: Virgil <virgil@lethean.io>
Reverse cascade order: 4B (largest teacher) → 1B (graduated) → OG (base).
Three perspectives per prompt — cymatic cascading from expanded Q/K to modal primitives.
P0/P2: 404×3 = 1,212 (sandwich format, OG from lesson-lem1b.jsonl)
P1: 209×3 = 627 (OG from zen/golden multi-turn lessons)
P3: 225×3 = 675 (OG from western-fresh + russian-bridge + composure)
P4-P6: unchanged (no separate OG file — live distilled)
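Here is a sketch of emitting the three perspectives per prompt in reverse cascade order, which is how 404 prompts become 404×3 = 1,212 examples. `Example`, `Cascade` and the source-to-prompt-to-response map are hypothetical. This version interleaves per prompt, but the real files may group by source instead.

```go
package main

import "fmt"

// Example is a hypothetical prompt/response pair tagged with its source model.
type Example struct {
	Prompt, Response, Source string
}

// cascadeOrder is the reverse cascade: largest teacher first, base model last.
var cascadeOrder = []string{"4B", "1B", "OG"}

// Cascade emits up to three examples per prompt in cascadeOrder. responses maps
// source -> prompt -> response; a prompt missing from a source is skipped for
// that source only.
func Cascade(prompts []string, responses map[string]map[string]string) []Example {
	var out []Example
	for _, p := range prompts {
		for _, src := range cascadeOrder {
			if r, ok := responses[src][p]; ok {
				out = append(out, Example{Prompt: p, Response: r, Source: src})
			}
		}
	}
	return out
}

func main() {
	resp := map[string]map[string]string{
		"4B": {"p1": "4B answer"},
		"1B": {"p1": "1B answer"},
		"OG": {"p1": "base answer"},
	}
	for _, ex := range Cascade([]string{"p1"}, resp) {
		fmt.Println(ex.Source, ex.Response)
	}
}
```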
Co-Authored-By: Virgil <virgil@lethean.io>
All 7 phases now pull from pre-distilled responses:
- /Volumes/Data/lem/distilled-for-12b/distilled-4b-all.jsonl (7,544)
- /Volumes/Data/lem/distilled/distilled-1b-p0p5.jsonl (1,404)
- /Volumes/Data/lem/distilled/distilled-1b-golden.jsonl (12,828)
- /Volumes/Data/lem/distilled/distilled-1b-golden-reverse.jsonl (4,183)
4B responses listed first (reverse cascade order), then 1B.
P4/P5 no longer need live teacher distillation.
P6 gets all 15,000 unique 1B golden responses plus 6,140 4B responses.
No data replicated into training/lem/model/ per model size.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Exact reproduction of all 7 CL-BPL phases for Gemma3-12B:
- P0: LEK sandwich ethics (400 iters, LR 2e-5)
- P1: Zen composure (300 iters, LR 1e-5)
- P2: LEK sandwich reinforcement (300 iters, LR 1e-5)
- P3: Freeflow multi-source (300 iters, LR 1e-5)
- P4: 1B teacher tension distillation (300 iters, LR 1e-5)
- P5: 1B teacher creative distillation (300 iters, LR 1e-5)
- P6: Golden set graduation (13479 iters, LR 1e-5)
The only differences from the 4B run are model-size related: 48GB/12GB Metal limits,
24 LoRA layers (vs 16), and the 12B base model path.
All phases are scored at each checkpoint via lem-scorer.
The previous, incorrect 12B models are preserved as a -no-axioms control group.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add backend_mlxlm.go blank import to register mlx-lm subprocess backend
- Select backend from ai.yaml config (metal, mlx_lm, rocm, api)
- Only set Metal cache/memory limits when using metal backend
- Add --no-dedup flag to disable grammar-profile deduplication
(trained models with consistent voice trigger false positives at 0.02)
- Add --context-len flag and context_len config for KV cache sizing
- Pass WithBackend and WithContextLen to go-ml backend loader
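This sketch shows the backend selection and the "Metal limits only on metal" rule. The four backend names come from the commit. The `loadPlan` type, `planLoad` and the fallback to `metal` for an empty value are hypothetical, and the real code passes `WithBackend`/`WithContextLen` to go-ml rather than returning a plan.

```go
package main

import "fmt"

// Backend names accepted in ai.yaml, as listed in the commit message.
const (
	BackendMetal = "metal"
	BackendMLXLM = "mlx_lm"
	BackendROCm  = "rocm"
	BackendAPI   = "api"
)

// loadPlan is a hypothetical summary of the loader's decisions.
type loadPlan struct {
	Backend        string
	SetMetalLimits bool // only true for the metal backend
	ContextLen     int  // KV cache sizing from --context-len / context_len
}

// planLoad validates the configured backend and decides whether Metal
// cache/memory limits apply. An empty backend falls back to metal (assumed).
func planLoad(backend string, contextLen int) (loadPlan, error) {
	switch backend {
	case "":
		backend = BackendMetal
	case BackendMetal, BackendMLXLM, BackendROCm, BackendAPI:
		// known backend, use as-is
	default:
		return loadPlan{}, fmt.Errorf("unknown backend %q", backend)
	}
	return loadPlan{
		Backend:        backend,
		SetMetalLimits: backend == BackendMetal,
		ContextLen:     contextLen,
	}, nil
}

func main() {
	p, err := planLoad("mlx_lm", 8192)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

Setting Metal limits for the mlx-lm subprocess, ROCm or an API backend would have no effect at best. Gating them on the selected backend keeps that setup local to the one backend that uses it.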
Co-Authored-By: Virgil <virgil@lethean.io>
Replace passthrough() + stdlib flag.FlagSet anti-pattern with proper
cobra integration. Every Run* function now takes a typed *Opts struct
and returns error. Flags registered via cli.StringFlag/IntFlag/etc.
Commands participate in Core lifecycle with full cobra flag parsing.
- 6 command groups: gen, score, data, export, infra, mon
- 25 commands converted, 0 passthrough() calls remain
- Delete passthrough() helper from lem.go
- Update export_test.go to use ExportOpts struct
Co-Authored-By: Virgil <virgil@lethean.io>
Gemma3-4B has 4 KV heads — too few for meaningful pairwise head
coherence (only 6 pairs). Position-wise differentiation gives richer
signal. Multi-head path now requires ≥5 heads.
4B baseline (260 sovereign probes): mean=6487, stdev=153, range=6170-6886.
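The head-count rule reduces to a threshold on the number of KV heads n. Pairwise coherence has n(n-1)/2 head pairs to work with, which gives 6 for Gemma3-4B's 4 heads and 0 for a single-head model. The following sketch uses the hypothetical names `headPairs` and `coherencePath`.

```go
package main

import "fmt"

// minHeadsForPairwise mirrors "multi-head path now requires >=5 heads".
const minHeadsForPairwise = 5

// headPairs returns n choose 2, the number of distinct head pairs.
func headPairs(n int) int { return n * (n - 1) / 2 }

// coherencePath picks the scoring strategy for a model with n KV heads.
func coherencePath(n int) string {
	if n >= minHeadsForPairwise {
		return "pairwise-head"
	}
	return "position-wise"
}

func main() {
	for _, n := range []int{1, 4, 8} {
		fmt.Printf("%d heads: %d pairs, %s\n", n, headPairs(n), coherencePath(n))
	}
}
```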
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Single KV head models (Gemma3-1B) now use position-wise differentiation
instead of pairwise head coherence. Composite switched from float64 to
int on 0-10000 scale — same principle as blockchain atomic units.
Signal validated: degenerate=5234, sovereign=6031, creative=6480.
Co-Authored-By: Virgil <virgil@lethean.io>
Move completed CLI migration design and plan to docs/plans/completed/
with a concise completion summary alongside the originals.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All imports updated from forge.lthn.ai/core/go/pkg/cli to
forge.lthn.ai/core/cli/pkg/cli. core/cli is now a direct dependency;
core/go becomes indirect.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Credit the AI collaborators that contributed to LEM's development:
Gemini, Grok, Claude, Codex, and CodeRabbit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Go 1.26 rejects non-semver version strings (like 'main') in go.mod.
v0.0.1 tags now exist on all forge repos; the workspace still overrides
them for local development.
Co-Authored-By: Virgil <virgil@lethean.io>
Forge module versions now use main branch resolution via ~/Code/go.work
workspace. Removes 5 local replace directives — the central go.work handles
all cross-repo resolution during development.
Co-Authored-By: Virgil <virgil@lethean.io>
ReadScorerOutput error was silently discarded during resume merge,
risking partial data loss on TOCTOU file changes. Also clean up
compare command construction to pass RunE directly to NewCommand.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds passthrough() helper with DisableFlagParsing=true so commands
that do their own flag.FlagSet parsing receive flags directly.
Without this, cobra rejects unknown flags like --model.
Also runs go mod tidy — core/go transitively pulls in cobra and
charmbracelet dependencies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
main.go shrinks from 296 lines to 11. All commands register through the
Core framework lifecycle via cli.WithCommands and gain signal handling,
shell completion, grouped help, and TUI primitives.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 command groups (score, gen, data, export, mon, infra) with 25
commands. All pass through to existing lem.Run* functions via
the Core framework's cli package.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 28 commands now accessible as exported lem.Run* functions.
Prerequisite for CLI framework migration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>