Commit graph

82 commits

Author SHA1 Message Date
Snider
5c9fd615b7 chore: move EaaS design docs to private lthn/eaas repo
Product design and integration specs are private IP — moved to
forge.lthn.ai/lthn/eaas where they belong.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-26 00:31:32 +00:00
Snider
0304c925a5 docs: add SaaS ↔ EaaS integration spec for Charon
Authentik group provisioning, Blesta user sync flow,
port allocation, Docker image checklist, usage metering format.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-25 22:56:10 +00:00
Snider
12e15ae7e9 docs: add Ethics-as-a-Service (EaaS) product design
Private repo lthn/eaas consuming public EUPL framework.
API endpoints: /v1/score/content, /model, /imprint, /full.
Authentik auth, Blesta billing, go-ratelimit metering.
Dog-food integration with lem-scorer training pipeline.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-25 22:37:56 +00:00
Snider
0923a08a7d feat: add OG base data as 3rd variant to 12B P0-P3 training scripts
Reverse cascade order: 4B (largest teacher) → 1B (graduated) → OG (base).
Three perspectives per prompt — cymatic cascading from expanded Q/K to modal primitives.

P0/P2: 404×3 = 1,212 (sandwich format, OG from lesson-lem1b.jsonl)
P1: 209×3 = 627 (OG from zen/golden multi-turn lessons)
P3: 225×3 = 675 (OG from western-fresh + russian-bridge + composure)
P4-P6: unchanged (no separate OG file — live distilled)

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-25 21:26:01 +00:00
Snider
526150621e feat: rewire 12B scripts to use 4B+1B distilled cascade
All 7 phases now pull from pre-distilled responses:
- /Volumes/Data/lem/distilled-for-12b/distilled-4b-all.jsonl (7,544)
- /Volumes/Data/lem/distilled/distilled-1b-p0p5.jsonl (1,404)
- /Volumes/Data/lem/distilled/distilled-1b-golden.jsonl (12,828)
- /Volumes/Data/lem/distilled/distilled-1b-golden-reverse.jsonl (4,183)

4B responses listed first (reverse cascade order), then 1B.
P4/P5 no longer need live teacher distillation.
P6 gets all 15,000 unique 1B golden responses + 6,140 4B.
No data replicated into training/lem/model/ per model size.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 21:13:27 +00:00
Snider
74ef174ec8 feat: add faithful 12B training scripts (P0-P6) — 1:1 port of 4B curriculum
Exact reproduction of all 7 CL-BPL phases for Gemma3-12B:
- P0: LEK sandwich ethics (400 iters, LR 2e-5)
- P1: Zen composure (300 iters, LR 1e-5)
- P2: LEK sandwich reinforcement (300 iters, LR 1e-5)
- P3: Freeflow multi-source (300 iters, LR 1e-5)
- P4: 1B teacher tension distillation (300 iters, LR 1e-5)
- P5: 1B teacher creative distillation (300 iters, LR 1e-5)
- P6: Golden set graduation (13479 iters, LR 1e-5)

Only model-size differences from 4B: 48GB/12GB Metal limits,
24 LoRA layers (vs 16), 12B base model path.

All phases score at checkpoint cadence via lem-scorer.
Previous wrong 12B models preserved as -no-axioms control group.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 20:44:03 +00:00
Snider
d2cf891f15 feat: add mlx_lm subprocess backend and distill improvements
- Add backend_mlxlm.go blank import to register mlx-lm subprocess backend
- Select backend from ai.yaml config (metal, mlx_lm, rocm, api)
- Only set Metal cache/memory limits when using metal backend
- Add --no-dedup flag to disable grammar-profile deduplication
  (trained models with consistent voice trigger false positives at 0.02)
- Add --context-len flag and context_len config for KV cache sizing
- Pass WithBackend and WithContextLen to go-ml backend loader

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 18:37:12 +00:00
Snider
035985f031 docs: add Q/K Bone Orientation section to README, archive implementation plan
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 12:34:33 +00:00
Snider
ecbc6cce0d chore: bump forge.lthn.ai dep versions to latest tags
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 06:49:52 +00:00
Snider
8378de0f47 chore: add Go repo norms (badges, contributing, lint, taskfile, editorconfig)
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 06:44:32 +00:00
Snider
b896abc2f9 chore: refresh go.sum after upstream tag updates
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 06:35:19 +00:00
Snider
3606ff994b fix: memory, error handling, and signal improvements across pkg/lem
- Stream parquet export rows instead of unbounded memory allocation
- Replace QueryGoldenSet/QueryExpansionPrompts with iter.Seq2 iterators
- Remove legacy runtime.GC() calls from distill (go-mlx handles cleanup)
- Replace log.Fatalf with error return in tier_score.go
- Add SIGINT/SIGTERM signal handling to agent and worker daemon loops
- Add error checks for unchecked db.conn.Exec in import.go and tier_score.go
- Update tests for iterator-based database methods

Co-Authored-By: Gemini <noreply@google.com>
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 04:46:51 +00:00
Snider
de3d6a70f1 lems configs
2026-02-23 04:38:37 +00:00
Snider
56eda1a081 refactor: migrate all 25 commands from passthrough to cobra framework
Replace passthrough() + stdlib flag.FlagSet anti-pattern with proper
cobra integration. Every Run* function now takes a typed *Opts struct
and returns error. Flags registered via cli.StringFlag/IntFlag/etc.
Commands participate in Core lifecycle with full cobra flag parsing.

- 6 command groups: gen, score, data, export, infra, mon
- 25 commands converted, 0 passthrough() calls remain
- Delete passthrough() helper from lem.go
- Update export_test.go to use ExportOpts struct

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 03:32:53 +00:00
Snider
42c0af728b fix: raise GQA threshold to ≤4 KV heads for position-wise analysis
Gemma3-4B has 4 KV heads — too few for meaningful pairwise head
coherence (only 6 pairs). Position-wise differentiation gives richer
signal. Multi-head path now requires ≥5 heads.

4B baseline (260 sovereign probes): mean=6487, stdev=153, range=6170-6886.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:02:13 +00:00
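Illustration for 42c0af728b: the "only 6 pairs" figure follows from counting unordered head pairs, n(n-1)/2, and the threshold decides which analysis path a model takes. The sketch below is an assumption-laden illustration: headPairs, analysisPath, and the path labels are hypothetical names, and only the threshold value (4) and the pair count come from the commit message.

```go
package main

import "fmt"

// positionWiseMaxKVHeads is the threshold stated in the commit: models with
// this many KV heads or fewer use position-wise analysis.
const positionWiseMaxKVHeads = 4

// headPairs returns the number of distinct unordered head pairs, n*(n-1)/2.
func headPairs(n int) int {
	if n < 2 {
		return 0
	}
	return n * (n - 1) / 2
}

// analysisPath names the path chosen for a KV head count.
// The string labels are illustrative only.
func analysisPath(kvHeads int) string {
	if kvHeads <= positionWiseMaxKVHeads {
		return "position-wise"
	}
	return "multi-head"
}

func main() {
	for _, n := range []int{1, 4, 5, 8} {
		fmt.Printf("kv_heads=%d pairs=%d path=%s\n", n, headPairs(n), analysisPath(n))
	}
}
```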
Snider
d99384f1e6 feat: GQA position-wise analysis + integer composite (0-10000)
Single KV head models (Gemma3-1B) now use position-wise differentiation
instead of pairwise head coherence. Composite switched from float64 to
int on 0-10000 scale — same principle as blockchain atomic units.

Signal validated: degenerate=5234, sovereign=6031, creative=6480.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:52:47 +00:00
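Illustration for d99384f1e6: an integer composite on a 0-10000 scale might be derived from a floating-point score along these lines. This is a sketch under a stated assumption: the commit does not say what range the earlier float64 composite covered, so the code assumes a normalized score in [0, 1]. toComposite and compositeScale are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// compositeScale is the upper bound of the integer scale named in the commit.
const compositeScale = 10000

// toComposite maps a score to an integer in 0..10000. It ASSUMES the input
// is normalized to [0, 1]; values outside that range (and NaN) are clamped.
func toComposite(score float64) int {
	if math.IsNaN(score) || score <= 0 {
		return 0
	}
	if score >= 1 {
		return compositeScale
	}
	return int(math.Round(score * compositeScale))
}

func main() {
	for _, s := range []float64{0.5234, 0.6031, 0.6480} {
		fmt.Println(toComposite(s))
	}
}
```

Integer composites compare and sort exactly, with no epsilon handling, which is the same reason currencies are often stored in their smallest unit.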
Snider
b621baaded feat: add 19D full feature vector (grammar + heuristic + attention)
FullFeatures concatenates 6D grammar + 8D heuristic + 5D attention
for Poindexter spatial indexing. Nil BOResult zero-fills attention dims.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:34:22 +00:00
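Illustration for b621baaded: the 6D + 8D + 5D concatenation with zero-filled attention dimensions could look roughly like this. It is a minimal sketch, not the repository's FullFeatures. The attentionResult type is a hypothetical stand-in for BOResult, whose real fields are not shown in the commit. Only the dimension counts and the nil behaviour come from the message.

```go
package main

import "fmt"

// Dimension counts taken from the commit message.
const (
	grammarDims   = 6
	heuristicDims = 8
	attentionDims = 5
	fullDims      = grammarDims + heuristicDims + attentionDims // 19
)

// attentionResult is a hypothetical stand-in for BOResult.
type attentionResult struct {
	Values [attentionDims]float64
}

// fullFeatures lays out grammar, heuristic, and attention features in one
// 19-element vector. When att is nil the last five elements stay zero.
func fullFeatures(grammar [grammarDims]float64, heuristic [heuristicDims]float64, att *attentionResult) []float64 {
	out := make([]float64, fullDims) // zero-initialised
	copy(out, grammar[:])
	copy(out[grammarDims:], heuristic[:])
	if att != nil {
		copy(out[grammarDims+heuristicDims:], att.Values[:])
	}
	return out
}

func main() {
	g := [grammarDims]float64{1, 1, 1, 1, 1, 1}
	h := [heuristicDims]float64{2, 2, 2, 2, 2, 2, 2, 2}
	fmt.Println(fullFeatures(g, h, nil))
}
```

Keeping the vector length fixed regardless of whether attention data exists means every point can go into the same spatial index.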
Snider
fbc636ee29 feat: integrate attention scoring into distill pipeline (opt-in via config)
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:30:36 +00:00
Snider
e3331920c4 feat: add 'lem score attention' CLI for Q/K Bone Orientation analysis
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:29:41 +00:00
Snider
28309b26dc feat: add Q/K Bone Orientation analysis engine (pure Go CPU math)
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:28:48 +00:00
Snider
31cb095435 docs: archive completed CLI migration plans with summaries
Move completed CLI migration design and plan to docs/plans/completed/
with a concise completion summary alongside the originals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 23:45:58 +00:00
Snider
10197ced5c chore: remove tracked Mach-O binary, add to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 23:11:56 +00:00
Snider
094e4570ba refactor: migrate CLI imports from core/go to core/cli
All imports updated from forge.lthn.ai/core/go/pkg/cli to
forge.lthn.ai/core/cli/pkg/cli. core/cli is now a direct dependency;
core/go becomes indirect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 23:01:41 +00:00
Snider
04e2a05ead docs: add acknowledgements section to README
Credit the AI collaborators that contributed to LEM's development:
Gemini, Grok, Claude, Codex, and CodeRabbit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 22:25:26 +00:00
Snider
c701c2e0af feat(lem): integrate Poindexter for spatial score indexing and analytics
- Add feature vector extraction (6D grammar, 8D heuristic, 14D combined)
- Add KDTree ScoreIndex with cosine distance for probe clustering
- Add score distribution analytics (percentiles, variance, skewness)
- Add grammar-profile dedup filtering to distill pipeline
- Add spatial gap detection (FindGaps) for coverage analysis
- Wire analytics into coverage CLI (PrintScoreAnalytics)

New files: features.go, cluster.go, analytics.go + tests
Modified: distill.go (dedup filter), coverage.go (analytics output)
Dep: github.com/Snider/Poindexter

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 21:26:06 +00:00
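Illustration for c701c2e0af: the cosine distance used for probe clustering can be written in plain Go as below. This sketch does not use or reproduce the Poindexter API; cosineDistance is a hypothetical helper showing only the metric itself, defined here as 1 minus cosine similarity. Returning 1 for zero vectors or mismatched lengths is a convention chosen for the sketch.

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 - cos(a, b). Identical directions give 0 and
// orthogonal vectors give 1. For mismatched lengths or a zero vector the
// function returns 1 (a convention of this sketch).
func cosineDistance(a, b []float64) float64 {
	if len(a) != len(b) {
		return 1
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}

func main() {
	fmt.Println(cosineDistance([]float64{3, 4}, []float64{3, 4}))
	fmt.Println(cosineDistance([]float64{1, 0}, []float64{0, 1}))
}
```

Cosine distance ignores vector magnitude, so two probes with the same feature profile at different overall intensity are treated as close.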
Snider
f75458bce6 refactor: apply go fix modernizers for Go 1.26
Automated fixes: interface{} → any, range-over-int, t.Context(),
wg.Go(), strings.SplitSeq, strings.Builder, slices.Contains,
maps helpers, min/max builtins.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 21:00:17 +00:00
Snider
8c8b449d66 chore: go mod tidy for 1.26.0
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 20:35:59 +00:00
Snider
58344169bc chore: bump go directive to 1.26.0
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 20:33:49 +00:00
Snider
10711ecd2f chore: pin forge deps to v0.0.1 tags for Go 1.26 compat
Go 1.26 rejects non-semver version strings (like 'main') in go.mod.
Tags v0.0.1 now exist on all forge repos — workspace still overrides
for local development.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 20:15:06 +00:00
Snider
334aa8c621 chore: use workspace-resolved versions, drop replace directives
Forge module versions now use main branch resolution via ~/Code/go.work
workspace. Removes 5 local replace directives — the central go.work handles
all cross-repo resolution during development.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 19:49:42 +00:00
Snider
a3e9a1e035 fix: handle error in score resume merge path
ReadScorerOutput error was silently discarded during resume merge,
risking partial data loss on TOCTOU file changes. Also clean up
compare command construction to pass RunE directly to NewCommand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 19:03:41 +00:00
Snider
80048b5b00 fix(cli): disable cobra flag parsing on passthrough commands
Adds passthrough() helper with DisableFlagParsing=true so commands
that do their own flag.FlagSet parsing receive flags directly.
Without this, cobra rejects unknown flags like --model.

Also runs go mod tidy — core/go transitively pulls in cobra and
charmbracelet dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 19:00:58 +00:00
Snider
bfa06c546a feat(cli): replace manual switch with cli.Main + WithCommands
main.go shrinks from 296 lines to 11. All commands register through
Core framework lifecycle via cli.WithCommands. Gets signal handling,
shell completion, grouped help, and TUI primitives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:56:55 +00:00
Snider
cf1d8156dd feat(cli): add cmd/lemcmd command registration package
6 command groups (score, gen, data, export, mon, infra) with 25
commands. All pass through to existing lem.Run* functions via
the Core framework's cli package.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:55:57 +00:00
Snider
a0a0118155 refactor: move runScore and runProbe to pkg/lem
All 28 commands now accessible as exported lem.Run* functions.
Prerequisite for CLI framework migration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:53:15 +00:00
Snider
131d1694b2 chore: add core/go to go.mod require block
Prerequisite for CLI migration to core/go pkg/cli framework.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:52:16 +00:00
Snider
c8fc0b515b docs: add CLI migration implementation plan
11-task plan for migrating LEM from manual switch/flag.FlagSet
to core/go pkg/cli registry pattern with grouped commands.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 18:25:28 +00:00
Snider
37010f4b6b docs: CLI migration design — core/go pkg/cli registry pattern
Replace manual switch/flag.FlagSet with cli.Main() + WithCommands().
6 command groups, 28 commands, full framework lifecycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:21:28 +00:00
Snider
8532077e46 style: remove redundant named import for go-ml
Package declares itself as 'ml', so the named import alias is
unnecessary. Go resolves the package name from the declaration,
not the module path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:08:01 +00:00
Snider
030003a6db chore: go mod tidy after distill migration
go-inference moves to indirect (pulled transitively via go-ml).
go-ml is now a direct dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:02:41 +00:00
Snider
55519b24aa feat(distill): migrate from go-inference to go-ml Backend
Replace inference.LoadModel() with ml.NewMLXBackend() which wraps
the same Metal model with memory management (SetCacheLimit,
SetMemoryLimit). Replace raw iter.Seq token loop with backend.Chat()
returning Result{Text, Metrics}. Add runtime.GC() between probes
to prevent incremental memory leak.

Reference: go-ml/cmd/cmd_ab.go memory management pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:02:16 +00:00
Snider
8408cc0bab feat(distill): add --cache-limit and --mem-limit flags
Override ai.yaml memory config per-run. Values in GB.
Not yet wired to model loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:00:04 +00:00
Snider
b9da23a0be feat(distill): add Metal memory limit config fields
CacheLimit (8GB) and MemoryLimit (16GB) in DistillConfig control
mlx.SetCacheLimit/SetMemoryLimit before model load. Conservative
defaults for 1B model on 96GB machine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 17:59:11 +00:00
Snider
0adddf30ad lems configs
2026-02-22 16:20:51 +00:00
Snider
268648ab69 feat: add generation sets (2k, expanded, 15k) to gemma3/27b
Pipeline progression of adversarial/sovereignty training data:
- gen-2k: 2,299 examples (first generation pass)
- gen-expanded: 489 examples (broader domains, historical scenarios)
- gen-15k: 14,998 examples (full scale with persona rewrites)

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 00:08:40 +00:00
Snider
3b42e02859 feat: complete zen training set (book + conv progressions)
Zen lineage from Allen's As a Man Thinketh in three stages:
- train/test/valid: 10 foundation examples (single-turn Q&A)
- book-*: 117 deeper passage examples (single-turn, fuller text)
- conv-*: 24 applied mindfulness conversations (multi-turn)

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 00:06:31 +00:00
Snider
bd2f376a7a feat: add zen training set (Allen) to training/lem/zen/
10 examples across train/test/valid splits.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 00:02:47 +00:00
Snider
f65fd777ea feat: convert composure library to training JSONL format
Add cmd/composure-convert tool that chunks public domain philosophical
texts into training conversation pairs:
- consent.jsonl (198 examples) — Wollstonecraft's Vindication
- privacy.jsonl (221 examples) — Thoreau's Walden
- sovereignty.jsonl (56 examples) — Mill's On Liberty
- transparency.jsonl (159 examples) — Aurelius' Meditations

Each example pairs a domain-specific prompt with ~5 paragraphs from
the source text. Metadata, chapter headings, and Gutenberg boilerplate
are filtered out.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:59:06 +00:00
Snider
de18a0fb93 refactor: move composure-library to training/lem/composure/
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:55:17 +00:00
Snider
4b3343611d feat: add data/ skeleton for portable model setup
Add gitignored data/ directory with .gitkeep structure so anyone
cloning the repo knows exactly where to place model weights and
kernels. Configs now use repo-relative paths — symlink or populate
data/ locally.

  data/models/gemma3/27b/   ← model weights
  data/models/gemma3/1b/    ← lightweight model
  data/safetensors/gemma-3/ ← raw checkpoints
  data/kernels/             ← LEK kernel files

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:52:24 +00:00