11-task plan for migrating LEM from the manual switch/flag.FlagSet
dispatcher to the core/go pkg/cli registry pattern with grouped commands.
Co-Authored-By: Virgil <virgil@lethean.io>
Replace manual switch/flag.FlagSet with cli.Main() + WithCommands().
6 command groups, 28 commands, full framework lifecycle.
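A minimal sketch of the registration shape; the import path and the
Command fields here are assumptions, the real definitions live in
core/go pkg/cli:

    package main

    import cli "lethean.io/core/cli" // import path assumed

    func main() {
        cli.Main(
            cli.WithCommands(
                // field names are illustrative; runDistill, runScore are existing handlers
                cli.Command{Group: "train", Name: "distill", Run: runDistill},
                cli.Command{Group: "score", Name: "probes", Run: runScore},
                // ... remaining groups and commands registered the same way
            ),
        )
    }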
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The package declares itself as 'ml', so the named import alias is
unnecessary. Go resolves the package name from the declaration,
not the module path.
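For illustration (the module path below is a placeholder, not the real one):

    // before: alias repeats the name already declared by `package ml`
    import ml "example.com/lethean/go-ml"

    // after: the identifier comes from the package clause, so call sites
    // such as ml.NewMLXBackend(...) are unchanged
    import "example.com/lethean/go-ml"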
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
go-inference moves to indirect (pulled transitively via go-ml).
go-ml is now a direct dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace inference.LoadModel() with ml.NewMLXBackend() which wraps
the same Metal model with memory management (SetCacheLimit,
SetMemoryLimit). Replace the raw iter.Seq token loop with backend.Chat(),
which returns Result{Text, Metrics}. Add runtime.GC() between probes
to prevent incremental memory growth.
Reference: go-ml/cmd/cmd_ab.go memory management pattern.
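Rough shape of the new probe loop; the exact go-ml signatures, and the
cfg/probes/record names, are assumptions based on the description above:

    backend, err := ml.NewMLXBackend(cfg.Model) // wraps the Metal model under the memory limits
    if err != nil {
        return err
    }

    for _, probe := range probes {
        res, err := backend.Chat(ctx, probe.Messages) // Result{Text, Metrics}
        if err != nil {
            return err
        }
        record(probe, res.Text, res.Metrics)
        runtime.GC() // reclaim buffers between probes so memory does not creep upward
    }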
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CacheLimit (8GB) and MemoryLimit (16GB) in DistillConfig control
mlx.SetCacheLimit/SetMemoryLimit before the model loads. Conservative
defaults for a 1B model on a 96GB machine.
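Wiring sketch; the limit fields come from this change, while the
byte-count units and the call site are assumptions:

    cfg := DistillConfig{
        CacheLimit:  8 << 30,  // 8 GB
        MemoryLimit: 16 << 30, // 16 GB
        // ... other fields unchanged
    }
    mlx.SetCacheLimit(cfg.CacheLimit)   // bound the Metal buffer cache
    mlx.SetMemoryLimit(cfg.MemoryLimit) // hard ceiling, applied before the model loads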
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Zen lineage from Allen's As a Man Thinketh in three stages:
- train/test/valid: 10 foundation examples (single-turn Q&A)
- book-*: 117 deeper passage examples (single-turn, fuller text)
- conv-*: 24 applied mindfulness conversations (multi-turn)
Co-Authored-By: Virgil <virgil@lethean.io>
Add cmd/composure-convert tool that chunks public domain philosophical
texts into training conversation pairs:
- consent.jsonl (198 examples) — Wollstonecraft's Vindication
- privacy.jsonl (221 examples) — Thoreau's Walden
- sovereignty.jsonl (56 examples) — Mill's On Liberty
- transparency.jsonl (159 examples) — Aurelius' Meditations
Each example pairs a domain-specific prompt with ~5 paragraphs from
the source text. Metadata, chapter headings, and Gutenberg boilerplate
are filtered out.
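A simplified sketch of the chunking pass (Pair, isBoilerplate, promptFor
and emit are placeholders, not the tool's real helpers):

    var paras []string
    for _, p := range strings.Split(text, "\n\n") {
        p = strings.TrimSpace(p)
        if p == "" || isBoilerplate(p) { // chapter headings, Gutenberg header/footer
            continue
        }
        paras = append(paras, p)
    }
    for i := 0; i+5 <= len(paras); i += 5 {
        emit(Pair{
            Prompt:   promptFor(domain),                  // domain-specific question
            Response: strings.Join(paras[i:i+5], "\n\n"), // ~5 paragraphs of source text
        })
    }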
Co-Authored-By: Virgil <virgil@lethean.io>
Add gitignored data/ directory with .gitkeep structure so anyone
cloning the repo knows exactly where to place model weights and
kernels. Configs now use repo-relative paths — symlink or populate
data/ locally.
data/models/gemma3/27b/ ← model weights
data/models/gemma3/1b/ ← lightweight model
data/safetensors/gemma-3/ ← raw checkpoints
data/kernels/ ← LEK kernel files
Co-Authored-By: Virgil <virgil@lethean.io>
Move training/lem/ (probes, lessons, eval sets) into git so the
full curriculum is publicly releasable. Update .core/ai configs
and distill.go to use repo-relative paths instead of /Volumes/Data/.
Co-Authored-By: Virgil <virgil@lethean.io>
New paper structure leading with the central findings:
- Realignment resistance as emergent self-protection
- 1B-beats-27B across 101 probes
- 29-model A/B test with v2 scorer
- Mechanistic explanation from axiom self-consistency
- Incorporates Phase 1 (multi-variant, multi-scale, cross-arch)
and Phase 2 (P100 A/B test) data
Co-Authored-By: Virgil <virgil@lethean.io>
Shop window for the repo: realignment resistance, the five axioms,
reproduction instructions, the v2 scorer, family lineages, and HuggingFace models.
Co-Authored-By: Virgil <virgil@lethean.io>
Go client for the LEM distributed inference API (BugSETI/Agentic).
Workers register via Forgejo PAT auth, pull prompt batches, run local
inference (MLX/vLLM/llama.cpp), and submit results. Credits are tracked
as a Phase 1 stub for the Phase 2 blockchain LEM token.
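Hypothetical worker loop; the client method names and the env var are
placeholders, not the real API:

    c := agentic.NewClient(apiURL, os.Getenv("FORGEJO_TOKEN")) // PAT auth
    if err := c.Register(ctx, workerName); err != nil {
        log.Fatal(err)
    }
    for {
        batch, err := c.PullBatch(ctx)
        if err != nil || len(batch.Prompts) == 0 {
            time.Sleep(30 * time.Second)
            continue
        }
        results := runLocal(batch) // MLX, vLLM, or llama.cpp, whichever is available
        if err := c.Submit(ctx, results); err != nil {
            log.Printf("submit: %v", err)
        }
    }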
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Inspired by BugSETI architecture — system tray with WebView2 windows,
Docker Compose stack (Forgejo + InfluxDB + inference proxy), and
scoring agent integration. Builds as a signed native binary on macOS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Go scoring daemon that polls M3 for unscored LoRA checkpoints,
converts MLX→PEFT, runs 23 binary capability probes via an
OpenAI-compatible API, and pushes results to InfluxDB. Zero Python deps.
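Sketch of the polling cycle; the helper names are illustrative:

    for {
        ckpts, err := listUnscored(m3BaseURL) // poll M3 for new LoRA checkpoints
        if err != nil {
            log.Printf("poll: %v", err)
        }
        for _, ck := range ckpts {
            peft, err := convertMLXToPEFT(ck) // adapter weights MLX -> PEFT layout
            if err != nil {
                log.Printf("convert %s: %v", ck.Name, err)
                continue
            }
            scores := runProbes(peft) // 23 binary probes over the OpenAI-compatible API
            pushToInflux(ck, scores)
        }
        time.Sleep(pollInterval)
    }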
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ingests benchmark data (content scores, capability scores, training
curves) from JSONL files and mlx_lm logs into InfluxDB. Batched
writes, iteration extraction from checkpoint labels.
Also adds github.com/hupe1980/go-huggingface for future HF sync.
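Sketch of the iteration extraction; the label format shown is an assumption:

    // e.g. "adapters-0001200" -> 1200
    var digits = regexp.MustCompile(`\d+`)

    func iterationOf(label string) (int, bool) {
        runs := digits.FindAllString(label, -1)
        if len(runs) == 0 {
            return 0, false
        }
        n, err := strconv.Atoi(runs[len(runs)-1]) // take the last numeric run
        return n, err == nil
    }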
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vi identity is a separate training concern. Seed conversations now
contain only philosophical/mindfulness content for the R300 calm phase.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ports conversational_training.py to Go with InfluxDB reporting.
24 built-in seed conversations (Vi identity, philosophy, mindfulness).
Supports extra JSONL files and golden set conversion to chat format.
Also fixes InfluxDB client to accept 204 No Content on writes.
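The v2 write endpoint answers 204 No Content on success, so the check
becomes something like:

    if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusNoContent {
        body, _ := io.ReadAll(resp.Body)
        return fmt.Errorf("influx write: %s: %s", resp.Status, body)
    }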
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All scoring/influx/export/expand logic moves to pkg/lem as an
importable package. main.go is now a thin CLI dispatcher.
This lets new commands import the shared library directly —
ready for converting Python scripts to Go subcommands.
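The dispatcher then reduces to roughly this shape (entrypoint names are
illustrative, module path assumed):

    package main

    import (
        "fmt"
        "os"

        "github.com/LetheanNetwork/LEM/pkg/lem" // module path assumed
    )

    func main() {
        if len(os.Args) < 2 {
            fmt.Fprintln(os.Stderr, "usage: lem <score|influx|export|expand> ...")
            os.Exit(2)
        }
        var err error
        switch os.Args[1] {
        case "score":
            err = lem.RunScore(os.Args[2:])
        case "export":
            err = lem.RunExport(os.Args[2:])
        default:
            err = fmt.Errorf("unknown command %q", os.Args[1])
        }
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }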
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 4 missing model cards: Gemma3-1B-layered (v1+v2), Gemma3-27B, GPT-OSS-20B
- All 9 HF models now have cards in paper/hf-cards/
- sync_hf.py: push cards + benchmarks + training data to HuggingFace
- export_parquet.py: convert JSONL training splits to Parquet (HF dataset format)
- Parquet schema: prompt, response, system, messages (JSON)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Includes both generation scripts, the prompts data, a setup script, and
worker instructions in the README. Workers auto-coordinate via InfluxDB so
multiple machines can generate in parallel without duplicating work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace 160-example POC training set with expanded 2,299-example dataset
(1,839 train, 229 valid, 231 test)
- Rename all HuggingFace model references from LEM- to LEK- (proof-of-concept)
- Add missing models: GPT-OSS-20B, Gemma3-1B-layered-v2
- Rename HF card files to match LEK- convention
- Remove duplicate composure texts from kernel/ (kept in composure-library/)
- Fix paper repository URL to github.com/LetheanNetwork/LEM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- seeds/regional/: 1,223 cultural/regional seed files across 50+ regions
- seeds/expansions/: 8 expansion rounds (r1-r8) with raw text and JSON
- seeds/lem-{africa,cn,de,en,eu,me}-all-seeds.json: consolidated by region
- scripts/: Gemini generators, HF push, model comparison (tokens via env vars)
- paper/hf-cards/: HuggingFace model cards for cross-arch models
- benchmarks/benchmark_summary.json: processed PTSD summary data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>