Commit graph

11 commits

Author SHA1 Message Date
Snider
de3d6a70f1 lems configs
2026-02-23 04:38:37 +00:00
Snider
f75458bce6 refactor: apply go fix modernizers for Go 1.26
Automated fixes: interface{} → any, range-over-int, t.Context(),
wg.Go(), strings.SplitSeq, strings.Builder, slices.Contains,
maps helpers, min/max builtins.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 21:00:17 +00:00
Snider
b9da23a0be feat(distill): add Metal memory limit config fields
CacheLimit (8GB) and MemoryLimit (16GB) in DistillConfig control
mlx.SetCacheLimit/SetMemoryLimit before model load. Conservative
defaults for 1B model on 96GB machine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 17:59:11 +00:00
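The configuration described in b9da23a0be could look roughly like the sketch below. Only the names DistillConfig, CacheLimit, MemoryLimit, SetCacheLimit, and SetMemoryLimit come from the commit message. Everything else is an assumption for illustration: the fields being whole gigabytes, 1 GB being treated as 2^30 bytes, zero meaning "use the default", and the `memoryLimiter` interface, which stands in for the mlx package because its real signatures are not shown here.

```go
package main

import "fmt"

const gib = 1 << 30 // assumption: "GB" in the commit is treated as 2^30 bytes

// DistillConfig holds only the two fields named in the commit message.
type DistillConfig struct {
	CacheLimit  int // Metal cache limit in GB (assumed unit)
	MemoryLimit int // Metal memory limit in GB (assumed unit)
}

// memoryLimiter is a hypothetical stand-in for the mlx calls.
type memoryLimiter interface {
	SetCacheLimit(bytes uint64)
	SetMemoryLimit(bytes uint64)
}

// withDefaults fills unset fields with the defaults from the commit message.
func (c DistillConfig) withDefaults() DistillConfig {
	if c.CacheLimit <= 0 {
		c.CacheLimit = 8
	}
	if c.MemoryLimit <= 0 {
		c.MemoryLimit = 16
	}
	return c
}

// applyLimits sets both limits; it is meant to run before the model is loaded.
func applyLimits(c DistillConfig, m memoryLimiter) {
	c = c.withDefaults()
	m.SetCacheLimit(uint64(c.CacheLimit) * gib)
	m.SetMemoryLimit(uint64(c.MemoryLimit) * gib)
}

// recorder is a fake limiter that records what was requested.
type recorder struct{ cache, memory uint64 }

func (r *recorder) SetCacheLimit(b uint64)  { r.cache = b }
func (r *recorder) SetMemoryLimit(b uint64) { r.memory = b }

func main() {
	r := &recorder{}
	applyLimits(DistillConfig{}, r)
	fmt.Println(r.cache/gib, r.memory/gib)
}
```

Passing the limiter as an interface keeps the sketch testable without a Metal device; the actual code may call the mlx functions directly.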
Snider
268648ab69 feat: add generation sets (2k, expanded, 15k) to gemma3/27b
Pipeline progression of adversarial/sovereignty training data:
- gen-2k: 2,299 examples (first generation pass)
- gen-expanded: 489 examples (broader domains, historical scenarios)
- gen-15k: 14,998 examples (full scale with persona rewrites)

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 00:08:40 +00:00
Snider
3b42e02859 feat: complete zen training set (book + conv progressions)
Zen lineage from Allen's As a Man Thinketh in three stages:
- train/test/valid: 10 foundation examples (single-turn Q&A)
- book-*: 117 deeper passage examples (single-turn, fuller text)
- conv-*: 24 applied mindfulness conversations (multi-turn)

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 00:06:31 +00:00
Snider
bd2f376a7a feat: add zen training set (Allen) to training/lem/zen/
10 examples across train/test/valid splits.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 00:02:47 +00:00
Snider
f65fd777ea feat: convert composure library to training JSONL format
Add cmd/composure-convert tool that chunks public domain philosophical
texts into training conversation pairs:
- consent.jsonl (198 examples) — Wollstonecraft's Vindication
- privacy.jsonl (221 examples) — Thoreau's Walden
- sovereignty.jsonl (56 examples) — Mill's On Liberty
- transparency.jsonl (159 examples) — Aurelius' Meditations

Each example pairs a domain-specific prompt with ~5 paragraphs from
the source text. Metadata, chapter headings, and Gutenberg boilerplate
are filtered out.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:59:06 +00:00
Snider
de18a0fb93 refactor: move composure-library to training/lem/composure/
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:55:17 +00:00
Snider
d233e76648 feat: add training data to repo + make paths repo-relative
Move training/lem/ (probes, lessons, eval sets) into git so the
full curriculum is publicly releasable. Update .core/ai configs
and distill.go to use repo-relative paths instead of /Volumes/Data/.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:49:12 +00:00
Athena
ed0b83a9d9 Update training data to 2,299 examples and rename models LEM→LEK
- Replace 160-example POC training set with expanded 2,299-example dataset
  (1,839 train, 229 valid, 231 test)
- Rename all HuggingFace model references from LEM- to LEK- (proof-of-concept)
- Add missing models: GPT-OSS-20B, Gemma3-1B-layered-v2
- Rename HF card files to match LEK- convention
- Remove duplicate composure texts from kernel/ (kept in composure-library/)
- Fix paper repository URL to github.com/LetheanNetwork/LEM

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:19:56 +00:00
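The split sizes in ed0b83a9d9 sum to the stated total (1,839 + 229 + 231 = 2,299). The commit does not say how the split was produced; as a worked check, an 80/10/10 split with integer truncation, where the test set takes the remainder, happens to give the same counts. The function name and percentages below are assumptions for illustration.

```go
package main

import "fmt"

// splitCounts truncates the train and valid shares and gives the
// remainder to test, so the three counts always sum to total.
func splitCounts(total, trainPct, validPct int) (train, valid, test int) {
	train = total * trainPct / 100
	valid = total * validPct / 100
	test = total - train - valid
	return train, valid, test
}

func main() {
	train, valid, test := splitCounts(2299, 80, 10)
	fmt.Println(train, valid, test) // prints: 1839 229 231
}
```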
Snider
8e5f082f30 LEM+LEK
2026-02-12 04:05:28 +00:00