CacheLimit (8GB) and MemoryLimit (16GB) in DistillConfig control
mlx.SetCacheLimit/SetMemoryLimit before model load. Conservative
defaults for a 1B model on a 96GB machine.
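
A minimal sketch of how the two limits might be applied before model load. Only the names DistillConfig, CacheLimit, MemoryLimit and the 8GB/16GB defaults come from this commit. The field types, the GiB conversion, the `limiter` interface and the `recorder` stand-in are assumptions for illustration; the real mlx binding's signatures may differ.

```go
package main

import "fmt"

// DistillConfig shows only the two fields named in the commit message.
// Storing them as whole gigabytes is an assumption.
type DistillConfig struct {
	CacheLimit  uint64 // GB
	MemoryLimit uint64 // GB
}

// gib converts gigabytes to bytes (binary units assumed).
const gib = uint64(1) << 30

// limiter is a hypothetical stand-in for the mlx package so the sketch
// runs without the real binding.
type limiter interface {
	SetCacheLimit(bytes uint64)
	SetMemoryLimit(bytes uint64)
}

// DefaultDistillConfig returns the defaults quoted in the commit.
func DefaultDistillConfig() DistillConfig {
	return DistillConfig{CacheLimit: 8, MemoryLimit: 16}
}

// applyLimits pushes both limits to the backend. Call it before the
// model is loaded so allocations made during load are already capped.
func applyLimits(cfg DistillConfig, l limiter) {
	l.SetCacheLimit(cfg.CacheLimit * gib)
	l.SetMemoryLimit(cfg.MemoryLimit * gib)
}

// recorder remembers the values it receives, for demonstration only.
type recorder struct{ cache, memory uint64 }

func (r *recorder) SetCacheLimit(b uint64)  { r.cache = b }
func (r *recorder) SetMemoryLimit(b uint64) { r.memory = b }

func main() {
	rec := &recorder{}
	applyLimits(DefaultDistillConfig(), rec)
	fmt.Println("cache bytes:", rec.cache, "memory bytes:", rec.memory)
}
```

Keeping the backend behind a small interface lets the ordering (limits first, load second) be exercised without a GPU.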
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Zen lineage from Allen's As a Man Thinketh in three stages:
- train/test/valid: 10 foundation examples (single-turn Q&A)
- book-*: 117 deeper passage examples (single-turn, fuller text)
- conv-*: 24 applied mindfulness conversations (multi-turn)
Co-Authored-By: Virgil <virgil@lethean.io>
Add cmd/composure-convert tool that chunks public domain philosophical
texts into training conversation pairs:
- consent.jsonl (198 examples) — Wollstonecraft's Vindication
- privacy.jsonl (221 examples) — Thoreau's Walden
- sovereignty.jsonl (56 examples) — Mill's On Liberty
- transparency.jsonl (159 examples) — Aurelius' Meditations
Each example pairs a domain-specific prompt with ~5 paragraphs from
the source text. Metadata, chapter headings, and Gutenberg boilerplate
are filtered out.
Co-Authored-By: Virgil <virgil@lethean.io>
Move training/lem/ (probes, lessons, eval sets) into git so the
full curriculum is publicly releasable. Update .core/ai configs
and distill.go to use repo-relative paths instead of /Volumes/Data/.
Co-Authored-By: Virgil <virgil@lethean.io>
- Replace 160-example POC training set with expanded 2,299-example dataset
  (1,839 train, 229 valid, 231 test)
- Rename all HuggingFace model references from LEM- to LEK- (proof-of-concept)
- Add missing models: GPT-OSS-20B, Gemma3-1B-layered-v2
- Rename HF card files to match LEK- convention
- Remove duplicate composure texts from kernel/ (kept in composure-library/)
- Fix paper repository URL to github.com/LetheanNetwork/LEM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>