Commit graph

6 commits

Author SHA1 Message Date
Snider
bd2f376a7a feat: add zen training set (Allen) to training/lem/zen/
10 examples across train/test/valid splits.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 00:02:47 +00:00
Snider
f65fd777ea feat: convert composure library to training JSONL format
Add cmd/composure-convert tool that chunks public domain philosophical
texts into training conversation pairs:
- consent.jsonl (198 examples) — Wollstonecraft's Vindication
- privacy.jsonl (221 examples) — Thoreau's Walden
- sovereignty.jsonl (56 examples) — Mill's On Liberty
- transparency.jsonl (159 examples) — Aurelius' Meditations

Each example pairs a domain-specific prompt with ~5 paragraphs from
the source text. Metadata, chapter headings, and Gutenberg boilerplate
are filtered out.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:59:06 +00:00
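
The chunking described in f65fd777ea can be sketched in Go roughly as follows. This is a minimal illustration, not the actual cmd/composure-convert source: the Example struct, its JSON field names, the function names, and the sample prompt are all assumptions, and the filtering of metadata, chapter headings, and Gutenberg boilerplate is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Example is a hypothetical record shape; the real JSONL schema
// is not shown in the commit message.
type Example struct {
	Prompt   string `json:"prompt"`
	Response string `json:"response"`
}

// splitParagraphs splits text on blank lines and drops empty paragraphs.
func splitParagraphs(text string) []string {
	var out []string
	for _, p := range strings.Split(text, "\n\n") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// chunkExamples groups paragraphs into runs of at most size and pairs
// each run with the same domain prompt. The final run may be shorter.
func chunkExamples(prompt string, paras []string, size int) []Example {
	if size < 1 {
		size = 1 // guard against a non-advancing loop
	}
	var out []Example
	for i := 0; i < len(paras); i += size {
		end := i + size
		if end > len(paras) {
			end = len(paras)
		}
		out = append(out, Example{
			Prompt:   prompt,
			Response: strings.Join(paras[i:end], "\n\n"),
		})
	}
	return out
}

func main() {
	text := "One.\n\nTwo.\n\nThree.\n\nFour.\n\nFive.\n\nSix."
	paras := splitParagraphs(text)
	// Emit one JSON object per line (JSONL), five paragraphs per example.
	for _, ex := range chunkExamples("What does this passage say about consent?", paras, 5) {
		line, err := json.Marshal(ex)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(line))
	}
}
```

A fixed chunk size keeps the sketch simple. The "~5 paragraphs" wording in the commit suggests the real tool may vary the chunk length.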
Snider
de18a0fb93 refactor: move composure-library to training/lem/composure/
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:55:17 +00:00
Snider
d233e76648 feat: add training data to repo + make paths repo-relative
Move training/lem/ (probes, lessons, eval sets) into git so the
full curriculum is publicly releasable. Update .core/ai configs
and distill.go to use repo-relative paths instead of /Volumes/Data/.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:49:12 +00:00
Athena
ed0b83a9d9 Update training data to 2,299 examples and rename models LEM→LEK
- Replace 160-example POC training set with expanded 2,299-example dataset
  (1,839 train, 229 valid, 231 test)
- Rename all HuggingFace model references from LEM- to LEK- (proof-of-concept)
- Add missing models: GPT-OSS-20B, Gemma3-1B-layered-v2
- Rename HF card files to match LEK- convention
- Remove duplicate composure texts from kernel/ (kept in composure-library/)
- Fix paper repository URL to github.com/LetheanNetwork/LEM

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:19:56 +00:00
Snider
8e5f082f30 LEM+LEK
2026-02-12 04:05:28 +00:00