Ingests benchmark data (content scores, capability scores, training
curves) from JSONL files and mlx_lm logs into InfluxDB. Writes are
batched, and iteration numbers are extracted from checkpoint labels.
Also adds github.com/hupe1980/go-huggingface for future HF sync.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
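The iteration extraction mentioned in the commit above could look roughly like the following. This is a minimal sketch, not the code in the repository: the function name `extractIteration` and the label shape (`model@0000200`) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// iterRe captures the last run of digits in a label. The label shape
// (for example "model@0000200") is an assumption, not taken from the repo.
var iterRe = regexp.MustCompile(`(\d+)\D*$`)

// extractIteration (hypothetical helper) returns the trailing integer in a
// checkpoint label, or false when the label contains no usable digits.
func extractIteration(label string) (int, bool) {
	m := iterRe.FindStringSubmatch(label)
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	for _, label := range []string{"model@0000200", "adapter-final"} {
		n, ok := extractIteration(label)
		fmt.Println(label, n, ok)
	}
}
```

Returning a boolean alongside the number lets the caller skip labels without an iteration instead of writing a misleading zero to the time series.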
Vi identity is a separate training concern. Seed conversations now
contain only philosophical/mindfulness content for the R300 calm phase.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ports conversational_training.py to Go with InfluxDB reporting.
24 built-in seed conversations (Vi identity, philosophy, mindfulness).
Supports extra JSONL files and golden set conversion to chat format.
Also fixes InfluxDB client to accept 204 No Content on writes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
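The 204 fix in the commit above comes down to which HTTP status codes count as a successful write. InfluxDB answers successful writes with 204 No Content, so a client that checks only for 200 OK reports them as failures. A minimal sketch of the check, with `writeSucceeded` as a hypothetical name rather than the function in pkg/lem:

```go
package main

import (
	"fmt"
	"net/http"
)

// writeSucceeded (hypothetical helper) reports whether the status code of a
// write request should be treated as success. Both 200 OK and
// 204 No Content are accepted; everything else is an error.
func writeSucceeded(status int) bool {
	return status == http.StatusOK || status == http.StatusNoContent
}

func main() {
	for _, s := range []int{http.StatusOK, http.StatusNoContent, http.StatusBadRequest} {
		fmt.Printf("%d %v\n", s, writeSucceeded(s))
	}
}
```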
All scoring/influx/export/expand logic moves to pkg/lem as an
importable package. main.go is now a thin CLI dispatcher.
This lets new commands import the shared library directly, which
prepares for converting the Python scripts to Go subcommands.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 4 missing model cards: Gemma3-1B-layered (v1+v2), Gemma3-27B, GPT-OSS-20B
- All 9 HF models now have cards in paper/hf-cards/
- sync_hf.py: push cards + benchmarks + training data to HuggingFace
- export_parquet.py: convert JSONL training splits to Parquet (HF dataset format)
- Parquet schema: prompt, response, system, messages (JSON)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
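The Parquet schema listed in the commit above stores `messages` as a JSON string next to the flat `prompt`, `response`, and `system` columns. The sketch below shows one way such a row could be assembled. It is written in Go for illustration only (export_parquet.py itself is Python). The type and function names, the role names, and the message ordering are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is one chat turn. The role/content field names are an assumption.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// row mirrors the four columns named in the commit message.
type row struct {
	Prompt   string
	Response string
	System   string
	Messages string // JSON-encoded list of chat messages
}

// buildRow (hypothetical helper) fills the flat columns and serializes the
// chat-format view of the same example into the messages column.
func buildRow(system, prompt, response string) (row, error) {
	var msgs []message
	if system != "" {
		msgs = append(msgs, message{Role: "system", Content: system})
	}
	msgs = append(msgs,
		message{Role: "user", Content: prompt},
		message{Role: "assistant", Content: response},
	)
	b, err := json.Marshal(msgs)
	if err != nil {
		return row{}, err
	}
	return row{Prompt: prompt, Response: response, System: system, Messages: string(b)}, nil
}

func main() {
	r, err := buildRow("", "hi", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Messages)
}
```

Keeping `messages` as a single JSON string avoids a nested Parquet type, at the cost of a decode step for consumers that want structured turns.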
Includes both generation scripts, the prompt data, a setup script, and
worker instructions in the README. Workers auto-coordinate via InfluxDB,
so multiple machines can generate in parallel without duplicating work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace 160-example POC training set with expanded 2,299-example dataset
  (1,839 train, 229 valid, 231 test)
- Rename all HuggingFace model references from LEM- to LEK- (proof-of-concept)
- Add missing models: GPT-OSS-20B, Gemma3-1B-layered-v2
- Rename HF card files to match LEK- convention
- Remove duplicate composure texts from kernel/ (kept in composure-library/)
- Fix paper repository URL to github.com/LetheanNetwork/LEM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- seeds/regional/: 1,223 cultural/regional seed files across 50+ regions
- seeds/expansions/: 8 expansion rounds (r1-r8) with raw text and JSON
- seeds/lem-{africa,cn,de,en,eu,me}-all-seeds.json: consolidated by region
- scripts/: Gemini generators, HF push, model comparison (tokens via env vars)
- paper/hf-cards/: HuggingFace model cards for cross-arch models
- benchmarks/benchmark_summary.json: processed PTSD summary data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>