lthn/LEM - Forgejo: Beyond coding. We Forge.

Commit graph

Author SHA1 Message Date

Claude

4eaf1bfb39

feat: add parquet, publish, metrics, convert commands

- `lem parquet` — export JSONL training splits to Parquet (parquet-go)
- `lem publish` — push Parquet files to HuggingFace dataset repo
- `lem metrics` — push DuckDB golden set stats to InfluxDB
- `lem convert` — MLX LoRA adapter → HuggingFace PEFT format
  (pure Go safetensors read/write/transpose, no PyTorch needed)

Dependencies added: parquet-go, go-huggingface, go-rocm, go-pytorch, gotch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-15 17:05:08 +00:00
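
The commit above mentions a pure Go transpose step for `lem convert`. A minimal sketch of what such a step might look like for one 2-D tensor follows. It assumes the adapter matrices are row-major `float32` and that the two formats store each LoRA matrix with its axes swapped; the function name `transpose` and its signature are hypothetical and are not taken from the repository.

```go
package main

import "fmt"

// transpose returns the cols x rows transpose of a row-major rows x cols
// matrix. Hypothetical helper for illustration; not the repository's code.
func transpose(data []float32, rows, cols int) ([]float32, error) {
	if rows < 0 || cols < 0 || len(data) != rows*cols {
		return nil, fmt.Errorf("shape %dx%d does not match %d elements", rows, cols, len(data))
	}
	out := make([]float32, len(data))
	for r := 0; r < rows; r++ {
		for c := 0; c < cols; c++ {
			// Element (r, c) of the input becomes element (c, r) of the output.
			out[c*rows+r] = data[r*cols+c]
		}
	}
	return out, nil
}

func main() {
	// A 2x3 matrix: [[1 2 3] [4 5 6]].
	out, err := transpose([]float32{1, 2, 3, 4, 5, 6}, 2, 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out) // prints [1 4 2 5 3 6], the 3x2 transpose
}
```

Returning a fresh slice instead of transposing in place keeps the input tensor untouched, which is convenient when the same buffer is still needed for validation or for a second output.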

Claude

0afa5e9147

feat: add `lem ingest` command + go-huggingface dependency

Ingests benchmark data (content scores, capability scores, training
curves) from JSONL files and mlx_lm logs into InfluxDB. Batched
writes, iteration extraction from checkpoint labels.

Also adds github.com/hupe1980/go-huggingface for future HF sync.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-15 16:55:17 +00:00
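
The ingest commit names two mechanisms: iteration extraction from checkpoint labels and batched writes. The sketch below illustrates both under stated assumptions. The label format (`"G12 @0000500"`) is invented for the example, the rule "take the last run of digits" is one possible heuristic that would misread labels ending in an unrelated number, and `extractIteration` and `batches` are hypothetical names. The actual write to InfluxDB is left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// digitRuns matches every run of decimal digits in a label.
var digitRuns = regexp.MustCompile(`\d+`)

// extractIteration returns the last run of digits in label as an integer.
// The label format is an assumption; ok is false when no digits are found.
func extractIteration(label string) (iter int, ok bool) {
	runs := digitRuns.FindAllString(label, -1)
	if len(runs) == 0 {
		return 0, false
	}
	n, err := strconv.Atoi(runs[len(runs)-1])
	if err != nil {
		return 0, false
	}
	return n, true
}

// batches splits lines into consecutive groups of at most size elements,
// so each group can be sent in one write request.
func batches(lines []string, size int) [][]string {
	if size <= 0 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(lines); start += size {
		end := start + size
		if end > len(lines) {
			end = len(lines)
		}
		out = append(out, lines[start:end])
	}
	return out
}

func main() {
	iter, ok := extractIteration("G12 @0000500")
	fmt.Println(iter, ok) // prints 500 true

	groups := batches([]string{"a", "b", "c", "d", "e"}, 2)
	fmt.Println(len(groups)) // prints 3
}
```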

Claude

e0d352c803

feat: add Go lem CLI and scoring-agent scripts

Go lem CLI (stdlib + DuckDB) replaces scattered Python scripts:
- score: heuristic regex + LLM-as-judge scoring
- probe: generate responses then score
- compare: diff two score files
- status: InfluxDB training/generation progress
- export: golden set to training JSONL splits
- expand: distributed expansion via API + InfluxDB coordination

New scripts from Feb 14 creative session:
- scoring_agent.py: ROCm daemon that auto-scores checkpoints
- probes.py: 23 binary pass/fail capability probes
- convert_adapter.py: MLX to PEFT adapter conversion
- score_r1_capability.py: DeepSeek R1 checkpoint scoring
- lek_content_scorer.py: 6-dimension ethics content scorer
- lem_train_15k.py: InfluxDB-coordinated training script
- pipeline.py: DuckDB pipeline (seeds, golden set, expansion)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-15 16:22:13 +00:00
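
The `compare` subcommand listed above diffs two score files. The on-disk format is not shown in the commit, so the following sketch works on in-memory maps from probe ID to score and reports the change for every ID present in both. The type `delta`, the function `compareScores`, and the choice to skip IDs missing from either side are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// delta records how one score changed between two runs.
type delta struct {
	ID     string
	Before float64
	After  float64
	Change float64 // After minus Before
}

// compareScores returns one delta per ID found in both maps, sorted by ID.
// IDs that appear in only one map are skipped (an assumption of this sketch).
func compareScores(before, after map[string]float64) []delta {
	var out []delta
	for id, b := range before {
		a, ok := after[id]
		if !ok {
			continue
		}
		out = append(out, delta{ID: id, Before: b, After: a, Change: a - b})
	}
	// Map iteration order is random in Go, so sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	before := map[string]float64{"p1": 0.5, "p2": 1.0, "p3": 0.25}
	after := map[string]float64{"p1": 0.75, "p2": 1.0, "p4": 0.5}
	for _, d := range compareScores(before, after) {
		fmt.Printf("%s %.2f -> %.2f (%+.2f)\n", d.ID, d.Before, d.After, d.Change)
	}
}
```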

3 commits