1
0
Fork 0
forked from lthn/LEM
Commit graph

6 commits

Author SHA1 Message Date
abd63d3342
Add standard benchmark suite using EleutherAI lm-evaluation-harness
- run_benchmarks.sh: wrapper for lm-eval with suite presets (quick, classic, leaderboard-v2, full)
- compare_models.py: compare base vs LEK results with delta table
- Supports HF transformers, local-chat-completions (MLX/Ollama), and vLLM backends
- Results comparable to HuggingFace Open LLM Leaderboard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 00:05:48 +00:00
b8f9191b05
Add missing HF model cards, sync script, and Parquet export
- Add 4 missing model cards: Gemma3-1B-layered (v1+v2), Gemma3-27B, GPT-OSS-20B
- All 9 HF models now have cards in paper/hf-cards/
- sync_hf.py: push cards + benchmarks + training data to HuggingFace
- export_parquet.py: convert JSONL training splits to Parquet (HF dataset format)
- Parquet schema: prompt, response, system, messages (JSON)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 23:50:18 +00:00
Athena
f0e86b7433 Add regional seeds, expansion rounds, scripts, HF cards, benchmark summary
- seeds/regional/: 1,223 cultural/regional seed files across 50+ regions
- seeds/expansions/: 8 expansion rounds (r1-r8) with raw text and JSON
- seeds/lem-{africa,cn,de,en,eu,me}-all-seeds.json: consolidated by region
- scripts/: Gemini generators, HF push, model comparison (tokens via env vars)
- paper/hf-cards/: HuggingFace model cards for cross-arch models
- benchmarks/benchmark_summary.json: processed PTSD summary data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:39:08 +00:00
Snider
53c47131cc Add cross-architecture training and benchmarking scripts; update README and PAPER with author and repository information 2026-02-12 09:07:32 +00:00
Snider
adda3c8bb5 Benchmark & Findings:
lthn/LEM-Gemma-3-1B
lthn/LEM-Gemma-3-4B
lthn/LEM-Gemma-3-12B
lthn/LEM-Gemma-3-27B
2026-02-12 06:38:46 +00:00
Snider
8e5f082f30 LEM+LEK 2026-02-12 04:05:28 +00:00