LEM/.gitignore
Latest commit abd63d3342 by Charon, 2026-02-15:
Add standard benchmark suite using EleutherAI lm-evaluation-harness
- run_benchmarks.sh: wrapper for lm-eval with suite presets (quick, classic, leaderboard-v2, full)
- compare_models.py: compare base vs. LEK results with a delta table (see the sketch below)
- Supports HF transformers, local-chat-completions (MLX/Ollama), and vLLM backends
- Results comparable to HuggingFace Open LLM Leaderboard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
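
The compare_models.py referenced above is not shown on this page. As a minimal sketch of the delta-table idea, assuming each input is a results JSON written by lm-eval's --output_path (whose top-level "results" key maps task names to metric scores):

#!/usr/bin/env python3
"""Minimal sketch of a base-vs-LEK delta table (not the repo's compare_models.py).

Assumes each input file is a results JSON as written by lm-eval's
--output_path: its top-level "results" maps task names to metric scores.
"""
import json
import sys


def load_results(path: str) -> dict:
    with open(path) as f:
        return json.load(f)["results"]


def main(base_path: str, lek_path: str) -> None:
    base, lek = load_results(base_path), load_results(lek_path)
    print(f"{'task':<28} {'metric':<18} {'base':>8} {'LEK':>8} {'delta':>8}")
    # Only compare tasks present in both runs.
    for task in sorted(set(base) & set(lek)):
        for metric, b in sorted(base[task].items()):
            l = lek[task].get(metric)
            # Skip non-numeric entries such as "alias" or version strings.
            if not isinstance(b, (int, float)) or not isinstance(l, (int, float)):
                continue
            print(f"{task:<28} {metric:<18} {b:8.4f} {l:8.4f} {l - b:+8.4f}")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

Run as, for example, python compare_delta.py base.json lek.json (script and file names illustrative).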

.DS_Store
.idea/
__pycache__/
*.pyc
# Worker output (generated locally, not committed)
worker/output/
# Parquet exports (generated, sync to HF via scripts/sync_hf.py; see sketch below)
training/parquet/
# lm-eval-harness results (large, stored locally)
benchmarks/lm-eval-results/
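
The scripts/sync_hf.py referenced in the comment above is likewise not shown in this view. A minimal sketch of that sync step, assuming the huggingface_hub client, a logged-in HF token (huggingface-cli login), and a hypothetical dataset repo id (lthn/LEM-training is illustrative):

#!/usr/bin/env python3
"""Illustrative sketch only, not the repo's actual scripts/sync_hf.py."""
from huggingface_hub import HfApi

# Hypothetical repo id; the real destination is not visible in this view.
REPO_ID = "lthn/LEM-training"


def sync_parquet(local_dir: str = "training/parquet") -> None:
    api = HfApi()
    # Upload only the generated parquet files; the directory itself
    # stays untracked in git per the .gitignore entry above.
    api.upload_folder(
        folder_path=local_dir,
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=["*.parquet"],
    )


if __name__ == "__main__":
    sync_parquet()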