- run_benchmarks.sh: wrapper for lm-eval with suite presets (quick, classic, leaderboard-v2, full)
- compare_models.py: compare base vs LEK results with delta table
- Supports HF transformers, local-chat-completions (MLX/Ollama), and vLLM as backends
- Results are comparable to the HuggingFace Open LLM Leaderboard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>