core/cli - Forgejo: Beyond coding. We Forge.

core/cli

Fork 0

Commit graph

Author	SHA1	Message	Date
Claude	e6ada25bd8	fix: add Metal cache management to prevent memory growth - Add ClearCache() wrapping mlx_clear_cache - Clear Metal allocator cache every 8 tokens during generation - Set 16GB cache limit on backend init - Prevents GPU memory from growing unbounded during inference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	8ee0c4bc4e	feat: add native MLX backend for Apple Silicon inference (pkg/mlx) CGo wrapper for mlx-c providing zero-Python Metal GPU inference. Includes Gemma 3 model architecture, BPE tokenizer, KV cache, composable sampling, and OpenAI-compatible serve command. Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00

Author

SHA1

Message

Date

Claude

e6ada25bd8

fix: add Metal cache management to prevent memory growth

- Add ClearCache() wrapping mlx_clear_cache
- Clear Metal allocator cache every 8 tokens during generation
- Set 16GB cache limit on backend init
- Prevents GPU memory from growing unbounded during inference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-16 05:53:52 +00:00

Claude

8ee0c4bc4e

feat: add native MLX backend for Apple Silicon inference (pkg/mlx)

CGo wrapper for mlx-c providing zero-Python Metal GPU inference.
Includes Gemma 3 model architecture, BPE tokenizer, KV cache,
composable sampling, and OpenAI-compatible serve command.

Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-16 05:53:52 +00:00

2 commits