go/pkg/mlx
Claude 9688e086ca fix: add Metal cache management to prevent memory growth
- Add ClearCache() wrapping mlx_clear_cache
- Clear Metal allocator cache every 8 tokens during generation
- Set 16GB cache limit on backend init
- Prevents GPU memory from growing unbounded during inference
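
The periodic clearing described above could look roughly like the sketch below. This is not the package's actual generation loop. `generate`, `step`, and `clearCache` are hypothetical names, and `clearCache` stands in for the `ClearCache()` wrapper so the example runs without cgo or Metal.

```go
package main

// clearInterval mirrors the "every 8 tokens" cadence from the commit message.
const clearInterval = 8

// generate runs up to maxTokens decode steps and calls clearCache after every
// clearInterval-th generated token.
//
// step is a hypothetical per-token decode function returning the next token ID
// and whether generation should stop. clearCache is a stand-in for the
// package's ClearCache(), which the commit says wraps mlx_clear_cache.
func generate(maxTokens int, step func(i int) (int, bool), clearCache func()) []int {
	tokens := make([]int, 0, maxTokens)
	for i := 0; i < maxTokens; i++ {
		tok, done := step(i)
		if done {
			break
		}
		tokens = append(tokens, tok)
		// Release cached allocator buffers at a fixed cadence so memory held
		// by the cache cannot keep growing across a long generation.
		if len(tokens)%clearInterval == 0 {
			clearCache()
		}
	}
	return tokens
}
```

Clearing at a fixed interval rather than after every token is a trade-off: a smaller interval bounds memory more tightly, but the allocator then has fewer cached buffers to reuse between steps.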

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 05:53:52 +00:00
| Name | Last commit | Date |
|------|-------------|------|
| cache | feat: support quantized inference (4-bit) for Gemma 3 | 2026-02-16 05:53:52 +00:00 |
| model | fix: correct SDPA mask mode and slice logits to last position | 2026-02-16 05:53:52 +00:00 |
| sample | fix: remove unused vars in TopP sampler placeholder | 2026-02-16 05:53:52 +00:00 |
| tokenizer | fix: handle both string and array merge formats in tokenizer | 2026-02-16 05:53:52 +00:00 |
| array.go | fix: correct 20 mlx-c API mismatches for v0.4.1 | 2026-02-16 05:53:52 +00:00 |
| CMakeLists.txt | chore: target macOS 26.0, fix duplicate -lstdc++ linker warning | 2026-02-16 05:53:52 +00:00 |
| compile.go | fix: correct 20 mlx-c API mismatches for v0.4.1 | 2026-02-16 05:53:52 +00:00 |
| dtype.go | feat: add native MLX backend for Apple Silicon inference (pkg/mlx) | 2026-02-16 05:53:52 +00:00 |
| fast.go | fix: correct SDPA mask mode and slice logits to last position | 2026-02-16 05:53:52 +00:00 |
| io.go | feat: add native MLX backend for Apple Silicon inference (pkg/mlx) | 2026-02-16 05:53:52 +00:00 |
| mlx.go | debug: add shape logging and stderr error handler for inference debugging | 2026-02-16 05:53:52 +00:00 |
| mlx_stub.go | feat: add native MLX backend for Apple Silicon inference (pkg/mlx) | 2026-02-16 05:53:52 +00:00 |
| nn.go | feat: support quantized inference (4-bit) for Gemma 3 | 2026-02-16 05:53:52 +00:00 |
| ops.go | fix: use affine quantization mode and infer head_dim from weights | 2026-02-16 05:53:52 +00:00 |
| random.go | fix: correct 20 mlx-c API mismatches for v0.4.1 | 2026-02-16 05:53:52 +00:00 |
| slice.go | fix: correct 20 mlx-c API mismatches for v0.4.1 | 2026-02-16 05:53:52 +00:00 |
| stream.go | fix: add Metal cache management to prevent memory growth | 2026-02-16 05:53:52 +00:00 |