Commit graph

2 commits

Author SHA1 Message Date
Claude
56c6e2fa8d feat: support quantized inference (4-bit) for Gemma 3
- Add QuantizedLinear with QuantizedMatmul for packed uint32 weights
- Add quantized Embedding with Dequantize before lookup
- Parse quantization config (group_size, bits) from config.json
- Detect .scales/.biases weight tensors and auto-select quantized path
- Add Dequantize op wrapping mlx_dequantize
- Add safety guard to KVCache.Update for malformed shapes
- Handle tied embeddings with quantization (AsLinear helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 05:53:52 +00:00
Claude
bc28aad526 feat: add native MLX backend for Apple Silicon inference (pkg/mlx)
CGo wrapper for mlx-c providing zero-Python Metal GPU inference.
Includes Gemma 3 model architecture, BPE tokenizer, KV cache,
composable sampling, and OpenAI-compatible serve command.

Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 05:53:52 +00:00