go/pkg/mlx/cache
Claude 56c6e2fa8d feat: support quantized inference (4-bit) for Gemma 3
- Add QuantizedLinear with QuantizedMatmul for packed uint32 weights
- Add quantized Embedding with Dequantize before lookup
- Parse quantization config (group_size, bits) from config.json (see the config sketch at the end of this listing)
- Detect .scales/.biases weight tensors and auto-select quantized path (detection sketch below)
- Add Dequantize op wrapping mlx_dequantize (dequantization sketch below)
- Add safety guard to KVCache.Update for malformed shapes (guard sketch below)
- Handle tied embeddings with quantization (AsLinear helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 05:53:52 +00:00
cache.go feat: support quantized inference (4-bit) for Gemma 3 2026-02-16 05:53:52 +00:00
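The config-parsing step reads `group_size` and `bits` from the model's config.json. Below is a minimal sketch of that step, assuming the usual top-level `"quantization"` block that MLX-style exports write; the struct and field names are illustrative, not the package's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// QuantizationConfig mirrors a "quantization" block in config.json
// (field names taken from the commit message); the struct names are a
// hypothetical sketch, not the package's actual types.
type QuantizationConfig struct {
	GroupSize int `json:"group_size"`
	Bits      int `json:"bits"`
}

// ModelConfig holds only the part of config.json this sketch cares about.
type ModelConfig struct {
	Quantization *QuantizationConfig `json:"quantization"`
}

func main() {
	raw, err := os.ReadFile("config.json")
	if err != nil {
		panic(err)
	}
	var cfg ModelConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	if cfg.Quantization != nil {
		fmt.Printf("quantized model: %d-bit, group size %d\n",
			cfg.Quantization.Bits, cfg.Quantization.GroupSize)
	} else {
		fmt.Println("full-precision model")
	}
}
```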
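Quantized checkpoints ship each packed weight alongside companion `.scales` and `.biases` tensors, and the loader keys off their presence to pick the quantized path. A sketch of that detection, assuming the tensor names are available as a set; `hasQuantizedWeights` is a hypothetical helper, not the loader's real function.

```go
package main

import "fmt"

// hasQuantizedWeights reports whether a layer was exported in packed form,
// judged by companion ".scales" and ".biases" tensors sitting next to the
// base weight. The naming convention matches MLX quantized checkpoints;
// the helper itself is a hypothetical sketch, not the loader's real code.
func hasQuantizedWeights(tensorNames map[string]struct{}, prefix string) bool {
	_, scales := tensorNames[prefix+".scales"]
	_, biases := tensorNames[prefix+".biases"]
	return scales && biases
}

func main() {
	names := map[string]struct{}{
		"model.layers.0.self_attn.q_proj.weight": {},
		"model.layers.0.self_attn.q_proj.scales": {},
		"model.layers.0.self_attn.q_proj.biases": {},
		"model.layers.0.mlp.up_proj.weight":      {},
	}
	for _, p := range []string{
		"model.layers.0.self_attn.q_proj",
		"model.layers.0.mlp.up_proj",
	} {
		fmt.Printf("%s quantized=%v\n", p, hasQuantizedWeights(names, p))
	}
}
```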
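The `Dequantize` op wraps `mlx_dequantize`, which expands packed uint32 words back to floats using one scale and one bias per group. The plain-Go sketch below illustrates the 4-bit affine layout (eight codes per word, `w = scale*q + bias` per `group_size` values); it assumes low-nibble-first packing and is illustrative only, not a substitute for the real op.

```go
package main

import "fmt"

// dequantize4bit shows the affine layout the Dequantize op undoes on the
// accelerator: each uint32 packs eight 4-bit codes (assumed low nibble
// first), and every groupSize output values share one scale and one bias,
// so w = scale*q + bias. This plain-Go version is illustrative only; the
// real op wraps mlx_dequantize rather than looping on the CPU.
func dequantize4bit(packed []uint32, scales, biases []float32, groupSize int) []float32 {
	out := make([]float32, 0, len(packed)*8)
	for i, word := range packed {
		for n := 0; n < 8; n++ {
			q := float32((word >> (4 * uint(n))) & 0xF)
			group := (i*8 + n) / groupSize
			out = append(out, scales[group]*q+biases[group])
		}
	}
	return out
}

func main() {
	// One group of eight values with scale 0.5 and bias -1.0.
	packed := []uint32{0x76543210}
	fmt.Println(dequantize4bit(packed, []float32{0.5}, []float32{-1.0}, 8))
}
```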
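Finally, cache.go's `KVCache.Update` gains a guard against malformed key/value shapes. A standalone sketch of such a guard, assuming the expected layout is `[batch, heads, seq_len, head_dim]`; the real method validates MLX arrays before concatenating them into the cache, while this version checks plain shape slices so it runs on its own.

```go
package main

import "fmt"

// validateKVShape sketches the kind of guard the commit adds to
// KVCache.Update: reject malformed key/value shapes before they are
// concatenated into the cache. The expected layout here is assumed to be
// [batch, heads, seqLen, headDim]; the real method checks MLX arrays,
// while this version works on plain shape slices so it stands alone.
func validateKVShape(shape []int) error {
	if len(shape) != 4 {
		return fmt.Errorf("expected 4-D key/value tensor, got %d-D shape %v", len(shape), shape)
	}
	for i, d := range shape {
		if d <= 0 {
			return fmt.Errorf("dimension %d of shape %v is non-positive", i, shape)
		}
	}
	return nil
}

func main() {
	for _, s := range [][]int{
		{1, 8, 16, 256}, // well-formed
		{1, 8, 16},      // missing an axis: rejected instead of failing later
	} {
		if err := validateKVShape(s); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", s)
	}
}
```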