- Add QuantizedLinear with QuantizedMatmul for packed uint32 weights
- Add quantized Embedding with Dequantize before lookup
- Parse quantization config (group_size, bits) from config.json
- Detect .scales/.biases weight tensors and auto-select quantized path
- Add Dequantize op wrapping mlx_dequantize
- Add safety guard to KVCache.Update for malformed shapes
- Handle tied embeddings with quantization (AsLinear helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
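The sketch below is a rough, pure-Go reference for the quantized path described above, not the code from this commit. It parses a `quantization` block from `config.json`, detects a quantized layer by its companion `.scales`/`.biases` tensors, and dequantizes one row of packed uint32 weights with the affine rule `w = scale*q + bias` per group. The names `QuantConfig`, `parseQuantConfig`, `isQuantized`, and `dequantizeRow` are hypothetical. The `config.json` layout and the packing order (first element in the lowest-order bits of each uint32) are assumptions modeled on MLX-style affine quantization.

```go
package main

import "encoding/json"

// QuantConfig mirrors an assumed "quantization" object in config.json,
// e.g. {"quantization": {"group_size": 64, "bits": 4}} (assumed layout).
type QuantConfig struct {
	GroupSize int `json:"group_size"`
	Bits      int `json:"bits"`
}

// parseQuantConfig returns nil when config.json has no quantization block,
// meaning the loader should keep the regular (unquantized) path.
func parseQuantConfig(data []byte) (*QuantConfig, error) {
	var cfg struct {
		Quantization *QuantConfig `json:"quantization"`
	}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return cfg.Quantization, nil
}

// isQuantized reports whether a layer prefix (e.g. "model.layers.0.mlp.up_proj")
// has both .scales and .biases tensors alongside its .weight tensor.
func isQuantized(tensorNames map[string]bool, prefix string) bool {
	return tensorNames[prefix+".scales"] && tensorNames[prefix+".biases"]
}

// dequantizeRow unpacks one row of packed uint32 weights and applies
// w = scale*q + bias, using one (scale, bias) pair per groupSize elements.
// Assumes bits divides 32 and element 0 sits in the lowest-order bits.
func dequantizeRow(packed []uint32, scales, biases []float32, bits, groupSize int) []float32 {
	perWord := 32 / bits                 // values stored in each uint32
	mask := uint32(1)<<uint(bits) - 1    // e.g. 0xF for 4-bit values
	out := make([]float32, 0, len(packed)*perWord)
	for i, word := range packed {
		for j := 0; j < perWord; j++ {
			idx := i*perWord + j          // element index within the row
			q := (word >> uint(j*bits)) & mask
			g := idx / groupSize          // group that owns this element
			out = append(out, scales[g]*float32(q)+biases[g])
		}
	}
	return out
}
```

For example, with 4-bit values and a group size of 4, the word `0x76543210` holds the integers 0 through 7; scales `{1, 2}` and biases `{0, 0.5}` turn them into `0, 1, 2, 3, 8.5, 10.5, 12.5, 14.5`. A real implementation would delegate this to `mlx_dequantize` or fuse it into the matmul rather than looping in Go.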