core/cli

Author	SHA1	Message	Date
Claude	098f496364	fix: correct SDPA mask mode and slice logits to last position	2026-02-16 05:53:52 +00:00
Claude	09da05d799	fix: use affine quantization mode and infer head_dim from weights	2026-02-16 05:53:52 +00:00
Claude	d3c31aa5a6	debug: add shape logging and stderr error handler for inference debugging	2026-02-16 05:53:52 +00:00
Claude	56c6e2fa8d	feat: support quantized inference (4-bit) for Gemma 3 - Add QuantizedLinear with QuantizedMatmul for packed uint32 weights - Add quantized Embedding with Dequantize before lookup - Parse quantization config (group_size, bits) from config.json - Detect .scales/.biases weight tensors and auto-select quantized path - Add Dequantize op wrapping mlx_dequantize - Add safety guard to KVCache.Update for malformed shapes - Handle tied embeddings with quantization (AsLinear helper) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	2a67653bf7	feat: handle nested text_config and language_model weight prefix Supports both multimodal (Gemma3ForConditionalGeneration) and text-only configs. Resolves weights with language_model. prefix fallback. Computes head_dim from hidden_size when missing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	bc28aad526	feat: add native MLX backend for Apple Silicon inference (pkg/mlx) CGo wrapper for mlx-c providing zero-Python Metal GPU inference. Includes Gemma 3 model architecture, BPE tokenizer, KV cache, composable sampling, and OpenAI-compatible serve command. Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00

6 commits
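
Commit 2a67653bf7 describes resolving weights with a `language_model.` prefix fallback so that both multimodal and text-only checkpoints load. A minimal sketch of that lookup might look like the following. The helper name `resolveWeight`, the generic map type, and the example tensor name are assumptions made for illustration and are not taken from pkg/mlx.

```go
package main

import "fmt"

// resolveWeight is a hypothetical helper (not the actual pkg/mlx code).
// It looks a tensor up by its plain name first and, if that fails,
// retries with the "language_model." prefix mentioned in the commit message.
func resolveWeight[T any](weights map[string]T, name string) (T, bool) {
	if w, ok := weights[name]; ok {
		return w, true
	}
	w, ok := weights["language_model."+name]
	return w, ok
}

func main() {
	// Toy checkpoint: strings stand in for real tensors, and the key
	// below is only an illustrative weight name.
	weights := map[string]string{
		"language_model.model.embed_tokens.weight": "embedding tensor",
	}

	// The caller asks for the text-only name; the fallback finds the
	// prefixed entry.
	w, ok := resolveWeight(weights, "model.embed_tokens.weight")
	fmt.Println(w, ok)
}
```

Trying the plain name first keeps text-only checkpoints on the fast path, and the prefixed lookup only runs when the plain name is absent.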