core/cli - Lethean Network

Author	SHA1	Message	Date
Claude	b1f76ce7db	fix: use affine quantization mode and infer head_dim from weights	2026-02-16 05:53:52 +00:00
Claude	f416f7e529	debug: add shape logging and stderr error handler for inference debugging	2026-02-16 05:53:52 +00:00
Claude	d92d097a7f	feat: support quantized inference (4-bit) for Gemma 3. Add QuantizedLinear with QuantizedMatmul for packed uint32 weights; add quantized Embedding with Dequantize before lookup; parse quantization config (group_size, bits) from config.json; detect .scales/.biases weight tensors and auto-select quantized path; add Dequantize op wrapping mlx_dequantize; add safety guard to KVCache.Update for malformed shapes; handle tied embeddings with quantization (AsLinear helper). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	7fc1571f93	fix: handle both string and array merge formats in tokenizer. Gemma 3 tokenizer.json uses the [["a","b"],...] format for merges instead of the ["a b",...] format. Support both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	f4303fada2	feat: handle nested text_config and language_model weight prefix. Supports both multimodal (Gemma3ForConditionalGeneration) and text-only configs. Resolves weights with language_model. prefix fallback. Computes head_dim from hidden_size when missing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	97ffe91cde	chore: target macOS 26.0, fix duplicate -lstdc++ linker warning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	e4467dd977	fix: remove unused vars in TopP sampler placeholder Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	828a5c0853	fix: resolve CGo type conflict in error handler. Use pure C callback instead of //export to avoid const char* vs GoString type mismatch in cgo-generated headers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	a0435a84ea	fix: correct 20 mlx-c API mismatches for v0.4.1. Use _axis/_axes variants for softmax, argmax, topk, sum, mean, squeeze, concatenate, argpartition; fix size_t vs int for count parameters throughout; fix int64_t strides in as_strided; add mlx_optional_int + mode param to quantized_matmul; use mlx_array_new() for null arrays (freqs, key, mask, sinks); fix expand_dims to single-axis signature; fix compile callback signature (size_t index). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	c25e1a633c	fix: correct mlx_closure_new_func_payload signature for mlx-c v0.4.1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	8ee0c4bc4e	feat: add native MLX backend for Apple Silicon inference (pkg/mlx). CGo wrapper for mlx-c providing zero-Python Metal GPU inference. Includes Gemma 3 model architecture, BPE tokenizer, KV cache, composable sampling, and OpenAI-compatible serve command. Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00

11 commits
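
Commit 7fc1571f93 mentions that tokenizer merges can appear either as ["a b",...] or as [["a","b"],...]. The following is a minimal sketch of how both shapes could be accepted in Go; it is not the code from the commit. The function name parseMerges and the [2]string return type are hypothetical, and the string form is assumed to contain exactly one separating space between the two tokens.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseMerges is a hypothetical helper (not taken from the repository).
// It decodes a JSON "merges" array whose entries are either
// "a b" strings or ["a", "b"] pairs, and returns them as pairs.
func parseMerges(raw []byte) ([][2]string, error) {
	// Decode the outer array lazily so each entry can be inspected.
	var entries []json.RawMessage
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, fmt.Errorf("merges is not a JSON array: %w", err)
	}

	pairs := make([][2]string, 0, len(entries))
	for i, entry := range entries {
		// Array form: ["a", "b"].
		var parts []string
		if err := json.Unmarshal(entry, &parts); err == nil {
			if len(parts) != 2 {
				return nil, fmt.Errorf("merge %d: want 2 tokens, got %d", i, len(parts))
			}
			pairs = append(pairs, [2]string{parts[0], parts[1]})
			continue
		}

		// String form: "a b" (assumed: split on the first space).
		var s string
		if err := json.Unmarshal(entry, &s); err != nil {
			return nil, fmt.Errorf("merge %d: neither string nor array: %w", i, err)
		}
		left, right, ok := strings.Cut(s, " ")
		if !ok {
			return nil, fmt.Errorf("merge %d: no space separator in %q", i, s)
		}
		pairs = append(pairs, [2]string{left, right})
	}
	return pairs, nil
}
```

Decoding each entry as json.RawMessage first keeps the two formats on a single code path, so a file mixing both shapes would also be accepted under these assumptions.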