core/cli - Lethean Network

Author	SHA1	Message	Date
Claude	d3c31aa5a6	debug: add shape logging and stderr error handler for inference debugging	2026-02-16 05:53:52 +00:00
Claude	56c6e2fa8d	feat: support quantized inference (4-bit) for Gemma 3. Add QuantizedLinear with QuantizedMatmul for packed uint32 weights; add quantized Embedding with Dequantize before lookup; parse quantization config (group_size, bits) from config.json; detect .scales/.biases weight tensors and auto-select quantized path; add Dequantize op wrapping mlx_dequantize; add safety guard to KVCache.Update for malformed shapes; handle tied embeddings with quantization (AsLinear helper). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	a4fde16998	fix: handle both string and array merge formats in tokenizer. Gemma 3 tokenizer.json uses the [["a","b"],...] format for merges instead of the ["a b",...] format. Support both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	2a67653bf7	feat: handle nested text_config and language_model weight prefix. Supports both multimodal (Gemma3ForConditionalGeneration) and text-only configs. Resolves weights with language_model. prefix fallback. Computes head_dim from hidden_size when missing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	9ae86017f4	chore: target macOS 26.0, fix duplicate -lstdc++ linker warning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	e9d9a3c3a0	fix: remove unused vars in TopP sampler placeholder Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	a0f77960a1	fix: resolve CGo type conflict in error handler. Use pure C callback instead of //export to avoid const char* vs GoString type mismatch in cgo-generated headers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	5e2d941b4d	fix: correct 20 mlx-c API mismatches for v0.4.1. Use _axis/_axes variants for softmax, argmax, topk, sum, mean, squeeze, concatenate, argpartition; fix size_t vs int for count parameters throughout; fix int64_t strides in as_strided; add mlx_optional_int + mode param to quantized_matmul; use mlx_array_new() for null arrays (freqs, key, mask, sinks); fix expand_dims to single-axis signature; fix compile callback signature (size_t index). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	c6597691bb	fix: correct mlx_closure_new_func_payload signature for mlx-c v0.4.1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	bc28aad526	feat: add native MLX backend for Apple Silicon inference (pkg/mlx). CGo wrapper for mlx-c providing zero-Python Metal GPU inference. Includes Gemma 3 model architecture, BPE tokenizer, KV cache, composable sampling, and OpenAI-compatible serve command. Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00

10 commits
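
Commit a4fde16998 above describes accepting tokenizer merges written either as "a b" strings or as ["a","b"] pairs. The following is a minimal Go sketch of one way that dual-format parsing could be done with encoding/json. The names MergePair and ParseMerges are hypothetical and not taken from the repository, and the actual implementation in pkg/mlx may differ. The sketch also assumes that the string form is split at the first space.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// MergePair is a hypothetical type holding one BPE merge rule.
type MergePair struct {
	Left, Right string
}

// UnmarshalJSON accepts either the string form "a b" or the
// array form ["a","b"] for a single merge entry.
func (m *MergePair) UnmarshalJSON(data []byte) error {
	// Try the string form first.
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		// Assumption: the two tokens are separated by the first space.
		left, right, ok := strings.Cut(s, " ")
		if !ok {
			return fmt.Errorf("merge %q has no space separator", s)
		}
		m.Left, m.Right = left, right
		return nil
	}

	// Fall back to the two-element array form.
	var pair []string
	if err := json.Unmarshal(data, &pair); err != nil {
		return fmt.Errorf("merge is neither a string nor an array: %w", err)
	}
	if len(pair) != 2 {
		return fmt.Errorf("merge array has %d elements, want 2", len(pair))
	}
	m.Left, m.Right = pair[0], pair[1]
	return nil
}

// ParseMerges decodes a JSON array of merges in either format,
// including arrays that mix both formats.
func ParseMerges(raw []byte) ([]MergePair, error) {
	var merges []MergePair
	if err := json.Unmarshal(raw, &merges); err != nil {
		return nil, err
	}
	return merges, nil
}

func main() {
	for _, raw := range []string{`["a b", "ab c"]`, `[["a","b"], ["ab","c"]]`} {
		merges, err := ParseMerges([]byte(raw))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(merges)
	}
}
```

Putting the format detection in UnmarshalJSON keeps the caller unaware of which layout a given tokenizer.json uses, so both layouts yield the same slice of pairs.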