Supports both multimodal (Gemma3ForConditionalGeneration) and
text-only configs. Resolves weights with language_model. prefix
fallback. Computes head_dim from hidden_size when missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CGo wrapper for mlx-c providing zero-Python Metal GPU inference.
Includes Gemma 3 model architecture, BPE tokenizer, KV cache,
composable sampling, and OpenAI-compatible serve command.
Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>