Split mlx/ directory from forge.lthn.ai/core/go-ai into its own module. Rewrites import paths, adds CLAUDE.md/TODO.md/FINDINGS.md for dedicated Claude sessions. Zero external Go deps — pure CGO + mlx-c v0.4.1. Co-Authored-By: Virgil <virgil@lethean.io>
CLAUDE.md
What This Is
Native Apple Metal GPU inference via mlx-c bindings. Module: forge.lthn.ai/core/go-mlx
Pure Go + CGO package that wraps Apple's MLX framework through the mlx-c C API. Runs LLM inference on Apple Silicon GPUs (M1-M4) using Metal compute shaders.
Platform
darwin/arm64 only. All Metal-backed files carry //go:build darwin && arm64. A stub (mlx_stub.go) is compiled on every other platform and provides MetalAvailable() bool, which always returns false there.
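A caller in another module may want to choose a fallback when Metal is missing. The following is a minimal sketch of that guard, not code from this package: supportedPlatform and pickBackend are hypothetical helpers, and the boolean passed to pickBackend stands in for the result of mlx.MetalAvailable().

```go
package main

import (
	"fmt"
	"runtime"
)

// supportedPlatform reports whether a GOOS/GOARCH pair matches the
// darwin/arm64 constraint. Hypothetical helper, not part of go-mlx.
func supportedPlatform(goos, goarch string) bool {
	return goos == "darwin" && goarch == "arm64"
}

// pickBackend chooses a backend name. The metalAvailable argument stands in
// for the value a real caller would get from mlx.MetalAvailable().
// The backend names are illustrative only.
func pickBackend(metalAvailable bool) string {
	if metalAvailable {
		return "metal"
	}
	return "cpu-fallback"
}

func main() {
	ok := supportedPlatform(runtime.GOOS, runtime.GOARCH)
	fmt.Printf("%s/%s supported=%v backend=%s\n",
		runtime.GOOS, runtime.GOARCH, ok, pickBackend(ok))
}
```

Because the stub compiles everywhere, a caller can call MetalAvailable() unconditionally and needs no build tags of its own.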
Build
# Step 1: Build mlx-c C library via CMake (fetches mlx-c v0.4.1)
cd /tmp/core-go-mlx && go generate ./...
# Step 2: Run tests (must be on Apple Silicon)
go test ./...
# Step 3: Use in another module
# Import "forge.lthn.ai/core/go-mlx" and "forge.lthn.ai/core/go-mlx/model"
CGO Flags (auto-set via go:generate)
CMake installs to dist/ inside the package directory. The #cgo directives in mlx.go reference:
- CPPFLAGS: -I${SRCDIR}/dist/include
- LDFLAGS: -L${SRCDIR}/dist/lib -lmlxc -lmlx
- Frameworks: Foundation, Metal, Accelerate
CMake Config
CMakeLists.txt fetches mlx-c v0.4.1 from GitHub. Key settings:
- MLX_BUILD_SAFETENSORS=ON (model loading)
- MLX_BUILD_GGUF=OFF (GGUF is for llama.cpp backend in go-ai/ml/)
- BUILD_SHARED_LIBS=ON
- macOS deployment target: 26.0
Architecture
go-mlx/
├── Core Layer (Metal GPU)
│ ├── mlx.go — Init, Materialize, MetalAvailable (CGO bridge)
│ ├── mlx_stub.go — Non-Apple fallback
│ ├── array.go — MLX array wrapper (create, reshape, data access)
│ ├── dtype.go — Data types (Float16, Float32, BFloat16, Int32, etc.)
│ ├── stream.go — Metal stream/queue management
│ └── CMakeLists.txt — Fetches mlx-c v0.4.1
│
├── Operations
│ ├── ops.go — Element-wise: Add, Multiply, MatMul, Softmax, etc.
│ ├── fast.go — Fused Metal kernels: RMSNorm, RoPE, ScaledDotProductAttention
│ ├── nn.go — Neural network layers: Linear, Embedding, RMSNorm
│ ├── compile.go — Compiled function closures for kernel fusion
│ ├── slice.go — Array slicing/indexing
│ └── random.go — Categorical sampling
│
├── Training
│ ├── grad.go — VJP (Vector-Jacobian Product) gradient computation
│ ├── lora.go — LoRA adapter (rank decomposition fine-tuning)
│ └── optim.go — AdamW optimiser
│
├── Model Support
│ ├── model/
│ │ ├── model.go — Base model interface (LoadModel, Generate)
│ │ ├── gemma3.go — Google Gemma3 decoder
│ │ └── qwen3.go — Alibaba Qwen3 decoder
│ ├── tokenizer/
│ │ └── tokenizer.go — BPE tokenizer (sentencepiece format)
│ ├── sample/
│ │ └── sample.go — Sampling strategies (temperature, top-k, top-p)
│ └── cache/
│ └── cache.go — KV cache for autoregressive inference
│
└── I/O
└── io.go — Safetensors model loading
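The sample/sample.go entry in the tree lists temperature, top-k and top-p sampling. As a rough illustration of the first two, here is a plain-Go sketch on float64 slices. It does not use the package's array type or its real function names: softmaxTemp and topK are hypothetical, and top-p is left out for brevity.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// softmaxTemp turns logits into probabilities after dividing by temperature.
// Lower temperatures sharpen the distribution. temperature must be > 0.
func softmaxTemp(logits []float64, temperature float64) []float64 {
	maxLogit := math.Inf(-1)
	for _, l := range logits {
		if l > maxLogit {
			maxLogit = l
		}
	}
	probs := make([]float64, len(logits))
	sum := 0.0
	for i, l := range logits {
		// Subtracting the maximum keeps math.Exp from overflowing.
		probs[i] = math.Exp((l - maxLogit) / temperature)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

// topK keeps the k largest probabilities, zeroes the rest and renormalises.
// If k is out of range, the input is returned as a copy.
func topK(probs []float64, k int) []float64 {
	out := make([]float64, len(probs))
	if k <= 0 || k >= len(probs) {
		copy(out, probs)
		return out
	}
	idx := make([]int, len(probs))
	for i := range idx {
		idx[i] = i
	}
	// Order indices by descending probability.
	sort.SliceStable(idx, func(a, b int) bool { return probs[idx[a]] > probs[idx[b]] })
	sum := 0.0
	for _, i := range idx[:k] {
		out[i] = probs[i]
		sum += probs[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	logits := []float64{1, 3, 2, 0}
	probs := topK(softmaxTemp(logits, 0.7), 2)
	fmt.Printf("%.3f\n", probs)
}
```

A real sampler would then draw a token index from the filtered distribution, which is the role the tree assigns to random.go (categorical sampling).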
Key Interfaces
// model/model.go — base interface for all models
type Model interface {
    Forward(x *mlx.Array, cache *cache.KVCache) *mlx.Array
}
// Top-level — GPU materialisation
mlx.Materialize(outputs...) // Sync GPU eval
mlx.MaterializeAsync(outputs...) // Async GPU eval
mlx.MetalAvailable() bool // Check Metal GPU
// Array operations
mlx.NewArray(data, shape, dtype)
mlx.MatMul(a, b)
mlx.Softmax(x, axis)
mlx.Add(a, b), mlx.Multiply(a, b), mlx.Divide(a, b)
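To show how a Forward method with a cache argument is typically driven during autoregressive decoding, here is a self-contained sketch. Everything in it is a stand-in: Array and KVCache are simplified hypothetical types rather than *mlx.Array and *cache.KVCache, counterModel is a toy, and greedyGenerate is not the package's generation function.

```go
package main

import "fmt"

// Array is a hypothetical stand-in for *mlx.Array.
type Array []float32

// KVCache is a hypothetical stand-in for *cache.KVCache. It records token
// history where a real cache would hold attention keys and values.
type KVCache struct {
	Seen []int
}

// Model mirrors the shape of the Forward method, using the stand-in types.
type Model interface {
	Forward(x Array, c *KVCache) Array
}

// counterModel is a toy model that always predicts (last token + 1) mod vocab.
type counterModel struct{ vocab int }

func (m counterModel) Forward(x Array, c *KVCache) Array {
	logits := make(Array, m.vocab)
	if len(x) == 0 || m.vocab == 0 {
		return logits
	}
	for _, t := range x {
		c.Seen = append(c.Seen, int(t))
	}
	last := int(x[len(x)-1])
	logits[(last+1)%m.vocab] = 1
	return logits
}

// argmax returns the index of the largest value (0 for an empty slice).
func argmax(a Array) int {
	best := 0
	for i, v := range a {
		if v > a[best] {
			best = i
		}
	}
	return best
}

// greedyGenerate feeds the whole prompt once, then one new token per step.
// The cache carries earlier context, so later steps pass only the new token.
func greedyGenerate(m Model, prompt []int, n int) []int {
	c := &KVCache{}
	x := make(Array, len(prompt))
	for i, t := range prompt {
		x[i] = float32(t)
	}
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		next := argmax(m.Forward(x, c))
		out = append(out, next)
		x = Array{float32(next)}
	}
	return out
}

func main() {
	fmt.Println(greedyGenerate(counterModel{vocab: 5}, []int{1, 2}, 4))
}
```

The point of the pattern is that the prompt is processed once and each later step passes a single token, which is what a KV cache makes possible.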
Dependencies
- mlx-c v0.4.1 (fetched by CMake at build time)
- Apple frameworks: Foundation, Metal, Accelerate
- Go stdlib only — no external Go dependencies
Downstream Consumers
- forge.lthn.ai/core/go-ai/ml: backend_mlx.go uses this for Metal inference
- forge.lthn.ai/core/go-i18n: Phase 2a needs Gemma3-1B inference for domain classification (blocked on this package being importable)
Model Format
Safetensors (HuggingFace format). NOT GGUF.
- Example: /Volumes/Data/lem/safetensors/gemma-3/
- Models must be in safetensors format with matching tokenizer config
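For orientation, a safetensors file starts with an 8-byte little-endian header length, followed by a JSON header describing each tensor, followed by the raw tensor bytes. The sketch below reads that header using only the standard library. It is not how io.go loads models (that goes through mlx-c); readHeader, tensorInfo and the 100 MiB sanity cap are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
)

// tensorInfo is one tensor entry in the safetensors JSON header.
type tensorInfo struct {
	DType       string   `json:"dtype"`
	Shape       []int64  `json:"shape"`
	DataOffsets [2]int64 `json:"data_offsets"`
}

// readHeader parses the length-prefixed JSON header from r.
func readHeader(r io.Reader) (map[string]tensorInfo, error) {
	var n uint64
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, fmt.Errorf("read header length: %w", err)
	}
	const maxHeader = 100 << 20 // arbitrary sanity cap for this sketch
	if n > maxHeader {
		return nil, fmt.Errorf("header too large: %d bytes", n)
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, fmt.Errorf("read header: %w", err)
	}
	raw := map[string]json.RawMessage{}
	if err := json.Unmarshal(buf, &raw); err != nil {
		return nil, fmt.Errorf("parse header: %w", err)
	}
	out := make(map[string]tensorInfo, len(raw))
	for name, msg := range raw {
		if name == "__metadata__" { // free-form metadata, not a tensor
			continue
		}
		var ti tensorInfo
		if err := json.Unmarshal(msg, &ti); err != nil {
			return nil, fmt.Errorf("tensor %q: %w", name, err)
		}
		out[name] = ti
	}
	return out, nil
}

// readHeaderBytes is a convenience wrapper for in-memory data.
func readHeaderBytes(b []byte) (map[string]tensorInfo, error) {
	return readHeader(bytes.NewReader(b))
}

// buildFile assembles an in-memory safetensors-style byte slice for the demo.
// Writes to a bytes.Buffer cannot fail, so the error is ignored.
func buildFile(headerJSON string, data []byte) []byte {
	var b bytes.Buffer
	_ = binary.Write(&b, binary.LittleEndian, uint64(len(headerJSON)))
	b.WriteString(headerJSON)
	b.Write(data)
	return b.Bytes()
}

func main() {
	hdr := `{"w":{"dtype":"F32","shape":[2,2],"data_offsets":[0,16]}}`
	tensors, err := readHeaderBytes(buildFile(hdr, make([]byte, 16)))
	if err != nil {
		panic(err)
	}
	fmt.Println(tensors["w"].DType, tensors["w"].Shape, tensors["w"].DataOffsets)
}
```

A check like this can help confirm that a model directory contains safetensors files with the expected tensor names before handing it to the loader.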
Coding Standards
- UK English (colour, organisation, centre)
- go test ./... must pass before commit
- Conventional commits: type(scope): description
- Co-Author: Co-Authored-By: Virgil <virgil@lethean.io>
- Licence: EUPL-1.2
Task Queue
See TODO.md for prioritised work.
See FINDINGS.md for research notes.
See the wiki for architecture docs.