Snider cae7ef05e8 feat: extract go-mlx from go-ai as standalone Metal inference package

Split mlx/ directory from forge.lthn.ai/core/go-ai into its own module.
Rewrites import paths, adds CLAUDE.md/TODO.md/FINDINGS.md for dedicated
Claude sessions. Zero external Go deps — pure CGO + mlx-c v0.4.1.

Co-Authored-By: Virgil <virgil@lethean.io>

2026-02-19 17:57:37 +00:00

4.7 KiB

Raw Blame History

CLAUDE.md

What This Is

Native Apple Metal GPU inference via mlx-c bindings. Module: forge.lthn.ai/core/go-mlx

Pure Go + CGO package that wraps Apple's MLX framework through the mlx-c C API. Runs LLM inference on Apple Silicon GPUs (M1-M4) using Metal compute shaders.

Platform

darwin/arm64 only. All files carry //go:build darwin && arm64. A stub (mlx_stub.go) provides MetalAvailable() bool returning false on other platforms.

Build

# Step 1: Build mlx-c C library via CMake (fetches mlx-c v0.4.1)
cd /tmp/core-go-mlx && go generate ./...

# Step 2: Run tests (must be on Apple Silicon)
go test ./...

# Step 3: Use in another module
# Import "forge.lthn.ai/core/go-mlx" and "forge.lthn.ai/core/go-mlx/model"

CGO Flags (auto-set via go:generate)

CMake installs to dist/ inside the package directory. The #cgo directives in mlx.go reference:

CPPFLAGS: -I${SRCDIR}/dist/include
LDFLAGS: -L${SRCDIR}/dist/lib -lmlxc -lmlx
Frameworks: Foundation, Metal, Accelerate

CMake Config

CMakeLists.txt fetches mlx-c v0.4.1 from GitHub. Key settings:

MLX_BUILD_SAFETENSORS=ON (model loading)
MLX_BUILD_GGUF=OFF (GGUF is for llama.cpp backend in go-ai/ml/)
BUILD_SHARED_LIBS=ON
macOS deployment target: 26.0

Architecture

go-mlx/
├── Core Layer (Metal GPU)
│   ├── mlx.go              — Init, Materialize, MetalAvailable (CGO bridge)
│   ├── mlx_stub.go         — Non-Apple fallback
│   ├── array.go            — MLX array wrapper (create, reshape, data access)
│   ├── dtype.go            — Data types (Float16, Float32, BFloat16, Int32, etc.)
│   ├── stream.go           — Metal stream/queue management
│   └── CMakeLists.txt      — Fetches mlx-c v0.4.1
│
├── Operations
│   ├── ops.go              — Element-wise: Add, Multiply, MatMul, Softmax, etc.
│   ├── fast.go             — Fused Metal kernels: RMSNorm, RoPE, ScaledDotProductAttention
│   ├── nn.go               — Neural network layers: Linear, Embedding, RMSNorm
│   ├── compile.go          — Compiled function closures for kernel fusion
│   ├── slice.go            — Array slicing/indexing
│   └── random.go           — Categorical sampling
│
├── Training
│   ├── grad.go             — VJP (Vector-Jacobian Product) gradient computation
│   ├── lora.go             — LoRA adapter (rank decomposition fine-tuning)
│   └── optim.go            — AdamW optimiser
│
├── Model Support
│   ├── model/
│   │   ├── model.go        — Base model interface (LoadModel, Generate)
│   │   ├── gemma3.go       — Google Gemma3 decoder
│   │   └── qwen3.go        — Alibaba Qwen3 decoder
│   ├── tokenizer/
│   │   └── tokenizer.go    — BPE tokenizer (sentencepiece format)
│   ├── sample/
│   │   └── sample.go       — Sampling strategies (temperature, top-k, top-p)
│   └── cache/
│       └── cache.go        — KV cache for autoregressive inference
│
└── I/O
    └── io.go               — Safetensors model loading

Key Interfaces

// model/model.go — base interface for all models
type Model interface {
    Forward(x *mlx.Array, cache *cache.KVCache) *mlx.Array
}

// Top-level — GPU materialisation
mlx.Materialize(outputs...)      // Sync GPU eval
mlx.MaterializeAsync(outputs...) // Async GPU eval
mlx.MetalAvailable() bool        // Check Metal GPU

// Array operations
mlx.NewArray(data, shape, dtype)
mlx.MatMul(a, b)
mlx.Softmax(x, axis)
mlx.Add(a, b), mlx.Multiply(a, b), mlx.Divide(a, b)

Dependencies

mlx-c v0.4.1 (fetched by CMake at build time)
Apple frameworks: Foundation, Metal, Accelerate
Go stdlib only — no external Go dependencies

Downstream Consumers

forge.lthn.ai/core/go-ai/ml — backend_mlx.go uses this for Metal inference
forge.lthn.ai/core/go-i18n — Phase 2a needs Gemma3-1B inference for domain classification (blocked on this package being importable)

Model Format

Safetensors (HuggingFace format). NOT GGUF.

Example: /Volumes/Data/lem/safetensors/gemma-3/
Models must be in safetensors format with matching tokenizer config

Coding Standards

UK English (colour, organisation, centre)
go test ./... must pass before commit
Conventional commits: type(scope): description
Co-Author: Co-Authored-By: Virgil <virgil@lethean.io>
Licence: EUPL-1.2

Task Queue

See TODO.md for prioritised work. See FINDINGS.md for research notes. See the wiki for architecture docs.

4.7 KiB Raw Blame History