CLAUDE.md
What This Is
Native Apple Metal GPU inference via mlx-c bindings. Module: forge.lthn.ai/core/go-mlx
Go package (built with CGO) that wraps Apple's MLX framework through the mlx-c C API. It runs LLM inference on Apple Silicon GPUs (M1-M4) using Metal compute shaders.
Platform
darwin/arm64 only. All CGO files carry //go:build darwin && arm64. A stub (mlx_stub.go) provides MetalAvailable() bool, which returns false on every other platform.
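The two constraints are exact complements: by De Morgan's law, !darwin || !arm64 equals !(darwin && arm64), so every GOOS/GOARCH pair compiles exactly one of the CGO files or the stub. A minimal sketch of that predicate evaluated at run time; metalPlatform, stubPlatform and hostHasMetal are hypothetical helpers for illustration, not part of the package.

```go
package main

import "runtime"

// metalPlatform mirrors the CGO files' constraint `//go:build darwin && arm64`.
// Hypothetical helper for illustration only.
func metalPlatform(goos, goarch string) bool {
	return goos == "darwin" && goarch == "arm64"
}

// stubPlatform mirrors the stub's constraint `//go:build !darwin || !arm64`.
// Hypothetical helper for illustration only.
func stubPlatform(goos, goarch string) bool {
	return goos != "darwin" || goarch != "arm64"
}

// hostHasMetal evaluates the CGO-side predicate for the running binary.
func hostHasMetal() bool {
	return metalPlatform(runtime.GOOS, runtime.GOARCH)
}
```

On an Intel Mac (darwin/amd64), for example, the predicate routes the build to the stub.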
Build
# Step 1: Build mlx-c C library via CMake (fetches mlx-c v0.4.1)
go generate ./...
# Step 2: Run tests (must be on Apple Silicon)
go test ./...
CGO Flags (auto-set via go:generate)
CMake installs into dist/ at the module root (two levels above internal/metal/). The #cgo directives in internal/metal/metal.go reference:
- CPPFLAGS: -I${SRCDIR}/../../dist/include
- LDFLAGS: -L${SRCDIR}/../../dist/lib -lmlxc -lmlx
- Frameworks: Foundation, Metal, Accelerate
CMake Config
CMakeLists.txt fetches mlx-c v0.4.1 from GitHub. Key settings:
- MLX_BUILD_SAFETENSORS=ON (model loading)
- MLX_BUILD_GGUF=OFF
- BUILD_SHARED_LIBS=ON
- macOS deployment target: 26.0
Architecture
go-mlx/
├── Public API (compiles on all platforms)
│ ├── mlx.go — Package doc + go:generate CMake directives
│ ├── textmodel.go — TextModel interface, Token, Message types
│ ├── options.go — GenerateOption, LoadOption functional options
│ ├── backend.go — Backend interface, Register/Get/Default/LoadModel
│ ├── register_metal.go — //go:build darwin && arm64 — auto-registers metal
│ ├── mlx_stub.go — //go:build !darwin || !arm64 — MetalAvailable() false
│ └── mlx_test.go — Integration tests (public API)
│
├── internal/metal/ — All CGO code (darwin/arm64 only)
│ ├── metal.go — Init, Materialize, error handler, stream
│ ├── array.go — Array type, creation, data access
│ ├── dtype.go — DType constants
│ ├── stream.go — Metal stream/queue, memory controls
│ ├── ops.go — Element-wise, reduction, shape ops
│ ├── fast.go — Fused Metal kernels (RMSNorm, RoPE, SDPA)
│ ├── nn.go — Linear, Embedding, RMSNormModule
│ ├── compile.go — CompiledFunc
│ ├── slice.go — Array slicing
│ ├── random.go — RandomCategorical, RandomUniform
│ ├── io.go — Safetensors loading
│ ├── model.go — InternalModel interface + architecture dispatch
│ ├── gemma3.go — Gemma3 decoder
│ ├── qwen3.go — Qwen3 decoder
│ ├── cache.go — KVCache + RotatingKVCache
│ ├── sample.go — Sampling chain (greedy, temp, topK, topP)
│ ├── tokenizer.go — BPE tokenizer
│ ├── grad.go — VJP
│ ├── lora.go — LoRA adapters
│ ├── optim.go — AdamW
│ ├── generate.go — Autoregressive generation loop + chat templates
│ └── backend.go — LoadAndInit entry point
│
├── cpp/ — CLion Claude workspace (C++ research)
│ ├── CMakeLists.txt
│ ├── CLAUDE.md
│ ├── TODO.md
│ └── FINDINGS.md
│
└── docs/plans/ — Design and implementation docs
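backend.go exposes Register/Get/Default/LoadModel, and register_metal.go registers the Metal backend automatically on darwin/arm64. That split matches the common Go registry pattern: a package-level map filled from init() functions in build-tagged files. A minimal sketch, assuming a simplified Backend interface with Name and Available methods; these names and signatures are hypothetical and are not the package's actual API.

```go
package main

import (
	"errors"
	"sync"
)

// Backend is a hypothetical, simplified stand-in for the package's Backend interface.
type Backend interface {
	Name() string
	Available() bool
}

var (
	mu       sync.RWMutex
	backends = map[string]Backend{}
	order    []string // registration order, used when choosing a default
)

// Register adds (or replaces) a backend under its name. In this pattern a
// build-tagged file would call it from init().
func Register(b Backend) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := backends[b.Name()]; !dup {
		order = append(order, b.Name())
	}
	backends[b.Name()] = b
}

// Get returns the backend registered under name, if any.
func Get(name string) (Backend, bool) {
	mu.RLock()
	defer mu.RUnlock()
	b, ok := backends[name]
	return b, ok
}

// Default returns the first registered backend that reports itself available.
func Default() (Backend, error) {
	mu.RLock()
	defer mu.RUnlock()
	for _, name := range order {
		if b := backends[name]; b.Available() {
			return b, nil
		}
	}
	return nil, errors.New("no available backend registered")
}

// fakeBackend lets the sketch run without Metal hardware.
type fakeBackend struct {
	name string
	ok   bool
}

func (f fakeBackend) Name() string    { return f.name }
func (f fakeBackend) Available() bool { return f.ok }
```

Because the registering file carries a build tag, a non-Apple build never adds a Metal entry in this pattern, so Default falls through to another available backend or returns an error.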
Public API
import (
	"context"
	"fmt"
	"log"

	"forge.lthn.ai/core/go-mlx"
)
// Load and generate
m, err := mlx.LoadModel("/path/to/model/")
if err != nil { log.Fatal(err) }
defer m.Close()
ctx := context.Background()
for tok := range m.Generate(ctx, "What is 2+2?", mlx.WithMaxTokens(128)) {
fmt.Print(tok.Text)
}
if err := m.Err(); err != nil { log.Fatal(err) }
// Chat with template formatting
for tok := range m.Chat(ctx, []mlx.Message{
{Role: "user", Content: "Hello"},
}, mlx.WithMaxTokens(64)) {
fmt.Print(tok.Text)
}
// Memory controls
mlx.SetCacheLimit(4 * 1024 * 1024 * 1024) // 4 GiB
mlx.ClearCache()
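GenerateOption and LoadOption (options.go) follow Go's functional-options pattern: each With* helper returns a closure that mutates a config struct, and the callee applies the closures over its defaults. A minimal sketch, assuming a hypothetical generateConfig with MaxTokens and Temperature fields and defaults chosen only for illustration; WithMaxTokens appears in the examples above, while WithTemperature here is hypothetical.

```go
package main

// generateConfig is a hypothetical stand-in for the settings a
// GenerateOption configures; field names and defaults are assumptions.
type generateConfig struct {
	MaxTokens   int
	Temperature float32
}

// GenerateOption mutates a generateConfig.
type GenerateOption func(*generateConfig)

// WithMaxTokens caps the number of generated tokens.
func WithMaxTokens(n int) GenerateOption {
	return func(c *generateConfig) { c.MaxTokens = n }
}

// WithTemperature sets the sampling temperature (hypothetical option).
func WithTemperature(t float32) GenerateOption {
	return func(c *generateConfig) { c.Temperature = t }
}

// applyGenerateOptions starts from defaults and applies options in order,
// so a later option overrides an earlier one for the same field.
func applyGenerateOptions(opts ...GenerateOption) generateConfig {
	cfg := generateConfig{MaxTokens: 256, Temperature: 0} // illustrative defaults
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}
```

The pattern keeps call sites short (mlx.WithMaxTokens(128) alone) while letting new settings be added without breaking existing callers.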
Dependencies
- mlx-c v0.4.1 (fetched by CMake at build time)
- Apple frameworks: Foundation, Metal, Accelerate
- Go stdlib only — no external Go dependencies
Downstream Consumers
- forge.lthn.ai/core/go-ml: backend_mlx.go uses mlx.LoadModel() + m.Generate()
- forge.lthn.ai/core/go-i18n: Phase 2a needs Gemma3-1B inference for domain classification
The old API (Array, MatMul, model.LoadModel, etc.) is no longer public — all CGO is in internal/metal/. Consumers use the clean TextModel interface.
Model Format
Safetensors (HuggingFace format). NOT GGUF.
- Example: /Volumes/Data/lem/safetensors/gemma-3/
- Models must be in safetensors format with matching tokenizer config
Coding Standards
- UK English (colour, organisation, centre)
- go test ./... must pass before commit
- Conventional commits: type(scope): description
- Co-Author: Co-Authored-By: Virgil <virgil@lethean.io>
- Licence: EUPL-1.2
Task Queue
See TODO.md for prioritised work.
See FINDINGS.md for research notes.
See the wiki for architecture docs.