go-mlx/docs/development.md
Snider 1ea90b03b4 docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking with structured docs covering CGO/mlx-c
architecture, 4 model architectures, training pipeline, mlxlm backend,
development guide, and full project history across 5 phases.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:03:39 +00:00

8 KiB

Development Guide

Module: forge.lthn.ai/core/go-mlx


Prerequisites

Platform

macOS on Apple Silicon only. All CGO source files carry //go:build darwin && arm64. The package will not build for native Metal inference on any other platform; a stub (mlx_stub.go) provides MetalAvailable() bool returning false elsewhere.

Required Tools

Tool Version Purpose
Go 1.25.5+ Module toolchain
CMake 3.24+ Builds mlx-c from source
AppleClang 17.0+ C/C++ compiler for mlx-c
macOS SDK 26.2+ Metal framework headers
Xcode Command Line Tools Current Provides xcrun, frameworks

Install CMake if absent:

brew install cmake

Go Workspace

go-mlx participates in a Go workspace alongside go-inference. The go.mod uses a replace directive for local development:

replace forge.lthn.ai/core/go-inference => ../go-inference

After adding modules or changing dependencies: go work sync


Build

Step 1: Build mlx-c

Run from the module root:

go generate ./...

This executes the //go:generate directives in mlx.go:

cmake -S . -B build -DCMAKE_INSTALL_PREFIX=dist -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
cmake --install build

CMake fetches mlx-c v0.4.1 from GitHub, builds it with:

  • MLX_BUILD_SAFETENSORS=ON (model loading)
  • MLX_BUILD_GGUF=OFF
  • BUILD_SHARED_LIBS=ON
  • macOS deployment target: 13.3 (minimum required by MLX)

The built library installs to dist/include/ and dist/lib/. Build time is approximately 2 minutes on M3 Ultra.

The dist/ directory is gitignored and must be rebuilt on each fresh checkout.

Step 2: Run Tests

go test ./...

Tests require a working mlx-c build. Integration tests that load model files are skipped automatically when model paths are absent (/Volumes/Data/lem/safetensors/...).


CGO Flags

The #cgo directives in internal/metal/metal.go set all required flags automatically when building on darwin/arm64:

#cgo CXXFLAGS: -std=c++17
#cgo CFLAGS: -mmacosx-version-min=13.3
#cgo CPPFLAGS: -I${SRCDIR}/../../dist/include
#cgo LDFLAGS: -L${SRCDIR}/../../dist/lib -lmlxc -lmlx
#cgo darwin LDFLAGS: -framework Foundation -framework Metal -framework Accelerate
#cgo darwin LDFLAGS: -Wl,-rpath,${SRCDIR}/../../dist/lib

${SRCDIR} is the directory containing metal.go at build time (internal/metal/), so the ../../dist/ path resolves to the module root dist/.

No manual environment variables are needed for go build or go test.


Test Patterns

Tests use the _Good, _Bad, _Ugly suffix convention:

Suffix Meaning
_Good Happy path; expected to succeed
_Bad Expected error conditions
_Ugly Panic / edge cases

Example:

func TestMatmul_Good(t *testing.T) { ... }
func TestMatmul_Bad(t *testing.T) { ... }

Tests that require model files on disk use t.Skip() when the path is absent:

const modelPath = "/Volumes/Data/lem/safetensors/gemma-3/"
if _, err := os.Stat(modelPath); err != nil {
    t.Skip("model not available:", modelPath)
}

All 180+ tests in internal/metal/ are unit or integration tests that exercise the CGO layer directly. The 11 tests in the root package (mlx_test.go) exercise the public API via go-inference.

Running a Single Test

go test -run TestRMSNorm_Good ./internal/metal/

Running with Race Detector

go test -race ./...

Benchmarks

29 benchmarks in internal/metal/bench_test.go. Run with:

go test -bench=. -benchtime=2s ./internal/metal/

Key benchmarks:

Benchmark group What it measures
BenchmarkMatmul_* Matrix multiply at 128² through 4096², plus token projection
BenchmarkSoftmax_* Softmax at 1K through 128K vocab
BenchmarkElementWise_* Add, Mul, SiLU at 1M elements
BenchmarkRMSNorm_* Fused RMSNorm at decode and prefill shapes
BenchmarkRoPE_* RoPE at single-token and 512-token shapes
BenchmarkSDPA_* Scaled dot-product attention at 1, 32, 512 sequence lengths
BenchmarkLinear_* Linear layer forward at decode and prefill shapes
BenchmarkSampler_* Greedy, TopK, TopP, and full chain on 32K vocab

Model-level benchmarks (model.Forward, tokenizer) require model files on disk and are not included in the automated suite.


Code Structure

Adding a New Operation

  1. Add the C binding to the appropriate file in internal/metal/:
    • ops.go — element-wise, reduction, matrix, shape operations
    • fast.go — fused Metal kernel wrappers
    • slice.go — slicing and scatter operations
  2. Follow the newArray("OP_NAME", inputs...) pattern for tracking
  3. Add tests in the corresponding _test.go file using _Good/_Bad suffixes
  4. Add a benchmark in bench_test.go for any operation on the hot path

Adding a New Model Architecture

  1. Read config.json model_type and add a case in model.go:loadModel
  2. Create architecture.go in internal/metal/ implementing InternalModel
  3. Add ApplyLoRA to the new model
  4. Add a close* helper in close.go for deterministic resource cleanup
  5. Add formatXyzChat in generate.go for the chat template
  6. Add tokeniser BOS/EOS detection in tokenizer.go:LoadTokenizer
  7. Write tests: config parsing, missing weights, end-to-end inference

Coding Standards

Language

UK English throughout: colour, organisation, centre, initialise, behaviour. Never American spellings.

Go Style

  • declare(strict_types=1) equivalent: all parameters and return types must be explicitly typed
  • PSR-12 equivalent: gofmt + goimports; run before committing
  • go test ./... must pass before every commit; no red tests in main

Licence Header

Every new source file must carry the EUPL-1.2 licence identifier:

// SPDX-Licence-Identifier: EUPL-1.2

Conventional Commits

Format: type(scope): description

Types:

  • feat — new capability
  • fix — bug fix
  • test — test additions or changes
  • bench — benchmark additions or changes
  • refactor — code restructuring without behaviour change
  • docs — documentation only
  • chore — maintenance (gitignore, go.mod, CMake)

Scopes: metal, api, mlxlm, cpp, docs

Examples:

feat(metal): add TopP nucleus sampling
fix(metal): auto-contiguous data access for non-contiguous arrays
test(metal): add model loading robustness tests
bench(metal): add 29 benchmarks baselined on M3 Ultra

Co-Author

All commits must include:

Co-Authored-By: Virgil <virgil@lethean.io>

Build Tags

  • All CGO files: //go:build darwin && arm64
  • Stub file: //go:build !darwin || !arm64
  • mlxlm opt-out: //go:build !nomlxlm

CMake Configuration

CMakeLists.txt at the module root. Key settings:

set(MLX_BUILD_SAFETENSORS ON)   # Required for model loading
set(MLX_BUILD_GGUF OFF)         # GGUF not supported
set(BUILD_SHARED_LIBS ON)       # Shared .dylib for rpath loading
set(CMAKE_OSX_DEPLOYMENT_TARGET 13.3)  # MLX minimum

To force a clean rebuild:

rm -rf build dist
go generate ./...

mlxlm Backend Development

The mlxlm/ package has no CGO dependency and tests run on any platform where Python 3 is available. Tests use testdata/mock_bridge.py instead of the real bridge.py, so no mlx-lm installation is required.

Run mlxlm tests:

go test ./mlxlm/

The mock bridge responds to all commands with fixed fake data, enabling full subprocess protocol testing without GPU or Python ML dependencies.

To opt out of building the mlxlm backend:

go build -tags nomlxlm ./...

Dependency Graph

go-mlx
├── forge.lthn.ai/core/go-inference  (shared interfaces, zero dependencies)
└── mlx-c v0.4.1                     (CMake, fetched from GitHub at generate time)
    └── Apple MLX (Metal GPU compute)
        └── Foundation, Metal, Accelerate frameworks

The root package and mlxlm/ have no CGO dependency. Only internal/metal/ links against mlx-c.