Snider 1ea90b03b4 docs: graduate TODO/FINDINGS into production documentation

Replace internal task tracking with structured docs covering CGO/mlx-c
architecture, 4 model architectures, training pipeline, mlxlm backend,
development guide, and full project history across 5 phases.

Co-Authored-By: Virgil <virgil@lethean.io>

2026-02-20 15:03:39 +00:00

8 KiB

Raw Blame History

Development Guide

Module: forge.lthn.ai/core/go-mlx

Prerequisites

Platform

macOS on Apple Silicon only. All CGO source files carry //go:build darwin && arm64. The package will not build for native Metal inference on any other platform; a stub (mlx_stub.go) provides MetalAvailable() bool returning false elsewhere.

Required Tools

Tool	Version	Purpose
Go	1.25.5+	Module toolchain
CMake	3.24+	Builds mlx-c from source
AppleClang	17.0+	C/C++ compiler for mlx-c
macOS SDK	26.2+	Metal framework headers
Xcode Command Line Tools	Current	Provides `xcrun`, frameworks

Install CMake if absent:

brew install cmake

Go Workspace

go-mlx participates in a Go workspace alongside go-inference. The go.mod uses a replace directive for local development:

replace forge.lthn.ai/core/go-inference => ../go-inference

After adding modules or changing dependencies: go work sync

Build

Step 1: Build mlx-c

Run from the module root:

go generate ./...

This executes the //go:generate directives in mlx.go:

cmake -S . -B build -DCMAKE_INSTALL_PREFIX=dist -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
cmake --install build

CMake fetches mlx-c v0.4.1 from GitHub, builds it with:

MLX_BUILD_SAFETENSORS=ON (model loading)
MLX_BUILD_GGUF=OFF
BUILD_SHARED_LIBS=ON
macOS deployment target: 13.3 (minimum required by MLX)

The built library installs to dist/include/ and dist/lib/. Build time is approximately 2 minutes on M3 Ultra.

The dist/ directory is gitignored and must be rebuilt on each fresh checkout.

Step 2: Run Tests

go test ./...

Tests require a working mlx-c build. Integration tests that load model files are skipped automatically when model paths are absent (/Volumes/Data/lem/safetensors/...).

CGO Flags

The #cgo directives in internal/metal/metal.go set all required flags automatically when building on darwin/arm64:

#cgo CXXFLAGS: -std=c++17
#cgo CFLAGS: -mmacosx-version-min=13.3
#cgo CPPFLAGS: -I${SRCDIR}/../../dist/include
#cgo LDFLAGS: -L${SRCDIR}/../../dist/lib -lmlxc -lmlx
#cgo darwin LDFLAGS: -framework Foundation -framework Metal -framework Accelerate
#cgo darwin LDFLAGS: -Wl,-rpath,${SRCDIR}/../../dist/lib

${SRCDIR} is the directory containing metal.go at build time (internal/metal/), so the ../../dist/ path resolves to the module root dist/.

No manual environment variables are needed for go build or go test.

Test Patterns

Tests use the _Good, _Bad, _Ugly suffix convention:

Suffix	Meaning
`_Good`	Happy path; expected to succeed
`_Bad`	Expected error conditions
`_Ugly`	Panic / edge cases

Example:

func TestMatmul_Good(t *testing.T) { ... }
func TestMatmul_Bad(t *testing.T) { ... }

Tests that require model files on disk use t.Skip() when the path is absent:

const modelPath = "/Volumes/Data/lem/safetensors/gemma-3/"
if _, err := os.Stat(modelPath); err != nil {
    t.Skip("model not available:", modelPath)
}

All 180+ tests in internal/metal/ are unit or integration tests that exercise the CGO layer directly. The 11 tests in the root package (mlx_test.go) exercise the public API via go-inference.

Running a Single Test

go test -run TestRMSNorm_Good ./internal/metal/

Running with Race Detector

go test -race ./...

Benchmarks

29 benchmarks in internal/metal/bench_test.go. Run with:

go test -bench=. -benchtime=2s ./internal/metal/

Key benchmarks:

Benchmark group	What it measures
`BenchmarkMatmul_*`	Matrix multiply at 128² through 4096², plus token projection
`BenchmarkSoftmax_*`	Softmax at 1K through 128K vocab
`BenchmarkElementWise_*`	Add, Mul, SiLU at 1M elements
`BenchmarkRMSNorm_*`	Fused RMSNorm at decode and prefill shapes
`BenchmarkRoPE_*`	RoPE at single-token and 512-token shapes
`BenchmarkSDPA_*`	Scaled dot-product attention at 1, 32, 512 sequence lengths
`BenchmarkLinear_*`	Linear layer forward at decode and prefill shapes
`BenchmarkSampler_*`	Greedy, TopK, TopP, and full chain on 32K vocab

Model-level benchmarks (model.Forward, tokenizer) require model files on disk and are not included in the automated suite.

Code Structure

Adding a New Operation

Add the C binding to the appropriate file in internal/metal/:
- ops.go — element-wise, reduction, matrix, shape operations
- fast.go — fused Metal kernel wrappers
- slice.go — slicing and scatter operations
Follow the newArray("OP_NAME", inputs...) pattern for tracking
Add tests in the corresponding _test.go file using _Good/_Bad suffixes
Add a benchmark in bench_test.go for any operation on the hot path

Adding a New Model Architecture

Read config.json model_type and add a case in model.go:loadModel
Create architecture.go in internal/metal/ implementing InternalModel
Add ApplyLoRA to the new model
Add a close* helper in close.go for deterministic resource cleanup
Add formatXyzChat in generate.go for the chat template
Add tokeniser BOS/EOS detection in tokenizer.go:LoadTokenizer
Write tests: config parsing, missing weights, end-to-end inference

Coding Standards

Language

UK English throughout: colour, organisation, centre, initialise, behaviour. Never American spellings.

Go Style

declare(strict_types=1) equivalent: all parameters and return types must be explicitly typed
PSR-12 equivalent: gofmt + goimports; run before committing
go test ./... must pass before every commit; no red tests in main

Licence Header

Every new source file must carry the EUPL-1.2 licence identifier:

// SPDX-Licence-Identifier: EUPL-1.2

Conventional Commits

Format: type(scope): description

Types:

feat — new capability
fix — bug fix
test — test additions or changes
bench — benchmark additions or changes
refactor — code restructuring without behaviour change
docs — documentation only
chore — maintenance (gitignore, go.mod, CMake)

Scopes: metal, api, mlxlm, cpp, docs

Examples:

feat(metal): add TopP nucleus sampling
fix(metal): auto-contiguous data access for non-contiguous arrays
test(metal): add model loading robustness tests
bench(metal): add 29 benchmarks baselined on M3 Ultra

Co-Author

All commits must include:

Co-Authored-By: Virgil <virgil@lethean.io>

Build Tags

All CGO files: //go:build darwin && arm64
Stub file: //go:build !darwin || !arm64
mlxlm opt-out: //go:build !nomlxlm

CMake Configuration

CMakeLists.txt at the module root. Key settings:

set(MLX_BUILD_SAFETENSORS ON)   # Required for model loading
set(MLX_BUILD_GGUF OFF)         # GGUF not supported
set(BUILD_SHARED_LIBS ON)       # Shared .dylib for rpath loading
set(CMAKE_OSX_DEPLOYMENT_TARGET 13.3)  # MLX minimum

To force a clean rebuild:

rm -rf build dist
go generate ./...

mlxlm Backend Development

The mlxlm/ package has no CGO dependency and tests run on any platform where Python 3 is available. Tests use testdata/mock_bridge.py instead of the real bridge.py, so no mlx-lm installation is required.

Run mlxlm tests:

go test ./mlxlm/

The mock bridge responds to all commands with fixed fake data, enabling full subprocess protocol testing without GPU or Python ML dependencies.

To opt out of building the mlxlm backend:

go build -tags nomlxlm ./...

Dependency Graph

go-mlx
├── forge.lthn.ai/core/go-inference  (shared interfaces, zero dependencies)
└── mlx-c v0.4.1                     (CMake, fetched from GitHub at generate time)
    └── Apple MLX (Metal GPU compute)
        └── Foundation, Metal, Accelerate frameworks

The root package and mlxlm/ have no CGO dependency. Only internal/metal/ links against mlx-c.

8 KiB Raw Blame History