# Development Guide

Module: `forge.lthn.ai/core/go-mlx`

---
## Prerequisites

### Platform

**macOS on Apple Silicon only.** All CGO source files carry `//go:build darwin && arm64`, so native Metal inference builds only on that platform. Elsewhere, a stub (`mlx_stub.go`) provides `MetalAvailable() bool`, which returns false.

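
To make the constraint concrete, the sketch below mirrors `darwin && arm64` as a plain runtime check. It is illustrative only: `nativeBackendSupported` is a hypothetical helper, not part of go-mlx, whose own answer comes from `MetalAvailable()`.

```go
package main

import (
	"fmt"
	"runtime"
)

// nativeBackendSupported is a hypothetical helper that mirrors the
// `darwin && arm64` build constraint as a plain boolean check.
func nativeBackendSupported(goos, goarch string) bool {
	return goos == "darwin" && goarch == "arm64"
}

func main() {
	ok := nativeBackendSupported(runtime.GOOS, runtime.GOARCH)
	fmt.Printf("%s/%s: native Metal backend compiled in: %v\n",
		runtime.GOOS, runtime.GOARCH, ok)
}
```

Real code would normally rely on the build tags themselves; a runtime check like this is only useful for diagnostics.
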
### Required Tools

| Tool | Version | Purpose |
|------|---------|---------|
| Go | 1.25.5+ | Module toolchain |
| CMake | 3.24+ | Builds mlx-c from source |
| AppleClang | 17.0+ | C/C++ compiler for mlx-c |
| macOS SDK | 26.2+ | Metal framework headers |
| Xcode Command Line Tools | Current | Provides `xcrun`, frameworks |

Install CMake if absent:

```bash
brew install cmake
```
### Go Workspace

go-mlx participates in a Go workspace alongside go-inference. The `go.mod` uses a `replace` directive for local development:

```
replace forge.lthn.ai/core/go-inference => ../go-inference
```

After adding modules or changing dependencies, run `go work sync`.

---
## Build

### Step 1: Build mlx-c

Run from the module root:

```bash
go generate ./...
```

This executes the `//go:generate` directives in `mlx.go`:

```
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=dist -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
cmake --install build
```

CMake fetches mlx-c v0.4.1 from GitHub and builds it with:
- `MLX_BUILD_SAFETENSORS=ON` (model loading)
- `MLX_BUILD_GGUF=OFF`
- `BUILD_SHARED_LIBS=ON`
- macOS deployment target: 13.3 (minimum required by MLX)

The built library installs to `dist/include/` and `dist/lib/`. Build time is approximately 2 minutes on an M3 Ultra.

The `dist/` directory is gitignored and must be rebuilt on each fresh checkout.

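
Because a missing `dist/` otherwise surfaces only as a compiler or linker error, a quick pre-flight check can be handy. The following is a minimal sketch, assuming it is run from the module root; `missingDistDirs` is a hypothetical helper, not part of go-mlx.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// missingDistDirs returns the mlx-c install directories (relative to root)
// that do not exist yet. Hypothetical helper, for illustration only.
func missingDistDirs(root string) []string {
	var missing []string
	for _, rel := range []string{"dist/include", "dist/lib"} {
		info, err := os.Stat(filepath.Join(root, filepath.FromSlash(rel)))
		if err != nil || !info.IsDir() {
			missing = append(missing, rel)
		}
	}
	return missing
}

func main() {
	if missing := missingDistDirs("."); len(missing) > 0 {
		fmt.Println("mlx-c is not built; run `go generate ./...` first. Missing:", missing)
		return
	}
	fmt.Println("mlx-c build found under dist/")
}
```
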
### Step 2: Run Tests

```bash
go test ./...
```

Tests require a working mlx-c build. Integration tests that load model files are skipped automatically when model paths are absent (`/Volumes/Data/lem/safetensors/...`).

---
## CGO Flags

The `#cgo` directives in `internal/metal/metal.go` set all required flags automatically when building on darwin/arm64:

```c
#cgo CXXFLAGS: -std=c++17
#cgo CFLAGS: -mmacosx-version-min=13.3
#cgo CPPFLAGS: -I${SRCDIR}/../../dist/include
#cgo LDFLAGS: -L${SRCDIR}/../../dist/lib -lmlxc -lmlx
#cgo darwin LDFLAGS: -framework Foundation -framework Metal -framework Accelerate
#cgo darwin LDFLAGS: -Wl,-rpath,${SRCDIR}/../../dist/lib
```

`${SRCDIR}` is the directory containing `metal.go` at build time (`internal/metal/`), so the `../../dist/` path resolves to the module root `dist/`.

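
The path arithmetic can be checked with a short sketch. `resolveFromSrcDir` and the `/repo` checkout location are hypothetical; the snippet only imitates how a `${SRCDIR}`-relative path collapses and says nothing about how cgo itself is implemented.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveFromSrcDir joins a relative path onto srcDir and normalises the
// result, imitating how a ${SRCDIR}-relative cgo path resolves.
func resolveFromSrcDir(srcDir, rel string) string {
	return filepath.Join(srcDir, rel) // Join also cleans ".." segments
}

func main() {
	srcDir := "/repo/internal/metal" // hypothetical checkout location
	fmt.Println(resolveFromSrcDir(srcDir, "../../dist/include"))
	fmt.Println(resolveFromSrcDir(srcDir, "../../dist/lib"))
}
```
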
No manual environment variables are needed for `go build` or `go test`.

---
## Test Patterns

Tests use the `_Good`, `_Bad`, `_Ugly` suffix convention:

| Suffix | Meaning |
|--------|---------|
| `_Good` | Happy path; expected to succeed |
| `_Bad` | Expected error conditions |
| `_Ugly` | Panic / edge cases |

Example:

```go
func TestMatmul_Good(t *testing.T) { ... }
func TestMatmul_Bad(t *testing.T) { ... }
```
Tests that require model files on disk use `t.Skip()` when the path is absent:

```go
const modelPath = "/Volumes/Data/lem/safetensors/gemma-3/"
if _, err := os.Stat(modelPath); err != nil {
	t.Skip("model not available:", modelPath)
}
```
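
The same check can be factored into a helper. This is a minimal sketch, assuming a hypothetical `modelAvailable` function that is not part of go-mlx; inside a test, a false result would gate the `t.Skip` call shown above.

```go
package main

import (
	"fmt"
	"os"
)

// modelAvailable reports whether a model directory or file exists on disk.
// Hypothetical helper; a test would call t.Skip when it returns false.
func modelAvailable(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	const modelPath = "/Volumes/Data/lem/safetensors/gemma-3/"
	if !modelAvailable(modelPath) {
		fmt.Println("model not available:", modelPath)
		return
	}
	fmt.Println("model found:", modelPath)
}
```
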
All 180+ tests in `internal/metal/` are unit or integration tests that exercise the CGO layer directly. The 11 tests in the root package (`mlx_test.go`) exercise the public API via go-inference.

### Running a Single Test

```bash
go test -run TestRMSNorm_Good ./internal/metal/
```

### Running with Race Detector

```bash
go test -race ./...
```

---
## Benchmarks

There are 29 benchmarks in `internal/metal/bench_test.go`. Run them with:

```bash
go test -bench=. -benchtime=2s ./internal/metal/
```
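
For orientation, a benchmark in this style can be sketched without Metal. The example below uses a pure-Go softmax as a stand-in for the real kernels and drives it through `testing.Benchmark`, so it runs outside `go test`. The name `benchmarkSoftmax` and the 1,024-element input are assumptions, not the suite's actual code.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// softmax is a pure-Go stand-in for the Metal kernel (illustration only).
func softmax(logits []float64) []float64 {
	maxv := math.Inf(-1)
	for _, v := range logits {
		maxv = math.Max(maxv, v)
	}
	out := make([]float64, len(logits))
	var sum float64
	for i, v := range logits {
		out[i] = math.Exp(v - maxv) // subtract the max for numerical stability
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// benchmarkSoftmax follows the usual b.N loop; in a _test.go file it would
// be exported, e.g. as BenchmarkSoftmax_1K (hypothetical name).
func benchmarkSoftmax(b *testing.B) {
	logits := make([]float64, 1024)
	for i := range logits {
		logits[i] = float64(i % 17)
	}
	b.ResetTimer() // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		softmax(logits)
	}
}

func main() {
	// Prints the iteration count and ns/op for the stand-in kernel.
	fmt.Println(testing.Benchmark(benchmarkSoftmax))
}
```
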
Key benchmarks:

| Benchmark group | What it measures |
|----------------|-----------------|
| `BenchmarkMatmul_*` | Matrix multiply at 128² through 4096², plus token projection |
| `BenchmarkSoftmax_*` | Softmax at 1K through 128K vocab |
| `BenchmarkElementWise_*` | Add, Mul, SiLU at 1M elements |
| `BenchmarkRMSNorm_*` | Fused RMSNorm at decode and prefill shapes |
| `BenchmarkRoPE_*` | RoPE at single-token and 512-token shapes |
| `BenchmarkSDPA_*` | Scaled dot-product attention at 1, 32, 512 sequence lengths |
| `BenchmarkLinear_*` | Linear layer forward at decode and prefill shapes |
| `BenchmarkSampler_*` | Greedy, TopK, TopP, and full chain on 32K vocab |

Model-level benchmarks (`model.Forward`, tokenizer) require model files on disk and are not included in the automated suite.

---
## Code Structure

### Adding a New Operation

1. Add the C binding to the appropriate file in `internal/metal/`:
   - `ops.go`: element-wise, reduction, matrix, shape operations
   - `fast.go`: fused Metal kernel wrappers
   - `slice.go`: slicing and scatter operations
2. Follow the `newArray("OP_NAME", inputs...)` pattern for tracking
3. Add tests in the corresponding `_test.go` file using `_Good`/`_Bad` suffixes
4. Add a benchmark in `bench_test.go` for any operation on the hot path

### Adding a New Model Architecture

1. Read `config.json` `model_type` and add a case in `model.go`:`loadModel`
2. Create `architecture.go` in `internal/metal/` implementing `InternalModel`
3. Add `ApplyLoRA` to the new model
4. Add a `close*` helper in `close.go` for deterministic resource cleanup
5. Add `formatXyzChat` in `generate.go` for the chat template
6. Add tokeniser BOS/EOS detection in `tokenizer.go`:`LoadTokenizer`
7. Write tests: config parsing, missing weights, end-to-end inference

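
Step 1 can be sketched as a small dispatch on `model_type`. This is a hedged illustration: `modelTypeFromConfig`, `architectureFor`, and the `gemma3` / `qwen3` identifiers are placeholders, and the real `loadModel` in `model.go` may be organised differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelTypeFromConfig extracts the "model_type" field from config.json bytes.
func modelTypeFromConfig(data []byte) (string, error) {
	var cfg struct {
		ModelType string `json:"model_type"`
	}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return "", fmt.Errorf("parse config.json: %w", err)
	}
	if cfg.ModelType == "" {
		return "", errors.New("config.json: model_type is missing")
	}
	return cfg.ModelType, nil
}

// architectureFor maps a model_type to a loader name. The cases are
// placeholders; a new architecture would add one more case here.
func architectureFor(modelType string) (string, error) {
	switch modelType {
	case "gemma3":
		return "gemma3 loader", nil
	case "qwen3":
		return "qwen3 loader", nil
	default:
		return "", fmt.Errorf("unsupported model_type %q", modelType)
	}
}

func main() {
	modelType, err := modelTypeFromConfig([]byte(`{"model_type": "gemma3"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	loader, err := architectureFor(modelType)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected:", loader)
}
```
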
---

## Coding Standards

### Language

UK English throughout: colour, organisation, centre, initialise, behaviour. Never use American spellings.

### Go Style

- Explicit typing (the Go counterpart to PHP's `declare(strict_types=1)`): all parameters and return types must be explicitly typed
- Formatting (the Go counterpart to PSR-12): `gofmt` + `goimports`; run both before committing
- `go test ./...` must pass before every commit; no red tests in main

### Licence Header

Every new source file must carry the EUPL-1.2 licence identifier:

```go
// SPDX-Licence-Identifier: EUPL-1.2
```
### Conventional Commits

Format: `type(scope): description`

Types:
- `feat`: new capability
- `fix`: bug fix
- `test`: test additions or changes
- `bench`: benchmark additions or changes
- `refactor`: code restructuring without behaviour change
- `docs`: documentation only
- `chore`: maintenance (gitignore, go.mod, CMake)

Scopes: `metal`, `api`, `mlxlm`, `cpp`, `docs`

Examples:
```
feat(metal): add TopP nucleus sampling
fix(metal): auto-contiguous data access for non-contiguous arrays
test(metal): add model loading robustness tests
bench(metal): add 29 benchmarks baselined on M3 Ultra
```
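
The format lends itself to a mechanical check, for example in a commit hook. The following is a minimal sketch, assuming the scope is mandatory and limited to the scopes listed above; `validSubject` is a hypothetical helper, not project tooling.

```go
package main

import (
	"fmt"
	"regexp"
)

// subjectPattern encodes `type(scope): description` with the types and
// scopes from this guide. Treating the scope as required is an assumption.
var subjectPattern = regexp.MustCompile(
	`^(feat|fix|test|bench|refactor|docs|chore)\((metal|api|mlxlm|cpp|docs)\): \S.*$`)

// validSubject reports whether a commit subject line matches the convention.
func validSubject(subject string) bool {
	return subjectPattern.MatchString(subject)
}

func main() {
	for _, s := range []string{
		"feat(metal): add TopP nucleus sampling",
		"fix: missing scope",
		"feature(metal): unknown type",
	} {
		fmt.Printf("%q valid=%v\n", s, validSubject(s))
	}
}
```

Only the first subject in `main` matches; the other two fail on the missing scope and the unknown type respectively.
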
### Co-Author

All commits must include:

```
Co-Authored-By: Virgil <virgil@lethean.io>
```

### Build Tags

- All CGO files: `//go:build darwin && arm64`
- Stub file: `//go:build !darwin || !arm64`
- mlxlm opt-out: `//go:build !nomlxlm`

---
## CMake Configuration

`CMakeLists.txt` lives at the module root. Key settings:

```cmake
set(MLX_BUILD_SAFETENSORS ON)          # Required for model loading
set(MLX_BUILD_GGUF OFF)                # GGUF not supported
set(BUILD_SHARED_LIBS ON)              # Shared .dylib for rpath loading
set(CMAKE_OSX_DEPLOYMENT_TARGET 13.3)  # MLX minimum
```

To force a clean rebuild:

```bash
rm -rf build dist
go generate ./...
```

---
## mlxlm Backend Development

The `mlxlm/` package has no CGO dependency, and its tests run on any platform where Python 3 is available. Tests use `testdata/mock_bridge.py` instead of the real `bridge.py`, so no `mlx-lm` installation is required.

Run mlxlm tests:

```bash
go test ./mlxlm/
```

The mock bridge responds to all commands with fixed fake data, enabling full subprocess protocol testing without GPU or Python ML dependencies.

To opt out of building the mlxlm backend:

```bash
go build -tags nomlxlm ./...
```

---
## Dependency Graph

```
go-mlx
├── forge.lthn.ai/core/go-inference    (shared interfaces, zero dependencies)
└── mlx-c v0.4.1                       (CMake, fetched from GitHub at generate time)
    └── Apple MLX                      (Metal GPU compute)
        └── Foundation, Metal, Accelerate frameworks
```

The root package and `mlxlm/` have no CGO dependency. Only `internal/metal/` links against mlx-c.