Snider cae7ef05e8 feat: extract go-mlx from go-ai as standalone Metal inference package
Split mlx/ directory from forge.lthn.ai/core/go-ai into its own module.
Rewrites import paths, adds CLAUDE.md/TODO.md/FINDINGS.md for dedicated
Claude sessions. Zero external Go deps — pure CGO + mlx-c v0.4.1.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 17:57:37 +00:00

FINDINGS.md — go-mlx Research & Discovery

Record findings, gaps, and architectural decisions here as work progresses.


2026-02-19: Split from go-ai (Virgil)

Origin

This package was extracted from forge.lthn.ai/core/go-ai/mlx/. The split was motivated by:

  1. Platform isolation: mlx is darwin/arm64 only and needs a CGO + CMake build. Keeping it in go-ai forces the entire AI package to deal with platform-specific build complexity.
  2. Dependency chain: go-i18n Phase 2a needs MLX inference for Gemma3-1B domain classification. A standalone go-mlx module can be imported directly without pulling in all of go-ai (DuckDB, Parquet, gRPC, Ollama, etc.).
  3. Build tag simplicity: every file carries //go:build darwin && arm64. In a standalone module this is clean. Inside go-ai it was a special case that required careful handling.

What Was Extracted

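| Directory   | Files | LOC    | Purpose                             |
|-------------|-------|--------|-------------------------------------|
| Root (mlx/) | 16    | ~2,500 | Core MLX bindings, ops, training    |
| model/      | 3     | ~800   | Gemma3, Qwen3 model implementations |
| tokenizer/  | 1     | ~324   | BPE tokenizer                       |
| sample/     | 1     | ~150   | Sampling strategies                 |
| cache/      | 1     | ~201   | KV cache for inference              |
| Total       | 22    | ~4,354 |                                     |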

Import Path Changes

All internal imports rewritten:

  • forge.lthn.ai/core/go-ai/mlx → forge.lthn.ai/core/go-mlx
  • forge.lthn.ai/core/go-ai/mlx/cache → forge.lthn.ai/core/go-mlx/cache
  • forge.lthn.ai/core/go-ai/mlx/tokenizer → forge.lthn.ai/core/go-mlx/tokenizer
  • forge.lthn.ai/core/go-ai/mlx/model → forge.lthn.ai/core/go-mlx/model
  • forge.lthn.ai/core/go-ai/mlx/sample → forge.lthn.ai/core/go-mlx/sample
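
The mapping above amounts to a single prefix rewrite. The following is only an illustrative sketch of that rule, not the tool that was actually used for the split; `rewriteImport` and the two constants are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldPrefix = "forge.lthn.ai/core/go-ai/mlx" // former location inside go-ai
	newPrefix = "forge.lthn.ai/core/go-mlx"    // standalone module path
)

// rewriteImport maps an import path from the old mlx tree to the new module.
// Paths outside the old tree are returned unchanged. Matching on
// oldPrefix+"/" avoids touching unrelated paths that merely share the prefix.
func rewriteImport(path string) string {
	if path == oldPrefix {
		return newPrefix
	}
	if strings.HasPrefix(path, oldPrefix+"/") {
		return newPrefix + path[len(oldPrefix):]
	}
	return path
}

func main() {
	for _, p := range []string{
		"forge.lthn.ai/core/go-ai/mlx",
		"forge.lthn.ai/core/go-ai/mlx/cache",
		"forge.lthn.ai/core/go-ai/ml", // not part of the mlx tree
	} {
		fmt.Printf("%s => %s\n", p, rewriteImport(p))
	}
}
```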

Upstream Consumer

go-ai/ml/backend_mlx.go is the only file outside mlx/ that imports it. After the split, go-ai needs either:

  • A replace directive: replace forge.lthn.ai/core/go-mlx => ../go-mlx
  • Or a published module version

What Stayed in go-ai

  • ml/backend_mlx.go (253 LOC) — the Backend adapter that calls go-mlx. This stays in go-ai because it implements the go-ai-specific Backend interface.
  • test-mlx.go — integration test utility (go-ai root). Needs updating to import from go-mlx.
  • TEST-RESULTS.md — comprehensive test report (stays as historical record).

2026-02-19: Test Coverage Assessment

Tested (3 test files)

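| File          | Tests                    | Coverage                           |
|---------------|--------------------------|------------------------------------|
| grad_test.go  | VJP/gradient computation | Good: tests forward+backward pass  |
| lora_test.go  | LoRA adapter             | Good: tests apply/merge/save       |
| optim_test.go | AdamW optimiser          | Good: tests step/state             |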

Not Tested (critical gaps)

| File       | LOC  | Risk   | Notes                                               |
|------------|------|--------|-----------------------------------------------------|
| ops.go     | 353  | High   | MatMul, Softmax, element-wise ops; core of everything |
| array.go   | 261  | High   | Array creation, reshape, data access; foundational  |
| nn.go      | ~150 | Medium | Linear, Embedding, RMSNorm layers                   |
| fast.go    | ~100 | Medium | Fused Metal kernels (RoPE, ScaledDotProduct)        |
| model/*.go | ~800 | High   | No tests for Gemma3/Qwen3 forward pass              |
| tokenizer/ | 324  | High   | No BPE encode/decode tests                          |
| sample/    | ~150 | Medium | No sampling tests                                   |
| cache/     | 201  | Medium | No KV cache tests                                   |
| io.go      | ~100 | Medium | No safetensors load tests                           |
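
One possible way to start closing the ops.go gap is to compare GPU results against a plain-Go reference. The sketch below shows such a reference for Softmax together with a tolerance comparison; `refSoftmax` and `almostEqual` are hypothetical helpers, and the call to the package's own Softmax is deliberately left out because its exact signature is not recorded here.

```go
package main

import (
	"fmt"
	"math"
)

// refSoftmax is a pure-Go reference implementation. Subtracting the maximum
// before exponentiating keeps large inputs from overflowing.
func refSoftmax(x []float64) []float64 {
	out := make([]float64, len(x))
	if len(x) == 0 {
		return out
	}
	maxVal := x[0]
	for _, v := range x[1:] {
		if v > maxVal {
			maxVal = v
		}
	}
	var sum float64
	for i, v := range x {
		out[i] = math.Exp(v - maxVal)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// almostEqual reports whether a and b have the same length and agree
// element-wise within tol. GPU float32 results need a looser tolerance
// than this float64 reference does.
func almostEqual(a, b []float64, tol float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if math.Abs(a[i]-b[i]) > tol {
			return false
		}
	}
	return true
}

func main() {
	cases := []struct {
		name     string
		in, want []float64
	}{
		{"uniform", []float64{1, 1, 1, 1}, []float64{0.25, 0.25, 0.25, 0.25}},
		{"large values", []float64{1000, 1000}, []float64{0.5, 0.5}},
	}
	for _, c := range cases {
		// A real test would obtain `got` from the package's Softmax instead.
		got := refSoftmax(c.in)
		fmt.Println(c.name, almostEqual(got, c.want, 1e-6))
	}
}
```

The same table-driven shape would carry over to MatMul and the element-wise ops, each with its own small reference function.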

Error Handling

The error handler in mlx.go stores the last error in a C static variable and logs it via slog.Error, but the error is never propagated to Go callers. Functions such as MatMul and Softmax return *Array with no error value, so if the C operation fails, the caller gets a nil or invalid array with no indication of why.
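
One way this could be addressed is to have each op check the stored error and return it. The sketch below models that in pure Go, with a mutex-guarded variable standing in for the C static; `recordError`, `takeError`, `matMul` and the placeholder `Array` type are all hypothetical and do not reflect the current go-mlx API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// lastErr stands in for the C static variable written by the error handler.
var (
	mu      sync.Mutex
	lastErr error
)

// recordError is what the handler would do in addition to logging.
func recordError(msg string) {
	mu.Lock()
	defer mu.Unlock()
	lastErr = errors.New(msg)
}

// takeError returns the pending error, if any, and clears it so that a stale
// failure is not attributed to a later operation.
func takeError() error {
	mu.Lock()
	defer mu.Unlock()
	err := lastErr
	lastErr = nil
	return err
}

// Array is a placeholder for the real *Array type.
type Array struct{ rows, cols int }

// matMul mimics an op whose C call can fail. The shape check simulates the
// failure that mlx-c would report through the handler.
func matMul(a, b *Array) (*Array, error) {
	if a.cols != b.rows {
		recordError(fmt.Sprintf("matmul: shape mismatch %dx%d * %dx%d",
			a.rows, a.cols, b.rows, b.cols))
	}
	if err := takeError(); err != nil {
		return nil, err
	}
	return &Array{rows: a.rows, cols: b.cols}, nil
}

func main() {
	_, err := matMul(&Array{rows: 2, cols: 3}, &Array{rows: 4, cols: 5})
	fmt.Println("error:", err)
}
```

A single process-wide slot is the simplest model; with lazy evaluation or concurrent callers, the point at which the error is collected would need more thought.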

Memory Management

Arrays use runtime.SetFinalizer for C-side deallocation. Under sustained inference (1000+ tokens), freeing therefore depends on GC pressure to trigger finalizers, and the Go GC only accounts for the small Go-side wrapper, not the C-side buffer it owns. No explicit Close() or Free() method exists on Array, so memory could leak under high throughput if the GC doesn't keep up.
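
A common pattern for this situation is an explicit, idempotent Free() with the finalizer kept only as a safety net. The sketch below illustrates it in pure Go: the `freed` counter stands in for the C-side free call, and `newArray`, `release`, `Free` and the `handle` field are hypothetical names rather than existing go-mlx API.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// freed counts releases; it stands in for the C-side free function.
var freed atomic.Int64

// Array is a placeholder for the real type, which wraps a C handle.
type Array struct {
	once   sync.Once
	handle uintptr // hypothetical stand-in for the mlx-c array handle
}

func newArray(handle uintptr) *Array {
	a := &Array{handle: handle}
	// Safety net only: runs if the caller never calls Free.
	runtime.SetFinalizer(a, (*Array).release)
	return a
}

// release frees the underlying handle exactly once.
func (a *Array) release() {
	a.once.Do(func() {
		freed.Add(1) // real code would call the C free function here
		a.handle = 0
	})
}

// Free releases the handle immediately and clears the finalizer.
// Calling it more than once is harmless.
func (a *Array) Free() {
	a.release()
	runtime.SetFinalizer(a, nil)
}

func main() {
	// Simulate a generation loop: each temporary is released as soon as it
	// is no longer needed instead of waiting for the GC.
	for i := 1; i <= 1000; i++ {
		a := newArray(uintptr(i))
		a.Free()
	}
	fmt.Println("freed:", freed.Load())
}
```

With this shape, hot paths such as the per-token decode loop can release temporaries deterministically, while code that forgets to call Free still falls back to the finalizer.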


2026-02-19: Dependency Chain

go-i18n (Phase 2a: domain classification)
    └── needs Gemma3-1B inference
        └── go-mlx (this package)
            └── mlx-c v0.4.1 (CMake, fetched from GitHub)
                └── Apple MLX (Metal GPU compute)

go-ai/ml/backend_mlx.go
    └── imports go-mlx
        └── implements go-ai Backend interface

LEM Lab Connection

LEM Lab (the native MLX chat UI at localhost:8090) also uses this code path. Currently working with Qwen3-8B streaming. The model/ directory supports both Gemma3 and Qwen3.


2026-02-19: Hardware Test Results (from go-ai TEST-RESULTS.md)

Tested on Mac Studio M3 Ultra (32-core CPU, 60-core GPU, 96GB unified memory):

  • All 84 go-ai tests pass (including 3 mlx tests)
  • MLX grad, lora, optim tests all pass
  • Go 1.25.7, mlx-c v0.4.1

Model Inventory (safetensors)

Available on /Volumes/Data/lem/safetensors/:

  • Gemma3-1B, Gemma3-4B, Gemma3-27B
  • Qwen3-8B (used by LEM Lab)