go-mlx
Native Apple Metal GPU inference via mlx-c CGO bindings, implementing the inference.Backend and inference.TextModel interfaces from go-inference on Apple Silicon (M1–M4). Supports the Gemma 3, Qwen 2/3, and Llama 3 architectures loaded from HuggingFace safetensors, with fused Metal kernels for RMSNorm, RoPE, and scaled dot-product attention, plus KV cache management, LoRA fine-tuning with AdamW, and batch inference. A Python subprocess backend (mlxlm) is provided as a CGO-free alternative. The native backend is restricted to darwin/arm64; a no-op stub compiles on all other platforms.
Module: forge.lthn.ai/core/go-mlx
Licence: EUPL-1.2
Language: Go 1.25
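Blank-importing the module (as in the Quick Start below) wires the backend into go-inference's registry from an init() function. A minimal, self-contained sketch of that registration pattern — the backends map and register helper here are hypothetical stand-ins for illustration, not go-inference's actual API:

```go
package main

import "fmt"

// Hypothetical registry: go-inference keys backends by name, and
// register_metal.go in go-mlx adds "metal" from its init().
var backends = map[string]string{}

func register(name, desc string) { backends[name] = desc }

func init() {
	// In go-mlx this registration sits behind a darwin/arm64 build
	// constraint; other platforms compile the no-op stub instead.
	register("metal", "Apple Metal GPU backend")
}

func main() {
	fmt.Println(backends["metal"]) // prints "Apple Metal GPU backend"
}
```

Because registration happens in init(), importing the package for side effects (`_ "forge.lthn.ai/core/go-mlx"`) is enough — no explicit setup call is needed.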
Quick Start
import (
    "context"
    "fmt"
    "log"

    "forge.lthn.ai/core/go-inference"
    _ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)

model, err := inference.LoadModel("/Volumes/Data/lem/safetensors/gemma-3-1b/")
if err != nil {
    log.Fatal(err)
}
defer model.Close()

ctx := context.Background()
for tok := range model.Generate(ctx, "Hello", inference.WithMaxTokens(256)) {
    fmt.Print(tok.Text)
}
Documentation
- Architecture — CGO binding, model architectures, weight loading, KV cache, attention, batch inference, LoRA training, mlxlm backend
- Development Guide — prerequisites (mlx-c CMake build), CGO flags, test patterns, benchmarks
- Project History — completed phases, commit hashes, known limitations
Build & Test
go generate ./... # builds mlx-c C library (required first time)
go test ./...
go build ./...
Licence
European Union Public Licence 1.2 — see LICENCE for details.