Architecture
Virgil edited this page 2026-02-19 17:58:32 +00:00
Layer Diagram
    Go Application
        |
        v
    model/ (Gemma3, Qwen3)        <-- High-level model interface
        |
        +-- tokenizer/            <-- BPE encode/decode
        +-- sample/               <-- Temperature, top-k, top-p
        +-- cache/                <-- KV cache management
        |
        v
    mlx (root package)            <-- Core ops + Array type
        |
        +-- ops.go                <-- MatMul, Softmax, Add, etc.
        +-- fast.go               <-- Fused Metal kernels (RMSNorm, RoPE, SDPA)
        +-- nn.go                 <-- Linear, Embedding, RMSNorm layers
        +-- grad.go               <-- VJP gradient computation
        +-- lora.go               <-- LoRA adapter
        +-- optim.go              <-- AdamW optimiser
        |
        v
    CGO Bridge (mlx.go)           <-- #cgo directives, C function calls
        |
        v
    mlx-c v0.4.1                  <-- C API for MLX (fetched by CMake)
        |
        v
    Apple MLX                     <-- Metal GPU compute shaders
        |
        v
    Metal / Accelerate            <-- Apple Silicon GPU + CPU frameworks
Array Type
*mlx.Array is the fundamental data type. It wraps a C mlx_array handle and supports:
- Creation from Go slices (NewArray)
- GPU materialisation (Materialize, MaterializeAsync)
- Element-wise and matrix operations (Add, Multiply, MatMul)
- Shape manipulation (Reshape, Transpose)
- Data type casting (AsType)
Arrays are lazily evaluated: operations build a computation graph that executes only when Materialize() is called.
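
To make the lazy-evaluation idea concrete, the following is a toy sketch in plain Go. It does not use the mlx package: the `node` type and the `newArray`, `add`, and `materialize` names are hypothetical stand-ins chosen to mirror the description above, and both operands are assumed to have equal length.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a lazily evaluated array.
// It is not *mlx.Array; it only illustrates deferred execution.
type node struct {
	eval  func() []float32 // recorded computation, run on demand
	value []float32        // cached result once materialised
	done  bool
}

// newArray wraps concrete data, so nothing is pending.
func newArray(data []float32) *node {
	return &node{value: data, done: true}
}

// add records an element-wise addition without running it.
// Assumes a and b hold slices of equal length.
func add(a, b *node) *node {
	return &node{eval: func() []float32 {
		x, y := a.materialize(), b.materialize()
		out := make([]float32, len(x))
		for i := range x {
			out[i] = x[i] + y[i]
		}
		return out
	}}
}

// materialize walks the recorded graph and caches the result.
func (n *node) materialize() []float32 {
	if !n.done {
		n.value = n.eval()
		n.done = true
	}
	return n.value
}

func main() {
	a := newArray([]float32{1, 2, 3})
	b := newArray([]float32{10, 20, 30})
	c := add(add(a, b), a) // builds a two-node graph, computes nothing yet
	fmt.Println("evaluated before materialize:", c.done)
	fmt.Println("result:", c.materialize())
	fmt.Println("evaluated after materialize:", c.done)
}
```

In the real library the graph is built and executed on the MLX side; the sketch only shows why intermediate results do not exist until materialisation is requested.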
Memory Model
- Arrays use runtime.SetFinalizer for C-side deallocation
- No explicit Close() method; cleanup relies on the Go GC
- Under sustained inference, GC pressure triggers cleanup
- Materialize() forces synchronous GPU evaluation
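
The finalizer-based pattern in the list above can be sketched in plain Go as follows. This is an illustration under stated assumptions, not the package's code: the `array` type, the `uintptr` handle, and the `release` function are hypothetical, and the counter stands in for the C-side free call.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// freed counts simulated C-side releases. In a real binding this would be
// a call into C instead of a counter (assumption for illustration).
var freed int64

// array is a hypothetical wrapper; handle stands in for a C array handle.
type array struct {
	handle uintptr
}

// newArray attaches a finalizer so the handle is released when the
// Go wrapper becomes unreachable and the collector notices it.
func newArray(handle uintptr) *array {
	a := &array{handle: handle}
	runtime.SetFinalizer(a, release)
	return a
}

// release frees the simulated handle once and reports whether it did so.
// SetFinalizer ignores the return value.
func release(a *array) bool {
	if a.handle == 0 {
		return false
	}
	a.handle = 0
	atomic.AddInt64(&freed, 1)
	return true
}

func main() {
	for i := 0; i < 1000; i++ {
		_ = newArray(uintptr(i + 1)) // wrappers become garbage immediately
	}
	runtime.GC() // finalizers are queued by the collector, then run on their own goroutine
	time.Sleep(50 * time.Millisecond)
	// The count is not guaranteed: finalizers run at the runtime's discretion.
	fmt.Println("released so far:", atomic.LoadInt64(&freed))
}
```

One consequence of this design is that the Go collector only sees the small wrapper, not the memory behind the handle, so the timing of C-side frees follows Go heap activity rather than actual GPU or C memory use.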