Model Support

Supported Architectures

| Model | File | Parameters | Notes |
| --- | --- | --- | --- |
| Gemma3 | `model/gemma3.go` | 1B, 4B, 27B | Google's open model family |
| Qwen3 | `model/qwen3.go` | 8B+ | Alibaba's open model family |

Model Interface

All models implement:

type Model interface {
    Forward(x *mlx.Array, cache *cache.KVCache) *mlx.Array
}

`model.LoadModel(path)` reads the config file in the model directory, detects the architecture, and returns the matching `Model` implementation.
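One way to picture the detection step is as a lookup on a single field of the config file. The sketch below is illustrative only: it assumes a Hugging Face-style JSON config with a `model_type` field, and the names `modelConfig` and `detectArchitecture` are hypothetical rather than taken from `model/model.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// modelConfig holds the single field this sketch inspects.
// Assumption: the config is JSON and names the architecture in "model_type".
type modelConfig struct {
	ModelType string `json:"model_type"`
}

// detectArchitecture maps raw config bytes to one of the supported families.
// Hypothetical helper; the real detection logic may use other fields.
func detectArchitecture(raw []byte) (string, error) {
	var cfg modelConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", fmt.Errorf("parse config: %w", err)
	}
	mt := strings.ToLower(cfg.ModelType)
	switch {
	case strings.HasPrefix(mt, "gemma3"):
		return "gemma3", nil
	case strings.HasPrefix(mt, "qwen3"):
		return "qwen3", nil
	default:
		return "", fmt.Errorf("unsupported model_type %q", cfg.ModelType)
	}
}

func main() {
	arch, err := detectArchitecture([]byte(`{"model_type": "gemma3_text"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(arch)
}
```

Returning an error for unknown values, rather than falling back to a default architecture, keeps a mismatched checkpoint from being loaded with the wrong layer layout.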

Generation

m, err := model.LoadModel("/path/to/model/")
if err != nil {
    log.Fatal(err)
}

// Token-by-token generation with sampling
tokens := m.Generate(promptTokens, model.GenerateOptions{
    MaxTokens:   512,
    Temperature: 0.7,
    TopK:        40,
})
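The `Temperature` and `TopK` options correspond to standard top-k sampling. As a rough sketch of that technique on a plain slice of logits (it makes no claim about how `Generate` is implemented, and `sampleTopK` is a hypothetical helper):

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// sampleTopK picks a token index from logits using temperature and top-k.
// Hypothetical helper: logits must be non-empty, temperature > 0, k >= 1.
// uniform must return a value in [0, 1), for example rand.Float64.
func sampleTopK(logits []float64, temperature float64, k int, uniform func() float64) int {
	if k > len(logits) {
		k = len(logits)
	}
	// Sort token indices by logit, highest first, and keep the top k.
	idx := make([]int, len(logits))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return logits[idx[a]] > logits[idx[b]] })
	top := idx[:k]

	// Unnormalized softmax over the k survivors, scaled by temperature.
	// Subtracting the max logit keeps exp() numerically stable.
	maxLogit := logits[top[0]]
	weights := make([]float64, k)
	var sum float64
	for i, t := range top {
		weights[i] = math.Exp((logits[t] - maxLogit) / temperature)
		sum += weights[i]
	}

	// Draw from the resulting categorical distribution.
	r := uniform() * sum
	for i, w := range weights {
		r -= w
		if r < 0 {
			return top[i]
		}
	}
	return top[k-1] // guard against floating-point rounding
}

func main() {
	logits := []float64{2.0, 0.5, -1.0, 3.0, 0.1}
	// With k = 2 only the two highest logits (indices 3 and 0) can be chosen.
	fmt.Println(sampleTopK(logits, 0.7, 2, rand.Float64))
}
```

Lower temperatures sharpen the distribution toward the highest logit, while a smaller `k` removes unlikely tokens from consideration entirely.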

Adding New Models

1. Create `model/{name}.go` with the build constraint `//go:build darwin && arm64`.
2. Implement the `Model` interface.
3. Register architecture detection in `model/model.go`.
4. Provide the model weights in safetensors format.
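For the safetensors requirement, it can help to check a checkpoint's header before wiring up a new architecture. A safetensors file starts with an 8-byte little-endian header length, followed by a JSON header that describes each tensor. The sketch below parses that header from a byte slice; `tensorInfo`, `parseHeader`, and `withLengthPrefix` are hypothetical names and are not part of this repository.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
)

// tensorInfo mirrors one tensor entry of the safetensors JSON header.
type tensorInfo struct {
	DType       string   `json:"dtype"`
	Shape       []int64  `json:"shape"`
	DataOffsets [2]int64 `json:"data_offsets"`
}

// parseHeader reads the 8-byte little-endian header length, then decodes
// the JSON header that follows. The optional "__metadata__" entry is skipped.
func parseHeader(data []byte) (map[string]tensorInfo, error) {
	if len(data) < 8 {
		return nil, errors.New("file too short for length prefix")
	}
	n := binary.LittleEndian.Uint64(data[:8])
	if n > uint64(len(data)-8) {
		return nil, fmt.Errorf("header length %d exceeds file size", n)
	}
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data[8:8+n], &raw); err != nil {
		return nil, fmt.Errorf("parse header JSON: %w", err)
	}
	tensors := make(map[string]tensorInfo, len(raw))
	for name, msg := range raw {
		if name == "__metadata__" {
			continue
		}
		var ti tensorInfo
		if err := json.Unmarshal(msg, &ti); err != nil {
			return nil, fmt.Errorf("tensor %q: %w", name, err)
		}
		tensors[name] = ti
	}
	return tensors, nil
}

// withLengthPrefix builds the start of a safetensors file for demonstration.
func withLengthPrefix(header string) []byte {
	out := make([]byte, 8, 8+len(header))
	binary.LittleEndian.PutUint64(out, uint64(len(header)))
	return append(out, header...)
}

func main() {
	file := withLengthPrefix(`{"embed.weight":{"dtype":"F16","shape":[2,2],"data_offsets":[0,8]}}`)
	file = append(file, make([]byte, 8)...) // placeholder tensor bytes
	tensors, err := parseHeader(file)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, t := range tensors {
		fmt.Println(name, t.DType, t.Shape)
	}
}
```

For a real checkpoint, reading only the length prefix and header from disk avoids loading the full weight file just to list tensor names and shapes.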

Tokenizer

The BPE tokenizer in `tokenizer/tokenizer.go` reads SentencePiece-format vocabulary files:

tok, err := tokenizer.Load("/path/to/tokenizer.model")
if err != nil {
    log.Fatal(err)
}
ids := tok.Encode("Hello world")
text := tok.Decode(ids)
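To illustrate what BPE encoding does conceptually, here is a toy merge loop. It assumes a vocabulary that maps each piece to a score and repeatedly merges the adjacent pair whose concatenation scores highest. `bpeEncode` is a hypothetical helper that returns string pieces rather than IDs; it is not the logic in `tokenizer/tokenizer.go`, which may differ (for example in whitespace handling or byte fallback).

```go
package main

import "fmt"

// bpeEncode splits text into pieces by repeatedly merging the adjacent pair
// whose concatenation has the highest score in vocab. Toy sketch only.
func bpeEncode(text string, vocab map[string]float64) []string {
	// Start from individual Unicode characters.
	pieces := make([]string, 0, len(text))
	for _, r := range text {
		pieces = append(pieces, string(r))
	}
	for {
		// Find the best-scoring mergeable pair; ties go to the leftmost.
		best, bestScore := -1, 0.0
		for i := 0; i+1 < len(pieces); i++ {
			score, ok := vocab[pieces[i]+pieces[i+1]]
			if ok && (best == -1 || score > bestScore) {
				best, bestScore = i, score
			}
		}
		if best == -1 {
			return pieces // no mergeable pair remains
		}
		// Replace the pair at best with its merged piece.
		merged := pieces[best] + pieces[best+1]
		pieces = append(pieces[:best+1], pieces[best+2:]...)
		pieces[best] = merged
	}
}

func main() {
	// Hypothetical vocabulary: a higher score means the merge is preferred.
	vocab := map[string]float64{"lo": -1, "low": -2, "er": -3}
	fmt.Println(bpeEncode("lower", vocab))
}
```

In this example the merges proceed as `l o w e r`, then `lo w e r`, then `low e r`, and finally `low er`, at which point no adjacent pair is in the vocabulary.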