diff --git a/Models.md b/Models.md
new file mode 100644
index 0000000..e314322
--- /dev/null
+++ b/Models.md
@@ -0,0 +1,53 @@
+# Model Support
+
+## Supported Architectures
+
+| Model | File | Parameters | Notes |
+|-------|------|-----------|-------|
+| Gemma3 | `model/gemma3.go` | 1B, 4B, 27B | Google's open model family |
+| Qwen3 | `model/qwen3.go` | 8B+ | Alibaba's open model family |
+
+## Model Interface
+
+All models implement:
+
+```go
+type Model interface {
+	Forward(x *mlx.Array, cache *cache.KVCache) *mlx.Array
+}
+```
+
+`model.LoadModel(path)` auto-detects the architecture from the config file and returns the appropriate implementation.
+
+## Generation
+
+```go
+m, err := model.LoadModel("/path/to/model/")
+if err != nil {
+	log.Fatal(err)
+}
+
+// Token-by-token generation with sampling
+tokens := m.Generate(promptTokens, model.GenerateOptions{
+	MaxTokens:   512,
+	Temperature: 0.7,
+	TopK:        40,
+})
+```
+
+## Adding New Models
+
+1. Create `model/{name}.go` with `//go:build darwin && arm64`
+2. Implement the `Model` interface
+3. Register architecture detection in `model/model.go`
+4. The model must use safetensors format
+
+## Tokenizer
+
+BPE tokenizer in `tokenizer/tokenizer.go` reads sentencepiece-format vocab files:
+
+```go
+tok, err := tokenizer.Load("/path/to/tokenizer.model")
+ids := tok.Encode("Hello world")
+text := tok.Decode(ids)
+```
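
Step 3 of "Adding New Models" (registering architecture detection in `model/model.go`) is not spelled out in the patch. One way it might work is a registry keyed by a field in the model's config file. The sketch below is an assumption, not the package's actual code: the `model_type` field, the `constructor` type, and the `register`, `parseModelType`, and `lookup` helpers are all hypothetical names, and the constructors return a string where real code would return a `Model`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// constructor is a hypothetical stand-in for a function that builds a model
// from a directory; real code would return a Model implementation instead.
type constructor func(dir string) string

// registry maps the config's model_type value (assumed field) to a constructor.
var registry = map[string]constructor{}

// register adds an architecture; a new model/{name}.go could call this.
func register(modelType string, c constructor) {
	registry[modelType] = c
}

func init() {
	register("gemma3", func(dir string) string { return "gemma3 model from " + dir })
	register("qwen3", func(dir string) string { return "qwen3 model from " + dir })
}

// parseModelType extracts the model_type field from config JSON bytes.
func parseModelType(config []byte) (string, error) {
	var cfg struct {
		ModelType string `json:"model_type"`
	}
	if err := json.Unmarshal(config, &cfg); err != nil {
		return "", fmt.Errorf("parse config: %w", err)
	}
	if cfg.ModelType == "" {
		return "", fmt.Errorf("config has no model_type field")
	}
	return cfg.ModelType, nil
}

// lookup returns the constructor registered for a model type, or an error
// when no architecture with that name has been registered.
func lookup(modelType string) (constructor, error) {
	c, ok := registry[modelType]
	if !ok {
		return nil, fmt.Errorf("unsupported architecture %q", modelType)
	}
	return c, nil
}

func main() {
	// Inline config for illustration; real code would read it from disk.
	config := []byte(`{"model_type": "qwen3", "hidden_size": 4096}`)
	modelType, err := parseModelType(config)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	build, err := lookup(modelType)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(build("/path/to/model/"))
}
```

With a registry like this, supporting a new architecture would need only one `register` call next to its implementation, and an unknown `model_type` would surface as an explicit error rather than a silent fallback.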