diff --git a/Models.md b/Models.md
new file mode 100644
index 0000000..e314322
--- /dev/null
+++ b/Models.md
@@ -0,0 +1,53 @@
+# Model Support
+
+## Supported Architectures
+
+| Model | File | Parameters | Notes |
+|-------|------|-----------|-------|
+| Gemma3 | `model/gemma3.go` | 1B, 4B, 27B | Google's open model family |
+| Qwen3 | `model/qwen3.go` | 8B+ | Alibaba's open model family |
+
+## Model Interface
+
+Every supported architecture implements the `Model` interface:
+
+```go
+type Model interface {
+    Forward(x *mlx.Array, cache *cache.KVCache) *mlx.Array
+}
+```
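+
+A new architecture only needs a `Forward` method with this signature. The sketch below stays self-contained, so it swaps in hypothetical stand-in types `Array` and `KVCache` for `mlx.Array` and `cache.KVCache`. It also assumes a `[batch, seqLen]` input and `[batch, seqLen, vocabSize]` output, and the `toyModel` type is purely illustrative. The `var _ Model = (*toyModel)(nil)` line is a standard Go idiom: the compiler rejects the file if the type stops satisfying the interface.
+
+```go
+package main
+
+import "fmt"
+
+// Array is a hypothetical stand-in for mlx.Array; only the shape is tracked.
+type Array struct{ Shape []int }
+
+// KVCache is a hypothetical stand-in for cache.KVCache.
+type KVCache struct{ Offset int }
+
+// Model mirrors the interface above, using the stand-in types.
+type Model interface {
+    Forward(x *Array, c *KVCache) *Array
+}
+
+// toyModel is a hypothetical architecture; a real one would run its
+// embedding, attention, and MLP layers inside Forward.
+type toyModel struct{ vocabSize int }
+
+// Compile-time check that *toyModel satisfies Model.
+var _ Model = (*toyModel)(nil)
+
+// Forward maps [batch, seqLen] token IDs to [batch, seqLen, vocabSize] logits
+// (assumed shapes) and advances the cache offset by the tokens processed.
+func (m *toyModel) Forward(x *Array, c *KVCache) *Array {
+    batch, seqLen := x.Shape[0], x.Shape[1]
+    if c != nil {
+        c.Offset += seqLen
+    }
+    return &Array{Shape: []int{batch, seqLen, m.vocabSize}}
+}
+
+func main() {
+    var m Model = &toyModel{vocabSize: 32000}
+    c := &KVCache{}
+    out := m.Forward(&Array{Shape: []int{1, 5}}, c)
+    fmt.Println(out.Shape, c.Offset) // [1 5 32000] 5
+}
+```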
+
+`model.LoadModel(path)` reads the config file in the model directory, detects the architecture, and returns the matching implementation.
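+
+A common way to implement this kind of detection is to read a Hugging Face-style `config.json`, take its `model_type` field, and look it up in a table of constructors. The sketch below assumes that layout; it is not necessarily how `model.LoadModel` works. `detectArchitecture`, `loadModel`, and the `loaders` map are hypothetical names, the `gemma3`/`qwen3` keys are illustrative, and each loader returns a description string in place of a real model.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+    "os"
+    "path/filepath"
+)
+
+// hfConfig holds the one field this sketch reads from a Hugging Face-style
+// config.json (an assumption about the on-disk layout).
+type hfConfig struct {
+    ModelType string `json:"model_type"`
+}
+
+// loaders is a hypothetical registry keyed by model_type. Each entry returns a
+// description string standing in for a constructed model.
+var loaders = map[string]func(dir string) (string, error){
+    "gemma3": func(dir string) (string, error) { return "gemma3 model from " + dir, nil },
+    "qwen3":  func(dir string) (string, error) { return "qwen3 model from " + dir, nil },
+}
+
+// detectArchitecture parses raw config JSON and returns its model_type.
+func detectArchitecture(raw []byte) (string, error) {
+    var cfg hfConfig
+    if err := json.Unmarshal(raw, &cfg); err != nil {
+        return "", fmt.Errorf("parse config: %w", err)
+    }
+    if cfg.ModelType == "" {
+        return "", fmt.Errorf("config has no model_type field")
+    }
+    return cfg.ModelType, nil
+}
+
+// loadModel reads dir/config.json and dispatches to the registered loader.
+func loadModel(dir string) (string, error) {
+    raw, err := os.ReadFile(filepath.Join(dir, "config.json"))
+    if err != nil {
+        return "", err
+    }
+    arch, err := detectArchitecture(raw)
+    if err != nil {
+        return "", err
+    }
+    load, ok := loaders[arch]
+    if !ok {
+        return "", fmt.Errorf("unsupported architecture %q", arch)
+    }
+    return load(dir)
+}
+
+func main() {
+    dir, err := os.MkdirTemp("", "model")
+    if err != nil {
+        panic(err)
+    }
+    defer os.RemoveAll(dir)
+    cfg := []byte(`{"model_type": "qwen3", "hidden_size": 4096}`)
+    if err := os.WriteFile(filepath.Join(dir, "config.json"), cfg, 0o644); err != nil {
+        panic(err)
+    }
+    desc, err := loadModel(dir)
+    fmt.Println(desc, err)
+}
+```
+
+A map of constructors keeps detection in one place: supporting a new architecture means adding one entry rather than another branch in a `switch`.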
+
+## Generation
+
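+```go
+m, err := model.LoadModel("/path/to/model/")
+if err != nil {
+    log.Fatal(err)
+}
+
+// Encode the prompt with the tokenizer described below
+tok, err := tokenizer.Load("/path/to/tokenizer.model")
+if err != nil {
+    log.Fatal(err)
+}
+promptTokens := tok.Encode("Hello world")
+
+// Token-by-token generation with sampling
+tokens := m.Generate(promptTokens, model.GenerateOptions{
+    MaxTokens:   512,
+    Temperature: 0.7,
+    TopK:        40,
+})
+```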
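+
+`GenerateOptions` exposes `Temperature` and `TopK`. The sketch below shows what those two settings conventionally do at each decoding step: keep the `TopK` highest logits, divide them by `Temperature`, apply a softmax, and draw one token. It is a generic illustration on a plain `[]float32`, not this package's sampler; `sampleTopK` is a hypothetical name, and treating a non-positive temperature as greedy decoding is an assumption of the sketch.
+
+```go
+package main
+
+import (
+    "fmt"
+    "math"
+    "math/rand"
+    "sort"
+)
+
+// sampleTopK picks a token ID from logits using top-k filtering and
+// temperature scaling. temperature <= 0 falls back to greedy (argmax).
+// logits must be non-empty.
+func sampleTopK(logits []float32, temperature float32, topK int, rng *rand.Rand) int {
+    // Order token IDs by descending logit.
+    ids := make([]int, len(logits))
+    for i := range ids {
+        ids[i] = i
+    }
+    sort.Slice(ids, func(a, b int) bool { return logits[ids[a]] > logits[ids[b]] })
+    if temperature <= 0 {
+        return ids[0]
+    }
+    if topK <= 0 || topK > len(ids) {
+        topK = len(ids)
+    }
+    ids = ids[:topK]
+
+    // Softmax over the surviving logits divided by temperature.
+    // Subtracting the max logit keeps math.Exp from overflowing.
+    maxLogit := float64(logits[ids[0]])
+    probs := make([]float64, topK)
+    var sum float64
+    for i, id := range ids {
+        p := math.Exp((float64(logits[id]) - maxLogit) / float64(temperature))
+        probs[i] = p
+        sum += p
+    }
+
+    // Draw one sample from the resulting categorical distribution.
+    r := rng.Float64() * sum
+    for i, p := range probs {
+        r -= p
+        if r < 0 {
+            return ids[i]
+        }
+    }
+    return ids[topK-1] // guard against floating-point round-off
+}
+
+func main() {
+    rng := rand.New(rand.NewSource(1))
+    logits := []float32{2.0, 0.5, 3.0, -1.0}
+    fmt.Println(sampleTopK(logits, 0, 40, rng))  // 2 (greedy: largest logit)
+    fmt.Println(sampleTopK(logits, 0.7, 2, rng)) // 0 or 2 (the two largest logits)
+}
+```
+
+Sorting the whole vocabulary costs O(V log V) per step; a production sampler would more likely use a partial selection, but the full sort keeps the sketch short.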
+
+## Adding New Models
+
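+1. Create `model/{name}.go` with the `//go:build darwin && arm64` build constraint, so the file only builds on Apple silicon Macs.
+2. Implement the `Model` interface.
+3. Register the architecture's detection in `model/model.go` so `model.LoadModel` can select it.
+4. Provide the weights in safetensors format; the model must be loadable from safetensors.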
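+
+Step 4 relies on the safetensors layout: an 8-byte little-endian unsigned header length, a JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets`, then the raw tensor bytes. Reading just the header is a quick way to check a checkpoint's tensor names and shapes before wiring up a new architecture. The sketch below does that; `readHeader` and `tensorInfo` are hypothetical helpers, not part of this repository, and the 100 MiB header cap is an arbitrary sanity limit chosen for the sketch.
+
+```go
+package main
+
+import (
+    "bytes"
+    "encoding/binary"
+    "encoding/json"
+    "fmt"
+    "io"
+    "sort"
+)
+
+// tensorInfo is one entry of a safetensors JSON header.
+type tensorInfo struct {
+    DType       string   `json:"dtype"`
+    Shape       []int64  `json:"shape"`
+    DataOffsets [2]int64 `json:"data_offsets"` // byte range relative to the end of the header
+}
+
+// readHeader parses the 8-byte little-endian header length and the JSON
+// header that follows it. The optional "__metadata__" entry is skipped.
+func readHeader(r io.Reader) (map[string]tensorInfo, error) {
+    var n uint64
+    if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
+        return nil, fmt.Errorf("read header length: %w", err)
+    }
+    if n > 100<<20 { // arbitrary cap to reject corrupt files
+        return nil, fmt.Errorf("header too large: %d bytes", n)
+    }
+    buf := make([]byte, n)
+    if _, err := io.ReadFull(r, buf); err != nil {
+        return nil, fmt.Errorf("read header: %w", err)
+    }
+    var raw map[string]json.RawMessage
+    if err := json.Unmarshal(buf, &raw); err != nil {
+        return nil, fmt.Errorf("parse header: %w", err)
+    }
+    tensors := make(map[string]tensorInfo, len(raw))
+    for name, msg := range raw {
+        if name == "__metadata__" {
+            continue
+        }
+        var info tensorInfo
+        if err := json.Unmarshal(msg, &info); err != nil {
+            return nil, fmt.Errorf("tensor %q: %w", name, err)
+        }
+        tensors[name] = info
+    }
+    return tensors, nil
+}
+
+func main() {
+    // Build a tiny in-memory safetensors file holding one 2x3 F16 tensor.
+    header := []byte(`{"__metadata__":{"format":"pt"},"lm_head.weight":{"dtype":"F16","shape":[2,3],"data_offsets":[0,12]}}`)
+    var file bytes.Buffer
+    if err := binary.Write(&file, binary.LittleEndian, uint64(len(header))); err != nil {
+        panic(err)
+    }
+    file.Write(header)
+    file.Write(make([]byte, 12)) // 6 F16 values x 2 bytes
+
+    tensors, err := readHeader(&file)
+    if err != nil {
+        panic(err)
+    }
+    names := make([]string, 0, len(tensors))
+    for name := range tensors {
+        names = append(names, name)
+    }
+    sort.Strings(names)
+    for _, name := range names {
+        t := tensors[name]
+        fmt.Println(name, t.DType, t.Shape, t.DataOffsets) // lm_head.weight F16 [2 3] [0 12]
+    }
+}
+```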
+
+## Tokenizer
+
+The BPE tokenizer in `tokenizer/tokenizer.go` reads SentencePiece-format vocabulary files:
+
+```go
+tok, err := tokenizer.Load("/path/to/tokenizer.model")
+if err != nil {
+    log.Fatal(err)
+}
+ids := tok.Encode("Hello world")
+text := tok.Decode(ids)
+```
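+
+For readers new to BPE: encoding starts from individual characters and repeatedly merges the adjacent pair whose merged piece ranks best in the vocabulary, stopping when no adjacent pair forms a known piece. The sketch below runs that greedy loop against a hand-written rank table; it is a generic illustration, not this repository's tokenizer, which loads its pieces and scores from `tokenizer.model`. SentencePiece marks word starts with `▁` (U+2581), which is added by hand here. `bpeEncode` and the `ranks` table are hypothetical.
+
+```go
+package main
+
+import "fmt"
+
+// bpeEncode splits word into runes, then repeatedly merges the adjacent pair
+// whose concatenation has the lowest rank in ranks, until no pair is ranked.
+func bpeEncode(word string, ranks map[string]int) []string {
+    // Start from single runes so multi-byte symbols such as "▁" stay intact.
+    symbols := make([]string, 0, len(word))
+    for _, r := range word {
+        symbols = append(symbols, string(r))
+    }
+    for len(symbols) > 1 {
+        best, bestRank := -1, 0
+        for i := 0; i+1 < len(symbols); i++ {
+            rank, ok := ranks[symbols[i]+symbols[i+1]]
+            if ok && (best < 0 || rank < bestRank) {
+                best, bestRank = i, rank
+            }
+        }
+        if best < 0 {
+            break // no adjacent pair forms a known piece
+        }
+        // Replace symbols[best] and symbols[best+1] with their concatenation.
+        merged := symbols[best] + symbols[best+1]
+        symbols = append(symbols[:best+1], symbols[best+2:]...)
+        symbols[best] = merged
+    }
+    return symbols
+}
+
+func main() {
+    ranks := map[string]int{"▁h": 0, "ll": 1, "▁he": 2, "llo": 3}
+    fmt.Println(bpeEncode("▁hello", ranks)) // [▁he llo]
+}
+```
+
+Adding `"▁hello": 4` to the table would let the final pair merge, producing the single piece `▁hello`.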