docs: mark embed-friendly model loading complete in TODO

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snider 2026-02-19 23:37:58 +00:00
parent dd49b4afb6
commit d7c8f176f0

@@ -48,7 +48,7 @@ Implementation plan: `docs/plans/2026-02-19-backend-abstraction-plan.md`
- [x] **Batch inference API** — ✅ `Classify` (prefill-only, fast path) and `BatchGenerate` (autoregressive) implemented. Added `ForwardMasked` to InternalModel interface, threaded attention masks through Gemma3 and Qwen3 decoders. Mask: `[N, 1, L, L]` combining causal + padding (0=attend, -inf=ignore). Right-padded, sorted by length. Gemma3-1B 4-bit: **152 prompts/s** classify (4 prompts), BatchGenerate produces coherent per-prompt output. Types (`ClassifyResult`, `BatchResult`, `WithLogits`) in go-inference. 6 new tests (3 mask unit, 3 model). Design doc: `docs/plans/2026-02-19-batch-inference-design.md`.
- [x] **Inference metrics** — ✅ `GenerateMetrics` type in go-inference with `Metrics()` on `TextModel`. Captures: prefill/decode timing, token counts, throughput (tok/s), peak and active GPU memory. Instrumented Generate, Classify, and BatchGenerate. Gemma3-1B 4-bit: prefill 246 tok/s, decode 82 tok/s, peak 6.2 GB. 1 new test.
- [x] **Model quantisation awareness** — ✅ `ModelInfo` type in go-inference with `Info()` on `TextModel`. Exposes architecture, vocab size, layer count, hidden dimension, quantisation bits and group size. Loader already handles quantised safetensors transparently. 1 new test.
-- [ ] **Embed-friendly model loading** — Add `Discover(baseDir)` that scans for available models and returns metadata.
+- [x] **Embed-friendly model loading** — ✅ `Discover(baseDir)` in go-inference scans for model directories (config.json + *.safetensors). Returns `DiscoveredModel` with path, architecture, quantisation, file count. Finds 20 models across the lab. 1 new test.
- [ ] **mlxlm/ backend** — Python subprocess wrapper via `core/go/pkg/process`. Implements `mlx.Backend` for mlx_lm compatibility.
## Phase 6: Go 1.26 Modernisation