go-mlx/docs/plans
Snider ce1acef462 docs: batch inference API design (Phase 5)
Two new TextModel methods: Classify (prefill-only, fast path for
classification) and BatchGenerate (autoregressive, multi-prompt).
Adds attention masking for padded batches. Primary consumer: go-i18n
Phase 2a domain classification at ~5K sentences/sec.
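
The attention masking for padded batches mentioned above can be illustrated with a minimal Go sketch. It is not taken from the go-mlx design document: the function name `PadBatch`, the `padID` parameter, and the choice of right-padding are all assumptions made for illustration, and the actual design may pad on the left or represent the mask differently.

```go
package main

import "fmt"

// PadBatch is a hypothetical helper (not part of go-mlx). It right-pads each
// token sequence to the length of the longest one and returns a parallel
// mask in which 1 marks a real token and 0 marks padding.
func PadBatch(seqs [][]int32, padID int32) (tokens [][]int32, mask [][]int32) {
	// Find the longest sequence in the batch.
	maxLen := 0
	for _, s := range seqs {
		if len(s) > maxLen {
			maxLen = len(s)
		}
	}

	tokens = make([][]int32, len(seqs))
	mask = make([][]int32, len(seqs))
	for i, s := range seqs {
		tokens[i] = make([]int32, maxLen)
		mask[i] = make([]int32, maxLen)
		copy(tokens[i], s)
		for j := range tokens[i] {
			if j < len(s) {
				mask[i][j] = 1 // real token: attention may look here
			} else {
				tokens[i][j] = padID // padding: mask stays 0
			}
		}
	}
	return tokens, mask
}

func main() {
	tokens, mask := PadBatch([][]int32{{5, 6, 7}, {8}}, 0)
	fmt.Println(tokens) // [[5 6 7] [8 0 0]]
	fmt.Println(mask)   // [[1 1 1] [1 0 0]]
}
```

A mask like this lets prompts of different lengths share one forward pass, because the attention step can ignore the positions marked 0.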

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:18:38 +00:00
2026-02-19-backend-abstraction-design.md fix(metal): address 4 minor code review items 2026-02-19 21:36:40 +00:00
2026-02-19-backend-abstraction-plan.md fix(metal): address 4 minor code review items 2026-02-19 21:36:40 +00:00
2026-02-19-batch-inference-design.md docs: batch inference API design (Phase 5) 2026-02-19 23:18:38 +00:00