go-mlx/docs
Snider ce1acef462 docs: batch inference API design (Phase 5)
Two new TextModel methods: Classify (prefill-only, fast path for
classification) and BatchGenerate (autoregressive, multi-prompt).
Adds attention masking for padded batches. Primary consumer: go-i18n
Phase 2a domain classification at ~5K sentences/sec.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:18:38 +00:00
plans/ (last commit: docs: batch inference API design (Phase 5), 2026-02-19 23:18:38 +00:00)