Two new TextModel methods: Classify (prefill-only, fast path for
classification) and BatchGenerate (autoregressive, multi-prompt).
Adds attention masking for padded batches. Primary consumer: go-i18n
Phase 2a domain classification at ~5K sentences/sec.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename New() → newArray() to signal internal-only intent (112 usages)
- Remove unused Collect() function and its test
- Fix discarded json.Unmarshal error in qwen3.go
- Document AsStrided stride formula in gemma3.go
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Virgil review items integrated:
- context.Context on Generate/Chat (required for HTTP cancellation)
- Err() error on TextModel (distinguish EOS from OOM)
- Chat() on TextModel (model owns its chat template)
- Memory control functions exposed at root package level
- Functional options convention confirmed
- pkg/process confirmed — no changes needed for mlxlm
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Approved design for restructuring go-mlx:
- Root package becomes clean interface (TextModel, LoadModel, Token)
- All CGO code moves to internal/metal/
- Deterministic memory management (Close + per-step cleanup)
- Error propagation instead of silent logging
- mlxlm/ backend placeholder for Python subprocess support
Includes API breaking change communication in FINDINGS.md and
memory management research tasks in cpp/TODO.md.
See: docs/plans/2026-02-19-backend-abstraction-design.md
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>