Implements inference.Backend via a Python subprocess communicating
over JSON Lines (stdin/stdout). No CGO required — pure Go + os/exec.
- bridge.py: embedded Python script wrapping mlx_lm.load() and
mlx_lm.stream_generate() with load/generate/chat/info/cancel/quit
commands. Flushes stdout after every JSON line for streaming.
- backend.go: Go subprocess manager. Extracts bridge.py from
go:embed to temp file, spawns python3, pipes JSON requests.
mlxlmModel implements the full TextModel interface with
mutex-serialised Generate/Chat, context cancellation that drains
in-flight output, and a 2-second graceful Close with a kill fallback.
Auto-registers as "mlx_lm" via init(). Build tag: !nomlxlm.
- backend_test.go: 15 tests using mock_bridge.py (no mlx_lm needed):
name, load, generate, cancel, chat, close, error propagation,
invalid path, auto-register, concurrent serialisation, classify/
batch unsupported, info, metrics, max_tokens limiting.
All tests pass with -race. go vet clean.
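The serialisation property the concurrency test checks can be sketched
as a single mutex around Generate/Chat (the model type, fields, and
counters here are assumptions for illustration only):

```go
package main

import (
	"fmt"
	"sync"
)

// Minimal sketch: one mutex ensures only a single Generate/Chat
// request is in flight on the bridge subprocess at a time.
type model struct {
	mu     sync.Mutex
	active int // requests currently inside Generate; must never exceed 1
	peak   int
}

func (m *model) Generate(prompt string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.active++
	if m.active > m.peak {
		m.peak = m.active
	}
	// ... write request to bridge stdin, drain streamed tokens ...
	m.active--
	return "ok:" + prompt
}

func main() {
	m := &model{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Generate(fmt.Sprint(i))
		}(i)
	}
	wg.Wait()
	fmt.Println("peak concurrency:", m.peak) // prints 1 under the mutex
}
```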
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>