Move both plans to docs/plans/completed/ with summaries. MLX backend implements shared interfaces and batch inference at 5K sentences/sec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend Abstraction — Completion Summary
Completed: 19 February 2026
Module: forge.lthn.ai/core/go-mlx
Status: Complete — shared go-inference interfaces, Metal auto-registration
What Was Built
Migrated go-mlx to implement the shared go-inference interfaces, so it plugs into the unified ML backend system alongside the HTTP and Llama backends.
Key changes
- InferenceAdapter implements the inference.Backend interface
- Metal backend auto-registers via init() when CGo is available
- Result struct carries text + Metrics (tokens, latency, tokens/sec)
- Model loading, tokenization, and generation all behind interface methods
Architecture
go-ml (orchestrator)
→ go-inference (interfaces)
→ go-mlx (Metal/MLX backend, auto-registered)
→ llama (llama.cpp backend)
→ http (Ollama/OpenAI backend)