Move both plans to docs/plans/completed/ with summaries. MLX backend implements shared interfaces and batch inference at 5K sentences/sec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend Abstraction — Completion Summary
Completed: 19 February 2026
Module: forge.lthn.ai/core/go-mlx
Status: Complete — shared go-inference interfaces, Metal auto-registration
What Was Built
Migrated go-mlx to implement the shared go-inference interfaces, so it plugs into the unified ML backend system alongside the HTTP and Llama backends.
Key changes
- InferenceAdapter implements the inference.Backend interface
- Metal backend auto-registers via init() when CGo is available
- Result struct carries text + Metrics (tokens, latency, tokens/sec)
- Model loading, tokenization, and generation all behind interface methods
Architecture
go-ml (orchestrator)
→ go-inference (interfaces)
→ go-mlx (Metal/MLX backend, auto-registered)
→ llama (llama.cpp backend)
→ http (Ollama/OpenAI backend)