From 421d0c42ffe68c6ebad720e86aa3f3af61d03c46 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 24 Feb 2026 13:51:21 +0000
Subject: [PATCH] docs: archive completed backend-abstraction and
 batch-inference plans

Move both plans to docs/plans/completed/ with summaries. MLX backend
implements shared interfaces and batch inference at 5K sentences/sec.

Co-Authored-By: Claude Opus 4.6
---
 ...19-backend-abstraction-design-original.md} |  0
 ...2-19-backend-abstraction-plan-original.md} |  0
 ...-02-19-batch-inference-design-original.md} |  0
 docs/plans/completed/backend-abstraction.md   | 27 +++++++++++++++++
 docs/plans/completed/batch-inference.md       | 30 +++++++++++++++++++
 5 files changed, 57 insertions(+)
 rename docs/plans/{2026-02-19-backend-abstraction-design.md => completed/2026-02-19-backend-abstraction-design-original.md} (100%)
 rename docs/plans/{2026-02-19-backend-abstraction-plan.md => completed/2026-02-19-backend-abstraction-plan-original.md} (100%)
 rename docs/plans/{2026-02-19-batch-inference-design.md => completed/2026-02-19-batch-inference-design-original.md} (100%)
 create mode 100644 docs/plans/completed/backend-abstraction.md
 create mode 100644 docs/plans/completed/batch-inference.md

diff --git a/docs/plans/2026-02-19-backend-abstraction-design.md b/docs/plans/completed/2026-02-19-backend-abstraction-design-original.md
similarity index 100%
rename from docs/plans/2026-02-19-backend-abstraction-design.md
rename to docs/plans/completed/2026-02-19-backend-abstraction-design-original.md
diff --git a/docs/plans/2026-02-19-backend-abstraction-plan.md b/docs/plans/completed/2026-02-19-backend-abstraction-plan-original.md
similarity index 100%
rename from docs/plans/2026-02-19-backend-abstraction-plan.md
rename to docs/plans/completed/2026-02-19-backend-abstraction-plan-original.md
diff --git a/docs/plans/2026-02-19-batch-inference-design.md b/docs/plans/completed/2026-02-19-batch-inference-design-original.md
similarity index 100%
rename from docs/plans/2026-02-19-batch-inference-design.md
rename to docs/plans/completed/2026-02-19-batch-inference-design-original.md
diff --git a/docs/plans/completed/backend-abstraction.md b/docs/plans/completed/backend-abstraction.md
new file mode 100644
index 0000000..a5228a7
--- /dev/null
+++ b/docs/plans/completed/backend-abstraction.md
@@ -0,0 +1,27 @@
+# Backend Abstraction — Completion Summary
+
+**Completed:** 19 February 2026
+**Module:** `forge.lthn.ai/core/go-mlx`
+**Status:** Complete — shared go-inference interfaces, Metal auto-registration
+
+## What Was Built
+
+Migrated go-mlx to implement shared `go-inference` interfaces so it
+plugs into the unified ML backend system alongside HTTP and Llama backends.
+
+### Key changes
+
+- `InferenceAdapter` implements `inference.Backend` interface
+- Metal backend auto-registers via `init()` when CGo is available
+- `Result` struct carries text + `Metrics` (tokens, latency, tokens/sec)
+- Model loading, tokenization, and generation all behind interface methods
+
+### Architecture
+
+```
+go-ml (orchestrator)
+  → go-inference (interfaces)
+    → go-mlx (Metal/MLX backend, auto-registered)
+    → llama (llama.cpp backend)
+    → http (Ollama/OpenAI backend)
+```
diff --git a/docs/plans/completed/batch-inference.md b/docs/plans/completed/batch-inference.md
new file mode 100644
index 0000000..8b5989e
--- /dev/null
+++ b/docs/plans/completed/batch-inference.md
@@ -0,0 +1,30 @@
+# Batch Inference — Completion Summary
+
+**Completed:** 19 February 2026
+**Module:** `forge.lthn.ai/core/go-mlx`
+**Status:** Complete — 5K sentences/sec classification, integrated with go-i18n
+
+## What Was Built
+
+Added batch inference capabilities to the MLX backend for high-throughput
+classification and generation.
+
+### Components
+
+- **`Classify()`** — prefill-only mode for single-token classification
+  (domain labelling). No autoregressive generation needed.
+- **`BatchGenerate()`** — autoregressive batch generation with attention
+  masking for padded sequences in variable-length batches.
+- **Attention masking** — correct handling of padded batches so shorter
+  sequences don't attend to padding tokens.
+
+### Performance
+
+- 5,000 sentences/sec for classification on M3 Ultra (prefill-only)
+- Native Metal execution via Go→CGo→mlx-c pipeline
+
+### Integration
+
+Used by go-i18n 1B Pre-Sort Pipeline (Phase 2a) to batch-classify 88K
+seeds through Gemma3-1B at 80 prompts/sec (constrained by prompt
+construction, not inference).
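
The attention-masking component that batch-inference.md describes can be sketched in Go as follows. This is a minimal illustration of the padding-mask idea only; `buildPadMask` and its signature are assumptions made for this sketch, not the go-mlx API, which is not shown in this patch.

```go
package main

import "fmt"

// buildPadMask returns, for each sequence in a padded batch, a row with
// 1 for real tokens and 0 for padding positions. During attention, the
// zero positions would be set to -inf before the softmax so that shorter
// sequences never attend to padding tokens.
// (Illustrative sketch; not the actual go-mlx implementation.)
func buildPadMask(lengths []int, maxLen int) [][]int {
	masks := make([][]int, len(lengths))
	for i, n := range lengths {
		row := make([]int, maxLen)
		for j := 0; j < n && j < maxLen; j++ {
			row[j] = 1
		}
		masks[i] = row
	}
	return masks
}

func main() {
	// Two sequences of lengths 3 and 5, padded to a batch width of 5.
	fmt.Println(buildPadMask([]int{3, 5}, 5))
	// → [[1 1 1 0 0] [1 1 1 1 1]]
}
```

For autoregressive batch generation this padding mask would be combined with the usual causal mask, so each position attends only to earlier, non-padding tokens in its own sequence.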