# TODO.md — go-ml Task Queue
Dispatched from Virgil in core/go. Pick up tasks in phase order.
## Phase 1: go-inference Migration (CRITICAL PATH)
Everything downstream is blocked on this. The old `backend_mlx.go` imports go-mlx subpackages that no longer exist after the Phase 4 refactoring.
### Step 1.1: Add go-inference dependency
- [ ] Add `forge.lthn.ai/core/go-inference` to go.mod — Already has a `replace` directive pointing to `../go-inference`. Run `go get forge.lthn.ai/core/go-inference` then `go mod tidy`. Verify the module resolves.
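After `go mod tidy`, the relevant go.mod stanzas should look roughly like this (the version string is illustrative — the toolchain records a pseudo-version when the `replace` target has no tags):

```
require forge.lthn.ai/core/go-inference v0.0.0 // illustrative version

replace forge.lthn.ai/core/go-inference => ../go-inference
```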
### Step 1.2: Write the InferenceAdapter
- [ ] Create `adapter.go` — Bridge between `go-inference.TextModel` (returns `iter.Seq[Token]`) and `ml.Backend` + `ml.StreamingBackend` (returns `string`/callback). Must implement:
  - `Generate()` — collect tokens from the iterator into a string
  - `Chat()` — same, using `TextModel.Chat()`
  - `GenerateStream()` — forward tokens to `TokenCallback`
  - `ChatStream()` — same for chat
  - `Name()` — delegate to `TextModel.ModelType()`
  - `Available()` — always true (model already loaded)
  - `convertOpts(GenOpts) []inference.GenerateOption` — map `GenOpts` fields to functional options

  Key mapping:

  ```
  GenOpts.Temperature → inference.WithTemperature(float32(t))
  GenOpts.MaxTokens   → inference.WithMaxTokens(n)
  GenOpts.Model       → (ignored, model already loaded)
  ```

  Error handling: after the iterator completes, check `model.Err()` to distinguish EOS from errors (OOM, ctx cancelled).
- [ ] Test adapter.go — Test with a mock `inference.TextModel` that yields predetermined tokens. Test cases:
  - Normal generation (collect tokens → string)
  - Streaming (each token hits the callback)
  - Callback error stops iteration
  - Context cancellation propagates
  - Empty output (EOS immediately)
  - Model error after partial output
### Step 1.3: Rewrite backend_mlx.go
- [ ] Replace backend_mlx.go — Delete the 253 LOC that manually handle tokenisation, KV cache, sampling, and memory cleanup. Replace with ~60 LOC:

  ```go
  //go:build darwin && arm64

  package ml

  import (
  	"fmt"

  	"forge.lthn.ai/core/go-inference"
  	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend
  )

  func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
  	m, err := inference.LoadModel(modelPath)
  	if err != nil {
  		return nil, fmt.Errorf("mlx: %w", err)
  	}
  	return &InferenceAdapter{model: m, name: "mlx"}, nil
  }
  ```

  The `InferenceAdapter` from Step 1.2 handles all the Generate/Chat/Stream logic.
- [ ] Preserve memory controls — The old `MLXBackend` set cache/memory limits (16GB/24GB). These should be configurable. Options:
  - Accept memory limits in `NewMLXBackend` params
  - Or set them in the `InferenceAdapter` wrapper
  - go-mlx exposes `SetCacheLimit()`/`SetMemoryLimit()` at package level
- [ ] Test backend_mlx.go — Verify the new backend can:
  - Load a model via the go-inference registry
  - Generate text (smoke test, requires a model on disk)
  - Stream tokens via callback
  - Handle the Metal availability check (build tag gating)
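The memory-controls option could take the functional-option shape already used by go-inference. A sketch under assumptions: the option names and setter signatures here are hypothetical, and the package-level setters stand in for go-mlx's real `SetCacheLimit()`/`SetMemoryLimit()`:

```go
package main

import "fmt"

// Package-level setter stand-ins for go-mlx's SetCacheLimit/SetMemoryLimit;
// the real signatures are assumed, not confirmed.
var setCacheLimit = func(bytes uint64) { fmt.Println("cache limit:", bytes) }
var setMemoryLimit = func(bytes uint64) { fmt.Println("memory limit:", bytes) }

type mlxConfig struct {
	cacheLimit, memoryLimit uint64
}

// MLXOption is a hypothetical functional option for NewMLXBackend.
type MLXOption func(*mlxConfig)

func WithCacheLimit(b uint64) MLXOption  { return func(c *mlxConfig) { c.cacheLimit = b } }
func WithMemoryLimit(b uint64) MLXOption { return func(c *mlxConfig) { c.memoryLimit = b } }

// NewMLXBackend applies the limits before loading, defaulting to the
// old hard-coded 16GB cache / 24GB memory values.
func NewMLXBackend(modelPath string, opts ...MLXOption) error {
	cfg := mlxConfig{cacheLimit: 16 << 30, memoryLimit: 24 << 30}
	for _, o := range opts {
		o(&cfg)
	}
	setCacheLimit(cfg.cacheLimit)
	setMemoryLimit(cfg.memoryLimit)
	// ... then load the model via go-inference as in the snippet above ...
	return nil
}

func main() {
	NewMLXBackend("model.safetensors", WithCacheLimit(8<<30))
}
```

Callers that pass no options keep the current 16GB/24GB behaviour, so the migration stays drop-in.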
### Step 1.4: HTTPBackend and LlamaBackend wrappers
- [ ] HTTPBackend go-inference wrapper — HTTPBackend already works fine as `ml.Backend`. For go-inference compatibility, write a thin wrapper that implements `inference.TextModel`:
  - `Generate()` calls the HTTP API, yields the entire response as a single Token
  - `Chat()` same
  - This is lower priority than MLX — HTTP backends don't need the full `iter.Seq` pattern
  - Consider SSE streaming: `/v1/chat/completions` with `"stream": true` returns SSE events that can be yielded as `iter.Seq[Token]`
- [ ] LlamaBackend go-inference wrapper — LlamaBackend delegates to HTTPBackend already. Same treatment.
### Step 1.5: Verify downstream consumers
- [ ] Service.Generate() still works — `service.go` calls `Backend.Generate()`. After migration, backends wrapped in `InferenceAdapter` must still satisfy `ml.Backend`.
- [ ] Judge still works — `judge.go` uses `Backend.Generate()` for LLM-as-judge. Verify the scoring pipeline runs end-to-end.
- [ ] go-ai tools_ml.go — Uses `ml.Service` directly. No code changes needed in go-ai if the `ml.Backend` interface is preserved.
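A cheap guard for all three items is a compile-time interface assertion in `adapter.go`. Sketch with a minimal stand-in interface (the real `ml.Backend` has more methods):

```go
package main

// Minimal stand-in for the ml.Backend interface (assumed shape).
type Backend interface {
	Generate(prompt string) (string, error)
}

// InferenceAdapter stub for illustration.
type InferenceAdapter struct{}

func (a *InferenceAdapter) Generate(prompt string) (string, error) { return "", nil }

// Compile-time guard: the build breaks here if InferenceAdapter ever
// stops satisfying the interface that service.go and judge.go call.
var _ Backend = (*InferenceAdapter)(nil)

func main() {}
```

This catches interface drift at build time, before the end-to-end scoring run.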
## Phase 2: Backend Consolidation
After Phase 1, both `ml.Backend` (string-returning) and `inference.TextModel` (iterator-based) coexist. Reconcile them.
- [ ] Audit StreamingBackend usage — Find all callers of `GenerateStream`/`ChatStream`. Determine which can migrate to `iter.Seq[Token]`.
- [ ] Deprecate StreamingBackend — Once all callers use go-inference iterators, mark StreamingBackend as deprecated.
- [ ] Unify GenOpts — `ml.GenOpts` and `inference.GenerateConfig` overlap. Add `convertOpts()` in Phase 1, consolidate into one struct later.
- [ ] Unify Message types — `ml.Message` and `inference.Message` are identical structs. Consider a type alias or shared import.
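If the structs really are identical, a type alias (not a defined type) makes them interchangeable with zero conversions. A sketch with a local stand-in for `inference.Message`:

```go
package main

import "fmt"

// Stand-in for inference.Message (assumed shape).
type inferenceMessage struct {
	Role    string
	Content string
}

// Alias, not a new type: values flow between packages with no conversion.
// (A defined type — `type Message inferenceMessage` — would require casts
// at every call site.)
type Message = inferenceMessage

func takesInference(m inferenceMessage) string { return m.Role }

func main() {
	m := Message{Role: "user", Content: "hi"}
	fmt.Println(takesInference(m))
}
```

The alias keeps `ml.Message` in existing signatures while sharing one underlying struct, which is the low-churn path for this consolidation.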
## Phase 3: Agent Loop Modernisation
`agent.go` (1,070 LOC) is the largest file. Decompose it.
- [ ] Split agent.go — Into: `agent_config.go` (config, model maps), `agent_execute.go` (run loop, checkpoint processing), `agent_eval.go` (probe evaluation, result publishing), `agent_influx.go` (InfluxDB streaming, JSONL buffer).
- [ ] Abstract SSH transport — Extract SSH checkpoint discovery into an interface. The current M3 homelab SSH target may change to Linux (go-rocm).
- [ ] Configurable endpoints — `10.69.69.165:8181` and the M3 SSH details are hardcoded. Move them to config/environment.
- [ ] InfluxDB client — Hand-rolled line protocol. Evaluate the official InfluxDB Go client.
## Phase 4: Test Coverage
- [ ] backend_llama_test.go — Mock the llama-server subprocess. Test: model loading, health checks, process lifecycle.
- [ ] backend_mlx_test.go — After the Phase 1 rewrite, test with a mock go-inference TextModel.
- [ ] score.go race tests — `go test -race ./...`. Concurrent scoring, semaphore boundaries, context cancellation.
- [ ] Benchmark suite — `BenchmarkHeuristic`, `BenchmarkJudge`, `BenchmarkExact` for various input sizes.
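A benchmark skeleton can be smoke-tested outside `go test` via `testing.Benchmark`; the scorer below is a placeholder, not the real heuristic, and the real `BenchmarkHeuristic` would live in a `_test.go` file next to `score.go`:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// heuristicScore is a placeholder scorer standing in for the real heuristic.
func heuristicScore(output, expected string) float64 {
	if strings.Contains(output, expected) {
		return 1.0
	}
	return 0.0
}

func main() {
	// testing.Benchmark runs the loop outside the `go test` harness,
	// which is handy for quick checks that a benchmark body is sane.
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			heuristicScore("the quick brown fox", "fox")
		}
	})
	fmt.Println("iterations:", r.N, "ns/op:", r.NsPerOp())
}
```

For the real suite, parameterise input size with `b.Run` sub-benchmarks so `BenchmarkHeuristic/1KB` and `BenchmarkHeuristic/1MB` report separately.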
## Workflow
- Virgil in core/go writes tasks here after research
- This repo's session picks up tasks in phase order
- Mark `[x]` when done, note the commit hash
- New discoveries → add tasks, note in FINDINGS.md
- Push to forge after each completed step: `git push forge main`