# TODO.md — go-ml Task Queue
Dispatched by Virgil in core/go. Pick up tasks in phase order.
## Phase 1: go-inference Migration (CRITICAL PATH)
Everything downstream is blocked on this. The old `backend_mlx.go` imports go-mlx subpackages that no longer exist after the Phase 4 refactoring.
### Step 1.1: Add go-inference dependency
- Add `forge.lthn.ai/core/go-inference` to go.mod — go.mod already has a `replace` directive pointing to `../go-inference`. Run `go get forge.lthn.ai/core/go-inference`, then `go mod tidy`. Verify the module resolves.
### Step 1.2: Write the InferenceAdapter
- Create `adapter.go` — Bridge between `go-inference.TextModel` (returns `iter.Seq[Token]`) and `ml.Backend` + `ml.StreamingBackend` (returns `string`/callback). Must implement:
  - `Generate()` — collect tokens from the iterator into a string
  - `Chat()` — same, using `TextModel.Chat()`
  - `GenerateStream()` — forward tokens to `TokenCallback`
  - `ChatStream()` — same for chat
  - `Name()` — delegate to `TextModel.ModelType()`
  - `Available()` — always true (model already loaded)
  - `convertOpts(GenOpts) []inference.GenerateOption` — map `GenOpts` fields to functional options
  Key mapping:

  ```
  GenOpts.Temperature → inference.WithTemperature(float32(t))
  GenOpts.MaxTokens   → inference.WithMaxTokens(n)
  GenOpts.Model       → (ignored, model already loaded)
  ```

  Error handling: after the iterator completes, check `model.Err()` to distinguish EOS from real errors (OOM, context cancelled).

- Test adapter.go — 13 test cases with a mock TextModel (all pass). Test cases:
- Normal generation (collect tokens → string)
- Streaming (each token hits callback)
- Callback error stops iteration
- Context cancellation propagates
- Empty output (EOS immediately)
- Model error after partial output
### Step 1.3: Rewrite backend_mlx.go
- Replace backend_mlx.go — Deleted the 253 LOC that manually handled tokenisation, KV cache, sampling, and memory cleanup. Replaced with ~35 LOC:

  ```go
  //go:build darwin && arm64

  package ml

  import (
  	"fmt"

  	"forge.lthn.ai/core/go-inference"
  	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend
  )

  func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
  	m, err := inference.LoadModel(modelPath)
  	if err != nil {
  		return nil, fmt.Errorf("mlx: %w", err)
  	}
  	return &InferenceAdapter{model: m, name: "mlx"}, nil
  }
  ```

  The `InferenceAdapter` from Step 1.2 handles all the Generate/Chat/Stream logic.
- Preserve memory controls — Deferred: go-mlx handles cache/memory limits internally. Callers can use `mlx.SetCacheLimit()`/`mlx.SetMemoryLimit()` directly. No wrapper needed until a concrete use case arises.
- Test backend_mlx.go — Covered by the Phase 4 `backend_mlx_test.go` (8 tests via mock TextModel). An integration smoke test with a real model is deferred until the LEM Lab pipeline is wired.
### Step 1.4: HTTPBackend and LlamaBackend wrappers
- HTTPBackend go-inference wrapper — `backend_http_textmodel.go`: `HTTPTextModel` wraps `HTTPBackend` to implement `inference.TextModel`. Generate/Chat yield the entire response as a single Token. Classify returns an unsupported error. BatchGenerate processes prompts sequentially. 17 tests pass.
- LlamaBackend go-inference wrapper — `backend_http_textmodel.go`: `LlamaTextModel` embeds `HTTPTextModel`, overrides `ModelType()` → `"llama"` and `Close()` → `llama.Stop()`. 2 tests pass.
### Step 1.5: Verify downstream consumers
- Service.Generate() still works — `service.go` calls `Backend.Generate()`. InferenceAdapter satisfies `ml.Backend`. HTTPBackend/LlamaBackend still implement `ml.Backend` directly. No changes needed.
- Judge still works — `judge.go` calls `Backend.Generate()` via `judgeChat()`. Same Backend contract; works as before. No changes needed.
- go-ai tools_ml.go — Uses `ml.Service` directly. The `ml.Backend` interface is preserved; no code changes needed in go-ai.
## Phase 2: Backend Consolidation
After Phase 1, both `ml.Backend` (string-based) and `inference.TextModel` (iterator-based) coexist. Reconcile them.
### Audit Results (Virgil, 20 Feb 2026)
StreamingBackend callers — only 2 files, both in host-uk/cli:

- `cmd/ml/cmd_serve.go` lines 146, 201, 319: type-asserts `backend.(ml.StreamingBackend)` for SSE streaming at `/v1/completions` and `/v1/chat/completions`
- `cmd/ml/cmd_chat.go`: direct `ChatStream()` call for interactive terminal token echo
All other consumers (service.go, judge.go, agent.go, expand.go, go-ai tools_ml.go) use `Backend.Generate()` — NOT streaming.
Backend implementations:

- `InferenceAdapter` → implements Backend + StreamingBackend (via the go-inference `iter.Seq`)
- `HTTPBackend` → implements Backend only (no streaming)
- `LlamaBackend` → implements Backend only (no streaming)
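The type-assertion pattern the audit describes — stream when the backend supports it, otherwise fall back to one blocking call — looks roughly like this. The minimal interfaces and toy backends below are assumed stand-ins for `ml.Backend`/`ml.StreamingBackend`, not the real definitions.

```go
package main

// Minimal stand-ins for ml.Backend and ml.StreamingBackend (assumed shapes).
type Backend interface {
	Generate(prompt string) (string, error)
}

type StreamingBackend interface {
	Backend
	GenerateStream(prompt string, cb func(tok string) error) error
}

// serveCompletion streams tokens when available, else emits one blocking response.
// This mirrors the backend.(ml.StreamingBackend) assertion cmd_serve.go performs.
func serveCompletion(b Backend, prompt string, emit func(string) error) error {
	if sb, ok := b.(StreamingBackend); ok {
		return sb.GenerateStream(prompt, emit)
	}
	out, err := b.Generate(prompt)
	if err != nil {
		return err
	}
	return emit(out)
}

// Two toy backends for illustration.
type plainBackend struct{}

func (plainBackend) Generate(prompt string) (string, error) { return "full:" + prompt, nil }

type streamBackend struct{}

func (streamBackend) Generate(prompt string) (string, error) { return "full:" + prompt, nil }

func (streamBackend) GenerateStream(prompt string, cb func(tok string) error) error {
	for _, t := range []string{"tok1", "tok2"} {
		if err := cb(t); err != nil {
			return err
		}
	}
	return nil
}
```

The assertion-with-fallback is why HTTPBackend and LlamaBackend can remain non-streaming: callers degrade gracefully to a single emitted response.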
### Step 2.1: Unify Message types
- Type alias ml.Message → inference.Message — In `inference.go`, replace the `Message` struct with:

  ```go
  type Message = inference.Message
  ```

  This is backward-compatible — all existing callers keep working. Remove the `convertMessages()` helper from `adapter.go`, since the types are now identical. Verify with `go build ./...` and `go test ./...`.
### Step 2.2: Unify GenOpts
- Add inference fields to GenOpts — Extend `ml.GenOpts` to include the extra fields from `inference.GenerateConfig`:

  ```go
  type GenOpts struct {
  	Temperature   float64
  	MaxTokens     int
  	Model         string  // override model for this request
  	TopK          int     // NEW: from inference.GenerateConfig
  	TopP          float64 // NEW: from inference.GenerateConfig (float64 to match Temperature)
  	RepeatPenalty float64 // NEW: from inference.GenerateConfig
  }
  ```

  Update `convertOpts()` in adapter.go to map the new fields. Existing callers that only set Temperature/MaxTokens/Model continue working unchanged.
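The `convertOpts()` update might look like the sketch below. The `GenerateConfig` struct and the `With*` constructors are hypothetical stand-ins for the go-inference functional options; only the zero-value-means-unset mapping convention is the point.

```go
package main

// GenOpts as extended in Step 2.2.
type GenOpts struct {
	Temperature   float64
	MaxTokens     int
	Model         string
	TopK          int
	TopP          float64
	RepeatPenalty float64
}

// Hypothetical stand-ins for inference.GenerateConfig and its functional options;
// the real constructors live in go-inference.
type GenerateConfig struct {
	Temperature   float32
	MaxTokens     int
	TopK          int
	TopP          float32
	RepeatPenalty float32
}

type GenerateOption func(*GenerateConfig)

func WithTemperature(t float32) GenerateOption { return func(c *GenerateConfig) { c.Temperature = t } }
func WithMaxTokens(n int) GenerateOption       { return func(c *GenerateConfig) { c.MaxTokens = n } }
func WithTopK(k int) GenerateOption            { return func(c *GenerateConfig) { c.TopK = k } }
func WithTopP(p float32) GenerateOption        { return func(c *GenerateConfig) { c.TopP = p } }
func WithRepeatPenalty(p float32) GenerateOption {
	return func(c *GenerateConfig) { c.RepeatPenalty = p }
}

// convertOpts maps only the fields the caller actually set (zero value means "unset").
// Model is intentionally skipped: the model is already loaded.
func convertOpts(o GenOpts) []GenerateOption {
	var opts []GenerateOption
	if o.Temperature != 0 {
		opts = append(opts, WithTemperature(float32(o.Temperature)))
	}
	if o.MaxTokens != 0 {
		opts = append(opts, WithMaxTokens(o.MaxTokens))
	}
	if o.TopK != 0 {
		opts = append(opts, WithTopK(o.TopK))
	}
	if o.TopP != 0 {
		opts = append(opts, WithTopP(float32(o.TopP)))
	}
	if o.RepeatPenalty != 0 {
		opts = append(opts, WithRepeatPenalty(float32(o.RepeatPenalty)))
	}
	return opts
}
```

Callers that only set Temperature/MaxTokens produce the same option slice as before, which is why the extension is backward-compatible.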
### Step 2.3: Deprecate StreamingBackend
- Mark StreamingBackend as deprecated — Add a deprecation comment:

  ```go
  // Deprecated: StreamingBackend is retained for backward compatibility.
  // New code should use inference.TextModel with iter.Seq[Token] directly.
  // See InferenceAdapter for the bridge pattern.
  type StreamingBackend interface { ... }
  ```

  Do NOT remove it yet — host-uk/cli's cmd_serve.go and cmd_chat.go still depend on it. Those migrations are out of scope for go-ml (they live in a different repo).
### Step 2.4: Document migration path
- Update CLAUDE.md — Add a "Backend Architecture" section documenting:
  - `inference.TextModel` (iterator-based) is the preferred API for new code
  - `ml.Backend` (string-based) is the compatibility layer, still supported
  - `StreamingBackend` is deprecated; use `iter.Seq[Token]` directly
  - `InferenceAdapter` bridges TextModel → Backend/StreamingBackend
  - `HTTPTextModel`/`LlamaTextModel` bridge Backend → TextModel (the reverse direction)
## Phase 3: Agent Loop Modernisation
`agent.go` (1,070 LOC) is the largest file, with SSH, InfluxDB, scoring, and publishing mixed together. Decompose it into focused files.
### Step 3.1: Split agent.go into 5 files — COMPLETE
- Split `agent.go` (1,070 LOC) into 5 focused files — Commit `eae9ec9`. All `go build`/`test`/`vet` pass:
  - `agent_config.go` (97 LOC): AgentConfig, Checkpoint, BaseModelMap, ModelFamilies, AdapterMeta()
  - `agent_execute.go` (215 LOC): RunAgentLoop, DiscoverCheckpoints, GetScoredLabels, FindUnscored, ProcessOne, isMLXNative
  - `agent_eval.go` (397 LOC): processMLXNative, processWithConversion, RunCapabilityProbes/Full, RunContentProbes, ProbeResult types
  - `agent_influx.go` (291 LOC): ScoreCapabilityAndPush, ScoreContentAndPush, PushCapability*, BufferInfluxResult, ReplayInfluxBuffer
  - `agent_ssh.go` (102 LOC): SSHCommand, SCPFrom, SCPTo, fileBase, EnvOr, IntEnvOr, ExpandHome
### Step 3.2: Abstract SSH transport — COMPLETE
- RemoteTransport interface + SSHTransport — Commit `1c2a6a6`. Interface with Run/CopyFrom/CopyTo; SSHTransport implementation with functional options (WithPort, WithTimeout). AgentConfig.Transport field with lazy init. All callers updated (DiscoverCheckpoints, processMLXNative, processWithConversion). The old SSHCommand/SCPFrom/SCPTo are preserved as deprecated wrappers. Build/test/vet clean.
### Step 3.3: Configurable infrastructure — COMPLETE
- Extract hardcoded values to constants — Commit `12f3a1c`. 15 constants in agent_config.go: EpochBase, 5 InfluxDB measurements, 2 DuckDB tables, probe defaults (temp/maxTokens/truncation), InfluxBufferFile, LogSeparatorWidth, InterCheckpointDelay. Hardcoded probe counts replaced with `len()`. 7 files touched; build/test/vet clean.
### Step 3.4: Agent tests — COMPLETE
- Test `AdapterMeta()` — 8 tests: known families (12 entries), variant suffix, subdirectory patterns, unknown fallback, no-prefix edge case. Commit `3e22761`.
- Test `FindUnscored()` — 5 tests: all unscored (sorted), some scored, all scored, empty input, nil scored map. Commit `3e22761`.
- Test `BufferInfluxResult()`/`ReplayInfluxBuffer()` — 4 tests: JSONL round-trip, multiple entries, empty file, missing file. Commit `3e22761`.
- Test `DiscoverCheckpoints()` — 6 tests: happy path (3 checkpoints across 2 dirs), subdirectory pattern, no adapters, SSH error, filter pattern, no safetensors. Uses a `fakeTransport` mock implementing `RemoteTransport`. Commit `3e22761`.
## Phase 4: Test Coverage — COMPLETE
All 4 test files created and verified with `go test -race ./...`. Commit `09bf403`.
- backend_llama_test.go — 20 tests via httptest mock: Name, Available (4 variants), Generate (6 variants incl. context cancellation, empty choices, opts forwarding), Chat (3 variants), Stop, constructor (4 variants), interface compliance.
- backend_mlx_test.go — 8 tests via mock TextModel (no build tag needed): Generate, Chat, Stream, ModelError, Close, ModelAccess, InterfaceCompliance, ConvertOpts.
- score_race_test.go — 6 race-condition tests: ConcurrentSemantic (20 responses, concurrency=4), ConcurrentMixedSuites (semantic+standard+content fan-out), SemaphoreBoundary (concurrency=1, verifies max concurrent==1), ContextCancellation (400 error→nil semantic), HeuristicOnlyNoRace (50 responses), MultiModelConcurrent (4 models×5 concurrent map writes).
- benchmark_test.go — 25 benchmarks: HeuristicScore (5 sizes: 25µs–8.8ms), ExactMatch (4 patterns: 171ns–2.1µs), JudgeExtractJSON (6 variants: 2.5–3.4µs), Judge round-trip (2 suites: ~52µs), ScoreAll (2 modes: 25µs–4.5ms), sub-components (5 heuristic stages: 244ns–88µs). Baselines on M3 Ultra.
## Workflow
- Virgil in core/go writes tasks here after research
- This repo's session picks up tasks in phase order
- Mark `[x]` when done, note the commit hash
- New discoveries → add tasks, note in FINDINGS.md
- Push to forge after each completed step: `git push forge main`