diff --git a/TODO.md b/TODO.md
index e45637b..b6ac5dd 100644
--- a/TODO.md
+++ b/TODO.md
@@ -92,10 +92,61 @@
 Everything downstream is blocked on this. The old `backend_mlx.go` imports go-ml
 
 After Phase 1, both `ml.Backend` (string) and `inference.TextModel` (iterator) coexist. Reconcile.
 
-- [ ] **Audit StreamingBackend usage** — Find all callers of `GenerateStream`/`ChatStream`. Determine which can migrate to `iter.Seq[Token]`.
-- [ ] **Deprecate StreamingBackend** — Once all callers use go-inference iterators, mark StreamingBackend as deprecated.
-- [ ] **Unify GenOpts** — `ml.GenOpts` and `inference.GenerateConfig` overlap. Add `convertOpts()` in Phase 1, consolidate into one struct later.
-- [ ] **Unify Message types** — `ml.Message` and `inference.Message` are identical structs. Consider type alias or shared import.
+### Audit Results (Virgil, 20 Feb 2026)
+
+**StreamingBackend callers** — Only 2 files in `host-uk/cli`:
+- `cmd/ml/cmd_serve.go` lines 146, 201, 319: Type-asserts `backend.(ml.StreamingBackend)` for SSE streaming at `/v1/completions` and `/v1/chat/completions`
+- `cmd/ml/cmd_chat.go`: Direct `ChatStream()` call for interactive terminal token echo
+
+All other consumers (service.go, judge.go, agent.go, expand.go, go-ai tools_ml.go) use `Backend.Generate()` — NOT streaming.
+
+**Backend implementations**:
+- `InferenceAdapter` → implements Backend + StreamingBackend (via go-inference iter.Seq)
+- `HTTPBackend` → implements Backend only (no streaming)
+- `LlamaBackend` → implements Backend only (no streaming)
+
+### Step 2.1: Unify Message types
+
+- [ ] **Type alias ml.Message → inference.Message** — In `inference.go`, replace the `Message` struct with:
+  ```go
+  type Message = inference.Message
+  ```
+  This is backward-compatible — all existing callers keep working. Remove the `convertMessages()` helper from `adapter.go` since types are now identical. Verify with `go build ./...` and `go test ./...`.
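+
+  A minimal sketch of what the alias buys us (the `Role`/`Content` field names are assumed here for illustration; check the real `inference.Message` definition):
+  ```go
+  // With `type Message = inference.Message`, ml.Message is the SAME type,
+  // not a distinct one, so no conversion is needed anywhere:
+  msgs := []ml.Message{{Role: "user", Content: "hello"}}
+  var same []inference.Message = msgs // assignable directly, alias == identical type
+  ```
+  This works even for slices and maps of Message, which a mere struct copy via `convertMessages()` had to rebuild element by element.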
+
+### Step 2.2: Unify GenOpts
+
+- [ ] **Add inference fields to GenOpts** — Extend `ml.GenOpts` to include the extra fields from `inference.GenerateConfig`:
+  ```go
+  type GenOpts struct {
+      Temperature   float64
+      MaxTokens     int
+      Model         string  // override model for this request
+      TopK          int     // NEW: from inference.GenerateConfig
+      TopP          float64 // NEW: from inference.GenerateConfig (float64 to match Temperature)
+      RepeatPenalty float64 // NEW: from inference.GenerateConfig
+  }
+  ```
+  Update `convertOpts()` in adapter.go to map the new fields. Existing callers that only set Temperature/MaxTokens/Model continue working unchanged.
+
+### Step 2.3: Deprecate StreamingBackend
+
+- [ ] **Mark StreamingBackend as deprecated** — Add deprecation comment:
+  ```go
+  // Deprecated: StreamingBackend is retained for backward compatibility.
+  // New code should use inference.TextModel with iter.Seq[Token] directly.
+  // See InferenceAdapter for the bridge pattern.
+  type StreamingBackend interface { ... }
+  ```
+  Do NOT remove yet — `host-uk/cli` cmd_serve.go and cmd_chat.go still depend on it. Those migrations are out of scope for go-ml (they live in a different repo).
+
+### Step 2.4: Document migration path
+
+- [ ] **Update CLAUDE.md** — Add "Backend Architecture" section documenting:
+  - `inference.TextModel` (iterator-based) is the preferred API for new code
+  - `ml.Backend` (string-based) is the compatibility layer, still supported
+  - `StreamingBackend` is deprecated; use `iter.Seq[Token]` directly
+  - `InferenceAdapter` bridges TextModel → Backend/StreamingBackend
+  - `HTTPTextModel`/`LlamaTextModel` bridge Backend → TextModel (reverse direction)
 
 ---
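The `convertOpts()` mapping from Step 2.2 can be sketched with stand-in types. The field names follow the TODO above, but the structs here are local stand-ins, not the real go-ml/go-inference definitions, and whether `inference.GenerateConfig` carries a model field is an open question in this sketch:

```go
package main

import "fmt"

// Stand-in for ml.GenOpts after Step 2.2 (fields as listed in the TODO).
type GenOpts struct {
	Temperature   float64
	MaxTokens     int
	Model         string
	TopK          int
	TopP          float64
	RepeatPenalty float64
}

// Stand-in for inference.GenerateConfig; assumed here to omit Model,
// since model selection stays on the go-ml side in this sketch.
type GenerateConfig struct {
	Temperature   float64
	MaxTokens     int
	TopK          int
	TopP          float64
	RepeatPenalty float64
}

// convertOpts maps the compatibility struct onto the inference config.
// Unset fields pass through as zero values, so callers that only set
// Temperature/MaxTokens/Model keep their current behaviour.
func convertOpts(o GenOpts) GenerateConfig {
	return GenerateConfig{
		Temperature:   o.Temperature,
		MaxTokens:     o.MaxTokens,
		TopK:          o.TopK,
		TopP:          o.TopP,
		RepeatPenalty: o.RepeatPenalty,
	}
}

func main() {
	// An old-style caller that never touches the new fields.
	cfg := convertOpts(GenOpts{Temperature: 0.7, MaxTokens: 256})
	fmt.Println(cfg.Temperature, cfg.MaxTokens, cfg.TopK)
}
```

Keeping the mapping explicit (rather than embedding one struct in the other) leaves room to consolidate into a single struct later without breaking either package's API in the meantime.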