go-mlx: register_metal.go implements inference.Backend (metalBackend + metalAdapter)
go-rocm: register_rocm.go implements inference.Backend (rocmBackend, 5,794 LOC)
go-ml: adapter.go bridges inference.TextModel → ml.Backend (118 LOC, 13 tests)
Phase 3 (extended interfaces) deliberately deferred per design principles.
Co-Authored-By: Virgil <virgil@lethean.io>
TODO.md — go-inference Task Queue
Dispatched from core/go orchestration. This package is minimal by design.
Phase 1: Foundation — d76448d (Charon)
- Add tests for option application — Verify GenerateConfig defaults, all With* options, ApplyGenerateOpts/ApplyLoadOpts behaviour. Comprehensive API tests (1,074 LOC).
- Add tests for backend registry — Register, Get, List, Default priority order, LoadModel routing.
- Add tests for Default() platform preference — Verify metal > rocm > llama_cpp ordering.
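The registry behaviour these tests target can be sketched as follows. This is a minimal illustration, not the package's actual code: the trimmed Backend interface, the Available() method, the WithBackend option, the string return value of LoadModel, and the stub type are all assumptions made for the example. Only the names Register, Get, List, Default, LoadModel, ApplyLoadOpts and the metal > rocm > llama_cpp ordering come from the list above.

```go
package main

import (
	"fmt"
	"sort"
)

// Backend is a hypothetical, trimmed-down stand-in for inference.Backend.
type Backend interface {
	Name() string
	Available() bool                       // assumed availability probe
	LoadModel(path string) (string, error) // a real backend would return a model
}

// LoadConfig and LoadOption sketch the functional-options pattern.
type LoadConfig struct{ Backend string }
type LoadOption func(*LoadConfig)

// WithBackend (hypothetical) pins loading to one named backend.
func WithBackend(name string) LoadOption {
	return func(c *LoadConfig) { c.Backend = name }
}

// ApplyLoadOpts folds the options over a zero-value config, in order.
func ApplyLoadOpts(opts []LoadOption) LoadConfig {
	var cfg LoadConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

var (
	backends  = map[string]Backend{} // not safe for concurrent use in this sketch
	preferred = []string{"metal", "rocm", "llama_cpp"}
)

// Register adds or replaces a backend under its own name.
func Register(b Backend) { backends[b.Name()] = b }

// Get returns the named backend, if registered.
func Get(name string) (Backend, bool) {
	b, ok := backends[name]
	return b, ok
}

// List returns the registered names in sorted order.
func List() []string {
	names := make([]string, 0, len(backends))
	for name := range backends {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Default returns the first registered and available backend in preference order.
func Default() (Backend, error) {
	for _, name := range preferred {
		if b, ok := backends[name]; ok && b.Available() {
			return b, nil
		}
	}
	return nil, fmt.Errorf("no available backend among %v", preferred)
}

// LoadModel routes to the pinned backend if one is set, else to Default().
func LoadModel(path string, opts ...LoadOption) (string, error) {
	cfg := ApplyLoadOpts(opts)
	if cfg.Backend != "" {
		b, ok := Get(cfg.Backend)
		if !ok {
			return "", fmt.Errorf("backend %q not registered", cfg.Backend)
		}
		return b.LoadModel(path)
	}
	b, err := Default()
	if err != nil {
		return "", err
	}
	return b.LoadModel(path)
}

// stub is a toy backend used only to exercise the registry.
type stub struct {
	name string
	ok   bool
}

func (s stub) Name() string                          { return s.name }
func (s stub) Available() bool                       { return s.ok }
func (s stub) LoadModel(path string) (string, error) { return s.name + ":" + path, nil }

func main() {
	Register(stub{"llama_cpp", true})
	Register(stub{"rocm", true})
	Register(stub{"metal", false}) // registered but unavailable on this host
	loaded, err := LoadModel("model.gguf")
	fmt.Println(List(), loaded, err)
}
```

With metal registered but unavailable, the default route falls through to rocm; passing WithBackend("llama_cpp") would bypass the preference order entirely. Whether the real Default() also considers backends outside the three named ones is not stated in this file.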
Phase 2: Integration — COMPLETE
- go-mlx migration — register_metal.go implements inference.Backend via metalBackend{} + metalAdapter{} wrapping internal/metal.Model. Auto-registers via inference.Register() in init(). Build-tagged darwin && arm64. Full TextModel coverage: Generate, Chat, Classify, BatchGenerate, Info, Metrics, Err, Close.
- go-rocm implementation — register_rocm.go implements inference.Backend + inference.TextModel via llama-server subprocess. Auto-registers via inference.Register(&rocmBackend{}). Phase 4 complete (5,794 LOC by Charon).
- go-ml migration — adapter.go bridges inference.TextModel → ml.Backend/StreamingBackend (118 LOC, 13 tests). backend_mlx.go collapsed from 253 to 35 LOC using inference.LoadModel. backend_http_textmodel.go provides reverse wrappers (135 LOC, 19 tests).
Phase 3: Extended Interfaces (when needed)
- BatchModel interface — When go-i18n needs 5K sentences/sec, add: type BatchModel interface { TextModel; BatchGenerate(ctx, []string, ...GenerateOption) iter.Seq2[int, Token] }. Not before it's needed.
- Stats interface — When LEM Lab dashboard needs metrics: type StatsModel interface { TextModel; Stats() GenerateStats } with tokens/sec, peak memory, GPU util.
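One way such an extended interface could be consumed without touching TextModel is interface embedding plus a type assertion at the call site. The sketch below assumes a one-method TextModel and invents the GenerateStats field names, the StatsOf helper, and both model types; only the StatsModel shape and the three metrics come from the list above.

```go
package main

import "fmt"

// TextModel is reduced to one method for this sketch (assumed shape).
type TextModel interface {
	Generate(prompt string) string
}

// GenerateStats carries the metrics named above; field names are assumptions.
type GenerateStats struct {
	TokensPerSec    float64
	PeakMemoryBytes uint64
	GPUUtil         float64 // fraction in [0, 1]
}

// StatsModel extends TextModel by embedding it; TextModel itself is unchanged.
type StatsModel interface {
	TextModel
	Stats() GenerateStats
}

// plainModel implements only the base interface.
type plainModel struct{}

func (plainModel) Generate(prompt string) string { return prompt }

// meteredModel opts in to the extension by adding Stats().
type meteredModel struct{ plainModel }

func (meteredModel) Stats() GenerateStats {
	return GenerateStats{TokensPerSec: 42, PeakMemoryBytes: 1 << 30, GPUUtil: 0.5}
}

// StatsOf (hypothetical helper) reports stats only for models that provide them.
func StatsOf(m TextModel) (GenerateStats, bool) {
	sm, ok := m.(StatsModel)
	if !ok {
		return GenerateStats{}, false
	}
	return sm.Stats(), true
}

func main() {
	for _, m := range []TextModel{plainModel{}, meteredModel{}} {
		stats, ok := StatsOf(m)
		fmt.Printf("%T supports stats: %v (tokens/sec: %v)\n", m, ok, stats.TokensPerSec)
	}
}
```

Existing backends keep compiling because nothing is added to TextModel, and consumers that do not care about stats never see the extension.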
Design Principles
- Minimal interface — Only add methods when 2+ consumers need them
- Zero dependencies — stdlib only, compiles everywhere
- Backwards compatible — New interfaces extend, never modify existing ones
- Platform agnostic — No build tags, no CGO, no OS-specific code
Workflow
- Virgil in core/go manages this package directly (too small for a dedicated Claude)
- Changes here are coordinated with go-mlx and go-rocm Claudes via their TODO.md
- New interface methods require Virgil approval before adding