InferenceAdapter.Generate and Chat now return Result{Text, Metrics}
where Metrics is populated from the underlying TextModel.Metrics().
Co-Authored-By: Virgil <virgil@lethean.io>
- Replace Message struct with type alias for inference.Message (backward compat)
- Remove convertMessages() — types are now identical via alias
- Extend GenOpts with TopK, TopP, RepeatPenalty (mapped in convertOpts)
- Deprecate StreamingBackend with doc comment (only 2 callers, both in cli/)
- Simplify HTTPTextModel.Chat() — pass messages directly
- Update CLAUDE.md with Backend Architecture section
- Add 2 new tests, remove 1 obsolete test
Co-Authored-By: Virgil <virgil@lethean.io>
InferenceAdapter wraps inference.TextModel (iter.Seq[Token]) to satisfy
ml.Backend (string returns) and ml.StreamingBackend (TokenCallback).
- adapter.go: InferenceAdapter with Generate/Chat/Stream/Close
- adapter_test.go: 13 test cases with mock TextModel (all pass)
- backend_mlx.go: rewritten from 253 LOC to ~35 LOC using go-inference
- go.mod: add forge.lthn.ai/core/go-inference dependency
- TODO.md: mark Phase 1 steps 1.1-1.3 complete
Co-Authored-By: Virgil <virgil@lethean.io>