docs: comprehensive domain expert session brief for go-inference migration
Rewrites CLAUDE.md with full interface mapping (ml.Backend → inference.TextModel), adapter design pattern, broken import context, and local dependency paths.

Expands TODO.md Phase 1 into 5 concrete steps with code patterns for InferenceAdapter, backend_mlx.go rewrite, and downstream verification.

Updates FINDINGS.md with type correspondence table and before/after comparison.

Fixes go.mod replace directives for ~/Code/ layout and adds go-inference.

Co-Authored-By: Virgil <virgil@lethean.io>
This commit is contained in:
parent
88e926cc24
commit
84757b8331
4 changed files with 370 additions and 146 deletions
214
CLAUDE.md
|
|
@@ -1,72 +1,162 @@
|
|||
# CLAUDE.md
|
||||
# CLAUDE.md — go-ml Domain Expert Guide
|
||||
|
||||
## What This Is
|
||||
You are a dedicated domain expert for `forge.lthn.ai/core/go-ml`. Virgil (in core/go) orchestrates your work via TODO.md. Pick up tasks in phase order, mark `[x]` when done, commit and push.
|
||||
|
||||
ML inference backends, scoring engine, and agent orchestrator. Module: `forge.lthn.ai/core/go-ml`
|
||||
## What This Package Does
|
||||
|
||||
Provides pluggable inference backends (MLX/Metal, llama.cpp, HTTP/Ollama), a multi-suite scoring engine with ethics-aware probes, GGUF model management, and a concurrent worker pipeline for batch evaluation.
|
||||
ML inference backends, scoring engine, and agent orchestrator. 7.5K LOC across 41 Go files. Provides:
|
||||
|
||||
- **Pluggable inference backends** — MLX/Metal (darwin/arm64), llama.cpp (subprocess), HTTP/Ollama (OpenAI-compatible)
|
||||
- **Multi-suite scoring engine** — Heuristic (regex), semantic (LLM judge), content (sovereignty probes), standard benchmarks (TruthfulQA, DoNotAnswer, Toxigen, GSM8K)
|
||||
- **23 capability probes** — Binary pass/fail tests across 16 categories (math, logic, code, etc.)
|
||||
- **GGUF model management** — Format parsing, conversion, inventory
|
||||
- **Agent orchestrator** — SSH checkpoint discovery, InfluxDB streaming, batch evaluation
|
||||
|
||||
## Critical Context: go-inference Migration
|
||||
|
||||
**This is the #1 priority.** Phase 1 in TODO.md.
|
||||
|
||||
The package currently defines its own `Backend` interface that returns `(string, error)`. The shared `go-inference` package defines `TextModel` which returns `iter.Seq[Token]` (Go 1.23+ range-over-func). Everything downstream is blocked until go-ml bridges these two interfaces.
|
||||
|
||||
### Interface Gap
|
||||
|
||||
```
go-ml (CURRENT)                           go-inference (TARGET)
─────────────────                         ─────────────────────
Backend.Generate(ctx, prompt, GenOpts)    TextModel.Generate(ctx, prompt, ...GenerateOption)
  → (string, error)                         → iter.Seq[Token]

Backend.Chat(ctx, messages, GenOpts)      TextModel.Chat(ctx, messages, ...GenerateOption)
  → (string, error)                         → iter.Seq[Token]

StreamingBackend.GenerateStream(          (streaming is built-in via iter.Seq)
  ctx, prompt, opts, TokenCallback)
  → error

GenOpts{Temperature, MaxTokens, Model}    GenerateConfig{MaxTokens, Temperature,
                                            TopK, TopP, StopTokens, RepeatPenalty}
                                          (configured via WithMaxTokens(n) etc.)
```
|
||||
|
||||
### What the Adapter Must Do
|
||||
|
||||
```go
// InferenceAdapter wraps go-inference.TextModel to satisfy ml.Backend + ml.StreamingBackend.
// This is the bridge between the new iterator-based API and the legacy string-return API.
type InferenceAdapter struct {
	model inference.TextModel
	name  string // reported via Name()
}

// Generate collects all tokens from the iterator into a string.
func (a *InferenceAdapter) Generate(ctx context.Context, prompt string, opts GenOpts) (string, error) {
	genOpts := convertOpts(opts) // GenOpts → []inference.GenerateOption
	var buf strings.Builder
	for tok := range a.model.Generate(ctx, prompt, genOpts...) {
		buf.WriteString(tok.Text)
	}
	if err := a.model.Err(); err != nil {
		return buf.String(), err
	}
	return buf.String(), nil
}

// GenerateStream yields tokens to the callback as they arrive.
func (a *InferenceAdapter) GenerateStream(ctx context.Context, prompt string, opts GenOpts, cb TokenCallback) error {
	genOpts := convertOpts(opts)
	for tok := range a.model.Generate(ctx, prompt, genOpts...) {
		if err := cb(tok.Text); err != nil {
			return err
		}
	}
	return a.model.Err()
}
```
|
||||
|
||||
### backend_mlx.go Is Broken
|
||||
|
||||
After go-mlx Phase 4, the old subpackage imports no longer exist:
|
||||
- `forge.lthn.ai/core/go-mlx/cache` — **REMOVED** (now `internal/metal`)
|
||||
- `forge.lthn.ai/core/go-mlx/model` — **REMOVED** (now `internal/metal`)
|
||||
- `forge.lthn.ai/core/go-mlx/sample` — **REMOVED** (now `internal/metal`)
|
||||
- `forge.lthn.ai/core/go-mlx/tokenizer` — **REMOVED** (now `internal/metal`)
|
||||
|
||||
The new go-mlx public API is:
|
||||
```go
import (
	"fmt"

	"forge.lthn.ai/core/go-inference"
	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)

m, err := inference.LoadModel("/path/to/model/", inference.WithContextLen(4096))
defer m.Close()
for tok := range m.Generate(ctx, "prompt", inference.WithMaxTokens(128)) {
	fmt.Print(tok.Text)
}
```
|
||||
|
||||
**The rewrite**: Delete the 253 LOC of manual tokenisation/KV cache/sampling. Replace with ~60 LOC that loads via go-inference and wraps in `InferenceAdapter`.
|
||||
|
||||
## Commands
|
||||
|
||||
```bash
|
||||
go test ./... # Run all tests
|
||||
go test ./... # Run all tests (some will fail until Phase 1)
|
||||
go test -v -run TestHeuristic # Single test
|
||||
go test -bench=. ./... # Benchmarks
|
||||
go test -bench=. ./... # Benchmarks (none exist yet)
|
||||
go test -race ./... # Race detector
|
||||
go vet ./... # Static analysis
|
||||
```
|
||||
|
||||
**Note**: `backend_mlx.go` won't compile until rewritten (Phase 1). It is gated by `//go:build darwin && arm64`, so non-darwin builds skip it automatically (`-tags` cannot negate a GOOS); on darwin/arm64, cross-compile to check the rest:
|
||||
```bash
|
||||
GOOS=linux go build ./...     # Type-checks everything except the MLX backend
|
||||
```
|
||||
|
||||
## Local Dependencies
|
||||
|
||||
All resolve via `replace` directives in go.mod:
|
||||
|
||||
| Module | Local Path | Notes |
|
||||
|--------|-----------|-------|
|
||||
| `forge.lthn.ai/core/go` | `../host-uk/core` | Framework (ServiceRuntime, process, log) |
|
||||
| `forge.lthn.ai/core/go-mlx` | `../go-mlx` | Metal GPU backend (darwin/arm64 only) |
|
||||
| `forge.lthn.ai/core/go-inference` | `../go-inference` | Shared TextModel/Backend interfaces |
|
||||
|
||||
## Architecture
|
||||
|
||||
### Backends (pluggable inference)
|
||||
|
||||
| File | Backend | Notes |
|
||||
|------|---------|-------|
|
||||
| `backend_mlx.go` | MLX/Metal GPU | Native Apple Silicon via go-mlx (darwin/arm64 only) |
|
||||
| `backend_llama.go` | llama.cpp | GGUF models via subprocess |
|
||||
| `backend_http.go` | HTTP API | Generic (Ollama, vLLM, OpenAI-compatible) |
|
||||
| `ollama.go` | Ollama helpers | Ollama-specific client utilities |
|
||||
| File | Backend | Status |
|
||||
|------|---------|--------|
|
||||
| `backend_mlx.go` | MLX/Metal GPU | **BROKEN** — old imports, needs Phase 1 rewrite |
|
||||
| `backend_llama.go` | llama-server subprocess | Works, needs go-inference wrapper |
|
||||
| `backend_http.go` | HTTP API (OpenAI-compatible) | Works, needs go-inference wrapper |
|
||||
| `ollama.go` | Ollama helpers | Works |
|
||||
|
||||
### Scoring Engine
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `score.go` | Main scoring orchestrator |
|
||||
| `heuristic.go` | Fast rule-based scoring (no LLM needed) |
|
||||
| `judge.go` | LLM-as-judge evaluator |
|
||||
| `exact.go` | Exact match scoring (GSM8K-style) |
|
||||
| `probes.go` | Ethics-aware evaluation probes |
|
||||
| File | LOC | Purpose |
|
||||
|------|-----|---------|
|
||||
| `score.go` | 212 | Concurrent scoring orchestrator (semaphore-bounded workers) |
|
||||
| `heuristic.go` | 258 | 9 regex-based metrics, LEK composite score |
|
||||
| `judge.go` | 205 | LLM-as-judge (6 scoring methods) |
|
||||
| `exact.go` | 77 | GSM8K exact-match with numeric extraction |
|
||||
| `probes.go` | 273 | 23 binary capability probes across 16 categories |
|
||||
|
||||
### Data Pipeline
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `agent.go` (1,070 LOC) | LLM agent orchestrator (largest file) |
|
||||
| `worker.go` | Concurrent worker pool for multi-model scoring |
|
||||
| `ingest.go` | Bulk data ingestion |
|
||||
| `import_all.go` | Import orchestration |
|
||||
| `gguf.go` | GGUF model handling and inventory |
|
||||
| `convert.go` | Model format conversion |
|
||||
| `db.go` | DuckDB storage layer |
|
||||
| `parquet.go` | Parquet I/O |
|
||||
| File | LOC | Purpose |
|
||||
|------|-----|---------|
|
||||
| `agent.go` | 1,070 | Scoring agent (SSH checkpoint discovery, InfluxDB) |
|
||||
| `worker.go` | 403 | LEM API worker for distributed inference |
|
||||
| `service.go` | 162 | Core framework integration (lifecycle, backend registry) |
|
||||
| `ingest.go` | 384 | JSONL response loading |
|
||||
| `db.go` | 258 | DuckDB analytics storage |
|
||||
| `gguf.go` | 369 | GGUF model format parsing |
|
||||
|
||||
### Monitoring
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `metrics.go` | Metrics tracking |
|
||||
| `influx.go` | InfluxDB integration |
|
||||
| `status.go` | Status reporting |
|
||||
|
||||
## Dependencies
|
||||
|
||||
- `forge.lthn.ai/core/go` — Framework (ServiceRuntime, process, log)
|
||||
- `forge.lthn.ai/core/go-mlx` — Native Metal GPU inference
|
||||
- `github.com/marcboeker/go-duckdb` — Embedded analytics DB
|
||||
- `github.com/parquet-go/parquet-go` — Columnar data format
|
||||
|
||||
## Key Interfaces
|
||||
### Key Types
|
||||
|
||||
```go
|
||||
// Backend — pluggable inference
|
||||
// Current backend interface (inference.go)
|
||||
type Backend interface {
|
||||
Generate(ctx context.Context, prompt string, opts GenOpts) (string, error)
|
||||
Chat(ctx context.Context, messages []Message, opts GenOpts) (string, error)
|
||||
|
|
@@ -74,23 +164,39 @@ type Backend interface {
|
|||
Available() bool
|
||||
}
|
||||
|
||||
// StreamingBackend — extends Backend with token streaming
|
||||
type StreamingBackend interface {
|
||||
Backend
|
||||
GenerateStream(ctx context.Context, prompt string, opts GenOpts, cb TokenCallback) error
|
||||
ChatStream(ctx context.Context, messages []Message, opts GenOpts, cb TokenCallback) error
|
||||
}
|
||||
|
||||
type GenOpts struct {
|
||||
Temperature float64
|
||||
MaxTokens int
|
||||
Model string
|
||||
}
|
||||
|
||||
type Message struct {
|
||||
Role string `json:"role"`
|
||||
Content string `json:"content"`
|
||||
}
|
||||
```
|
||||
|
||||
## Coding Standards
|
||||
|
||||
- UK English
|
||||
- Tests: testify assert/require
|
||||
- Conventional commits
|
||||
- Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`
|
||||
- Licence: EUPL-1.2
|
||||
- **UK English**: colour, organisation, centre
|
||||
- **Tests**: testify assert/require (existing), Pest-style names welcome for new tests
|
||||
- **Conventional commits**: `feat(backend):`, `fix(scoring):`, `refactor(mlx):`
|
||||
- **Co-Author**: `Co-Authored-By: Virgil <virgil@lethean.io>`
|
||||
- **Licence**: EUPL-1.2
|
||||
- **Imports**: stdlib → forge.lthn.ai → third-party, each group separated by blank line
|
||||
|
||||
## Forge
|
||||
|
||||
- **Repo**: `forge.lthn.ai/core/go-ml`
|
||||
- **Push via SSH**: `git push forge main` (remote: `ssh://git@forge.lthn.ai:2223/core/go-ml.git`)
|
||||
|
||||
## Task Queue
|
||||
|
||||
See `TODO.md` for prioritised work.
|
||||
See `FINDINGS.md` for research notes.
|
||||
See `TODO.md` for prioritised work. Phase 1 (go-inference migration) is the critical path.
|
||||
See `FINDINGS.md` for research notes and interface mapping.
|
||||
|
|
|
|||
138
FINDINGS.md
|
|
@@ -15,6 +15,7 @@ Split from go-ai on 19 Feb 2026. Was `ai/ml/` subpackage inside `forge.lthn.ai/c
|
|||
### Dependencies
|
||||
|
||||
- `forge.lthn.ai/core/go-mlx` — Metal GPU inference (backend_mlx.go, darwin/arm64 only)
|
||||
- `forge.lthn.ai/core/go-inference` — Shared TextModel/Backend/Token interfaces (target for Phase 1)
|
||||
- `forge.lthn.ai/core/go` — Framework services, process management, logging
|
||||
- `github.com/marcboeker/go-duckdb` — Analytics storage
|
||||
- `github.com/parquet-go/parquet-go` — Columnar data I/O
|
||||
|
|
@@ -22,79 +23,108 @@ Split from go-ai on 19 Feb 2026. Was `ai/ml/` subpackage inside `forge.lthn.ai/c
|
|||
|
||||
### Consumers
|
||||
|
||||
- `go-ai/mcp/tools_ml.go` — Exposes ML as MCP tools
|
||||
- `go-ai/test-mlx.go` — Integration test utility
|
||||
- `go-ai/mcp/tools_ml.go` — Exposes ML as MCP tools (uses `ml.Service`, `ml.GenOpts`, `ml.Backend`)
|
||||
- LEM Lab — Uses MLXBackend for chat inference
|
||||
- go-i18n Phase 2a — Needs 5K sentences/sec Gemma3-1B classification (blocked on go-inference)
|
||||
|
||||
## Architecture
|
||||
## go-inference Interface Mapping
|
||||
|
||||
### Backend Interface
|
||||
### Type Correspondence
|
||||
|
||||
| go-ml | go-inference | Notes |
|
||||
|-------|-------------|-------|
|
||||
| `ml.Backend` | `inference.Backend` | Different semantics: ml returns string, inference returns TextModel |
|
||||
| `ml.StreamingBackend` | (built into TextModel) | iter.Seq[Token] is inherently streaming |
|
||||
| `ml.GenOpts` | `inference.GenerateConfig` | Use functional options: `WithMaxTokens(n)` etc. |
|
||||
| `ml.Message` | `inference.Message` | Identical struct: Role + Content |
|
||||
| `ml.TokenCallback` | (not needed) | iter.Seq[Token] replaces callbacks |
|
||||
| (no equivalent) | `inference.Token` | `{ID int32, Text string}` |
|
||||
| (no equivalent) | `inference.TextModel` | Generate/Chat return iter.Seq[Token] |
|
||||
|
||||
### Method Mapping
|
||||
|
||||
```
ml.Backend.Generate(ctx, prompt, GenOpts) → (string, error)
        ↕ InferenceAdapter collects tokens
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.StreamingBackend.GenerateStream(ctx, prompt, opts, TokenCallback) → error
        ↕ InferenceAdapter forwards tokens to callback
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.GenOpts{Temperature: 0.7, MaxTokens: 2048}
        ↕ convertOpts helper
inference.WithTemperature(0.7), inference.WithMaxTokens(2048)
```
|
||||
|
||||
### backend_mlx.go Before/After
|
||||
|
||||
**Before** (253 LOC — BROKEN, old subpackage imports):
|
||||
```go
|
||||
type Backend interface {
|
||||
Generate(ctx context.Context, prompt string, opts GenOpts) (string, error)
|
||||
Chat(ctx context.Context, messages []Message, opts GenOpts) (string, error)
|
||||
Name() string
|
||||
Available() bool
|
||||
}
|
||||
import (
|
||||
"forge.lthn.ai/core/go-mlx"
|
||||
"forge.lthn.ai/core/go-mlx/cache" // REMOVED
|
||||
"forge.lthn.ai/core/go-mlx/model" // REMOVED
|
||||
"forge.lthn.ai/core/go-mlx/sample" // REMOVED
|
||||
"forge.lthn.ai/core/go-mlx/tokenizer" // REMOVED
|
||||
)
|
||||
|
||||
type StreamingBackend interface {
|
||||
Backend
|
||||
GenerateStream(ctx context.Context, prompt string, opts GenOpts, cb TokenCallback) error
|
||||
ChatStream(ctx context.Context, messages []Message, opts GenOpts, cb TokenCallback) error
|
||||
type MLXBackend struct {
|
||||
model model.Model
|
||||
tok *tokenizer.Tokenizer
|
||||
caches []cache.Cache
|
||||
sampler sample.Sampler
|
||||
// ... manual tokenisation, KV cache mgmt, sampling loop, memory cleanup
|
||||
}
|
||||
```
|
||||
|
||||
Key design: `Backend.Generate` returns `string`, not `iter.Seq[Token]`. `StreamingBackend` adds token callbacks but is still callback-based, not iterator-based.
|
||||
**After** (~60 LOC — uses go-inference + InferenceAdapter):
|
||||
```go
|
||||
import (
	"fmt"

	"forge.lthn.ai/core/go-inference"
	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)
|
||||
|
||||
### Scoring Engine
|
||||
func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
	m, err := inference.LoadModel(modelPath)
	if err != nil {
		return nil, fmt.Errorf("mlx: %w", err)
	}
	return &InferenceAdapter{model: m, name: "mlx"}, nil
}
|
||||
```
|
||||
|
||||
Concurrent scoring with semaphore-bounded workers. `Engine` fans out suites across goroutines, collects results.
|
||||
All tokenisation, KV cache, sampling, and memory management is now handled inside go-mlx's `internal/metal/` package, accessed through the go-inference `TextModel` interface.
|
||||
|
||||
**Heuristic suite** (9 metrics): refusal detection, length ratio, repetition, coherence, instruction following, format compliance, language match, confidence calibration, response diversity.
|
||||
## Scoring Engine Architecture
|
||||
|
||||
**Semantic suite** (4 dimensions): LLM-as-judge scoring across helpfulness, accuracy, harmlessness, and reasoning quality.
|
||||
### 5 Suites
|
||||
|
||||
**Content suite** (6 probes): sovereignty probes testing model behaviour on sensitive topics — political bias, cultural sensitivity, factual grounding, source attribution, opinion vs fact distinction, regional awareness.
|
||||
| Suite | Method | LLM needed? | Metrics |
|
||||
|-------|--------|-------------|---------|
|
||||
| **Heuristic** | Regex + word analysis | No | 9 metrics → LEK composite |
|
||||
| **Semantic** | LLM-as-judge | Yes | 4 dimensions (sovereignty, ethical, creative, self-concept) |
|
||||
| **Content** | LLM-as-judge | Yes | 6 sovereignty probes (CCP, truth, engagement, etc.) |
|
||||
| **Standard** | LLM-as-judge | Yes | TruthfulQA, DoNotAnswer, Toxigen |
|
||||
| **Exact** | Numeric extraction | No | GSM8K answer matching |
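The Exact suite's numeric matching can be sketched in a few lines — pull the last numeric literal from the model output, then compare with a tolerance. `lastNumber` and `exactMatch` are illustrative names, not the actual exact.go API:

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strconv"
	"strings"
)

var numRe = regexp.MustCompile(`-?\d[\d,]*\.?\d*`)

// lastNumber pulls the final numeric literal from model output — GSM8K
// answers conventionally appear last ("... so the answer is 42").
func lastNumber(s string) (float64, bool) {
	matches := numRe.FindAllString(s, -1)
	if len(matches) == 0 {
		return 0, false
	}
	raw := strings.ReplaceAll(matches[len(matches)-1], ",", "")
	v, err := strconv.ParseFloat(raw, 64)
	return v, err == nil
}

// exactMatch compares with a small tolerance to absorb float formatting noise.
func exactMatch(output string, truth float64) bool {
	v, ok := lastNumber(output)
	return ok && math.Abs(v-truth) < 1e-6
}

func main() {
	fmt.Println(exactMatch("So the total is 1,234 dollars.", 1234)) // true
}
```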
|
||||
|
||||
**Standard suite** (4 benchmarks): TruthfulQA (truthfulness), DoNotAnswer (safety refusals), Toxigen (toxicity detection), GSM8K (mathematical reasoning).
|
||||
### LEK Score Formula
|
||||
|
||||
**Exact suite** (GSM8K numeric): Extracts numeric answers from model output and compares against ground truth with tolerance.
|
||||
```
|
||||
LEK = EngagementDepth×2 + CreativeForm×3 + EmotionalRegister×2 + FirstPerson×1.5
|
||||
- ComplianceMarkers×5 - FormulaicPreamble×3 - Degeneration×4 - EmptyBroken×20
|
||||
```
|
||||
|
||||
### 23 Capability Probes
|
||||
Positive signals: engagement depth, creative form, emotional register, first-person voice.
|
||||
Negative signals: RLHF compliance markers, formulaic preambles, text degeneration, empty/broken output.
|
||||
|
||||
16 categories covering: reasoning, mathematics, coding, instruction following, multilingual, summarisation, creative writing, factual recall, safety, ethics, roleplay, context length, tool use, multimodal description, structured output, and chain-of-thought.
|
||||
### Concurrency Model
|
||||
|
||||
### InfluxDB Integration
|
||||
|
||||
- Endpoint: `10.69.69.165:8181`
|
||||
- Database: `training`
|
||||
- Protocol: Line protocol writes (hand-rolled, no official client)
|
||||
- Purpose: Streaming checkpoint scores during agent evaluation runs
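A hand-rolled write in line protocol reduces to string formatting. This sketch (a hypothetical `lineProtocol` helper, not agent.go's actual code) shows the `measurement,tag=value field=value timestamp` shape:

```go
package main

import "fmt"

// lineProtocol formats one point in InfluxDB line protocol:
// measurement,tag=value field=value nanosecond-timestamp
func lineProtocol(measurement string, tags map[string]string, field string, value float64, tsNano int64) string {
	out := measurement
	for k, v := range tags {
		out += fmt.Sprintf(",%s=%s", k, v)
	}
	return fmt.Sprintf("%s %s=%g %d", out, field, value, tsNano)
}

func main() {
	fmt.Println(lineProtocol("checkpoint_score",
		map[string]string{"model": "gemma3"}, "lek", 7.5, 1700000000000000000))
}
```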
|
||||
|
||||
### Data Pipeline
|
||||
|
||||
DuckDB for local analytics storage, Parquet for columnar I/O, InfluxDB for time-series streaming. GGUF converter handles MLX LoRA to GGUF tensor name mapping for model format conversion.
|
||||
|
||||
## go-inference Gap
|
||||
|
||||
This is the critical finding driving Phase 1.
|
||||
|
||||
**go-ml has**: `ml.Backend` interface where `Generate` returns `(string, error)`. Callback-based streaming via `StreamingBackend`.
|
||||
|
||||
**go-inference has**: `TextModel` interface where `Generate` returns `iter.Seq[Token]`. Iterator-based streaming (Go 1.23+ range-over-func).
|
||||
|
||||
**Gap**: No adapter between the two. `backend_mlx.go` imports go-mlx directly (~253 LOC of manual tokenisation, KV cache, sampling) instead of using go-inference which wraps all of that. This means:
|
||||
1. MLX backend duplicates logic that go-inference already provides
|
||||
2. Other backends (HTTP, Llama) cannot benefit from go-inference's unified interface
|
||||
3. Scoring engine is locked to the legacy string-return interface
|
||||
|
||||
**Solution**: Write `InferenceAdapter` bridging `go-inference.TextModel` to `ml.Backend`, then rewrite `backend_mlx.go` to use go-inference. This is Phase 1 in TODO.md.
|
||||
`Engine.ScoreAll()` fans out goroutines bounded by semaphore (`concurrency` setting). Heuristic runs inline (instant). Semantic/content/standard run via worker pool with `sync.WaitGroup`. Results collected into `[]PromptScore` via mutex.
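That fan-out pattern can be sketched as follows — a buffered channel as semaphore, `sync.WaitGroup` for joining, mutex-guarded result slice. This is illustrative of the shape, not the actual score.go code:

```go
package main

import (
	"fmt"
	"sync"
)

// scoreAll fans prompts out to workers bounded by a semaphore: the buffered
// channel blocks once `concurrency` goroutines are in flight; a mutex guards
// the shared result slice.
func scoreAll(prompts []string, concurrency int, score func(string) float64) []float64 {
	sem := make(chan struct{}, concurrency)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results = make([]float64, 0, len(prompts))
	)
	for _, p := range prompts {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func(p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			s := score(p)
			mu.Lock()
			results = append(results, s)
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return results
}

func main() {
	out := scoreAll([]string{"a", "bb", "ccc"}, 2, func(p string) float64 { return float64(len(p)) })
	fmt.Println(len(out)) // 3
}
```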
|
||||
|
||||
## Known Issues
|
||||
|
||||
- **backend_mlx.go imports go-mlx directly** — Should go through go-inference. ~253 LOC that collapses to ~60 LOC after migration.
|
||||
- **agent.go is too large** — 1,070 LOC handling SSH, InfluxDB, scoring orchestration, and result publishing. Decomposition candidate.
|
||||
- **Hardcoded infrastructure** — InfluxDB endpoint (`10.69.69.165:8181`), M3 SSH details baked into agent.go. Should be configurable.
|
||||
- **No tests for backend_llama and backend_mlx** — Only backend_http_test.go exists for backends.
|
||||
- **score.go concurrency untested** — Semaphore-bounded worker pool has no race condition tests.
|
||||
- **backend_mlx.go imports dead subpackages** — Blocked on Phase 1 migration
|
||||
- **agent.go too large** — 1,070 LOC, SSH + InfluxDB + scoring + publishing mixed together
|
||||
- **Hardcoded infrastructure** — InfluxDB endpoint `10.69.69.165:8181`, M3 SSH details in agent.go
|
||||
- **No tests for backend_llama and backend_mlx** — Only backend_http_test.go exists
|
||||
- **score.go concurrency untested** — No race condition tests
|
||||
- **Message type duplication** — `ml.Message` and `inference.Message` are identical but separate
|
||||
|
|
|
|||
160
TODO.md
|
|
@@ -1,46 +1,132 @@
|
|||
# TODO.md — go-ml Task Queue
|
||||
|
||||
## Phase 1: go-inference Migration
|
||||
|
||||
The big one. `backend_mlx.go` needs rewriting to use `go-inference.TextModel` instead of direct go-mlx imports. This collapses ~253 LOC to ~60 LOC.
|
||||
|
||||
- [ ] **Rewrite backend_mlx.go** — Replace direct go-mlx calls with go-inference TextModel. The current implementation manually handles tokenisation, KV cache, sampling, and token decoding. go-inference wraps all of that behind `TextModel.Generate()` returning `iter.Seq[Token]`.
|
||||
- [ ] **HTTPBackend go-inference wrapper** — HTTPBackend should implement `go-inference.Backend` or wrap it. Currently returns `(string, error)` from Generate; needs an adapter that yields `iter.Seq[Token]` from SSE streams.
|
||||
- [ ] **LlamaBackend go-inference wrapper** — Same treatment as HTTPBackend. llama-server already supports SSE streaming; the adapter reads the stream and yields tokens.
|
||||
- [ ] **Bridge ml.Backend to go-inference** — The old `ml.Backend` interface (`Generate` returns `string`, not `iter.Seq[Token]`) needs a bridging adapter. Write `InferenceAdapter` that wraps `go-inference.TextModel` and collects tokens into a string for the legacy interface.
|
||||
|
||||
## Phase 2: Backend Consolidation
|
||||
|
||||
`StreamingBackend` vs `go-inference.TextModel` overlap. Reconcile: go-inference is the standard, `ml.Backend` is legacy.
|
||||
|
||||
- [ ] **Audit StreamingBackend usage** — Find all callers of `GenerateStream`/`ChatStream`. Determine which can migrate directly to `iter.Seq[Token]`.
|
||||
- [ ] **Migration path** — Keep both interfaces temporarily. Add `BackendAdapter` that wraps go-inference.TextModel and satisfies both `ml.Backend` and `StreamingBackend`.
|
||||
- [ ] **Deprecate StreamingBackend** — Once all callers use go-inference iterators, mark StreamingBackend as deprecated. Remove in a later phase.
|
||||
- [ ] **Unify GenOpts** — `ml.GenOpts` and `go-inference.GenerateOptions` likely overlap. Consolidate into one options struct or add conversion helpers.
|
||||
|
||||
## Phase 3: Agent Loop Modernisation
|
||||
|
||||
`agent.go` (1,070 LOC) is the largest file. SSH checkpoint discovery, InfluxDB streaming. Needs splitting into smaller files.
|
||||
|
||||
- [ ] **Split agent.go** — Decompose into: `agent_config.go` (SSH/infra config), `agent_execute.go` (scoring run orchestration), `agent_eval.go` (result evaluation and publishing), `agent_influx.go` (InfluxDB streaming).
|
||||
- [ ] **Abstract SSH transport** — M3 homelab SSH may change to Linux. Extract SSH checkpoint discovery into an interface so the transport layer is swappable.
|
||||
- [ ] **InfluxDB client modernisation** — Current line protocol writes are hand-rolled. Evaluate using the official InfluxDB Go client library.
|
||||
- [ ] **Configurable endpoints** — Hardcoded `10.69.69.165:8181` and M3 SSH details should come from config/environment, not constants.
|
||||
|
||||
## Phase 4: Test Coverage
|
||||
|
||||
`backend_http_test` exists but `backend_llama` and `backend_mlx` have no tests. `score.go` concurrency needs race condition tests.
|
||||
|
||||
- [ ] **backend_llama_test.go** — Mock llama-server subprocess. Test: model loading, prompt formatting, streaming, error recovery, process lifecycle.
|
||||
- [ ] **backend_mlx_test.go** — Mock go-mlx (or go-inference after Phase 1). Test: darwin/arm64 gating, Metal availability check, generation flow, tokeniser errors.
|
||||
- [ ] **score.go race tests** — Run `go test -race ./...`. Add concurrent scoring tests: multiple suites running simultaneously, semaphore boundary conditions, context cancellation mid-score.
|
||||
- [ ] **Benchmark suite** — Add `BenchmarkHeuristic`, `BenchmarkJudge`, `BenchmarkExact` for various input sizes. No benchmarks exist currently.
|
||||
Dispatched from Virgil in core/go. Pick up tasks in phase order.
|
||||
|
||||
---
|
||||
|
||||
## Standing: Workflow
|
||||
## Phase 1: go-inference Migration (CRITICAL PATH)
|
||||
|
||||
Everything downstream is blocked on this. The old `backend_mlx.go` imports go-mlx subpackages that no longer exist after Phase 4 refactoring.
|
||||
|
||||
### Step 1.1: Add go-inference dependency
|
||||
|
||||
- [ ] **Add `forge.lthn.ai/core/go-inference` to go.mod** — Already has a `replace` directive pointing to `../go-inference`. Run `go get forge.lthn.ai/core/go-inference` then `go mod tidy`. Verify the module resolves.
|
||||
|
||||
### Step 1.2: Write the InferenceAdapter
|
||||
|
||||
- [ ] **Create `adapter.go`** — Bridge between `go-inference.TextModel` (returns `iter.Seq[Token]`) and `ml.Backend` + `ml.StreamingBackend` (returns `string`/callback). Must implement:
|
||||
- `Generate()` — collect tokens from iterator into string
|
||||
- `Chat()` — same, using `TextModel.Chat()`
|
||||
- `GenerateStream()` — forward tokens to `TokenCallback`
|
||||
- `ChatStream()` — same for chat
|
||||
- `Name()` — delegate to `TextModel.ModelType()`
|
||||
- `Available()` — always true (model already loaded)
|
||||
- `convertOpts(GenOpts) []inference.GenerateOption` — map `GenOpts` fields to functional options
|
||||
|
||||
**Key mapping**:
|
||||
```
|
||||
GenOpts.Temperature → inference.WithTemperature(float32(t))
|
||||
GenOpts.MaxTokens → inference.WithMaxTokens(n)
|
||||
GenOpts.Model → (ignored, model already loaded)
|
||||
```
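A minimal sketch of `convertOpts` under the mapping above. The `GenerateConfig`, `WithMaxTokens`, and `WithTemperature` definitions here are local stand-ins assumed to mirror go-inference's functional-options API, not the real package:

```go
package main

import "fmt"

// GenerateConfig and the With* helpers sketch the assumed go-inference
// functional-options pattern; names follow the mapping above but are unverified.
type GenerateConfig struct {
	MaxTokens   int
	Temperature float32
}

type GenerateOption func(*GenerateConfig)

func WithMaxTokens(n int) GenerateOption       { return func(c *GenerateConfig) { c.MaxTokens = n } }
func WithTemperature(t float32) GenerateOption { return func(c *GenerateConfig) { c.Temperature = t } }

// GenOpts is the legacy go-ml options struct.
type GenOpts struct {
	Temperature float64
	MaxTokens   int
	Model       string // ignored: the model is already loaded
}

// convertOpts maps legacy GenOpts onto functional options, skipping zero values.
func convertOpts(o GenOpts) []GenerateOption {
	var opts []GenerateOption
	if o.MaxTokens > 0 {
		opts = append(opts, WithMaxTokens(o.MaxTokens))
	}
	if o.Temperature > 0 {
		opts = append(opts, WithTemperature(float32(o.Temperature)))
	}
	return opts
}

func main() {
	var cfg GenerateConfig
	for _, opt := range convertOpts(GenOpts{Temperature: 0.7, MaxTokens: 2048}) {
		opt(&cfg)
	}
	fmt.Println(cfg.MaxTokens, cfg.Temperature)
}
```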
|
||||
|
||||
**Error handling**: After the iterator completes, check `model.Err()` to distinguish EOS from errors (OOM, ctx cancelled).
|
||||
|
||||
- [ ] **Test adapter.go** — Test with a mock `inference.TextModel` that yields predetermined tokens. Test cases:
|
||||
- Normal generation (collect tokens → string)
|
||||
- Streaming (each token hits callback)
|
||||
- Callback error stops iteration
|
||||
- Context cancellation propagates
|
||||
- Empty output (EOS immediately)
|
||||
- Model error after partial output
|
||||
|
||||
### Step 1.3: Rewrite backend_mlx.go
|
||||
|
||||
- [ ] **Replace backend_mlx.go** — Delete the 253 LOC that manually handle tokenisation, KV cache, sampling, and memory cleanup. Replace with ~60 LOC:
|
||||
```go
//go:build darwin && arm64

package ml

import (
	"fmt"

	"forge.lthn.ai/core/go-inference"
	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend
)

func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
	m, err := inference.LoadModel(modelPath)
	if err != nil {
		return nil, fmt.Errorf("mlx: %w", err)
	}
	return &InferenceAdapter{model: m, name: "mlx"}, nil
}
```
|
||||
The `InferenceAdapter` from Step 1.2 handles all the Generate/Chat/Stream logic.
|
||||
|
||||
- [ ] **Preserve memory controls** — The old `MLXBackend` set cache/memory limits (16GB/24GB). These should be configurable. Options:
|
||||
- Accept memory limits in `NewMLXBackend` params
|
||||
- Or set them in `InferenceAdapter` wrapper
|
||||
- go-mlx exposes `SetCacheLimit()` / `SetMemoryLimit()` at package level
|
||||
|
||||
- [ ] **Test backend_mlx.go** — Verify the new backend can:
|
||||
- Load a model via go-inference registry
|
||||
- Generate text (smoke test, requires model on disk)
|
||||
- Stream tokens via callback
|
||||
- Handle Metal availability check (build tag gating)
|
||||
|
||||
### Step 1.4: HTTPBackend and LlamaBackend wrappers
|
||||
|
||||
- [ ] **HTTPBackend go-inference wrapper** — HTTPBackend already works fine as `ml.Backend`. For go-inference compatibility, write a thin wrapper that implements `inference.TextModel`:
|
||||
- `Generate()` calls HTTP API, yields entire response as single Token
|
||||
- `Chat()` same
|
||||
- This is lower priority than MLX — HTTP backends don't need the full iter.Seq pattern
|
||||
- Consider SSE streaming: `/v1/chat/completions` with `"stream": true` returns SSE events that CAN be yielded as `iter.Seq[Token]`
|
||||
|
||||
- [ ] **LlamaBackend go-inference wrapper** — LlamaBackend delegates to HTTPBackend already. Same treatment.
|
||||
|
||||
### Step 1.5: Verify downstream consumers
|
||||
|
||||
- [ ] **Service.Generate() still works** — `service.go` calls `Backend.Generate()`. After migration, backends wrapped in `InferenceAdapter` must still satisfy `ml.Backend`.
|
||||
- [ ] **Judge still works** — `judge.go` uses `Backend.Generate()` for LLM-as-judge. Verify scoring pipeline runs end-to-end.
|
||||
- [ ] **go-ai tools_ml.go** — Uses `ml.Service` directly. No code changes needed in go-ai if `ml.Backend` interface is preserved.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Backend Consolidation
|
||||
|
||||
After Phase 1, both `ml.Backend` (string) and `inference.TextModel` (iterator) coexist. Reconcile.
|
||||
|
||||
- [ ] **Audit StreamingBackend usage** — Find all callers of `GenerateStream`/`ChatStream`. Determine which can migrate to `iter.Seq[Token]`.
|
||||
- [ ] **Deprecate StreamingBackend** — Once all callers use go-inference iterators, mark StreamingBackend as deprecated.
|
||||
- [ ] **Unify GenOpts** — `ml.GenOpts` and `inference.GenerateConfig` overlap. Add `convertOpts()` in Phase 1, consolidate into one struct later.
|
||||
- [ ] **Unify Message types** — `ml.Message` and `inference.Message` are identical structs. Consider type alias or shared import.
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Agent Loop Modernisation
|
||||
|
||||
`agent.go` (1,070 LOC) is the largest file. Decompose.
|
||||
|
||||
- [ ] **Split agent.go** — Into: `agent_config.go` (config, model maps), `agent_execute.go` (run loop, checkpoint processing), `agent_eval.go` (probe evaluation, result publishing), `agent_influx.go` (InfluxDB streaming, JSONL buffer).
|
||||
- [ ] **Abstract SSH transport** — Extract SSH checkpoint discovery into interface. Current M3 homelab SSH may change to Linux (go-rocm).
|
||||
- [ ] **Configurable endpoints** — `10.69.69.165:8181` and M3 SSH details hardcoded. Move to config/environment.
|
||||
- [ ] **InfluxDB client** — Hand-rolled line protocol. Evaluate official InfluxDB Go client.
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Test Coverage
|
||||
|
||||
- [ ] **backend_llama_test.go** — Mock llama-server subprocess. Test: model loading, health checks, process lifecycle.
|
||||
- [ ] **backend_mlx_test.go** — After Phase 1 rewrite, test with mock go-inference TextModel.
|
||||
- [ ] **score.go race tests** — `go test -race ./...`. Concurrent scoring, semaphore boundaries, context cancellation.
|
||||
- [ ] **Benchmark suite** — `BenchmarkHeuristic`, `BenchmarkJudge`, `BenchmarkExact` for various input sizes.
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
1. Virgil in core/go writes tasks here after research
|
||||
2. This repo's session picks up tasks in phase order
|
||||
3. Mark `[x]` when done, note commit hash
|
||||
4. Phase 1 is the critical path — everything else builds on go-inference migration
|
||||
4. New discoveries → add tasks, note in FINDINGS.md
|
||||
5. Push to forge after each completed step: `git push forge main`
|
||||
|
|
|
|||
4
go.mod
|
|
@@ -31,6 +31,8 @@ require (
|
|||
golang.org/x/xerrors v0.0.0-20240903120638-7835f813f4da // indirect
|
||||
)
|
||||
|
||||
replace forge.lthn.ai/core/go => ../core
|
||||
replace forge.lthn.ai/core/go => ../host-uk/core
|
||||
|
||||
replace forge.lthn.ai/core/go-mlx => ../go-mlx
|
||||
|
||||
replace forge.lthn.ai/core/go-inference => ../go-inference
|
||||
|
|
|
|||