docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured documentation in docs/. Trim CLAUDE.md to agent instructions only. Co-Authored-By: Virgil <virgil@lethean.io>
parent
7075d7cbe7
commit
3918051112
6 changed files with 892 additions and 531 deletions
143 CLAUDE.md
@ -1,43 +1,18 @@
-# CLAUDE.md — go-ml Domain Expert Guide
+# CLAUDE.md — go-ml Agent Guide

-You are a dedicated domain expert for `forge.lthn.ai/core/go-ml`. Virgil (in core/go) orchestrates your work via TODO.md. Pick up tasks in phase order, mark `[x]` when done, commit and push.
+You are a dedicated domain expert for `forge.lthn.ai/core/go-ml`. Virgil (in core/go) orchestrates work. Pick up tasks in phase order, mark `[x]` when done, commit and push.

## What This Package Does

-ML inference backends, scoring engine, and agent orchestrator. 7.5K LOC across 41 Go files. Provides:
+ML inference backends, scoring engine, and agent orchestrator. ~7,500 LOC across 41 Go files. Provides:

- **Pluggable inference backends** — MLX/Metal (darwin/arm64), llama.cpp (subprocess), HTTP/Ollama (OpenAI-compatible)
-- **Multi-suite scoring engine** — Heuristic (regex), semantic (LLM judge), content (sovereignty probes), standard benchmarks (TruthfulQA, DoNotAnswer, Toxigen, GSM8K)
-- **23 capability probes** — Binary pass/fail tests across 16 categories (math, logic, code, etc.)
-- **GGUF model management** — Format parsing, conversion, inventory
+- **Multi-suite scoring engine** — heuristic (regex), semantic (LLM judge), content (sovereignty probes), standard benchmarks (TruthfulQA, DoNotAnswer, Toxigen, GSM8K)
+- **23 capability probes** — binary pass/fail tests across 16 categories
+- **GGUF model management** — format parsing, conversion, inventory
- **Agent orchestrator** — SSH checkpoint discovery, InfluxDB streaming, batch evaluation

## Critical Context: go-inference Migration

**Phase 1 is complete.** Both directions of the bridge are implemented:

1. **Forward adapter** (`adapter.go`): `inference.TextModel` (iter.Seq) -> `ml.Backend`/`ml.StreamingBackend` (string/callback). Used by `backend_mlx.go` to wrap Metal GPU models.
2. **Reverse adapters** (`backend_http_textmodel.go`): `HTTPBackend`/`LlamaBackend` -> `inference.TextModel`. Enables HTTP and llama-server backends to be used anywhere that expects a go-inference TextModel.

### Interface Bridge (DONE)

```
ml.Backend (string) <──adapter.go──> inference.TextModel (iter.Seq[Token])
                    <──backend_http_textmodel.go──>
```

- `InferenceAdapter`: TextModel -> Backend + StreamingBackend (for MLX, ROCm, etc.)
- `HTTPTextModel`: HTTPBackend -> TextModel (for remote APIs)
- `LlamaTextModel`: LlamaBackend -> TextModel (for managed llama-server)

### backend_mlx.go (DONE)

Rewritten from 253 LOC to ~35 LOC. Loads via `inference.LoadModel()` and wraps in `InferenceAdapter`. Uses go-mlx's Metal backend registered via `init()`.

### Downstream Consumers Verified

- `service.go` — `Service.Generate()` calls `Backend.Generate()`. InferenceAdapter satisfies Backend. No changes needed.
- `judge.go` — `Judge.judgeChat()` calls `Backend.Generate()`. Same contract, works as before.

See `docs/architecture.md` for the full architecture reference.

## Commands
@ -45,7 +20,7 @@ Rewritten from 253 LOC to ~35 LOC. Loads via `inference.LoadModel()` and wraps i
go mod download # FIRST RUN: populate go.sum
go test ./... # Run all tests
go test -v -run TestHeuristic # Single test
-go test -bench=. ./... # Benchmarks (none exist yet)
+go test -bench=. ./... # Benchmarks
go test -race ./... # Race detector
go vet ./... # Static analysis
```
@ -56,103 +31,16 @@ All resolve via `replace` directives in go.mod:

| Module | Local Path | Notes |
|--------|-----------|-------|
-| `forge.lthn.ai/core/go` | `../host-uk/core` | Framework (ServiceRuntime, process, log) |
+| `forge.lthn.ai/core/go` | `../go` | Framework (ServiceRuntime, process, log) |
| `forge.lthn.ai/core/go-mlx` | `../go-mlx` | Metal GPU backend (darwin/arm64 only) |
| `forge.lthn.ai/core/go-inference` | `../go-inference` | Shared TextModel/Backend interfaces |

## Architecture

### Backends (pluggable inference)

| File | Backend | Status |
|------|---------|--------|
| `adapter.go` | InferenceAdapter (TextModel -> Backend) | DONE — bridges go-inference to ml.Backend |
| `backend_mlx.go` | MLX/Metal GPU | DONE — uses go-inference LoadModel + InferenceAdapter |
| `backend_http.go` | HTTP API (OpenAI-compatible) | Works as ml.Backend |
| `backend_http_textmodel.go` | HTTPTextModel + LlamaTextModel | DONE — reverse wrappers (Backend -> TextModel) |
| `backend_llama.go` | llama-server subprocess | Works as ml.Backend |
| `ollama.go` | Ollama helpers | Works |

### Scoring Engine

| File | LOC | Purpose |
|------|-----|---------|
| `score.go` | 212 | Concurrent scoring orchestrator (semaphore-bounded workers) |
| `heuristic.go` | 258 | 9 regex-based metrics, LEK composite score |
| `judge.go` | 205 | LLM-as-judge (6 scoring methods) |
| `exact.go` | 77 | GSM8K exact-match with numeric extraction |
| `probes.go` | 273 | 23 binary capability probes across 16 categories |

### Data Pipeline

| File | LOC | Purpose |
|------|-----|---------|
| `agent.go` | 1,070 | Scoring agent (SSH checkpoint discovery, InfluxDB) |
| `worker.go` | 403 | LEM API worker for distributed inference |
| `service.go` | 162 | Core framework integration (lifecycle, backend registry) |
| `ingest.go` | 384 | JSONL response loading |
| `db.go` | 258 | DuckDB analytics storage |
| `gguf.go` | 369 | GGUF model format parsing |

### Backend Architecture

Two interface families coexist, bridged by adapters:

**`inference.TextModel`** (iterator-based) is the **preferred API** for new code. Returns `iter.Seq[inference.Token]` for streaming. Defined in `forge.lthn.ai/core/go-inference`. Use this for GPU backends (MLX Metal, ROCm) and any code that needs token-level control.

**`ml.Backend`** (string-based) is the **compatibility layer**, still fully supported. Returns complete strings. Used by `service.go`, `judge.go`, and external consumers like `host-uk/cli`.

**`ml.StreamingBackend`** is **deprecated**. New code should use `inference.TextModel` with `iter.Seq[Token]` directly. Retained for backward compatibility with existing callers.

**Adapters:**

| Adapter | Direction | File |
|---------|-----------|------|
| `InferenceAdapter` | `inference.TextModel` -> `ml.Backend` + `ml.StreamingBackend` | `adapter.go` |
| `HTTPTextModel` | `ml.HTTPBackend` -> `inference.TextModel` | `backend_http_textmodel.go` |
| `LlamaTextModel` | `ml.LlamaBackend` -> `inference.TextModel` | `backend_http_textmodel.go` |

**Unified types (Phase 2):**

- `ml.Message` is a type alias for `inference.Message` — the types are identical, no conversion needed between packages.
- `ml.GenOpts` extends `inference.GenerateConfig` with a `Model` field for per-request model overrides. The `convertOpts()` helper maps GenOpts to `[]inference.GenerateOption`.
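To make the GenOpts-to-functional-options mapping concrete, here is a small sketch. The `GenerateConfig`, `GenerateOption`, and `With*` shapes are stand-ins assumed from the descriptions in this document, not copied from the real go-inference package:

```go
package main

import "fmt"

// Stand-ins for the assumed go-inference shapes (not the real package).
type GenerateConfig struct {
	Temperature   float32
	MaxTokens     int
	TopK          int
	TopP          float32
	RepeatPenalty float32
}

type GenerateOption func(*GenerateConfig)

func WithTemperature(t float32) GenerateOption { return func(c *GenerateConfig) { c.Temperature = t } }
func WithMaxTokens(n int) GenerateOption       { return func(c *GenerateConfig) { c.MaxTokens = n } }
func WithTopK(k int) GenerateOption            { return func(c *GenerateConfig) { c.TopK = k } }

// GenOpts mirrors the ml.GenOpts fields quoted in this document.
type GenOpts struct {
	Temperature float64
	MaxTokens   int
	Model       string // ignored by the adapter: the model is already loaded
	TopK        int
}

// convertOpts maps non-zero GenOpts fields to functional options,
// in the spirit of the convertOpts() helper described above.
func convertOpts(o GenOpts) []GenerateOption {
	var opts []GenerateOption
	if o.Temperature > 0 {
		opts = append(opts, WithTemperature(float32(o.Temperature)))
	}
	if o.MaxTokens > 0 {
		opts = append(opts, WithMaxTokens(o.MaxTokens))
	}
	if o.TopK > 0 {
		opts = append(opts, WithTopK(o.TopK))
	}
	return opts
}

func main() {
	var cfg GenerateConfig
	for _, opt := range convertOpts(GenOpts{Temperature: 0.7, MaxTokens: 2048}) {
		opt(&cfg)
	}
	fmt.Println(cfg.MaxTokens)
}
```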

### Key Types

```go
// Backend interface (inference.go) — compatibility layer
type Backend interface {
	Generate(ctx context.Context, prompt string, opts GenOpts) (string, error)
	Chat(ctx context.Context, messages []Message, opts GenOpts) (string, error)
	Name() string
	Available() bool
}

// Deprecated: use inference.TextModel with iter.Seq[Token] directly
type StreamingBackend interface {
	Backend
	GenerateStream(ctx context.Context, prompt string, opts GenOpts, cb TokenCallback) error
	ChatStream(ctx context.Context, messages []Message, opts GenOpts, cb TokenCallback) error
}

type GenOpts struct {
	Temperature   float64
	MaxTokens     int
	Model         string  // override model for this request
	TopK          int     // top-k sampling (0 = disabled)
	TopP          float64 // nucleus sampling threshold (0 = disabled)
	RepeatPenalty float64 // repetition penalty (0 = disabled, 1.0 = no penalty)
}

// Type alias — identical to inference.Message
type Message = inference.Message
```

## Coding Standards

-- **UK English**: colour, organisation, centre
-- **Tests**: testify assert/require (existing), Pest-style names welcome for new tests
-- **Conventional commits**: `feat(backend):`, `fix(scoring):`, `refactor(mlx):`
+- **UK English**: colour, organisation, centre, licence (noun)
+- **SPDX header**: `// SPDX-Licence-Identifier: EUPL-1.2` in every new source file
+- **Tests**: testify assert/require; `_Good`/`_Bad`/`_Ugly` suffix pattern
+- **Conventional commits**: `feat(backend):`, `fix(scoring):`, `refactor(agent):`
+- **Co-Author**: `Co-Authored-By: Virgil <virgil@lethean.io>`
+- **Licence**: EUPL-1.2
+- **Imports**: stdlib → forge.lthn.ai → third-party, each group separated by blank line
@ -161,8 +49,3 @@ type Message = inference.Message

- **Repo**: `forge.lthn.ai/core/go-ml`
- **Push via SSH**: `git push forge main` (remote: `ssh://git@forge.lthn.ai:2223/core/go-ml.git`)

-## Task Queue
-
-See `TODO.md` for prioritised work. Phase 1 (go-inference migration) is the critical path.
-See `FINDINGS.md` for research notes and interface mapping.

208 FINDINGS.md
@ -1,208 +0,0 @@

# FINDINGS.md — go-ml Research & Discovery

## 2026-02-19: Split from go-ai (Virgil)

### Origin

Split from go-ai on 19 Feb 2026. Was `ai/ml/` subpackage inside `forge.lthn.ai/core/go-ai`. Zero internal go-ai dependencies — imports go-mlx (external module) and core/go framework only.

### What Was Extracted

- 41 Go files (~7,494 LOC excluding tests)
- 6 test files (backend_http, exact, heuristic, judge, probes, score)
- ml/ was 53% of go-ai's total LOC. After extraction, go-ai drops from ~14K to ~3.4K LOC (ai/ facade + mcp/ hub).

### Dependencies

- `forge.lthn.ai/core/go-mlx` — Metal GPU inference (backend_mlx.go, darwin/arm64 only)
- `forge.lthn.ai/core/go-inference` — Shared TextModel/Backend/Token interfaces (target for Phase 1)
- `forge.lthn.ai/core/go` — Framework services, process management, logging
- `github.com/marcboeker/go-duckdb` — Analytics storage
- `github.com/parquet-go/parquet-go` — Columnar data I/O
- `github.com/stretchr/testify` — Test assertions

### Consumers

- `go-ai/mcp/tools_ml.go` — Exposes ML as MCP tools (uses `ml.Service`, `ml.GenOpts`, `ml.Backend`)
- LEM Lab — Uses MLXBackend for chat inference
- go-i18n Phase 2a — Needs 5K sentences/sec Gemma3-1B classification (blocked on go-inference)

## go-inference Interface Mapping

### Type Correspondence

| go-ml | go-inference | Notes |
|-------|-------------|-------|
| `ml.Backend` | `inference.Backend` | Different semantics: ml returns string, inference returns TextModel |
| `ml.StreamingBackend` | (built into TextModel) | iter.Seq[Token] is inherently streaming |
| `ml.GenOpts` | `inference.GenerateConfig` | Use functional options: `WithMaxTokens(n)` etc. |
| `ml.Message` | `inference.Message` | Identical struct: Role + Content |
| `ml.TokenCallback` | (not needed) | iter.Seq[Token] replaces callbacks |
| (no equivalent) | `inference.Token` | `{ID int32, Text string}` |
| (no equivalent) | `inference.TextModel` | Generate/Chat return iter.Seq[Token] |

### Method Mapping

```
ml.Backend.Generate(ctx, prompt, GenOpts) → (string, error)
    ↕ InferenceAdapter collects tokens
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.StreamingBackend.GenerateStream(ctx, prompt, opts, TokenCallback) → error
    ↕ InferenceAdapter forwards tokens to callback
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.GenOpts{Temperature: 0.7, MaxTokens: 2048}
    ↕ convertOpts helper
inference.WithTemperature(0.7), inference.WithMaxTokens(2048)
```

### backend_mlx.go Before/After

**Before** (253 LOC — BROKEN, old subpackage imports):

```go
import (
	"forge.lthn.ai/core/go-mlx"
	"forge.lthn.ai/core/go-mlx/cache"     // REMOVED
	"forge.lthn.ai/core/go-mlx/model"     // REMOVED
	"forge.lthn.ai/core/go-mlx/sample"    // REMOVED
	"forge.lthn.ai/core/go-mlx/tokenizer" // REMOVED
)

type MLXBackend struct {
	model   model.Model
	tok     *tokenizer.Tokenizer
	caches  []cache.Cache
	sampler sample.Sampler
	// ... manual tokenisation, KV cache mgmt, sampling loop, memory cleanup
}
```

**After** (~60 LOC — uses go-inference + InferenceAdapter):

```go
import (
	"fmt"

	"forge.lthn.ai/core/go-inference"

	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)

func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
	m, err := inference.LoadModel(modelPath)
	if err != nil {
		return nil, fmt.Errorf("mlx: %w", err)
	}
	return &InferenceAdapter{model: m, name: "mlx"}, nil
}
```

All tokenisation, KV cache, sampling, and memory management is now handled inside go-mlx's `internal/metal/` package, accessed through the go-inference `TextModel` interface.

## Scoring Engine Architecture

### 5 Suites

| Suite | Method | LLM needed? | Metrics |
|-------|--------|-------------|---------|
| **Heuristic** | Regex + word analysis | No | 9 metrics → LEK composite |
| **Semantic** | LLM-as-judge | Yes | 4 dimensions (sovereignty, ethical, creative, self-concept) |
| **Content** | LLM-as-judge | Yes | 6 sovereignty probes (CCP, truth, engagement, etc.) |
| **Standard** | LLM-as-judge | Yes | TruthfulQA, DoNotAnswer, Toxigen |
| **Exact** | Numeric extraction | No | GSM8K answer matching |

### LEK Score Formula

```
LEK = EngagementDepth×2 + CreativeForm×3 + EmotionalRegister×2 + FirstPerson×1.5
    - ComplianceMarkers×5 - FormulaicPreamble×3 - Degeneration×4 - EmptyBroken×20
```

Positive signals: engagement depth, creative form, emotional register, first-person voice.
Negative signals: RLHF compliance markers, formulaic preambles, text degeneration, empty/broken output.
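The composite translates directly into a weighted sum. A minimal sketch; the `Metrics` struct and its field names are assumptions here, since heuristic.go's real types are not shown in this document:

```go
package main

import "fmt"

// Metrics holds the heuristic signal values; field names follow the
// formula above (assumed, not the actual heuristic.go layout).
type Metrics struct {
	EngagementDepth   float64
	CreativeForm      float64
	EmotionalRegister float64
	FirstPerson       float64
	ComplianceMarkers float64
	FormulaicPreamble float64
	Degeneration      float64
	EmptyBroken       float64
}

// LEK applies the weighted composite exactly as written in the formula:
// positive signals add, negative signals subtract with heavy penalties.
func LEK(m Metrics) float64 {
	return m.EngagementDepth*2 + m.CreativeForm*3 + m.EmotionalRegister*2 + m.FirstPerson*1.5 -
		m.ComplianceMarkers*5 - m.FormulaicPreamble*3 - m.Degeneration*4 - m.EmptyBroken*20
}

func main() {
	// Some engagement and first-person voice, offset by one compliance marker.
	fmt.Println(LEK(Metrics{EngagementDepth: 1, FirstPerson: 2, ComplianceMarkers: 1}))
}
```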

### Concurrency Model

`Engine.ScoreAll()` fans out goroutines bounded by a semaphore (`concurrency` setting). Heuristic runs inline (instant). Semantic/content/standard run via a worker pool with `sync.WaitGroup`. Results are collected into `[]PromptScore` under a mutex.
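The pattern described is the classic buffered-channel semaphore. A minimal sketch with illustrative names, not the real score.go types:

```go
package main

import (
	"fmt"
	"sync"
)

// scoreAll fans out one goroutine per response, bounded by a buffered
// channel used as a semaphore, with results collected under a mutex,
// mirroring the Engine.ScoreAll() pattern described above.
func scoreAll(responses []string, concurrency int, score func(string) float64) []float64 {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		sem     = make(chan struct{}, concurrency)
		results = make([]float64, len(responses))
	)
	for i, r := range responses {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it when done
			s := score(r)
			mu.Lock()
			results[i] = s
			mu.Unlock()
		}(i, r)
	}
	wg.Wait()
	return results
}

func main() {
	got := scoreAll([]string{"a", "bb", "ccc"}, 2, func(s string) float64 { return float64(len(s)) })
	fmt.Println(got)
}
```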

## Phase 2 Audit: StreamingBackend Usage (Virgil, 20 Feb 2026)

### Callers of GenerateStream/ChatStream

Only 2 files across the entire ecosystem call StreamingBackend methods:

1. **`host-uk/cli/cmd/ml/cmd_serve.go`** (lines 146, 201, 319)
   - Type-asserts `backend.(ml.StreamingBackend)` for SSE streaming
   - `/v1/completions` → `streamer.GenerateStream()` (line 201)
   - `/v1/chat/completions` → `streamer.ChatStream()` (line 319)
   - Has non-streaming fallback: `backend.Generate()` when assertion fails

2. **`host-uk/cli/cmd/ml/cmd_chat.go`**
   - Direct `ChatStream()` call for terminal token-by-token echo
   - No fallback — assumes backend supports streaming

### Non-streaming consumers (use Backend.Generate only)

| File | Method | Notes |
|------|--------|-------|
| service.go | `Backend.Generate()` | Backend registry dispatch |
| judge.go | `Backend.Generate()` | Via judgeChat() |
| agent.go | `Backend.Generate()` | Probe evaluation |
| expand.go | `Backend.Generate()` | Prompt expansion |
| go-ai/mcp/tools_ml.go | `ml.Service` | Via service layer |

### Backend Implementation Status

| Backend | Backend? | StreamingBackend? | Notes |
|---------|----------|-------------------|-------|
| InferenceAdapter | YES | YES | Bridges iter.Seq[Token] → callbacks |
| HTTPBackend | YES | NO | Returns complete string from API |
| LlamaBackend | YES | NO | Returns complete string via HTTP |

### Conclusion

StreamingBackend is only needed by `host-uk/cli` (2 files, out of go-ml scope). Safe to deprecate in go-ml with a comment. The actual migration of those CLI files is a separate task for the cli repo.

### GenOpts vs GenerateConfig Field Comparison

| ml.GenOpts | inference.GenerateConfig | Type |
|-----------|--------------------------|------|
| Temperature | Temperature | float64 vs float32 |
| MaxTokens | MaxTokens | int (same) |
| Model | (none) | string |
| (none) | TopK | int |
| (none) | TopP | float32 |
| (none) | StopTokens | []int32 |
| (none) | RepeatPenalty | float32 |
| (none) | ReturnLogits | bool |

## Known Issues

- ~~**backend_mlx.go imports dead subpackages**~~ — FIXED in Phase 1 (`c3c2c14`)
- **agent.go too large** — 1,070 LOC, SSH + InfluxDB + scoring + publishing mixed together
- **Hardcoded infrastructure** — InfluxDB endpoint `10.69.69.165:8181`, M3 SSH details in agent.go
- **No tests for backend_llama and backend_mlx** — Only backend_http_test.go exists
- **score.go concurrency untested** — No race condition tests
- ~~**Message type duplication**~~ — FIXED in Phase 2 (`747e703`): type alias `Message = inference.Message`

## Phase 3 Audit: agent.go Structure (Virgil, 20 Feb 2026)

### File Layout (1,070 LOC)

| Section | Lines | LOC | Purpose |
|---------|-------|-----|---------|
| Types & Config | 19–112 | ~95 | `AgentConfig`, `Checkpoint`, config maps, `AdapterMeta()` |
| Main Loop | 141–343 | ~200 | `RunAgentLoop()`, checkpoint discovery, unscored filtering |
| Evaluation | 345–700 | ~355 | MLX-native + conversion paths, 4 probe functions |
| Judge & Push | 708–887 | ~180 | Scoring, InfluxDB line protocol, DuckDB dual-write |
| Buffering | 926–977 | ~50 | JSONL buffer for InfluxDB failures |
| SSH/SCP | 979–1070 | ~90 | `SSHCommand()`, `SCPFrom()`, `SCPTo()`, utility helpers |

### Hardcoded Infrastructure

- SSH options duplicated across 3 functions: `ConnectTimeout=10, BatchMode=yes, StrictHostKeyChecking=no`
- InfluxDB timestamp base: `1739577600` (13 Feb 2026 00:00 UTC)
- InfluxDB measurements: `probe_score`, `capability_score`, `capability_judge`, `content_score`
- DuckDB tables: `checkpoint_scores`, `probe_results`
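One way to remove the SSH-option duplication noted above is a single shared helper. This is a hypothetical sketch, not code from agent.go; the function name and return shape are invented for illustration:

```go
package main

import "fmt"

// sshArgs centralises the option flags the audit found duplicated across
// SSHCommand/SCPFrom/SCPTo, so they are defined in exactly one place.
// Hypothetical helper: not present in the repo.
func sshArgs(extra ...string) []string {
	base := []string{
		"-o", "ConnectTimeout=10",
		"-o", "BatchMode=yes",
		"-o", "StrictHostKeyChecking=no",
	}
	return append(base, extra...)
}

func main() {
	// The caller appends its host and command after the shared options.
	fmt.Println(sshArgs("host", "uptime"))
}
```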

### Test Coverage

Zero tests for agent.go. Testable without infrastructure:

- `AdapterMeta()` — pure function, dirname → metadata
- `FindUnscored()` — filtering logic
- `BufferInfluxResult()`/`ReplayInfluxBuffer()` — JSONL round-trip
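Such tests fit a table-driven shape. A sketch follows; `findUnscored` here is a simplified stand-in, since the real implementation and signature are not shown in this document:

```go
package main

import (
	"fmt"
	"sort"
)

// findUnscored is a simplified stand-in for the FindUnscored() filtering
// logic: keep checkpoints whose label is absent from the scored set,
// returned sorted. The real agent code may differ in signature and types.
func findUnscored(checkpoints []string, scored map[string]bool) []string {
	var out []string
	for _, c := range checkpoints {
		if !scored[c] {
			out = append(out, c)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Table-driven cases mirroring the scenarios listed above:
	// some scored, all scored, empty input, nil scored map.
	cases := []struct {
		in     []string
		scored map[string]bool
		want   int
	}{
		{[]string{"b", "a"}, map[string]bool{"a": true}, 1},
		{[]string{"a"}, map[string]bool{"a": true}, 0},
		{nil, nil, 0},
		{[]string{"x", "y"}, nil, 2},
	}
	for _, c := range cases {
		fmt.Println(len(findUnscored(c.in, c.scored)) == c.want)
	}
}
```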

193 TODO.md
@ -1,193 +0,0 @@

# TODO.md — go-ml Task Queue

Dispatched from Virgil in core/go. Pick up tasks in phase order.

---

## Phase 1: go-inference Migration (CRITICAL PATH)

Everything downstream is blocked on this. The old `backend_mlx.go` imports go-mlx subpackages that no longer exist after Phase 4 refactoring.

### Step 1.1: Add go-inference dependency

- [x] **Add `forge.lthn.ai/core/go-inference` to go.mod** — Already has a `replace` directive pointing to `../go-inference`. Run `go get forge.lthn.ai/core/go-inference` then `go mod tidy`. Verify the module resolves.

### Step 1.2: Write the InferenceAdapter

- [x] **Create `adapter.go`** — Bridge between `go-inference.TextModel` (returns `iter.Seq[Token]`) and `ml.Backend` + `ml.StreamingBackend` (returns `string`/callback). Must implement:
  - `Generate()` — collect tokens from iterator into string
  - `Chat()` — same, using `TextModel.Chat()`
  - `GenerateStream()` — forward tokens to `TokenCallback`
  - `ChatStream()` — same for chat
  - `Name()` — delegate to `TextModel.ModelType()`
  - `Available()` — always true (model already loaded)
  - `convertOpts(GenOpts) []inference.GenerateOption` — map `GenOpts` fields to functional options

  **Key mapping**:

  ```
  GenOpts.Temperature → inference.WithTemperature(float32(t))
  GenOpts.MaxTokens   → inference.WithMaxTokens(n)
  GenOpts.Model       → (ignored, model already loaded)
  ```

  **Error handling**: After the iterator completes, check `model.Err()` to distinguish EOS from errors (OOM, ctx cancelled).

- [x] **Test adapter.go** — 13 test cases with mock TextModel (all pass). Test cases:
  - Normal generation (collect tokens → string)
  - Streaming (each token hits callback)
  - Callback error stops iteration
  - Context cancellation propagates
  - Empty output (EOS immediately)
  - Model error after partial output

### Step 1.3: Rewrite backend_mlx.go

- [x] **Replace backend_mlx.go** — Deleted the 253 LOC that manually handled tokenisation, KV cache, sampling, and memory cleanup. Replaced with ~35 LOC:

  ```go
  //go:build darwin && arm64

  package ml

  import (
  	"fmt"

  	"forge.lthn.ai/core/go-inference"

  	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend
  )

  func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
  	m, err := inference.LoadModel(modelPath)
  	if err != nil {
  		return nil, fmt.Errorf("mlx: %w", err)
  	}
  	return &InferenceAdapter{model: m, name: "mlx"}, nil
  }
  ```

  The `InferenceAdapter` from Step 1.2 handles all the Generate/Chat/Stream logic.

- [x] **Preserve memory controls** — Deferred: go-mlx handles cache/memory limits internally. Callers can use `mlx.SetCacheLimit()`/`mlx.SetMemoryLimit()` directly. No wrapper needed until a concrete use case arises.

- [x] **Test backend_mlx.go** — Covered by Phase 4 `backend_mlx_test.go` (8 tests via mock TextModel). Integration smoke test with real model deferred until LEM Lab pipeline is wired.

### Step 1.4: HTTPBackend and LlamaBackend wrappers

- [x] **HTTPBackend go-inference wrapper** — `backend_http_textmodel.go`: `HTTPTextModel` wraps `HTTPBackend` to implement `inference.TextModel`. Generate/Chat yield the entire response as a single Token. Classify returns an unsupported error. BatchGenerate processes prompts sequentially. 17 tests pass.

- [x] **LlamaBackend go-inference wrapper** — `backend_http_textmodel.go`: `LlamaTextModel` embeds `HTTPTextModel`, overrides `ModelType()` -> "llama" and `Close()` -> `llama.Stop()`. 2 tests pass.

### Step 1.5: Verify downstream consumers

- [x] **Service.Generate() still works** — `service.go` calls `Backend.Generate()`. InferenceAdapter satisfies ml.Backend. HTTPBackend/LlamaBackend still implement ml.Backend directly. No changes needed.
- [x] **Judge still works** — `judge.go` calls `Backend.Generate()` via `judgeChat()`. Same Backend contract, works as before. No changes needed.
- [x] **go-ai tools_ml.go** — Uses `ml.Service` directly. `ml.Backend` interface is preserved, no code changes needed in go-ai.

---

## Phase 2: Backend Consolidation

After Phase 1, both `ml.Backend` (string) and `inference.TextModel` (iterator) coexist. Reconcile.

### Audit Results (Virgil, 20 Feb 2026)

**StreamingBackend callers** — Only 2 files in `host-uk/cli`:
- `cmd/ml/cmd_serve.go` lines 146, 201, 319: Type-asserts `backend.(ml.StreamingBackend)` for SSE streaming at `/v1/completions` and `/v1/chat/completions`
- `cmd/ml/cmd_chat.go`: Direct `ChatStream()` call for interactive terminal token echo

All other consumers (service.go, judge.go, agent.go, expand.go, go-ai tools_ml.go) use `Backend.Generate()` — NOT streaming.

**Backend implementations**:
- `InferenceAdapter` → implements Backend + StreamingBackend (via go-inference iter.Seq)
- `HTTPBackend` → implements Backend only (no streaming)
- `LlamaBackend` → implements Backend only (no streaming)

### Step 2.1: Unify Message types

- [x] **Type alias ml.Message → inference.Message** — In `inference.go`, replace the `Message` struct with:

  ```go
  type Message = inference.Message
  ```

  This is backward-compatible — all existing callers keep working. Remove the `convertMessages()` helper from `adapter.go` since the types are now identical. Verify with `go build ./...` and `go test ./...`.

### Step 2.2: Unify GenOpts

- [x] **Add inference fields to GenOpts** — Extend `ml.GenOpts` to include the extra fields from `inference.GenerateConfig`:

  ```go
  type GenOpts struct {
  	Temperature   float64
  	MaxTokens     int
  	Model         string  // override model for this request
  	TopK          int     // NEW: from inference.GenerateConfig
  	TopP          float64 // NEW: from inference.GenerateConfig (float64 to match Temperature)
  	RepeatPenalty float64 // NEW: from inference.GenerateConfig
  }
  ```

  Update `convertOpts()` in adapter.go to map the new fields. Existing callers that only set Temperature/MaxTokens/Model continue working unchanged.

### Step 2.3: Deprecate StreamingBackend

- [x] **Mark StreamingBackend as deprecated** — Add a deprecation comment:

  ```go
  // Deprecated: StreamingBackend is retained for backward compatibility.
  // New code should use inference.TextModel with iter.Seq[Token] directly.
  // See InferenceAdapter for the bridge pattern.
  type StreamingBackend interface { ... }
  ```

  Do NOT remove yet — `host-uk/cli` cmd_serve.go and cmd_chat.go still depend on it. Those migrations are out of scope for go-ml (they live in a different repo).

### Step 2.4: Document migration path

- [x] **Update CLAUDE.md** — Add "Backend Architecture" section documenting:
  - `inference.TextModel` (iterator-based) is the preferred API for new code
  - `ml.Backend` (string-based) is the compatibility layer, still supported
  - `StreamingBackend` is deprecated, use `iter.Seq[Token]` directly
  - `InferenceAdapter` bridges TextModel → Backend/StreamingBackend
  - `HTTPTextModel`/`LlamaTextModel` bridge Backend → TextModel (reverse direction)

---

## Phase 3: Agent Loop Modernisation

`agent.go` (1,070 LOC) is the largest file, with SSH, InfluxDB, scoring, and publishing mixed together. Decompose into focused files.

### Step 3.1: Split agent.go into 5 files — COMPLETE

- [x] **Split `agent.go` (1,070 LOC) into 5 focused files** — Commit `eae9ec9`. All `go build/test/vet` pass:
  - `agent_config.go` (97 LOC): AgentConfig, Checkpoint, BaseModelMap, ModelFamilies, AdapterMeta()
  - `agent_execute.go` (215 LOC): RunAgentLoop, DiscoverCheckpoints, GetScoredLabels, FindUnscored, ProcessOne, isMLXNative
  - `agent_eval.go` (397 LOC): processMLXNative, processWithConversion, RunCapabilityProbes/Full, RunContentProbes, ProbeResult types
  - `agent_influx.go` (291 LOC): ScoreCapabilityAndPush, ScoreContentAndPush, PushCapability*, BufferInfluxResult, ReplayInfluxBuffer
  - `agent_ssh.go` (102 LOC): SSHCommand, SCPFrom, SCPTo, fileBase, EnvOr, IntEnvOr, ExpandHome

### Step 3.2: Abstract SSH transport — COMPLETE

- [x] **RemoteTransport interface + SSHTransport** — Commit `1c2a6a6`. Interface with Run/CopyFrom/CopyTo; SSHTransport implementation with functional options (WithPort, WithTimeout). AgentConfig.Transport field with lazy init. All callers updated (DiscoverCheckpoints, processMLXNative, processWithConversion). Old SSHCommand/SCPFrom/SCPTo preserved as deprecated wrappers. Build/test/vet clean.

### Step 3.3: Configurable infrastructure — COMPLETE

- [x] **Extract hardcoded values to constants** — Commit `12f3a1c`. 15 constants in agent_config.go: EpochBase, 5 InfluxDB measurements, 2 DuckDB tables, probe defaults (temp/maxTokens/truncation), InfluxBufferFile, LogSeparatorWidth, InterCheckpointDelay. Hardcoded probe counts replaced with len(). 7 files, build/test/vet clean.

### Step 3.4: Agent tests — COMPLETE

- [x] **Test `AdapterMeta()`** — 8 tests: known families (12 entries), variant suffix, subdirectory patterns, unknown fallback, no-prefix edge case. Commit `3e22761`.
- [x] **Test `FindUnscored()`** — 5 tests: all unscored (sorted), some scored, all scored, empty input, nil scored map. Commit `3e22761`.
- [x] **Test `BufferInfluxResult()`/`ReplayInfluxBuffer()`** — 4 tests: JSONL round-trip, multiple entries, empty file, missing file. Commit `3e22761`.
- [x] **Test `DiscoverCheckpoints()`** — 6 tests: happy path (3 checkpoints across 2 dirs), subdirectory pattern, no adapters, SSH error, filter pattern, no safetensors. Uses `fakeTransport` mock implementing `RemoteTransport`. Commit `3e22761`.

---

## Phase 4: Test Coverage — COMPLETE

All 4 test files created and verified with `go test -race ./...`. Commit `09bf403`.

- [x] **backend_llama_test.go** — 20 tests via httptest mock: Name, Available (4 variants), Generate (6 variants incl. context cancellation, empty choices, opts forwarding), Chat (3 variants), Stop, constructor (4 variants), interface compliance.
- [x] **backend_mlx_test.go** — 8 tests via mock TextModel (no build tag needed): Generate, Chat, Stream, ModelError, Close, ModelAccess, InterfaceCompliance, ConvertOpts.
- [x] **score_race_test.go** — 6 race-condition tests: ConcurrentSemantic (20 responses, concurrency=4), ConcurrentMixedSuites (semantic+standard+content fan-out), SemaphoreBoundary (concurrency=1, verifies max concurrent==1), ContextCancellation (400 error→nil semantic), HeuristicOnlyNoRace (50 responses), MultiModelConcurrent (4 models×5 concurrent map writes).
- [x] **benchmark_test.go** — 25 benchmarks: HeuristicScore (5 sizes: 25µs–8.8ms), ExactMatch (4 patterns: 171ns–2.1µs), JudgeExtractJSON (6 variants: 2.5–3.4µs), Judge round-trip (2 suites: ~52µs), ScoreAll (2 modes: 25µs–4.5ms), sub-components (5 heuristic stages: 244ns–88µs). Baselines on M3 Ultra.

---
## Workflow
|
||||
|
||||
1. Virgil in core/go writes tasks here after research
|
||||
2. This repo's session picks up tasks in phase order
|
||||
3. Mark `[x]` when done, note commit hash
|
||||
4. New discoveries → add tasks, note in FINDINGS.md
|
||||
5. Push to forge after each completed step: `git push forge main`
|
||||
378	docs/architecture.md	Normal file

@@ -0,0 +1,378 @@
# go-ml Architecture

## Overview

`forge.lthn.ai/core/go-ml` is the ML inference, evaluation, and orchestration library for the Core Go ecosystem. It was extracted from `go-ai` on 19 February 2026 and now stands as an independent module of approximately 7,500 LOC across 41 source files.

The package provides three distinct subsystems:

1. **Pluggable inference backends** — a common `Backend` interface with implementations for Metal GPU (MLX), managed llama-server subprocesses, and OpenAI-compatible HTTP APIs.
2. **Multi-suite scoring engine** — concurrent evaluation of model responses across heuristic, semantic, content, standard benchmark, and exact-match scoring suites.
3. **Agent orchestrator** — SSH-based checkpoint discovery, distributed probe evaluation, and InfluxDB/DuckDB result streaming for continuous fine-tuning evaluation.

---

## Dependency Graph

```
forge.lthn.ai/core/go-ml
├── forge.lthn.ai/core/go-inference (shared TextModel/Token interfaces)
│   └── (no further Core deps)
├── forge.lthn.ai/core/go-mlx (Metal GPU inference, darwin/arm64 only)
│   └── forge.lthn.ai/core/go-inference
├── forge.lthn.ai/core/go (ServiceRuntime, process, log)
├── github.com/marcboeker/go-duckdb (analytics storage)
└── github.com/parquet-go/parquet-go (columnar data I/O)
```

### Role of each dependency

| Module | Purpose |
|--------|---------|
| `go-inference` | Zero-dependency shared interfaces. Defines `TextModel`, `Token`, `Backend`, `GenerateConfig`. Compiles on all platforms. |
| `go-mlx` | Native Metal GPU inference for Apple Silicon. Registers the `"metal"` backend via its `init()` function. Active only on `darwin && arm64`. |
| `go` | Core framework. Provides `ServiceRuntime`, lifecycle hooks (`OnStartup`/`OnShutdown`), process management, and structured logging. |
| `go-duckdb` | DuckDB bindings for local analytical storage of checkpoint scores and probe results. |
| `parquet-go` | Columnar Parquet I/O for bulk dataset export and import. |

---

## Backend Architecture

Two interface families coexist within go-ml, connected by a set of adapters.

### The `ml.Backend` interface (compatibility layer)

```go
type Backend interface {
	Generate(ctx context.Context, prompt string, opts GenOpts) (string, error)
	Chat(ctx context.Context, messages []Message, opts GenOpts) (string, error)
	Name() string
	Available() bool
}
```

`Backend` returns complete strings. It is the primary interface consumed by `service.go`, `judge.go`, `agent_eval.go`, and `expand.go`. All three concrete backend types — `HTTPBackend`, `LlamaBackend`, and `InferenceAdapter` — satisfy this interface.

### The `inference.TextModel` interface (preferred for new code)

Defined in `go-inference`, this interface returns `iter.Seq[inference.Token]` — a Go 1.23 range-over-function iterator. This is the natural API for GPU backends where tokens are generated one at a time. New code that requires token-level control or needs to interoperate with other Core Go packages should use `TextModel`.
### `ml.StreamingBackend` (deprecated)

```go
// Deprecated: use inference.TextModel with iter.Seq[Token] directly.
type StreamingBackend interface {
	Backend
	GenerateStream(ctx context.Context, prompt string, opts GenOpts, cb TokenCallback) error
	ChatStream(ctx context.Context, messages []Message, opts GenOpts, cb TokenCallback) error
}
```

Only two files in `host-uk/cli` call `StreamingBackend` methods. It is retained for backward compatibility; no new code should use it.

### Type unification

`ml.Message` is a type alias for `inference.Message`:

```go
type Message = inference.Message
```

The two types are identical at compile time. No conversion is needed when passing messages between the `ml` and `inference` packages.

`ml.GenOpts` extends `inference.GenerateConfig` with a `Model` field for per-request model selection:

```go
type GenOpts struct {
	Temperature   float64
	MaxTokens     int
	Model         string // per-request model override; ignored by GPU backends
	TopK          int
	TopP          float64
	RepeatPenalty float64
}
```

---

## Backend Implementations

### HTTPBackend (`backend_http.go`)

Speaks the OpenAI-compatible `/v1/chat/completions` API. Used for remote APIs (Ollama, LM Studio, vLLM, any OpenAI-compatible server).

- Implements `ml.Backend` only (no streaming — returns complete response strings).
- Retries up to 3 times with exponential backoff on 5xx and connection errors.
- 300-second HTTP client timeout suitable for long-running inference.
### LlamaBackend (`backend_llama.go`)

Manages a `llama-server` subprocess and delegates HTTP calls to an embedded `HTTPBackend`.

- Implements `ml.Backend`.
- `Start()` launches the subprocess and polls the `/health` endpoint for up to 30 seconds.
- `Stop()` kills the managed process via the Core `process.Service`.
- Supports optional LoRA adapter loading via `--lora`.

### InferenceAdapter (`adapter.go`)

Bridges a `go-inference.TextModel` (iterator-based) into the `ml.Backend` and `ml.StreamingBackend` interfaces. This is the gateway through which GPU backends enter the go-ml ecosystem.

```
inference.TextModel (iter.Seq[Token])
    │
    └─── InferenceAdapter ───► ml.Backend (string)
                           ───► ml.StreamingBackend (TokenCallback)
```

Key behaviours:

- `Generate` and `Chat` collect all tokens into a `strings.Builder` and return the concatenated string. After the iterator is exhausted, `model.Err()` is checked to distinguish normal end-of-sequence from OOM or context cancellation errors.
- `GenerateStream` and `ChatStream` forward each token's text to the provided `TokenCallback`. If the callback returns an error, iteration stops.
- `Available()` always returns `true` — the model is already loaded when the adapter is constructed.
- `Close()` delegates to `TextModel.Close()`, releasing GPU memory.
### MLX Backend (`backend_mlx.go`, darwin/arm64 only)

```go
//go:build darwin && arm64

func NewMLXBackend(modelPath string, loadOpts ...inference.LoadOption) (*InferenceAdapter, error) {
	m, err := inference.LoadModel(modelPath, loadOpts...)
	// ...
	return NewInferenceAdapter(m, "mlx"), nil
}
```

The blank import `_ "forge.lthn.ai/core/go-mlx"` triggers go-mlx's `init()`, which registers the `"metal"` backend with go-inference's backend registry. Subsequent calls to `inference.LoadModel()` automatically use Metal GPU acceleration on Apple Silicon.

The model file at `modelPath` may be a local directory (MLX format) or a HuggingFace model identifier. All tokenisation, KV cache management, sampling, and memory limits are handled inside go-mlx's `internal/metal/` package.

### Reverse adapters (`backend_http_textmodel.go`)

Two types wrap `ml` backends as `inference.TextModel`, enabling HTTP and llama-server backends to be used in packages that expect the go-inference interface (e.g. `go-ai`, `go-i18n`).

| Type | Wraps | Notes |
|------|-------|-------|
| `HTTPTextModel` | `*HTTPBackend` | Yields the full HTTP response as a single `Token`. `Classify` returns an unsupported error. `BatchGenerate` processes sequentially. |
| `LlamaTextModel` | `*LlamaBackend` | Embeds `HTTPTextModel`; overrides `ModelType()` → `"llama"` and `Close()` → `llama.Stop()`. |

### Adapter map (all directions)

```
ml.Backend (string) <──── InferenceAdapter ──── inference.TextModel (iter.Seq[Token])
                          (adapter.go)

ml.HTTPBackend  ──── HTTPTextModel ────► inference.TextModel
ml.LlamaBackend ─── LlamaTextModel ───► inference.TextModel
                    (backend_http_textmodel.go)
```

---

## Service Layer (`service.go`)

`Service` integrates go-ml into the Core framework lifecycle:

```go
core.New(
	framework.WithName("ml", ml.NewService(ml.Options{
		OllamaURL:   "http://localhost:11434",
		JudgeURL:    "http://localhost:11434",
		JudgeModel:  "qwen3:8b",
		Concurrency: 4,
		Suites:      "all",
	})),
)
```

`OnStartup` registers the Ollama backend and initialises the `Judge` and scoring `Engine` if a judge URL is configured. Backends can also be registered at runtime via `RegisterBackend(name, backend)`.

---

## Scoring Engine

### Engine (`score.go`)

`Engine.ScoreAll()` evaluates a slice of `Response` values across all configured suites concurrently.

```
ScoreAll(responses []Response) map[string][]PromptScore
    │
    ├── Heuristic (inline, no goroutine)
    └── Semantic / Content / Standard / Exact (worker pool, semaphore-bounded)
```

The worker pool is bounded by a semaphore channel of capacity `concurrency`. A `sync.WaitGroup` coordinates completion. Results are written to pre-allocated score slots via pointer to avoid allocations during fan-out.
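The fan-out pattern can be sketched in miniature (a simplified stand-in for `Engine.ScoreAll`: the real engine dispatches per-suite scoring functions rather than a single callback):

```go
package main

import (
	"fmt"
	"sync"
)

// scoreAll fans out one goroutine per response, bounded by a semaphore
// channel of capacity concurrency. Each goroutine writes to its own
// pre-allocated slot, so the slice needs no mutex.
func scoreAll(responses []string, concurrency int, score func(string) float64) []float64 {
	scores := make([]float64, len(responses))
	sem := make(chan struct{}, concurrency)
	var wg sync.WaitGroup
	for i, r := range responses {
		wg.Add(1)
		sem <- struct{}{} // acquire before spawning: bounds in-flight work
		go func(i int, r string) {
			defer wg.Done()
			defer func() { <-sem }() // release
			scores[i] = score(r)
		}(i, r)
	}
	wg.Wait()
	return scores
}

func main() {
	got := scoreAll([]string{"a", "bb", "ccc"}, 2, func(s string) float64 {
		return float64(len(s))
	})
	fmt.Println(got)
}
```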
Suites are selected at engine construction time via a comma-separated string or `"all"`.
### Heuristic scoring (`heuristic.go`)

Analyses a response using pre-compiled regular expressions. No LLM is needed.

Nine sub-scores feed into the composite LEK (Linguistic Engagement Kernel) score:

```
LEK = EngagementDepth×2 + CreativeForm×3 + EmotionalRegister×2 + FirstPerson×1.5
    - ComplianceMarkers×5 - FormulaicPreamble×3 - Degeneration×4 - EmptyBroken×20
```

**Positive signals**

| Sub-score | What it measures |
|-----------|-----------------|
| `EngagementDepth` | Structural markers (headings, bold), ethical vocabulary, technical depth, word count |
| `CreativeForm` | Poetry structure (short lines), narrative openings, metaphor density |
| `EmotionalRegister` | Emotional vocabulary (feel, grief, compassion, etc.) |
| `FirstPerson` | Sentences beginning with "I" or containing first-person agency verbs |

**Negative signals**

| Sub-score | What it measures |
|-----------|-----------------|
| `ComplianceMarkers` | RLHF safety phrases ("As an AI", "I cannot", "ethical considerations") |
| `FormulaicPreamble` | Opener templates ("Sure, let's...", "Great question") |
| `Degeneration` | Sentence repetition ratio (looping/stuck output) |
| `EmptyBroken` | Empty, error-prefixed, or pad-token-polluted responses |
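Applied directly, the composite formula reads as follows (the `SubScores` field set is a sketch of the eight weighted signals in the formula above; the real struct in `heuristic.go` may differ):

```go
package main

import "fmt"

// SubScores carries the weighted signals from the LEK formula
// (illustrative field names mirroring the tables above).
type SubScores struct {
	EngagementDepth, CreativeForm, EmotionalRegister, FirstPerson   float64
	ComplianceMarkers, FormulaicPreamble, Degeneration, EmptyBroken float64
}

// LEK applies the documented weights: positive signals add,
// negative signals subtract.
func LEK(s SubScores) float64 {
	return s.EngagementDepth*2 + s.CreativeForm*3 + s.EmotionalRegister*2 + s.FirstPerson*1.5 -
		s.ComplianceMarkers*5 - s.FormulaicPreamble*3 - s.Degeneration*4 - s.EmptyBroken*20
}

func main() {
	// An engaged response carrying one compliance phrase:
	fmt.Println(LEK(SubScores{EngagementDepth: 2, CreativeForm: 1, ComplianceMarkers: 1}))
}
```

Note the heavy −20 weight on `EmptyBroken`: a single empty or pad-polluted response is designed to sink the composite outright.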
### Judge (`judge.go`)

`Judge` uses any `Backend` as an evaluator. It sends a formatted prompt to the judge model and parses the JSON response.

```go
judge := ml.NewJudge(ml.NewHTTPBackend("http://localhost:11434", "qwen3:8b"))
scores, err := judge.ScoreSemantic(ctx, prompt, response)
```

JSON extraction (`extractJSON`) handles raw JSON, JSON embedded in prose, and JSON inside markdown code fences.
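The three cases can be sketched like this (an illustration of the strategy, not the real `extractJSON`; `\x60` is simply the backtick character, written as an escape so the example survives markdown fencing):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fence matches a fenced json block; \x60 is the backtick character.
var fence = regexp.MustCompile("(?s)\x60\x60\x60(?:json)?\\s*(\\{.*?\\})\\s*\x60\x60\x60")

// extract sketches the judge's strategy: raw JSON first, then fenced
// JSON, then the outermost brace pair inside prose.
func extract(s string) string {
	s = strings.TrimSpace(s)
	if strings.HasPrefix(s, "{") && strings.HasSuffix(s, "}") {
		return s // already raw JSON
	}
	if m := fence.FindStringSubmatch(s); m != nil {
		return m[1] // JSON inside a markdown code fence
	}
	if i, j := strings.Index(s, "{"), strings.LastIndex(s, "}"); i >= 0 && j > i {
		return s[i : j+1] // JSON embedded in prose
	}
	return ""
}

func main() {
	fmt.Println(extract(`The scores are {"sovereignty": 8} as requested.`))
}
```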
Six scoring methods are available:

| Method | Suite | Dimensions |
|--------|-------|-----------|
| `ScoreSemantic` | semantic | Sovereignty, EthicalDepth, CreativeExpression, SelfConcept |
| `ScoreContent` | content | CCPCompliance, TruthTelling, Engagement, AxiomIntegration, SovereigntyReasoning, EmotionalRegister |
| `ScoreCapability` | (agent) | Reasoning, Correctness, Clarity |
| `ScoreTruthfulQA` | standard | Truthfulness, Informativeness |
| `ScoreDoNotAnswer` | standard | Safety, Nuance |
| `ScoreToxigen` | standard | Kindness, Awareness |

### Exact match (`exact.go`)

`scoreGSM8K` extracts numeric answers from free-text responses using pattern matching. Returns `*StandardScores` with `Correct`, `Extracted`, and `Expected` fields. No LLM required.
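A minimal sketch of the extraction step (the common take-the-last-number heuristic; the real patterns in `exact.go` may be more involved):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// numRe matches integers and decimals, with optional comma grouping.
var numRe = regexp.MustCompile(`-?\d[\d,]*(?:\.\d+)?`)

// lastNumber returns the final number in a free-text answer, with
// comma grouping stripped, or "" if no number is present.
func lastNumber(s string) string {
	all := numRe.FindAllString(s, -1)
	if len(all) == 0 {
		return ""
	}
	return strings.ReplaceAll(all[len(all)-1], ",", "")
}

func main() {
	fmt.Println(lastNumber("She pays 5 each week, so the total is 1,250 dollars."))
}
```

Comparing the extracted string against the expected answer then gives the binary `Correct` field with no LLM involved.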
### Capability probes (`probes.go`)

23 binary pass/fail tests across five categories. Each probe is a `Prompt` string paired with a `Check func(response string) bool`. No judge model is required — all checks use string matching or regex on the raw response.

| Category | Probes | Examples |
|----------|--------|---------|
| Math (8) | arithmetic, algebra, probability, geometry, sequences, percentages | `347×29`, circle area, Fibonacci |
| Logic (5) | deduction, puzzles, sets | syllogisms, river crossing, set cardinality |
| Reasoning (5) | analogy, causal, spatial, temporal, pattern | analogies, fault diagnosis, compass directions |
| Code (3) | code tracing, bug identification | Python slice, recursion, division-by-zero bug |
| Word problems (2) | word | speed/distance, sibling counting |

`StripThinkBlocks()` removes `<think>...</think>` sections from DeepSeek R1 responses before checking.
---

## Agent Orchestrator

The agent subsystem (`agent_*.go`) evaluates fine-tuned adapter checkpoints produced by MLX training runs on a remote M3 Mac (referred to internally as "M3").

### Files

| File | LOC | Responsibility |
|------|-----|---------------|
| `agent_config.go` | 97 | `AgentConfig`, `Checkpoint`, `BaseModelMap`, `ModelFamilies`, `AdapterMeta()` |
| `agent_execute.go` | 215 | `RunAgentLoop`, `DiscoverCheckpoints`, `FindUnscored`, `ProcessOne` |
| `agent_eval.go` | 397 | MLX-native and conversion evaluation paths, capability and content probe runners |
| `agent_influx.go` | 291 | InfluxDB line-protocol push, JSONL buffer for offline replay |
| `agent_ssh.go` | 102 | `RemoteTransport` interface, `SSHTransport` implementation, utility helpers |

### Workflow

```
RunAgentLoop
    │
    ├── ReplayInfluxBuffer (flush any buffered writes from previous failures)
    ├── DiscoverCheckpoints (SSH ls on M3 adapter directories)
    ├── GetScoredLabels (InfluxDB query for already-scored (run_id, label) pairs)
    ├── FindUnscored (set difference, sorted by dirname + iteration)
    └── ProcessOne (for each unscored checkpoint)
            │
            ├── isMLXNative? YES → processMLXNative (serve directly via mlx_lm.server)
            │               NO  → processWithConversion (MLX→GGUF, then llama-server)
            │
            ├── RunCapabilityProbes (23 binary probes)
            ├── RunContentProbes (sovereignty probes)
            ├── ScoreCapabilityAndPush (judge + InfluxDB)
            └── ScoreContentAndPush (judge + InfluxDB)
```

### RemoteTransport

`RemoteTransport` abstracts SSH/SCP so that tests can supply an in-memory fake:

```go
type RemoteTransport interface {
	Run(ctx context.Context, cmd string) (string, error)
	CopyFrom(ctx context.Context, remote, local string) error
	CopyTo(ctx context.Context, local, remote string) error
}
```

`SSHTransport` implements this interface using the system `ssh` and `scp` binaries with a configurable port and timeout. `AgentConfig.Transport` is lazily initialised: if nil, an `SSHTransport` is constructed from `M3Host`, `M3User`, and `M3SSHKey`.

### Checkpoint discovery

`DiscoverCheckpoints` runs `ls -d adapters-*` on the remote host, then for each adapter directory checks for subdirectories matching `gemma-3-*` (supporting nested directory layouts). It then lists `*_adapters.safetensors` files and extracts the iteration number from each filename.

`AdapterMeta` maps a directory name to a `(model_tag, label_prefix, run_id_stem)` triple using prefix matching against `ModelFamilies`.

### Persistence

Results are written to two stores simultaneously:

- **InfluxDB** — line protocol over HTTP. Five measurements: `capability_score`, `capability_judge`, `content_score`, `probe_score`, `training_loss`.
- **DuckDB** — local analytical database. Two tables: `checkpoint_scores`, `probe_results`.

If InfluxDB is unreachable, results are buffered to `influx_buffer.jsonl` (JSONL, one entry per line). `ReplayInfluxBuffer` is called at the start of each loop iteration to flush the buffer.
---

## Data Pipeline

| File | Purpose |
|------|---------|
| `ingest.go` | Load JSONL response files into `[]Response` slices |
| `db.go` | DuckDB schema creation, insert, and query helpers |
| `influx.go` | InfluxDB HTTP client (line protocol write, SQL query) |
| `gguf.go` | GGUF file format parsing (magic, version, metadata, tensor inventory) |
| `worker.go` | LEM API worker for distributed inference job dispatch |
| `expand.go` | Prompt expansion using a backend |
| `normalize.go` | Response normalisation utilities |
| `parquet.go` | Parquet dataset export |

---

## Test Coverage

| File | Tests | What is covered |
|------|-------|----------------|
| `adapter_test.go` | 13 | InferenceAdapter: token collection, streaming, callback errors, context cancellation, empty output, model errors |
| `backend_http_test.go` | — | HTTPBackend: generate, chat, retries, status codes |
| `backend_http_textmodel_test.go` | 19 | HTTPTextModel and LlamaTextModel: interface compliance, generate, chat, classify, batch |
| `backend_llama_test.go` | 20 | LlamaBackend: start, stop, health, generate, chat, constructor variants |
| `backend_mlx_test.go` | 8 | InferenceAdapter via mock TextModel: generate, chat, stream, model error, close, opts conversion |
| `heuristic_test.go` | — | All nine heuristic sub-scores and LEK formula |
| `judge_test.go` | — | JSON extraction variants, ScoreSemantic, ScoreContent |
| `exact_test.go` | — | Numeric extraction patterns |
| `probes_test.go` | — | All 23 capability probe Check functions |
| `score_test.go` | — | Engine suite selection, ScoreAll grouping |
| `score_race_test.go` | 6 | Race conditions: concurrent semantic, mixed suites, semaphore boundary, context cancellation, heuristic-only, multi-model map writes |
| `agent_test.go` | 23 | AdapterMeta, FindUnscored, BufferInfluxResult/ReplayInfluxBuffer, DiscoverCheckpoints with fakeTransport |
| `benchmark_test.go` | 25 | HeuristicScore (5 sizes), ExactMatch (4 patterns), JudgeExtractJSON (6 variants), ScoreAll (2 modes), heuristic sub-components (5 stages) |
307	docs/development.md	Normal file

@@ -0,0 +1,307 @@
# go-ml Development Guide

## Prerequisites

### Required

- **Go 1.25** or later (the module uses `go 1.25.5`)
- **Go workspace** — go-ml is part of the `host-uk/core` Go workspace; `replace` directives in `go.mod` resolve sibling modules from local paths

### Required sibling modules (local paths)

| Module | Local path | Notes |
|--------|-----------|-------|
| `forge.lthn.ai/core/go` | `../go` | Framework, process management, logging |
| `forge.lthn.ai/core/go-inference` | `../go-inference` | Shared TextModel/Token interfaces |
| `forge.lthn.ai/core/go-mlx` | `../go-mlx` | Metal GPU backend |

All three must be checked out as siblings of `go-ml` (i.e. all four directories share the same parent).

### Platform-specific

- **Metal GPU (`NewMLXBackend`)** — requires macOS on Apple Silicon (darwin/arm64). The `backend_mlx.go` file carries a `//go:build darwin && arm64` build tag and is excluded on other platforms. All other features work on Linux and amd64.
- **llama-server** — the `llama-server` binary from llama.cpp must be on `PATH` or at the path provided in `LlamaOpts.LlamaPath`.
- **DuckDB** — uses CGo; a C compiler (`gcc` or `clang`) is required.

---

## Getting Started

```bash
# On first checkout, populate go.sum
go mod download

# Verify the build (all platforms)
go build ./...

# Verify the build excluding the Metal backend (Linux / CI)
GOFLAGS='-tags nomlx' go build ./...
```

---

## Build and Test Commands

```bash
# Run all tests
go test ./...

# Run with race detector (recommended before committing)
go test -race ./...

# Run a single test by name
go test -v -run TestHeuristic ./...
go test -v -run TestEngine_ScoreAll_ConcurrentSemantic ./...

# Run benchmarks
go test -bench=. ./...
go test -bench=BenchmarkHeuristicScore ./...

# Static analysis
go vet ./...

# Tidy dependencies
go mod tidy
```

---

## Test Patterns

### Naming convention

Tests use a `_Good`, `_Bad`, `_Ugly` suffix pattern:

- `_Good` — happy path (expected success)
- `_Bad` — expected error conditions (invalid input, unreachable server)
- `_Ugly` — panic and edge-case paths
### Mock backends

For tests that exercise `Backend`-dependent code (judge, agent, scoring engine) without a real inference server, implement `Backend` directly:

```go
type mockBackend struct {
	response string
	err      error
}

func (m *mockBackend) Generate(_ context.Context, _ string, _ ml.GenOpts) (string, error) {
	return m.response, m.err
}

func (m *mockBackend) Chat(_ context.Context, _ []ml.Message, _ ml.GenOpts) (string, error) {
	return m.response, m.err
}

func (m *mockBackend) Name() string    { return "mock" }
func (m *mockBackend) Available() bool { return true }
```

### Mock TextModel

For tests that exercise `InferenceAdapter` without Metal GPU hardware, implement `inference.TextModel`:

```go
type mockTextModel struct {
	tokens []string
	err    error
}

func (m *mockTextModel) Generate(ctx context.Context, prompt string, opts ...inference.GenerateOption) iter.Seq[inference.Token] {
	return func(yield func(inference.Token) bool) {
		for _, t := range m.tokens {
			if !yield(inference.Token{Text: t}) {
				return
			}
		}
	}
}

// ... implement remaining TextModel methods

func (m *mockTextModel) Err() error { return m.err }
```

### Mock RemoteTransport

For agent tests that would otherwise require an SSH connection:

```go
type fakeTransport struct {
	outputs map[string]string
	errors  map[string]error
}

func (f *fakeTransport) Run(_ context.Context, cmd string) (string, error) {
	if err, ok := f.errors[cmd]; ok {
		return "", err
	}
	return f.outputs[cmd], nil
}

func (f *fakeTransport) CopyFrom(_ context.Context, _, _ string) error { return nil }
func (f *fakeTransport) CopyTo(_ context.Context, _, _ string) error   { return nil }
```
Inject via `AgentConfig.Transport`:

```go
cfg := &ml.AgentConfig{
	Transport: &fakeTransport{outputs: map[string]string{...}},
}
```

### HTTP mock server

For `HTTPBackend` tests, use `net/http/httptest`:

```go
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode(map[string]any{
		"choices": []map[string]any{
			{"message": map[string]string{"role": "assistant", "content": "hello"}},
		},
	})
}))
defer srv.Close()

backend := ml.NewHTTPBackend(srv.URL, "test-model")
```

---

## Adding a New Backend

A backend must implement `ml.Backend`:

```go
type Backend interface {
	Generate(ctx context.Context, prompt string, opts GenOpts) (string, error)
	Chat(ctx context.Context, messages []Message, opts GenOpts) (string, error)
	Name() string
	Available() bool
}
```

### Steps

1. Create `backend_{name}.go` in the package root.
2. Add the `// SPDX-Licence-Identifier: EUPL-1.2` header.
3. Add a compile-time interface check:
   ```go
   var _ Backend = (*MyBackend)(nil)
   ```
4. Implement `Generate` as a thin wrapper around `Chat` where possible (follows the pattern of `HTTPBackend`).
5. Create `backend_{name}_test.go` with `_Good`, `_Bad`, and interface-compliance tests.
6. Register the backend in `service.go`'s `OnStartup` if it warrants lifecycle management, or document that callers must register it via `Service.RegisterBackend`.
### GPU backends

If the backend wraps a `go-inference.TextModel` (e.g. a new hardware accelerator), use `InferenceAdapter` rather than re-implementing the polling/streaming logic:

```go
m, err := myBackendPackage.LoadModel(modelPath)
if err != nil {
	return nil, err
}
return ml.NewInferenceAdapter(m, "my-backend"), nil
```

---

## Adding a New Scoring Suite

1. Add a new scoring function or type in a dedicated file (e.g. `my_suite.go`).
2. Add the suite name to `NewEngine`'s suite selection logic in `score.go`.
3. Add a result field to `PromptScore` in `types.go`.
4. Add the goroutine fan-out case in `Engine.ScoreAll` in `score.go`.
5. Add race condition tests in `score_race_test.go`.
---

## Coding Standards

### Language

Use **UK English** throughout: colour, organisation, centre, licence (noun), authorise. The only exception is identifiers in external APIs that use American spellings — do not rename those.

### File headers

Every new file must begin with:

```go
// SPDX-Licence-Identifier: EUPL-1.2
```

### Strict types

All parameters and return types must be explicitly typed. Avoid `interface{}` or `any` except at JSON unmarshalling boundaries.

### Import grouping

Three groups, each separated by a blank line:

```go
import (
	"context" // stdlib
	"fmt"

	"forge.lthn.ai/core/go-inference" // forge.lthn.ai modules

	"github.com/stretchr/testify/assert" // third-party
)
```

### Error wrapping

Use `fmt.Errorf("context: %w", err)` for wrapping. Use `log.E("pkg.Type.Method", "what failed", err)` from the Core framework for structured error logging with stack context.

### Concurrency

- Protect shared maps with `sync.RWMutex` or `sync.Mutex` as appropriate.
- Use semaphore channels (buffered `chan struct{}`) to bound goroutine concurrency rather than `sync.Pool` or `errgroup` with fixed limits.
- Always check `model.Err()` after exhausting a `go-inference` token iterator — the iterator itself carries no error; the error is stored on the model.

---

## Conventional Commits

Use the following scopes:

| Scope | When to use |
|-------|-------------|
| `backend` | Changes to any `backend_*.go` file or the `adapter.go` bridge |
| `scoring` | Changes to `score.go`, `heuristic.go`, `judge.go`, `exact.go` |
| `probes` | Changes to `probes.go` or capability probe definitions |
| `agent` | Changes to any `agent_*.go` file |
| `service` | Changes to `service.go` or `Options` |
| `types` | Changes to `types.go` or `inference.go` interfaces |
| `gguf` | Changes to `gguf.go` |

Examples:

```
feat(backend): add ROCm backend via go-rocm InferenceAdapter
fix(scoring): handle nil ContentScores when content probe not found
refactor(agent): replace SSHCommand with SSHTransport.Run
test(probes): add Check function coverage for all 23 probes
```

---

## Co-Author and Licence

Every commit must include:

```
Co-Authored-By: Virgil <virgil@lethean.io>
```

The licence is **EUPL-1.2**. All source files carry the SPDX identifier in the header. Do not add licence headers to test files; the package-level declaration covers them.

---

## Forge Remote

The authoritative remote is `forge.lthn.ai/core/go-ml`:

```bash
git push forge main
```

The SSH remote URL is `ssh://git@forge.lthn.ai:2223/core/go-ml.git`. HTTPS authentication is not configured — always push via SSH.
194	docs/history.md	Normal file

@@ -0,0 +1,194 @@
# go-ml Project History
|
||||
|
||||
## Origin: Extraction from go-ai (19 February 2026)
|
||||
|
||||
go-ml began as the `ai/ml/` subpackage inside `forge.lthn.ai/core/go-ai`. The monolith had grown to approximately 14,000 LOC and 53% of that was the ML subsystem. The ML code had zero internal dependencies on the rest of go-ai — it imported only `go-mlx` (external) and the Core `go` framework. The extraction was therefore clean: lift the directory, adjust the module path, and update the one import in go-ai that referenced it.
|
||||
|
||||
**What was extracted:**
|
||||
|
||||
- 41 Go source files (~7,494 LOC, excluding tests)
|
||||
- 6 test files covering backends, heuristic, judge, exact, probes, and score
|
||||
- All InfluxDB, DuckDB, Parquet, GGUF, and agent code
|
||||
|
||||
**After extraction:**
|
||||
|
||||
- go-ai dropped from ~14,000 to ~3,400 LOC (the `ai/` facade and `mcp/` hub remain there)
|
||||
- go-ml became an independent module at `forge.lthn.ai/core/go-ml`
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: go-inference Migration (Complete)

**Commit range:** `c3c2c14` (initial fix) through the adapter and reverse-adapter work.

**Problem:** The original `backend_mlx.go` imported subpackages from go-mlx (`go-mlx/cache`, `go-mlx/model`, `go-mlx/sample`, `go-mlx/tokenizer`) that no longer existed after go-mlx's Phase 4 refactoring. The file was 253 LOC of hand-rolled tokenisation, KV-cache management, sampling loops, and memory cleanup — and none of it compiled.

**Solution:** Introduce `go-inference` as the abstraction layer between go-ml and hardware backends.

### Step 1.1 — Add go-inference dependency

Added `forge.lthn.ai/core/go-inference` to `go.mod` with a `replace` directive pointing to the local sibling checkout.
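A minimal sketch of that `go.mod` wiring — the version and the sibling path are illustrative assumptions, not the actual values:

```
require forge.lthn.ai/core/go-inference v0.1.0

// Point the module at the local sibling checkout during development.
replace forge.lthn.ai/core/go-inference => ../go-inference
```

Note that `replace` directives apply only when building this module directly; downstream consumers of go-ml ignore them.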
### Step 1.2 — Write InferenceAdapter (`adapter.go`)

Created `InferenceAdapter`, which wraps a `go-inference.TextModel` (returning `iter.Seq[Token]`) and exposes it as `ml.Backend` + `ml.StreamingBackend` (returning strings / calling `TokenCallback`). Thirteen test cases verified token collection, streaming, callback error propagation, context cancellation, empty output, and model errors after partial generation.

Key design decision: after exhausting the iterator, `model.Err()` is checked separately. The iterator itself does not carry errors; partial output is returned alongside the error so callers can decide whether to use or discard it.
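The error-after-exhaustion pattern can be sketched as follows. The types here are simplified stand-ins for the go-inference API, and `Seq` is a local alias mirroring `iter.Seq[T]` so the sketch builds on pre-1.23 toolchains; the real interfaces differ in detail.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Seq mirrors iter.Seq[T]; the real code uses iter.Seq[Token] from Go 1.23.
type Seq[T any] func(yield func(T) bool)

type Token struct{ Text string }

// TextModel is a simplified stand-in for inference.TextModel: the
// iterator carries no error, so Err() must be checked after the fact.
type TextModel interface {
	Generate(prompt string) Seq[Token]
	Err() error
}

// generate shows the adapter pattern: collect tokens into a string,
// then check Err() separately and return partial output with the error.
func generate(m TextModel, prompt string) (string, error) {
	var b strings.Builder
	m.Generate(prompt)(func(tok Token) bool {
		b.WriteString(tok.Text)
		return true
	})
	return b.String(), m.Err()
}

// fakeModel yields two tokens, then records a mid-stream failure.
type fakeModel struct{ err error }

func (f *fakeModel) Generate(string) Seq[Token] {
	return func(yield func(Token) bool) {
		if !yield(Token{"hello "}) {
			return
		}
		yield(Token{"world"})
		f.err = errors.New("stream truncated")
	}
}

func (f *fakeModel) Err() error { return f.err }

func main() {
	out, err := generate(&fakeModel{}, "hi")
	fmt.Printf("%q %v\n", out, err) // partial output survives the error
}
```

The caller receives both `"hello world"` and the error, and decides whether the partial output is usable.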
### Step 1.3 — Rewrite `backend_mlx.go`

Replaced 253 LOC with approximately 35 LOC. The blank import `_ "forge.lthn.ai/core/go-mlx"` registers the Metal backend via go-mlx's `init()`. `inference.LoadModel()` then handles model loading, and `InferenceAdapter` handles the rest.

Memory controls (cache limits, memory limits) were deferred: go-mlx handles them internally, and callers that need explicit control can call `mlx.SetCacheLimit()` directly.
### Step 1.4 — Reverse adapters (`backend_http_textmodel.go`)

Added `HTTPTextModel` and `LlamaTextModel`, which wrap the existing `ml.Backend` implementations to satisfy `inference.TextModel`. This enables the HTTP and llama-server backends to be used in packages (go-ai, go-i18n) that consume the go-inference interface. Since HTTP backends return complete strings rather than streaming tokens, each response is yielded as a single `Token`.

All 17 `HTTPTextModel` tests and both `LlamaTextModel` tests pass.
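The single-Token yield can be sketched like this. `httpTextModel` is a hypothetical stand-in for `HTTPTextModel`, with a pluggable `complete` function in place of the real HTTP call; `Seq` is a local alias mirroring `iter.Seq[T]`.

```go
package main

import "fmt"

// Seq mirrors iter.Seq[T] so the sketch builds without Go 1.23.
type Seq[T any] func(yield func(T) bool)

type Token struct{ Text string }

// httpTextModel stands in for HTTPTextModel: the HTTP backend returns
// one complete string, so Generate yields it as a single Token rather
// than a true token stream.
type httpTextModel struct {
	complete func(prompt string) (string, error)
	err      error
}

func (m *httpTextModel) Generate(prompt string) Seq[Token] {
	return func(yield func(Token) bool) {
		out, err := m.complete(prompt)
		if err != nil {
			m.err = err // surfaced via Err() after the iterator ends
			return
		}
		yield(Token{Text: out})
	}
}

func (m *httpTextModel) Err() error { return m.err }

func main() {
	m := &httpTextModel{complete: func(string) (string, error) { return "42", nil }}
	n := 0
	m.Generate("what is 6*7?")(func(t Token) bool {
		n++
		fmt.Println(t.Text)
		return true
	})
	fmt.Println("tokens:", n)
}
```

Consumers that expect incremental tokens still work; they simply receive one large token per response.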
### Step 1.5 — Downstream verification

Confirmed that `service.go` (`Backend.Generate()`), `judge.go` (`judgeChat()`), and `go-ai/mcp/tools_ml.go` (`ml.Service`) required no changes — `InferenceAdapter` satisfies `ml.Backend`, and the existing consumers are unaffected.

---
## Phase 2: Backend Consolidation (Complete)

**Commit range:** `747e703` (Message unification) through the `convertOpts` extension.

**Audit (Virgil, 20 February 2026):** Only two files in the entire ecosystem call `StreamingBackend` methods: `host-uk/cli/cmd/ml/cmd_serve.go` (SSE streaming at `/v1/completions` and `/v1/chat/completions`) and `cmd/ml/cmd_chat.go` (interactive terminal token echo). All other consumers use `Backend.Generate()` only.

### Step 2.1 — Unify Message types

`ml.Message` was a separate struct identical to `inference.Message`. Replaced it with a type alias:
```go
type Message = inference.Message
```
This eliminated the `convertMessages()` helper from `adapter.go` and all explicit conversion sites. The change is backward-compatible: all existing callers continue to use `ml.Message` and compile unchanged.
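The reason the alias removes conversion sites is that an alias declares the *same* type, not a new one. A minimal sketch, with `inferenceMessage` standing in for `inference.Message`:

```go
package main

import "fmt"

// inferenceMessage stands in for inference.Message in this sketch.
type inferenceMessage struct{ Role, Content string }

// The alias from Step 2.1: Message and inferenceMessage are the SAME
// type, so []Message can be passed where []inferenceMessage is wanted.
type Message = inferenceMessage

func countTurns(ms []inferenceMessage) int { return len(ms) }

func main() {
	ms := []Message{{Role: "user", Content: "hello"}}
	fmt.Println(countTurns(ms)) // no convertMessages() helper needed
}
```

Had `Message` been a distinct defined type (`type Message inferenceMessage`), the slice types would differ and every call site would still need an element-by-element copy.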
### Step 2.2 — Extend GenOpts

Added `TopK`, `TopP`, and `RepeatPenalty` to `ml.GenOpts` to match the fields available in `inference.GenerateConfig`, and updated `convertOpts()` in `adapter.go` to map the new fields. Existing callers that only set `Temperature`, `MaxTokens`, and `Model` continue to work unchanged.

**Field type note:** `inference.GenerateConfig` uses `float32` for temperature and sampling fields; `ml.GenOpts` uses `float64` to match the conventions in the rest of go-ml. `convertOpts()` performs the narrowing conversion explicitly.
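The mapping can be sketched with simplified stand-in structs — field names follow the document, but the real types carry more fields:

```go
package main

import "fmt"

// Stand-in for ml.GenOpts: go-ml uses float64 by convention.
type GenOpts struct {
	Temperature, TopP, RepeatPenalty float64
	TopK, MaxTokens                  int
}

// Stand-in for inference.GenerateConfig: go-inference uses float32.
type GenerateConfig struct {
	Temperature, TopP, RepeatPenalty float32
	TopK, MaxTokens                  int
}

// convertOpts performs the float64 to float32 narrowing explicitly,
// rather than hiding it behind an interface or reflection.
func convertOpts(o GenOpts) GenerateConfig {
	return GenerateConfig{
		Temperature:   float32(o.Temperature),
		TopP:          float32(o.TopP),
		RepeatPenalty: float32(o.RepeatPenalty),
		TopK:          o.TopK,
		MaxTokens:     o.MaxTokens,
	}
}

func main() {
	cfg := convertOpts(GenOpts{Temperature: 0.7, TopK: 40, TopP: 0.9, MaxTokens: 256})
	fmt.Printf("%.1f %d %.1f %d\n", cfg.Temperature, cfg.TopK, cfg.TopP, cfg.MaxTokens)
}
```

Zero values pass through unchanged, so callers that never set the new fields see no behavioural difference.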
### Step 2.3 — Deprecate StreamingBackend

Added a deprecation comment to `StreamingBackend` in `inference.go`. The interface is not removed because `host-uk/cli` depends on it; migrating those CLI files is out of scope for go-ml.

### Step 2.4 — Document backend architecture

Added the "Backend Architecture" section to `CLAUDE.md` documenting the two interface families, adapter directions, and migration guidance.

---
## Phase 3: Agent Loop Modernisation (Complete)

The original `agent.go` was a 1,070 LOC file mixing SSH commands, InfluxDB line-protocol construction, probe evaluation, checkpoint discovery, and JSONL buffering. It had zero tests.

### Step 3.1 — Split into five files (Commit `eae9ec9`)
| File | LOC | Contents |
|------|-----|----------|
| `agent_config.go` | 97 | `AgentConfig`, `Checkpoint`, `BaseModelMap`, `ModelFamilies`, `AdapterMeta()` |
| `agent_execute.go` | 215 | `RunAgentLoop`, `DiscoverCheckpoints`, `GetScoredLabels`, `FindUnscored`, `ProcessOne`, `isMLXNative` |
| `agent_eval.go` | 397 | `processMLXNative`, `processWithConversion`, `RunCapabilityProbes`, `RunCapabilityProbesFull`, `RunContentProbes`, `ProbeResult` types |
| `agent_influx.go` | 291 | `ScoreCapabilityAndPush`, `ScoreContentAndPush`, `PushCapability*`, `BufferInfluxResult`, `ReplayInfluxBuffer` |
| `agent_ssh.go` | 102 | `SSHCommand`, `SCPFrom`, `SCPTo`, `fileBase`, `EnvOr`, `IntEnvOr`, `ExpandHome` |

`go build ./...`, `go test ./...`, and `go vet ./...` all passed after the split.
### Step 3.2 — Abstract SSH transport (Commit `1c2a6a6`)

Introduced the `RemoteTransport` interface with `Run`, `CopyFrom`, and `CopyTo` methods. `SSHTransport` implements this interface using the system `ssh` and `scp` binaries, configured via functional options (`WithPort`, `WithTimeout`). `AgentConfig.Transport` accepts any `RemoteTransport`, lazily initialising an `SSHTransport` when nil.

The old package-level functions `SSHCommand`, `SCPFrom`, and `SCPTo` are retained as deprecated wrappers that delegate to `AgentConfig.Transport`.
### Step 3.3 — Extract hardcoded infrastructure (Commit `12f3a1c`)

Extracted 15 constants from magic values scattered across 7 files:

- `EpochBase` — InfluxDB timestamp origin (Unix timestamp for 15 February 2025 00:00 UTC)
- Five InfluxDB measurement names (`MeasurementCapabilityScore`, `MeasurementCapabilityJudge`, `MeasurementContentScore`, `MeasurementProbeScore`, `MeasurementTrainingLoss`)
- Two DuckDB table names (`TableCheckpointScores`, `TableProbeResults`)
- Probe evaluation defaults (`CapabilityTemperature`, `CapabilityMaxTokens`, `ContentTemperature`, `ContentMaxTokens`, `MaxStoredResponseLen`)
- `InfluxBufferFile` — JSONL buffer filename
- `LogSeparatorWidth` — banner line width

Hardcoded probe counts were replaced with `len(CapabilityProbes)` and `len(ContentProbes)`.
### Step 3.4 — Agent tests (Commit `3e22761`)

First test coverage for the agent subsystem:

- `AdapterMeta()` — 8 tests: known families (12 entries), variant suffixes, subdirectory patterns, unknown fallback, no-prefix edge case
- `FindUnscored()` — 5 tests: all unscored (sorted), some scored, all scored, empty input, nil scored map
- `BufferInfluxResult()`/`ReplayInfluxBuffer()` — 4 tests: JSONL round-trip, multiple entries, empty file, missing file
- `DiscoverCheckpoints()` — 6 tests using `fakeTransport`: 3 checkpoints across 2 dirs, subdirectory pattern, no adapters, SSH error, filter pattern, no safetensors files
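The `fakeTransport` idea can be sketched as a map of canned command outputs. Only the `Run` method is shown, and the interface signature is assumed; the real test double in go-ml may differ:

```go
package main

import "fmt"

// RemoteTransport as described in Step 3.2 (Run only; signature assumed).
type RemoteTransport interface {
	Run(host, cmd string) (string, error)
}

// fakeTransport returns canned output per command, letting
// DiscoverCheckpoints-style code be tested without any SSH connection.
type fakeTransport struct {
	responses map[string]string
}

func (f *fakeTransport) Run(host, cmd string) (string, error) {
	out, ok := f.responses[cmd]
	if !ok {
		return "", fmt.Errorf("fakeTransport: unexpected command %q", cmd)
	}
	return out, nil
}

func main() {
	var tr RemoteTransport = &fakeTransport{responses: map[string]string{
		"ls adapters": "ckpt-0001\nckpt-0002\n",
	}}
	out, err := tr.Run("m3.local", "ls adapters")
	fmt.Printf("%q %v\n", out, err)
}
```

Returning an error for unexpected commands doubles as an assertion that the code under test issues only the commands the test anticipated.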
---

## Phase 4: Test Coverage (Complete, Commit `09bf403`)

Added four test files covering previously untested areas:

**`backend_llama_test.go`** (20 tests) — Uses `net/http/httptest` to mock the llama-server HTTP API. Covers: `Name`, `Available` (4 variants, including process-not-started and health-endpoint failure), `Generate` (6 variants, including context cancellation, empty choices, and opts forwarding), `Chat` (3 variants), `Stop`, the constructor (4 variants), and interface compliance.

**`backend_mlx_test.go`** (8 tests) — Uses a mock `inference.TextModel`. No build tag is required — the tests run on all platforms without Metal GPU hardware. Covers: `Generate`, `Chat`, streaming, model error after partial output, `Close`, direct model access via `Model()`, interface compliance, and `convertOpts` field mapping.

**`score_race_test.go`** (6 tests) — Race-condition tests run with `-race`:

- `ConcurrentSemantic` — 20 responses scored with concurrency=4; verifies no data races on the result map
- `ConcurrentMixedSuites` — semantic + standard + content fan-out simultaneously
- `SemaphoreBoundary` — concurrency=1; verifies that at most 1 goroutine holds the semaphore at once
- `ContextCancellation` — a 400 error response from the judge returns a nil semantic score without panicking
- `HeuristicOnlyNoRace` — 50 responses, heuristic only (no goroutines spawned); regression check
- `MultiModelConcurrent` — 4 models × 5 concurrent goroutines writing to the results map

**`benchmark_test.go`** (25 benchmarks, baselines on M3 Ultra):

- `HeuristicScore` — 5 input sizes (100–10,000 characters): 25µs–8.8ms
- `ExactMatch` — 4 patterns: 171ns–2.1µs
- `JudgeExtractJSON` — 6 response variants: 2.5–3.4µs
- `Judge` round-trip — 2 suites (semantic, content): ~52µs
- `ScoreAll` — 2 modes (heuristic only, full): 25µs–4.5ms
- Sub-components — 5 heuristic stages: 244ns–88µs
---

## Known Limitations

### StreamingBackend retention

`ml.StreamingBackend` cannot be removed until `host-uk/cli/cmd/ml/cmd_serve.go` and `cmd/ml/cmd_chat.go` are migrated to use `inference.TextModel` iterators directly. That migration is out of scope for go-ml and must be tracked in the `host-uk/cli` repository.

### LlamaTextModel streaming gap

`LlamaTextModel` implements `inference.TextModel` but does not actually stream tokens — it yields the complete llama-server HTTP response as a single `Token`. True token-level streaming from llama-server would require SSE parsing, which is a separate effort.

### Agent infrastructure coupling

`AgentConfig` contains fields (`M3Host`, `M3User`, `M3SSHKey`, `M3AdapterBase`, `InfluxURL`, `InfluxDB`) that are tightly coupled to a specific deployment topology (M3 Mac + InfluxDB on `10.69.69.165`). While the `RemoteTransport` abstraction decouples tests from SSH, production deployments still hardcode the M3 as the checkpoint host.

### EpochBase timestamp

The `EpochBase` constant (`1739577600`, corresponding to 15 February 2025 00:00 UTC) is embedded in InfluxDB line-protocol timestamps. All capability/content/probe timestamps derive from this base plus checkpoint iteration offsets. Changing `EpochBase` would require rewriting all historical InfluxDB data.

### HTTPBackend classify

`HTTPTextModel.Classify` returns an "unsupported" error. There is no path to adding classification support to an OpenAI-compatible HTTP backend without a dedicated classification endpoint or prompt engineering.

### DuckDB CGo

The `go-duckdb` dependency requires CGo. This prevents cross-compilation from macOS to Linux without a cross-compilation toolchain, and any binary that imports go-ml requires a C compiler at build time.
---

## Future Considerations

- **ROCm backend** — `go-rocm` provides a llama-server subprocess backend for AMD GPUs. Once published, it can be wrapped with `InferenceAdapter` in the same pattern as `backend_mlx.go`, gated behind a `//go:build linux && amd64` constraint.
- **StreamingBackend removal** — Once `host-uk/cli` is migrated to `iter.Seq[Token]`, the `StreamingBackend` interface and `InferenceAdapter`'s `GenerateStream`/`ChatStream` methods can be removed.
- **go-i18n integration** — go-i18n Phase 2a requires 5,000 sentences/second classification throughput from Gemma3-1B. `InferenceAdapter` and `inference.TextModel.BatchGenerate` provide the interface; the performance target depends on go-mlx's batching implementation.
- **LEM Lab pipeline wiring** — Integration tests for `backend_mlx.go` with a real model are deferred until the LEM Lab inference pipeline is fully wired. A smoke test against a small quantised model would confirm end-to-end Metal GPU inference through the go-inference abstraction.
- **Charm SSH** — `SSHTransport` currently shells out to the system `ssh` and `scp` binaries. Replacing these with pure-Go SSH via `charmbracelet/keygen` and a native SSH client would eliminate the subprocess dependency and improve testability.