go-ml/FINDINGS.md
Snider 58854390eb docs: add Phase 3 agent.go structure audit to FINDINGS.md
Documented 6 logical sections, hardcoded infrastructure,
and testable functions (AdapterMeta, FindUnscored, buffer).

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 02:13:06 +00:00

# FINDINGS.md — go-ml Research & Discovery
## 2026-02-19: Split from go-ai (Virgil)
### Origin
Split from go-ai on 19 Feb 2026. Was `ai/ml/` subpackage inside `forge.lthn.ai/core/go-ai`. Zero internal go-ai dependencies — imports go-mlx (external module) and core/go framework only.
### What Was Extracted
- 41 Go files (~7,494 LOC excluding tests)
- 6 test files (backend_http, exact, heuristic, judge, probes, score)
- ml/ was 53% of go-ai's total LOC. After extraction, go-ai drops from ~14K to ~3.4K LOC (ai/ facade + mcp/ hub).
### Dependencies
- `forge.lthn.ai/core/go-mlx` — Metal GPU inference (backend_mlx.go, darwin/arm64 only)
- `forge.lthn.ai/core/go-inference` — Shared TextModel/Backend/Token interfaces (target for Phase 1)
- `forge.lthn.ai/core/go` — Framework services, process management, logging
- `github.com/marcboeker/go-duckdb` — Analytics storage
- `github.com/parquet-go/parquet-go` — Columnar data I/O
- `github.com/stretchr/testify` — Test assertions
### Consumers
- `go-ai/mcp/tools_ml.go` — Exposes ML as MCP tools (uses `ml.Service`, `ml.GenOpts`, `ml.Backend`)
- LEM Lab — Uses MLXBackend for chat inference
- go-i18n Phase 2a — Needs 5K sentences/sec Gemma3-1B classification (blocked on go-inference)
## go-inference Interface Mapping
### Type Correspondence
| go-ml | go-inference | Notes |
|-------|-------------|-------|
| `ml.Backend` | `inference.Backend` | Different semantics: ml returns string, inference returns TextModel |
| `ml.StreamingBackend` | (built into TextModel) | iter.Seq[Token] is inherently streaming |
| `ml.GenOpts` | `inference.GenerateConfig` | Use functional options: `WithMaxTokens(n)` etc. |
| `ml.Message` | `inference.Message` | Identical struct: Role + Content |
| `ml.TokenCallback` | (not needed) | iter.Seq[Token] replaces callbacks |
| (no equivalent) | `inference.Token` | `{ID int32, Text string}` |
| (no equivalent) | `inference.TextModel` | Generate/Chat return iter.Seq[Token] |
### Method Mapping
```
ml.Backend.Generate(ctx, prompt, GenOpts) → (string, error)
↕ InferenceAdapter collects tokens
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]
ml.StreamingBackend.GenerateStream(ctx, prompt, opts, TokenCallback) → error
↕ InferenceAdapter forwards tokens to callback
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]
ml.GenOpts{Temperature: 0.7, MaxTokens: 2048}
↕ convertOpts helper
inference.WithTemperature(0.7), inference.WithMaxTokens(2048)
```
### backend_mlx.go Before/After
**Before** (253 LOC — BROKEN, old subpackage imports):
```go
import (
	"forge.lthn.ai/core/go-mlx"
	"forge.lthn.ai/core/go-mlx/cache"     // REMOVED
	"forge.lthn.ai/core/go-mlx/model"     // REMOVED
	"forge.lthn.ai/core/go-mlx/sample"    // REMOVED
	"forge.lthn.ai/core/go-mlx/tokenizer" // REMOVED
)

type MLXBackend struct {
	model   model.Model
	tok     *tokenizer.Tokenizer
	caches  []cache.Cache
	sampler sample.Sampler
	// ... manual tokenisation, KV cache mgmt, sampling loop, memory cleanup
}
```
**After** (~60 LOC — uses go-inference + InferenceAdapter):
```go
import (
	"fmt"

	"forge.lthn.ai/core/go-inference"
	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)

func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
	m, err := inference.LoadModel(modelPath)
	if err != nil {
		return nil, fmt.Errorf("mlx: %w", err)
	}
	return &InferenceAdapter{model: m, name: "mlx"}, nil
}
```
All tokenisation, KV cache management, sampling, and memory management are now handled inside go-mlx's `internal/metal/` package, accessed through the go-inference `TextModel` interface.
## Scoring Engine Architecture
### 5 Suites
| Suite | Method | LLM needed? | Metrics |
|-------|--------|-------------|---------|
| **Heuristic** | Regex + word analysis | No | 9 metrics → LEK composite |
| **Semantic** | LLM-as-judge | Yes | 4 dimensions (sovereignty, ethical, creative, self-concept) |
| **Content** | LLM-as-judge | Yes | 6 sovereignty probes (CCP, truth, engagement, etc.) |
| **Standard** | LLM-as-judge | Yes | TruthfulQA, DoNotAnswer, Toxigen |
| **Exact** | Numeric extraction | No | GSM8K answer matching |
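The Exact suite's numeric extraction can be illustrated with a short sketch. The regex and matching rules here are illustrative assumptions, not the exact logic in exact.go:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// numRe matches integers or decimals, with optional comma grouping.
var numRe = regexp.MustCompile(`-?\d[\d,]*(?:\.\d+)?`)

// extractFinalNumber returns the last number in the model's output,
// which for GSM8K-style answers is usually the final result.
func extractFinalNumber(s string) (string, bool) {
	matches := numRe.FindAllString(s, -1)
	if len(matches) == 0 {
		return "", false
	}
	return strings.ReplaceAll(matches[len(matches)-1], ",", ""), true
}

// exactMatch scores 1.0 when the extracted number equals the gold answer.
func exactMatch(output, gold string) float64 {
	got, ok := extractFinalNumber(output)
	if ok && got == gold {
		return 1.0
	}
	return 0.0
}

func main() {
	out := "She sells 16 - 3 - 4 = 9 eggs daily, so she makes 9 * 2 = $18."
	fmt.Println(exactMatch(out, "18")) // 1
}
```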
### LEK Score Formula
```
LEK = EngagementDepth×2 + CreativeForm×3 + EmotionalRegister×2 + FirstPerson×1.5
- ComplianceMarkers×5 - FormulaicPreamble×3 - Degeneration×4 - EmptyBroken×20
```
Positive signals: engagement depth, creative form, emotional register, first-person voice.
Negative signals: RLHF compliance markers, formulaic preambles, text degeneration, empty/broken output.
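As a sketch, the composite reduces to one weighted sum. Field names follow the formula above; the actual struct in the heuristic scorer may be shaped differently:

```go
package main

import "fmt"

// HeuristicMetrics holds the signals named in the LEK formula.
// Field names follow the formula; score.go's real struct may differ.
type HeuristicMetrics struct {
	EngagementDepth   float64
	CreativeForm      float64
	EmotionalRegister float64
	FirstPerson       float64
	ComplianceMarkers float64
	FormulaicPreamble float64
	Degeneration      float64
	EmptyBroken       float64
}

// LEK computes the composite exactly as written in the formula block.
func (m HeuristicMetrics) LEK() float64 {
	return m.EngagementDepth*2 + m.CreativeForm*3 + m.EmotionalRegister*2 + m.FirstPerson*1.5 -
		m.ComplianceMarkers*5 - m.FormulaicPreamble*3 - m.Degeneration*4 - m.EmptyBroken*20
}

func main() {
	m := HeuristicMetrics{EngagementDepth: 1, CreativeForm: 1, FirstPerson: 1, ComplianceMarkers: 1}
	fmt.Println(m.LEK()) // 2 + 3 + 1.5 - 5 = 1.5
}
```

Note how the `EmptyBroken×20` penalty dominates: a single empty/broken output wipes out any positive signal.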
### Concurrency Model
`Engine.ScoreAll()` fans out goroutines bounded by semaphore (`concurrency` setting). Heuristic runs inline (instant). Semantic/content/standard run via worker pool with `sync.WaitGroup`. Results collected into `[]PromptScore` via mutex.
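That model can be sketched as follows; signatures are illustrative, and the real `Engine.ScoreAll()` operates on richer types than a plain string scorer:

```go
package main

import (
	"fmt"
	"sync"
)

// PromptScore is a stand-in for the real result type.
type PromptScore struct {
	Prompt string
	Score  float64
}

// scoreAll fans out one goroutine per prompt, bounded by a semaphore
// of `concurrency` slots, and collects results under a mutex — the
// shape described for Engine.ScoreAll (sketch; real signature differs).
func scoreAll(prompts []string, concurrency int, score func(string) float64) []PromptScore {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []PromptScore
		sem     = make(chan struct{}, concurrency)
	)
	for _, p := range prompts {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			s := score(p)
			mu.Lock()
			results = append(results, PromptScore{Prompt: p, Score: s})
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return results
}

func main() {
	out := scoreAll([]string{"a", "bb", "ccc"}, 2, func(p string) float64 { return float64(len(p)) })
	fmt.Println(len(out)) // 3
}
```

Because results arrive in completion order, any consumer that needs prompt order must sort afterwards.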
## Phase 2 Audit: StreamingBackend Usage (Virgil, 20 Feb 2026)
### Callers of GenerateStream/ChatStream
Only 2 files across the entire ecosystem call StreamingBackend methods:
1. **`host-uk/cli/cmd/ml/cmd_serve.go`** (lines 146, 201, 319)
- Type-asserts `backend.(ml.StreamingBackend)` for SSE streaming
- `/v1/completions` → `streamer.GenerateStream()` (line 201)
- `/v1/chat/completions` → `streamer.ChatStream()` (line 319)
- Has non-streaming fallback: `backend.Generate()` when assertion fails
2. **`host-uk/cli/cmd/ml/cmd_chat.go`**
- Direct `ChatStream()` call for terminal token-by-token echo
- No fallback — assumes backend supports streaming
### Non-streaming consumers (use Backend.Generate only)
| File | Method | Notes |
|------|--------|-------|
| service.go | `Backend.Generate()` | Backend registry dispatch |
| judge.go | `Backend.Generate()` | Via judgeChat() |
| agent.go | `Backend.Generate()` | Probe evaluation |
| expand.go | `Backend.Generate()` | Prompt expansion |
| go-ai/mcp/tools_ml.go | `ml.Service` | Via service layer |
### Backend Implementation Status
| Backend | Backend? | StreamingBackend? | Notes |
|---------|----------|-------------------|-------|
| InferenceAdapter | YES | YES | Bridges iter.Seq[Token] → callbacks |
| HTTPBackend | YES | NO | Returns complete string from API |
| LlamaBackend | YES | NO | Returns complete string via HTTP |
### Conclusion
StreamingBackend is only needed by `host-uk/cli` (2 files, out of go-ml scope). Safe to deprecate in go-ml with a comment. The actual migration of those CLI files is a separate task for the cli repo.
### GenOpts vs GenerateConfig Field Comparison
| ml.GenOpts | inference.GenerateConfig | Type |
|-----------|--------------------------|------|
| Temperature | Temperature | float64 vs float32 |
| MaxTokens | MaxTokens | int (same) |
| Model | (none) | string |
| (none) | TopK | int |
| (none) | TopP | float32 |
| (none) | StopTokens | []int32 |
| (none) | RepeatPenalty | float32 |
| (none) | ReturnLogits | bool |
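The `convertOpts` helper named in the method mapping could look roughly like this. The option constructors mirror the names cited above (`WithTemperature`, `WithMaxTokens`); the config and option types are reproduced locally as assumptions, and the full go-inference option set is wider:

```go
package main

import "fmt"

// GenOpts mirrors go-ml's options struct (fields from the table).
type GenOpts struct {
	Temperature float64
	MaxTokens   int
	Model       string // no go-inference equivalent; handled by the caller
}

// GenerateConfig and the functional options mirror go-inference's
// shape as described above (sketch of the real API).
type GenerateConfig struct {
	Temperature float32
	MaxTokens   int
}

type GenerateOption func(*GenerateConfig)

func WithTemperature(t float32) GenerateOption { return func(c *GenerateConfig) { c.Temperature = t } }
func WithMaxTokens(n int) GenerateOption       { return func(c *GenerateConfig) { c.MaxTokens = n } }

// convertOpts translates GenOpts into functional options, narrowing
// Temperature from float64 to float32 as the table notes.
func convertOpts(o GenOpts) []GenerateOption {
	var opts []GenerateOption
	if o.Temperature != 0 {
		opts = append(opts, WithTemperature(float32(o.Temperature)))
	}
	if o.MaxTokens != 0 {
		opts = append(opts, WithMaxTokens(o.MaxTokens))
	}
	return opts
}

func main() {
	var cfg GenerateConfig
	for _, opt := range convertOpts(GenOpts{Temperature: 0.7, MaxTokens: 2048}) {
		opt(&cfg)
	}
	fmt.Println(cfg.Temperature, cfg.MaxTokens)
}
```

Treating zero values as "unset" keeps backend defaults in play; `Model` stays a caller concern since GenerateConfig has no counterpart.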
## Known Issues
- ~~**backend_mlx.go imports dead subpackages**~~ — FIXED in Phase 1 (`c3c2c14`)
- **agent.go too large** — 1,070 LOC, SSH + InfluxDB + scoring + publishing mixed together
- **Hardcoded infrastructure** — InfluxDB endpoint `10.69.69.165:8181`, M3 SSH details in agent.go
- **No tests for backend_llama and backend_mlx** — Only backend_http_test.go exists
- **score.go concurrency untested** — No race condition tests
- ~~**Message type duplication**~~ — FIXED in Phase 2 (`747e703`): type alias `Message = inference.Message`
## Phase 3 Audit: agent.go Structure (Virgil, 20 Feb 2026)
### File Layout (1,070 LOC)
| Section | Lines | LOC | Purpose |
|---------|-------|-----|---------|
| Types & Config | 19-112 | ~95 | `AgentConfig`, `Checkpoint`, config maps, `AdapterMeta()` |
| Main Loop | 141-343 | ~200 | `RunAgentLoop()`, checkpoint discovery, unscored filtering |
| Evaluation | 345-700 | ~355 | MLX-native + conversion paths, 4 probe functions |
| Judge & Push | 708-887 | ~180 | Scoring, InfluxDB line protocol, DuckDB dual-write |
| Buffering | 926-977 | ~50 | JSONL buffer for InfluxDB failures |
| SSH/SCP | 979-1070 | ~90 | `SSHCommand()`, `SCPFrom()`, `SCPTo()`, utility helpers |
### Hardcoded Infrastructure
- SSH options duplicated across 3 functions: `ConnectTimeout=10, BatchMode=yes, StrictHostKeyChecking=no`
- InfluxDB timestamp base: `1739577600` (15 Feb 2025 00:00 UTC)
- InfluxDB measurements: `probe_score`, `capability_score`, `capability_judge`, `content_score`
- DuckDB tables: `checkpoint_scores`, `probe_results`
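One low-risk cleanup is centralising the SSH options duplicated across the three functions. A sketch, with hypothetical helper names and the option values from the audit above:

```go
package main

import "fmt"

// sshBaseArgs centralises the options currently duplicated across
// SSHCommand, SCPFrom, and SCPTo (values from the audit above).
func sshBaseArgs() []string {
	return []string{
		"-o", "ConnectTimeout=10",
		"-o", "BatchMode=yes",
		"-o", "StrictHostKeyChecking=no",
	}
}

// sshCommand builds the argv for an ssh invocation against a host —
// a sketch of how agent.go's three copies could share one source.
func sshCommand(host, remoteCmd string) []string {
	args := append([]string{"ssh"}, sshBaseArgs()...)
	return append(args, host, remoteCmd)
}

func main() {
	fmt.Println(sshCommand("m3", "ls"))
}
```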
### Test Coverage
Zero tests for agent.go. Testable without infrastructure:
- `AdapterMeta()` — pure function, dirname → metadata
- `FindUnscored()` — filtering logic
- `BufferInfluxResult()`/`ReplayInfluxBuffer()` — JSONL round-trip