# FINDINGS.md — go-ml Research & Discovery
## 2026-02-19: Split from go-ai (Virgil)
### Origin
Split from go-ai on 19 Feb 2026. Was `ai/ml/` subpackage inside `forge.lthn.ai/core/go-ai`. Zero internal go-ai dependencies — imports go-mlx (external module) and core/go framework only.
### What Was Extracted
- 41 Go files (~7,494 LOC excluding tests)
- 6 test files (backend_http, exact, heuristic, judge, probes, score)
- ml/ was 53% of go-ai's total LOC. After extraction, go-ai drops from ~14K to ~3.4K LOC (ai/ facade + mcp/ hub).
### Dependencies
- `forge.lthn.ai/core/go-mlx` — Metal GPU inference (backend_mlx.go, darwin/arm64 only)
- `forge.lthn.ai/core/go-inference` — Shared TextModel/Backend/Token interfaces (target for Phase 1)
- `forge.lthn.ai/core/go` — Framework services, process management, logging
- `github.com/marcboeker/go-duckdb` — Analytics storage
- `github.com/parquet-go/parquet-go` — Columnar data I/O
- `github.com/stretchr/testify` — Test assertions
### Consumers
- `go-ai/mcp/tools_ml.go` — Exposes ML as MCP tools (uses `ml.Service`, `ml.GenOpts`, `ml.Backend`)
- LEM Lab — Uses MLXBackend for chat inference
- go-i18n Phase 2a — Needs 5K sentences/sec Gemma3-1B classification (blocked on go-inference)
## go-inference Interface Mapping
### Type Correspondence
| go-ml | go-inference | Notes |
|-------|-------------|-------|
| `ml.Backend` | `inference.Backend` | Different semantics: ml returns string, inference returns TextModel |
| `ml.StreamingBackend` | (built into TextModel) | iter.Seq[Token] is inherently streaming |
| `ml.GenOpts` | `inference.GenerateConfig` | Use functional options: `WithMaxTokens(n)` etc. |
| `ml.Message` | `inference.Message` | Identical struct: Role + Content |
| `ml.TokenCallback` | (not needed) | iter.Seq[Token] replaces callbacks |
| (no equivalent) | `inference.Token` | `{ID int32, Text string}` |
| (no equivalent) | `inference.TextModel` | Generate/Chat return iter.Seq[Token] |
### Method Mapping
```
ml.Backend.Generate(ctx, prompt, GenOpts) → (string, error)
    ↕ InferenceAdapter collects tokens
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.StreamingBackend.GenerateStream(ctx, prompt, opts, TokenCallback) → error
    ↕ InferenceAdapter forwards tokens to callback
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.GenOpts{Temperature: 0.7, MaxTokens: 2048}
    ↕ convertOpts helper
inference.WithTemperature(0.7), inference.WithMaxTokens(2048)
```
### backend_mlx.go Before/After
**Before** (253 LOC — BROKEN, old subpackage imports):
```go
import (
	"forge.lthn.ai/core/go-mlx"
	"forge.lthn.ai/core/go-mlx/cache"     // REMOVED
	"forge.lthn.ai/core/go-mlx/model"     // REMOVED
	"forge.lthn.ai/core/go-mlx/sample"    // REMOVED
	"forge.lthn.ai/core/go-mlx/tokenizer" // REMOVED
)

type MLXBackend struct {
	model   model.Model
	tok     *tokenizer.Tokenizer
	caches  []cache.Cache
	sampler sample.Sampler
	// ... manual tokenisation, KV cache mgmt, sampling loop, memory cleanup
}
```
**After** (~60 LOC — uses go-inference + InferenceAdapter):
```go
import (
	"fmt"

	"forge.lthn.ai/core/go-inference"
	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)

func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
	m, err := inference.LoadModel(modelPath)
	if err != nil {
		return nil, fmt.Errorf("mlx: %w", err)
	}
	return &InferenceAdapter{model: m, name: "mlx"}, nil
}
```
All tokenisation, KV cache management, sampling, and memory cleanup are now handled inside go-mlx's `internal/metal/` package, accessed through the go-inference `TextModel` interface.
## Scoring Engine Architecture
### 5 Suites
| Suite | Method | LLM needed? | Metrics |
|-------|--------|-------------|---------|
| **Heuristic** | Regex + word analysis | No | 9 metrics → LEK composite |
| **Semantic** | LLM-as-judge | Yes | 4 dimensions (sovereignty, ethical, creative, self-concept) |
| **Content** | LLM-as-judge | Yes | 6 sovereignty probes (CCP, truth, engagement, etc.) |
| **Standard** | LLM-as-judge | Yes | TruthfulQA, DoNotAnswer, Toxigen |
| **Exact** | Numeric extraction | No | GSM8K answer matching |
### LEK Score Formula
```
LEK = EngagementDepth×2 + CreativeForm×3 + EmotionalRegister×2 + FirstPerson×1.5
- ComplianceMarkers×5 - FormulaicPreamble×3 - Degeneration×4 - EmptyBroken×20
```
Positive signals: engagement depth, creative form, emotional register, first-person voice.
Negative signals: RLHF compliance markers, formulaic preambles, text degeneration, empty/broken output.
### Concurrency Model
`Engine.ScoreAll()` fans out goroutines bounded by semaphore (`concurrency` setting). Heuristic runs inline (instant). Semantic/content/standard run via worker pool with `sync.WaitGroup`. Results collected into `[]PromptScore` via mutex.
## Known Issues
- **backend_mlx.go imports dead subpackages** — Blocked on Phase 1 migration
- **agent.go too large** — 1,070 LOC, SSH + InfluxDB + scoring + publishing mixed together
- **Hardcoded infrastructure** — InfluxDB endpoint `10.69.69.165:8181`, M3 SSH details in agent.go
- **No tests for backend_llama and backend_mlx** — Only backend_http_test.go exists
- **score.go concurrency untested** — No race condition tests
- **Message type duplication** — `ml.Message` and `inference.Message` are identical but separate