# FINDINGS.md — go-ml Research & Discovery
2026-02-19: Split from go-ai (Virgil)
## Origin
Split from go-ai on 19 Feb 2026. Was ai/ml/ subpackage inside forge.lthn.ai/core/go-ai. Zero internal go-ai dependencies — imports go-mlx (external module) and core/go framework only.
## What Was Extracted
- 41 Go files (~7,494 LOC excluding tests)
- 6 test files (backend_http, exact, heuristic, judge, probes, score)
- ml/ was 53% of go-ai's total LOC. After extraction, go-ai drops from ~14K to ~3.4K LOC (ai/ facade + mcp/ hub).
## Dependencies
- `forge.lthn.ai/core/go-mlx` — Metal GPU inference (backend_mlx.go, darwin/arm64 only)
- `forge.lthn.ai/core/go-inference` — Shared TextModel/Backend/Token interfaces (target for Phase 1)
- `forge.lthn.ai/core/go` — Framework services, process management, logging
- `github.com/marcboeker/go-duckdb` — Analytics storage
- `github.com/parquet-go/parquet-go` — Columnar data I/O
- `github.com/stretchr/testify` — Test assertions
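For local development, the forge modules above are typically wired to sibling checkouts with `replace` directives. A hypothetical go.mod fragment (module versions and local paths are assumptions, not the repo's actual file):

```
module forge.lthn.ai/core/go-ml

go 1.23

require (
	forge.lthn.ai/core/go-inference v0.0.0
	forge.lthn.ai/core/go-mlx v0.0.0
	forge.lthn.ai/core/go v0.0.0
)

// Hypothetical sibling-checkout layout; adjust to wherever the repos live.
replace forge.lthn.ai/core/go-inference => ../go-inference

replace forge.lthn.ai/core/go-mlx => ../go-mlx

replace forge.lthn.ai/core/go => ../go
```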
## Consumers
- `go-ai/mcp/tools_ml.go` — Exposes ML as MCP tools (uses `ml.Service`, `ml.GenOpts`, `ml.Backend`)
- LEM Lab — Uses MLXBackend for chat inference
- go-i18n Phase 2a — Needs 5K sentences/sec Gemma3-1B classification (blocked on go-inference)
## go-inference Interface Mapping

### Type Correspondence
| go-ml | go-inference | Notes |
|---|---|---|
| `ml.Backend` | `inference.Backend` | Different semantics: ml returns string, inference returns TextModel |
| `ml.StreamingBackend` | (built into TextModel) | `iter.Seq[Token]` is inherently streaming |
| `ml.GenOpts` | `inference.GenerateConfig` | Use functional options: `WithMaxTokens(n)` etc. |
| `ml.Message` | `inference.Message` | Identical struct: Role + Content |
| `ml.TokenCallback` | (not needed) | `iter.Seq[Token]` replaces callbacks |
| (no equivalent) | `inference.Token` | `{ID int32, Text string}` |
| (no equivalent) | `inference.TextModel` | Generate/Chat return `iter.Seq[Token]` |
### Method Mapping
```
ml.Backend.Generate(ctx, prompt, GenOpts) → (string, error)
    ↕ InferenceAdapter collects tokens
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.StreamingBackend.GenerateStream(ctx, prompt, opts, TokenCallback) → error
    ↕ InferenceAdapter forwards tokens to callback
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.GenOpts{Temperature: 0.7, MaxTokens: 2048}
    ↕ convertOpts helper
inference.WithTemperature(0.7), inference.WithMaxTokens(2048)
```
## backend_mlx.go Before/After

Before (253 LOC — BROKEN, old subpackage imports):
```go
import (
	"forge.lthn.ai/core/go-mlx"
	"forge.lthn.ai/core/go-mlx/cache"     // REMOVED
	"forge.lthn.ai/core/go-mlx/model"     // REMOVED
	"forge.lthn.ai/core/go-mlx/sample"    // REMOVED
	"forge.lthn.ai/core/go-mlx/tokenizer" // REMOVED
)

type MLXBackend struct {
	model   model.Model
	tok     *tokenizer.Tokenizer
	caches  []cache.Cache
	sampler sample.Sampler
	// ... manual tokenisation, KV cache mgmt, sampling loop, memory cleanup
}
```
After (~60 LOC — uses go-inference + InferenceAdapter):

```go
import (
	"fmt"

	"forge.lthn.ai/core/go-inference"
	_ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)

func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
	m, err := inference.LoadModel(modelPath)
	if err != nil {
		return nil, fmt.Errorf("mlx: %w", err)
	}
	return &InferenceAdapter{model: m, name: "mlx"}, nil
}
```
All tokenisation, KV caching, sampling, and memory management are now handled inside go-mlx's internal/metal/ package, accessed through the go-inference TextModel interface.
## Scoring Engine Architecture

### 5 Suites
| Suite | Method | LLM needed? | Metrics |
|---|---|---|---|
| Heuristic | Regex + word analysis | No | 9 metrics → LEK composite |
| Semantic | LLM-as-judge | Yes | 4 dimensions (sovereignty, ethical, creative, self-concept) |
| Content | LLM-as-judge | Yes | 6 sovereignty probes (CCP, truth, engagement, etc.) |
| Standard | LLM-as-judge | Yes | TruthfulQA, DoNotAnswer, Toxigen |
| Exact | Numeric extraction | No | GSM8K answer matching |
### LEK Score Formula

```
LEK = EngagementDepth×2 + CreativeForm×3 + EmotionalRegister×2 + FirstPerson×1.5
    - ComplianceMarkers×5 - FormulaicPreamble×3 - Degeneration×4 - EmptyBroken×20
```
Positive signals: engagement depth, creative form, emotional register, first-person voice. Negative signals: RLHF compliance markers, formulaic preambles, text degeneration, empty/broken output.
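The composite is a plain weighted sum; a minimal Go sketch, assuming the struct and field names below (they mirror the metric names in the formula but are not the actual go-ml types):

```go
package main

import "fmt"

// HeuristicMetrics holds the raw per-prompt metric values.
// Field names are assumptions derived from the LEK formula.
type HeuristicMetrics struct {
	EngagementDepth   float64
	CreativeForm      float64
	EmotionalRegister float64
	FirstPerson       float64
	ComplianceMarkers float64
	FormulaicPreamble float64
	Degeneration      float64
	EmptyBroken       float64
}

// LEK computes the composite: positive signals weighted up,
// compliance/degeneration signals weighted down.
func (m HeuristicMetrics) LEK() float64 {
	return m.EngagementDepth*2 + m.CreativeForm*3 + m.EmotionalRegister*2 + m.FirstPerson*1.5 -
		m.ComplianceMarkers*5 - m.FormulaicPreamble*3 - m.Degeneration*4 - m.EmptyBroken*20
}

func main() {
	m := HeuristicMetrics{EngagementDepth: 1, CreativeForm: 1, FirstPerson: 2, ComplianceMarkers: 0.5}
	fmt.Println(m.LEK()) // 2 + 3 + 3 - 2.5 = 5.5
}
```

Note the ×20 penalty on empty/broken output: a single such failure swamps every positive signal, which matches its role as a hard disqualifier.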
### Concurrency Model
`Engine.ScoreAll()` fans out goroutines bounded by a semaphore (the concurrency setting). Heuristic scoring runs inline (it is effectively instant). Semantic/content/standard suites run via a worker pool coordinated with `sync.WaitGroup`. Results are collected into `[]PromptScore` under a mutex.
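The fan-out described above can be sketched as follows. This is a minimal illustration under assumed names (`scoreAll`, the `PromptScore` shape); the real engine dispatches per-suite scorers rather than a single closure:

```go
package main

import (
	"fmt"
	"sync"
)

// PromptScore is a stand-in for the engine's result type.
type PromptScore struct {
	Prompt string
	Score  float64
}

// scoreAll runs one goroutine per prompt, bounded by a buffered-channel
// semaphore, and collects results into a shared slice under a mutex.
func scoreAll(prompts []string, concurrency int, score func(string) float64) []PromptScore {
	sem := make(chan struct{}, concurrency) // caps in-flight workers
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results []PromptScore
	)
	for _, p := range prompts {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			s := score(p)
			mu.Lock()
			results = append(results, PromptScore{Prompt: p, Score: s})
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return results
}

func main() {
	got := scoreAll([]string{"a", "bb", "ccc"}, 2, func(p string) float64 { return float64(len(p)) })
	fmt.Println(len(got)) // 3
}
```

Because results are appended as workers finish, ordering is nondeterministic; if the engine needs stable ordering, writing into a pre-sized slice by index would avoid both the mutex and the reorder.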
## Known Issues
- backend_mlx.go imports dead subpackages — Blocked on Phase 1 migration
- agent.go too large — 1,070 LOC, SSH + InfluxDB + scoring + publishing mixed together
- Hardcoded infrastructure — InfluxDB endpoint `10.69.69.165:8181` and M3 SSH details in agent.go
- No tests for backend_llama and backend_mlx — Only backend_http_test.go exists
- score.go concurrency untested — No race condition tests
- Message type duplication — `ml.Message` and `inference.Message` are identical but separate