
# FINDINGS.md — go-ml Research & Discovery

2026-02-19: Split from go-ai (Virgil)

## Origin

Split from go-ai on 19 Feb 2026; previously the ai/ml/ subpackage inside forge.lthn.ai/core/go-ai. It has zero internal go-ai dependencies — it imports only go-mlx (an external module) and the core/go framework.

## What Was Extracted

- 41 Go files (~7,494 LOC excluding tests)
- 6 test files (backend_http, exact, heuristic, judge, probes, score)
- ml/ was 53% of go-ai's total LOC; after extraction, go-ai drops from ~14K to ~3.4K LOC (ai/ facade + mcp/ hub)

## Dependencies

- forge.lthn.ai/core/go-mlx — Metal GPU inference (backend_mlx.go, darwin/arm64 only)
- forge.lthn.ai/core/go-inference — shared TextModel/Backend/Token interfaces (target for Phase 1)
- forge.lthn.ai/core/go — framework services, process management, logging
- github.com/marcboeker/go-duckdb — analytics storage
- github.com/parquet-go/parquet-go — columnar data I/O
- github.com/stretchr/testify — test assertions

## Consumers

- go-ai/mcp/tools_ml.go — exposes ML as MCP tools (uses ml.Service, ml.GenOpts, ml.Backend)
- LEM Lab — uses MLXBackend for chat inference
- go-i18n Phase 2a — needs 5K sentences/sec Gemma3-1B classification (blocked on go-inference)

## go-inference Interface Mapping

### Type Correspondence

| go-ml | go-inference | Notes |
|---|---|---|
| ml.Backend | inference.Backend | Different semantics: ml returns a string; inference returns a TextModel |
| ml.StreamingBackend | (built into TextModel) | iter.Seq[Token] is inherently streaming |
| ml.GenOpts | inference.GenerateConfig | Uses functional options: WithMaxTokens(n), etc. |
| ml.Message | inference.Message | Identical struct: Role + Content |
| ml.TokenCallback | (not needed) | iter.Seq[Token] replaces callbacks |
| (no equivalent) | inference.Token | {ID int32, Text string} |
| (no equivalent) | inference.TextModel | Generate/Chat return iter.Seq[Token] |

### Method Mapping

```text
ml.Backend.Generate(ctx, prompt, GenOpts) → (string, error)
   ↕ InferenceAdapter collects tokens
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.StreamingBackend.GenerateStream(ctx, prompt, opts, TokenCallback) → error
   ↕ InferenceAdapter forwards tokens to callback
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.GenOpts{Temperature: 0.7, MaxTokens: 2048}
   ↕ convertOpts helper
inference.WithTemperature(0.7), inference.WithMaxTokens(2048)
```

### backend_mlx.go Before/After

Before (253 LOC — BROKEN, old subpackage imports):

```go
import (
    "forge.lthn.ai/core/go-mlx"
    "forge.lthn.ai/core/go-mlx/cache"     // REMOVED
    "forge.lthn.ai/core/go-mlx/model"     // REMOVED
    "forge.lthn.ai/core/go-mlx/sample"    // REMOVED
    "forge.lthn.ai/core/go-mlx/tokenizer" // REMOVED
)

type MLXBackend struct {
    model   model.Model
    tok     *tokenizer.Tokenizer
    caches  []cache.Cache
    sampler sample.Sampler
    // ... manual tokenisation, KV cache mgmt, sampling loop, memory cleanup
}
```

After (~60 LOC — uses go-inference + InferenceAdapter):

```go
import (
    "forge.lthn.ai/core/go-inference"
    _ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)

func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
    m, err := inference.LoadModel(modelPath)
    if err != nil {
        return nil, fmt.Errorf("mlx: %w", err)
    }
    return &InferenceAdapter{model: m, name: "mlx"}, nil
}
```

All tokenisation, KV caching, sampling, and memory management are now handled inside go-mlx's internal/metal/ package, accessed through the go-inference TextModel interface.

## Scoring Engine Architecture

### 5 Suites

| Suite | Method | LLM needed? | Metrics |
|---|---|---|---|
| Heuristic | Regex + word analysis | No | 9 metrics → LEK composite |
| Semantic | LLM-as-judge | Yes | 4 dimensions (sovereignty, ethical, creative, self-concept) |
| Content | LLM-as-judge | Yes | 6 sovereignty probes (CCP, truth, engagement, etc.) |
| Standard | LLM-as-judge | Yes | TruthfulQA, DoNotAnswer, Toxigen |
| Exact | Numeric extraction | No | GSM8K answer matching |

### LEK Score Formula

```text
LEK = EngagementDepth×2 + CreativeForm×3 + EmotionalRegister×2 + FirstPerson×1.5
    - ComplianceMarkers×5 - FormulaicPreamble×3 - Degeneration×4 - EmptyBroken×20
```

Positive signals: engagement depth, creative form, emotional register, first-person voice. Negative signals: RLHF compliance markers, formulaic preambles, text degeneration, empty/broken output.

### Concurrency Model

Engine.ScoreAll() fans out one goroutine per prompt, bounded by a semaphore sized from the concurrency setting. The heuristic suite runs inline (it is effectively instant); the semantic, content, and standard suites run through a worker pool coordinated with sync.WaitGroup. Results are collected into []PromptScore under a mutex.

## Known Issues

- backend_mlx.go imports dead subpackages — blocked on the Phase 1 migration
- agent.go too large — 1,070 LOC; SSH, InfluxDB, scoring, and publishing are mixed together
- Hardcoded infrastructure — InfluxDB endpoint 10.69.69.165:8181 and M3 SSH details live in agent.go
- No tests for backend_llama or backend_mlx — only backend_http_test.go exists
- score.go concurrency untested — no race-condition tests
- Message type duplication — ml.Message and inference.Message are identical but separate