core/go-ml

Snider 58854390eb docs: add Phase 3 agent.go structure audit to FINDINGS.md

Documented 6 logical sections, hardcoded infrastructure,
and testable functions (AdapterMeta, FindUnscored, buffer).

Co-Authored-By: Virgil <virgil@lethean.io>

2026-02-20 02:13:06 +00:00

8.6 KiB


FINDINGS.md — go-ml Research & Discovery

2026-02-19: Split from go-ai (Virgil)

Origin

Split from go-ai on 19 Feb 2026. Previously the ai/ml/ subpackage inside forge.lthn.ai/core/go-ai. It had zero dependencies on other go-ai packages: it imports go-mlx (external module) and the core/go framework only.

What Was Extracted

41 Go files (~7,494 LOC excluding tests)
6 test files (backend_http, exact, heuristic, judge, probes, score)
ml/ was 53% of go-ai's total LOC. After extraction, go-ai drops from ~14K to ~3.4K LOC (ai/ facade + mcp/ hub).

Dependencies

forge.lthn.ai/core/go-mlx — Metal GPU inference (backend_mlx.go, darwin/arm64 only)
forge.lthn.ai/core/go-inference — Shared TextModel/Backend/Token interfaces (target for Phase 1)
forge.lthn.ai/core/go — Framework services, process management, logging
github.com/marcboeker/go-duckdb — Analytics storage
github.com/parquet-go/parquet-go — Columnar data I/O
github.com/stretchr/testify — Test assertions

Consumers

go-ai/mcp/tools_ml.go — Exposes ML as MCP tools (uses ml.Service, ml.GenOpts, ml.Backend)
LEM Lab — Uses MLXBackend for chat inference
go-i18n Phase 2a — Needs 5K sentences/sec Gemma3-1B classification (blocked on go-inference)

go-inference Interface Mapping

Type Correspondence

go-ml	go-inference	Notes
`ml.Backend`	`inference.Backend`	Different semantics: ml returns string, inference returns TextModel
`ml.StreamingBackend`	(built into TextModel)	iter.Seq[Token] is inherently streaming
`ml.GenOpts`	`inference.GenerateConfig`	Use functional options: `WithMaxTokens(n)` etc.
`ml.Message`	`inference.Message`	Identical struct: Role + Content
`ml.TokenCallback`	(not needed)	iter.Seq[Token] replaces callbacks
(no equivalent)	`inference.Token`	`{ID int32, Text string}`
(no equivalent)	`inference.TextModel`	Generate/Chat return iter.Seq[Token]

Method Mapping

ml.Backend.Generate(ctx, prompt, GenOpts) → (string, error)
   ↕ InferenceAdapter collects tokens
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.StreamingBackend.GenerateStream(ctx, prompt, opts, TokenCallback) → error
   ↕ InferenceAdapter forwards tokens to callback
inference.TextModel.Generate(ctx, prompt, ...GenerateOption) → iter.Seq[Token]

ml.GenOpts{Temperature: 0.7, MaxTokens: 2048}
   ↕ convertOpts helper
inference.WithTemperature(0.7), inference.WithMaxTokens(2048)

backend_mlx.go Before/After

Before (253 LOC — BROKEN, old subpackage imports):

import (
    "forge.lthn.ai/core/go-mlx"
    "forge.lthn.ai/core/go-mlx/cache"     // REMOVED
    "forge.lthn.ai/core/go-mlx/model"     // REMOVED
    "forge.lthn.ai/core/go-mlx/sample"    // REMOVED
    "forge.lthn.ai/core/go-mlx/tokenizer" // REMOVED
)

type MLXBackend struct {
    model      model.Model
    tok        *tokenizer.Tokenizer
    caches     []cache.Cache
    sampler    sample.Sampler
    // ... manual tokenisation, KV cache mgmt, sampling loop, memory cleanup
}

After (~60 LOC — uses go-inference + InferenceAdapter):

import (
    "fmt"

    "forge.lthn.ai/core/go-inference"
    _ "forge.lthn.ai/core/go-mlx" // registers "metal" backend via init()
)

// NewMLXBackend loads the model through go-inference and wraps it in an
// InferenceAdapter, so callers keep the ml.Backend surface.
func NewMLXBackend(modelPath string) (*InferenceAdapter, error) {
    m, err := inference.LoadModel(modelPath)
    if err != nil {
        return nil, fmt.Errorf("mlx: %w", err)
    }
    return &InferenceAdapter{model: m, name: "mlx"}, nil
}

All tokenisation, KV cache handling, sampling, and memory management are now handled inside go-mlx's internal/metal/ package, accessed through the go-inference TextModel interface.
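
The blank import relies on Go's side-effect registration pattern: importing a package only for its init() function. The sketch below shows that pattern in a single file. Register, LoadModelWith, Loader and metalModel are hypothetical names for illustration; how go-inference actually selects a backend is not documented here.

```go
package main

import "fmt"

// TextModel is a minimal stand-in for inference.TextModel.
type TextModel interface{ Backend() string }

// Loader opens a model from a path (hypothetical type).
type Loader func(path string) (TextModel, error)

// registry maps backend names to loaders. Package-level variables are
// initialised before any init() runs, so init() can safely write to it.
var registry = map[string]Loader{}

// Register makes a backend available under the given name.
func Register(name string, l Loader) { registry[name] = l }

// LoadModelWith looks up a backend by name and asks it to load the model.
func LoadModelWith(name, path string) (TextModel, error) {
	l, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("inference: backend %q not registered", name)
	}
	return l(path)
}

// metalModel is a placeholder for a backend-specific model type.
type metalModel struct{ path string }

func (metalModel) Backend() string { return "metal" }

// In a multi-package layout this init() would live in the backend package
// and run when that package is imported with a blank identifier.
func init() {
	Register("metal", func(path string) (TextModel, error) {
		return metalModel{path: path}, nil
	})
}

func main() {
	m, err := LoadModelWith("metal", "/models/example")
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println("loaded via", m.Backend())
}
```

The caller never names the backend package's types, which is what lets backend_mlx.go shrink to a thin wrapper.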

Scoring Engine Architecture

5 Suites

Suite	Method	LLM needed?	Metrics
Heuristic	Regex + word analysis	No	9 metrics → LEK composite
Semantic	LLM-as-judge	Yes	4 dimensions (sovereignty, ethical, creative, self-concept)
Content	LLM-as-judge	Yes	6 sovereignty probes (CCP, truth, engagement, etc.)
Standard	LLM-as-judge	Yes	TruthfulQA, DoNotAnswer, Toxigen
Exact	Numeric extraction	No	GSM8K answer matching

LEK Score Formula

LEK = EngagementDepth×2 + CreativeForm×3 + EmotionalRegister×2 + FirstPerson×1.5
    - ComplianceMarkers×5 - FormulaicPreamble×3 - Degeneration×4 - EmptyBroken×20

Positive signals: engagement depth, creative form, emotional register, first-person voice. Negative signals: RLHF compliance markers, formulaic preambles, text degeneration, empty/broken output.
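
As a worked example, the formula can be written as a plain function. The struct name, field names and float64 type are assumptions; only the eight terms and their weights come from the formula above.

```go
package main

import "fmt"

// HeuristicCounts holds the eight signals that appear in the LEK formula.
// The type and field names are illustrative, not the go-ml definitions.
type HeuristicCounts struct {
	EngagementDepth, CreativeForm, EmotionalRegister, FirstPerson   float64
	ComplianceMarkers, FormulaicPreamble, Degeneration, EmptyBroken float64
}

// LEK applies the weights from the formula: positive signals minus penalties.
func LEK(h HeuristicCounts) float64 {
	positive := h.EngagementDepth*2 + h.CreativeForm*3 + h.EmotionalRegister*2 + h.FirstPerson*1.5
	negative := h.ComplianceMarkers*5 + h.FormulaicPreamble*3 + h.Degeneration*4 + h.EmptyBroken*20
	return positive - negative
}

func main() {
	engaged := HeuristicCounts{EngagementDepth: 3, CreativeForm: 1, EmotionalRegister: 2, FirstPerson: 2}
	fmt.Println(LEK(engaged)) // 3×2 + 1×3 + 2×2 + 2×1.5 = 16

	boilerplate := HeuristicCounts{ComplianceMarkers: 2, FormulaicPreamble: 1}
	fmt.Println(LEK(boilerplate)) // -(2×5 + 1×3) = -13
}
```

The ×20 weight on EmptyBroken means a single empty or broken response outweighs any realistic combination of positive signals.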

Concurrency Model

Engine.ScoreAll() fans out goroutines bounded by a semaphore sized by the concurrency setting. The heuristic suite runs inline because it is effectively instant. The semantic, content, and standard suites run via a worker pool coordinated with sync.WaitGroup. Results are collected into []PromptScore under a mutex.

Phase 2 Audit: StreamingBackend Usage (Virgil, 20 Feb 2026)

Callers of GenerateStream/ChatStream

Only 2 files across the entire ecosystem call StreamingBackend methods:

host-uk/cli/cmd/ml/cmd_serve.go (lines 146, 201, 319)
- Type-asserts backend.(ml.StreamingBackend) for SSE streaming
- /v1/completions → streamer.GenerateStream() (line 201)
- /v1/chat/completions → streamer.ChatStream() (line 319)
- Has non-streaming fallback: backend.Generate() when assertion fails
host-uk/cli/cmd/ml/cmd_chat.go
- Direct ChatStream() call for terminal token-by-token echo
- No fallback — assumes backend supports streaming

Non-streaming consumers (use Backend.Generate only)

File	Method	Notes
service.go	`Backend.Generate()`	Backend registry dispatch
judge.go	`Backend.Generate()`	Via judgeChat()
agent.go	`Backend.Generate()`	Probe evaluation
expand.go	`Backend.Generate()`	Prompt expansion
go-ai/mcp/tools_ml.go	`ml.Service`	Via service layer

Backend Implementation Status

Backend	Backend?	StreamingBackend?	Notes
InferenceAdapter	YES	YES	Bridges iter.Seq[Token] → callbacks
HTTPBackend	YES	NO	Returns complete string from API
LlamaBackend	YES	NO	Returns complete string via HTTP

Conclusion

StreamingBackend is only needed by host-uk/cli (2 files, out of go-ml scope). Safe to deprecate in go-ml with a comment. The actual migration of those CLI files is a separate task for the cli repo.

GenOpts vs GenerateConfig Field Comparison

ml.GenOpts	inference.GenerateConfig	Type (ml vs inference where they differ)
Temperature	Temperature	float64 vs float32
MaxTokens	MaxTokens	int (same)
Model	(none)	string
(none)	TopK	int
(none)	TopP	float32
(none)	StopTokens	[]int32
(none)	RepeatPenalty	float32
(none)	ReturnLogits	bool
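
A convertOpts helper following this table might look like the sketch below. The types are local stand-ins, and treating zero values as "unset" is an assumption about the intended semantics, not something the audit states.

```go
package main

import "fmt"

// GenOpts mirrors the ml.GenOpts fields listed in the table.
type GenOpts struct {
	Temperature float64
	MaxTokens   int
	Model       string
}

// GenerateConfig is a reduced stand-in for inference.GenerateConfig.
type GenerateConfig struct {
	Temperature float32
	MaxTokens   int
}

// GenerateOption is the functional-option type; local definition for the sketch.
type GenerateOption func(*GenerateConfig)

func WithTemperature(t float32) GenerateOption {
	return func(c *GenerateConfig) { c.Temperature = t }
}

func WithMaxTokens(n int) GenerateOption {
	return func(c *GenerateConfig) { c.MaxTokens = n }
}

// convertOpts maps the two shared fields. Zero values are treated as unset
// (an assumption) so the target's own defaults can apply.
func convertOpts(o GenOpts) []GenerateOption {
	var out []GenerateOption
	if o.Temperature != 0 {
		// float64 → float32 narrows precision; fine for sampling temperatures.
		out = append(out, WithTemperature(float32(o.Temperature)))
	}
	if o.MaxTokens > 0 {
		out = append(out, WithMaxTokens(o.MaxTokens))
	}
	// Model has no GenerateConfig counterpart, so it is intentionally dropped.
	return out
}

func main() {
	var cfg GenerateConfig
	for _, opt := range convertOpts(GenOpts{Temperature: 0.7, MaxTokens: 2048, Model: "ignored"}) {
		opt(&cfg)
	}
	fmt.Printf("%+v\n", cfg)
}
```

TopK, TopP, StopTokens, RepeatPenalty and ReturnLogits have no source in GenOpts, so they stay at whatever defaults the target config provides.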

Known Issues

~~backend_mlx.go imports dead subpackages~~ — FIXED in Phase 1 (c3c2c14)
agent.go too large — 1,070 LOC, SSH + InfluxDB + scoring + publishing mixed together
Hardcoded infrastructure — InfluxDB endpoint 10.69.69.165:8181, M3 SSH details in agent.go
No tests for backend_llama and backend_mlx — Only backend_http_test.go exists
score.go concurrency untested — No race condition tests
~~Message type duplication~~ — FIXED in Phase 2 (747e703): type alias Message = inference.Message

Phase 3 Audit: agent.go Structure (Virgil, 20 Feb 2026)

File Layout (1,070 LOC)

Section	Lines	LOC	Purpose
Types & Config	19–112	~95	`AgentConfig`, `Checkpoint`, config maps, `AdapterMeta()`
Main Loop	141–343	~200	`RunAgentLoop()`, checkpoint discovery, unscored filtering
Evaluation	345–700	~355	MLX-native + conversion paths, 4 probe functions
Judge & Push	708–887	~180	Scoring, InfluxDB line protocol, DuckDB dual-write
Buffering	926–977	~50	JSONL buffer for InfluxDB failures
SSH/SCP	979–1070	~90	`SSHCommand()`, `SCPFrom()`, `SCPTo()`, utility helpers

Hardcoded Infrastructure

SSH options duplicated across 3 functions: ConnectTimeout=10, BatchMode=yes, StrictHostKeyChecking=no
InfluxDB timestamp base: 1739577600 (15 Feb 2025 00:00 UTC)
InfluxDB measurements: probe_score, capability_score, capability_judge, content_score
DuckDB tables: checkpoint_scores, probe_results
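
One way to remove the SSH option duplication is a single shared slice used by every argv builder. This is a sketch only: sshBaseOpts, sshArgs and scpArgs are hypothetical names, not the existing SSHCommand()/SCPFrom()/SCPTo() signatures.

```go
package main

import (
	"fmt"
	"strings"
)

// sshBaseOpts holds the three options the audit found repeated. Both ssh and
// scp accept them through -o.
var sshBaseOpts = []string{
	"-o", "ConnectTimeout=10",
	"-o", "BatchMode=yes",
	"-o", "StrictHostKeyChecking=no",
}

// sshArgs builds the argument list for `ssh host command`.
func sshArgs(host, command string) []string {
	args := append([]string{}, sshBaseOpts...) // copy so the shared slice is never mutated
	return append(args, host, command)
}

// scpArgs builds the argument list for `scp src dst`.
func scpArgs(src, dst string) []string {
	args := append([]string{}, sshBaseOpts...)
	return append(args, src, dst)
}

func main() {
	// Host and paths are placeholders for illustration.
	fmt.Println("ssh", strings.Join(sshArgs("user@host", "ls"), " "))
	fmt.Println("scp", strings.Join(scpArgs("user@host:/remote/file", "/local/file"), " "))
}
```

Because the builders return plain slices, they can be unit-tested without spawning a process, and the result passed to exec.Command as-is.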

Test Coverage

Zero tests for agent.go. Testable without infrastructure:

AdapterMeta() — pure function, dirname → metadata
FindUnscored() — filtering logic
BufferInfluxResult()/ReplayInfluxBuffer() — JSONL round-trip
