# TODO.md — go-i18n Task Queue
Dispatched from core/go orchestration. Pick up tasks in order.
## Phase 1: Harden the Engine
- [x] **Add CLAUDE.md** — Document the grammar engine contract: what it is (grammar primitives + reversal), what it isn't (translation file manager). Include build/test commands, the `gram.*` sacred rule, and the agent-flattening prohibition. (d5b3eac)
- [x] **Ambiguity resolution for dual-class words** — Two-pass probabilistic disambiguation with 7 weighted signals. All 6 dual-class words {commit, run, test, check, file, build} correctly disambiguate in both verb and noun contexts. Confidence scores flow into imprints. (3848297)
- [x] **Extend irregular verb coverage** — Added 44 irregular verbs: 17 compound (undo, redo, rerun, rewrite, rebuild, resend, override, rethink, remake, undergo, overcome, withdraw, uphold, withhold, outgrow, outrun, overshoot), 22 simple (become, come, give, fall, understand, arise, bind, spin, quit, cast, broadcast, burst, cost, shed, rid, shrink, shoot, forbid, offset, upset, input, output), 5 CVC doubling overrides (debug, embed, unzip, remap, unpin, unwrap).
- [x] **Add benchmarks** — 8 forward composition + 7 reversal benchmarks. Baseline on M3 Ultra: PastTense 26 ns/0 alloc, Tokenise(short) 639 ns/8 alloc, Imprint 648 ns/10 alloc, Similar 516 ns/0 alloc.
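The two-pass weighted-signal disambiguation above can be sketched roughly as follows. This is a single-pass simplification with three illustrative cues and made-up weights (the engine uses seven signals); only the idea — signed weights voting verb vs. noun, with the margin becoming a confidence score that flows into the imprint — is taken from the task notes.

```go
package main

import (
	"fmt"
	"math"
)

// Signal is one weighted disambiguation cue. Positive weights favour the
// verb reading, negative weights the noun reading. Names and weights here
// are illustrative stand-ins for the engine's actual seven signals.
type Signal struct {
	Name   string
	Weight float64
	Fires  func(tokens []string, i int) bool
}

var signals = []Signal{
	{"prev-is-to", 0.9, func(t []string, i int) bool { return i > 0 && t[i-1] == "to" }},
	{"prev-is-article", -0.8, func(t []string, i int) bool {
		return i > 0 && (t[i-1] == "the" || t[i-1] == "a")
	}},
	{"prev-is-pronoun", 0.6, func(t []string, i int) bool {
		return i > 0 && (t[i-1] == "we" || t[i-1] == "i")
	}},
}

// disambiguate scores token i against all cues and returns its class plus
// a confidence in [0,1] (the score margin over the maximum possible score).
func disambiguate(tokens []string, i int) (string, float64) {
	var score, norm float64
	for _, s := range signals {
		norm += math.Abs(s.Weight)
		if s.Fires(tokens, i) {
			score += s.Weight
		}
	}
	conf := math.Abs(score) / norm
	if score >= 0 {
		return "verb", conf
	}
	return "noun", conf
}

func main() {
	cls, conf := disambiguate([]string{"we", "run", "the", "test"}, 1)
	fmt.Printf("run: %s (%.2f)\n", cls, conf) // pronoun cue fires → verb
	cls, conf = disambiguate([]string{"the", "run"}, 1)
	fmt.Printf("run: %s (%.2f)\n", cls, conf) // article cue fires → noun
}
```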
## Phase 2: Reference Distribution + 1B Classification Pipeline
### 2a: 1B Pre-Classification — UNBLOCKED (19 Feb 2026)
go-mlx Phases 2-5 are complete. Gemma3-1B inference validated at 46 tok/s, batch classify at 152 prompts/sec on M3 Ultra. Import go-inference + go-mlx directly — no go-ai needed.
Setup — add to `go.mod`:

```
require forge.lthn.ai/core/go-inference v0.0.0
require forge.lthn.ai/core/go-mlx v0.0.0

replace forge.lthn.ai/core/go-inference => ../go-inference
replace forge.lthn.ai/core/go-mlx => ../go-mlx
```
Usage:

```go
import (
	"forge.lthn.ai/core/go-inference"
	_ "forge.lthn.ai/core/go-mlx" // registers the "metal" backend via init()
)

m, err := inference.LoadModel("/Volumes/Data/lem/LEM-Gemma3-1B-layered-v2")
if err != nil {
	log.Fatal(err)
}
defer m.Close()

// Option A: batch classify (prefill-only, 152 prompts/sec) — best for domain sorting.
results, err := m.Classify(ctx, prompts, inference.WithMaxTokens(1))

// Option B: single-token generation (46 tok/s) — for article/irregular validation.
for tok := range m.Generate(ctx, prompt, inference.WithMaxTokens(1), inference.WithTemperature(0.05)) {
	fmt.Print(tok.Text)
}

// Model discovery (finds all models under a directory).
models, _ := inference.Discover("/Volumes/Data/lem/")
```

Key types (all from the `inference` package): `Token`, `Message`, `TextModel`, `Backend`, `GenerateConfig`, `ClassifyResult`, `BatchResult`, `GenerateMetrics`.
- [x] **Classification benchmark suite** — 220 domain-tagged sentences, leave-one-out classification via imprint similarity. Grammar engine accuracy: technical 78%, creative 82%, ethical 46%, casual 11%. Ethical↔technical and casual↔creative confusion confirms the 1B model is needed for those domains.
- [x] **1B pre-sort pipeline tool** — `ClassifyCorpus()` in `classify.go`. Streaming JSONL batch classification via `inference.TextModel.Classify()`. Mock-tested (3 test cases) and integration-tested with real Gemma3-1B (80 prompts/sec on a 50-prompt run, 100% domain accuracy). Configurable batch size, prompt field, and template.
### Virgil Review: Fix Before Continuing (20 Feb 2026)
Do these first, in order, before picking up the next Phase 2a task.
- [x] **Fix go.mod: remove go-mlx from module require** — Removed go-mlx `require` and `replace` from go.mod. Moved integration test to `integration/` sub-module with its own go.mod that depends on go-mlx. Main module now compiles cleanly on all platforms; `go mod tidy` no longer pulls go-mlx.
- [x] **Fix go.mod: go-inference pseudo-version** — `go mod tidy` resolved to the standard replaced-module pseudo-version `v0.0.0-00010101000000-000000000000`. CI-safe.
- [x] **Fix mapTokenToDomain prefix collision** — Replaced `strings.HasPrefix` with exact match + known BPE fragment fallback. Added test cases for "castle", "cascade", "credential", "creature" — all return "unknown".
- [x] **Fix classify_bench_test.go naming** — Added `testing.Short()` skip to `TestClassification_DomainSeparation` and `TestClassification_LeaveOneOut` (the two O(n^2) tests). Verified with `go test -short -v`.
- [x] **Add accuracy assertion to integration test** — Integration test now asserts at least 80% (40/50) of technical prompts classified as "technical". Logs full domain breakdown and misclassified entries on failure. Test moved to `integration/` sub-module.
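The exact-match + fragment-fallback shape of the `mapTokenToDomain` fix can be sketched as below. The lookup tables here are illustrative stand-ins, not the real domain or BPE-fragment tables; the point is only that whole-token equality (plus an explicit fragment list) cannot collide the way `strings.HasPrefix` did for words like "castle".

```go
package main

import "fmt"

// domains maps whole tokens to domains; exact match only, so "castle"
// no longer collides with a "cast" prefix. Entries are illustrative.
var domains = map[string]string{
	"commit": "technical",
	"sunset": "creative",
}

// fragments lists known BPE fragments that still map to a domain.
// Illustrative stand-ins for the real fragment table.
var fragments = map[string]string{
	"comm": "technical",
}

// mapTokenToDomain resolves a token to a domain, falling back to the
// known-fragment table, then to "unknown".
func mapTokenToDomain(tok string) string {
	if d, ok := domains[tok]; ok {
		return d
	}
	if d, ok := fragments[tok]; ok {
		return d
	}
	return "unknown"
}

func main() {
	// The review's regression cases: prefix-collision words stay unknown.
	for _, w := range []string{"commit", "castle", "cascade", "credential", "creature"} {
		fmt.Println(w, "=>", mapTokenToDomain(w))
	}
}
```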
### Remaining Phase 2a Tasks
- [x] **1B vs 27B calibration check** — `CalibrateDomains()` in `calibrate.go`. Accepts two TextModels + 500 CalibrationSamples (220 ground-truth + 280 unlabelled). Batch-classifies with both models, computes agreement rate, per-domain distribution, confusion pairs, and accuracy vs ground truth. 7 mock tests (race-clean). Integration test at `integration/calibrate_test.go` loads LEM-1B + Gemma3-27B from `/Volumes/Data/lem/` and runs full calibration with detailed reporting. Run with: `cd integration && go test -v -run TestCalibrateDomains_1Bvs27B`
- [ ] **Article/irregular validator** — Lightweight Go funcs that use the 1B model's strong article correctness (100%) and irregular base-form accuracy (100%) as fast validators. Use `m.Generate()` with `inference.WithMaxTokens(1)` and `inference.WithTemperature(0.05)` for single-token classification.
### 2b: Reference Distributions
- [x] **Reference distribution builder** — `BuildReferences()` in `reversal/reference.go`. Tokenises samples, builds imprints, computes per-domain centroid (averaged maps + normalised) and per-key variance for Mahalanobis. `ReferenceSet` holds all domain references. 7 tests.
- [x] **Imprint comparator** — `ReferenceSet.Compare()` + `ReferenceSet.Classify()` in `reversal/reference.go`. Three distance metrics: cosine similarity (reuses `Similar()`), symmetric KL divergence (epsilon-smoothed, weighted by component), simplified Mahalanobis (variance-normalised, Euclidean fallback). `Classify()` returns best domain + confidence margin. 5 tests for distance functions.
- [x] **Cross-domain anomaly detection** — `ReferenceSet.DetectAnomalies()` in `reversal/anomaly.go`. Tokenises + classifies each sample against references, flags cases where the model domain != the imprint domain. Returns `[]AnomalyResult` + `AnomalyStats` (rate, by-pair breakdown). 5 tests including mismatch detection ("She painted the sunset" tagged technical → flagged as creative anomaly).
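The epsilon-smoothed symmetric KL metric from the comparator can be sketched as follows. This is not the repo's implementation — the epsilon value and the flat `map[string]float64` stand-in for an imprint component are assumptions — but the structure (smooth over the union of keys, renormalise, average the two KL directions) matches the description above.

```go
package main

import (
	"fmt"
	"math"
)

// symKL returns the symmetric KL divergence between two imprint-style
// count maps, with epsilon smoothing so keys present in only one map
// don't produce infinities. The epsilon value is illustrative.
func symKL(p, q map[string]float64, eps float64) float64 {
	// Union of keys from both distributions.
	keys := map[string]bool{}
	for k := range p {
		keys[k] = true
	}
	for k := range q {
		keys[k] = true
	}
	// Smooth and renormalise a map over the union of keys.
	smooth := func(m map[string]float64) map[string]float64 {
		out := make(map[string]float64, len(keys))
		var sum float64
		for k := range keys {
			out[k] = m[k] + eps
			sum += out[k]
		}
		for k := range out {
			out[k] /= sum
		}
		return out
	}
	ps, qs := smooth(p), smooth(q)
	var d float64
	for k := range keys {
		d += ps[k]*math.Log(ps[k]/qs[k]) + qs[k]*math.Log(qs[k]/ps[k])
	}
	return d / 2 // symmetrised: (KL(p||q) + KL(q||p)) / 2
}

func main() {
	a := map[string]float64{"verb": 0.6, "noun": 0.4}
	fmt.Println(symKL(a, a, 1e-6))                                 // identical → 0
	fmt.Println(symKL(a, map[string]float64{"noun": 1}, 1e-6) > 0) // true
}
```

In the real comparator this per-component divergence would then be weighted and summed across imprint components before feeding `Classify()`.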
## Phase 3: Multi-Language
- [x] **Grammar table format spec** — Full JSON schema documented in `docs/grammar-table-spec.md`. Covers all 7 `gram.*` sub-keys, required/optional fields, examples for English and French, the 3-tier fallback chain, and a new-language checklist.
- [x] **French grammar tables** — 50 verbs, 24 gendered nouns, gendered articles (`by_gender`), French punctuation spacing, 33 noun determiners, 21 verb auxiliaries. Loader extended to parse the `by_gender` article map. Stress test confirms verb/noun/article/punct/signal tables all load correctly; elision (l') and plural articles (les/des) need a future `Article()` extension.
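For orientation, a French `by_gender` article entry might look something like the fragment below. Only the `by_gender` key and the French forms are grounded in the notes above; the surrounding key names are illustrative guesses — the actual schema lives in `docs/grammar-table-spec.md`. Elision (l') is noted above as future work and so is omitted here.

```json
{
  "article": {
    "by_gender": {
      "m":  { "definite": "le",  "indefinite": "un" },
      "f":  { "definite": "la",  "indefinite": "une" },
      "pl": { "definite": "les", "indefinite": "des" }
    }
  }
}
```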
## Workflow
- Virgil in core/go writes tasks here after research.
- This repo's session picks up tasks in phase order.
- Mark `[x]` when done; note the commit hash.
- New discoveries → add tasks, flag in FINDINGS.md.