- pkg/io/node: implement ReadFile (fs.ReadFileFS), Walk with WalkOptions, CopyFile, FromTar constructor; fix Exists test calls to match bool return
- pkg/cache: add Medium DI parameter, use errors.Is for wrapped ErrNotExist
- pkg/cli: add Medium DI to PIDFile and DaemonOptions for testability
- TODO.md: mark go-i18n article/irregular validator complete

Co-Authored-By: Virgil <virgil@lethean.io>
TODO.md — Core Go Dispatch Queue
Tasks dispatched from core/go orchestration to satellite repos.
Format: - [ ] REPO: task description / - [x] when done.
go-i18n (forge.lthn.ai/core/go-i18n)
Phase 1: Harden the Engine
- go-i18n: Add CLAUDE.md — Document the grammar engine contract: what it is (grammar primitives + reversal), what it isn't (translation file manager). Include build/test commands, the gram.* sacred rule, and the agent-flattening prohibition.
- go-i18n: Ambiguity resolution for dual-class words — Words like "run", "file", "test", "check", "build" are both verb and noun. Tokeniser currently picks first match. Need context-aware disambiguation (look at surrounding tokens: article before → noun, after subject → verb).
- go-i18n: Extend irregular verb coverage — Audit against common dev/ops vocabulary. Missing forms cause silent fallback to regular rules which may produce wrong output (e.g. "builded" instead of "built").
- go-i18n: Add benchmarks — grammar_test.go and reversal/tokeniser_test.go need Benchmark* functions. The engine will run in hot paths (TIM, Poindexter) — need baseline numbers.
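The context rule from the dual-class task above (article before → noun, after subject → verb) could be sketched roughly as follows. This is a minimal illustration, not the tokeniser's actual code: the `WordClass` type, the `Disambiguate` function, and the small article and subject sets are all hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// WordClass is a hypothetical label type for this sketch;
// the real tokeniser's types may differ.
type WordClass string

const (
	Noun WordClass = "noun"
	Verb WordClass = "verb"
)

// dualClass holds the ambiguous words named in the task description.
var dualClass = map[string]bool{
	"run": true, "file": true, "test": true, "check": true, "build": true,
}

// articles and subjects are small illustrative sets, not the engine's tables.
var (
	articles = map[string]bool{"a": true, "an": true, "the": true}
	subjects = map[string]bool{"i": true, "you": true, "we": true, "they": true}
)

// Disambiguate classifies tokens[i] by looking at the preceding token only.
// ok is false when the word is not dual-class or the context gives no signal,
// in which case a caller would fall back to its existing behaviour.
func Disambiguate(tokens []string, i int) (class WordClass, ok bool) {
	if i <= 0 || i >= len(tokens) {
		return "", false
	}
	if !dualClass[strings.ToLower(tokens[i])] {
		return "", false
	}
	prev := strings.ToLower(tokens[i-1])
	switch {
	case articles[prev]:
		return Noun, true
	case subjects[prev]:
		return Verb, true
	}
	return "", false
}

func main() {
	for _, s := range []string{"the build failed", "we build nightly"} {
		tokens := strings.Fields(s)
		class, ok := Disambiguate(tokens, 1)
		fmt.Printf("%q: class=%q resolved=%t\n", s, class, ok)
	}
}
```

Returning `ok` alongside the class keeps "no signal" distinct from a confident answer, so the first-match behaviour can remain as the fallback.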
Phase 2: Reference Distribution + 1B Classification Pipeline
2a: 1B Pre-Classification (based on LEK-1B benchmarks)
- go-i18n: Classification benchmark suite — classify_bench_test.go with 200+ domain-tagged sentences. Categories: {technical, creative, ethical, casual}. Ground truth for calibrating 1B pre-tags.
- go-i18n: 1B pre-sort pipeline tool — CLI/func that reads JSONL corpus, classifies via LEK-Gemma3-1B, writes back with domain_1b field. Target: ~5K sentences/sec on M3.
- go-i18n: 1B vs 27B calibration check — Sample 500 sentences, classify with both, measure agreement. 75% baseline from benchmarks, technical↔creative is known weak spot.
- go-i18n: Article/irregular validator — validate.go + validate_test.go (14 tests). ValidateArticle(), ValidateIrregular(), batch variants. Commit 3c55d91.
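For the 1B vs 27B calibration check, "measure agreement" could be as simple as the fraction of sampled sentences where both labels match, plus a count of each disagreeing pair so weak spots such as technical↔creative stand out. The sketch below assumes the labels are already collected; the `Pair` type and `Agreement` function are hypothetical names, and the sample data is made up for illustration.

```go
package main

import "fmt"

// Pair holds both labels for one sampled sentence.
// The type and field names are hypothetical.
type Pair struct {
	Small string // domain label from the 1B model
	Large string // domain label from the 27B model
}

// Agreement returns the fraction of pairs whose labels match, and a count of
// every (1B label, 27B label) combination that disagrees.
func Agreement(pairs []Pair) (rate float64, confusions map[[2]string]int) {
	confusions = make(map[[2]string]int)
	if len(pairs) == 0 {
		return 0, confusions
	}
	matches := 0
	for _, p := range pairs {
		if p.Small == p.Large {
			matches++
			continue
		}
		confusions[[2]string{p.Small, p.Large}]++
	}
	return float64(matches) / float64(len(pairs)), confusions
}

func main() {
	// Illustrative data only, not real benchmark output.
	sample := []Pair{
		{Small: "technical", Large: "technical"},
		{Small: "technical", Large: "creative"},
		{Small: "casual", Large: "casual"},
		{Small: "ethical", Large: "ethical"},
	}
	rate, confusions := Agreement(sample)
	fmt.Printf("agreement: %.2f\n", rate)
	for k, n := range confusions {
		fmt.Printf("1B=%s 27B=%s count=%d\n", k[0], k[1], n)
	}
}
```

The resulting rate can be compared against the 75% baseline mentioned in the task.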
2b: Reference Distributions
- go-i18n: Reference distribution builder — Process 88K scored seeds through tokeniser + imprint. Pre-sort by domain_1b tag. Output: per-category reference distributions as JSON.
- go-i18n: Imprint comparator — Distance metrics (cosine, KL divergence, Mahalanobis) against reference distributions. Classification signal with confidence. Poindexter integration point.
- go-i18n: Cross-domain anomaly detection — Flag texts where 1B domain tag disagrees with imprint classification. Training signal or genuine cross-domain text — both valuable.
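Two of the distance metrics named for the imprint comparator, cosine similarity and KL divergence, could be sketched as below. This assumes an imprint can be flattened to a `[]float64` of equal length to the reference distribution, which is an assumption of this sketch rather than something the task specifies. The function names and the `eps` floor are hypothetical. Mahalanobis is left out because it also needs a covariance estimate per reference distribution.

```go
package main

import (
	"fmt"
	"math"
)

// Cosine returns the cosine similarity of a and b.
// It returns 0 when the lengths differ or either vector has zero magnitude.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// KL returns the Kullback-Leibler divergence D(p || q) in nats.
// Entries of q are floored at eps to avoid division by zero; this is a
// simplification for the sketch, not a principled smoothing scheme.
// It returns +Inf when the lengths differ.
func KL(p, q []float64, eps float64) float64 {
	if len(p) != len(q) {
		return math.Inf(1)
	}
	var sum float64
	for i := range p {
		if p[i] == 0 {
			continue // a zero-probability term contributes nothing
		}
		sum += p[i] * math.Log(p[i]/math.Max(q[i], eps))
	}
	return sum
}

func main() {
	// Illustrative distributions only.
	reference := []float64{0.25, 0.75}
	observed := []float64{0.5, 0.5}
	fmt.Printf("cosine: %.4f\n", Cosine(observed, reference))
	fmt.Printf("KL(observed || reference): %.4f\n", KL(observed, reference, 1e-12))
}
```

Note that KL divergence is asymmetric, so the comparator would need to fix which side is the reference.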
Phase 3: Multi-Language
- go-i18n: Grammar table format spec — Document the exact JSON schema for gram.* keys so new languages can be added. Currently only inferred from en.json.
- go-i18n: French grammar tables — First non-English language. French has gendered nouns, complex verb conjugation, elision rules. Good stress test for the grammar engine's language-agnostic design.
core/go (this repo)
- core/go: Remove pkg/i18n dependency — Once core/cli imports go-i18n directly, remove pkg/i18n/ from this repo. The locale files are bad data and shouldn't be migrated.
- core/go: Update go.work — Add go-i18n to the workspace for local development (go work use ../go-i18n).
Workflow
- Virgil (this session) writes tasks above after research
- Second GoLand session opens the target repo and works from this TODO
- When a task is done, mark [x] and note the commit/PR
- If a task reveals new work, add it here
- Scale to other repos once pattern is proven