go/TODO.md
Snider b2e78bf29e fix: resolve IO migration test failures in node, cache, and cli
- pkg/io/node: implement ReadFile (fs.ReadFileFS), Walk with WalkOptions,
  CopyFile, FromTar constructor; fix Exists test calls to match bool return
- pkg/cache: add Medium DI parameter, use errors.Is for wrapped ErrNotExist
- pkg/cli: add Medium DI to PIDFile and DaemonOptions for testability
- TODO.md: mark go-i18n article/irregular validator complete

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 10:58:57 +00:00


TODO.md — Core Go Dispatch Queue

Tasks dispatched from core/go orchestration to satellite repos. Format: - [ ] REPO: task description / - [x] when done.


go-i18n (forge.lthn.ai/core/go-i18n)

Phase 1: Harden the Engine

  • go-i18n: Add CLAUDE.md — Document the grammar engine contract: what it is (grammar primitives + reversal), what it isn't (translation file manager). Include build/test commands, the gram.* sacred rule, and the agent-flattening prohibition.
  • go-i18n: Ambiguity resolution for dual-class words — Words like "run", "file", "test", "check", "build" are both verb and noun. Tokeniser currently picks first match. Need context-aware disambiguation (look at surrounding tokens: article before → noun, after subject → verb).
  • go-i18n: Extend irregular verb coverage — Audit against common dev/ops vocabulary. Missing forms cause silent fallback to regular rules which may produce wrong output (e.g. "builded" instead of "built").
  • go-i18n: Add benchmarks — grammar_test.go and reversal/tokeniser_test.go need Benchmark* functions. The engine will run in hot paths (TIM, Poindexter) — need baseline numbers.
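
The dual-class heuristic in the ambiguity task (article before → noun, after subject → verb) could start as a one-token lookback. This is a minimal sketch under stated assumptions, not the tokeniser's actual API: the Class type, the word sets, and the Disambiguate function are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Class is a hypothetical word-class label; the real tokeniser's types may differ.
type Class string

const (
	Noun    Class = "noun"
	Verb    Class = "verb"
	Unknown Class = "unknown"
)

// Illustrative word sets only, not the engine's grammar tables.
var (
	dualClass = map[string]bool{"run": true, "file": true, "test": true, "check": true, "build": true}
	articles  = map[string]bool{"a": true, "an": true, "the": true}
	subjects  = map[string]bool{"i": true, "you": true, "we": true, "they": true, "he": true, "she": true, "it": true}
)

// Disambiguate classifies tokens[i] by looking at the token immediately
// before it: an article implies a noun, a subject pronoun implies a verb.
// It returns Unknown when the word is not dual-class or the context is
// not decisive, so the caller can fall back to its existing behaviour.
func Disambiguate(tokens []string, i int) Class {
	if i <= 0 || i >= len(tokens) {
		return Unknown
	}
	if !dualClass[strings.ToLower(tokens[i])] {
		return Unknown
	}
	prev := strings.ToLower(tokens[i-1])
	switch {
	case articles[prev]:
		return Noun
	case subjects[prev]:
		return Verb
	}
	return Unknown
}

func main() {
	fmt.Println(Disambiguate(strings.Fields("delete the file"), 2)) // noun
	fmt.Println(Disambiguate(strings.Fields("we build nightly"), 1)) // verb
}
```

Returning Unknown rather than guessing keeps the current first-match behaviour as the fallback, which makes the change easy to benchmark against the existing tokeniser.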

Phase 2: Reference Distribution + 1B Classification Pipeline

2a: 1B Pre-Classification (based on LEK-1B benchmarks)

  • go-i18n: Classification benchmark suite — classify_bench_test.go with 200+ domain-tagged sentences. Categories: {technical, creative, ethical, casual}. Ground truth for calibrating 1B pre-tags.
  • go-i18n: 1B pre-sort pipeline tool — CLI/func that reads JSONL corpus, classifies via LEK-Gemma3-1B, writes back with domain_1b field. Target: ~5K sentences/sec on M3.
  • go-i18n: 1B vs 27B calibration check — Sample 500 sentences, classify with both, measure agreement. Benchmarks give a 75% agreement baseline; technical↔creative is a known weak spot.
  • go-i18n: Article/irregular validator — validate.go + validate_test.go (14 tests). ValidateArticle(), ValidateIrregular(), batch variants. Commit 3c55d91.
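
For the 1B vs 27B calibration check, the agreement measurement itself is small. The sketch below assumes both models' labels are already available as parallel string slices; the Agreement function and its confusion map are hypothetical and not part of go-i18n. Counting each (1B label, 27B label) disagreement pair makes a weak spot such as technical↔creative visible directly.

```go
package main

import (
	"errors"
	"fmt"
)

// Agreement returns the fraction of positions where the two label slices
// match, plus a count of every (small, large) disagreement pair.
func Agreement(small, large []string) (float64, map[[2]string]int, error) {
	if len(small) != len(large) {
		return 0, nil, fmt.Errorf("label count mismatch: %d vs %d", len(small), len(large))
	}
	if len(small) == 0 {
		return 0, nil, errors.New("no labels to compare")
	}
	matches := 0
	confusions := make(map[[2]string]int)
	for i := range small {
		if small[i] == large[i] {
			matches++
			continue
		}
		confusions[[2]string{small[i], large[i]}]++
	}
	return float64(matches) / float64(len(small)), confusions, nil
}

func main() {
	// Toy labels for illustration only, not benchmark data.
	small := []string{"technical", "creative", "technical", "casual"}
	large := []string{"technical", "technical", "technical", "casual"}

	rate, confusions, err := Agreement(small, large)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("agreement: %.2f\n", rate) // agreement: 0.75
	for pair, n := range confusions {
		fmt.Printf("1B=%s 27B=%s: %d\n", pair[0], pair[1], n)
	}
}
```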

2b: Reference Distributions

  • go-i18n: Reference distribution builder — Process 88K scored seeds through tokeniser + imprint. Pre-sort by domain_1b tag. Output: per-category reference distributions as JSON.
  • go-i18n: Imprint comparator — Distance metrics (cosine, KL divergence, Mahalanobis) against reference distributions. Classification signal with confidence. Poindexter integration point.
  • go-i18n: Cross-domain anomaly detection — Flag texts where 1B domain tag disagrees with imprint classification. Training signal or genuine cross-domain text — both valuable.
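
Two of the distance metrics named for the imprint comparator can be sketched over plain float slices. This assumes an imprint and a reference distribution can both be flattened to equal-length []float64 vectors, which the source does not confirm; the function names are hypothetical. Mahalanobis distance is left out because it needs a covariance matrix for each reference distribution.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// CosineSimilarity returns dot(a, b) / (|a| * |b|).
func CosineSimilarity(a, b []float64) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("cosine similarity is undefined for a zero vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

// KLDivergence returns D_KL(p || q) in nats. Both inputs are assumed to be
// normalised probability distributions. The result is +Inf when q assigns
// zero probability to an outcome that p considers possible.
func KLDivergence(p, q []float64) (float64, error) {
	if len(p) == 0 || len(p) != len(q) {
		return 0, errors.New("distributions must be non-empty and of equal length")
	}
	var sum float64
	for i := range p {
		if p[i] == 0 {
			continue // 0 * log(0/q) is taken as 0 by convention
		}
		if q[i] == 0 {
			return math.Inf(1), nil
		}
		sum += p[i] * math.Log(p[i]/q[i])
	}
	return sum, nil
}

func main() {
	// Toy distributions for illustration only.
	observed := []float64{0.5, 0.5}
	reference := []float64{0.25, 0.75}

	cos, _ := CosineSimilarity(observed, reference)
	kl, _ := KLDivergence(observed, reference)
	fmt.Printf("cosine similarity: %.4f\n", cos)
	fmt.Printf("KL divergence:     %.4f nats\n", kl)
}
```

KL divergence is asymmetric and unbounded when the reference has empty bins, so real reference distributions would likely need smoothing before comparison; that choice is left open here.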

Phase 3: Multi-Language

  • go-i18n: Grammar table format spec — Document the exact JSON schema for gram.* keys so new languages can be added. Currently only inferred from en.json.
  • go-i18n: French grammar tables — First non-English language. French has gendered nouns, complex verb conjugation, elision rules. Good stress test for the grammar engine's language-agnostic design.

core/go (this repo)

  • core/go: Remove pkg/i18n dependency — Once core/cli imports go-i18n directly, remove pkg/i18n/ from this repo. The locale files are bad data and shouldn't be migrated.
  • core/go: Update go.work — Add go-i18n to the workspace for local development (go work use ../go-i18n).

Workflow

  1. Virgil (this session) writes tasks above after research
  2. Second GoLand session opens the target repo and works from this TODO
  3. When a task is done, mark [x] and note the commit/PR
  4. If a task reveals new work, add it here
  5. Scale to other repos once pattern is proven