# TODO.md — Core Go Dispatch Queue

Tasks dispatched from core/go orchestration to satellite repos.
Format: `- [ ] REPO: task description`, changed to `- [x]` when done.

---

## go-i18n (forge.lthn.ai/core/go-i18n)

### Phase 1: Harden the Engine

- [ ] **go-i18n: Add CLAUDE.md** — Document the grammar engine contract: what it is (grammar primitives + reversal) and what it isn't (a translation file manager). Include build/test commands, the `gram.*` sacred rule, and the agent-flattening prohibition.
- [ ] **go-i18n: Ambiguity resolution for dual-class words** — Words like "run", "file", "test", "check", and "build" can be either verbs or nouns. The tokeniser currently picks the first match; it needs context-aware disambiguation from surrounding tokens (article before → noun, after a subject → verb).
- [ ] **go-i18n: Extend irregular verb coverage** — Audit against common dev/ops vocabulary. Missing forms silently fall back to the regular rules, which can produce wrong output (e.g. "builded" instead of "built").
- [ ] **go-i18n: Add benchmarks** — `grammar_test.go` and `reversal/tokeniser_test.go` need `Benchmark*` functions. The engine will run in hot paths (TIM, Poindexter) — need baseline numbers.

### Phase 2: Reference Distribution + 1B Classification Pipeline

#### 2a: 1B Pre-Classification (based on LEK-1B benchmarks)

- [ ] **go-i18n: Classification benchmark suite** — `classify_bench_test.go` with 200+ domain-tagged sentences. Categories: {technical, creative, ethical, casual}. Ground truth for calibrating 1B pre-tags.
- [ ] **go-i18n: 1B pre-sort pipeline tool** — A CLI command or function that reads a JSONL corpus, classifies each sentence via LEK-Gemma3-1B, and writes it back with a `domain_1b` field. Target: ~5K sentences/sec on M3.
- [ ] **go-i18n: 1B vs 27B calibration check** — Sample 500 sentences, classify them with both models, and measure agreement. The benchmarks give a 75% baseline; technical↔creative is a known weak spot.
- [x] **go-i18n: Article/irregular validator** — `validate.go` + `validate_test.go` (14 tests). `ValidateArticle()`, `ValidateIrregular()`, batch variants. Commit `3c55d91`.
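
The 1B vs 27B calibration check reduces to an agreement ratio plus a tally of which label pairs disagree, which makes weak spots such as technical↔creative visible. A minimal Go sketch, assuming each sample already carries both labels; `Sample`, its `domain_27b` tag and `Agreement` are hypothetical (only `domain_1b` comes from the tasks above).

```go
package main

import "fmt"

// Domain labels used by the classification benchmark suite.
const (
	Technical = "technical"
	Creative  = "creative"
	Ethical   = "ethical"
	Casual    = "casual"
)

// Sample pairs one sentence with both models' labels.
// The domain_27b field name is hypothetical.
type Sample struct {
	Text      string
	Domain1B  string `json:"domain_1b"`
	Domain27B string `json:"domain_27b"`
}

// Agreement returns the fraction of samples on which both models agree and a
// count of every disagreeing (1B label, 27B label) pair.
func Agreement(samples []Sample) (float64, map[[2]string]int) {
	confusions := make(map[[2]string]int)
	if len(samples) == 0 {
		return 0, confusions
	}
	agree := 0
	for _, s := range samples {
		if s.Domain1B == s.Domain27B {
			agree++
			continue
		}
		confusions[[2]string{s.Domain1B, s.Domain27B}]++
	}
	return float64(agree) / float64(len(samples)), confusions
}

func main() {
	samples := []Sample{
		{"Refactor the cache layer", Technical, Technical},
		{"Fix the flaky test", Technical, Technical},
		{"A poem about compilers", Creative, Technical},
		{"Should agents refuse this?", Ethical, Ethical},
		{"lol same", Casual, Casual},
	}
	rate, confusions := Agreement(samples)
	fmt.Printf("agreement: %.2f\n", rate) // agreement: 0.80
	for pair, n := range confusions {
		fmt.Printf("1B=%s 27B=%s: %d\n", pair[0], pair[1], n)
	}
}
```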

#### 2b: Reference Distributions

- [ ] **go-i18n: Reference distribution builder** — Process the 88K scored seeds through the tokeniser and imprint, pre-sorting them by `domain_1b` tag. Output: per-category reference distributions as JSON.
- [ ] **go-i18n: Imprint comparator** — Distance metrics (cosine, KL divergence, Mahalanobis) against the reference distributions, yielding a classification signal with a confidence score. This is the Poindexter integration point.
- [ ] **go-i18n: Cross-domain anomaly detection** — Flag texts where the 1B domain tag disagrees with the imprint classification. Each flag is either a training signal or genuinely cross-domain text; both are valuable.

### Phase 3: Multi-Language

- [ ] **go-i18n: Grammar table format spec** — Document the exact JSON schema for `gram.*` keys so new languages can be added. The schema is currently only inferred from `en.json`.
- [ ] **go-i18n: French grammar tables** — First non-English language. French has gendered nouns, complex verb conjugation, and elision rules, which makes it a good stress test for the grammar engine's language-agnostic design.
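
Elision is a compact example of a rule the French tables will have to express. A minimal Go sketch, assuming the noun's gender is already known; `DefiniteArticle`, `startsWithVowelSound` and the three-word h aspiré list are hypothetical, and real tables would need many more exceptions (for example "le onze").

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// hAspire lists a few nouns starting with an aspirated h, which blocks
// elision ("le héros", not "l'héros"). Illustrative only.
var hAspire = map[string]bool{"héros": true, "haricot": true, "hibou": true}

// startsWithVowelSound is a simplified check: a (possibly accented) vowel, or
// an h not in the h aspiré list. "y" is treated as a consonant ("le yaourt").
func startsWithVowelSound(word string) bool {
	lower := strings.ToLower(word)
	r, _ := utf8.DecodeRuneInString(lower)
	if r == 'h' {
		return !hAspire[lower]
	}
	return strings.ContainsRune("aàâeéèêëiîïoôœuùûü", r)
}

// DefiniteArticle prefixes noun with le/la, eliding both to l' before a
// vowel sound. gender is "m" or "f".
func DefiniteArticle(noun, gender string) string {
	switch {
	case startsWithVowelSound(noun):
		return "l'" + noun
	case gender == "f":
		return "la " + noun
	default:
		return "le " + noun
	}
}

func main() {
	fmt.Println(DefiniteArticle("fichier", "m")) // le fichier
	fmt.Println(DefiniteArticle("erreur", "f"))  // l'erreur
	fmt.Println(DefiniteArticle("heure", "f"))   // l'heure (mute h)
	fmt.Println(DefiniteArticle("héros", "m"))   // le héros (h aspiré)
}
```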

---

## core/go (this repo)

- [ ] **core/go: Remove pkg/i18n dependency** — Once core/cli imports go-i18n directly, remove `pkg/i18n/` from this repo. The locale files are bad data and shouldn't be migrated.
- [ ] **core/go: Update go.work** — Add go-i18n to the workspace for local development (`go work use ../go-i18n`).

---

## Workflow

1. Virgil (this session) writes the tasks above after research
2. A second GoLand session opens the target repo and works from this TODO
3. When a task is done, mark it `[x]` and note the commit/PR
4. If a task reveals new work, add it here
5. Scale to other repos once the pattern is proven