docs: add orchestration dispatch queue and research findings
TODO.md tracks tasks dispatched to satellite repos (go-i18n phases 1-3).
FINDINGS.md records go-i18n architecture assessment and CoreDeno PR #9 review.
Phase 2 expanded with 1B classification pipeline based on LEK benchmarks.

Co-Authored-By: Virgil <virgil@lethean.io>
parent 0192772ab5
commit 3ff7b8a773
2 changed files with 129 additions and 0 deletions
77
FINDINGS.md
Normal file
@@ -0,0 +1,77 @@
# FINDINGS.md — Core Go Research

## go-i18n (forge.lthn.ai/core/go-i18n)

**Explored**: 2026-02-19
**Location**: `/Users/snider/Code/host-uk/go-i18n`
**Module**: `forge.lthn.ai/core/go-i18n`
**State**: 20 commits on main, clean, all tests pass
**Lines**: ~5,800 across 32 files (14 test files)
**Deps**: only `golang.org/x/text`

### What It Is

A **grammar engine** — not a translation file manager. Provides:

1. **Forward composition**: `PastTense()`, `Gerund()`, `Pluralize()`, `Article()`, handlers
2. **Reverse grammar**: Tokeniser reads grammar tables backwards to extract structure
3. **GrammarImprint**: Feature vector projection (content → grammar fingerprint, lossy)
4. **Multiplier**: Deterministic training data augmentation (no LLM)

Consumers (core/cli, apps) bring their own translation files. go-i18n provides the grammar primitives.

### Current Capabilities

| Feature | Status | Notes |
|---------|--------|-------|
| Grammar primitives (past/gerund/plural/article) | Working | 100 irregular verbs, 40 irregular nouns |
| Magic namespace handlers (i18n.label/progress/count/done/fail/numeric) | Working | 6 handler types |
| Service + message lookup | Working | Thread-safe, fallback chain |
| Subject builder (S()) | Working | Fluent API with count/gender/location/formality |
| Plural categories (CLDR) | Working | 7+ languages |
| RTL/LTR detection | Working | 12+ RTL languages |
| Number formatting | Working | Locale-specific separators |
| Reversal tokeniser | Working | 3-tier: JSON → irregular → regular morphology |
| GrammarImprint similarity | Working | Weighted cosine (verbs 30%, tense 20%, nouns 25%) |
| Multiplier expand | Working | Tense + number flipping, dedup, round-trip verify |

### What's Missing / Incomplete

| Gap | Priority | Notes |
|-----|----------|-------|
| Reference distribution builder | High | Process scored seeds → calibrate imprints |
| Non-English grammar tables | Medium | Only en.json exists, reversal needs gram.* per language |
| Ambiguity resolution | Medium | "run", "file", "test" are both verb and noun |
| Domain vocabulary expansion | Low | 150+ words, needs legal/medical/financial |
| Poindexter integration | Deferred | Awaiting Poindexter library |
| TIM container image | Deferred | Distroless Go binary for confidential compute |

### Key Architecture Decisions

- **Bijective grammar tables**: Forward and reverse use same JSON → reversal is deterministic
- **Lossy projection**: GrammarImprint intentionally loses content, preserves only structure
- **No LLM dependency**: Multiplier generates variants purely from morphological rules
- **Consumer translations are external**: go-i18n doesn't ship or manage app-specific locale files
- **gram.* keys are sacred**: Agents MUST NOT flatten — grammar engine depends on nested structure
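
The first decision above, one table serving both directions, can be sketched as follows. This is an illustration under stated assumptions, not the go-i18n implementation: the three-entry table, the helper names `pastTense` and `baseFromPast`, and the simplified regular rule are all hypothetical. The point is only that the reverse map is derived from the forward data, so the two cannot drift apart.

```go
package main

import (
	"fmt"
	"strings"
)

// irregularPast is a tiny illustrative forward table (base form -> past
// tense). The real tables live in JSON; this map is an assumption.
var irregularPast = map[string]string{
	"build": "built",
	"run":   "ran",
	"write": "wrote",
}

// reversePast is derived from the same data as the forward table.
var reversePast = invert(irregularPast)

// invert swaps keys and values. It panics on duplicate values, because a
// table where two bases share one past form would not be reversible.
func invert(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		if prev, dup := out[v]; dup {
			panic(fmt.Sprintf("table not bijective: %q and %q both map to %q", prev, k, v))
		}
		out[v] = k
	}
	return out
}

// pastTense checks the irregular table first, then falls back to a
// deliberately simplified regular rule (append "d" or "ed").
func pastTense(base string) string {
	if p, ok := irregularPast[base]; ok {
		return p
	}
	if strings.HasSuffix(base, "e") {
		return base + "d"
	}
	return base + "ed"
}

// baseFromPast reverses pastTense: irregular table first, then strip "ed".
// Stripping "ed" is wrong for bases ending in "e" ("saved" -> "sav"), which
// is one reason a real tokeniser needs more than this sketch.
func baseFromPast(past string) (string, bool) {
	if b, ok := reversePast[past]; ok {
		return b, true
	}
	if strings.HasSuffix(past, "ed") && len(past) > 2 {
		return strings.TrimSuffix(past, "ed"), true
	}
	return "", false
}

func main() {
	for _, base := range []string{"build", "run", "deploy"} {
		past := pastTense(base)
		back, _ := baseFromPast(past)
		fmt.Printf("%s -> %s -> %s\n", base, past, back)
	}
}
```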

### pkg/i18n in core/go

- Full i18n framework with 34 locale files — but locale data is bad/stale
- Only imported by `pkg/cli/` which has been extracted to `core/cli`
- Effectively orphaned in core/go
- Can be removed once core/cli imports go-i18n directly
- The locale files need full rework, not migration

---

## CoreDeno (PR #9 — merged)

**Explored**: 2026-02-19

Deno sidecar for core-gui JS runtime. Go↔Deno bidirectional bridge:
- Go→Deno: JSON-RPC over Unix socket (module lifecycle)
- Deno→Go: gRPC over Unix socket (file I/O, store, manifest)
- Each module in isolated Deno Worker with declared permissions
- Marketplace: git clone + ed25519 manifest verification + SQLite registry

10 security/correctness issues found and fixed in review.
52
TODO.md
Normal file
@@ -0,0 +1,52 @@
# TODO.md — Core Go Dispatch Queue

Tasks dispatched from core/go orchestration to satellite repos.
Format: `- [ ] REPO: task description` / `- [x]` when done.

---

## go-i18n (forge.lthn.ai/core/go-i18n)

### Phase 1: Harden the Engine

- [ ] **go-i18n: Add CLAUDE.md** — Document the grammar engine contract: what it is (grammar primitives + reversal), what it isn't (translation file manager). Include build/test commands, the gram.* sacred rule, and the agent-flattening prohibition.
- [ ] **go-i18n: Ambiguity resolution for dual-class words** — Words like "run", "file", "test", "check", "build" are both verb and noun. Tokeniser currently picks first match. Need context-aware disambiguation (look at surrounding tokens: article before → noun, after subject → verb).
- [ ] **go-i18n: Extend irregular verb coverage** — Audit against common dev/ops vocabulary. Missing forms cause silent fallback to regular rules which may produce wrong output (e.g. "builded" instead of "built").
- [ ] **go-i18n: Add benchmarks** — `grammar_test.go` and `reversal/tokeniser_test.go` need `Benchmark*` functions. The engine will run in hot paths (TIM, Poindexter) — need baseline numbers.

### Phase 2: Reference Distribution + 1B Classification Pipeline

#### 2a: 1B Pre-Classification (based on LEK-1B benchmarks)

- [ ] **go-i18n: Classification benchmark suite** — `classify_bench_test.go` with 200+ domain-tagged sentences. Categories: {technical, creative, ethical, casual}. Ground truth for calibrating 1B pre-tags.
- [ ] **go-i18n: 1B pre-sort pipeline tool** — CLI/func that reads JSONL corpus, classifies via LEK-Gemma3-1B, writes back with `domain_1b` field. Target: ~5K sentences/sec on M3.
- [ ] **go-i18n: 1B vs 27B calibration check** — Sample 500 sentences, classify with both, measure agreement. 75% baseline from benchmarks, technical↔creative is known weak spot.
- [ ] **go-i18n: Article/irregular validator** — Lightweight funcs using 1B's strong article (100%) and irregular base form (100%) accuracy as fast validators.
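
For the 1B vs 27B calibration check, "measure agreement" can be as simple as the fraction of sentences where both labels match, plus a tally of which label pairs disagree. The sketch below assumes that definition; the `labelPair` type, the `agreement` function, and the sample labels are hypothetical and not taken from the benchmarks.

```go
package main

import "fmt"

// labelPair holds the two domain labels assigned to one sentence.
type labelPair struct {
	Small string // label from the 1B model
	Large string // label from the 27B model
}

// agreement returns the fraction of pairs whose labels match, and a count of
// each disagreeing (Small, Large) combination so weak spots stand out.
func agreement(pairs []labelPair) (rate float64, confusion map[labelPair]int) {
	confusion = make(map[labelPair]int)
	if len(pairs) == 0 {
		return 0, confusion
	}
	agree := 0
	for _, p := range pairs {
		if p.Small == p.Large {
			agree++
			continue
		}
		confusion[p]++
	}
	return float64(agree) / float64(len(pairs)), confusion
}

func main() {
	// Made-up labels for illustration only.
	sample := []labelPair{
		{"technical", "technical"},
		{"creative", "creative"},
		{"technical", "creative"},
		{"casual", "casual"},
	}
	rate, confusion := agreement(sample)
	fmt.Printf("agreement: %.0f%%\n", rate*100)
	for p, n := range confusion {
		fmt.Printf("1B=%s 27B=%s: %d\n", p.Small, p.Large, n)
	}
}
```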

#### 2b: Reference Distributions

- [ ] **go-i18n: Reference distribution builder** — Process 88K scored seeds through tokeniser + imprint. Pre-sort by `domain_1b` tag. Output: per-category reference distributions as JSON.
- [ ] **go-i18n: Imprint comparator** — Distance metrics (cosine, KL divergence, Mahalanobis) against reference distributions. Classification signal with confidence. Poindexter integration point.
- [ ] **go-i18n: Cross-domain anomaly detection** — Flag texts where 1B domain tag disagrees with imprint classification. Training signal or genuine cross-domain text — both valuable.

### Phase 3: Multi-Language

- [ ] **go-i18n: Grammar table format spec** — Document the exact JSON schema for `gram.*` keys so new languages can be added. Currently only inferred from `en.json`.
- [ ] **go-i18n: French grammar tables** — First non-English language. French has gendered nouns, complex verb conjugation, elision rules. Good stress test for the grammar engine's language-agnostic design.

---

## core/go (this repo)

- [ ] **core/go: Remove pkg/i18n dependency** — Once core/cli imports go-i18n directly, remove `pkg/i18n/` from this repo. The locale files are bad data and shouldn't be migrated.
- [ ] **core/go: Update go.work** — Add go-i18n to the workspace for local development (`go work use ../go-i18n`).

---

## Workflow

1. Virgil (this session) writes tasks above after research
2. Second GoLand session opens the target repo and works from this TODO
3. When a task is done, mark `[x]` and note the commit/PR
4. If a task reveals new work, add it here
5. Scale to other repos once pattern is proven