docs: add CLAUDE.md, TODO.md, FINDINGS.md for agent workflow

CLAUDE.md documents the grammar engine contract and sacred rules. TODO.md is the task dispatch queue from core/go orchestration. FINDINGS.md captures research and architectural decisions.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 15:03:40 +00:00 · d5b3eac258
commit d5b3eac258
parent c3b96a4ce1
3 changed files with 138 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,67 @@
+# CLAUDE.md
+
+## What This Is
+
+Grammar-aware internationalisation engine for Go. Module: `forge.lthn.ai/core/go-i18n`
+
+This is a **grammar engine** — it provides primitives for composing and reversing grammatically correct text. It is NOT a translation file manager. Consumers bring their own translations.
+
+## Commands
+
+```bash
+go test ./...                    # Run all tests
+go test -v ./reversal/           # Reversal engine tests
+go test -bench=. ./...           # Benchmarks (when added)
+```
+
+## Critical Rules
+
+### DO NOT flatten locale JSON files
+
+The grammar engine depends on nested `gram.*` structure:
+
+```json
+{
+  "gram": {
+    "verb": {
+      "delete": { "past": "deleted", "gerund": "deleting" }
+    }
+  }
+}
+```
+
+If you flatten this to `"gram.verb.delete.past": "deleted"`, the grammar engine breaks silently. **This is the #1 cause of agent-introduced bugs.**
+
+### This library does not manage consumer translations
+
+go-i18n provides grammar primitives. Apps using it (core/cli, etc.) manage their own translation files. Do not add app-specific translation keys to `locales/en.json` — only `gram.*` grammar data belongs there.
+
+## Architecture
+
+| Package | Purpose |
+|---------|---------|
+| Root | Forward composition: T(), grammar primitives, handlers, service |
+| `reversal/` | Reverse grammar: tokeniser, imprint, multiplier |
+| `locales/` | Grammar tables (JSON) — only `gram.*` data |
+| `docs/plans/` | Design documents |
+
+## Key API
+
+- `T(key, args...)` — Translate with namespace handlers
+- `PastTense(verb)`, `Gerund(verb)`, `Pluralize(noun, n)`, `Article(word)` — Grammar primitives
+- `reversal.NewTokeniser().Tokenise(text)` — Reverse grammar lookup
+- `reversal.NewImprint(tokens)` — Feature vector projection
+- `reversal.NewMultiplier().Expand(text)` — Training data augmentation
+
+## Coding Standards
+
+- UK English (colour, organisation, centre)
+- `go test ./...` must pass before commit
+- Conventional commits: `type(scope): description`
+- Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`
+
+## Task Queue
+
+See `TODO.md` for dispatched tasks from core/go orchestration.
+See `FINDINGS.md` for research notes and architectural decisions.
+See the [wiki](https://forge.lthn.ai/core/go-i18n/wiki) for full architecture docs.
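
A minimal sketch of why the flattening rule in CLAUDE.md matters, using only the standard library. The `verbForms` and `grammarFile` types and the `lookupPast` helper are hypothetical names invented for this illustration. The commit does not show how go-i18n actually loads `locales/en.json`, so this is an analogy and not the library's loader. It decodes the nested example from CLAUDE.md and a flattened variant. The flattened document decodes without error but yields no entry, which is one way a break can stay silent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// verbForms and grammarFile are hypothetical types for this sketch only;
// they mirror the nested gram.verb.<verb>.{past,gerund} shape from CLAUDE.md.
type verbForms struct {
	Past   string `json:"past"`
	Gerund string `json:"gerund"`
}

type grammarFile struct {
	Gram struct {
		Verb map[string]verbForms `json:"verb"`
	} `json:"gram"`
}

// lookupPast decodes a locale document and returns the past form of verb.
// The boolean is false when the form is absent or the document is invalid.
func lookupPast(doc []byte, verb string) (string, bool) {
	var f grammarFile
	if err := json.Unmarshal(doc, &f); err != nil {
		return "", false
	}
	forms, ok := f.Gram.Verb[verb]
	if !ok || forms.Past == "" {
		return "", false
	}
	return forms.Past, true
}

func main() {
	nested := []byte(`{"gram":{"verb":{"delete":{"past":"deleted","gerund":"deleting"}}}}`)
	flat := []byte(`{"gram.verb.delete.past":"deleted"}`)

	past, ok := lookupPast(nested, "delete")
	fmt.Printf("nested: %q %t\n", past, ok) // nested: "deleted" true

	// The unknown top-level key is ignored by encoding/json, so no error
	// surfaces; the lookup simply finds nothing.
	past, ok = lookupPast(flat, "delete")
	fmt.Printf("flat: %q %t\n", past, ok) // flat: "" false
}
```

A loader that rejects unknown top-level keys (for example with `json.Decoder.DisallowUnknownFields`) would turn the silent miss into a visible error. Whether go-i18n does this is not stated in the commit.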
--- a/FINDINGS.md
+++ b/FINDINGS.md
@@ -0,0 +1,40 @@
+# FINDINGS.md — go-i18n Research & Discovery
+
+Record findings, gaps, and architectural decisions here as work progresses.
+
+---
+
+## 2026-02-19: Initial Assessment (Virgil)
+
+### Current State
+
+- 5,800 lines across 32 files (14 test files)
+- All tests pass
+- Only dependency: `golang.org/x/text`
+- Grammar engine is solid: forward composition + reversal + imprint + multiplier
+
+### Architecture
+
+go-i18n is a **grammar engine**, not a translation file manager. Consumers bring their own translations. The library provides:
+
+1. **Forward composition** — Grammar primitives that compose grammatically correct text
+2. **Reverse grammar** — Tokeniser reads grammar tables backwards to extract structure
+3. **GrammarImprint** — Lossy feature vector projection (content to grammar fingerprint)
+4. **Multiplier** — Deterministic training data augmentation (no LLM calls)
+
+### Key Gaps Identified
+
+| Gap | Impact | Notes |
+|-----|--------|-------|
+| No CLAUDE.md | High | Agents don't know the rules, will flatten locale files |
+| Dual-class word ambiguity | Medium | "file" as verb vs noun, "run" as verb vs noun |
+| No benchmarks | Medium | No perf baselines for hot-path usage (TIM, Poindexter) |
+| No reference distributions | High | Can't calibrate imprints without scored seed data |
+| Only English grammar tables | Medium | Reversal only works with loaded GrammarData |
+
+### Sacred Rules
+
+- `gram.*` keys in locale JSON MUST remain nested — flattening breaks the grammar engine
+- Irregular forms in grammar tables override regular morphological rules
+- Round-trip property must hold: reversing the output of forward(base) must recover base
+- go-i18n does NOT ship or manage consumer translation files
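
Two of the sacred rules in FINDINGS.md (irregular forms override regular rules, and the round-trip property) can be sketched as below. Everything here is hypothetical: `irregularPast`, `knownVerbs`, `pastTense` and `buildReverseIndex` are illustration names, the regular rule is deliberately simplified (no consonant doubling or `y` handling), and none of it is taken from the go-i18n source. The reverse index is built by running every known base form through the forward function, loosely mirroring the idea of reading grammar tables backwards.

```go
package main

import (
	"fmt"
	"strings"
)

// irregularPast is a hypothetical irregular table; entries here take
// priority over the regular suffix rule.
var irregularPast = map[string]string{
	"build": "built",
	"run":   "ran",
}

// knownVerbs is a hypothetical list of base forms the sketch can reverse.
var knownVerbs = []string{"build", "run", "delete", "check"}

// pastTense consults the irregular table first, then falls back to a
// simplified regular rule: add "d" after a final "e", otherwise add "ed".
func pastTense(verb string) string {
	if past, ok := irregularPast[verb]; ok {
		return past
	}
	if strings.HasSuffix(verb, "e") {
		return verb + "d"
	}
	return verb + "ed"
}

// buildReverseIndex maps each past form back to its base form.
func buildReverseIndex(verbs []string) map[string]string {
	idx := make(map[string]string, len(verbs))
	for _, v := range verbs {
		idx[pastTense(v)] = v
	}
	return idx
}

func main() {
	idx := buildReverseIndex(knownVerbs)
	for _, v := range knownVerbs {
		past := pastTense(v)
		base, ok := idx[past]
		// The round trip holds when the reversed form equals the base.
		fmt.Printf("%s -> %s -> %s (round trip: %t)\n", v, past, base, ok && base == v)
	}
}
```

Removing `"build"` from the irregular table makes `pastTense("build")` return "builded", the silent fallback that TODO.md warns about.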
--- a/TODO.md
+++ b/TODO.md
@@ -0,0 +1,31 @@
+# TODO.md — go-i18n Task Queue
+
+Dispatched from core/go orchestration. Pick up tasks in order.
+
+---
+
+## Phase 1: Harden the Engine
+
+- [ ] **Add CLAUDE.md** — Document the grammar engine contract: what it is (grammar primitives + reversal), what it isn't (translation file manager). Include build/test commands, the gram.* sacred rule, and the agent-flattening prohibition.
+- [ ] **Ambiguity resolution for dual-class words** — Words like "run", "file", "test", "check", "build" are both verb and noun. The tokeniser currently picks the first match. It needs context-aware disambiguation based on surrounding tokens: a preceding article suggests a noun, a preceding subject suggests a verb.
+- [ ] **Extend irregular verb coverage** — Audit against common dev/ops vocabulary. Missing forms cause a silent fallback to regular rules, which may produce wrong output (e.g. "builded" instead of "built").
+- [ ] **Add benchmarks** — `grammar_test.go` and `reversal/tokeniser_test.go` need `Benchmark*` functions. The engine will run in hot paths (TIM, Poindexter) — need baseline numbers.
+
+## Phase 2: Reference Distribution
+
+- [ ] **Reference distribution builder** — Process the 88K scored seeds from LEM Phase 0 through the tokeniser + imprint pipeline. Output: per-category (ethical, technical, harmful) reference distributions stored as JSON. This calibrates what "normal" grammar looks like.
+- [ ] **Imprint comparator** — Given a new text and reference distributions, compute distance metrics (cosine, KL divergence, Mahalanobis). Return a classification signal with confidence score. This is the Poindexter integration point.
+
+## Phase 3: Multi-Language
+
+- [ ] **Grammar table format spec** — Document the exact JSON schema for `gram.*` keys so new languages can be added. Currently only inferred from `en.json`.
+- [ ] **French grammar tables** — First non-English language. French has gendered nouns, complex verb conjugation, elision rules. Good stress test for the grammar engine's language-agnostic design.
+
+---
+
+## Workflow
+
+1. Virgil in core/go writes tasks here after research
+2. This repo's session picks up tasks in phase order
+3. Mark `[x]` when done, note commit hash
+4. New discoveries → add tasks, flag in FINDINGS.md