docs: add CLAUDE.md, TODO.md, FINDINGS.md for agent workflow

CLAUDE.md documents the grammar engine contract and sacred rules.
TODO.md is the task dispatch queue from core/go orchestration.
FINDINGS.md captures research and architectural decisions.

Co-Authored-By: Virgil <virgil@lethean.io>
Snider 2026-02-19 15:03:40 +00:00
parent c3b96a4ce1
commit d5b3eac258
3 changed files with 138 additions and 0 deletions

CLAUDE.md (new file, 67 lines)

@@ -0,0 +1,67 @@
# CLAUDE.md
## What This Is
Grammar-aware internationalisation engine for Go. Module: `forge.lthn.ai/core/go-i18n`
This is a **grammar engine** — it provides primitives for composing and reversing grammatically correct text. It is NOT a translation file manager. Consumers bring their own translations.
## Commands
```bash
go test ./... # Run all tests
go test -v ./reversal/ # Reversal engine tests
go test -bench=. ./... # Benchmarks (when added)
```
## Critical Rules
### DO NOT flatten locale JSON files
The grammar engine depends on nested `gram.*` structure:
```json
{
"gram": {
"verb": {
"delete": { "past": "deleted", "gerund": "deleting" }
}
}
}
```
If you flatten this to `"gram.verb.delete.past": "deleted"`, the grammar engine breaks silently. **This is the #1 cause of agent-introduced bugs.**
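To illustrate why the failure is silent, here is a minimal sketch of a nested lookup. `lookupNested` and `mustDecode` are hypothetical helpers written for this example, not go-i18n's actual loader. The sketch assumes only that the engine walks the `gram` object one level at a time.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookupNested walks a decoded JSON object one path segment at a time.
// Hypothetical helper for illustration; not part of go-i18n.
func lookupNested(data map[string]any, path ...string) (string, bool) {
	var cur any = data
	for _, seg := range path {
		obj, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		cur, ok = obj[seg]
		if !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

// mustDecode parses a JSON object literal or panics; acceptable for a demo.
func mustDecode(src string) map[string]any {
	var out map[string]any
	if err := json.Unmarshal([]byte(src), &out); err != nil {
		panic(err)
	}
	return out
}

const nestedJSON = `{"gram": {"verb": {"delete": {"past": "deleted", "gerund": "deleting"}}}}`
const flatJSON = `{"gram.verb.delete.past": "deleted", "gram.verb.delete.gerund": "deleting"}`

func main() {
	past, ok := lookupNested(mustDecode(nestedJSON), "gram", "verb", "delete", "past")
	fmt.Println(past, ok) // deleted true

	// The flattened file is still valid JSON, so nothing errors;
	// the lookup simply finds no "gram" object and reports a miss.
	past, ok = lookupNested(mustDecode(flatJSON), "gram", "verb", "delete", "past")
	fmt.Printf("%q %v\n", past, ok) // "" false
}
```

Both files parse cleanly, which is why a flattened locale file tends to surface as missing grammar forms rather than as a load error.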
### This library does not manage consumer translations
go-i18n provides grammar primitives. Apps using it (core/cli, etc.) manage their own translation files. Do not add app-specific translation keys to `locales/en.json` — only `gram.*` grammar data belongs there.
## Architecture
| Package | Purpose |
|---------|---------|
| Root | Forward composition: T(), grammar primitives, handlers, service |
| `reversal/` | Reverse grammar: tokeniser, imprint, multiplier |
| `locales/` | Grammar tables (JSON) — only `gram.*` data |
| `docs/plans/` | Design documents |
## Key API
- `T(key, args...)` — Translate with namespace handlers
- `PastTense(verb)`, `Gerund(verb)`, `Pluralize(noun, n)`, `Article(word)` — Grammar primitives
- `reversal.NewTokeniser().Tokenise(text)` — Reverse grammar lookup
- `reversal.NewImprint(tokens)` — Feature vector projection
- `reversal.NewMultiplier().Expand(text)` — Training data augmentation
## Coding Standards
- UK English (colour, organisation, centre)
- `go test ./...` must pass before commit
- Conventional commits: `type(scope): description`
- Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`
## Task Queue
See `TODO.md` for dispatched tasks from core/go orchestration.
See `FINDINGS.md` for research notes and architectural decisions.
See the [wiki](https://forge.lthn.ai/core/go-i18n/wiki) for full architecture docs.

FINDINGS.md (new file, 40 lines)

@@ -0,0 +1,40 @@
# FINDINGS.md — go-i18n Research & Discovery
Record findings, gaps, and architectural decisions here as work progresses.
---
## 2026-02-19: Initial Assessment (Virgil)
### Current State
- 5,800 lines across 32 files (14 test files)
- All tests pass
- Only dependency: `golang.org/x/text`
- Grammar engine is solid: forward composition + reversal + imprint + multiplier
### Architecture
go-i18n is a **grammar engine**, not a translation file manager. Consumers bring their own translations. The library provides:
1. **Forward composition** — Grammar primitives that compose grammatically correct text
2. **Reverse grammar** — Tokeniser reads grammar tables backwards to extract structure
3. **GrammarImprint** — Lossy feature vector projection (content to grammar fingerprint)
4. **Multiplier** — Deterministic training data augmentation (no LLM calls)
### Key Gaps Identified
| Gap | Impact | Notes |
|-----|--------|-------|
| No CLAUDE.md | High | Agents don't know the rules and will flatten locale files |
| Dual-class word ambiguity | Medium | "file" as verb vs noun, "run" as verb vs noun |
| No benchmarks | Medium | No perf baselines for hot-path usage (TIM, Poindexter) |
| No reference distributions | High | Can't calibrate imprints without scored seed data |
| Only English grammar tables | Medium | Reversal only works with loaded GrammarData |
### Sacred Rules
- `gram.*` keys in locale JSON MUST remain nested — flattening breaks the grammar engine
- Irregular forms in grammar tables override regular morphological rules
- Round-trip property must hold: forward(base) then reverse must recover base
- go-i18n does NOT ship or manage consumer translation files
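
The second and third rules can be sketched together. This is an illustration only: `irregularPast`, `pastTense`, `baseOf`, and `knownVerbs` are hypothetical names, the regular rule is deliberately naive, and the real tables live in the locale JSON rather than in Go maps.

```go
package main

import (
	"fmt"
	"strings"
)

// irregularPast is a tiny illustrative table (hypothetical data).
var irregularPast = map[string]string{
	"build": "built",
	"run":   "ran",
	"write": "wrote",
}

// knownVerbs is the vocabulary the reverse lookup searches (hypothetical).
var knownVerbs = []string{"build", "run", "write", "delete", "test", "check"}

// pastTense consults the irregular table first and only then falls back
// to a naive regular rule: append "d" after a final "e", otherwise "ed".
func pastTense(verb string) string {
	if form, ok := irregularPast[verb]; ok {
		return form
	}
	if strings.HasSuffix(verb, "e") {
		return verb + "d"
	}
	return verb + "ed"
}

// baseOf reverses pastTense by searching the known vocabulary for a verb
// whose forward form matches the given surface form.
func baseOf(past string) (string, bool) {
	for _, verb := range knownVerbs {
		if pastTense(verb) == past {
			return verb, true
		}
	}
	return "", false
}

func main() {
	for _, verb := range knownVerbs {
		past := pastTense(verb)
		base, ok := baseOf(past)
		// Round-trip property: reversing the forward form recovers the base.
		fmt.Printf("%s -> %s -> %s (round trip ok: %v)\n", verb, past, base, ok && base == verb)
	}
}
```

Removing the `"build"` entry from the table would make the fallback produce `builded`, which is the kind of wrong output the irregular-override rule guards against.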

TODO.md (new file, 31 lines)

@@ -0,0 +1,31 @@
# TODO.md — go-i18n Task Queue
Dispatched from core/go orchestration. Pick up tasks in order.
---
## Phase 1: Harden the Engine
- [ ] **Add CLAUDE.md** — Document the grammar engine contract: what it is (grammar primitives + reversal), what it isn't (translation file manager). Include build/test commands, the gram.* sacred rule, and the agent-flattening prohibition.
- [ ] **Ambiguity resolution for dual-class words** — Words like "run", "file", "test", "check", "build" are both verb and noun. The tokeniser currently picks the first match. It needs context-aware disambiguation based on surrounding tokens: a preceding article suggests a noun, and a position directly after a subject suggests a verb.
- [ ] **Extend irregular verb coverage** — Audit against common dev/ops vocabulary. Missing forms cause a silent fallback to the regular rules, which may produce wrong output (e.g. "builded" instead of "built").
- [ ] **Add benchmarks** — `grammar_test.go` and `reversal/tokeniser_test.go` need `Benchmark*` functions. The engine will run in hot paths (TIM, Poindexter) — need baseline numbers.
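
A minimal sketch of the context heuristic from the dual-class item above, assuming whitespace-separated tokens and small hand-written cue lists. `resolve`, `wordClass`, and the three maps are hypothetical names for this example and do not reflect the tokeniser's real types.

```go
package main

import (
	"fmt"
	"strings"
)

type wordClass string

const (
	classNoun      wordClass = "noun"
	classVerb      wordClass = "verb"
	classAmbiguous wordClass = "ambiguous"
)

// Hypothetical cue lists; a real implementation would draw on grammar tables.
var (
	articles  = map[string]bool{"a": true, "an": true, "the": true}
	subjects  = map[string]bool{"i": true, "you": true, "we": true, "they": true}
	dualClass = map[string]bool{"run": true, "file": true, "test": true, "check": true, "build": true}
)

// resolve guesses the class of the dual-class word at tokens[i] from the
// token immediately before it: an article suggests a noun, a subject
// pronoun suggests a verb. With no usable cue it reports ambiguity.
func resolve(tokens []string, i int) wordClass {
	if i <= 0 || i >= len(tokens) {
		return classAmbiguous
	}
	prev := strings.ToLower(tokens[i-1])
	switch {
	case articles[prev]:
		return classNoun
	case subjects[prev]:
		return classVerb
	default:
		return classAmbiguous
	}
}

func main() {
	sentences := []string{"delete the file", "we file the report", "the build failed", "they build it"}
	for _, s := range sentences {
		tokens := strings.Fields(s)
		for i, tok := range tokens {
			if dualClass[strings.ToLower(tok)] {
				fmt.Printf("%q in %q: %s\n", tok, s, resolve(tokens, i))
			}
		}
	}
}
```

Returning an explicit ambiguous class, rather than defaulting to the first match, lets a caller decide how to handle sentence-initial words such as imperatives, which this one-token lookbehind cannot classify.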
## Phase 2: Reference Distribution
- [ ] **Reference distribution builder** — Process the 88K scored seeds from LEM Phase 0 through the tokeniser + imprint pipeline. Output: per-category (ethical, technical, harmful) reference distributions stored as JSON. This calibrates what "normal" grammar looks like.
- [ ] **Imprint comparator** — Given a new text and reference distributions, compute distance metrics (cosine, KL divergence, Mahalanobis). Return a classification signal with confidence score. This is the Poindexter integration point.
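
Two of the distance metrics named above can be sketched as follows, assuming an imprint can be represented as a `[]float64` and a reference distribution as a discrete probability vector of the same length. Both representations are assumptions, since the imprint format is not specified here. Mahalanobis distance is left out because it needs a covariance matrix estimated from the seed data.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|). It returns 0 when the
// lengths differ or either vector has zero magnitude.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// klDivergence computes D_KL(p || q) in nats for two discrete distributions.
// Terms with p[i] == 0 contribute nothing; if q[i] == 0 while p[i] > 0 the
// divergence is +Inf. A length mismatch yields NaN.
func klDivergence(p, q []float64) float64 {
	if len(p) != len(q) {
		return math.NaN()
	}
	var sum float64
	for i := range p {
		if p[i] == 0 {
			continue
		}
		if q[i] == 0 {
			return math.Inf(1)
		}
		sum += p[i] * math.Log(p[i]/q[i])
	}
	return sum
}

func main() {
	// Hypothetical three-bucket distributions, for illustration only.
	reference := []float64{0.5, 0.3, 0.2}
	observed := []float64{0.4, 0.4, 0.2}

	fmt.Printf("cosine similarity: %.4f\n", cosineSimilarity(observed, reference))
	fmt.Printf("KL(observed || reference): %.4f nats\n", klDivergence(observed, reference))
}
```

KL divergence is asymmetric, so a comparator would need to fix which argument is the reference. Unsmoothed reference distributions can also yield infinite divergence for any feature the seeds never exhibited.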
## Phase 3: Multi-Language
- [ ] **Grammar table format spec** — Document the exact JSON schema for `gram.*` keys so new languages can be added. Currently only inferred from `en.json`.
- [ ] **French grammar tables** — First non-English language. French has gendered nouns, complex verb conjugation, elision rules. Good stress test for the grammar engine's language-agnostic design.
---
## Workflow
1. Virgil in core/go writes tasks here after research
2. This repo's session picks up tasks in phase order
3. Mark `[x]` when done, note commit hash
4. New discoveries → add tasks, flag in FINDINGS.md