docs: add CLAUDE.md, TODO.md, FINDINGS.md for agent workflow
CLAUDE.md documents the grammar engine contract and sacred rules.
TODO.md is the task dispatch queue from core/go orchestration.
FINDINGS.md captures research and architectural decisions.

Co-Authored-By: Virgil <virgil@lethean.io>
parent c3b96a4ce1
commit d5b3eac258
3 changed files with 138 additions and 0 deletions
CLAUDE.md (Normal file, 67 lines added)
@@ -0,0 +1,67 @@
# CLAUDE.md

## What This Is

Grammar-aware internationalisation engine for Go. Module: `forge.lthn.ai/core/go-i18n`

This is a **grammar engine**: it provides primitives for composing and reversing grammatically correct text. It is NOT a translation file manager. Consumers bring their own translations.

## Commands

```bash
go test ./...            # Run all tests
go test -v ./reversal/   # Reversal engine tests
go test -bench=. ./...   # Benchmarks (when added)
```

## Critical Rules

### DO NOT flatten locale JSON files

The grammar engine depends on nested `gram.*` structure:

```json
{
  "gram": {
    "verb": {
      "delete": { "past": "deleted", "gerund": "deleting" }
    }
  }
}
```

If you flatten this to `"gram.verb.delete.past": "deleted"`, the grammar engine breaks silently. **This is the #1 cause of agent-introduced bugs.**
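
To illustrate why the failure is silent, here is a minimal sketch using only the standard library. The `verbForms` and `localeFile` types and the `lookupPast` helper are hypothetical and exist only for this example; they are not go-i18n's actual loader, which is not shown in this document. The point is that a decoder expecting the nested shape accepts a flattened document without error and simply finds nothing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// verbForms and localeFile are hypothetical types for this sketch. They
// mirror the nested JSON shown above, not the library's real loader.
type verbForms struct {
	Past   string `json:"past"`
	Gerund string `json:"gerund"`
}

type localeFile struct {
	Gram struct {
		Verb map[string]verbForms `json:"verb"`
	} `json:"gram"`
}

// lookupPast decodes a locale document and returns the past form of verb,
// reporting whether a form was found.
func lookupPast(doc, verb string) (string, bool) {
	var lf localeFile
	if err := json.Unmarshal([]byte(doc), &lf); err != nil {
		return "", false
	}
	// Indexing a nil map is safe in Go and yields the zero value.
	forms, ok := lf.Gram.Verb[verb]
	return forms.Past, ok && forms.Past != ""
}

const nested = `{"gram":{"verb":{"delete":{"past":"deleted","gerund":"deleting"}}}}`

// The flattened form is still valid JSON, so decoding reports no error;
// the unknown dotted keys are ignored.
const flattened = `{"gram.verb.delete.past":"deleted","gram.verb.delete.gerund":"deleting"}`

func main() {
	past, ok := lookupPast(nested, "delete")
	fmt.Printf("nested:    past=%q found=%t\n", past, ok)

	past, ok = lookupPast(flattened, "delete")
	fmt.Printf("flattened: past=%q found=%t\n", past, ok)
}
```

In this sketch the flattened document decodes cleanly, so no error surfaces anywhere; the lookup just misses, and a caller would typically fall back to regular rules without noticing.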

### This library does not manage consumer translations

go-i18n provides grammar primitives. Apps using it (core/cli, etc.) manage their own translation files. Do not add app-specific translation keys to `locales/en.json`; only `gram.*` grammar data belongs there.

## Architecture

| Package | Purpose |
|---------|---------|
| Root | Forward composition: `T()`, grammar primitives, handlers, service |
| `reversal/` | Reverse grammar: tokeniser, imprint, multiplier |
| `locales/` | Grammar tables (JSON), `gram.*` data only |
| `docs/plans/` | Design documents |

## Key API

- `T(key, args...)`: Translate with namespace handlers
- `PastTense(verb)`, `Gerund(verb)`, `Pluralize(noun, n)`, `Article(word)`: Grammar primitives
- `reversal.NewTokeniser().Tokenise(text)`: Reverse grammar lookup
- `reversal.NewImprint(tokens)`: Feature vector projection
- `reversal.NewMultiplier().Expand(text)`: Training data augmentation

## Coding Standards

- UK English (colour, organisation, centre)
- `go test ./...` must pass before commit
- Conventional commits: `type(scope): description`
- Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`

## Task Queue

See `TODO.md` for dispatched tasks from core/go orchestration.
See `FINDINGS.md` for research notes and architectural decisions.
See the [wiki](https://forge.lthn.ai/core/go-i18n/wiki) for full architecture docs.
FINDINGS.md (Normal file, 40 lines added)
@@ -0,0 +1,40 @@
# FINDINGS.md — go-i18n Research & Discovery

Record findings, gaps, and architectural decisions here as work progresses.

---

## 2026-02-19: Initial Assessment (Virgil)

### Current State

- 5,800 lines across 32 files (14 test files)
- All tests pass
- Only dependency: `golang.org/x/text`
- Grammar engine is solid: forward composition + reversal + imprint + multiplier

### Architecture

go-i18n is a **grammar engine**, not a translation file manager. Consumers bring their own translations. The library provides:

1. **Forward composition**: Grammar primitives that compose grammatically correct text
2. **Reverse grammar**: Tokeniser reads grammar tables backwards to extract structure
3. **GrammarImprint**: Lossy feature vector projection (content to grammar fingerprint)
4. **Multiplier**: Deterministic training data augmentation (no LLM calls)

### Key Gaps Identified

| Gap | Impact | Notes |
|-----|--------|-------|
| No CLAUDE.md | High | Agents don't know the rules, will flatten locale files |
| Dual-class word ambiguity | Medium | "file" as verb vs noun, "run" as verb vs noun |
| No benchmarks | Medium | No perf baselines for hot-path usage (TIM, Poindexter) |
| No reference distributions | High | Can't calibrate imprints without scored seed data |
| Only English grammar tables | Medium | Reversal only works with loaded GrammarData |

### Sacred Rules

- `gram.*` keys in locale JSON MUST remain nested; flattening breaks the grammar engine
- Irregular forms in grammar tables override regular morphological rules
- Round-trip property must hold: forward(base) then reverse must recover base
- go-i18n does NOT ship or manage consumer translation files
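
The irregular-override and round-trip rules can be sketched in a few lines. This is an illustrative toy and not the engine's implementation: `irregularPast`, `pastTense`, and `reversePast` are hypothetical names, the table holds only two entries, and the regular rule is deliberately simplistic.

```go
package main

import (
	"fmt"
	"strings"
)

// irregularPast is a tiny hypothetical table for this sketch. In go-i18n
// the real data lives under gram.* in the locale JSON.
var irregularPast = map[string]string{
	"build": "built",
	"run":   "ran",
}

// pastTense consults the irregular table first and only then falls back
// to a deliberately simplistic regular rule.
func pastTense(verb string) string {
	if p, ok := irregularPast[verb]; ok {
		return p
	}
	if strings.HasSuffix(verb, "e") {
		return verb + "d"
	}
	return verb + "ed"
}

// reversePast recovers the base form by reading the forward mapping
// backwards over a known vocabulary.
func reversePast(past string, vocab []string) (string, bool) {
	for _, v := range vocab {
		if pastTense(v) == past {
			return v, true
		}
	}
	return "", false
}

func main() {
	vocab := []string{"build", "run", "delete", "test"}
	for _, v := range vocab {
		past := pastTense(v)
		base, ok := reversePast(past, vocab)
		fmt.Printf("%s -> %s -> %s (round trip holds: %t)\n", v, past, base, ok && base == v)
	}
}
```

Removing "build" from the table in this sketch makes `pastTense` return "builded" with no error, which is the kind of silent fallback that an irregular-coverage audit is meant to catch.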
TODO.md (Normal file, 31 lines added)
@@ -0,0 +1,31 @@
# TODO.md — go-i18n Task Queue

Dispatched from core/go orchestration. Pick up tasks in order.

---

## Phase 1: Harden the Engine

- [ ] **Add CLAUDE.md.** Document the grammar engine contract: what it is (grammar primitives + reversal) and what it isn't (translation file manager). Include build/test commands, the `gram.*` sacred rule, and the agent-flattening prohibition.
- [ ] **Ambiguity resolution for dual-class words.** Words like "run", "file", "test", "check", and "build" are both verb and noun. The tokeniser currently picks the first match. It needs context-aware disambiguation based on surrounding tokens: a preceding article suggests a noun; a position after the subject suggests a verb.
- [ ] **Extend irregular verb coverage.** Audit against common dev/ops vocabulary. Missing forms cause a silent fallback to regular rules, which may produce wrong output (e.g. "builded" instead of "built").
- [ ] **Add benchmarks.** `grammar_test.go` and `reversal/tokeniser_test.go` need `Benchmark*` functions. The engine will run in hot paths (TIM, Poindexter), so baseline numbers are needed.
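
As a starting point for the dual-class task, the two cues named in the list (article before means noun, subject before means verb) could be sketched as below. Everything here is an assumption made for illustration: the `wordClass` type, the `classify` function, and the small word sets are hypothetical and do not reflect the tokeniser's current API or data.

```go
package main

import (
	"fmt"
	"strings"
)

// wordClass is a hypothetical label type used only in this sketch.
type wordClass string

const (
	classNoun    wordClass = "noun"
	classVerb    wordClass = "verb"
	classUnknown wordClass = "unknown"
)

// Small illustrative word sets; a real implementation would draw these
// from the grammar tables.
var (
	dualClass = map[string]bool{"run": true, "file": true, "test": true, "check": true, "build": true}
	articles  = map[string]bool{"a": true, "an": true, "the": true}
	subjects  = map[string]bool{"i": true, "you": true, "we": true, "they": true}
)

// classify looks one token to the left of words[i]. An article suggests
// a noun, a subject pronoun suggests a verb, anything else is undecided.
func classify(words []string, i int) wordClass {
	w := strings.ToLower(words[i])
	if !dualClass[w] || i == 0 {
		return classUnknown
	}
	prev := strings.ToLower(words[i-1])
	switch {
	case articles[prev]:
		return classNoun
	case subjects[prev]:
		return classVerb
	}
	return classUnknown
}

func main() {
	for _, s := range []string{"delete the file", "we file the report"} {
		words := strings.Fields(s)
		for i, w := range words {
			if c := classify(words, i); c != classUnknown {
				fmt.Printf("%q: %q is a %s\n", s, w, c)
			}
		}
	}
}
```

A one-token lookback leaves sentence-initial imperatives such as "run the tests" undecided, so a fuller version would likely need a wider context window or a default class.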

## Phase 2: Reference Distribution

- [ ] **Reference distribution builder.** Process the 88K scored seeds from LEM Phase 0 through the tokeniser + imprint pipeline. Output: per-category (ethical, technical, harmful) reference distributions stored as JSON. This calibrates what "normal" grammar looks like.
- [ ] **Imprint comparator.** Given a new text and reference distributions, compute distance metrics (cosine, KL divergence, Mahalanobis). Return a classification signal with confidence score. This is the Poindexter integration point.
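
Two of the distance metrics named for the comparator can be sketched with the standard library alone. This assumes, purely for illustration, that an imprint can be flattened into a `[]float64` of feature frequencies that sum to 1; the function names and sample vectors are hypothetical, and Mahalanobis distance is left out because it needs a covariance estimate from the reference data.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ or either vector is all zeros.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// klDivergence returns D_KL(p || q) in nats for two discrete
// distributions of equal length. Terms with p[i] == 0 contribute
// nothing; q[i] == 0 with p[i] > 0 makes the divergence infinite.
func klDivergence(p, q []float64) float64 {
	if len(p) != len(q) {
		return math.Inf(1)
	}
	var sum float64
	for i := range p {
		if p[i] == 0 {
			continue
		}
		if q[i] == 0 {
			return math.Inf(1)
		}
		sum += p[i] * math.Log(p[i]/q[i])
	}
	return sum
}

func main() {
	// Hypothetical feature frequencies, e.g. share of past, gerund,
	// and base verb forms in a text.
	reference := []float64{0.5, 0.3, 0.2}
	observed := []float64{0.4, 0.4, 0.2}

	fmt.Printf("cosine similarity: %.4f\n", cosineSimilarity(observed, reference))
	fmt.Printf("KL(observed || reference): %.4f nats\n", klDivergence(observed, reference))
}
```

KL divergence is asymmetric and becomes infinite when the reference has a zero where the observation does not, so a real comparator would probably smooth the reference distributions before comparing.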

## Phase 3: Multi-Language

- [ ] **Grammar table format spec.** Document the exact JSON schema for `gram.*` keys so new languages can be added. The schema is currently only inferred from `en.json`.
- [ ] **French grammar tables.** First non-English language. French has gendered nouns, complex verb conjugation, and elision rules, making it a good stress test for the grammar engine's language-agnostic design.

---

## Workflow

1. Virgil in core/go writes tasks here after research
2. This repo's session picks up tasks in phase order
3. Mark `[x]` when done, note commit hash
4. New discoveries → add tasks, flag in FINDINGS.md