go-i18n/docs/development.md
Snider 5fb98dcedd docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:01:55 +00:00

192 lines
8.4 KiB
Markdown

# Development Guide
## Prerequisites
- Go 1.25 or later (the module uses `go 1.25.5`)
- `golang.org/x/text` (only external dependency for the core engine)
- `forge.lthn.ai/core/go-inference` (replaced via local path `../go-inference` in `go.mod` — required for the `classify.go` and `calibrate.go` files and integration tests)
For integration tests only:
- Models on `/Volumes/Data/lem/` — specifically `LEM-Gemma3-1B-layered-v2` and `LEM-Gemma3-27B` (or compatible models served via the `go-inference` interface)
The `go-inference` package provides the `TextModel` interface used by `ClassifyCorpus()` and `CalibrateDomains()`. Unit tests use a mock implementation and do not require real models.
---
## Build and Test
```bash
# Run all tests
go test ./...
# Run tests with verbose output
go test -v ./...
# Run tests for a specific package
go test -v ./reversal/
# Run a single test by name
go test -run TestName ./...
# Run benchmarks
go test -bench=. ./...
# Run benchmarks for a specific package
go test -bench=. -benchmem ./reversal/
# Run with race detector
go test -race ./...
```
All tests must pass before committing. The race detector must report clean.
---
## Integration Tests
Integration tests require real model instances on `/Volumes/Data/lem/` and are kept in the `integration/` directory, separate from unit tests. They are not run by `go test ./...` from the module root (the integration package is excluded from the default build tag set).
```bash
# 1B classification pipeline (50 prompts, approximately 1 second on M3 Ultra)
cd integration && go test -v -run TestClassifyCorpus_Integration
# 1B vs 27B calibration (500 sentences, approximately 2-5 minutes with 27B)
cd integration && go test -v -run TestCalibrateDomains_1Bvs27B
```
If models are unavailable, the integration tests skip automatically via `testing.Short()` or an explicit model-presence check. Do not convert integration tests to unit tests — they have real runtime cost and external dependencies.
---
## Test Patterns
### Unit tests
Unit tests for the reversal package follow the `_Good`, `_Bad`, `_Ugly` naming pattern inherited from the broader Core Go ecosystem:
- `_Good`: happy path
- `_Bad`: expected error conditions
- `_Ugly`: panic or edge cases
Tests for the root package use standard Go test function naming.
### Mock models
`ClassifyCorpus()` and `CalibrateDomains()` accept the `inference.TextModel` interface. Unit tests construct a mock that returns controlled token sequences without loading any model. The mock implements `Classify(ctx, prompts, opts...) ([]Result, error)`.
### Round-trip tests
`reversal/roundtrip_test.go` validates the round-trip property: every verb in `irregularVerbs` and every noun in `irregularNouns` must survive a reverse lookup and recover the original base form. Add any new irregular entries to the maps in `types.go` and the round-trip tests will automatically cover them.
### Disambiguation tests
Nine named scenario tests cover the key disambiguation signal interactions:
- Noun after determiner (noun_determiner fires)
- Imperative verb at sentence start (sentence_position fires)
- Verb saturation within clause
- Clause boundary isolation
- Contraction auxiliary (`don't`, `can't`, etc.)
Twelve dual-class round-trip tests cover all six dual-class words (`commit`, `run`, `test`, `check`, `file`, `build`) in both verb and noun roles.
### Benchmark baselines
Benchmark baselines were measured on M3 Ultra, arm64. See `FINDINGS.md` (archived in `docs/history.md`) for the full table. When adding new benchmarks, include `b.ReportAllocs()` and compare against the baseline table.
---
## Coding Standards
### Language
UK English throughout. Correct spellings: `colour`, `organisation`, `centre`, `analyse`, `recognise`, `optimise`, `initialise`, `synchronise`, `cancelling`, `modelled`, `labelled`, `travelling`. These spellings appear in the `irregularVerbs` map and must remain consistent.
### Go style
- `declare(strict_types=1)` equivalent: use explicit types on all declarations where the type is not obvious from context
- All parameters and return types must be named and typed
- Prefer `fmt.Errorf("context: %w", err)` for error wrapping
- Use `errors.Is()` for error comparison, not string matching
- No global mutable state beyond the `grammarCache` and `templateCache` (which are already protected by synchronisation primitives)
### Grammar table rules
**Never flatten `gram.*` keys in locale JSON.** The loader (`flattenWithGrammar()`) depends on the nested `gram.verb.*`, `gram.noun.*` etc. path structure to route objects into typed Go structs. Flattening to `"gram.verb.delete.past": "deleted"` causes silent data loss — the key is treated as a plain translation message, not a verb form.
**Dual-class words** must appear in both `gram.verb` and `gram.noun` in the JSON. The tokeniser builds the `dualClass` index by intersecting `baseVerbs` and `baseNouns` at construction time.
**Only `gram.*` grammar data belongs in `locales/en.json` and `locales/fr.json`.** Consumer app translation keys (`prompt.*`, `time.*`, etc.) are managed by consumers, not this library.
### File organisation
| File | Contents |
|------|----------|
| `types.go` | All types, interfaces, package variables, irregular maps |
| `grammar.go` | Forward composition functions |
| `loader.go` | FSLoader, JSON parsing, flattenWithGrammar |
| `classify.go` | ClassifyCorpus, ClassifyStats, ClassifyOption |
| `calibrate.go` | CalibrateDomains, CalibrationStats, CalibrationResult |
| `reversal/tokeniser.go` | Tokeniser, Tokenise, two-pass disambiguation |
| `reversal/imprint.go` | GrammarImprint, NewImprint, Similar |
| `reversal/reference.go` | ReferenceSet, BuildReferences, Compare, Classify, distance metrics |
| `reversal/anomaly.go` | DetectAnomalies, AnomalyResult, AnomalyStats |
| `reversal/multiplier.go` | Multiplier, Expand |
Do not put grammar functions in `types.go` or type definitions in `grammar.go`. Keep the split clean.
---
## Conventional Commits
Format: `type(scope): description`
Common types: `feat`, `fix`, `test`, `bench`, `refactor`, `docs`, `chore`
Common scopes: `tokeniser`, `imprint`, `reference`, `anomaly`, `multiplier`, `grammar`, `loader`, `classify`, `calibrate`, `fr` (for French grammar table changes)
Examples:
```
feat(tokeniser): add two-pass disambiguation for dual-class words
fix(imprint): floor confidence at 0.55/0.45 when only prior fires
test(reference): add Mahalanobis fallback to Euclidean test
bench(grammar): add PastTense and Gerund baselines
```
---
## Co-Author
All commits must include the co-author trailer:
```
Co-Authored-By: Virgil <virgil@lethean.io>
```
---
## Licence
EUPL-1.2. Do not add dependencies with incompatible licences. The only external runtime dependency is `golang.org/x/text` (BSD-3-Clause, compatible). `go-inference` is an internal Core module.
---
## Adding a New Language
1. Create `locales/<lang>.json` with a complete `gram` block following `docs/grammar-table-spec.md`.
2. Populate `gram.verb` comprehensively — tiers 2 and 3 of the fallback chain are English-only.
3. Populate `gram.noun` with gender fields if the language has grammatical gender.
4. Set `gram.article.by_gender` for gendered article systems.
5. Set `gram.punct.label` correctly — French uses `" :"` (space before colon), English uses `":"`.
6. Populate `gram.signal` lists so the disambiguation tokeniser has language-appropriate determiners and auxiliaries. Without these, the tokeniser uses hardcoded English defaults.
7. Add a plural rule function to the `pluralRules` map in `types.go` if the language has non-standard plural categories (beyond one/other).
8. Run `go test ./...` and confirm all existing tests still pass. Add grammar data tests that verify the loaded counts and known values.
9. If the language needs reversal support, verify that `NewTokeniserForLang("<lang>")` builds indexes correctly and that `MatchVerb` / `MatchNoun` return correct results for a sample of forms.
---
## Performance Notes
- `Imprint.Similar` is zero-alloc. Keep it that way — it is called in tight loops during reference comparison.
- `WithSignals()` allocates `SignalBreakdown` on every ambiguous token. It is for diagnostics only; never enable it in the hot path.
- `Multiplier.Expand` allocates heavily (63 allocs for a four-word sentence). If it becomes a bottleneck, pool the token slices.
- The `grammarCache` uses `sync.RWMutex` with read-biased locking. Languages are loaded once at startup and then read-only; this is the intended pattern.