REVIEW.md — Dual-Class Disambiguation Plan Review
Reviewer: Virgil (core/go orchestrator, go-i18n domain expert)
Date: 2026-02-19
Plans reviewed: docs/plans/2026-02-19-dual-class-disambiguation-design.md, docs/plans/2026-02-19-dual-class-disambiguation-plan.md
Verdict: Strong design, approve with fixes below
Overall Assessment
The two-pass probabilistic disambiguation approach is the right call. The signal weights are well-reasoned, the worked examples are correct, and the backwards-compatibility story (variadic options, conf == 0 → 1.0 guard) is solid. The TDD task breakdown is clean and the commit granularity is good.
This is real engineering, not a cargo-culted NLP textbook. The insight that confidence should flow into imprint distributions rather than being a hard classification is exactly right for the scoring use case.
Bugs to Fix Before Implementation
B1: Loader type assertion missing
Task 1, Step 4 — the flattenWithGrammar handler for gram.signal indexes v["noun_determiner"] directly, but v is typed any in the range iteration. Needs a type assertion first:
if grammar != nil && fullKey == "gram.signal" {
    signalMap, ok := v.(map[string]any)
    if !ok {
        continue
    }
    if nd, ok := signalMap["noun_determiner"]; ok {
        // ...
    }
}
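Without the assertion the snippet does not compile, because Go cannot index a value of type any, and an unchecked single-value assertion (v.(map[string]any) without the ok result) would compile but panic at runtime on malformed locale data. The existing gram.verb and gram.noun handlers in the loader use nested if val, ok := v.(map[string]any) patterns; match those.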
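The comma-ok pattern above can be sketched end to end as follows. This is a minimal illustration, not the loader itself: stringList and decode are hypothetical helpers, and the only assumption about the data is that it was decoded by encoding/json into any, which yields map[string]any for objects and []any for arrays.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode unmarshals raw JSON into an untyped value, as a loader that walks
// map[string]any trees would see it. Hypothetical helper for this sketch.
func decode(raw string) any {
	var v any
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return nil
	}
	return v
}

// stringList extracts v[key] as a list of strings, returning nil instead of
// panicking when any level has an unexpected shape. Hypothetical helper.
func stringList(v any, key string) []string {
	m, ok := v.(map[string]any)
	if !ok {
		return nil
	}
	raw, ok := m[key].([]any)
	if !ok {
		return nil
	}
	out := make([]string, 0, len(raw))
	for _, item := range raw {
		// Skip non-string entries rather than failing the whole list.
		if s, ok := item.(string); ok {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	good := decode(`{"noun_determiner": ["the", "a", "an"]}`)
	bad := decode(`{"noun_determiner": "the"}`)
	fmt.Println(stringList(good, "noun_determiner"))
	fmt.Println(stringList(bad, "noun_determiner") == nil)
}
```

Every assertion uses the two-value form, so malformed locale data degrades to "no signal words" rather than a crash.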
B2: Dual-class noun entries — verify "test" and "check"
The design doc says "test, check, file already present as nouns". I can confirm file and commit are in gram.noun in en.json, but test and check need verification. The en.json noun list I see includes: file, repo, repository, commit, branch, change, item, issue, task, person, child, package. If test and check are NOT present, Task 2 needs to add them as nouns too, not just as verbs.
B3: Default prior gives misleading confidence of 1.0
When no signals fire except the default prior (0.02 verb), the confidence calculation gives 0.02 / 0.02 = 1.0 — a token classified as verb with "100% confidence" when in reality it's a coin flip. Add a confidence floor check:
// In resolveToken:
if total < 0.10 {
    // Only the default prior fired: low-information classification
    tok.Confidence = 0.55 // barely above chance
    tok.AltConf = 0.45
    return
}
Alternatively, don't fire the default prior when no other signal has fired: make the verb-first fallback explicit with a fixed low confidence rather than deriving it from the score ratio.
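To make the arithmetic behind B3 concrete, here is a small sketch of a ratio-based confidence with the suggested floor. The function name confidence and the constants are assumptions for illustration (the threshold and the 0.55/0.45 split come from the snippet above); the real resolveToken may accumulate scores differently.

```go
package main

import "fmt"

// Values taken from the B3 suggestion; the names are hypothetical.
const (
	minSignalTotal = 0.10 // below this, too little evidence has accumulated
	floorConf      = 0.55
	floorAltConf   = 0.45
)

// confidence returns the winner's confidence and the alternative's
// confidence for a pair of accumulated signal scores.
func confidence(verbScore, nounScore float64) (conf, alt float64) {
	total := verbScore + nounScore
	if total < minSignalTotal {
		// Report a near-chance split instead of a misleading ratio.
		return floorConf, floorAltConf
	}
	win := verbScore
	if nounScore > win {
		win = nounScore
	}
	conf = win / total
	return conf, 1 - conf
}

func main() {
	// Only the 0.02 default prior fired: the raw ratio would be 0.02/0.02 = 1.0.
	c, a := confidence(0.02, 0)
	fmt.Printf("prior only: conf=%.2f alt=%.2f\n", c, a)

	// Determiner signal (0.35, noun) against the default prior (0.02, verb).
	c, a = confidence(0.02, 0.35)
	fmt.Printf("determiner: conf=%.2f alt=%.2f\n", c, a)
}
```

The floor only changes the low-evidence case; once a real signal fires, the ratio is used unchanged.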
Design Improvements
D1: Contraction handling for Signal 2
The verb_auxiliary set includes "do", "does", "did", "will", "would", etc. but misses contractions: "don't", "can't", "won't", "shouldn't", "couldn't", "wouldn't", "doesn't", "didn't", "isn't", "aren't", "wasn't", "weren't", "hasn't", "hadn't", "haven't".
In dev text, contractions are common: "don't run the tests", "can't build on Windows", "shouldn't commit to main". Currently splitTrailingPunct won't strip the apostrophe (it's mid-word), so "don't" hits MatchVerb/MatchNoun as-is, fails both, becomes TokenUnknown. Signal 2 then misses the auxiliary entirely.
Options (pick one):
- Add contractions to gram.signal.verb_auxiliary directly: ["don't", "can't", "won't", ...]
- Add a contraction expansion step before classification: "don't" → ["do", "not"]
The first is simpler and doesn't change the tokeniser's word count. Recommend that.
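A rough sketch of the first option: contractions sit in the same lookup set as the plain auxiliaries, so the "previous word is an auxiliary" check needs no tokeniser changes. buildAuxSet and precededByAux are hypothetical names, and the word list here is a shortened sample rather than the real verb_auxiliary data.

```go
package main

import (
	"fmt"
	"strings"
)

// buildAuxSet lowercases each entry into a lookup set. Hypothetical helper.
func buildAuxSet(words []string) map[string]bool {
	set := make(map[string]bool, len(words))
	for _, w := range words {
		set[strings.ToLower(w)] = true
	}
	return set
}

// precededByAux reports whether the word before index i is in the set.
// Contractions match as whole words, so the token count is unchanged.
func precededByAux(words []string, i int, aux map[string]bool) bool {
	if i <= 0 || i >= len(words) {
		return false
	}
	return aux[strings.ToLower(words[i-1])]
}

func main() {
	// Shortened sample list, not the full locale data.
	aux := buildAuxSet([]string{"do", "will", "can", "don't", "can't", "shouldn't"})
	words := strings.Fields("Don't commit to main")
	fmt.Println(precededByAux(words, 1, aux))
}
```

One caveat worth checking in the real data: a typographic apostrophe (U+2019) is a different code point from the ASCII one, so "don’t" would only match if it is normalised or listed separately.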
D2: Clause boundary for verb saturation (Signal 5)
Signal 5 scans ALL tokens for a confident verb, but multi-clause sentences can have multiple verbs legitimately:
"The test passed and we should commit the fix"
Here "passed" is a confident verb, so Signal 5 pushes "commit" toward noun — but "commit" is actually a verb in the second clause. Simple fix: only scan tokens within the same "clause", where clause boundaries are punctuation tokens (comma, semicolon, period, dash) or coordinating conjunctions ("and", "or", "but"). Even a rough boundary is better than scanning the whole array.
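The clause-scoped scan could look roughly like this. The tok type is a pared-down stand-in for the real token, the boundary set is just the rough list named above, and the 0.8 threshold in main is an arbitrary illustrative value.

```go
package main

import "fmt"

// tok is a hypothetical, pared-down token; the real type has more fields.
type tok struct {
	Lower      string
	IsVerb     bool
	Confidence float64
}

// clauseBoundaries holds the rough boundaries suggested in D2.
var clauseBoundaries = map[string]bool{
	",": true, ";": true, ".": true, "-": true,
	"and": true, "or": true, "but": true,
}

// hasConfidentVerbInClause scans left and right from index i and stops at
// the first boundary token in each direction.
func hasConfidentVerbInClause(toks []tok, i int, minConf float64) bool {
	if i < 0 || i >= len(toks) {
		return false
	}
	confident := func(t tok) bool { return t.IsVerb && t.Confidence >= minConf }
	for j := i - 1; j >= 0 && !clauseBoundaries[toks[j].Lower]; j-- {
		if confident(toks[j]) {
			return true
		}
	}
	for j := i + 1; j < len(toks) && !clauseBoundaries[toks[j].Lower]; j++ {
		if confident(toks[j]) {
			return true
		}
	}
	return false
}

// example tokenises the sentence from D2 and marks "passed" as a confident verb.
func example() []tok {
	words := []string{"the", "test", "passed", "and", "we", "should", "commit", "the", "fix"}
	toks := make([]tok, len(words))
	for i, w := range words {
		toks[i] = tok{Lower: w}
	}
	toks[2].IsVerb, toks[2].Confidence = true, 1.0
	return toks
}

func main() {
	toks := example()
	fmt.Println("test sees a verb:", hasConfidentVerbInClause(toks, 1, 0.8))
	fmt.Println("commit sees a verb:", hasConfidentVerbInClause(toks, 6, 0.8))
}
```

With "and" acting as a boundary, "passed" still saturates the first clause but no longer pushes "commit" toward noun in the second.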
D3: Negation awareness
"Not" / "never" / "no longer" before a dual-class word should weakly signal verb (negated imperative/statement). Currently "not" would be TokenUnknown and invisible to all signals. Low priority, but worth a gram.signal.verb_negation entry for future use.
Feature Requests (Phase 2 integration)
F1: Export DisambiguationStats for benchmarking
Add a method to the tokeniser that returns aggregate stats after processing a batch:
type DisambiguationStats struct {
    TotalTokens     int
    AmbiguousTokens int
    ResolvedAsVerb  int
    ResolvedAsNoun  int
    AvgConfidence   float64
    LowConfidence   int // count where confidence < 0.7
}
This feeds directly into Phase 2's calibration work — we need to know how many tokens in the 88K seeds are ambiguous and how confident the engine is. Without this, we'd have to manually iterate tokens and count, which every caller will end up doing anyway.
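As a sketch of the aggregation F1 has in mind, the function below fills the struct from a token slice. Several things here are assumptions: the Token shape, the Ambiguous flag used to mark dual-class tokens, and the choice to average confidence over ambiguous tokens only. The real tokeniser may expose this differently.

```go
package main

import "fmt"

type TokenType int

const (
	TokenUnknown TokenType = iota
	TokenVerb
	TokenNoun
)

// Token is a hypothetical stand-in; Ambiguous is assumed to be set by the
// tokeniser for words that went through dual-class disambiguation.
type Token struct {
	Type       TokenType
	Confidence float64
	Ambiguous  bool
}

type DisambiguationStats struct {
	TotalTokens     int
	AmbiguousTokens int
	ResolvedAsVerb  int
	ResolvedAsNoun  int
	AvgConfidence   float64
	LowConfidence   int // count where confidence < 0.7
}

// statsFromTokens aggregates over the ambiguous tokens in one pass.
func statsFromTokens(tokens []Token) DisambiguationStats {
	s := DisambiguationStats{TotalTokens: len(tokens)}
	var sum float64
	for _, tk := range tokens {
		if !tk.Ambiguous {
			continue
		}
		s.AmbiguousTokens++
		sum += tk.Confidence
		switch tk.Type {
		case TokenVerb:
			s.ResolvedAsVerb++
		case TokenNoun:
			s.ResolvedAsNoun++
		}
		if tk.Confidence < 0.7 {
			s.LowConfidence++
		}
	}
	if s.AmbiguousTokens > 0 {
		// Guard against dividing by zero when nothing was ambiguous.
		s.AvgConfidence = sum / float64(s.AmbiguousTokens)
	}
	return s
}

func main() {
	tokens := []Token{
		{Type: TokenUnknown},
		{Type: TokenNoun, Confidence: 1.0, Ambiguous: true},
		{Type: TokenVerb, Confidence: 0.5, Ambiguous: true},
	}
	fmt.Printf("%+v\n", statsFromTokens(tokens))
}
```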
F2: Corpus-derived priors (replace Signal 7)
The 88K Phase 0 seeds, once pre-tagged by 1B domain classification, can provide actual frequency priors. "commit" in technical text is ~60% noun (in commit messages, PR descriptions). Replace the static 0.02 verb-first prior with per-word priors loaded from a calibration file:
"gram.signal.prior": {
    "commit": { "verb": 0.40, "noun": 0.60 },
    "test": { "verb": 0.35, "noun": 0.65 },
    "run": { "verb": 0.70, "noun": 0.30 }
}
This is Phase 2 work, not Phase 1. But the data model should anticipate it — leave room in SignalData for an optional Priors map[string]map[string]float64 field. Don't implement now, just reserve the slot.
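Reserving the slot might look like the sketch below. Only the Priors field type comes from the text above; the other SignalData fields, the JSON tags, and the parseSignalData and priorFor helpers are assumptions made to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SignalData is a hypothetical shape. Priors is optional: when the locale
// file has no "prior" object, the map stays nil and lookups fall through.
type SignalData struct {
	NounDeterminers []string                      `json:"noun_determiner"`
	VerbAuxiliaries []string                      `json:"verb_auxiliary"`
	Priors          map[string]map[string]float64 `json:"prior,omitempty"`
}

// parseSignalData decodes a signal object. Hypothetical helper.
func parseSignalData(raw string) (SignalData, error) {
	var s SignalData
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

// priorFor returns the per-word prior for a class, or fallback when the
// word or class is absent. Indexing a nil map is safe in Go.
func (s SignalData) priorFor(word, class string, fallback float64) float64 {
	if p, ok := s.Priors[word][class]; ok {
		return p
	}
	return fallback
}

func main() {
	s, err := parseSignalData(`{
		"noun_determiner": ["the", "a", "an"],
		"prior": {"commit": {"verb": 0.40, "noun": 0.60}}
	}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(s.priorFor("commit", "noun", 0.02))
	fmt.Println(s.priorFor("run", "verb", 0.02))
}
```

Because the field is optional and nil-safe, Phase 1 locale files load unchanged and the static prior remains the fallback.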
F3: Signal weight tuning via calibration
Hardcoded weights (0.35, 0.25, etc.) are a good starting point, but once we have the 88K seeds with known correct classifications (via 27B ground truth), we can tune weights to maximise accuracy. Worth adding a WithWeights(map[string]float64) option to the tokeniser so weights are configurable without code changes:
tok := NewTokeniser(WithWeights(map[string]float64{
    "noun_determiner":   0.35,
    "verb_auxiliary":    0.25,
    "following_class":   0.15,
    "sentence_position": 0.10,
    "verb_saturation":   0.10,
    "inflection_echo":   0.03,
    "default_prior":     0.02,
}))
Again, don't implement now. But the signal scoring function should read weights from the tokeniser struct rather than using literals, so the option is a one-liner to add later.
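One way to keep the weights on the struct so that the option stays a one-liner is the functional-options sketch below. The Tokeniser, Option, and score definitions are simplified stand-ins, not the real types; only the signal names and default values are taken from the example above.

```go
package main

import "fmt"

// Tokeniser is a simplified stand-in that only carries signal weights.
type Tokeniser struct {
	weights map[string]float64
}

// Option configures a Tokeniser at construction time.
type Option func(*Tokeniser)

func defaultWeights() map[string]float64 {
	return map[string]float64{
		"noun_determiner":   0.35,
		"verb_auxiliary":    0.25,
		"following_class":   0.15,
		"sentence_position": 0.10,
		"verb_saturation":   0.10,
		"inflection_echo":   0.03,
		"default_prior":     0.02,
	}
}

// WithWeights replaces the whole weight table. In this sketch, signals
// omitted from w end up with no weight and are skipped by score.
func WithWeights(w map[string]float64) Option {
	return func(tk *Tokeniser) {
		copied := make(map[string]float64, len(w))
		for k, v := range w {
			copied[k] = v
		}
		tk.weights = copied
	}
}

func NewTokeniser(opts ...Option) *Tokeniser {
	tk := &Tokeniser{weights: defaultWeights()}
	for _, opt := range opts {
		opt(tk)
	}
	return tk
}

// score sums the weights of the signals that fired, reading from the struct
// rather than from literals.
func (tk *Tokeniser) score(fired ...string) float64 {
	var total float64
	for _, name := range fired {
		if w, ok := tk.weights[name]; ok {
			total += w
		}
	}
	return total
}

func main() {
	def := NewTokeniser()
	tuned := NewTokeniser(WithWeights(map[string]float64{"noun_determiner": 0.5}))
	fmt.Println(def.score("noun_determiner"), tuned.score("noun_determiner"))
	fmt.Println(tuned.score("verb_auxiliary")) // omitted key contributes nothing
}
```

The map is copied inside the option so that later mutation of the caller's map cannot change a tokeniser's behaviour.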
F4: Expand dual-class set
The plan targets {commit, run, test, check, file, build}. These are the highest-frequency ones in dev text, but consider also: patch, release, update, change, merge, push, pull, tag, log, watch, link, host, import, export, process, function, handle, trigger, stream, queue.
Not all need adding now, but the FINDINGS.md should note the expanded candidate list so Phase 2 can measure which ones actually cause imprint drift in the 88K seeds.
Implementation Notes
- The plan's test pattern is correct: the setup(t) helper exists in tokeniser_test.go and calls i18n.New() + SetDefault(). The imprint and roundtrip tests inline this; don't mix patterns, use setup(t) consistently.
- splitTrailingPunct() in the tokeniser strips trailing punctuation. Verify it handles: period, comma, semicolon, colon, exclamation, question mark, closing paren/bracket. If it doesn't handle semicolons, the clause boundary detection (D2) won't work as expected.
- The MatchVerb and MatchNoun methods return VerbMatch/NounMatch structs. Verify the VerbMatch.Tense field is "base" for base forms; the plan's Pass 1 logic depends on vm.Tense != "base" to distinguish inflected forms. If the tense field is an empty string for base forms, that check silently breaks.
- The NounMatch.Plural field: verify this is true for plural forms and false for singular. The plan's Pass 1 uses nm.Plural to distinguish inflected nouns.
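The Tense pitfall in the notes above is easy to show in isolation. In this sketch VerbMatch is a stand-in with only the fields needed here, "past" is an assumed tense label, and isInflected is a hypothetical tolerant check that treats an empty tense the same as "base".

```go
package main

import "fmt"

// VerbMatch is a stand-in for the real struct; only Tense matters here.
type VerbMatch struct {
	Base  string
	Tense string
}

// isInflected treats both "base" and the empty string as base forms, so the
// check keeps working if base forms are returned with Tense unset.
func isInflected(vm VerbMatch) bool {
	return vm.Tense != "" && vm.Tense != "base"
}

func main() {
	for _, vm := range []VerbMatch{
		{Base: "commit", Tense: "base"},
		{Base: "commit", Tense: ""},
		{Base: "commit", Tense: "past"}, // assumed label for an inflected form
	} {
		naive := vm.Tense != "base" // the check described in the plan
		fmt.Printf("tense=%q naive=%v tolerant=%v\n", vm.Tense, naive, isInflected(vm))
	}
}
```

If verification shows that base forms always carry "base", the plain comparison is fine and the tolerant variant is unnecessary.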
Priority Order
- Fix B1 (loader panic) — blocks everything
- Fix B3 (confidence floor) — correctness
- Implement D1 (contractions) — easy, high impact for dev text
- Verify B2 (noun entries) — data correctness
- Add F1 (stats export) — enables Phase 2
- Implement D2 (clause boundaries) — accuracy improvement
- Reserve F2/F3 struct fields — forward compatibility
Everything else is Phase 2+.
Approved
The plan is approved for implementation with the B1-B3 fixes applied. D1 (contractions) should be added to Task 2's en.json changes. The rest are suggestions, not blockers.
Post-Implementation Review (2026-02-19, Virgil)
Reviewed commits: f0c4beb..f11c127 (8 commits)
Tests: All pass, race detector clean (go test ./... -count=1 -race)
Verdict: Approve for PR with 3 minor fixes below
What Was Done Well
The implementation addressed every item from the original review:
| Item | Status | Notes |
|---|---|---|
| B1 (loader type assertion) | Fixed | Correct pattern, no panic risk |
| B2 (test/check noun entries) | Fixed | Both added to gram.noun |
| B3 (confidence floor) | Fixed | Clean 0.55/0.45 when total < 0.10 |
| D1 (contractions) | Fixed | 15 contractions in verb_auxiliary signal data |
| D2 (clause boundaries) | Fixed | clauseBoundaries map + hasConfidentVerbInClause scoped correctly |
| F1 (DisambiguationStats) | Implemented | Struct + DisambiguationStatsFromTokens() |
| F3 (WithWeights) | Implemented | Configurable signal weights via option |
The confidence-weighted imprint contributions are exactly right. The clause boundary implementation handles conjunctions and punctuation correctly. The code comments reference the review items (B3, D2) which is good traceability.
Fixes Required Before PR
R1: Remove "passed", "failed", "skipped" from gram.noun
File: locales/en.json
These are past participles, not nouns. They appear in gram.noun as:
"passed": { "one": "passed", "other": "passed" },
"failed": { "one": "failed", "other": "failed" },
"skipped": { "one": "skipped", "other": "skipped" },
This doesn't break tests today because "pass", "fail", "skip" aren't in gram.verb, so MatchVerb("failed") returns false and these words never trigger the dual-class path. But it's wrong data that will bite if pass, fail, or skip are ever added to gram.verb when the verb table is expanded in Phase 2 (see F4).
Also remove "passed", "failed", "skipped" from gram.word — they're duplicated there too and are dead data (unreachable given the lookup order: verb → noun → word).
Action: Delete the 3 entries from gram.noun and the 3 from gram.word. Run tests to confirm nothing breaks.
R2: Add tests for DisambiguationStats and WithWeights
File: reversal/tokeniser_test.go
Both features are implemented but have zero test coverage. FINDINGS.md claims "DisambiguationStats tests" exist — they don't. Add at minimum:
func TestDisambiguationStats_WithAmbiguous(t *testing.T) {
    setup(t)
    tok := NewTokeniser()
    tokens := tok.Tokenise("The commit passed the test")
    stats := DisambiguationStatsFromTokens(tokens)
    if stats.AmbiguousTokens == 0 {
        t.Error("expected ambiguous tokens for dual-class words")
    }
    if stats.TotalTokens != len(tokens) {
        t.Errorf("TotalTokens = %d, want %d", stats.TotalTokens, len(tokens))
    }
}

func TestDisambiguationStats_NoAmbiguous(t *testing.T) {
    setup(t)
    tok := NewTokeniser()
    tokens := tok.Tokenise("The cat sat on the mat")
    stats := DisambiguationStatsFromTokens(tokens)
    if stats.AmbiguousTokens != 0 {
        t.Errorf("AmbiguousTokens = %d, want 0", stats.AmbiguousTokens)
    }
}

func TestWithWeights_Override(t *testing.T) {
    setup(t)
    // Override noun_determiner to 0: "The commit" should no longer resolve as noun
    tok := NewTokeniser(WithWeights(map[string]float64{
        "noun_determiner":   0.0,
        "verb_auxiliary":    0.25,
        "following_class":   0.15,
        "sentence_position": 0.10,
        "verb_saturation":   0.10,
        "inflection_echo":   0.03,
        "default_prior":     0.02,
    }))
    tokens := tok.Tokenise("The commit")
    // With noun_determiner zeroed, default_prior (verb) should win
    if tokens[1].Type != TokenVerb {
        t.Errorf("with noun_determiner=0, 'commit' Type = %v, want TokenVerb", tokens[1].Type)
    }
}
Also note: WithWeights with a partial map silently disables every omitted signal. The scoring code looks weights up with if w, ok := t.weights[key]; ok, so a missing key makes ok false and the signal is skipped. This is arguably fine, but it should be documented in a code comment on WithWeights.
R3: Guard buildSignalIndex per-field
File: reversal/tokeniser.go, function buildSignalIndex
Current code:
if data != nil && len(data.Signals.NounDeterminers) > 0 {
    // loads ALL signal lists, then returns
    return
}
// fallback hardcodes
If a future locale has noun_determiner populated but verb_auxiliary empty, the function returns early with t.verbAux empty — Signal 2 silently dies. Fix: guard each list independently so partial locale data falls back per-field:
if data != nil && len(data.Signals.NounDeterminers) > 0 {
    for _, w := range data.Signals.NounDeterminers {
        t.nounDet[strings.ToLower(w)] = true
    }
} else {
    // fallback: hardcoded English noun determiners
    for _, w := range []string{"the", "a", "an", ...} {
        t.nounDet[w] = true
    }
}
if data != nil && len(data.Signals.VerbAuxiliaries) > 0 {
    for _, w := range data.Signals.VerbAuxiliaries {
        t.verbAux[strings.ToLower(w)] = true
    }
} else {
    // fallback: hardcoded English verb auxiliaries
    for _, w := range []string{"will", "would", "can", ...} {
        t.verbAux[w] = true
    }
}
// same for VerbInfinitive
This is defensive but prevents a class of silent failure for new locales.
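Since the same guard repeats for every list, it could be folded into one helper. The sketch below is only a possible shape: wordSet is a hypothetical name, and the fallback lists are short samples standing in for the real hardcoded English defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet builds a lowercase lookup set from the locale list, falling back
// to the defaults when the locale list is nil or empty. Hypothetical helper.
func wordSet(locale, fallback []string) map[string]bool {
	src := locale
	if len(src) == 0 {
		src = fallback
	}
	set := make(map[string]bool, len(src))
	for _, w := range src {
		set[strings.ToLower(w)] = true
	}
	return set
}

func main() {
	// Sample fallbacks only; the real defaults are longer.
	defaultDet := []string{"the", "a", "an"}
	defaultAux := []string{"will", "would", "can"}

	// A locale that supplies determiners but no auxiliaries.
	localeDet := []string{"Le", "La", "Les"}
	var localeAux []string

	nounDet := wordSet(localeDet, defaultDet)
	verbAux := wordSet(localeAux, defaultAux)

	fmt.Println(nounDet["le"], nounDet["the"])
	fmt.Println(verbAux["will"])
}
```

Each field is resolved on its own, so a locale that fills in only one list keeps the defaults for the others.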
Update FINDINGS.md
After fixing R1-R3, update the "Dual-Class Word Disambiguation" section in FINDINGS.md:
- Correct the claim about DisambiguationStats tests (they did not exist when the claim was written; after R2 they do)
- Note R1 cleanup (removed dead noun entries)
- Note R3 per-field fallback
After Fixes
Once R1-R3 are done and tests pass, create the PR targeting main. The disambiguation feature is solid — these are cleanup items, not architectural issues.