Add "test", "check", and "file" as verbs and "run" and "build" as
nouns so the tokeniser can detect them in both grammatical roles.
Add 15 contractions to the verb_auxiliary signal list to support
developer text. Update reversal tests to use noun-only words (branch)
in test phrases, avoiding dual-class ambiguity until disambiguation
lands (Task 5).
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Normalise signal words to lowercase on load (defensive against
mixed-case entries in locale JSON). Strengthen test assertions
with expected counts and spot-checks. Clarify Priors field comment.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
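
A minimal sketch of the lowercase-on-load normalisation described in the commit above. The helper name normaliseSignalWords and the sample words are hypothetical; the actual loader is not shown in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// normaliseSignalWords is a hypothetical helper, not the engine's real API.
// It returns a lowercased copy of words so that later lookups do not
// depend on the casing used in the locale JSON.
func normaliseSignalWords(words []string) []string {
	out := make([]string, len(words))
	for i, w := range words {
		out[i] = strings.ToLower(w)
	}
	return out
}

func main() {
	// Mixed-case entries as they might appear in a locale file.
	fmt.Println(normaliseSignalWords([]string{"The", "WILL", "to"}))
}
```

Returning a copy rather than mutating in place keeps the raw locale data untouched, which is a design choice of this sketch only.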
Load noun_determiner, verb_auxiliary, and verb_infinitive word lists
from gram.signal in locale JSON. Reserve Priors field for future
corpus-derived per-word disambiguation priors.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multi-signal probabilistic disambiguation with two-pass tokenisation.
Seven weighted signals resolve verb/noun ambiguity for words like
"commit", "run", "test", "check", "file", "build". Confidence scores
flow into imprints for the scoring/comprehension use case.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
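
One way the weighted-signal scoring from the commit above could look, as a sketch only. The signal names are borrowed from the gram.signal lists mentioned in this log; the weights, the signal type, and the resolve function are assumptions, and the two-pass tokenisation itself is not modelled here.

```go
package main

import "fmt"

// signal is a hypothetical representation of one piece of evidence
// that fired for an ambiguous word.
type signal struct {
	name   string
	weight float64 // assumed weight, not taken from the engine
	verb   bool    // true if the signal favours the verb reading
}

// resolve sums the weights per reading and reports the winning class
// together with its share of the total weight as a confidence score.
// Ties go to "verb", which is an arbitrary choice in this sketch.
func resolve(fired []signal) (class string, confidence float64) {
	var verb, noun float64
	for _, s := range fired {
		if s.verb {
			verb += s.weight
		} else {
			noun += s.weight
		}
	}
	total := verb + noun
	if total == 0 {
		return "unknown", 0
	}
	if verb >= noun {
		return "verb", verb / total
	}
	return "noun", noun / total
}

func main() {
	// "the commit": a determiner precedes the word, so the noun signal fires.
	class, conf := resolve([]signal{
		{name: "noun_determiner", weight: 0.35, verb: false},
		{name: "verb_infinitive", weight: 0.25, verb: true},
	})
	fmt.Printf("%s %.2f\n", class, conf)
}
```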
CLAUDE.md documents the grammar engine contract and sacred rules.
TODO.md is the task dispatch queue from core/go orchestration.
FINDINGS.md captures research and architectural decisions.
Co-Authored-By: Virgil <virgil@lethean.io>
Add "!", ";", and "," to splitTrailingPunct and matchPunctuation.
Previously only "...", "?", and ":" were recognised.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
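
A sketch of trailing-punctuation splitting covering the six marks named in the commit above. The function splitTrailingPunctSketch is a hypothetical stand-in; the real splitTrailingPunct signature and behaviour are not shown in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// trailingMarks lists the recognised marks. "..." comes first so that a
// longer mark is always tried before any shorter one.
var trailingMarks = []string{"...", "?", ":", "!", ";", ","}

// splitTrailingPunctSketch separates one trailing punctuation mark from a
// token. A token that consists only of the mark is returned unchanged.
func splitTrailingPunctSketch(token string) (word, punct string) {
	for _, p := range trailingMarks {
		if len(token) > len(p) && strings.HasSuffix(token, p) {
			return strings.TrimSuffix(token, p), p
		}
	}
	return token, ""
}

func main() {
	for _, tok := range []string{"done!", "wait...", "first,", "plain"} {
		word, punct := splitTrailingPunctSketch(tok)
		fmt.Printf("%q -> %q %q\n", tok, word, punct)
	}
}
```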
Reverse grammar tables into pattern matchers. 3-tier lookup:
JSON grammar data → irregular verb maps → regular morphology rules.
Verified by round-tripping through forward functions.
Export IrregularVerbs() and IrregularNouns() so the reversal engine
reads from the authoritative source instead of a duplicate list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
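
The three-tier reverse lookup and round-trip check described above can be sketched roughly as follows. All names and data here are hypothetical: jsonPast stands in for the locale JSON, irregularPast for the map an accessor such as IrregularVerbs() might return, and the "+ed" rule is a deliberately naive placeholder for the regular morphology rules.

```go
package main

import "fmt"

// Hypothetical forward tables, base form -> past tense.
var jsonPast = map[string]string{"deploy": "deployed"}
var irregularPast = map[string]string{"run": "ran", "build": "built"}

// invert flips a forward table so it can be used as a reverse matcher.
func invert(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for base, form := range m {
		out[form] = base
	}
	return out
}

var reverseTiers = []map[string]string{invert(jsonPast), invert(irregularPast)}

// pastTense is the forward function: JSON data, then irregulars, then "+ed".
func pastTense(base string) string {
	if p, ok := jsonPast[base]; ok {
		return p
	}
	if p, ok := irregularPast[base]; ok {
		return p
	}
	return base + "ed"
}

// baseFromPast walks the tiers in the same order and accepts a tier-3
// candidate only if the forward function reproduces the input.
func baseFromPast(word string) (string, bool) {
	for _, tier := range reverseTiers {
		if base, ok := tier[word]; ok {
			return base, true
		}
	}
	if n := len(word); n > 2 && word[n-2:] == "ed" {
		if cand := word[:n-2]; pastTense(cand) == word {
			return cand, true
		}
	}
	return "", false
}

func main() {
	for _, w := range []string{"deployed", "ran", "tested", "xyz"} {
		base, ok := baseFromPast(w)
		fmt.Printf("%q -> %q %v\n", w, base, ok)
	}
}
```

Reading both directions from the same tables is what makes the round-trip check meaningful; a duplicated irregular list could drift from the forward data without the check noticing.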
Use the grammar engine as zero-cost data augmentation: tense, number,
and formality flips across 88K seeds yield 528K+ verified training
examples with no API spend. The reversal engine provides automatic QA
on the transformed variants.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Captures the bidirectional grammar engine idea: using go-i18n tables
in reverse as a deterministic parser to extract semantic imprints from
documents without retaining content. Covers TIM/DataNode architecture,
88K seed calibration, Poindexter integration, and privacy properties.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>