Reversal Engine Architecture
The reversal engine (reversal/ package) converts inflected text back to base forms with grammatical metadata. It powers GrammarImprint, the Multiplier, and the upcoming Poindexter classification pipeline.
3-Tier Lookup Strategy
The tokeniser classifies words through three tiers, stopping at the first match:
| Tier | Source | Example |
|---|---|---|
| 1 | JSON grammar data (gram.verb.*, gram.noun.*) | "committed" → past of "commit" |
| 2 | Irregular verb/noun Go maps (IrregularVerbs()) | "went" → past of "go" |
| 3 | Regular morphology rules + round-trip verification | "processed" → past of "process" |
JSON takes precedence: if a verb is in both en.json and the irregular Go map, the JSON form wins. This lets locale files override built-in rules.
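The tiered lookup can be sketched as follows. This is a minimal illustration, not the package's actual code: `jsonPast`, `irregularPast`, and `lookupPast` are hypothetical names, the sample data is invented, and tier 3 is reduced to a single "-ed" strip with no verification.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the real data sources (names are illustrative).
var (
	// Tier 1: forms loaded from locale JSON such as en.json.
	jsonPast = map[string]string{"committed": "commit", "hung": "hang"}
	// Tier 2: built-in irregular forms.
	irregularPast = map[string]string{"went": "go", "hung": "hang"}
)

// lookupPast returns the base form and the tier that matched (0 means no match).
func lookupPast(word string) (base string, tier int) {
	w := strings.ToLower(word)
	if b, ok := jsonPast[w]; ok {
		return b, 1 // JSON wins even when tier 2 also knows the word
	}
	if b, ok := irregularPast[w]; ok {
		return b, 2
	}
	// Tier 3, heavily simplified: strip a regular "-ed" ending.
	// The real engine applies several rules and verifies each candidate.
	if strings.HasSuffix(w, "ed") && len(w) > 3 {
		return strings.TrimSuffix(w, "ed"), 3
	}
	return "", 0
}

func main() {
	for _, w := range []string{"committed", "hung", "went", "processed", "table"} {
		base, tier := lookupPast(w)
		fmt.Printf("%-10s base=%q tier=%d\n", w, base, tier)
	}
}
```

In this sketch "hung" appears in both maps and is reported as a tier 1 match, which is the precedence behaviour described above.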
Token Types
TokenUnknown = 0 // Unclassified word
TokenVerb = 1 // Matched verb (VerbInfo populated)
TokenNoun = 2 // Matched noun (NounInfo populated)
TokenArticle = 3 // "a", "an", "the"
TokenWord = 4 // Domain word from gram.word map
TokenPunctuation = 5 // "...", "?", "!", ":", ";"
Classification Priority
When tokenising, words are checked in this order:
- Article: "a", "an", "the"
- Verb: base forms, past tense, gerunds
- Noun: base forms, plurals
- Word: domain vocabulary from gram.word
- Unknown: fallback
This means dual-class words like "commit" (both verb and noun) currently always classify as verbs. The dual-class disambiguation design addresses this.
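A rough sketch of that priority order, assuming small hypothetical vocabularies (`articles`, `verbs`, `nouns`, `words`) in place of the real indexes. The `classify` function is illustrative only; the constant values follow the token type list above.

```go
package main

import (
	"fmt"
	"strings"
)

type TokenType int

const (
	TokenUnknown TokenType = iota // 0
	TokenVerb                     // 1
	TokenNoun                     // 2
	TokenArticle                  // 3
	TokenWord                     // 4
)

// Hypothetical vocabularies for illustration only.
var (
	articles = map[string]bool{"a": true, "an": true, "the": true}
	verbs    = map[string]bool{"commit": true, "delete": true}
	nouns    = map[string]bool{"commit": true, "file": true}
	words    = map[string]bool{"url": true}
)

// classify checks each class in priority order and returns on the first hit.
func classify(word string) TokenType {
	w := strings.ToLower(word)
	switch {
	case articles[w]:
		return TokenArticle
	case verbs[w]:
		return TokenVerb // checked before nouns, so dual-class words land here
	case nouns[w]:
		return TokenNoun
	case words[w]:
		return TokenWord
	}
	return TokenUnknown
}

func main() {
	for _, w := range []string{"the", "commit", "file", "url", "zzz"} {
		fmt.Println(w, classify(w))
	}
}
```

Because the verb check runs first, "commit" never reaches the noun check even though it is present in both vocabularies.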
Matching Methods
MatchVerb(word) → (VerbMatch, bool)
Returns the base form, tense, and original form:
type VerbMatch struct {
Base string // "delete"
Tense string // "past", "gerund", "base"
Form string // Original inflected form
}
Tier 1: Check baseVerbs[word] (base form lookup)
Tier 2: Check pastToBase[word] and gerundToBase[word] (inverse maps)
Tier 3: Apply reverse morphology rules, then round-trip verify
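The first two tiers might look roughly like this. The `verbIndex` type, `sampleIndex`, and the lowercasing step are assumptions made for the sketch; tier 3 is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// VerbMatch mirrors the struct described above.
type VerbMatch struct {
	Base  string
	Tense string
	Form  string
}

// verbIndex is a hypothetical container for the three lookup maps.
type verbIndex struct {
	baseVerbs    map[string]bool
	pastToBase   map[string]string
	gerundToBase map[string]string
}

// sampleIndex returns a tiny index with invented data.
func sampleIndex() verbIndex {
	return verbIndex{
		baseVerbs:    map[string]bool{"delete": true},
		pastToBase:   map[string]string{"deleted": "delete"},
		gerundToBase: map[string]string{"deleting": "delete"},
	}
}

// matchVerb covers tiers 1 and 2 only; tier 3 (reverse morphology) is omitted.
func (ix verbIndex) matchVerb(word string) (VerbMatch, bool) {
	w := strings.ToLower(word) // lowercasing is an assumption of this sketch
	if ix.baseVerbs[w] {
		return VerbMatch{Base: w, Tense: "base", Form: word}, true
	}
	if base, ok := ix.pastToBase[w]; ok {
		return VerbMatch{Base: base, Tense: "past", Form: word}, true
	}
	if base, ok := ix.gerundToBase[w]; ok {
		return VerbMatch{Base: base, Tense: "gerund", Form: word}, true
	}
	return VerbMatch{}, false
}

func main() {
	ix := sampleIndex()
	for _, w := range []string{"delete", "Deleted", "deleting", "file"} {
		m, ok := ix.matchVerb(w)
		fmt.Printf("%-9s ok=%v %+v\n", w, ok, m)
	}
}
```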
MatchNoun(word) → (NounMatch, bool)
type NounMatch struct {
Base string // Singular form
Plural bool // Whether matched form was plural
Form string // Original form
}
MatchNoun follows the same 3-tier pattern, using the pluralToBase inverse map and reverseRegularPlural().
Reverse Morphology Rules
Past Tense Reversal
| Pattern | Rule | Example |
|---|---|---|
| -ied | → consonant + y | copied → copy |
| doubled consonant + ed | → single consonant | stopped → stop |
| stem + d (stem ends in e) | → stem | created → create |
| stem + ed | → stem | walked → walk |
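One way to read the table is as a candidate generator: each rule that fits the word contributes a possible base form, and a later step decides which one is right. The sketch below assumes lowercase ASCII input; `pastCandidates` and `isVowel` are hypothetical helpers, not names from the package.

```go
package main

import (
	"fmt"
	"strings"
)

func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// pastCandidates returns possible base forms for a past-tense word,
// in the order of the rules in the table. Wrong guesses are expected;
// they are meant to be filtered out by a later verification step.
func pastCandidates(w string) []string {
	var out []string
	n := len(w)
	// consonant + "ied" -> consonant + "y" (copied -> copy)
	if n > 3 && strings.HasSuffix(w, "ied") && !isVowel(w[n-4]) {
		out = append(out, w[:n-3]+"y")
	}
	if n > 3 && strings.HasSuffix(w, "ed") {
		stem := w[:n-2]
		m := len(stem)
		// doubled consonant + "ed" -> single consonant (stopped -> stop)
		if m >= 2 && stem[m-1] == stem[m-2] && !isVowel(stem[m-1]) {
			out = append(out, stem[:m-1])
		}
		out = append(out, stem+"e") // strip only "d" (created -> create)
		out = append(out, stem)     // strip "ed" (walked -> walk)
	}
	return out
}

func main() {
	for _, w := range []string{"copied", "stopped", "created", "walked"} {
		fmt.Println(w, pastCandidates(w))
	}
}
```

Most inputs produce more than one candidate (for example both "walke" and "walk" for "walked"), which is why a verification step is needed afterwards.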
Gerund Reversal
| Pattern | Rule | Example |
|---|---|---|
| -ying | → -ie | dying → die |
| doubled consonant + ing | → single consonant | stopping → stop |
| direct -ing strip | → stem | walking → walk |
| add -e back | → stem + e | creating → create |
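The gerund rules can be sketched in the same candidate-generating style. As before, this assumes lowercase ASCII input, and `gerundCandidates` and `isVowel` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// gerundCandidates returns possible base forms for an "-ing" word.
// Several candidates may be wrong; a verification step is expected to
// discard the ones that do not re-inflect to the input.
func gerundCandidates(w string) []string {
	if len(w) <= 4 || !strings.HasSuffix(w, "ing") {
		return nil
	}
	stem := w[:len(w)-3]
	n := len(stem)
	var out []string
	// "-ying" -> "-ie" (dying -> die)
	if strings.HasSuffix(stem, "y") {
		out = append(out, stem[:n-1]+"ie")
	}
	// doubled consonant + "ing" -> single consonant (stopping -> stop)
	if n >= 2 && stem[n-1] == stem[n-2] && !isVowel(stem[n-1]) {
		out = append(out, stem[:n-1])
	}
	out = append(out, stem)     // direct strip (walking -> walk)
	out = append(out, stem+"e") // add "e" back (creating -> create)
	return out
}

func main() {
	for _, w := range []string{"dying", "stopping", "walking", "creating"} {
		fmt.Println(w, gerundCandidates(w))
	}
}
```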
Plural Reversal
| Pattern | Rule | Example |
|---|---|---|
| consonant + -ies | → consonant + y | entries → entry |
| -ves | → -f or -fe | wolves → wolf |
| sibilant + -es | → sibilant | processes → process |
| -s | → stem | servers → server |
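A comparable sketch for plurals. The set of sibilant endings used here ("s", "x", "z", "ch", "sh") is an assumption, as are the helper names `pluralCandidates` and `isVowel`; input is assumed to be lowercase ASCII.

```go
package main

import (
	"fmt"
	"strings"
)

func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// pluralCandidates returns possible singular forms for a plural noun,
// in the order of the rules in the table.
func pluralCandidates(w string) []string {
	var out []string
	n := len(w)
	// consonant + "ies" -> consonant + "y" (entries -> entry)
	if n > 3 && strings.HasSuffix(w, "ies") && !isVowel(w[n-4]) {
		out = append(out, w[:n-3]+"y")
	}
	// "-ves" -> "-f" or "-fe" (wolves -> wolf, knives -> knife)
	if n > 3 && strings.HasSuffix(w, "ves") {
		out = append(out, w[:n-3]+"f", w[:n-3]+"fe")
	}
	// sibilant + "es" -> sibilant (processes -> process)
	if n > 2 && strings.HasSuffix(w, "es") {
		stem := w[:n-2]
		for _, sib := range []string{"s", "x", "z", "ch", "sh"} {
			if strings.HasSuffix(stem, sib) {
				out = append(out, stem)
				break
			}
		}
	}
	// plain "-s" -> stem (servers -> server)
	if n > 1 && strings.HasSuffix(w, "s") {
		out = append(out, w[:n-1])
	}
	return out
}

func main() {
	for _, w := range []string{"entries", "wolves", "processes", "servers"} {
		fmt.Println(w, pluralCandidates(w))
	}
}
```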
Round-Trip Verification
When Tier 3 produces multiple candidate base forms, bestRoundTrip() selects the best one:
- Priority 1: Candidate is a known base verb/noun (in the index)
- Priority 2: Candidate ends in VCe pattern (vowel-consonant-e, like "delete")
- Priority 3: Candidate doesn't end in "e"
- Fallback: First match
The round-trip test applies the forward function (e.g., PastTense()) to each candidate and checks if it produces the original inflected form. Only verified candidates are accepted.
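The selection logic might be sketched as below. The signature of `bestRoundTrip` here is an assumption, and `pastTense` is a deliberately simplified forward inflector written for this example (its doubling rule only handles one-vowel words ending consonant-vowel-consonant), not the package's PastTense().

```go
package main

import (
	"fmt"
	"strings"
)

func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// shouldDouble is a simplified rule: one vowel in total and a
// consonant-vowel-consonant ending that is not w, x, or y.
func shouldDouble(base string) bool {
	n := len(base)
	if n < 3 {
		return false
	}
	vowels := 0
	for i := 0; i < n; i++ {
		if isVowel(base[i]) {
			vowels++
		}
	}
	last := base[n-1]
	return vowels == 1 && !isVowel(base[n-3]) && isVowel(base[n-2]) &&
		!isVowel(last) && last != 'w' && last != 'x' && last != 'y'
}

// pastTense is a hypothetical, simplified forward inflector.
func pastTense(base string) string {
	n := len(base)
	if n == 0 {
		return ""
	}
	if strings.HasSuffix(base, "e") {
		return base + "d"
	}
	if n >= 2 && base[n-1] == 'y' && !isVowel(base[n-2]) {
		return base[:n-1] + "ied"
	}
	if shouldDouble(base) {
		return base + string(base[n-1]) + "ed"
	}
	return base + "ed"
}

func endsVCe(s string) bool {
	n := len(s)
	return n >= 3 && s[n-1] == 'e' && !isVowel(s[n-2]) && isVowel(s[n-3])
}

// bestRoundTrip keeps only candidates that re-inflect to the input,
// then applies the priorities listed above.
func bestRoundTrip(inflected string, cands []string, known map[string]bool,
	forward func(string) string) (string, bool) {
	var ok []string
	for _, c := range cands {
		if forward(c) == inflected {
			ok = append(ok, c)
		}
	}
	for _, c := range ok { // priority 1: known base form
		if known[c] {
			return c, true
		}
	}
	for _, c := range ok { // priority 2: vowel-consonant-e ending
		if endsVCe(c) {
			return c, true
		}
	}
	for _, c := range ok { // priority 3: does not end in "e"
		if !strings.HasSuffix(c, "e") {
			return c, true
		}
	}
	if len(ok) > 0 { // fallback: first verified candidate
		return ok[0], true
	}
	return "", false
}

func main() {
	known := map[string]bool{}
	fmt.Println(bestRoundTrip("created", []string{"create", "creat"}, known, pastTense))
	fmt.Println(bestRoundTrip("walked", []string{"walke", "walk"}, known, pastTense))
}
```

With this simplified inflector both "create" and "creat" re-inflect to "created", so verification alone does not settle the choice; the vowel-consonant-e priority is what selects "create".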
Index Building
NewTokeniser() builds six lookup maps at construction time:
| Map | Example entry | Source |
|---|---|---|
| pastToBase | "deleted" → "delete" | Inverse of gram.verb.*.past |
| gerundToBase | "deleting" → "delete" | Inverse of gram.verb.*.gerund |
| baseVerbs | "delete" → true | All known verb bases |
| pluralToBase | "files" → "file" | Inverse of gram.noun.*.other |
| baseNouns | "file" → true | All known noun bases |
| words | "url" → "URL" | Domain vocabulary |
These maps are built from JSON grammar data first, then supplemented by the irregular verb/noun Go maps (skipping duplicates).
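A sketch of the verb side of that construction order. The `VerbForms` and `verbIndex` types and the `buildVerbIndex` function are hypothetical, and "skipping duplicates" is interpreted here as skipping any irregular verb whose base is already indexed from JSON; the real implementation may define a duplicate differently.

```go
package main

import "fmt"

// VerbForms is a hypothetical shape for one verb's inflections.
type VerbForms struct {
	Past   string
	Gerund string
}

type verbIndex struct {
	pastToBase   map[string]string
	gerundToBase map[string]string
	baseVerbs    map[string]bool
}

// buildVerbIndex loads JSON-derived verbs first, then irregular verbs,
// skipping any irregular verb whose base was already loaded.
func buildVerbIndex(jsonVerbs, irregular map[string]VerbForms) verbIndex {
	ix := verbIndex{
		pastToBase:   map[string]string{},
		gerundToBase: map[string]string{},
		baseVerbs:    map[string]bool{},
	}
	add := func(src map[string]VerbForms) {
		for base, f := range src {
			if ix.baseVerbs[base] {
				continue // earlier source wins
			}
			ix.baseVerbs[base] = true
			ix.pastToBase[f.Past] = base
			ix.gerundToBase[f.Gerund] = base
		}
	}
	add(jsonVerbs) // JSON grammar data first
	add(irregular) // built-in irregulars fill the gaps
	return ix
}

func main() {
	// Invented sample data: the JSON entry for "hang" overrides the built-in one.
	jsonVerbs := map[string]VerbForms{
		"commit": {Past: "committed", Gerund: "committing"},
		"hang":   {Past: "hanged", Gerund: "hanging"},
	}
	irregular := map[string]VerbForms{
		"hang": {Past: "hung", Gerund: "hanging"},
		"go":   {Past: "went", Gerund: "going"},
	}
	ix := buildVerbIndex(jsonVerbs, irregular)
	fmt.Println(ix.pastToBase["hanged"], ix.pastToBase["went"], len(ix.baseVerbs))
}
```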