Reversal Engine Architecture

The reversal engine (reversal/ package) converts inflected text back to base forms with grammatical metadata. It powers GrammarImprint, the Multiplier, and the upcoming Poindexter classification pipeline.

3-Tier Lookup Strategy

The tokeniser classifies words through three tiers, stopping at the first match:

Tier	Source	Example
1	JSON grammar data (`gram.verb.`, `gram.noun.`)	"committed" → past of "commit"
2	Irregular verb/noun Go maps (`IrregularVerbs()`)	"went" → past of "go"
3	Regular morphology rules + round-trip verification	"processed" → past of "process"

JSON takes precedence — if a verb is in both en.json and the irregular Go map, the JSON form wins. This lets locale files override built-in rules.
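The first-match-wins ordering can be sketched as follows. This is a minimal illustration, not the package's actual code: `jsonPast`, `irregularPast`, `match` and `lookupPast` are hypothetical names, the sample data is invented, and Tier 3 is reduced to a bare "-ed" strip.

```go
package main

import "strings"

// match records the base form and the tier that resolved it.
type match struct {
	Base string
	Tier int
}

// Hypothetical stand-ins for the real data sources.
var (
	// Tier 1: forms loaded from locale JSON (e.g. en.json).
	jsonPast = map[string]string{"committed": "commit", "hung": "hang"}
	// Tier 2: built-in irregular forms.
	irregularPast = map[string]string{"went": "go", "hung": "hang"}
)

// lookupPast tries each tier in order and stops at the first hit.
func lookupPast(word string) (match, bool) {
	if base, ok := jsonPast[word]; ok {
		return match{Base: base, Tier: 1}, true
	}
	if base, ok := irregularPast[word]; ok {
		return match{Base: base, Tier: 2}, true
	}
	// Tier 3, heavily simplified: strip a regular "-ed" suffix.
	if strings.HasSuffix(word, "ed") && len(word) > 3 {
		return match{Base: strings.TrimSuffix(word, "ed"), Tier: 3}, true
	}
	return match{}, false
}
```

With this sample data, "hung" appears in both maps and resolves at tier 1, "went" falls through to tier 2, and "processed" is only reached by the tier 3 rule.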

Token Types

TokenUnknown      = 0  // Unclassified word
TokenVerb         = 1  // Matched verb (VerbInfo populated)
TokenNoun         = 2  // Matched noun (NounInfo populated)
TokenArticle      = 3  // "a", "an", "the"
TokenWord         = 4  // Domain word from gram.word map
TokenPunctuation  = 5  // "...", "?", "!", ":", ";"

Classification Priority

When tokenising, words are checked in this order:

Article — "a", "an", "the"
Verb — base forms, past tense, gerunds
Noun — base forms, plurals
Word — domain vocabulary from gram.word
Unknown — fallback

This means dual-class words like "commit" (both verb and noun) currently always classify as verbs. The dual-class disambiguation design addresses this.
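A short sketch of that check order, showing why "commit" lands in the verb bucket. The token constants mirror the values listed under Token Types; the `classify` function and the sample vocabularies are assumptions made for illustration only.

```go
package main

// TokenType values follow the list in the Token Types section.
type TokenType int

const (
	TokenUnknown TokenType = iota
	TokenVerb
	TokenNoun
	TokenArticle
	TokenWord
	TokenPunctuation
)

// Hypothetical sample vocabularies; the real ones come from grammar data.
var (
	articles = map[string]bool{"a": true, "an": true, "the": true}
	verbs    = map[string]bool{"commit": true, "delete": true}
	nouns    = map[string]bool{"commit": true, "file": true}
	words    = map[string]string{"url": "URL"}
)

// classify checks each class in priority order and returns on the first hit.
func classify(word string) TokenType {
	if articles[word] {
		return TokenArticle
	}
	if verbs[word] { // checked before nouns, so dual-class words become verbs
		return TokenVerb
	}
	if nouns[word] {
		return TokenNoun
	}
	if _, ok := words[word]; ok {
		return TokenWord
	}
	return TokenUnknown
}
```

Here `classify("commit")` returns `TokenVerb` even though "commit" is also in the noun set, because the verb check runs first.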

Matching Methods

MatchVerb(word) → (VerbMatch, bool)

Returns the base form, tense, and original form:

type VerbMatch struct {
    Base  string // "delete"
    Tense string // "past", "gerund", "base"
    Form  string // Original inflected form
}

Tier 1: Check baseVerbs[word] (base form lookup)
Tier 2: Check pastToBase[word] and gerundToBase[word] (inverse maps)
Tier 3: Apply reverse morphology rules, then round-trip verify
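The lookup steps above might look roughly like this in code. It is a sketch under stated assumptions: `matchVerb` is written as a free function over package-level sample maps (the real method presumably hangs off the tokeniser), and the Tier 3 step is a placeholder that only strips "-ed" and checks the result against known bases.

```go
package main

import "strings"

// VerbMatch matches the struct shown above.
type VerbMatch struct {
	Base  string
	Tense string
	Form  string
}

// Hypothetical sample indexes.
var (
	baseVerbs    = map[string]bool{"delete": true, "walk": true}
	pastToBase   = map[string]string{"deleted": "delete"}
	gerundToBase = map[string]string{"deleting": "delete"}
)

// matchVerb walks the lookups in order and reports the tense it found.
func matchVerb(word string) (VerbMatch, bool) {
	if baseVerbs[word] {
		return VerbMatch{Base: word, Tense: "base", Form: word}, true
	}
	if base, ok := pastToBase[word]; ok {
		return VerbMatch{Base: base, Tense: "past", Form: word}, true
	}
	if base, ok := gerundToBase[word]; ok {
		return VerbMatch{Base: base, Tense: "gerund", Form: word}, true
	}
	// Placeholder for Tier 3: accept a plain "-ed" strip only if the
	// stem is a known base verb.
	if stem := strings.TrimSuffix(word, "ed"); stem != word && baseVerbs[stem] {
		return VerbMatch{Base: stem, Tense: "past", Form: word}, true
	}
	return VerbMatch{}, false
}
```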

MatchNoun(word) → (NounMatch, bool)

type NounMatch struct {
    Base   string // Singular form
    Plural bool   // Whether matched form was plural
    Form   string // Original form
}

MatchNoun follows the same 3-tier pattern, using the pluralToBase inverse map and reverseRegularPlural().

Reverse Morphology Rules

Past Tense Reversal

Pattern	Rule	Example
`-ied`	→ consonant + y	copied → copy
doubled consonant + `ed`	→ single consonant	stopped → stop
stem + `d` (stem ends in e)	→ stem	created → create
stem + `ed`	→ stem	walked → walk
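One way to read the table is as a candidate generator: each rule that applies contributes a possible base form, and a later verification step picks among them. The sketch below assumes that reading; `pastCandidates` and `isVowel` are hypothetical names, and the candidate order is an illustration rather than the engine's actual ordering.

```go
package main

import "strings"

// isVowel reports whether c is an ASCII lowercase vowel.
func isVowel(c byte) bool {
	return strings.IndexByte("aeiou", c) >= 0
}

// pastCandidates lists possible base forms for a regular past-tense word.
// Some candidates are not real words; they are meant to be filtered later.
func pastCandidates(word string) []string {
	var out []string
	if len(word) <= 3 || !strings.HasSuffix(word, "ed") {
		return out
	}
	if strings.HasSuffix(word, "ied") {
		out = append(out, strings.TrimSuffix(word, "ied")+"y") // copied -> copy
	}
	stem := strings.TrimSuffix(word, "ed")
	n := len(stem)
	if n >= 2 && stem[n-1] == stem[n-2] && !isVowel(stem[n-1]) {
		out = append(out, stem[:n-1]) // stopped -> stop
	}
	out = append(out, stem+"e") // created -> create (stem ends in e, plus d)
	out = append(out, stem)     // walked -> walk
	return out
}
```

For "created" this yields "create" and "creat"; only the first survives a round trip through the forward past-tense rule.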

Gerund Reversal

Pattern	Rule	Example
`-ying`	→ `-ie`	dying → die
doubled consonant + `ing`	→ single consonant	stopping → stop
direct `-ing` strip	→ stem	walking → walk
add `-e` back	→ stem + e	creating → create

Plural Reversal

Pattern	Rule	Example
consonant + `-ies`	→ consonant + y	entries → entry
`-ves`	→ `-f` or `-fe`	wolves → wolf
sibilant + `-es`	→ sibilant	processes → process
`-s`	→ stem	servers → server
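The plural rules can be sketched the same way. The function below is an illustrative stand-in, not the package's reverseRegularPlural(): the names `singularCandidates` and `hasSibilantEs` are hypothetical, and the set of sibilant endings is an assumption.

```go
package main

import "strings"

// isVowel reports whether c is an ASCII lowercase vowel.
func isVowel(c byte) bool {
	return strings.IndexByte("aeiou", c) >= 0
}

// hasSibilantEs reports whether word ends in a sibilant followed by "es".
// The list of endings is an assumption for this sketch.
func hasSibilantEs(word string) bool {
	for _, suf := range []string{"ses", "xes", "zes", "ches", "shes"} {
		if strings.HasSuffix(word, suf) {
			return true
		}
	}
	return false
}

// singularCandidates lists possible singular forms for a regular plural.
func singularCandidates(word string) []string {
	var out []string
	n := len(word)
	switch {
	case n > 3 && strings.HasSuffix(word, "ies") && !isVowel(word[n-4]):
		out = append(out, word[:n-3]+"y") // entries -> entry
	case strings.HasSuffix(word, "ves"):
		stem := word[:n-3]
		out = append(out, stem+"f", stem+"fe") // wolves -> wolf, knives -> knife
	case hasSibilantEs(word):
		out = append(out, word[:n-2]) // processes -> process
	}
	if strings.HasSuffix(word, "s") {
		out = append(out, word[:n-1]) // servers -> server
	}
	return out
}
```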

Round-Trip Verification

When Tier 3 produces multiple candidate base forms, bestRoundTrip() selects the best one:

Priority 1: Candidate is a known base verb/noun (in the index)
Priority 2: Candidate ends in VCe pattern (vowel-consonant-e, like "delete")
Priority 3: Candidate doesn't end in "e"
Fallback: First match

The round-trip test applies the forward function (e.g., PastTense()) to each candidate and checks if it produces the original inflected form. Only verified candidates are accepted.
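The selection logic can be sketched as a filter followed by the priority checks. Everything here is an assumption made for illustration: the signature of `bestRoundTrip`, the `known` index parameter, and especially `pastTense`, which is a deliberately simplified forward rule (it ignores consonant doubling) and not the real PastTense().

```go
package main

import "strings"

func isVowel(c byte) bool {
	return strings.IndexByte("aeiou", c) >= 0
}

// pastTense is a simplified forward rule used only for this sketch.
func pastTense(base string) string {
	n := len(base)
	switch {
	case strings.HasSuffix(base, "e"):
		return base + "d"
	case n >= 2 && base[n-1] == 'y' && !isVowel(base[n-2]):
		return base[:n-1] + "ied"
	default:
		return base + "ed"
	}
}

// endsInVCe reports a vowel-consonant-e ending, as in "delete".
func endsInVCe(s string) bool {
	n := len(s)
	return n >= 3 && s[n-1] == 'e' && !isVowel(s[n-2]) && isVowel(s[n-3])
}

// bestRoundTrip keeps candidates whose forward form equals the input,
// then picks one using the priorities listed above.
func bestRoundTrip(form string, candidates []string, known map[string]bool) (string, bool) {
	var verified []string
	for _, c := range candidates {
		if pastTense(c) == form {
			verified = append(verified, c)
		}
	}
	if len(verified) == 0 {
		return "", false
	}
	for _, c := range verified { // Priority 1: known base form
		if known[c] {
			return c, true
		}
	}
	for _, c := range verified { // Priority 2: VCe ending
		if endsInVCe(c) {
			return c, true
		}
	}
	for _, c := range verified { // Priority 3: does not end in "e"
		if !strings.HasSuffix(c, "e") {
			return c, true
		}
	}
	return verified[0], true // Fallback: first verified candidate
}
```

In this sketch both "visite" and "visit" round-trip to "visited". With an empty index the VCe rule picks "visite"; once "visit" is a known base, Priority 1 selects it instead, which is why the index check comes first.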

Index Building

NewTokeniser() builds six lookup maps at construction time:

Map	Example entry	Contents
`pastToBase`	"deleted" → "delete"	Inverse of gram.verb.*.past
`gerundToBase`	"deleting" → "delete"	Inverse of gram.verb.*.gerund
`baseVerbs`	"delete" → true	All known verb bases
`pluralToBase`	"files" → "file"	Inverse of gram.noun.*.other
`baseNouns`	"file" → true	All known noun bases
`words`	"url" → "URL"	Domain vocabulary

These maps are built from JSON grammar data first, then supplemented by the irregular verb/noun Go maps (skipping duplicates).
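A minimal sketch of that two-pass build for the verb maps, assuming "duplicates" means base verbs already defined by JSON. The `verbForms` and `verbIndex` types and the `buildVerbIndex` function are hypothetical; the real NewTokeniser() also builds the noun and word maps.

```go
package main

// verbForms holds the inflected forms for one verb. The field names are
// assumptions modelled on gram.verb.*.past and gram.verb.*.gerund.
type verbForms struct {
	Past   string
	Gerund string
}

// verbIndex groups the three verb lookup maps.
type verbIndex struct {
	pastToBase   map[string]string
	gerundToBase map[string]string
	baseVerbs    map[string]bool
}

// add registers one verb and its inverse entries.
func (idx verbIndex) add(base string, f verbForms) {
	idx.baseVerbs[base] = true
	idx.pastToBase[f.Past] = base
	idx.gerundToBase[f.Gerund] = base
}

// buildVerbIndex loads JSON verbs first, then adds built-in irregulars
// whose base form has not been seen, so JSON entries take precedence.
func buildVerbIndex(jsonVerbs, irregular map[string]verbForms) verbIndex {
	idx := verbIndex{
		pastToBase:   map[string]string{},
		gerundToBase: map[string]string{},
		baseVerbs:    map[string]bool{},
	}
	for base, f := range jsonVerbs {
		idx.add(base, f)
	}
	for base, f := range irregular {
		if idx.baseVerbs[base] {
			continue // already defined by JSON; skip the built-in entry
		}
		idx.add(base, f)
	}
	return idx
}
```

If JSON defines "hang" with past "hanged" and the built-in map defines it with "hung", only "hanged" ends up in pastToBase under this sketch.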