Reversal-Engine
Virgil edited this page 2026-02-19 15:55:09 +00:00

Reversal Engine Architecture

The reversal engine (reversal/ package) converts inflected text back to base forms with grammatical metadata. It powers GrammarImprint, the Multiplier, and the upcoming Poindexter classification pipeline.

3-Tier Lookup Strategy

The tokeniser classifies words through three tiers, stopping at the first match:

| Tier | Source | Example |
|---|---|---|
| 1 | JSON grammar data (`gram.verb.*`, `gram.noun.*`) | "committed" → past of "commit" |
| 2 | Irregular verb/noun Go maps (`IrregularVerbs()`) | "went" → past of "go" |
| 3 | Regular morphology rules + round-trip verification | "processed" → past of "process" |

JSON takes precedence — if a verb is in both en.json and the irregular Go map, the JSON form wins. This lets locale files override built-in rules.
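The tier order can be modelled as a lookup cascade that stops at the first hit. The following is a minimal sketch, not the package's code. `resolvePast` and its three parameters are hypothetical stand-ins: `jsonPast` for the inverted `gram.verb.*.past` data, `irregularPast` for the irregular Go map, and `regular` for the Tier 3 rules.

```go
package main

// resolvePast walks the three tiers in order and stops at the first match.
// Hypothetical helper: jsonPast stands in for inverted gram.verb.*.past data,
// irregularPast for the irregular verb Go map, and regular (assumed non-nil)
// for the Tier 3 morphology rules with round-trip verification.
func resolvePast(word string, jsonPast, irregularPast map[string]string, regular func(string) (string, bool)) (base string, tier int, ok bool) {
	// Tier 1: locale JSON data is consulted first, so it wins on overlap.
	if b, found := jsonPast[word]; found {
		return b, 1, true
	}
	// Tier 2: built-in irregular forms.
	if b, found := irregularPast[word]; found {
		return b, 2, true
	}
	// Tier 3: regular rules, which can still fail if no candidate round-trips.
	if b, found := regular(word); found {
		return b, 3, true
	}
	return "", 0, false
}
```

Because Tier 1 is checked before Tier 2, a form that appears in both sources is always answered from the JSON data.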

Token Types

```go
const (
	TokenUnknown     = 0 // Unclassified word
	TokenVerb        = 1 // Matched verb (VerbInfo populated)
	TokenNoun        = 2 // Matched noun (NounInfo populated)
	TokenArticle     = 3 // "a", "an", "the"
	TokenWord        = 4 // Domain word from gram.word map
	TokenPunctuation = 5 // "...", "?", "!", ":", ";"
)
```

Classification Priority

When tokenising, words are checked in this order:

  1. Article — "a", "an", "the"
  2. Verb — base forms, past tense, gerunds
  3. Noun — base forms, plurals
  4. Word — domain vocabulary from gram.word
  5. Unknown — fallback

Because verbs are checked before nouns, dual-class words such as "commit" (both a verb and a noun) currently always classify as verbs. A separate dual-class disambiguation design addresses this.
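The priority order reduces to a sequence of early returns. The sketch below is a minimal illustration under simplifying assumptions. The `TokenType` name is hypothetical. The `verbs` and `nouns` sets stand in for `MatchVerb`/`MatchNoun`. Input is assumed to be lowercase already.

```go
package main

// TokenType mirrors the token constants above; the type name is an assumption.
type TokenType int

const (
	TokenUnknown     TokenType = iota // 0
	TokenVerb                         // 1
	TokenNoun                         // 2
	TokenArticle                      // 3
	TokenWord                         // 4
	TokenPunctuation                  // 5
)

// classify checks each category in priority order and returns the first hit.
// verbs and nouns stand in for MatchVerb/MatchNoun; words is the gram.word map.
func classify(word string, verbs, nouns map[string]bool, words map[string]string) TokenType {
	if word == "a" || word == "an" || word == "the" {
		return TokenArticle
	}
	if verbs[word] { // checked before nouns, so "commit" is always a verb
		return TokenVerb
	}
	if nouns[word] {
		return TokenNoun
	}
	if _, ok := words[word]; ok {
		return TokenWord
	}
	return TokenUnknown
}
```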

Matching Methods

MatchVerb(word) → (VerbMatch, bool)

Returns the base form, tense, and original form:

```go
type VerbMatch struct {
	Base  string // "delete"
	Tense string // "past", "gerund", or "base"
	Form  string // Original inflected form
}
```

  1. Tier 1: check baseVerbs[word] (base-form lookup).
  2. Tier 2: check the inverse maps pastToBase[word] and gerundToBase[word].
  3. Tier 3: apply the reverse morphology rules, then round-trip verify the candidates.
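A minimal sketch of the three tiers follows. `verbIndex` is a hypothetical cut-down tokeniser that holds only the three verb maps. The Tier 3 fallback is an injected function (`regular`, also hypothetical) rather than the real morphology code.

```go
package main

// VerbMatch mirrors the struct above.
type VerbMatch struct {
	Base  string // base form, e.g. "delete"
	Tense string // "past", "gerund", or "base"
	Form  string // original inflected form
}

// verbIndex is a hypothetical stand-in for the tokeniser's verb maps.
type verbIndex struct {
	baseVerbs    map[string]bool
	pastToBase   map[string]string
	gerundToBase map[string]string
	// regular stands in for Tier 3 (reverse rules + round-trip check).
	regular func(word string) (VerbMatch, bool)
}

// MatchVerb tries the three tiers in order.
func (ix *verbIndex) MatchVerb(word string) (VerbMatch, bool) {
	// Tier 1: the word is already a known base form.
	if ix.baseVerbs[word] {
		return VerbMatch{Base: word, Tense: "base", Form: word}, true
	}
	// Tier 2: inverse maps for past tense and gerund.
	if base, ok := ix.pastToBase[word]; ok {
		return VerbMatch{Base: base, Tense: "past", Form: word}, true
	}
	if base, ok := ix.gerundToBase[word]; ok {
		return VerbMatch{Base: base, Tense: "gerund", Form: word}, true
	}
	// Tier 3: only reached when the fallback is configured.
	if ix.regular != nil {
		return ix.regular(word)
	}
	return VerbMatch{}, false
}
```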

MatchNoun(word) → (NounMatch, bool)

```go
type NounMatch struct {
	Base   string // Singular form
	Plural bool   // Whether matched form was plural
	Form   string // Original form
}
```

MatchNoun follows the same 3-tier pattern, using the pluralToBase inverse map in Tier 2 and reverseRegularPlural() for the Tier 3 morphology rules.

Reverse Morphology Rules

Past Tense Reversal

| Pattern | Rule | Example |
|---|---|---|
| -ied | → consonant + y | copied → copy |
| doubled consonant + ed | → single consonant | stopped → stop |
| stem + d (stem ends in e) | → stem | created → create |
| stem + ed | → stem | walked → walk |
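The table rules can be sketched as a candidate generator that proposes every possible base form and leaves the choice to round-trip verification. This is a minimal illustration, not reverseRegular code from the package. `reversePastCandidates` and `isVowel` are hypothetical names, and treating only a, e, i, o, u as vowels is an assumption.

```go
package main

import "strings"

// isVowel treats a, e, i, o, u as vowels (y is treated as a consonant).
func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// reversePastCandidates applies each rule from the table and collects every
// base form it proposes. It does no filtering: round-trip verification is
// expected to discard the wrong candidates afterwards.
func reversePastCandidates(word string) []string {
	if len(word) < 4 || !strings.HasSuffix(word, "ed") {
		return nil
	}
	n := len(word)
	stem := word[:n-2] // word without "ed"
	var cands []string
	// -ied → consonant + y: "copied" → "copy"
	if strings.HasSuffix(word, "ied") && !isVowel(word[n-4]) {
		cands = append(cands, word[:n-3]+"y")
	}
	// doubled consonant + ed → single consonant: "stopped" → "stop"
	if m := len(stem); m >= 2 && stem[m-1] == stem[m-2] && !isVowel(stem[m-1]) {
		cands = append(cands, stem[:m-1])
	}
	// stem + d, where the stem ends in e: "created" → "create"
	cands = append(cands, word[:n-1])
	// stem + ed: "walked" → "walk"
	cands = append(cands, stem)
	return cands
}
```

Every word ending in "-ed" yields both a "stem + e" and a bare-stem candidate. For example, "walked" yields "walke" and "walk", which is why a verification step has to follow.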

Gerund Reversal

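| Pattern | Rule | Example |
|---|---|---|
| -ying | → -ie | dying → die |
| doubled consonant + ing | → single consonant | stopping → stop |
| stem + ing | → stem (direct -ing strip) | walking → walk |
| stem + ing (final e dropped) | → stem + e (add -e back) | creating → create |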
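The gerund rules can be sketched the same way, as a candidate generator whose output is filtered later by round-trip verification. `reverseGerundCandidates` and `isVowel` are hypothetical names used only for illustration.

```go
package main

import "strings"

// isVowel treats a, e, i, o, u as vowels (y is treated as a consonant).
func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// reverseGerundCandidates applies each rule from the table and collects every
// proposed base form; verification happens in a later step.
func reverseGerundCandidates(word string) []string {
	if len(word) < 5 || !strings.HasSuffix(word, "ing") {
		return nil
	}
	n := len(word)
	stem := word[:n-3] // word without "ing"
	var cands []string
	// -ying → -ie: "dying" → "die"
	if strings.HasSuffix(word, "ying") {
		cands = append(cands, word[:n-4]+"ie")
	}
	// doubled consonant + ing → single consonant: "stopping" → "stop"
	if m := len(stem); m >= 2 && stem[m-1] == stem[m-2] && !isVowel(stem[m-1]) {
		cands = append(cands, stem[:m-1])
	}
	// direct -ing strip: "walking" → "walk"
	cands = append(cands, stem)
	// final e was dropped before -ing: "creating" → "create"
	cands = append(cands, stem+"e")
	return cands
}
```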

Plural Reversal

| Pattern | Rule | Example |
|---|---|---|
| consonant + -ies | → consonant + y | entries → entry |
| -ves | → -f or -fe | wolves → wolf |
| sibilant + -es | → sibilant | processes → process |
| -s | → stem | servers → server |
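A minimal sketch of the plural rules as a candidate generator follows. It is not reverseRegularPlural() itself. The function name is hypothetical, and the sibilant endings checked (s, x, z, ch, sh) are an assumption about what "sibilant" covers here.

```go
package main

import "strings"

// isVowel treats a, e, i, o, u as vowels (y is treated as a consonant).
func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// reversePluralCandidates applies each rule from the table and collects every
// proposed singular; the sibilant list below is an assumption.
func reversePluralCandidates(word string) []string {
	if len(word) < 3 || !strings.HasSuffix(word, "s") {
		return nil
	}
	n := len(word)
	var cands []string
	// consonant + -ies → consonant + y: "entries" → "entry"
	if strings.HasSuffix(word, "ies") && n >= 4 && !isVowel(word[n-4]) {
		cands = append(cands, word[:n-3]+"y")
	}
	// -ves → -f or -fe: "wolves" → "wolf", "knives" → "knife"
	if strings.HasSuffix(word, "ves") {
		cands = append(cands, word[:n-3]+"f", word[:n-3]+"fe")
	}
	// sibilant + -es → sibilant: "processes" → "process"
	if strings.HasSuffix(word, "es") {
		for _, sib := range []string{"s", "x", "z", "ch", "sh"} {
			if strings.HasSuffix(word[:n-2], sib) {
				cands = append(cands, word[:n-2])
				break
			}
		}
	}
	// plain -s → stem: "servers" → "server"
	cands = append(cands, word[:n-1])
	return cands
}
```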

Round-Trip Verification

When Tier 3 produces multiple candidate base forms, bestRoundTrip() selects the best one:

  1. Priority 1: Candidate is a known base verb/noun (in the index)
  2. Priority 2: Candidate ends in VCe pattern (vowel-consonant-e, like "delete")
  3. Priority 3: Candidate doesn't end in "e"
  4. Fallback: First match

The round-trip test applies the forward function (e.g., PastTense()) to each candidate and checks if it produces the original inflected form. Only verified candidates are accepted.
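The verify-then-rank logic can be sketched as follows. The signature of `bestRoundTrip` here is hypothetical: it takes the forward inflection function and the set of known bases as parameters. The test uses a deliberately tiny forward past-tense rule that only appends "d" or "ed".

```go
package main

import "strings"

// isVowel treats a, e, i, o, u as vowels (y is treated as a consonant).
func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// endsInVCe reports whether s ends vowel-consonant-e, as in "delete".
func endsInVCe(s string) bool {
	n := len(s)
	return n >= 3 && s[n-1] == 'e' && !isVowel(s[n-2]) && isVowel(s[n-3])
}

// bestRoundTrip keeps only candidates whose forward inflection reproduces word,
// then picks one by priority. Hypothetical signature for illustration.
func bestRoundTrip(word string, cands []string, forward func(string) string, known map[string]bool) (string, bool) {
	var verified []string
	for _, c := range cands {
		if forward(c) == word { // round-trip check
			verified = append(verified, c)
		}
	}
	if len(verified) == 0 {
		return "", false
	}
	for _, c := range verified { // Priority 1: known base form
		if known[c] {
			return c, true
		}
	}
	for _, c := range verified { // Priority 2: VCe ending
		if endsInVCe(c) {
			return c, true
		}
	}
	for _, c := range verified { // Priority 3: no trailing "e"
		if !strings.HasSuffix(c, "e") {
			return c, true
		}
	}
	return verified[0], true // Fallback: first verified candidate
}
```

With a naive forward rule, both "create" and "creat" round-trip to "created". The VCe rule picks "create". For "walked", neither "walke" nor "walk" is VCe, so the no-trailing-e rule picks "walk".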

Index Building

NewTokeniser() builds six lookup maps at construction time (three inverse maps, two base-form sets, and the domain vocabulary):

| Map | Example | Source |
|---|---|---|
| pastToBase | "deleted" → "delete" | Inverse of `gram.verb.*.past` |
| gerundToBase | "deleting" → "delete" | Inverse of `gram.verb.*.gerund` |
| baseVerbs | "delete" → true | All known verb bases |
| pluralToBase | "files" → "file" | Inverse of `gram.noun.*.other` |
| baseNouns | "file" → true | All known noun bases |
| words | "url" → "URL" | Domain vocabulary (`gram.word`) |

These maps are built from JSON grammar data first, then supplemented by the irregular verb/noun Go maps (skipping duplicates).
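The noun half of the index build can be sketched as follows. `buildNounIndex`, `nounIndex`, and the base→plural parameter shapes are assumptions. "Skipping duplicates" is read here as skipping an irregular entry whose base noun the JSON data already covers.

```go
package main

// nounIndex holds the two noun lookup maps; field names follow the table above.
type nounIndex struct {
	pluralToBase map[string]string // "files" → "file"
	baseNouns    map[string]bool   // "file" → true
}

// buildNounIndex inverts base→plural maps. jsonNouns stands in for
// gram.noun.*.other and irregularNouns for the built-in Go map; both shapes
// are assumptions. JSON entries go in first; an irregular entry is skipped
// when its base is already indexed.
func buildNounIndex(jsonNouns, irregularNouns map[string]string) nounIndex {
	idx := nounIndex{
		pluralToBase: make(map[string]string),
		baseNouns:    make(map[string]bool),
	}
	for base, plural := range jsonNouns {
		idx.baseNouns[base] = true
		idx.pluralToBase[plural] = base
	}
	for base, plural := range irregularNouns {
		if idx.baseNouns[base] {
			continue // duplicate: JSON data already covers this noun
		}
		idx.baseNouns[base] = true
		idx.pluralToBase[plural] = base
	}
	return idx
}
```

Building the JSON entries first means a locale file can override a built-in irregular plural simply by listing the noun.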