Locale-JSON-Schema
Virgil edited this page 2026-02-19 15:57:01 +00:00

Locale JSON Schema

Grammar data lives in locales/{lang}.json. Currently only en.json exists. This page documents the exact structure the loader expects.

Top-Level Structure

{
  "gram": {
    "verb": { ... },
    "noun": { ... },
    "article": { ... },
    "word": { ... },
    "punct": { ... },
    "signal": { ... },
    "number": { ... }
  },
  "prompt": { ... },
  "time": { ... },
  "lang": { ... }
}

Only the gram.* namespace is processed by the grammar engine. Everything outside gram.* is flattened into the message lookup table as plain key-value strings.
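Here is a minimal Go sketch of that split. It assumes the file is decoded into a generic map and that nested keys outside gram.* are joined with dots. Both of these are assumptions, as is the function name flattenMessages. The sample prompt/time/lang entries are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// flattenMessages walks a decoded locale document and collects every string
// leaf outside the top-level "gram" key into a dot-separated lookup table.
// The "gram" subtree is skipped because the grammar engine needs it nested.
func flattenMessages(doc map[string]any) map[string]string {
	out := make(map[string]string)
	var walk func(prefix string, v any)
	walk = func(prefix string, v any) {
		switch t := v.(type) {
		case string:
			out[prefix] = t
		case map[string]any:
			for k, child := range t {
				walk(prefix+"."+k, child)
			}
		default:
			// Numbers, arrays and bools are ignored in this sketch.
		}
	}
	for k, v := range doc {
		if k == "gram" {
			continue
		}
		walk(k, v)
	}
	return out
}

func main() {
	raw := []byte(`{
		"gram": {"verb": {"go": {"base": "go", "past": "went"}}},
		"time": {"ago": {"second": "{n} seconds ago"}},
		"lang": {"en": "English"}
	}`)
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		panic(err)
	}
	msgs := flattenMessages(doc)
	keys := make([]string, 0, len(msgs))
	for k := range msgs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %s\n", k, msgs[k])
	}
}
```

The gram subtree never reaches the flat table, which is why flattening it (see Sacred Rule 1) has to be avoided.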

gram.verb — Verb Conjugation Tables

Each entry maps a base verb to its inflected forms:

"verb": {
  "delete": { "base": "delete", "past": "deleted", "gerund": "deleting" },
  "commit": { "base": "commit", "past": "committed", "gerund": "committing" },
  "go":     { "base": "go",     "past": "went",      "gerund": "going" },
  "run":    { "base": "run",    "past": "ran",        "gerund": "running" },
  "set":    { "base": "set",    "past": "set",        "gerund": "setting" }
}

Required fields: at least one of base, past, or gerund must be present.

The loader detects verb objects via isVerbFormObject() — any map with base, past, or gerund keys (and NOT a plural object).
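This Go sketch implements that detection rule. The body of isVerbFormObject is a reconstruction from the sentence above, not the loader's actual code. isPluralObject is a hypothetical helper that treats any map with both one and other as a plural object.

```go
package main

import "fmt"

// verbKeys are the keys that mark a map as a verb-form object.
var verbKeys = []string{"base", "past", "gerund"}

// isPluralObject is a hypothetical, simplified check: any map carrying both
// CLDR "one" and "other" keys is treated as a plural object.
func isPluralObject(m map[string]any) bool {
	_, hasOne := m["one"]
	_, hasOther := m["other"]
	return hasOne && hasOther
}

// isVerbFormObject reconstructs the documented rule: at least one of
// base/past/gerund is present, and the map is not a plural object.
func isVerbFormObject(m map[string]any) bool {
	if isPluralObject(m) {
		return false
	}
	for _, k := range verbKeys {
		if _, ok := m[k]; ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isVerbFormObject(map[string]any{"base": "go", "past": "went"}))   // true
	fmt.Println(isVerbFormObject(map[string]any{"one": "file", "other": "files"})) // false
}
```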

Currently ~45 verbs covering irregulars and common dev/ops vocabulary. Regular verbs (add → added → adding) don't need entries — the morphology engine handles them. Only add entries for:

  • Irregular verbs (go/went/going)
  • Doubling verbs (commit/committed/committing)
  • Verbs where the engine would guess wrong
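This Go sketch shows why only irregular and doubling verbs need entries. It places a table lookup in front of a deliberately naive rule set. The names verbForms, regularPast and pastTense are hypothetical, and the real morphology engine presumably knows more rules than these.

```go
package main

import (
	"fmt"
	"strings"
)

// verbForms is a hypothetical in-memory shape for one gram.verb entry.
type verbForms struct {
	Base, Past, Gerund string
}

// verbTable holds a few entries copied from the gram.verb example above.
var verbTable = map[string]verbForms{
	"go":     {Base: "go", Past: "went", Gerund: "going"},
	"commit": {Base: "commit", Past: "committed", Gerund: "committing"},
	"set":    {Base: "set", Past: "set", Gerund: "setting"},
}

func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// regularPast is a naive stand-in for the morphology engine. It handles
// "-e" + "d" and consonant+"y" -> "ied", but not consonant doubling.
func regularPast(verb string) string {
	n := len(verb)
	switch {
	case strings.HasSuffix(verb, "e"):
		return verb + "d"
	case n >= 2 && verb[n-1] == 'y' && !isVowel(verb[n-2]):
		return verb[:n-1] + "ied"
	default:
		return verb + "ed"
	}
}

// pastTense consults the table first, so a listed verb always wins over the rules.
func pastTense(verb string) string {
	if f, ok := verbTable[verb]; ok && f.Past != "" {
		return f.Past
	}
	return regularPast(verb)
}

func main() {
	for _, v := range []string{"add", "delete", "copy", "go", "commit"} {
		fmt.Printf("%s -> %s (rules alone: %s)\n", v, pastTense(v), regularPast(v))
	}
}
```

Running it prints "commited" as the rules-alone form for commit. That is the kind of wrong guess that earns a gram.verb entry.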

gram.noun — Noun Pluralisation Tables

Each entry maps a noun to its singular and plural forms:

"noun": {
  "file":          { "one": "file",          "other": "files" },
  "person":        { "one": "person",        "other": "people" },
  "child":         { "one": "child",         "other": "children" },
  "vulnerability": { "one": "vulnerability", "other": "vulnerabilities" },
  "commit":        { "one": "commit",        "other": "commits" }
}

Required fields: one and other. Optional field: gender (for gendered languages like French).

The loader detects noun objects by checking for one + other keys, or the presence of a gender key.

Currently ~24 nouns covering irregulars and tech vocabulary. Regular plurals (server → servers) don't need entries.
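This Go sketch covers both halves of this section: the one/other/gender detection rule, and a lookup that falls back to regular pluralisation. isNounFormObject, plural and regularPlural are hypothetical names, and the fallback rules are only a small illustrative subset.

```go
package main

import (
	"fmt"
	"strings"
)

// isNounFormObject follows the detection rule described above:
// both "one" and "other" keys, or a "gender" key.
func isNounFormObject(m map[string]any) bool {
	_, hasOne := m["one"]
	_, hasOther := m["other"]
	_, hasGender := m["gender"]
	return (hasOne && hasOther) || hasGender
}

// nounTable holds a few entries from the gram.noun example above: {one, other}.
var nounTable = map[string][2]string{
	"person": {"person", "people"},
	"child":  {"child", "children"},
}

// regularPlural is a naive stand-in for the rule-based pluraliser.
func regularPlural(noun string) string {
	n := len(noun)
	switch {
	case n >= 2 && noun[n-1] == 'y' && !strings.ContainsRune("aeiou", rune(noun[n-2])):
		return noun[:n-1] + "ies"
	case strings.HasSuffix(noun, "s"), strings.HasSuffix(noun, "x"),
		strings.HasSuffix(noun, "ch"), strings.HasSuffix(noun, "sh"):
		return noun + "es"
	default:
		return noun + "s"
	}
}

// plural returns the "one" form for a count of 1 and the "other" form
// otherwise, preferring table entries over the regular rules.
func plural(noun string, count int) string {
	forms, ok := nounTable[noun]
	if !ok {
		forms = [2]string{noun, regularPlural(noun)}
	}
	if count == 1 {
		return forms[0]
	}
	return forms[1]
}

func main() {
	fmt.Println(isNounFormObject(map[string]any{"one": "file", "other": "files"})) // true
	for _, n := range []string{"person", "server", "vulnerability"} {
		fmt.Printf("%s / %s\n", plural(n, 1), plural(n, 2))
	}
}
```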

gram.article — Article Configuration

"article": {
  "indefinite": {
    "default": "a",
    "vowel": "an"
  },
  "definite": "the"
}

Mapped to the ArticleForms struct. The Article() function uses phonetic rules (consonant/vowel sound maps) to choose between the default and vowel forms, so the choice follows how the next word sounds rather than its first letter.
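Here is a minimal Go sketch of sound-based article selection. articleForms stands in for ArticleForms, with guessed field names. The two prefix lists are invented examples of misleading spellings (hour, SSH, URL, user), not the engine's actual sound maps.

```go
package main

import (
	"fmt"
	"strings"
)

// articleForms is a hypothetical stand-in for ArticleForms; field names are guesses.
type articleForms struct {
	Default, Vowel, Definite string
}

// Hypothetical exception lists where the first letter misleads about the sound.
var (
	vowelSoundPrefixes     = []string{"hour", "honest", "ssh"}
	consonantSoundPrefixes = []string{"uni", "use", "url", "eu", "one"}
)

func hasAnyPrefix(w string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(w, p) {
			return true
		}
	}
	return false
}

// indefinite picks forms.Vowel or forms.Default from the word's opening sound:
// exception lists first, then a plain vowel-letter check.
func indefinite(forms articleForms, word string) string {
	w := strings.ToLower(word)
	switch {
	case hasAnyPrefix(w, vowelSoundPrefixes):
		return forms.Vowel
	case hasAnyPrefix(w, consonantSoundPrefixes):
		return forms.Default
	case w != "" && strings.ContainsRune("aeiou", rune(w[0])):
		return forms.Vowel
	default:
		return forms.Default
	}
}

func main() {
	en := articleForms{Default: "a", Vowel: "an", Definite: "the"}
	for _, w := range []string{"file", "error", "hour", "URL", "SSH key", "user"} {
		fmt.Println(indefinite(en, w), w)
	}
}
```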

gram.word — Domain Vocabulary

Maps lowercase keys to display forms:

"word": {
  "url": "URL",
  "ssh": "SSH",
  "api": "API",
  "id": "ID",
  "ci": "CI",
  "qa": "QA",
  "blocked_by": "blocked by",
  "up_to_date": "up to date"
}

These are classified as TokenWord by the tokeniser and tracked in DomainVocabulary in imprints. Add entries for:

  • Acronyms with specific capitalisation (URL, SSH, API)
  • Multi-word phrases (blocked_by → "blocked by")
  • Domain-specific terms that need consistent display
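The display side of this table can be sketched in Go as a plain lookup that passes unknown tokens through unchanged. wordTable, displayWord and renderWords are hypothetical names. This does not model how the real tokeniser assigns TokenWord or feeds DomainVocabulary.

```go
package main

import (
	"fmt"
	"strings"
)

// wordTable mirrors part of the gram.word example: lowercase key -> display form.
var wordTable = map[string]string{
	"url":        "URL",
	"ssh":        "SSH",
	"api":        "API",
	"blocked_by": "blocked by",
}

// displayWord returns the configured display form, or the token unchanged.
func displayWord(token string) string {
	if form, ok := wordTable[strings.ToLower(token)]; ok {
		return form
	}
	return token
}

// renderWords applies displayWord to each whitespace-separated token.
// strings.Fields collapses runs of whitespace into single spaces.
func renderWords(s string) string {
	fields := strings.Fields(s)
	for i, f := range fields {
		fields[i] = displayWord(f)
	}
	return strings.Join(fields, " ")
}

func main() {
	fmt.Println(renderWords("ssh key for api url is blocked_by policy"))
}
```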

gram.punct — Punctuation Rules

"punct": {
  "label": ":",
  "progress": "..."
}

Language-specific punctuation. French uses " :" (a space before the colon). The LabelHandler and ProgressHandler use these suffixes.
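This minimal Go sketch of suffix-based punctuation assumes that each handler simply appends the configured string. punctRules, label and progress are hypothetical names, and the French progress value is assumed to match English.

```go
package main

import "fmt"

// punctRules is a hypothetical in-memory form of gram.punct.
type punctRules struct {
	Label, Progress string
}

var (
	enPunct = punctRules{Label: ":", Progress: "..."}
	// French puts a space before the colon; a real locale may prefer a
	// no-break space (U+00A0). Progress is assumed unchanged from English.
	frPunct = punctRules{Label: " :", Progress: "..."}
)

// label and progress append the locale's suffix, in the spirit of
// LabelHandler and ProgressHandler (this is not their actual code).
func label(p punctRules, text string) string    { return text + p.Label }
func progress(p punctRules, text string) string { return text + p.Progress }

func main() {
	fmt.Println(label(enPunct, "Status"), "ok")
	fmt.Println(label(frPunct, "Statut"), "ok")
	fmt.Println(progress(enPunct, "Deleting"))
}
```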

gram.signal — Disambiguation Signal Words

"signal": {
  "noun_determiner": [
    "the", "a", "an", "this", "that", "these", "those",
    "my", "your", "his", "her", "its", "our", "their",
    "every", "each", "some", "any", "no",
    "many", "few", "several", "all", "both"
  ],
  "verb_auxiliary": [
    "is", "are", "was", "were", "has", "had", "have",
    "do", "does", "did", "will", "would", "could", "should",
    "can", "may", "might", "shall", "must"
  ],
  "verb_infinitive": ["to"]
}

Used by the dual-class disambiguation system to classify ambiguous words such as "commit", "test", and "run" as nouns or verbs.

gram.number — Number Formatting

"number": {
  "thousands": ",",
  "decimal": ".",
  "percent": "%"
}

Used by NumericHandler for locale-aware number formatting. German would use "thousands": "." and "decimal": ",".
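This Go sketch shows separator-aware formatting. It assumes integer digits are grouped in threes (Western grouping; some locales group differently) and that percentages arrive already scaled. numberFormat, formatNumber and formatPercent are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// numberFormat is a hypothetical in-memory form of gram.number.
type numberFormat struct {
	Thousands, Decimal, Percent string
}

// formatNumber prints v with a fixed number of fraction digits, then swaps in
// the locale's grouping and decimal separators.
func formatNumber(v float64, digits int, nf numberFormat) string {
	s := strconv.FormatFloat(math.Abs(v), 'f', digits, 64)
	intPart, frac, _ := strings.Cut(s, ".")
	var b strings.Builder
	// Skip the sign when the value rounds to zero, to avoid printing "-0".
	if v < 0 && strings.Trim(s, "0.") != "" {
		b.WriteByte('-')
	}
	for i := 0; i < len(intPart); i++ {
		// Insert a separator whenever the remaining digit count is a multiple of 3.
		if i > 0 && (len(intPart)-i)%3 == 0 {
			b.WriteString(nf.Thousands)
		}
		b.WriteByte(intPart[i])
	}
	if frac != "" {
		b.WriteString(nf.Decimal)
		b.WriteString(frac)
	}
	return b.String()
}

// formatPercent expects an already-scaled value (42.5 means 42.5%).
func formatPercent(v float64, digits int, nf numberFormat) string {
	return formatNumber(v, digits, nf) + nf.Percent
}

func main() {
	en := numberFormat{Thousands: ",", Decimal: ".", Percent: "%"}
	de := numberFormat{Thousands: ".", Decimal: ","} // Percent not specified on this page
	fmt.Println(formatNumber(1234567.5, 1, en)) // 1,234,567.5
	fmt.Println(formatNumber(1234567.5, 1, de)) // 1.234.567,5
	fmt.Println(formatPercent(42.5, 1, en))     // 42.5%
}
```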

Sacred Rules

  1. NEVER flatten gram.* keys — the grammar engine depends on nested structure. Flattening gram.verb.delete.past to a flat key breaks the loader silently.
  2. Only gram.* data belongs in locale files — consumer translations are external.
  3. Irregular forms override regular morphology — if a verb is in gram.verb, its forms take precedence over the rule-based engine.
  4. The one/other keys overlap with CLDR plural categories — the loader distinguishes noun objects from plural message objects by checking for nested maps.