| title | description |
|---|---|
| Locale JSON Schema | Structure of locale JSON files and the grammar table contract. |
Locale JSON Schema
Grammar data lives in locales/{lang}.json. This page documents the exact structure the loader expects and the sacred rules that must not be violated.
Top-Level Structure
{
  "gram": {
    "verb": { ... },
    "noun": { ... },
    "article": { ... },
    "word": { ... },
    "punct": { ... },
    "signal": { ... }
  },
  "time": { ... },
  "prompt": { ... }
}
Only the gram.* namespace is processed by the grammar engine. Everything outside gram.* is flattened into the message lookup table as plain key-value strings.
gram.verb -- Verb Conjugation Tables
Each entry maps a base verb to its inflected forms:
"verb": {
  "delete": { "base": "delete", "past": "deleted", "gerund": "deleting" },
  "commit": { "base": "commit", "past": "committed", "gerund": "committing" },
  "go": { "base": "go", "past": "went", "gerund": "going" },
  "run": { "base": "run", "past": "ran", "gerund": "running" }
}
Required fields: At least one of base, past, gerund must be present.
The loader detects verb objects via isVerbFormObject() -- any map with base, past, or gerund keys (and NOT a plural object).
When to add entries: Only for verbs where the regular morphology engine would produce the wrong result:
- Irregular verbs (go/went/going)
- Consonant-doubling verbs (commit/committed/committing)
- Verbs the engine guesses wrong
Regular verbs like add/added/adding do not need entries.
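The precedence described above (table entry first, rule-based engine second) can be sketched as follows. This is a minimal illustration, not the library's implementation: the verbForms type, the pastTense function and the naive "-d"/"-ed" fallback rule are all hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// verbForms mirrors one gram.verb entry (hypothetical type for illustration).
type verbForms struct {
	Base, Past, Gerund string
}

// irregular stands in for the table loaded from gram.verb.
var irregular = map[string]verbForms{
	"go":     {Base: "go", Past: "went", Gerund: "going"},
	"commit": {Base: "commit", Past: "committed", Gerund: "committing"},
}

// pastTense consults the table first and only then applies a naive
// regular rule. The real morphology engine is assumed to be richer.
func pastTense(verb string) string {
	if f, ok := irregular[verb]; ok && f.Past != "" {
		return f.Past
	}
	if strings.HasSuffix(verb, "e") {
		return verb + "d"
	}
	return verb + "ed"
}

func main() {
	for _, v := range []string{"go", "commit", "delete", "add"} {
		fmt.Println(v, "->", pastTense(v))
	}
}
```

Without the "commit" entry, the naive rule in this sketch would produce "commited", which is the kind of wrong guess the table exists to correct.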
gram.noun -- Noun Pluralisation Tables
Each entry maps a noun to its singular and plural forms:
"noun": {
  "file": { "one": "file", "other": "files" },
  "person": { "one": "person", "other": "people" },
  "child": { "one": "child", "other": "children" },
  "vulnerability": { "one": "vulnerability", "other": "vulnerabilities" },
  "commit": { "one": "commit", "other": "commits" }
}
Required fields: one and other.
Optional field: gender (for gendered languages).
The loader detects noun objects by checking for one + other keys, or the presence of a gender key.
As with verbs, only add entries for irregular plurals or cases where the engine guesses wrong. Regular plurals (server -> servers) do not need entries.
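The key-based detection described for verb and noun objects can be sketched like this. The function names looksLikeVerb and looksLikeNoun are hypothetical, and treating "NOT a plural object" as "does not have both one and other" is an assumption about how the real isVerbFormObject() check behaves.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasAny reports whether the map contains at least one of the given keys.
func hasAny(m map[string]any, keys ...string) bool {
	for _, k := range keys {
		if _, ok := m[k]; ok {
			return true
		}
	}
	return false
}

// looksLikeNoun: both "one" and "other" are present, or a "gender" key exists.
func looksLikeNoun(m map[string]any) bool {
	_, one := m["one"]
	_, other := m["other"]
	return (one && other) || hasAny(m, "gender")
}

// looksLikeVerb: any of base/past/gerund, and not shaped like a noun/plural object.
func looksLikeVerb(m map[string]any) bool {
	return hasAny(m, "base", "past", "gerund") && !looksLikeNoun(m)
}

func main() {
	raw := `{
		"go":    {"base": "go", "past": "went"},
		"child": {"one": "child", "other": "children"}
	}`
	var entries map[string]map[string]any
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		panic(err)
	}
	for _, name := range []string{"go", "child"} {
		e := entries[name]
		fmt.Println(name, "verb:", looksLikeVerb(e), "noun:", looksLikeNoun(e))
	}
}
```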
gram.article -- Article Configuration
"article": {
  "indefinite": {
    "default": "a",
    "vowel": "an"
  },
  "definite": "the"
}
Maps to the ArticleForms struct. The Article() function uses phonetic rules (consonant/vowel sound maps) to choose between default and vowel.
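As a rough sketch of how a phonetic choice between default and vowel could work: check small exception maps first, then fall back to the first letter. The indefiniteForms type, the indefinite function and the contents of the two exception maps are illustrative assumptions, not the fields of ArticleForms or the maps the real Article() function uses.

```go
package main

import (
	"fmt"
	"strings"
)

// indefiniteForms mirrors gram.article.indefinite (hypothetical type).
type indefiniteForms struct {
	Default string
	Vowel   string
}

// Exception maps: spelling and sound disagree for these words.
var vowelSound = map[string]bool{"hour": true, "honest": true}
var consonantSound = map[string]bool{"user": true, "unique": true, "one": true}

// indefinite picks the article form for a word.
func indefinite(f indefiniteForms, word string) string {
	w := strings.ToLower(word)
	switch {
	case w == "":
		return f.Default
	case vowelSound[w]:
		return f.Vowel
	case consonantSound[w]:
		return f.Default
	case strings.ContainsRune("aeiou", rune(w[0])):
		return f.Vowel
	}
	return f.Default
}

func main() {
	en := indefiniteForms{Default: "a", Vowel: "an"}
	for _, w := range []string{"file", "error", "hour", "user"} {
		fmt.Println(indefinite(en, w), w)
	}
}
```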
For gendered languages, add a by_gender map:
"article": {
  "definite": "the",
  "by_gender": {
    "masculine": "le",
    "feminine": "la"
  }
}
gram.word -- Domain Vocabulary
Maps lowercase keys to display forms:
"word": {
  "url": "URL",
  "ssh": "SSH",
  "api": "API",
  "id": "ID",
  "ci": "CI",
  "qa": "QA",
  "blocked_by": "blocked by",
  "up_to_date": "up to date"
}
These are classified as TokenWord by the reversal tokeniser and tracked in DomainVocabulary in imprints. Add entries for:
- Acronyms with specific capitalisation (URL, SSH, API)
- Multi-word phrases (blocked_by -> "blocked by")
- Domain-specific terms that need consistent display
gram.punct -- Punctuation Rules
"punct": {
  "label": ":",
  "progress": "..."
}
Language-specific punctuation suffixes. French would use " :" (space before colon) for the label suffix. The LabelHandler and ProgressHandler read these values.
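To illustrate why the suffix is data rather than code, here is a minimal sketch that appends the locale's label and progress suffixes. The punct type and the label and progress helpers are hypothetical; the real LabelHandler and ProgressHandler are assumed to do more than plain concatenation.

```go
package main

import "fmt"

// punct mirrors the gram.punct section (hypothetical type).
type punct struct {
	Label    string
	Progress string
}

// label appends the locale's label suffix to the text.
func label(p punct, text string) string { return text + p.Label }

// progress appends the locale's progress suffix to the text.
func progress(p punct, text string) string { return text + p.Progress }

func main() {
	en := punct{Label: ":", Progress: "..."}
	fr := punct{Label: " :", Progress: "..."} // French: space before the colon
	fmt.Println(label(en, "Status"))
	fmt.Println(label(fr, "Statut"))
	fmt.Println(progress(en, "Building"))
}
```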
gram.signal -- Disambiguation Signal Words
"signal": {
  "noun_determiner": [
    "the", "a", "an", "this", "that", "these", "those",
    "my", "your", "his", "her", "its", "our", "their",
    "every", "each", "some", "any", "no",
    "many", "few", "several", "all", "both"
  ],
  "verb_auxiliary": [
    "is", "are", "was", "were", "has", "had", "have",
    "do", "does", "did", "will", "would", "could", "should",
    "can", "may", "might", "shall", "must"
  ],
  "verb_infinitive": ["to"]
}
Used by the dual-class disambiguation system to classify ambiguous words like "commit", "test", "run". Each signal list falls back to hardcoded English defaults when absent from the locale file.
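One simple way to use these lists is to look at the word immediately before the ambiguous one. The sketch below does only that, and it also shows the per-list fallback to defaults. The signals type, the classify function and the shortened default lists are assumptions for illustration; the actual disambiguation system may weigh more context than a single preceding token.

```go
package main

import (
	"fmt"
	"strings"
)

// signals mirrors the gram.signal section (hypothetical type).
type signals struct {
	NounDeterminer []string
	VerbAuxiliary  []string
	VerbInfinitive []string
}

// englishDefaults is a shortened stand-in for the hardcoded defaults.
var englishDefaults = signals{
	NounDeterminer: []string{"the", "a", "an", "this", "that"},
	VerbAuxiliary:  []string{"is", "will", "can", "must"},
	VerbInfinitive: []string{"to"},
}

// orDefault falls back to def when the locale did not supply a list.
// An empty list is treated as absent in this sketch.
func orDefault(list, def []string) []string {
	if len(list) == 0 {
		return def
	}
	return list
}

func contains(list []string, w string) bool {
	for _, s := range list {
		if s == w {
			return true
		}
	}
	return false
}

// classify guesses the class of an ambiguous word from the preceding token.
func classify(prev string, s signals) string {
	p := strings.ToLower(prev)
	switch {
	case contains(orDefault(s.NounDeterminer, englishDefaults.NounDeterminer), p):
		return "noun"
	case contains(orDefault(s.VerbAuxiliary, englishDefaults.VerbAuxiliary), p),
		contains(orDefault(s.VerbInfinitive, englishDefaults.VerbInfinitive), p):
		return "verb"
	}
	return "unknown"
}

func main() {
	fmt.Println("the commit  ->", classify("the", signals{}))
	fmt.Println("will commit ->", classify("will", signals{}))
	fmt.Println("to commit   ->", classify("to", signals{}))
}
```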
Translation Messages
Everything outside gram.* is flattened into the message lookup table:
{
  "time": {
    "just_now": "just now",
    "ago": {
      "minute": {
        "one": "{{.Count}} minute ago",
        "other": "{{.Count}} minutes ago"
      }
    }
  }
}
Nested objects with CLDR plural keys (zero, one, two, few, many, other) are detected and stored as Message structs with plural forms. All other strings are flattened to dot-notation keys (e.g. time.just_now).
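To make the plural forms concrete, the sketch below picks a form for a count and renders its {{.Count}} placeholder with the standard text/template package. The pluralCategory and render functions are hypothetical, and the category rule shown is the English one only (1 is "one", everything else is "other"); other languages need their own CLDR rules.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// pluralCategory implements the English rule only.
func pluralCategory(n int) string {
	if n == 1 {
		return "one"
	}
	return "other"
}

// render selects the plural form for count and executes it as a template.
func render(forms map[string]string, count int) (string, error) {
	text, ok := forms[pluralCategory(count)]
	if !ok {
		text = forms["other"] // "other" is the catch-all category
	}
	t, err := template.New("msg").Parse(text)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, map[string]any{"Count": count}); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	minute := map[string]string{
		"one":   "{{.Count}} minute ago",
		"other": "{{.Count}} minutes ago",
	}
	for _, n := range []int{1, 5} {
		s, err := render(minute, n)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```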
Template Syntax
Message values support Go text/template syntax:
"welcome": "Hello, {{.Subject}}!"
Grammar Table Contract
Sacred Rules
- NEVER flatten gram.* keys. The grammar engine depends on nested structure. Flattening gram.verb.delete.past to a flat string key breaks the loader silently. Agents and tooling must preserve the nested JSON objects.
- Only gram.* data belongs in locale files. Consumer translations are external -- packages register their own locale files via RegisterLocales().
- Irregular forms override regular morphology. If a verb is in gram.verb, its forms take precedence over the rule-based engine.
- The one/other keys overlap with CLDR plural categories. The loader distinguishes noun objects (under gram.noun.*) from plural message objects by checking for nested maps.
- Values must be the inflected form, not rules. Store "deleted", not "d suffix".
- Round-trip must hold. PastTense(base) then reverse must recover base.
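The round-trip rule implies that no two base verbs may share an inflected form, otherwise the reverse lookup is ambiguous. A minimal sketch of that check, assuming a plain base-to-past map (the buildReverse function is hypothetical and is not the library's reversal code):

```go
package main

import "fmt"

// buildReverse inverts a base -> past map and fails if two bases share
// the same past form, since the round trip could not recover both.
func buildReverse(pastByBase map[string]string) (map[string]string, error) {
	baseByPast := make(map[string]string, len(pastByBase))
	for base, past := range pastByBase {
		if prev, clash := baseByPast[past]; clash {
			return nil, fmt.Errorf("past form %q maps back to both %q and %q", past, prev, base)
		}
		baseByPast[past] = base
	}
	return baseByPast, nil
}

func main() {
	forward := map[string]string{
		"delete": "deleted",
		"commit": "committed",
		"go":     "went",
		"run":    "ran",
	}
	reverse, err := buildReverse(forward)
	if err != nil {
		panic(err)
	}
	// Every base must survive base -> past -> base.
	for base, past := range forward {
		fmt.Printf("%s -> %s -> %s (ok: %v)\n", base, past, reverse[past], reverse[past] == base)
	}
}
```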
Adding a New Language
- Create locales/{lang}.json with a gram section
- Populate gram.verb with irregular verbs for that language
- Populate gram.noun with irregular nouns
- Define gram.article rules (if the language has articles)
- Define gram.punct (language-specific punctuation)
- Add gram.signal word lists for disambiguation
- Add CLDR plural rules in language.go if not already present
- Run reversal round-trip tests to verify bijective property
Loader Behaviour
The FSLoader reads all .json files from the locales directory. For each file:
- Parse as JSON
- Walk the key tree, detecting grammar objects (gram.*) and plural objects
- Grammar data populates the GrammarData struct (verbs, nouns, articles, words, punctuation, signals)
- Everything else is flattened into map[string]Message for translation lookup
- Language tags support both - and _ separators (en-GB and en_GB both work)