Commit graph

21 commits

Author SHA1 Message Date
Snider
f0c4bebfb3 feat(grammar): add dual-class verb/noun entries and contractions
Add test, check, file as verbs and run, build as nouns so the
tokeniser can detect them in both grammatical roles. Add 15
contractions to verb_auxiliary signal list for dev text support.

Update reversal tests to use noun-only words (branch) in test
phrases to avoid dual-class ambiguity until disambiguation (Task 5).

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 16:00:42 +00:00
Snider
05d0483fd7 fix(grammar): review fixes for SignalData loading
Normalise signal words to lowercase on load (defensive against
mixed-case entries in locale JSON). Strengthen test assertions
with expected counts and spot-checks. Clarify Priors field comment.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:54:59 +00:00
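
Note on 05d0483fd7: the lowercase-on-load step described above might look roughly like the sketch below. The helper name normaliseWords and the sample words are assumptions for illustration; the commit message does not show the loader's real code.

```go
package main

import (
	"fmt"
	"strings"
)

// normaliseWords returns a lowercased copy of a signal word list so that
// mixed-case entries in locale JSON still match lowercase tokens.
// Hypothetical helper: the name and signature are not taken from the module.
func normaliseWords(words []string) []string {
	out := make([]string, 0, len(words))
	for _, w := range words {
		out = append(out, strings.ToLower(w))
	}
	return out
}

func main() {
	// Sample entries only; the real lists live under gram.signal in the locale JSON.
	fmt.Println(normaliseWords([]string{"The", "WILL", "don't"}))
}
```

Returning a copy rather than lowercasing in place keeps the decoded JSON untouched, which makes the behaviour easier to assert on in tests.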
Snider
5d3558b383 docs: add orchestrator review of dual-class disambiguation plan
3 bugs (loader type assertion, noun entry verification, confidence
floor), 3 design improvements (contractions, clause boundaries,
negation), 4 Phase 2 feature requests (stats export, corpus priors,
weight tuning, expanded dual-class set).

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 15:50:51 +00:00
Snider
cb7404456f feat(grammar): add SignalData for disambiguation signals
Load noun_determiner, verb_auxiliary, and verb_infinitive word lists
from gram.signal in locale JSON. Reserve Priors field for future
corpus-derived per-word disambiguation priors.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:50:45 +00:00
Snider
d7fc2cda7d docs(reversal): add dual-class disambiguation implementation plan
10-task TDD plan: SignalData loading, dual-class entries, Token
confidence fields, TokeniserOption API, two-pass Tokenise with
7-signal scoring, imprint confidence weighting, multiplier compat,
round-trip tests, and documentation updates.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:38:38 +00:00
Snider
3653383889 docs(reversal): add dual-class word disambiguation design
Multi-signal probabilistic disambiguation with two-pass tokenisation.
Seven weighted signals resolve verb/noun ambiguity for words like
"commit", "run", "test", "check", "file", "build". Confidence scores
flow into imprints for the scoring/comprehension use case.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:34:54 +00:00
Snider
bf08d4a58f docs: add Phase 2 tasks from LEK-1B benchmark findings
Domain classification at 75%/0.17s (~5K sentences/sec) is the sweet spot.
Added 1B pre-sort pipeline, calibration checks, cross-domain anomaly
detection, and expanded reference distribution builder tasks.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 15:30:07 +00:00
Snider
d5b3eac258 docs: add CLAUDE.md, TODO.md, FINDINGS.md for agent workflow
CLAUDE.md documents the grammar engine contract and sacred rules.
TODO.md is the task dispatch queue from core/go orchestration.
FINDINGS.md captures research and architectural decisions.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 15:03:40 +00:00
Claude
c3b96a4ce1 fix(reversal): extend punctuation handling
Add !, ;, and , to splitTrailingPunct and matchPunctuation.
Previously only ..., ?, and : were recognised.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:53:29 +00:00
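
Note on c3b96a4ce1: a minimal sketch of trailing-punctuation splitting over the six marks listed above. The function name splitTrailing and its two-value return are assumptions; the actual signatures of splitTrailingPunct and matchPunctuation are not shown in the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// trailingMarks holds the marks named in the commit message:
// the original three (..., ?, :) plus the three added ones (!, ;, ,).
var trailingMarks = []string{"...", "?", ":", "!", ";", ","}

// splitTrailing separates one trailing punctuation mark from a word.
// It returns the input unchanged with an empty mark when nothing matches
// or when the input consists of the mark alone.
// Hypothetical helper, not the module's real function.
func splitTrailing(s string) (word, mark string) {
	for _, m := range trailingMarks {
		if len(s) > len(m) && strings.HasSuffix(s, m) {
			return strings.TrimSuffix(s, m), m
		}
	}
	return s, ""
}

func main() {
	for _, in := range []string{"done!", "wait...", "first,", "branch"} {
		w, m := splitTrailing(in)
		fmt.Printf("%q -> word=%q mark=%q\n", in, w, m)
	}
}
```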
Claude
9474edde6d test(reversal): add round-trip validation tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:32:08 +00:00
Claude
b3f6c817d4 feat(reversal): add training data Multiplier
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:30:11 +00:00
Claude
a9c6672b12 feat(reversal): add imprint similarity comparison
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:26:29 +00:00
Claude
8b23600632 feat(reversal): add GrammarImprint struct and constructor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:25:08 +00:00
Claude
f09cff894f feat(reversal): add Token type and Tokenise function
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:22:40 +00:00
Claude
6d72540530 feat(reversal): add word map and article detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:21:04 +00:00
Claude
786909c193 feat(reversal): add noun matching to Tokeniser
Inverse noun lookup: JSON grammar data → irregular nouns → regular
morphology rules. Round-trip verified via forward PluralForm().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:18:08 +00:00
Claude
f1aa4adbc4 feat(reversal): add Tokeniser with verb matching
Reverse grammar tables into pattern matchers. 3-tier lookup:
JSON grammar data → irregular verb maps → regular morphology rules.
Verified by round-tripping through forward functions.

Export IrregularVerbs() and IrregularNouns() so the reversal engine
reads from the authoritative source instead of a duplicate list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 23:15:13 +00:00
Claude
20ab172f5b docs: add go-i18n reversal + go-html combined design
Bottom-up approach: grammar reversal (Layers 1-2) first,
then go-html HLCRF rendering on top. Both modules share
grammar tables and compose into the same binary.

Phase 1: go-i18n/reversal/ (tokeniser + imprint + multiplier)
Phase 2: go-html (HLCRF parser + Flexy-heritage rendering)
Phase 3: Integration + WASM
Phase 4: CoreDeno + Web Components

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 22:45:18 +00:00
Claude
5877431286 docs: add training data multiplier use case
Grammar engine as zero-cost data augmentation: tense/number/formality
flips across 88K seeds = 528K+ verified training examples with no API
spend. Reversal engine provides automatic QA on transformed variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 21:07:26 +00:00
Claude
811e8f7502 docs: grammar reversal engine — linguistic hash function concept
Captures the bidirectional grammar engine idea: using go-i18n tables
in reverse as a deterministic parser to extract semantic imprints from
documents without retaining content. Covers TIM/DataNode architecture,
88K seed calibration, Poindexter integration, and privacy properties.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:11:22 +00:00
Claude
e8a87b0f50 feat: grammar-aware i18n module extracted from core
Standalone grammar-aware translation engine with:
- 3-tier verb/noun fallback (JSON locale → irregular maps → regular rules)
- 6 built-in i18n.* namespace handlers (label, progress, count, done, fail, numeric)
- Nested en.json with gram/prompt/time/lang sections (no flat command keys)
- CLDR plural rules for 10 languages
- Subject fluent API, number/time formatting, RTL detection
- 55 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 19:51:27 +00:00