CalibrateDomains() accepts two inference.TextModel instances and a corpus
of CalibrationSamples, classifies every sample with both models, and
computes the agreement rate, per-domain distribution, confusion pairs,
and accuracy against ground truth.
- calibrate.go: CalibrateDomains + classifyAll batch helper
- calibrate_test.go: 7 mock tests (agreement, disagreement, mixed,
no ground truth, empty, batch boundary, results slice)
- integration/calibrate_test.go: 500-sample corpus (220 ground-truth
+ 280 unlabelled) for real 1B vs 27B model comparison
- TODO.md: Phase 2a calibration task marked complete
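The comparison flow above can be sketched as follows. This is a minimal, hedged mock of the described API: the Report field names, the Classify method on the model interface, and the `keyword` mock are illustrative assumptions, not the real go-inference surface.

```go
package main

import "fmt"

// CalibrationSample pairs text with an optional ground-truth domain.
type CalibrationSample struct {
	Text   string
	Domain string // "" when unlabelled
}

// TextModel stands in for inference.TextModel; Classify is an assumption.
type TextModel interface {
	Classify(text string) (string, error)
}

// Report holds the metrics named in the commit message.
type Report struct {
	AgreementRate float64
	DomainCounts  map[string]int    // per-domain distribution (model A)
	Confusions    map[[2]string]int // (domainA, domainB) disagreement pairs
	AccuracyA     float64           // vs ground truth, labelled samples only
	AccuracyB     float64
}

func CalibrateDomains(a, b TextModel, corpus []CalibrationSample) (Report, error) {
	r := Report{DomainCounts: map[string]int{}, Confusions: map[[2]string]int{}}
	agree, labelled, okA, okB := 0, 0, 0, 0
	for _, s := range corpus {
		da, err := a.Classify(s.Text)
		if err != nil {
			return r, err
		}
		db, err := b.Classify(s.Text)
		if err != nil {
			return r, err
		}
		r.DomainCounts[da]++
		if da == db {
			agree++
		} else {
			r.Confusions[[2]string{da, db}]++
		}
		if s.Domain != "" { // only labelled samples count toward accuracy
			labelled++
			if da == s.Domain {
				okA++
			}
			if db == s.Domain {
				okB++
			}
		}
	}
	if n := len(corpus); n > 0 {
		r.AgreementRate = float64(agree) / float64(n)
	}
	if labelled > 0 {
		r.AccuracyA = float64(okA) / float64(labelled)
		r.AccuracyB = float64(okB) / float64(labelled)
	}
	return r, nil
}

// keyword is a trivial lookup-table mock model for demonstration.
type keyword map[string]string

func (k keyword) Classify(text string) (string, error) {
	if d, ok := k[text]; ok {
		return d, nil
	}
	return "general", nil
}

func main() {
	a := keyword{"git rebase": "technical", "roast lamb": "cooking"}
	b := keyword{"git rebase": "technical"} // disagrees on the cooking sample
	corpus := []CalibrationSample{
		{Text: "git rebase", Domain: "technical"},
		{Text: "roast lamb", Domain: "cooking"},
	}
	rep, _ := CalibrateDomains(a, b, corpus)
	fmt.Printf("agreement=%.2f accA=%.2f accB=%.2f\n",
		rep.AgreementRate, rep.AccuracyA, rep.AccuracyB)
}
```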
Co-Authored-By: Virgil <virgil@lethean.io>
Single-token generation validators for article correctness and irregular
verb form accuracy, leveraging the 1B model's 100% accuracy on both tasks.
Includes batch variants and 14 mock tests covering correct/wrong/empty/
context-cancellation/whitespace cases.
Co-Authored-By: Virgil <virgil@lethean.io>
- Remove go-mlx from go.mod (breaks non-darwin builds)
- Fix go-inference pseudo-version for CI compatibility
- Fix mapTokenToDomain prefix collision (castle, credential)
- Add testing.Short() skip to slow classification benchmarks
- Add 80% accuracy threshold to integration test
Integration test moved to integration/ sub-module with its own go.mod
to cleanly isolate go-mlx dependency from the main module.
Co-Authored-By: Virgil <virgil@lethean.io>
Review of ClassifyCorpus implementation found:
- go-mlx in go.mod breaks non-Mac builds (should be go.work only)
- go-inference v0.0.0 invalid pseudo-version
- mapTokenToDomain prefix collision (cas → castle/cascade)
- classify_bench_test.go runs slow tests on every go test
- integration test has no domain accuracy assertion
Co-Authored-By: Virgil <virgil@lethean.io>
Integration tested at 80 prompts/sec with Gemma3-1B on M3 Ultra.
100% domain accuracy on technical prompts. go-ai dependency resolved
via direct go-inference + go-mlx imports.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5-task TDD plan: dependency setup, types+mapper, ClassifyCorpus with
mock tests, integration test with real model, docs update.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Streaming batch classification via go-inference Classify() API.
Package-level ClassifyCorpus() function with configurable batch size,
prompt template, and mock-friendly TextModel interface.
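The configurable surface described above could look like the following sketch. The option names, default batch size, and default template are assumptions; only ClassifyCorpus and the mock-friendly TextModel interface come from the commit itself.

```go
package main

import "fmt"

// TextModel is the mock-friendly seam: tests inject a fake, production
// wires in the go-inference model.
type TextModel interface {
	Classify(prompt string) (string, error)
}

type options struct {
	batchSize int
	template  string // %s receives the document text
}

type Option func(*options)

func WithBatchSize(n int) Option  { return func(o *options) { o.batchSize = n } }
func WithTemplate(t string) Option { return func(o *options) { o.template = t } }

// ClassifyCorpus classifies docs batch by batch, preserving input order.
func ClassifyCorpus(m TextModel, docs []string, opts ...Option) ([]string, error) {
	o := options{batchSize: 32, template: "Classify the domain of: %s"}
	for _, opt := range opts {
		opt(&o)
	}
	out := make([]string, 0, len(docs))
	for start := 0; start < len(docs); start += o.batchSize {
		end := start + o.batchSize
		if end > len(docs) {
			end = len(docs) // final partial batch (the "batch boundary" case)
		}
		for _, d := range docs[start:end] {
			label, err := m.Classify(fmt.Sprintf(o.template, d))
			if err != nil {
				return nil, err
			}
			out = append(out, label)
		}
	}
	return out, nil
}

// fixed always answers the same domain, standing in for a mock test double.
type fixed struct{}

func (fixed) Classify(p string) (string, error) { return "technical", nil }

func main() {
	labels, _ := ClassifyCorpus(fixed{}, []string{"a", "b", "c"}, WithBatchSize(2))
	fmt.Println(len(labels), labels[0])
}
```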
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
R1: Remove "passed", "failed", "skipped" from gram.noun and
gram.word — these are past participles, not nouns.
R2: Add DisambiguationStats and WithWeights tests to
tokeniser_test.go using setup(t) pattern. Remove duplicates
from roundtrip_test.go.
R3: Guard buildSignalIndex per-field so partial locale data
falls back independently rather than silently disabling signals.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update TODO.md and FINDINGS.md with implementation details,
signal weight table, and test coverage summary. Note expanded
dual-class candidates for Phase 2.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DisambiguationStatsFromTokens provides aggregate disambiguation
metrics for Phase 2 calibration. Round-trip tests verify all 6
dual-class words disambiguate correctly in both verb and noun
contexts, and that same-role imprints converge while different-role
imprints diverge.
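An aggregate of this kind might be shaped as below. The Token and stats field sets are assumptions; the real types carry more fields.

```go
package main

import "fmt"

// Token mirrors the confidence fields described in these commits.
type Token struct {
	Text       string
	Confidence float64
	AltConf    float64 // > 0 only for dual-class tokens
}

// DisambiguationStats is an assumed minimal metric set for calibration.
type DisambiguationStats struct {
	Total          int
	Ambiguous      int // tokens with AltConf > 0
	MeanConfidence float64
}

func DisambiguationStatsFromTokens(toks []Token) DisambiguationStats {
	var s DisambiguationStats
	sum := 0.0
	for _, t := range toks {
		s.Total++
		sum += t.Confidence
		if t.AltConf > 0 {
			s.Ambiguous++
		}
	}
	if s.Total > 0 {
		s.MeanConfidence = sum / float64(s.Total)
	}
	return s
}

func main() {
	toks := []Token{
		{Text: "please", Confidence: 1.0},
		{Text: "commit", Confidence: 0.8, AltConf: 0.2}, // dual-class word
	}
	s := DisambiguationStatsFromTokens(toks)
	fmt.Printf("%d/%d ambiguous, mean conf %.2f\n", s.Ambiguous, s.Total, s.MeanConfidence)
}
```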
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Transformed tokens get Confidence 1.0 since the transformation
is deterministic and unambiguous.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dual-class tokens contribute to both verb and noun distributions
weighted by Confidence and AltConf. Non-ambiguous tokens (Confidence
1.0, AltConf 0.0) behave identically to before.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify SignalBreakdown is populated when WithSignals() is set and
nil when not. Check individual signal components fire correctly.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NewTokeniser now accepts variadic options (backwards compatible).
Builds dual-class index from verb∩noun overlap and signal word
lookup sets from gram.signal data. Configurable weights via
WithWeights() for future calibration.
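The variadic-option pattern described here keeps zero-argument call sites compiling. A sketch, with illustrative weight fields and defaults (not the calibrated values):

```go
package main

import "fmt"

// Weights holds signal weights; field set and defaults are assumptions.
type Weights struct {
	NounDeterminer float64
	VerbAuxiliary  float64
	VerbInfinitive float64
}

type Tokeniser struct {
	weights Weights
}

type Option func(*Tokeniser)

// WithWeights overrides the defaults, enabling future calibration.
func WithWeights(w Weights) Option { return func(t *Tokeniser) { t.weights = w } }

// NewTokeniser accepts zero options, so existing callers are unaffected.
func NewTokeniser(opts ...Option) *Tokeniser {
	t := &Tokeniser{weights: Weights{
		NounDeterminer: 2.0, VerbAuxiliary: 2.0, VerbInfinitive: 1.5,
	}}
	for _, opt := range opts {
		opt(t)
	}
	return t
}

func main() {
	def := NewTokeniser() // backwards-compatible call
	tuned := NewTokeniser(WithWeights(Weights{VerbAuxiliary: 3.0}))
	fmt.Println(def.weights.VerbAuxiliary, tuned.weights.VerbAuxiliary)
}
```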
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Every classified token now carries a Confidence score (1.0 for
unambiguous tokens). SignalBreakdown and SignalComponent types
provide detailed scoring for dual-class disambiguation.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add "test", "check", and "file" as verbs and "run" and "build" as nouns
so the tokeniser can detect them in both grammatical roles. Add 15
contractions to the verb_auxiliary signal list for dev-text support.
Update reversal tests to use noun-only words ("branch") in test
phrases to avoid dual-class ambiguity until disambiguation (Task 5).
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Normalise signal words to lowercase on load (defensive against
mixed-case entries in locale JSON). Strengthen test assertions
with expected counts and spot-checks. Clarify Priors field comment.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Load noun_determiner, verb_auxiliary, and verb_infinitive word lists
from gram.signal in locale JSON. Reserve Priors field for future
corpus-derived per-word disambiguation priors.
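Loading the lists might look like the sketch below; the JSON shape is an assumption based on the list names, and the lowercase normalisation matches the defensive fix later in this log.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// signalData mirrors the assumed gram.signal shape in locale JSON.
type signalData struct {
	NounDeterminer []string `json:"noun_determiner"`
	VerbAuxiliary  []string `json:"verb_auxiliary"`
	VerbInfinitive []string `json:"verb_infinitive"`
}

// toSet lowercases on load, defending against mixed-case locale entries.
func toSet(words []string) map[string]bool {
	set := make(map[string]bool, len(words))
	for _, w := range words {
		set[strings.ToLower(w)] = true
	}
	return set
}

func main() {
	raw := `{"noun_determiner":["the","a","An"],"verb_auxiliary":["will","don't"],"verb_infinitive":["to"]}`
	var sig signalData
	if err := json.Unmarshal([]byte(raw), &sig); err != nil {
		panic(err)
	}
	det := toSet(sig.NounDeterminer)
	fmt.Println(det["an"], det["An"]) // mixed-case entry found via lowercase key
}
```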
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multi-signal probabilistic disambiguation with two-pass tokenisation.
Seven weighted signals resolve verb/noun ambiguity for words like
"commit", "run", "test", "check", "file", "build". Confidence scores
flow into imprints for the scoring/comprehension use case.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CLAUDE.md documents the grammar engine contract and sacred rules.
TODO.md is the task dispatch queue from core/go orchestration.
FINDINGS.md captures research and architectural decisions.
Co-Authored-By: Virgil <virgil@lethean.io>
Add !, ;, and , to splitTrailingPunct and matchPunctuation.
Previously only ..., ?, and : were recognised.
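The expanded splitter can be sketched as follows; the function signature is an assumption, but the punctuation set is exactly the one listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTrailingPunct separates a recognised trailing punctuation mark
// ("...", "?", ":", "!", ";", ",") from the word core.
func splitTrailingPunct(word string) (core, punct string) {
	if strings.HasSuffix(word, "...") {
		return word[:len(word)-3], "..."
	}
	switch {
	case strings.HasSuffix(word, "?"), strings.HasSuffix(word, ":"),
		strings.HasSuffix(word, "!"), strings.HasSuffix(word, ";"),
		strings.HasSuffix(word, ","):
		return word[:len(word)-1], word[len(word)-1:]
	}
	return word, "" // no recognised trailing punctuation
}

func main() {
	core, punct := splitTrailingPunct("done!")
	fmt.Println(core, punct)
}
```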
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reverse grammar tables into pattern matchers. 3-tier lookup:
JSON grammar data → irregular verb maps → regular morphology rules.
Verified by round-tripping through forward functions.
Export IrregularVerbs() and IrregularNouns() so the reversal engine
reads from the authoritative source instead of a duplicate list.
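The tiered lookup reads roughly like this sketch: the JSON-grammar tier is elided, and the tiny irregular table is illustrative rather than the exported IrregularVerbs() data.

```go
package main

import (
	"fmt"
	"strings"
)

// irregularPast is a toy stand-in for the authoritative irregular table.
var irregularPast = map[string]string{"ran": "run", "went": "go", "built": "build"}

// baseForm maps a past-tense form back to its base form.
func baseForm(word string) string {
	if base, ok := irregularPast[word]; ok {
		return base // tier: irregular verb map
	}
	if strings.HasSuffix(word, "ed") {
		return strings.TrimSuffix(word, "ed") // tier: regular morphology rule
	}
	return word // already a base form
}

func main() {
	fmt.Println(baseForm("ran"), baseForm("tested"))
}
```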
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Grammar engine as zero-cost data augmentation: tense/number/formality
flips across 88K seeds = 528K+ verified training examples with no API
spend. Reversal engine provides automatic QA on transformed variants.
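The arithmetic implied above works out to six transformed variants per seed; the exact flip set producing that factor is an assumption.

```go
package main

import "fmt"

func main() {
	seeds := 88_000
	variantsPerSeed := 6 // implied by 528K+ / 88K; exact flip set is assumed
	fmt.Println(seeds * variantsPerSeed)
}
```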
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Captures the bidirectional grammar engine idea: using go-i18n tables
in reverse as a deterministic parser to extract semantic imprints from
documents without retaining content. Covers TIM/DataNode architecture,
88K seed calibration, Poindexter integration, and privacy properties.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>