CalibrateDomains() accepts two inference.TextModel instances and a corpus
of CalibrationSamples, classifies every sample with both models, and
computes the agreement rate, per-domain distribution, confusion pairs,
and accuracy against ground truth.
- calibrate.go: CalibrateDomains + classifyAll batch helper
- calibrate_test.go: 7 mock tests (agreement, disagreement, mixed,
no ground truth, empty, batch boundary, results slice)
- integration/calibrate_test.go: 500-sample corpus (220 ground-truth
+ 280 unlabelled) for real 1B vs 27B model comparison
- TODO.md: Phase 2a calibration task marked complete
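The comparison flow above can be sketched as follows. This is a minimal, hedged mock of the described API: the Report field names, the Classify method on the model interface, and the `keyword` mock are illustrative assumptions, not the real go-inference surface.

```go
package main

import "fmt"

// CalibrationSample pairs text with an optional ground-truth domain.
type CalibrationSample struct {
	Text   string
	Domain string // "" when unlabelled
}

// TextModel stands in for inference.TextModel; Classify is an assumption.
type TextModel interface {
	Classify(text string) (string, error)
}

// Report holds the metrics named in the commit message.
type Report struct {
	AgreementRate float64
	DomainCounts  map[string]int    // per-domain distribution (model A)
	Confusions    map[[2]string]int // (domainA, domainB) disagreement pairs
	AccuracyA     float64           // vs ground truth, labelled samples only
	AccuracyB     float64
}

func CalibrateDomains(a, b TextModel, corpus []CalibrationSample) (Report, error) {
	r := Report{DomainCounts: map[string]int{}, Confusions: map[[2]string]int{}}
	agree, labelled, okA, okB := 0, 0, 0, 0
	for _, s := range corpus {
		da, err := a.Classify(s.Text)
		if err != nil {
			return r, err
		}
		db, err := b.Classify(s.Text)
		if err != nil {
			return r, err
		}
		r.DomainCounts[da]++
		if da == db {
			agree++
		} else {
			r.Confusions[[2]string{da, db}]++
		}
		if s.Domain != "" { // only labelled samples count toward accuracy
			labelled++
			if da == s.Domain {
				okA++
			}
			if db == s.Domain {
				okB++
			}
		}
	}
	if n := len(corpus); n > 0 {
		r.AgreementRate = float64(agree) / float64(n)
	}
	if labelled > 0 {
		r.AccuracyA = float64(okA) / float64(labelled)
		r.AccuracyB = float64(okB) / float64(labelled)
	}
	return r, nil
}

// keyword is a trivial lookup-table mock model for demonstration.
type keyword map[string]string

func (k keyword) Classify(text string) (string, error) {
	if d, ok := k[text]; ok {
		return d, nil
	}
	return "general", nil
}

func main() {
	a := keyword{"git rebase": "technical", "roast lamb": "cooking"}
	b := keyword{"git rebase": "technical"} // disagrees on the cooking sample
	corpus := []CalibrationSample{
		{Text: "git rebase", Domain: "technical"},
		{Text: "roast lamb", Domain: "cooking"},
	}
	rep, _ := CalibrateDomains(a, b, corpus)
	fmt.Printf("agreement=%.2f accA=%.2f accB=%.2f\n",
		rep.AgreementRate, rep.AccuracyA, rep.AccuracyB)
}
```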
Co-Authored-By: Virgil <virgil@lethean.io>
Single-token generation validators for article correctness and irregular
verb form accuracy, leveraging the 1B model's 100% accuracy on both tasks.
Includes batch variants and 14 mock tests covering correct/wrong/empty/
context-cancellation/whitespace cases.
Co-Authored-By: Virgil <virgil@lethean.io>
- Remove go-mlx from go.mod (breaks non-darwin builds)
- Fix go-inference pseudo-version for CI compatibility
- Fix mapTokenToDomain prefix collision (castle, credential)
- Add testing.Short() skip to slow classification benchmarks
- Add 80% accuracy threshold to integration test
Integration test moved to integration/ sub-module with its own go.mod
to cleanly isolate go-mlx dependency from the main module.
Co-Authored-By: Virgil <virgil@lethean.io>
Review of ClassifyCorpus implementation found:
- go-mlx in go.mod breaks non-Mac builds (should be go.work only)
- go-inference v0.0.0 invalid pseudo-version
- mapTokenToDomain prefix collision (cas → castle/cascade)
- classify_bench_test.go runs slow tests on every go test
- integration test has no domain accuracy assertion
Co-Authored-By: Virgil <virgil@lethean.io>
Integration tested at 80 prompts/sec with Gemma3-1B on M3 Ultra.
100% domain accuracy on technical prompts. go-ai dependency resolved
via direct go-inference + go-mlx imports.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5-task TDD plan: dependency setup, types+mapper, ClassifyCorpus with
mock tests, integration test with real model, docs update.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Streaming batch classification via go-inference Classify() API.
Package-level ClassifyCorpus() function with configurable batch size,
prompt template, and mock-friendly TextModel interface.
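The configurable surface described above could look like the following sketch. The option names, default batch size, and default template are assumptions; only ClassifyCorpus and the mock-friendly TextModel interface come from the commit itself.

```go
package main

import "fmt"

// TextModel is the mock-friendly seam: tests inject a fake, production
// wires in the go-inference model.
type TextModel interface {
	Classify(prompt string) (string, error)
}

type options struct {
	batchSize int
	template  string // %s receives the document text
}

type Option func(*options)

func WithBatchSize(n int) Option  { return func(o *options) { o.batchSize = n } }
func WithTemplate(t string) Option { return func(o *options) { o.template = t } }

// ClassifyCorpus classifies docs batch by batch, preserving input order.
func ClassifyCorpus(m TextModel, docs []string, opts ...Option) ([]string, error) {
	o := options{batchSize: 32, template: "Classify the domain of: %s"}
	for _, opt := range opts {
		opt(&o)
	}
	out := make([]string, 0, len(docs))
	for start := 0; start < len(docs); start += o.batchSize {
		end := start + o.batchSize
		if end > len(docs) {
			end = len(docs) // final partial batch (the "batch boundary" case)
		}
		for _, d := range docs[start:end] {
			label, err := m.Classify(fmt.Sprintf(o.template, d))
			if err != nil {
				return nil, err
			}
			out = append(out, label)
		}
	}
	return out, nil
}

// fixed always answers the same domain, standing in for a mock test double.
type fixed struct{}

func (fixed) Classify(p string) (string, error) { return "technical", nil }

func main() {
	labels, _ := ClassifyCorpus(fixed{}, []string{"a", "b", "c"}, WithBatchSize(2))
	fmt.Println(len(labels), labels[0])
}
```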
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
R1: Remove "passed", "failed", "skipped" from gram.noun and
gram.word — these are past participles, not nouns.
R2: Add DisambiguationStats and WithWeights tests to
tokeniser_test.go using setup(t) pattern. Remove duplicates
from roundtrip_test.go.
R3: Guard buildSignalIndex per-field so partial locale data
falls back independently rather than silently disabling signals.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update TODO.md and FINDINGS.md with implementation details,
signal weight table, and test coverage summary. Note expanded
dual-class candidates for Phase 2.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DisambiguationStatsFromTokens provides aggregate disambiguation
metrics for Phase 2 calibration. Round-trip tests verify all 6
dual-class words disambiguate correctly in both verb and noun
contexts, and that same-role imprints converge while different-role
imprints diverge.
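An aggregate of this kind might be shaped as below. The Token and stats field sets are assumptions; the real types carry more fields.

```go
package main

import "fmt"

// Token mirrors the confidence fields described in these commits.
type Token struct {
	Text       string
	Confidence float64
	AltConf    float64 // > 0 only for dual-class tokens
}

// DisambiguationStats is an assumed minimal metric set for calibration.
type DisambiguationStats struct {
	Total          int
	Ambiguous      int // tokens with AltConf > 0
	MeanConfidence float64
}

func DisambiguationStatsFromTokens(toks []Token) DisambiguationStats {
	var s DisambiguationStats
	sum := 0.0
	for _, t := range toks {
		s.Total++
		sum += t.Confidence
		if t.AltConf > 0 {
			s.Ambiguous++
		}
	}
	if s.Total > 0 {
		s.MeanConfidence = sum / float64(s.Total)
	}
	return s
}

func main() {
	toks := []Token{
		{Text: "please", Confidence: 1.0},
		{Text: "commit", Confidence: 0.8, AltConf: 0.2}, // dual-class word
	}
	s := DisambiguationStatsFromTokens(toks)
	fmt.Printf("%d/%d ambiguous, mean conf %.2f\n", s.Ambiguous, s.Total, s.MeanConfidence)
}
```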
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Transformed tokens get Confidence 1.0 since the transformation
is deterministic and unambiguous.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dual-class tokens contribute to both verb and noun distributions
weighted by Confidence and AltConf. Non-ambiguous tokens (Confidence
1.0, AltConf 0.0) behave identically to before.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify SignalBreakdown is populated when WithSignals() is set and
nil when not. Check individual signal components fire correctly.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NewTokeniser now accepts variadic options (backwards compatible).
Builds dual-class index from verb∩noun overlap and signal word
lookup sets from gram.signal data. Configurable weights via
WithWeights() for future calibration.
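The variadic-option pattern described here keeps zero-argument call sites compiling. A sketch, with illustrative weight fields and defaults (not the calibrated values):

```go
package main

import "fmt"

// Weights holds signal weights; field set and defaults are assumptions.
type Weights struct {
	NounDeterminer float64
	VerbAuxiliary  float64
	VerbInfinitive float64
}

type Tokeniser struct {
	weights Weights
}

type Option func(*Tokeniser)

// WithWeights overrides the defaults, enabling future calibration.
func WithWeights(w Weights) Option { return func(t *Tokeniser) { t.weights = w } }

// NewTokeniser accepts zero options, so existing callers are unaffected.
func NewTokeniser(opts ...Option) *Tokeniser {
	t := &Tokeniser{weights: Weights{
		NounDeterminer: 2.0, VerbAuxiliary: 2.0, VerbInfinitive: 1.5,
	}}
	for _, opt := range opts {
		opt(t)
	}
	return t
}

func main() {
	def := NewTokeniser() // backwards-compatible call
	tuned := NewTokeniser(WithWeights(Weights{VerbAuxiliary: 3.0}))
	fmt.Println(def.weights.VerbAuxiliary, tuned.weights.VerbAuxiliary)
}
```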
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Every classified token now carries a Confidence score (1.0 for
unambiguous tokens). SignalBreakdown and SignalComponent types
provide detailed scoring for dual-class disambiguation.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add "test", "check", and "file" as verbs and "run" and "build" as nouns
so the tokeniser can detect them in both grammatical roles. Add 15
contractions to the verb_auxiliary signal list for dev-text support.
Update reversal tests to use noun-only words ("branch") in test
phrases to avoid dual-class ambiguity until disambiguation (Task 5).
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Normalise signal words to lowercase on load (defensive against
mixed-case entries in locale JSON). Strengthen test assertions
with expected counts and spot-checks. Clarify Priors field comment.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Load noun_determiner, verb_auxiliary, and verb_infinitive word lists
from gram.signal in locale JSON. Reserve Priors field for future
corpus-derived per-word disambiguation priors.
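Loading the lists might look like the sketch below; the JSON shape is an assumption based on the list names, and the lowercase normalisation matches the defensive fix later in this log.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// signalData mirrors the assumed gram.signal shape in locale JSON.
type signalData struct {
	NounDeterminer []string `json:"noun_determiner"`
	VerbAuxiliary  []string `json:"verb_auxiliary"`
	VerbInfinitive []string `json:"verb_infinitive"`
}

// toSet lowercases on load, defending against mixed-case locale entries.
func toSet(words []string) map[string]bool {
	set := make(map[string]bool, len(words))
	for _, w := range words {
		set[strings.ToLower(w)] = true
	}
	return set
}

func main() {
	raw := `{"noun_determiner":["the","a","An"],"verb_auxiliary":["will","don't"],"verb_infinitive":["to"]}`
	var sig signalData
	if err := json.Unmarshal([]byte(raw), &sig); err != nil {
		panic(err)
	}
	det := toSet(sig.NounDeterminer)
	fmt.Println(det["an"], det["An"]) // mixed-case entry found via lowercase key
}
```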
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multi-signal probabilistic disambiguation with two-pass tokenisation.
Seven weighted signals resolve verb/noun ambiguity for words like
"commit", "run", "test", "check", "file", "build". Confidence scores
flow into imprints for the scoring/comprehension use case.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CLAUDE.md documents the grammar engine contract and sacred rules.
TODO.md is the task dispatch queue from core/go orchestration.
FINDINGS.md captures research and architectural decisions.
Co-Authored-By: Virgil <virgil@lethean.io>
Add !, ;, and , to splitTrailingPunct and matchPunctuation.
Previously only ..., ?, and : were recognised.
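The expanded splitter can be sketched as follows; the function signature is an assumption, but the punctuation set is exactly the one listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTrailingPunct separates a recognised trailing punctuation mark
// ("...", "?", ":", "!", ";", ",") from the word core.
func splitTrailingPunct(word string) (core, punct string) {
	if strings.HasSuffix(word, "...") {
		return word[:len(word)-3], "..."
	}
	switch {
	case strings.HasSuffix(word, "?"), strings.HasSuffix(word, ":"),
		strings.HasSuffix(word, "!"), strings.HasSuffix(word, ";"),
		strings.HasSuffix(word, ","):
		return word[:len(word)-1], word[len(word)-1:]
	}
	return word, "" // no recognised trailing punctuation
}

func main() {
	core, punct := splitTrailingPunct("done!")
	fmt.Println(core, punct)
}
```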
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reverse grammar tables into pattern matchers. 3-tier lookup:
JSON grammar data → irregular verb maps → regular morphology rules.
Verified by round-tripping through forward functions.
Export IrregularVerbs() and IrregularNouns() so the reversal engine
reads from the authoritative source instead of a duplicate list.
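The tiered lookup reads roughly like this sketch: the JSON-grammar tier is elided, and the tiny irregular table is illustrative rather than the exported IrregularVerbs() data.

```go
package main

import (
	"fmt"
	"strings"
)

// irregularPast is a toy stand-in for the authoritative irregular table.
var irregularPast = map[string]string{"ran": "run", "went": "go", "built": "build"}

// baseForm maps a past-tense form back to its base form.
func baseForm(word string) string {
	if base, ok := irregularPast[word]; ok {
		return base // tier: irregular verb map
	}
	if strings.HasSuffix(word, "ed") {
		return strings.TrimSuffix(word, "ed") // tier: regular morphology rule
	}
	return word // already a base form
}

func main() {
	fmt.Println(baseForm("ran"), baseForm("tested"))
}
```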
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Grammar engine as zero-cost data augmentation: tense/number/formality
flips across 88K seeds = 528K+ verified training examples with no API
spend. Reversal engine provides automatic QA on transformed variants.
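The arithmetic implied above works out to six transformed variants per seed; the exact flip set producing that factor is an assumption.

```go
package main

import "fmt"

func main() {
	seeds := 88_000
	variantsPerSeed := 6 // implied by 528K+ / 88K; exact flip set is assumed
	fmt.Println(seeds * variantsPerSeed)
}
```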
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Captures the bidirectional grammar engine idea: using go-i18n tables
in reverse as a deterministic parser to extract semantic imprints from
documents without retaining content. Covers TIM/DataNode architecture,
88K seed calibration, Poindexter integration, and privacy properties.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>