CalibrateDomains() accepts two inference.TextModel instances and a corpus
of CalibrationSamples, classifies every sample with both models, and
computes the agreement rate, per-domain distribution, confusion pairs,
and, for samples that carry a ground-truth label, accuracy.
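
A rough sketch of how those summary figures could be derived once both
models have produced one domain label per sample. This is not the code
in calibrate.go: Sample, Stats, and summarize are hypothetical names
invented for illustration, and the real CalibrateDomains signature and
result type may differ.

```go
package main

import "fmt"

// Sample is a hypothetical stand-in for a CalibrationSample.
type Sample struct {
	Text       string
	TrueDomain string // empty when the sample has no ground-truth label
}

// Stats is a hypothetical result type holding the figures named above.
type Stats struct {
	Total, Agreed, WithTruth int
	CorrectA, CorrectB       int
	AgreementRate            float64
	AccuracyA, AccuracyB     float64
	ByDomainA, ByDomainB     map[string]int
	ConfusionPairs           map[string]int // keyed "domainA->domainB"
}

// summarize compares two parallel slices of predicted domains, one per model.
func summarize(samples []Sample, a, b []string) (Stats, error) {
	if len(a) != len(samples) || len(b) != len(samples) {
		return Stats{}, fmt.Errorf("got %d and %d predictions for %d samples",
			len(a), len(b), len(samples))
	}
	st := Stats{
		Total:          len(samples),
		ByDomainA:      map[string]int{},
		ByDomainB:      map[string]int{},
		ConfusionPairs: map[string]int{},
	}
	for i, s := range samples {
		st.ByDomainA[a[i]]++
		st.ByDomainB[b[i]]++
		if a[i] == b[i] {
			st.Agreed++
		} else {
			st.ConfusionPairs[a[i]+"->"+b[i]]++
		}
		if s.TrueDomain == "" {
			continue // unlabelled samples count toward agreement only
		}
		st.WithTruth++
		if a[i] == s.TrueDomain {
			st.CorrectA++
		}
		if b[i] == s.TrueDomain {
			st.CorrectB++
		}
	}
	// Guard both ratios so an empty or fully unlabelled corpus yields 0, not NaN.
	if st.Total > 0 {
		st.AgreementRate = float64(st.Agreed) / float64(st.Total)
	}
	if st.WithTruth > 0 {
		st.AccuracyA = float64(st.CorrectA) / float64(st.WithTruth)
		st.AccuracyB = float64(st.CorrectB) / float64(st.WithTruth)
	}
	return st, nil
}

func main() {
	samples := []Sample{
		{"fix the null pointer", "technical"},
		{"a poem about rain", "creative"},
		{"hey, what's up", ""},
		{"lol same", "casual"},
	}
	modelA := []string{"technical", "creative", "casual", "casual"}
	modelB := []string{"technical", "ethical", "casual", "technical"}

	st, err := summarize(samples, modelA, modelB)
	if err != nil {
		panic(err)
	}
	fmt.Printf("agreement %.2f, accuracy A %.2f, B %.2f\n",
		st.AgreementRate, st.AccuracyA, st.AccuracyB)
	fmt.Println("confusion pairs:", st.ConfusionPairs)
}
```

Agreement is measured over the whole corpus, while accuracy uses only
the labelled subset, so the two denominators differ.
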
- calibrate.go: CalibrateDomains + classifyAll batch helper
- calibrate_test.go: 7 mock tests (agreement, disagreement, mixed,
no ground truth, empty, batch boundary, results slice)
- integration/calibrate_test.go: 500-sample corpus (220 ground-truth
+ 280 unlabelled) for real 1B vs 27B model comparison
- TODO.md: Phase 2a calibration task marked complete
Co-Authored-By: Virgil <virgil@lethean.io>
- Remove go-mlx from go.mod (breaks non-darwin builds)
- Fix go-inference pseudo-version for CI compatibility
- Fix mapTokenToDomain prefix collision (castle, credential)
- Add testing.Short() skip to slow classification benchmarks
- Add 80% accuracy threshold to integration test
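
To illustrate the kind of prefix collision named above, here is a
hedged sketch. It is not the actual mapTokenToDomain: the domain list,
the three-letter prefix rule, and both function names (naiveMap,
strictMap) are assumptions made for the example, and the real fix may
work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// domains is an assumed label set, used only for this illustration.
var domains = []string{"technical", "creative", "ethical", "casual"}

// naiveMap accepts any token that starts with the first three letters of a
// domain name, so "castle" collides with "casual" and "credential" with
// "creative".
func naiveMap(token string) string {
	t := strings.ToLower(strings.TrimSpace(token))
	for _, d := range domains {
		if strings.HasPrefix(t, d[:3]) {
			return d
		}
	}
	return "unknown"
}

// strictMap reverses the check: the token must itself be a prefix of the
// domain name (at least three characters long), so unrelated words that
// merely share leading letters are rejected.
func strictMap(token string) string {
	t := strings.ToLower(strings.TrimSpace(token))
	if len(t) < 3 {
		return "unknown"
	}
	for _, d := range domains {
		if strings.HasPrefix(d, t) {
			return d
		}
	}
	return "unknown"
}

func main() {
	for _, tok := range []string{"castle", "credential", "tech", "creative"} {
		fmt.Printf("%-10s naive=%-9s strict=%s\n", tok, naiveMap(tok), strictMap(tok))
	}
}
```
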
The integration test now lives in an integration/ sub-module with its
own go.mod, which keeps the go-mlx dependency out of the main module.
Co-Authored-By: Virgil <virgil@lethean.io>