CalibrateDomains() accepts two inference.TextModel instances and a corpus
of CalibrationSamples, classifies all with both models, and computes
agreement rate, per-domain distribution, confusion pairs, and accuracy
vs ground truth.
- calibrate.go: CalibrateDomains + classifyAll batch helper
- calibrate_test.go: 7 mock tests (agreement, disagreement, mixed,
no ground truth, empty, batch boundary, results slice)
- integration/calibrate_test.go: 500-sample corpus (220 ground-truth
+ 280 unlabelled) for real 1B vs 27B model comparison
- TODO.md: Phase 2a calibration task marked complete
Co-Authored-By: Virgil <virgil@lethean.io>