Phase 0: Push test coverage from 92.1% to 100% by adding catalog_test.go and targeted tests for all uncovered branches in search.go. Add BenchmarkSearch with 150 topics (~745us/op baseline).

Phase 1: Implement three search improvements:
- Levenshtein-based fuzzy matching (max distance 2, words >= 3 chars)
- Quoted phrase search via extractPhrases() with +8.0 boost
- Tag boost (+3.0) and multi-word bonus (+2.0) scoring
- Named scoring constants replacing magic numbers

All changes are backward-compatible; Search() signature unchanged.

Co-Authored-By: Charon <developers@lethean.io>
FINDINGS.md -- go-help
2026-02-19: Split from core/go (Virgil)
Origin
Extracted from forge.lthn.ai/core/go pkg/help/ on 19 Feb 2026.
Architecture
- Topic struct with title, body, tags, related topics
- Section groups topics under a heading
- Frontmatter for YAML metadata in topic files
- Catalog loads topics from YAML files on disk
- Search provides keyword search across topics with scoring:
  - Title match: +10
  - Section match: +5
  - Partial/body match: +0.5
Dependencies
- Pure Go; the only external dependency is gopkg.in/yaml.v3
Tests
- 2 test files covering catalog loading and search behaviour
2026-02-20: Phase 0 + Phase 1 (Charon)
Phase 0: Coverage 92.1% -> 100%
- Created catalog_test.go; the entire catalog.go was previously untested (0%)
- Added targeted search tests for previously uncovered branches:
  - Nil topic guard in Search() (stale index references)
  - Alphabetical tie-breaking when scores are equal
  - Headings-only content in snippet extraction (no body text)
  - Whitespace-only content trimmed to empty in snippets
  - Empty regex slice in highlight()
  - Overlapping match extension in highlight merging
- Added BenchmarkSearch with 150 generated topics
  - Baseline: ~745us/op, ~392KB/op, 4114 allocs/op (Ryzen 9 9950X)
- go vet ./... clean
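The alphabetical tie-breaking branch listed above can be illustrated with a small comparator. This is only a sketch: the result type and the sortResults helper are hypothetical stand-ins, and it assumes ties are broken on the topic title, which the notes do not state explicitly.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for the package's search result type.
type result struct {
	Title string
	Score float64
}

// sortResults orders results by descending score. When two scores are
// equal it falls back to ascending title order, so output is deterministic.
func sortResults(rs []result) {
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	rs := []result{{"Beta", 1.5}, {"Alpha", 1.5}, {"Gamma", 3.0}}
	sortResults(rs)
	for _, r := range rs {
		fmt.Println(r.Title, r.Score)
	}
}
```

With this input the order is Gamma, then Alpha, then Beta: the highest score first, and the two 1.5 results in title order.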
Phase 1: Search Improvements
Fuzzy Matching (Levenshtein distance)
- Implemented levenshtein() using two-row DP (memory-efficient)
- Integrated into Search() with max edit distance of 2
- Only applied to query words >= 3 characters (avoids noise from short words)
- Score: +0.3 per fuzzy match (lower than prefix +0.5 and exact +1.0)
- Skips words already matched as exact or prefix (no double-counting)
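A minimal sketch of the two-row approach described above, not the package's actual code. The levenshtein name and fuzzyMaxDistance come from these notes; fuzzyMinLength, min3 and fuzzyMatches are hypothetical names introduced for illustration, and comparing by rune rather than by byte is an assumption.

```go
package main

import "fmt"

const (
	fuzzyMaxDistance = 2 // from the notes: max Levenshtein distance
	fuzzyMinLength   = 3 // hypothetical name for the ">= 3 characters" rule
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b. It keeps only two
// rows of the DP table, so memory is O(len(b)) instead of O(len(a)*len(b)).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // reuse the old row on the next pass
	}
	return prev[len(rb)]
}

// fuzzyMatches returns the indexed words within fuzzyMaxDistance of word.
// Distance 0 is skipped because exact matches are scored separately.
func fuzzyMatches(word string, indexed []string) []string {
	if len([]rune(word)) < fuzzyMinLength {
		return nil
	}
	var out []string
	for _, w := range indexed {
		if d := levenshtein(word, w); d > 0 && d <= fuzzyMaxDistance {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
	fmt.Println(fuzzyMatches("serch", []string{"search", "server"}))
}
```

The sketch omits the prefix-match exclusion mentioned above; a real integration would also skip indexed words that the query word already matched as a prefix.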
Phrase Search
- extractPhrases() pulls "quoted strings" from the query
- Remaining text is tokenised normally for keyword search
- Phrase matching checks title + content + all section content (case-insensitive)
- Phrase boost: +8.0 per matching phrase
- Phrase terms are also compiled as regexes for snippet highlighting
- Empty quotes "" are left as-is (regex requires [^"]+)
- Whitespace-only quotes are ignored
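The phrase handling above might look roughly like the following. The extractPhrases name, the [^"]+ pattern and the +8.0 boost come from these notes; the return signature, the phraseRe variable and the phraseScore helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const scorePhraseBoost = 8.0

// phraseRe requires at least one non-quote character between the quotes,
// so an empty "" never matches and stays in the remaining query text.
var phraseRe = regexp.MustCompile(`"([^"]+)"`)

// extractPhrases returns the quoted phrases and the query with those
// phrases removed. The signature is assumed, not taken from the package.
func extractPhrases(query string) (phrases []string, remaining string) {
	remaining = phraseRe.ReplaceAllStringFunc(query, func(m string) string {
		p := strings.TrimSpace(m[1 : len(m)-1]) // drop the surrounding quotes
		if p != "" {
			phrases = append(phrases, p) // whitespace-only quotes are ignored
		}
		return " "
	})
	return phrases, strings.Join(strings.Fields(remaining), " ")
}

// phraseScore adds the boost once per phrase found, case-insensitively,
// in any of the given texts (title, content, section content).
func phraseScore(phrases []string, texts ...string) float64 {
	hay := strings.ToLower(strings.Join(texts, "\n"))
	var score float64
	for _, p := range phrases {
		if strings.Contains(hay, strings.ToLower(p)) {
			score += scorePhraseBoost
		}
	}
	return score
}

func main() {
	phrases, rest := extractPhrases(`"rate limit" config`)
	fmt.Printf("%q %q\n", phrases, rest)
	fmt.Println(phraseScore(phrases, "Rate Limiting", "How to configure limits"))
}
```

Joining the texts with a newline keeps a phrase from matching across the boundary between, say, a title and the body.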
Improved Scoring Weights
- Replaced magic numbers with named constants for clarity:
  - scoreExactWord = 1.0 -- exact word in index
  - scorePrefixWord = 0.5 -- prefix/partial word match
  - scoreFuzzyWord = 0.3 -- Levenshtein fuzzy match
  - scoreTitleBoost = 10.0 -- query word in topic title
  - scoreSectionBoost = 5.0 -- query word in section title
  - scoreTagBoost = 3.0 -- query word matches a tag (NEW)
  - scorePhraseBoost = 8.0 -- exact phrase match (NEW)
  - scoreAllWords = 2.0 -- all query words present (NEW)
  - fuzzyMaxDistance = 2 -- max Levenshtein distance
New Scoring Features
- Tag boost (+3.0): topics with tags matching query words rank higher
- Multi-word bonus (+2.0): topics containing ALL query words get a bonus
- Both are additive with existing boosts (title, section, exact/prefix)
API Compatibility
- Search(query string) []*SearchResult signature unchanged
- All existing behaviour preserved; new features are additive
- Existing tests pass without modification