feat(search): add fuzzy matching, phrase search, and improved scoring

Phase 0: Push test coverage from 92.1% to 100% by adding catalog_test.go
and targeted tests for all uncovered branches in search.go. Add
BenchmarkSearch with 150 topics (~745us/op baseline).

Phase 1: Implement search improvements:
- Levenshtein-based fuzzy matching (max distance 2, words >= 3 chars)
- Quoted phrase search via extractPhrases() with +8.0 boost
- Tag boost (+3.0) and multi-word bonus (+2.0) scoring
- Named scoring constants replacing magic numbers

All changes are backward-compatible; Search() signature unchanged.

Co-Authored-By: Charon <developers@lethean.io>

2026-02-20 01:21:35 +00:00


FINDINGS.md -- go-help

2026-02-19: Split from core/go (Virgil)

Origin

Extracted from forge.lthn.ai/core/go pkg/help/ on 19 Feb 2026.

Architecture

Topic struct with title, body, tags, related topics
Section groups topics under a heading
Frontmatter for YAML metadata in topic files
Catalog loads topics from YAML files on disk
Search provides keyword search across topics with scoring:
- Title match: +10
- Section match: +5
- Partial/body match: +0.5

Dependencies

Pure Go; the only external dependency is gopkg.in/yaml.v3

Tests

2 test files covering catalog loading and search behaviour

2026-02-20: Phase 0 + Phase 1 (Charon)

Phase 0: Coverage 92.1% -> 100%

Created catalog_test.go; catalog.go previously had no tests at all (0% coverage)
Added targeted search tests for previously uncovered branches:
- Nil topic guard in Search() (stale index references)
- Alphabetical tie-breaking when scores are equal
- Headings-only content in snippet extraction (no body text)
- Whitespace-only content trimmed to empty in snippets
- Empty regex slice in highlight()
- Overlapping match extension in highlight merging
Added BenchmarkSearch with 150 generated topics
- Baseline: ~745us/op, ~392KB/op, 4114 allocs/op (Ryzen 9 9950X)
go vet ./... clean
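
The alphabetical tie-breaking branch covered above can be illustrated with a small ordering sketch. The result type and the rankResults helper are hypothetical stand-ins, not the package's actual SearchResult or sort code, and breaking ties on the title is an assumption:

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for the package's SearchResult.
type result struct {
	Title string
	Score float64
}

// rankResults orders results by descending score. When two scores are
// equal, it falls back to ascending title order (assumed tie-break key).
func rankResults(rs []result) {
	sort.SliceStable(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	rs := []result{{"Zebra", 10}, {"Alpha", 10}, {"Config", 12.5}}
	rankResults(rs)
	for _, r := range rs {
		fmt.Printf("%-6s %.1f\n", r.Title, r.Score)
	}
}
```

A deterministic secondary key keeps result order stable between runs, which is what makes this branch testable in the first place.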

Phase 1: Search Improvements

Fuzzy Matching (Levenshtein distance)

Implemented levenshtein() using two-row DP (memory-efficient)
Integrated into Search() with max edit distance of 2
Only applied to query words >= 3 characters (avoids noise from short words)
Score: +0.3 per fuzzy match (lower than prefix +0.5 and exact +1.0)
Skips words already matched as exact or prefix (no double-counting)
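
As a rough illustration of the two-row approach described above, the sketch below computes the edit distance while keeping only the previous and current DP rows. It is not the package's actual code: fuzzyMinLength, min3, and fuzzyMatch are hypothetical names, and the prefix-match skip is left out for brevity.

```go
package main

import "fmt"

const (
	fuzzyMaxDistance = 2 // max edit distance, as listed in the findings
	fuzzyMinLength   = 3 // hypothetical name for the ">= 3 characters" rule
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b.
// Only two rows of the DP table are held in memory at any time.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // reuse the old row on the next pass
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether queryWord is a near miss of indexWord.
// Distance 0 is excluded because exact matches are scored separately.
func fuzzyMatch(queryWord, indexWord string) bool {
	if len([]rune(queryWord)) < fuzzyMinLength {
		return false
	}
	d := levenshtein(queryWord, indexWord)
	return d > 0 && d <= fuzzyMaxDistance
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
	fmt.Println(fuzzyMatch("confg", "config"))
}
```

Swapping the two slices instead of reallocating keeps memory use proportional to the shorter dimension of the table, which matters when the check runs against every indexed word.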

Phrase Search

extractPhrases() pulls "quoted strings" from the query
Remaining text is tokenised normally for keyword search
Phrase matching checks title + content + all section content (case-insensitive)
Phrase boost: +8.0 per matching phrase
Phrase terms are also compiled as regexes for snippet highlighting
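Empty quotes "" are not extracted as a phrase and stay in the query as-is (the phrase regex requires [^"]+, i.e. at least one character between the quotes)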
Whitespace-only quotes are ignored

Improved Scoring Weights

Replaced magic numbers with named constants for clarity:
- scoreExactWord = 1.0 -- exact word in index
- scorePrefixWord = 0.5 -- prefix/partial word match
- scoreFuzzyWord = 0.3 -- Levenshtein fuzzy match
- scoreTitleBoost = 10.0 -- query word in topic title
- scoreSectionBoost = 5.0 -- query word in section title
- scoreTagBoost = 3.0 -- query word matches a tag (NEW)
- scorePhraseBoost = 8.0 -- exact phrase match (NEW)
- scoreAllWords = 2.0 -- all query words present (NEW)
- fuzzyMaxDistance = 2 -- max Levenshtein distance

New Scoring Features

Tag boost (+3.0): topics with tags matching query words rank higher
Multi-word bonus (+2.0): topics containing ALL query words get a bonus
Both are additive with existing boosts (title, section, exact/prefix)

API Compatibility

Search(query string) []*SearchResult signature unchanged
All existing behaviour preserved; new features are additive
Existing tests pass without modification
