go-help/FINDINGS.md
Claude 3e91510bcf
feat(search): add fuzzy matching, phrase search, and improved scoring
Phase 0: Push test coverage from 92.1% to 100% by adding catalog_test.go
and targeted tests for all uncovered branches in search.go. Add
BenchmarkSearch with 150 topics (~745us/op baseline).

Phase 1: Implement three search improvements:
- Levenshtein-based fuzzy matching (max distance 2, words >= 3 chars)
- Quoted phrase search via extractPhrases() with +8.0 boost
- Tag boost (+3.0) and multi-word bonus (+2.0) scoring
- Named scoring constants replacing magic numbers

All changes are backward-compatible; Search() signature unchanged.

Co-Authored-By: Charon <developers@lethean.io>
2026-02-20 01:21:35 +00:00

FINDINGS.md -- go-help

2026-02-19: Split from core/go (Virgil)

Origin

Extracted from forge.lthn.ai/core/go pkg/help/ on 19 Feb 2026.

Architecture

  • Topic struct with title, body, tags, related topics
  • Section groups topics under a heading
  • Frontmatter for YAML metadata in topic files
  • Catalog loads topics from YAML files on disk
  • Search provides keyword search across topics with scoring:
    • Title match: +10
    • Section match: +5
    • Partial/body match: +0.5

Dependencies

  • Pure Go, only external dependency is gopkg.in/yaml.v3

Tests

  • 2 test files covering catalog loading and search behaviour

2026-02-20: Phase 0 + Phase 1 (Charon)

Phase 0: Coverage 92.1% -> 100%

  • Created catalog_test.go; catalog.go previously had no test coverage at all (0%)
  • Added targeted search tests for previously uncovered branches:
    • Nil topic guard in Search() (stale index references)
    • Alphabetical tie-breaking when scores are equal
    • Headings-only content in snippet extraction (no body text)
    • Whitespace-only content trimmed to empty in snippets
    • Empty regex slice in highlight()
    • Overlapping match extension in highlight merging
  • Added BenchmarkSearch with 150 generated topics
    • Baseline: ~745us/op, ~392KB/op, 4114 allocs/op (Ryzen 9 9950X)
  • go vet ./... clean
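
The alphabetical tie-breaking branch listed above can be illustrated with a small comparator. This is a minimal sketch, not the code in search.go: the `result` type and its `Title` and `Score` fields are hypothetical stand-ins, since the real SearchResult fields are not shown in these notes.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for a search hit; the real
// SearchResult type in go-help may have different fields.
type result struct {
	Title string
	Score float64
}

// sortResults orders by descending score, then by title (A to Z),
// so that equal scores always come out in the same order.
func sortResults(rs []result) {
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	rs := []result{
		{Title: "Zebra", Score: 10.5},
		{Title: "Config", Score: 12.0},
		{Title: "Alpha", Score: 10.5},
	}
	sortResults(rs)
	for _, r := range rs {
		fmt.Printf("%-8s %.1f\n", r.Title, r.Score)
	}
}
```

With a secondary key on the title, a test can assert an exact result order even when two topics score the same, which is what makes the tie-breaking branch coverable.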

Phase 1: Search Improvements

Fuzzy Matching (Levenshtein distance)

  • Implemented levenshtein() using two-row DP (memory-efficient)
  • Integrated into Search() with max edit distance of 2
  • Only applied to query words >= 3 characters (avoids noise from short words)
  • Score: +0.3 per fuzzy match (lower than prefix +0.5 and exact +1.0)
  • Skips words already matched as exact or prefix (no double-counting)
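
The two-row approach and the length/distance guards described above can be sketched as follows. This is an illustration under assumptions, not the code in search.go: `minFuzzyLen`, `min3`, and `fuzzyMatch` are hypothetical names, and the sketch only excludes exact matches (distance 0), whereas the notes say prefix matches are skipped as well.

```go
package main

import "fmt"

const (
	fuzzyMaxDistance = 2 // from the notes: max Levenshtein distance
	minFuzzyLen      = 3 // hypothetical name for the ">= 3 characters" rule
)

// levenshtein returns the edit distance between a and b. It keeps only
// two rows of the DP table (previous and current) instead of the full
// (len(a)+1) x (len(b)+1) matrix.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // reuse the old row on the next pass
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// fuzzyMatch reports whether queryWord is a near miss of indexWord.
// Distance 0 is excluded because exact matches are scored separately.
func fuzzyMatch(queryWord, indexWord string) bool {
	if len([]rune(queryWord)) < minFuzzyLen {
		return false
	}
	d := levenshtein(queryWord, indexWord)
	return d > 0 && d <= fuzzyMaxDistance
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
	fmt.Println(fuzzyMatch("serch", "search"))
	fmt.Println(fuzzyMatch("go", "do"))
}
```

Keeping two rows brings memory down from O(len(a) x len(b)) to O(len(b)), which matters when the distance is computed against many index words per query.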

Phrase Search

  • extractPhrases() pulls "quoted strings" from the query
  • Remaining text is tokenised normally for keyword search
  • Phrase matching checks title + content + all section content (case-insensitive)
  • Phrase boost: +8.0 per matching phrase
  • Phrase terms are also compiled as regexes for snippet highlighting
  • Empty quotes ("") are not extracted as a phrase and stay in the query as-is, because the phrase regex requires at least one character ([^"]+)
  • Whitespace-only quotes are ignored
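
The extraction rules above can be sketched as follows. The notes only give the function name and the `[^"]+` character class, so the return shape (phrases plus remaining text), the lowercasing of phrases, and the `matchesPhrase` helper are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// phraseRe matches a double-quoted run of at least one character,
// so an empty pair "" never matches and stays in the query.
var phraseRe = regexp.MustCompile(`"([^"]+)"`)

// extractPhrases splits a query into quoted phrases and the remaining
// text. The two-value return is an assumption, not the real signature.
func extractPhrases(query string) (phrases []string, remaining string) {
	for _, m := range phraseRe.FindAllStringSubmatch(query, -1) {
		// Whitespace-only quotes are ignored.
		if p := strings.TrimSpace(m[1]); p != "" {
			phrases = append(phrases, strings.ToLower(p))
		}
	}
	// Remove the quoted spans and collapse leftover whitespace so the
	// rest can be tokenised as ordinary keywords.
	rest := phraseRe.ReplaceAllString(query, " ")
	remaining = strings.Join(strings.Fields(rest), " ")
	return phrases, remaining
}

// matchesPhrase is a hypothetical helper: a case-insensitive substring
// check, assuming phrase is already lowercased.
func matchesPhrase(text, phrase string) bool {
	return strings.Contains(strings.ToLower(text), phrase)
}

func main() {
	phrases, rest := extractPhrases(`install "Error Handling" guide`)
	fmt.Printf("%q %q\n", phrases, rest)
	fmt.Println(matchesPhrase("Notes on error handling in Go", phrases[0]))
}
```

In this sketch, a query mixing an empty pair with a later quoted phrase can pair the quotes differently than intended, so any real implementation would need tests for that case.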

Improved Scoring Weights

  • Replaced magic numbers with named constants for clarity:
    • scoreExactWord = 1.0 -- exact word in index
    • scorePrefixWord = 0.5 -- prefix/partial word match
    • scoreFuzzyWord = 0.3 -- Levenshtein fuzzy match
    • scoreTitleBoost = 10.0 -- query word in topic title
    • scoreSectionBoost = 5.0 -- query word in section title
    • scoreTagBoost = 3.0 -- query word matches a tag (NEW)
    • scorePhraseBoost = 8.0 -- exact phrase match (NEW)
    • scoreAllWords = 2.0 -- all query words present (NEW)
    • fuzzyMaxDistance = 2 -- max Levenshtein distance
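
Written as Go, the constants listed above might look like the block below. The names and values come from the list; the `matchKind` type and `wordScore` function are hypothetical, added only to show how a single word gets one per-word score tier (exact, prefix, or fuzzy) with no double-counting.

```go
package main

import "fmt"

// Scoring constants, with names and values taken from the notes.
const (
	scoreExactWord    = 1.0  // exact word in index
	scorePrefixWord   = 0.5  // prefix/partial word match
	scoreFuzzyWord    = 0.3  // Levenshtein fuzzy match
	scoreTitleBoost   = 10.0 // query word in topic title
	scoreSectionBoost = 5.0  // query word in section title
	scoreTagBoost     = 3.0  // query word matches a tag
	scorePhraseBoost  = 8.0  // exact phrase match
	scoreAllWords     = 2.0  // all query words present
	fuzzyMaxDistance  = 2    // max Levenshtein distance
)

// matchKind is a hypothetical enum for how a query word hit the index.
type matchKind int

const (
	noMatch matchKind = iota
	exactMatch
	prefixMatch
	fuzzyMatch
)

// wordScore returns the per-word score for exactly one tier, so a word
// is never counted as both an exact and a fuzzy match.
func wordScore(k matchKind) float64 {
	switch k {
	case exactMatch:
		return scoreExactWord
	case prefixMatch:
		return scorePrefixWord
	case fuzzyMatch:
		return scoreFuzzyWord
	}
	return 0
}

func main() {
	// A word that is an exact index hit and also appears in the title.
	fmt.Printf("%.1f\n", wordScore(exactMatch)+scoreTitleBoost)
	// A word that only matches fuzzily.
	fmt.Printf("%.1f\n", wordScore(fuzzyMatch))
}
```

The gap between the word tiers (at most 1.0) and the boosts (3.0 to 10.0) means a title or phrase hit dominates the ranking, while fuzzy matches mostly act as a fallback.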

New Scoring Features

  • Tag boost (+3.0): topics with tags matching query words rank higher
  • Multi-word bonus (+2.0): topics containing ALL query words get a bonus
  • Both are additive with existing boosts (title, section, exact/prefix)
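
A rough sketch of how the two new bonuses could combine is shown below. Several details are assumptions, because the notes do not specify them: the tag boost is applied once per query word that equals a tag, the all-words bonus is limited to queries with more than one word, and "contains" is a simple substring check on the topic text. The `bonus` function itself is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	scoreTagBoost = 3.0 // from the notes
	scoreAllWords = 2.0 // from the notes
)

// bonus returns the extra score contributed by tags and by the
// all-words rule. It is added on top of the other boosts.
func bonus(queryWords, tags []string, text string) float64 {
	var score float64
	lower := strings.ToLower(text)
	all := len(queryWords) > 1 // assumption: multi-word queries only
	for _, w := range queryWords {
		w = strings.ToLower(w)
		for _, t := range tags {
			if strings.EqualFold(t, w) {
				score += scoreTagBoost
				break // count each query word at most once
			}
		}
		if !strings.Contains(lower, w) {
			all = false
		}
	}
	if all {
		score += scoreAllWords
	}
	return score
}

func main() {
	tags := []string{"git", "vcs"}
	text := "How to rebase a branch in Git"
	fmt.Printf("%.1f\n", bonus([]string{"git", "rebase"}, tags, text))
	fmt.Printf("%.1f\n", bonus([]string{"git", "merge"}, tags, text))
}
```

In the first call both words appear in the text and one matches a tag, so both bonuses apply; in the second, "merge" is missing, so only the tag boost is added.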

API Compatibility

  • Search(query string) []*SearchResult signature unchanged
  • All existing behaviour preserved; new features are additive
  • Existing tests pass without modification