Phase 0: Push test coverage from 92.1% to 100% by adding catalog_test.go and targeted tests for all uncovered branches in search.go. Add BenchmarkSearch with 150 topics (~745us/op baseline).

Phase 1: Implement three search improvements:

- Levenshtein-based fuzzy matching (max distance 2, words >= 3 chars)
- Quoted phrase search via extractPhrases() with +8.0 boost
- Tag boost (+3.0) and multi-word bonus (+2.0) scoring
- Named scoring constants replacing magic numbers

All changes are backward-compatible; Search() signature unchanged.

Co-Authored-By: Charon <developers@lethean.io>

# FINDINGS.md -- go-help

## 2026-02-19: Split from core/go (Virgil)

### Origin

Extracted from `forge.lthn.ai/core/go` `pkg/help/` on 19 Feb 2026.

### Architecture

- `Topic` struct with title, body, tags, related topics
- `Section` groups topics under a heading
- `Frontmatter` for YAML metadata in topic files
- `Catalog` loads topics from YAML files on disk
- `Search` provides keyword search across topics with scoring:
  - Title match: +10
  - Section match: +5
  - Partial/body match: +0.5

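The original weights listed above can be illustrated with a small worked example. This is only a sketch: `scoreWord`, its parameters, and the constant names are hypothetical, and it assumes the boosts are additive per query word and matched by case-insensitive substring, which the notes do not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// Weights as listed in the notes; the constant names are hypothetical.
const (
	titleWeight   = 10.0
	sectionWeight = 5.0
	partialWeight = 0.5
)

// scoreWord is an illustrative helper, not the package's real code.
// It scores one lowercase query word against a topic's title,
// section headings and body text.
func scoreWord(word, title string, sections []string, body string) float64 {
	score := 0.0
	if strings.Contains(strings.ToLower(title), word) {
		score += titleWeight
	}
	for _, s := range sections {
		if strings.Contains(strings.ToLower(s), word) {
			score += sectionWeight
			break // assumption: the section boost counts once per word
		}
	}
	if strings.Contains(strings.ToLower(body), word) {
		score += partialWeight
	}
	return score
}

func main() {
	// Title hit (10) + section hit (5) + body hit (0.5).
	s := scoreWord("config", "Config Guide",
		[]string{"Loading config"}, "Edit the configuration file.")
	fmt.Println(s)
}
```
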
### Dependencies

- Pure Go, only external dependency is `gopkg.in/yaml.v3`

### Tests

- 2 test files covering catalog loading and search behaviour

## 2026-02-20: Phase 0 + Phase 1 (Charon)

### Phase 0: Coverage 92.1% -> 100%

- Created `catalog_test.go` -- the entire `catalog.go` was untested (0%)
- Added targeted search tests for previously uncovered branches:
  - Nil topic guard in `Search()` (stale index references)
  - Alphabetical tie-breaking when scores are equal
  - Headings-only content in snippet extraction (no body text)
  - Whitespace-only content trimmed to empty in snippets
  - Empty regex slice in `highlight()`
  - Overlapping match extension in highlight merging
- Added `BenchmarkSearch` with 150 generated topics
  - Baseline: ~745us/op, ~392KB/op, 4114 allocs/op (Ryzen 9 9950X)
- `go vet ./...` clean

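The alphabetical tie-breaking branch mentioned above might look roughly like the following. The `result` type and `sortResults` are stand-ins invented for this sketch, and it assumes ties are broken on the topic title; the notes only say the ordering is alphabetical.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a stand-in for the package's SearchResult; fields are assumed.
type result struct {
	Title string
	Score float64
}

// sortResults orders by score descending, then by title ascending,
// so results with equal scores always come back in the same order.
func sortResults(rs []result) {
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	rs := []result{{"Zebra", 5}, {"Alpha", 5}, {"Mango", 9}}
	sortResults(rs)
	for _, r := range rs {
		fmt.Println(r.Title, r.Score)
	}
}
```

Without the secondary key, `sort.Slice` gives no ordering guarantee for equal scores, which is why a test for this branch needs at least two results with identical scores.
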
### Phase 1: Search Improvements

#### Fuzzy Matching (Levenshtein distance)

- Implemented `levenshtein()` using two-row DP (memory-efficient)
- Integrated into `Search()` with max edit distance of 2
- Only applied to query words >= 3 characters (avoids noise from short words)
- Score: +0.3 per fuzzy match (lower than prefix +0.5 and exact +1.0)
- Skips words already matched as exact or prefix (no double-counting)

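A minimal sketch of the two-row approach described above. Only the name `levenshtein()` and the thresholds come from the notes; the signature, the rune-based length check, and the `fuzzyMatch` helper are assumptions made for illustration. The caller is assumed to skip words that already matched exactly or by prefix.

```go
package main

import "fmt"

const fuzzyMaxDistance = 2 // max edit distance, per the notes

// levenshtein returns the edit distance between a and b, keeping only
// two rows of the DP table in memory instead of the full matrix.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	if len(ra) == 0 {
		return len(rb)
	}
	if len(rb) == 0 {
		return len(ra)
	}
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // reuse the old row on the next pass
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// fuzzyMatch is a hypothetical helper: it ignores query words shorter
// than 3 characters and accepts candidates within fuzzyMaxDistance.
func fuzzyMatch(query, candidate string) bool {
	if len([]rune(query)) < 3 {
		return false
	}
	return levenshtein(query, candidate) <= fuzzyMaxDistance
}

func main() {
	fmt.Println(levenshtein("serch", "search"))
	fmt.Println(fuzzyMatch("serch", "search"))
	fmt.Println(fuzzyMatch("go", "do"))
}
```

Swapping the two slices after each row keeps memory at O(len(b)) while the running time stays O(len(a) * len(b)).
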
#### Phrase Search

- `extractPhrases()` pulls `"quoted strings"` from the query
- Remaining text is tokenised normally for keyword search
- Phrase matching checks title + content + all section content (case-insensitive)
- Phrase boost: +8.0 per matching phrase
- Phrase terms are also compiled as regexes for snippet highlighting
- Empty quotes `""` are left as-is (regex requires `[^"]+`)
- Whitespace-only quotes are ignored

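The extraction rules above can be sketched as follows. The function name and the `[^"]+` requirement come from the notes; the return signature, the lowercasing, and the exact regex wrapping are assumptions, so treat this as one possible shape rather than the package's actual code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// phraseRe requires at least one non-quote character between quotes,
// so an empty pair "" never matches and stays in the query.
var phraseRe = regexp.MustCompile(`"([^"]+)"`)

// extractPhrases returns the quoted phrases (lowercased for
// case-insensitive matching) and the query with those quoted spans
// removed. The signature is assumed for this sketch.
func extractPhrases(query string) (phrases []string, rest string) {
	for _, m := range phraseRe.FindAllStringSubmatch(query, -1) {
		p := strings.TrimSpace(m[1])
		if p != "" { // whitespace-only quotes are ignored
			phrases = append(phrases, strings.ToLower(p))
		}
	}
	rest = strings.TrimSpace(phraseRe.ReplaceAllString(query, " "))
	return phrases, rest
}

func main() {
	phrases, rest := extractPhrases(`"Rate Limit" config`)
	fmt.Printf("%q %q\n", phrases, rest)

	// A phrase hit is then a case-insensitive substring check.
	text := strings.ToLower("Setting the rate limit per client")
	fmt.Println(strings.Contains(text, phrases[0]))
}
```
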
#### Improved Scoring Weights

- Replaced magic numbers with named constants for clarity:
  - `scoreExactWord = 1.0` -- exact word in index
  - `scorePrefixWord = 0.5` -- prefix/partial word match
  - `scoreFuzzyWord = 0.3` -- Levenshtein fuzzy match
  - `scoreTitleBoost = 10.0` -- query word in topic title
  - `scoreSectionBoost = 5.0` -- query word in section title
  - `scoreTagBoost = 3.0` -- query word matches a tag (NEW)
  - `scorePhraseBoost = 8.0` -- exact phrase match (NEW)
  - `scoreAllWords = 2.0` -- all query words present (NEW)
  - `fuzzyMaxDistance = 2` -- max Levenshtein distance

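Written as a Go `const` block, the list above might read as below. The names and values are taken from the notes; the `wordScore` helper is hypothetical and only illustrates the exact and prefix tiers, assuming "prefix" means the indexed word starts with the query word.

```go
package main

import (
	"fmt"
	"strings"
)

// Named scoring constants, as listed in the notes.
const (
	scoreExactWord    = 1.0  // exact word in index
	scorePrefixWord   = 0.5  // prefix/partial word match
	scoreFuzzyWord    = 0.3  // Levenshtein fuzzy match
	scoreTitleBoost   = 10.0 // query word in topic title
	scoreSectionBoost = 5.0  // query word in section title
	scoreTagBoost     = 3.0  // query word matches a tag
	scorePhraseBoost  = 8.0  // exact phrase match
	scoreAllWords     = 2.0  // all query words present
	fuzzyMaxDistance  = 2    // max Levenshtein distance
)

// wordScore is an illustrative helper, not the package's real code.
// It picks the highest applicable tier for one query word against one
// indexed word; the fuzzy tier is left out to keep the sketch short.
func wordScore(query, indexed string) float64 {
	switch {
	case query == indexed:
		return scoreExactWord
	case strings.HasPrefix(indexed, query):
		return scorePrefixWord
	default:
		return 0
	}
}

func main() {
	fmt.Println(wordScore("search", "search"))
	fmt.Println(wordScore("sea", "search"))
	fmt.Println(wordScore("xyz", "search"))
}
```
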
#### New Scoring Features

- **Tag boost** (+3.0): topics with tags matching query words rank higher
- **Multi-word bonus** (+2.0): topics containing ALL query words get a bonus
- Both are additive with existing boosts (title, section, exact/prefix)

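A rough sketch of how the two new boosts could be computed. The `bonus` helper and its parameters are hypothetical. It assumes tags are compared case-insensitively against whole query words, that "containing" means a substring hit in the topic text, and that the multi-word bonus applies only to queries with more than one word; none of these details are confirmed by the notes.

```go
package main

import (
	"fmt"
	"strings"
)

// Values from the notes.
const (
	scoreTagBoost = 3.0
	scoreAllWords = 2.0
)

// bonus is an illustrative helper: it returns the extra score a topic
// earns from tag matches and from containing every query word.
// queryWords are expected to be lowercase already.
func bonus(queryWords, tags []string, text string) float64 {
	tagSet := make(map[string]bool, len(tags))
	for _, t := range tags {
		tagSet[strings.ToLower(t)] = true
	}
	lower := strings.ToLower(text)

	score := 0.0
	all := len(queryWords) > 1 // assumption: bonus needs 2+ words
	for _, w := range queryWords {
		if tagSet[w] {
			score += scoreTagBoost
		}
		if !strings.Contains(lower, w) {
			all = false
		}
	}
	if all {
		score += scoreAllWords
	}
	return score
}

func main() {
	// One tag hit (+3.0) and both words present (+2.0).
	fmt.Println(bonus([]string{"search", "fuzzy"},
		[]string{"Search"}, "Fuzzy search tips"))
}
```

Because both boosts are added to whatever the title, section and word tiers already contributed, a topic with no tag hit and a missing word simply receives no extra score.
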
### API Compatibility

- `Search(query string) []*SearchResult` signature unchanged
- All existing behaviour preserved; new features are additive
- Existing tests pass without modification