
# FINDINGS.md -- go-help

## 2026-02-19: Split from core/go (Virgil)

### Origin

Extracted from `forge.lthn.ai/core/go` `pkg/help/` on 19 Feb 2026.

### Architecture

- `Topic` struct with title, body, tags, related topics
- `Section` groups topics under a heading
- `Frontmatter` for YAML metadata in topic files
- `Catalog` loads topics from YAML files on disk
- `Search` provides keyword search across topics with scoring:
  - Title match: +10
  - Section match: +5
  - Partial/body match: +0.5

### Dependencies

- Pure Go; the only external dependency is `gopkg.in/yaml.v3`

### Tests

- 2 test files covering catalog loading and search behaviour

## 2026-02-20: Phase 0 + Phase 1 (Charon)

### Phase 0: Coverage 92.1% -> 100%

- Created `catalog_test.go`; `catalog.go` was previously untested (0% coverage)
- Added targeted search tests for previously uncovered branches:
  - Nil topic guard in `Search()` (stale index references)
  - Alphabetical tie-breaking when scores are equal
  - Headings-only content in snippet extraction (no body text)
  - Whitespace-only content trimmed to empty in snippets
  - Empty regex slice in `highlight()`
  - Overlapping match extension in highlight merging
- Added `BenchmarkSearch` with 150 generated topics
  - Baseline: ~745 µs/op, ~392 KB/op, 4114 allocs/op (Ryzen 9 9950X)
- `go vet ./...` clean
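
The alphabetical tie-breaking covered by the new tests can be illustrated with a minimal sketch. The `result` type and `rank` function are hypothetical stand-ins, and ordering ties by title is an assumption: the notes above say only that equal scores are broken alphabetically.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for the package's search result type.
type result struct {
	Title string
	Score float64
}

// rank sorts by score, highest first. Equal scores fall back to ascending
// title order so the output is deterministic (assumed tie-break key).
func rank(results []result) {
	sort.SliceStable(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].Title < results[j].Title
	})
}

func main() {
	rs := []result{{"Beta", 1}, {"Alpha", 1}, {"Gamma", 2}}
	rank(rs)
	for _, r := range rs {
		fmt.Println(r.Title, r.Score)
	}
	// Order: Gamma, Alpha, Beta
}
```

Without an explicit tie-break, the order of equally scored results could depend on map iteration or index order, which makes tests flaky.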

### Phase 1: Search Improvements

#### Fuzzy Matching (Levenshtein distance)

- Implemented `levenshtein()` using two-row DP (memory-efficient)
- Integrated into `Search()` with max edit distance of 2
- Only applied to query words >= 3 characters (avoids noise from short words)
- Score: +0.3 per fuzzy match (lower than prefix +0.5 and exact +1.0)
- Skips words already matched as exact or prefix (no double-counting)
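
As an illustration of the two-row approach, here is a minimal sketch of a Levenshtein function and the length and distance gate described above. It is not the package's actual code: the `fuzzyMatches` helper is hypothetical, and counting length in runes rather than bytes is an assumption.

```go
package main

import "fmt"

const fuzzyMaxDistance = 2 // max edit distance, as listed in these notes

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes edit distance while keeping only two rows of the
// DP table, so memory is O(len(b)) rather than O(len(a)*len(b)).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// fuzzyMatches is a hypothetical helper: query words shorter than three
// characters are never fuzzy-matched.
func fuzzyMatches(queryWord, indexWord string) bool {
	if len([]rune(queryWord)) < 3 {
		return false
	}
	return levenshtein(queryWord, indexWord) <= fuzzyMaxDistance
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
	fmt.Println(fuzzyMatches("serach", "search"))
	fmt.Println(fuzzyMatches("go", "to"))
}
```

A transposition such as `serach` for `search` costs two substitutions under plain Levenshtein distance, so a maximum distance of 2 is the smallest setting that still catches swapped adjacent letters.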

#### Phrase Search

- `extractPhrases()` pulls `"quoted strings"` from the query
- Remaining text is tokenised normally for keyword search
- Phrase matching checks the title, content, and all section content (case-insensitive)
- Phrase boost: +8.0 per matching phrase
- Phrase terms are also compiled as regexes for snippet highlighting
- Empty quotes `""` are left in the query as-is, because the phrase regex requires at least one character between the quotes (`[^"]+`)
- Whitespace-only quotes are ignored
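
The phrase extraction rules above can be sketched as follows. The signature, the returned remainder string, and the lowercasing of phrases are assumptions made for illustration; only the name `extractPhrases` and the `[^"]+` pattern come from these notes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// phraseRe requires at least one non-quote character between the quotes,
// so an empty pair "" never matches and stays in the query text.
var phraseRe = regexp.MustCompile(`"([^"]+)"`)

// extractPhrases returns the quoted phrases (lowercased, as an assumed way
// to support case-insensitive matching) and the query with them removed.
func extractPhrases(query string) (phrases []string, rest string) {
	for _, m := range phraseRe.FindAllStringSubmatch(query, -1) {
		p := strings.TrimSpace(m[1])
		if p == "" {
			continue // whitespace-only quotes are ignored
		}
		phrases = append(phrases, strings.ToLower(p))
	}
	// Every matched span is removed; the rest is tokenised as keywords.
	rest = strings.TrimSpace(phraseRe.ReplaceAllString(query, " "))
	return phrases, rest
}

func main() {
	phrases, rest := extractPhrases(`"Rate Limit" config`)
	fmt.Printf("%q %q\n", phrases, rest)
}
```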

#### Improved Scoring Weights

- Replaced magic numbers with named constants for clarity:
  - `scoreExactWord = 1.0` -- exact word in index
  - `scorePrefixWord = 0.5` -- prefix/partial word match
  - `scoreFuzzyWord = 0.3` -- Levenshtein fuzzy match
  - `scoreTitleBoost = 10.0` -- query word in topic title
  - `scoreSectionBoost = 5.0` -- query word in section title
  - `scoreTagBoost = 3.0` -- query word matches a tag (NEW)
  - `scorePhraseBoost = 8.0` -- exact phrase match (NEW)
  - `scoreAllWords = 2.0` -- all query words present (NEW)
  - `fuzzyMaxDistance = 2` -- max Levenshtein distance

#### New Scoring Features

- **Tag boost** (+3.0): topics with tags matching query words rank higher
- **Multi-word bonus** (+2.0): topics containing ALL query words get a bonus
- Both are additive with existing boosts (title, section, exact/prefix)
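
To make the additive behaviour concrete, here is a minimal sketch that combines the named constants into one score. The constant names and values are taken from the list above; the `matchCounts` type and `score` function are hypothetical, and applying the title, section, and tag boosts once per matching query word is an assumption.

```go
package main

import "fmt"

const (
	scoreExactWord    = 1.0
	scorePrefixWord   = 0.5
	scoreFuzzyWord    = 0.3
	scoreTitleBoost   = 10.0
	scoreSectionBoost = 5.0
	scoreTagBoost     = 3.0
	scorePhraseBoost  = 8.0
	scoreAllWords     = 2.0
)

// matchCounts is a hypothetical summary of how one query matched one topic.
type matchCounts struct {
	Exact, Prefix, Fuzzy            int // word-level matches in the index
	TitleHits, SectionHits, TagHits int // query words found in each field
	Phrases                         int // quoted phrases found in the topic
	AllWords                        bool
}

// score sums every contribution; no boost replaces or caps another.
func score(m matchCounts) float64 {
	s := float64(m.Exact)*scoreExactWord +
		float64(m.Prefix)*scorePrefixWord +
		float64(m.Fuzzy)*scoreFuzzyWord +
		float64(m.TitleHits)*scoreTitleBoost +
		float64(m.SectionHits)*scoreSectionBoost +
		float64(m.TagHits)*scoreTagBoost +
		float64(m.Phrases)*scorePhraseBoost
	if m.AllWords {
		s += scoreAllWords
	}
	return s
}

func main() {
	// Two exact words, one of them in the title, one matching a tag,
	// and every query word present: 2*1.0 + 10.0 + 3.0 + 2.0.
	fmt.Println(score(matchCounts{Exact: 2, TitleHits: 1, TagHits: 1, AllWords: true}))
}
```

Under these weights a single title hit outweighs any realistic number of body-only matches, which keeps topics named after the query at the top of the results.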

### API Compatibility

- `Search(query string) []*SearchResult` signature unchanged
- All existing behaviour preserved; new features are additive
- Existing tests pass without modification