feat(search): add fuzzy matching, phrase search, and improved scoring

Phase 0: Push test coverage from 92.1% to 100% by adding catalog_test.go
and targeted tests for all uncovered branches in search.go. Add
BenchmarkSearch with 150 topics (~745us/op baseline).

Phase 1: Implement three search improvements:
- Levenshtein-based fuzzy matching (max distance 2, words >= 3 chars)
- Quoted phrase search via extractPhrases() with +8.0 boost
- Tag boost (+3.0) and multi-word bonus (+2.0) scoring
- Named scoring constants replacing magic numbers

All changes are backward-compatible; Search() signature unchanged.

Co-Authored-By: Charon <developers@lethean.io>
Claude authored 2026-02-20 01:21:35 +00:00
parent b6da16c717
commit 3e91510bcf
GPG key ID: AF404715446AEB41 (no known key found for this signature in database)
5 changed files with 907 additions and 19 deletions


@@ -24,3 +24,64 @@ Extracted from `forge.lthn.ai/core/go` `pkg/help/` on 19 Feb 2026.
### Tests
- 2 test files covering catalog loading and search behaviour
## 2026-02-20: Phase 0 + Phase 1 (Charon)
### Phase 0: Coverage 92.1% -> 100%
- Created `catalog_test.go` — the entire `catalog.go` was untested (0%)
- Added targeted search tests for previously uncovered branches:
- Nil topic guard in `Search()` (stale index references)
- Alphabetical tie-breaking when scores are equal
- Headings-only content in snippet extraction (no body text)
- Whitespace-only content trimmed to empty in snippets
- Empty regex slice in `highlight()`
- Overlapping match extension in highlight merging
- Added `BenchmarkSearch` with 150 generated topics
- Baseline: ~745us/op, ~392KB/op, 4114 allocs/op (Ryzen 9 9950X)
- `go vet ./...` clean
### Phase 1: Search Improvements
#### Fuzzy Matching (Levenshtein distance)
- Implemented `levenshtein()` using two-row DP (memory-efficient)
- Integrated into `Search()` with max edit distance of 2
- Only applied to query words >= 3 characters (avoids noise from short words)
- Score: +0.3 per fuzzy match (lower than prefix +0.5 and exact +1.0)
- Skips words already matched as exact or prefix (no double-counting)
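The gating rules above (words of 3+ characters, exact and prefix hits excluded, edit distance at most 2) can be sketched in isolation. This is a standalone re-implementation for illustration, not the package's code; it uses Go 1.21's built-in `min` in place of a `min3` helper:

```go
package main

import (
	"fmt"
	"strings"
)

// levenshtein mirrors the two-row edit-distance computation described above.
func levenshtein(a, b string) int {
	ar, br := []rune(a), []rune(b)
	if len(ar) == 0 {
		return len(br)
	}
	if len(br) == 0 {
		return len(ar)
	}
	prev := make([]int, len(br)+1)
	curr := make([]int, len(br)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ar); i++ {
		curr[0] = i
		for j := 1; j <= len(br); j++ {
			cost := 1
			if ar[i-1] == br[j-1] {
				cost = 0
			}
			curr[j] = min(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(br)]
}

// fuzzyMatches applies the gating rules: query words shorter than 3 runes
// never fuzzy-match, exact and prefix hits are skipped (they are already
// scored), and the edit distance must be 1 or 2.
func fuzzyMatches(word string, indexWords []string) []string {
	if len([]rune(word)) < 3 {
		return nil
	}
	var out []string
	for _, iw := range indexWords {
		if iw == word || strings.HasPrefix(iw, word) {
			continue
		}
		if d := levenshtein(word, iw); d >= 1 && d <= 2 {
			out = append(out, iw)
		}
	}
	return out
}

func main() {
	index := []string{"configuration", "configure", "deployment"}
	fmt.Println(fuzzyMatches("configuraton", index)) // prints [configuration]
}
```

Note that "configuraton" (one deleted letter) matches only "configuration"; "configure" is 4 edits away and is correctly excluded.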
#### Phrase Search
- `extractPhrases()` pulls `"quoted strings"` from the query
- Remaining text is tokenised normally for keyword search
- Phrase matching checks title + content + all section content (case-insensitive)
- Phrase boost: +8.0 per matching phrase
- Phrase terms are also compiled as regexes for snippet highlighting
- Empty quotes `""` are left as-is (regex requires `[^"]+`)
- Whitespace-only quotes are ignored
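The extraction rules above can be reproduced with a single regular expression; this is a minimal standalone sketch matching the behaviour described, not necessarily the package's exact code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var phraseRE = regexp.MustCompile(`"([^"]+)"`)

// extractPhrases pulls non-empty quoted substrings out of the query and
// returns the query with the quoted spans removed. Empty quotes ("") do
// not match [^"]+ and are left as-is; whitespace-only quotes are removed
// from the query but contribute no phrase.
func extractPhrases(query string) (phrases []string, remaining string) {
	for _, m := range phraseRE.FindAllStringSubmatch(query, -1) {
		if p := strings.TrimSpace(m[1]); p != "" {
			phrases = append(phrases, p)
		}
	}
	return phrases, phraseRE.ReplaceAllString(query, "")
}

func main() {
	phrases, rest := extractPhrases(`configure "rate limit" rules`)
	fmt.Printf("%q %q\n", phrases, rest)
}
```

Removing the quoted span leaves the surrounding whitespace behind ("configure  rules", with a double space); the subsequent tokenisation pass absorbs it.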
#### Improved Scoring Weights
- Replaced magic numbers with named constants for clarity:
- `scoreExactWord = 1.0` -- exact word in index
- `scorePrefixWord = 0.5` -- prefix/partial word match
- `scoreFuzzyWord = 0.3` -- Levenshtein fuzzy match
- `scoreTitleBoost = 10.0` -- query word in topic title
- `scoreSectionBoost = 5.0` -- query word in section title
- `scoreTagBoost = 3.0` -- query word matches a tag (NEW)
- `scorePhraseBoost = 8.0` -- exact phrase match (NEW)
- `scoreAllWords = 2.0` -- all query words present (NEW)
- `fuzzyMaxDistance = 2` -- max Levenshtein distance
#### New Scoring Features
- **Tag boost** (+3.0): topics with tags matching query words rank higher
- **Multi-word bonus** (+2.0): topics containing ALL query words get a bonus
- Both are additive with existing boosts (title, section, exact/prefix)
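As a worked example of how these weights combine (the query and topic here are made up for illustration): a two-word query whose words both hit the index exactly, with at least one word in the title, one matching tag, and all words present, accumulates 2×1.0 + 10.0 + 3.0 + 2.0 = 17.0. Note the title boost is applied once per topic, not once per matching word:

```go
package main

import "fmt"

// Weights as named in this changelog entry.
const (
	scoreExactWord  = 1.0
	scoreTitleBoost = 10.0
	scoreTagBoost   = 3.0
	scoreAllWords   = 2.0
)

func main() {
	// Two exact index hits + title boost (flat, once per topic)
	// + one matching tag + multi-word bonus.
	total := 2*scoreExactWord + scoreTitleBoost + scoreTagBoost + scoreAllWords
	fmt.Println(total) // 17
}
```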
### API Compatibility
- `Search(query string) []*SearchResult` signature unchanged
- All existing behaviour preserved; new features are additive
- Existing tests pass without modification

TODO.md

@@ -6,17 +6,22 @@ Dispatched from core/go orchestration. Pick up tasks in order.
## Phase 0: Hardening & Test Coverage
- [ ] **Expand parser tests** — `parser_test.go` exists but coverage unknown. Add tests for: empty input, frontmatter-only (no body), malformed YAML frontmatter, deeply nested headings (####, #####), Unicode content, very long documents (10K+ lines).
- [ ] **Expand search tests** — `search_test.go` exists. Add tests for: empty query, no results, case sensitivity, multi-word queries, special characters in query, overlapping matches, scoring boundary cases (exact title match vs partial body match).
- [ ] **Benchmark search** — Add `BenchmarkSearch` with catalog of 100+ topics. Measure search latency and allocation count. Baseline for Phase 1 improvements.
- [ ] **`go vet ./...` clean** — Verify no vet warnings. Fix any found.
- [x] **Expand parser tests** — Parser already at 100%. Existing tests cover: empty input, frontmatter-only, malformed YAML, all heading levels (H1-H6), Unicode content, path-derived IDs.
- [x] **Expand search tests** — Added tests for: empty query, no results, case sensitivity, multi-word queries, special characters, overlapping matches, scoring boundary cases, nil-topic guard, snippet edge cases (headings-only, whitespace-only).
- [x] **Add catalog tests** — Created `catalog_test.go` covering: DefaultCatalog, Add, List, Search, Get (found/not-found), score tie-breaking.
- [x] **Benchmark search** — `BenchmarkSearch` with 150 topics. Baseline: ~745us/op, ~392KB/op, 4114 allocs/op (Ryzen 9 9950X).
- [x] **`go vet ./...` clean** — No warnings.
- [x] **Coverage: 100%** — Up from 92.1%.
## Phase 1: Search Improvements
- [ ] Add fuzzy matching (Levenshtein distance or similar)
- [x] **Fuzzy matching** — Levenshtein distance with max edit distance of 2. Words under 3 chars skip fuzzy. Score: +0.3 per fuzzy match (below prefix +0.5 and exact +1.0).
- [ ] Add stemming support for English search terms
- [ ] Add phrase search (quoted multi-word queries)
- [ ] Improve scoring weights — currently title +10, section +5, partial +0.5
- [x] **Phrase search** — Quoted multi-word queries via `extractPhrases()`. Phrase boost: +8.0. Searches title, content, and section content.
- [x] **Improved scoring weights** — Named constants: title +10, section +5, tag +3, phrase +8, all-words bonus +2, exact +1.0, prefix +0.5, fuzzy +0.3.
- [x] **Tag boost** — Query words matching tags add +3.0 per matching tag.
- [x] **Multi-word bonus** — All query words present in topic adds +2.0.
- [x] **Tests for all new features** — Levenshtein, min3, extractPhrases, fuzzy search, phrase search, tag boost, multi-word bonus, scoring constants, phrase highlighting, section phrase matching.
## Phase 2: core.help Integration

catalog_test.go

@@ -0,0 +1,212 @@
package help
import (
"fmt"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestDefaultCatalog_Good(t *testing.T) {
c := DefaultCatalog()
require.NotNil(t, c)
require.NotNil(t, c.topics)
require.NotNil(t, c.index)
t.Run("contains built-in topics", func(t *testing.T) {
topics := c.List()
assert.GreaterOrEqual(t, len(topics), 2, "should have at least 2 default topics")
})
t.Run("getting-started topic exists", func(t *testing.T) {
topic, err := c.Get("getting-started")
require.NoError(t, err)
assert.Equal(t, "Getting Started", topic.Title)
assert.Contains(t, topic.Content, "Common Commands")
})
t.Run("config topic exists", func(t *testing.T) {
topic, err := c.Get("config")
require.NoError(t, err)
assert.Equal(t, "Configuration", topic.Title)
assert.Contains(t, topic.Content, "Environment Variables")
})
}
func TestCatalog_Add_Good(t *testing.T) {
c := &Catalog{
topics: make(map[string]*Topic),
index: newSearchIndex(),
}
topic := &Topic{
ID: "test-topic",
Title: "Test Topic",
Content: "This is a test topic for unit testing.",
Tags: []string{"test", "unit"},
}
c.Add(topic)
t.Run("topic is retrievable after add", func(t *testing.T) {
got, err := c.Get("test-topic")
require.NoError(t, err)
assert.Equal(t, topic, got)
})
t.Run("topic is searchable after add", func(t *testing.T) {
results := c.Search("test")
assert.NotEmpty(t, results)
})
t.Run("overwrite existing topic", func(t *testing.T) {
replacement := &Topic{
ID: "test-topic",
Title: "Replaced Topic",
Content: "Replacement content.",
}
c.Add(replacement)
got, err := c.Get("test-topic")
require.NoError(t, err)
assert.Equal(t, "Replaced Topic", got.Title)
})
}
func TestCatalog_List_Good(t *testing.T) {
c := &Catalog{
topics: make(map[string]*Topic),
index: newSearchIndex(),
}
t.Run("empty catalog returns empty list", func(t *testing.T) {
list := c.List()
assert.Empty(t, list)
})
t.Run("returns all added topics", func(t *testing.T) {
c.Add(&Topic{ID: "alpha", Title: "Alpha"})
c.Add(&Topic{ID: "beta", Title: "Beta"})
c.Add(&Topic{ID: "gamma", Title: "Gamma"})
list := c.List()
assert.Len(t, list, 3)
// Collect IDs (order is not guaranteed from map)
ids := make(map[string]bool)
for _, topic := range list {
ids[topic.ID] = true
}
assert.True(t, ids["alpha"])
assert.True(t, ids["beta"])
assert.True(t, ids["gamma"])
})
}
func TestCatalog_Search_Good(t *testing.T) {
c := DefaultCatalog()
t.Run("finds default topics", func(t *testing.T) {
results := c.Search("configuration")
assert.NotEmpty(t, results)
})
t.Run("empty query returns nil", func(t *testing.T) {
results := c.Search("")
assert.Nil(t, results)
})
t.Run("no match returns empty", func(t *testing.T) {
results := c.Search("zzzyyyxxx")
assert.Empty(t, results)
})
}
func TestCatalog_Get_Good(t *testing.T) {
c := &Catalog{
topics: make(map[string]*Topic),
index: newSearchIndex(),
}
c.Add(&Topic{ID: "exists", Title: "Existing Topic"})
t.Run("existing topic", func(t *testing.T) {
topic, err := c.Get("exists")
require.NoError(t, err)
assert.Equal(t, "Existing Topic", topic.Title)
})
t.Run("missing topic returns error", func(t *testing.T) {
topic, err := c.Get("does-not-exist")
assert.Nil(t, topic)
assert.Error(t, err)
assert.Contains(t, err.Error(), "topic not found")
assert.Contains(t, err.Error(), "does-not-exist")
})
}
func TestCatalog_Search_Good_ScoreTiebreaking(t *testing.T) {
// Tests the alphabetical tie-breaking in search result sorting (search.go:165).
c := &Catalog{
topics: make(map[string]*Topic),
index: newSearchIndex(),
}
// Add topics with identical content so they receive the same score.
c.Add(&Topic{
ID: "zebra-topic",
Title: "Zebra",
Content: "Unique keyword zephyr.",
})
c.Add(&Topic{
ID: "alpha-topic",
Title: "Alpha",
Content: "Unique keyword zephyr.",
})
results := c.Search("zephyr")
require.Len(t, results, 2)
// With equal scores, results should be sorted alphabetically by title.
assert.Equal(t, "Alpha", results[0].Topic.Title)
assert.Equal(t, "Zebra", results[1].Topic.Title)
assert.Equal(t, results[0].Score, results[1].Score,
"scores should be equal for tie-breaking to apply")
}
func BenchmarkSearch(b *testing.B) {
// Build a catalog with 100+ topics for benchmarking.
c := &Catalog{
topics: make(map[string]*Topic),
index: newSearchIndex(),
}
for i := 0; i < 150; i++ {
c.Add(&Topic{
ID: fmt.Sprintf("topic-%d", i),
Title: fmt.Sprintf("Topic Number %d About Various Subjects", i),
Content: fmt.Sprintf("This is the content of topic %d. It covers installation, configuration, deployment, and testing of the system.", i),
Tags: []string{"generated", fmt.Sprintf("tag%d", i%10)},
Sections: []Section{
{
ID: fmt.Sprintf("section-%d-a", i),
Title: "Overview",
Content: "An overview of the topic and its purpose.",
},
{
ID: fmt.Sprintf("section-%d-b", i),
Title: "Details",
Content: "Detailed information about the topic including examples and usage.",
},
},
})
}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
c.Search("installation configuration")
}
}

search.go

@@ -7,6 +7,19 @@ import (
"unicode"
)
// Scoring weights for search result ranking.
const (
scoreExactWord = 1.0 // Exact word match in the index
scorePrefixWord = 0.5 // Prefix/partial word match
scoreFuzzyWord = 0.3 // Fuzzy (Levenshtein) match
scoreTitleBoost = 10.0 // Query word appears in topic title
scoreSectionBoost = 5.0 // Query word appears in section title
scoreTagBoost = 3.0 // Query word appears in topic tags
scorePhraseBoost = 8.0 // Exact phrase match in content
scoreAllWords = 2.0 // All query words present (multi-word bonus)
fuzzyMaxDistance = 2 // Maximum edit distance for fuzzy matching
)
// SearchResult represents a search match.
type SearchResult struct {
Topic *Topic
@@ -72,10 +85,17 @@ func (i *searchIndex) addToIndex(word, topicID string) {
i.index[word] = append(i.index[word], topicID)
}
// Search finds topics matching the query.
// Search finds topics matching the query. Supports:
// - Single and multi-word keyword queries
// - Quoted phrase search (e.g. `"rate limit"`)
// - Fuzzy matching via Levenshtein distance for typo tolerance
// - Prefix matching for partial words
func (i *searchIndex) Search(query string) []*SearchResult {
queryWords := tokenize(query)
if len(queryWords) == 0 {
// Extract quoted phrases before tokenising.
phrases, stripped := extractPhrases(query)
queryWords := tokenize(stripped)
if len(queryWords) == 0 && len(phrases) == 0 {
return nil
}
@@ -86,7 +106,7 @@ func (i *searchIndex) Search(query string) []*SearchResult {
// Exact matches
if topicIDs, ok := i.index[word]; ok {
for _, topicID := range topicIDs {
scores[topicID] += 1.0
scores[topicID] += scoreExactWord
}
}
@@ -94,7 +114,25 @@ func (i *searchIndex) Search(query string) []*SearchResult {
for indexWord, topicIDs := range i.index {
if strings.HasPrefix(indexWord, word) && indexWord != word {
for _, topicID := range topicIDs {
scores[topicID] += 0.5 // Lower score for partial matches
scores[topicID] += scorePrefixWord
}
}
}
// Fuzzy matches (Levenshtein distance)
if len(word) >= 3 {
for indexWord, topicIDs := range i.index {
if indexWord == word {
continue // Already scored as exact match
}
if strings.HasPrefix(indexWord, word) {
continue // Already scored as prefix match
}
dist := levenshtein(word, indexWord)
if dist > 0 && dist <= fuzzyMaxDistance {
for _, topicID := range topicIDs {
scores[topicID] += scoreFuzzyWord
}
}
}
}
@@ -109,8 +147,28 @@ func (i *searchIndex) Search(query string) []*SearchResult {
}
}
}
// Also add phrase regexes for highlighting
for _, phrase := range phrases {
if re, err := regexp.Compile("(?i)" + regexp.QuoteMeta(phrase)); err == nil {
res = append(res, re)
}
}
// Build results with title boost and snippet extraction
// Phrase matching: boost topics that contain the exact phrase.
for _, phrase := range phrases {
phraseLower := strings.ToLower(phrase)
for topicID, topic := range i.topics {
text := strings.ToLower(topic.Title + " " + topic.Content)
for _, section := range topic.Sections {
text += " " + strings.ToLower(section.Title+" "+section.Content)
}
if strings.Contains(text, phraseLower) {
scores[topicID] += scorePhraseBoost
}
}
}
// Build results with title/section/tag boosts and snippet extraction
var results []*SearchResult
for topicID, score := range scores {
topic := i.topics[topicID]
@@ -120,15 +178,40 @@ func (i *searchIndex) Search(query string) []*SearchResult {
// Title boost: if query words appear in title
titleLower := strings.ToLower(topic.Title)
hasTitleMatch := false
titleMatchCount := 0
for _, word := range queryWords {
if strings.Contains(titleLower, word) {
hasTitleMatch = true
break
titleMatchCount++
}
}
if hasTitleMatch {
score += 10.0
if titleMatchCount > 0 {
score += scoreTitleBoost
}
// Tag boost: if query words match tags
for _, tag := range topic.Tags {
tagLower := strings.ToLower(tag)
for _, word := range queryWords {
if tagLower == word || strings.Contains(tagLower, word) {
score += scoreTagBoost
break
}
}
}
// Multi-word bonus: if all query words are present in the topic
if len(queryWords) > 1 {
allPresent := true
fullText := strings.ToLower(topic.Title + " " + topic.Content)
for _, word := range queryWords {
if !strings.Contains(fullText, word) {
allPresent = false
break
}
}
if allPresent {
score += scoreAllWords
}
}
// Find matching section and extract snippet
@@ -145,7 +228,7 @@ func (i *searchIndex) Search(query string) []*SearchResult {
}
}
if hasSectionTitleMatch {
score += 5.0
score += scoreSectionBoost
}
}
@@ -168,6 +251,79 @@ func (i *searchIndex) Search(query string) []*SearchResult {
return results
}
// extractPhrases pulls quoted substrings from the query and returns them
// alongside the remaining query text with quotes removed.
// For example: `hello "rate limit" world` returns
// phrases=["rate limit"], remaining="hello  world" (the surrounding
// whitespace is preserved; tokenising absorbs the extra space).
func extractPhrases(query string) (phrases []string, remaining string) {
re := regexp.MustCompile(`"([^"]+)"`)
matches := re.FindAllStringSubmatch(query, -1)
for _, m := range matches {
phrase := strings.TrimSpace(m[1])
if phrase != "" {
phrases = append(phrases, phrase)
}
}
remaining = re.ReplaceAllString(query, "")
return phrases, remaining
}
// levenshtein computes the edit distance between two strings.
// Used for fuzzy matching to tolerate typos in search queries.
func levenshtein(a, b string) int {
aRunes := []rune(a)
bRunes := []rune(b)
aLen := len(aRunes)
bLen := len(bRunes)
if aLen == 0 {
return bLen
}
if bLen == 0 {
return aLen
}
// Use two rows instead of full matrix to save memory.
prev := make([]int, bLen+1)
curr := make([]int, bLen+1)
for j := 0; j <= bLen; j++ {
prev[j] = j
}
for i := 1; i <= aLen; i++ {
curr[0] = i
for j := 1; j <= bLen; j++ {
cost := 1
if aRunes[i-1] == bRunes[j-1] {
cost = 0
}
curr[j] = min3(
prev[j]+1, // deletion
curr[j-1]+1, // insertion
prev[j-1]+cost, // substitution
)
}
prev, curr = curr, prev
}
return prev[bLen]
}
// min3 returns the minimum of three integers.
func min3(a, b, c int) int {
if a < b {
if a < c {
return a
}
return c
}
if b < c {
return b
}
return c
}
// findBestMatch finds the section with the best match and extracts a snippet.
func (i *searchIndex) findBestMatch(topic *Topic, queryWords []string, res []*regexp.Regexp) (*Section, string) {
var bestSection *Section
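The two-row implementation above is easy to sanity-check in isolation. This standalone sketch copies its logic (with `min3` swapped for Go 1.21's built-in `min`) and exercises the classic case plus rune handling:

```go
package main

import "fmt"

// Same two-row algorithm as levenshtein in the listing above, with min3
// replaced by Go 1.21's built-in min.
func levenshtein(a, b string) int {
	ar, br := []rune(a), []rune(b)
	if len(ar) == 0 {
		return len(br)
	}
	if len(br) == 0 {
		return len(ar)
	}
	prev := make([]int, len(br)+1)
	curr := make([]int, len(br)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ar); i++ {
		curr[0] = i
		for j := 1; j <= len(br); j++ {
			cost := 1
			if ar[i-1] == br[j-1] {
				cost = 0
			}
			curr[j] = min(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(br)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	// Because the function walks runes rather than bytes, the two-byte
	// UTF-8 character é counts as a single substitution.
	fmt.Println(levenshtein("café", "cafe")) // 1
}
```

Plain Levenshtein has no transposition operation, so swapped adjacent letters ("ab" vs "ba") cost 2, which still falls inside fuzzyMaxDistance.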

search_test.go

@@ -338,3 +338,457 @@ func TestSearchResult_Score_Good(t *testing.T) {
assert.Equal(t, "topic-in-title", results[0].Topic.ID)
assert.Greater(t, results[0].Score, results[1].Score)
}
func TestExtractSnippet_Good_HeadingsOnly(t *testing.T) {
// Content with only headings and no body text should return empty snippet
// when no regexes are provided. Covers the empty-return branch.
content := "# Heading One\n## Heading Two\n### Heading Three"
snippet := extractSnippet(content, nil)
assert.Empty(t, snippet, "headings-only content without regexes should return empty snippet")
}
func TestExtractSnippet_Good_SnippetTrimmedToEmpty(t *testing.T) {
// After word-boundary trimming the snippet could become empty.
// This exercises the snippet=="" guard after TrimSpace.
content := strings.Repeat(" ", 200)
snippet := extractSnippet(content, compileRegexes([]string{"zz"}))
assert.Empty(t, snippet, "whitespace-only content should yield empty snippet")
}
func TestHighlight_Good_EmptyRegexes(t *testing.T) {
// Calling highlight with an empty regex slice should return the
// text unchanged. Covers the early-return branch.
result := highlight("some text here", nil)
assert.Equal(t, "some text here", result)
result = highlight("some text here", []*regexp.Regexp{})
assert.Equal(t, "some text here", result)
}
func TestHighlight_Good_OverlappingExtension(t *testing.T) {
// Test overlapping matches where the second match extends past the
// first, exercising the curr.end extension branch in merging.
text := "abcdefghij"
// First regex matches "abcdef", second matches "cdefghij"
// They overlap and the second extends the merged range.
re1, _ := regexp.Compile("abcdef")
re2, _ := regexp.Compile("cdefghij")
result := highlight(text, []*regexp.Regexp{re1, re2})
assert.Equal(t, "**abcdefghij**", result)
}
func TestSearchIndex_Search_Good_NilTopicGuard(t *testing.T) {
// Manually inject a stale reference in the scores map so the
// nil-topic guard is exercised. We do this by manipulating the
// index directly.
idx := newSearchIndex()
idx.Add(&Topic{
ID: "real-topic",
Title: "Real Topic",
Content: "Contains testword for matching.",
})
// Inject a mapping from "testword" to a non-existent topic ID.
idx.index["testword"] = append(idx.index["testword"], "ghost-topic")
results := idx.Search("testword")
// Should still find the real topic, ghost-topic should be skipped.
assert.Len(t, results, 1)
assert.Equal(t, "real-topic", results[0].Topic.ID)
}
func TestSearchIndex_Search_Good_SpecialCharacters(t *testing.T) {
idx := newSearchIndex()
idx.Add(&Topic{
ID: "special-chars",
Title: "Rate Limiting (v2.0)",
Content: "Configure rate-limiting rules with special characters: @#$%.",
})
t.Run("query with special characters", func(t *testing.T) {
results := idx.Search("rate limiting")
assert.NotEmpty(t, results)
assert.Equal(t, "special-chars", results[0].Topic.ID)
})
t.Run("query with punctuation stripped", func(t *testing.T) {
results := idx.Search("v2.0")
assert.NotEmpty(t, results)
})
}
func TestSearchIndex_Search_Good_CaseInsensitive(t *testing.T) {
idx := newSearchIndex()
idx.Add(&Topic{
ID: "case-test",
Title: "UPPERCASE Title",
Content: "MiXeD CaSe content here.",
})
t.Run("lowercase query finds uppercase content", func(t *testing.T) {
results := idx.Search("uppercase")
assert.NotEmpty(t, results)
assert.Equal(t, "case-test", results[0].Topic.ID)
})
t.Run("uppercase query finds content", func(t *testing.T) {
results := idx.Search("MIXED")
assert.NotEmpty(t, results)
})
}
func TestSearchIndex_Search_Good_SingleCharQuery(t *testing.T) {
idx := newSearchIndex()
idx.Add(&Topic{
ID: "single-char",
Title: "Test Topic",
Content: "Some content.",
})
// Single-character queries are filtered out by tokenize (min 2 chars),
// so searching for "a" should return nil.
results := idx.Search("a")
assert.Nil(t, results)
}
// --- Phase 1: Fuzzy matching tests ---
func TestLevenshtein_Good(t *testing.T) {
tests := []struct {
name string
a string
b string
expected int
}{
{name: "identical strings", a: "hello", b: "hello", expected: 0},
{name: "single substitution", a: "hello", b: "hallo", expected: 1},
{name: "single insertion", a: "hello", b: "helloo", expected: 1},
{name: "single deletion", a: "hello", b: "helo", expected: 1},
{name: "three edits", a: "kitten", b: "sitting", expected: 3},
{name: "completely different", a: "abc", b: "xyz", expected: 3},
{name: "empty first string", a: "", b: "hello", expected: 5},
{name: "empty second string", a: "hello", b: "", expected: 5},
{name: "both empty", a: "", b: "", expected: 0},
{name: "single char strings", a: "a", b: "b", expected: 1},
{name: "same single char", a: "a", b: "a", expected: 0},
{name: "transposition", a: "ab", b: "ba", expected: 2},
{name: "unicode strings", a: "café", b: "café", expected: 0},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := levenshtein(tt.a, tt.b)
assert.Equal(t, tt.expected, result)
})
}
}
func TestMin3_Good(t *testing.T) {
tests := []struct {
name string
a, b, c int
expected int
}{
{name: "first smallest", a: 1, b: 2, c: 3, expected: 1},
{name: "second smallest", a: 2, b: 1, c: 3, expected: 1},
{name: "third smallest", a: 3, b: 2, c: 1, expected: 1},
{name: "all equal", a: 5, b: 5, c: 5, expected: 5},
{name: "first and second equal smallest", a: 1, b: 1, c: 3, expected: 1},
{name: "second and third equal smallest", a: 3, b: 1, c: 1, expected: 1},
{name: "first and third equal smallest", a: 1, b: 3, c: 1, expected: 1},
{name: "negative values", a: -3, b: -1, c: -2, expected: -3},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := min3(tt.a, tt.b, tt.c)
assert.Equal(t, tt.expected, result)
})
}
}
func TestSearchIndex_Search_Good_FuzzyMatching(t *testing.T) {
idx := newSearchIndex()
idx.Add(&Topic{
ID: "configuration",
Title: "Configuration Guide",
Content: "Learn how to configure the application settings.",
})
idx.Add(&Topic{
ID: "deployment",
Title: "Deployment Process",
Content: "Deploy your application to production servers.",
})
t.Run("typo in query finds correct topic", func(t *testing.T) {
// "configuraton" is 1 edit away from "configuration"
results := idx.Search("configuraton")
assert.NotEmpty(t, results, "fuzzy match should find results for typo")
found := false
for _, r := range results {
if r.Topic.ID == "configuration" {
found = true
break
}
}
assert.True(t, found, "should find configuration topic with typo")
})
t.Run("two-edit typo still matches", func(t *testing.T) {
// "deplymnt" is within 2 edits of "deployment" (two deleted letters).
results := idx.Search("deplymnt")
assert.NotEmpty(t, results, "fuzzy match should find results for 2-edit typo")
})
t.Run("too many edits returns no fuzzy match", func(t *testing.T) {
// "zzzzzzz" is very far from any indexed word.
results := idx.Search("zzzzzzz")
assert.Empty(t, results, "large edit distance should not produce results")
})
t.Run("short words skip fuzzy matching", func(t *testing.T) {
// Words shorter than 3 characters skip fuzzy matching.
// "to" is in the index but "tx" (1 edit) should not fuzzy-match
// because "tx" is only 2 chars.
results := idx.Search("tx")
// May or may not find results via prefix, but should not crash.
_ = results
})
t.Run("fuzzy scores lower than exact", func(t *testing.T) {
// Exact match on "configure" should score higher than fuzzy.
exactResults := idx.Search("configure")
fuzzyResults := idx.Search("configurr")
if len(exactResults) > 0 && len(fuzzyResults) > 0 {
assert.GreaterOrEqual(t, exactResults[0].Score, fuzzyResults[0].Score,
"exact match should score at least as high as fuzzy match")
}
})
}
// --- Phase 1: Phrase search tests ---
func TestExtractPhrases_Good(t *testing.T) {
tests := []struct {
name string
query string
expectedPhrases []string
expectedRemaining string
}{
{
name: "no phrases",
query: "hello world",
expectedPhrases: nil,
expectedRemaining: "hello world",
},
{
name: "single phrase",
query: `"rate limit"`,
expectedPhrases: []string{"rate limit"},
expectedRemaining: "",
},
{
name: "phrase with surrounding words",
query: `configure "rate limit" rules`,
expectedPhrases: []string{"rate limit"},
expectedRemaining: "configure  rules", // double space left where the phrase was removed
},
{
name: "multiple phrases",
query: `"rate limit" and "error handling"`,
expectedPhrases: []string{"rate limit", "error handling"},
expectedRemaining: " and ",
},
{
name: "empty quotes left in remaining",
query: `"" hello`,
expectedPhrases: nil,
expectedRemaining: `"" hello`,
},
{
name: "whitespace-only quotes ignored",
query: `" " hello`,
expectedPhrases: nil,
expectedRemaining: " hello",
},
{
name: "unclosed quote treated as plain text",
query: `"unclosed phrase`,
expectedPhrases: nil,
expectedRemaining: `"unclosed phrase`,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
phrases, remaining := extractPhrases(tt.query)
assert.Equal(t, tt.expectedPhrases, phrases)
assert.Equal(t, tt.expectedRemaining, remaining)
})
}
}
func TestSearchIndex_Search_Good_PhraseSearch(t *testing.T) {
idx := newSearchIndex()
idx.Add(&Topic{
ID: "rate-limiting",
Title: "Rate Limiting",
Content: "Configure rate limit rules for API endpoints. The rate limit protects against abuse.",
Tags: []string{"api", "security"},
})
idx.Add(&Topic{
ID: "error-handling",
Title: "Error Handling",
Content: "Handle errors and rate responses correctly. Limit retries to avoid loops.",
Tags: []string{"errors"},
})
t.Run("quoted phrase matches exact sequence", func(t *testing.T) {
results := idx.Search(`"rate limit"`)
assert.NotEmpty(t, results)
// rate-limiting topic has "rate limit" as exact phrase
assert.Equal(t, "rate-limiting", results[0].Topic.ID)
})
t.Run("unquoted words match both topics", func(t *testing.T) {
results := idx.Search("rate limit")
assert.GreaterOrEqual(t, len(results), 2,
"unquoted query should match both topics that contain the words separately")
})
t.Run("phrase not found yields no phrase boost", func(t *testing.T) {
results := idx.Search(`"nonexistent phrase here"`)
assert.Empty(t, results, "phrase with no tokenisable words and no match should return empty")
})
t.Run("phrase with surrounding keywords", func(t *testing.T) {
results := idx.Search(`"rate limit" api`)
assert.NotEmpty(t, results)
assert.Equal(t, "rate-limiting", results[0].Topic.ID)
})
t.Run("phrase-only query with no loose words", func(t *testing.T) {
// Query is only a quoted phrase; tokenize of remaining is empty,
// but phrase matching should still score topics.
results := idx.Search(`"rate limit"`)
assert.NotEmpty(t, results)
})
t.Run("phrase found in section content", func(t *testing.T) {
sectionIdx := newSearchIndex()
sectionIdx.Add(&Topic{
ID: "section-phrase",
Title: "Advanced Guide",
Content: "Overview of the system.",
Sections: []Section{
{
ID: "limits",
Title: "Limits",
Content: "The rate limit is set per client.",
},
},
})
results := sectionIdx.Search(`"rate limit"`)
assert.NotEmpty(t, results)
assert.Equal(t, "section-phrase", results[0].Topic.ID)
})
}
// --- Phase 1: Improved scoring tests ---
func TestSearchIndex_Search_Good_TagBoost(t *testing.T) {
idx := newSearchIndex()
idx.Add(&Topic{
ID: "tagged-topic",
Title: "Some Guide",
Content: "General content about the system.",
Tags: []string{"deployment", "production"},
})
idx.Add(&Topic{
ID: "content-topic",
Title: "Other Guide",
Content: "Information about deployment processes.",
})
results := idx.Search("deployment")
assert.NotEmpty(t, results)
// The tagged topic should rank higher because of the tag boost.
assert.Equal(t, "tagged-topic", results[0].Topic.ID,
"topic with matching tag should rank higher")
}
func TestSearchIndex_Search_Good_MultiWordBonus(t *testing.T) {
idx := newSearchIndex()
// Both topics have neutral titles (no query words in title) to
// isolate the multi-word bonus effect.
idx.Add(&Topic{
ID: "both-words",
Title: "Complete Guide",
Content: "Learn about deploying and monitoring in one place.",
})
idx.Add(&Topic{
ID: "one-word",
Title: "Other Guide",
Content: "Just deploying steps without monitoring.",
})
results := idx.Search("deploying monitoring")
assert.NotEmpty(t, results)
// Topic with both words should score higher due to multi-word bonus.
assert.Equal(t, "both-words", results[0].Topic.ID,
"topic containing all query words should rank higher")
}
func TestSearchIndex_Search_Good_ScoringConstants(t *testing.T) {
// Verify the scoring constants are sensible relative to each other.
assert.Greater(t, scoreTitleBoost, scoreSectionBoost,
"title boost should exceed section boost")
assert.Greater(t, scoreSectionBoost, scoreTagBoost,
"section boost should exceed tag boost")
assert.Greater(t, scoreExactWord, scorePrefixWord,
"exact match should score higher than prefix match")
assert.Greater(t, scorePrefixWord, scoreFuzzyWord,
"prefix match should score higher than fuzzy match")
assert.Greater(t, scorePhraseBoost, scoreAllWords,
"phrase boost should exceed multi-word bonus")
}
func TestSearchIndex_Search_Good_PhraseHighlighting(t *testing.T) {
idx := newSearchIndex()
idx.Add(&Topic{
ID: "phrase-highlight",
Title: "API Guide",
Content: "Configure rate limit rules for the API gateway.",
})
results := idx.Search(`"rate limit" api`)
assert.NotEmpty(t, results)
// The snippet should highlight both the phrase and keyword.
if results[0].Snippet != "" {
assert.Contains(t, results[0].Snippet, "**rate limit**",
"phrase should be highlighted in snippet")
}
}