Add "Search-Engine"

Virgil 2026-02-19 16:58:56 +00:00
parent 81cefa9f31
commit d0b53acff0

# Search Engine
go-help includes a full-text search engine built on an inverted index. It supports word tokenisation, prefix matching, title and section boosting, and snippet extraction with match highlighting.
## Architecture
The search index is an inverted map from words to topic IDs:
```go
type searchIndex struct {
	topics map[string]*Topic   // topicID -> Topic
	index  map[string][]string // word -> []topicID
}
```
When a topic is added via `Catalog.Add`, the index tokenises and indexes the following fields:
1. **Title words** (indexed for title-boost scoring)
2. **Content words** (full body text)
3. **Section titles and content** (each heading and its body)
4. **Tags** (all tag values)
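The indexing pass can be sketched roughly as follows. This is an illustration, not the go-help source: the `Topic` and `Section` field names, the `newSearchIndex` and `add` helpers, and the choice to record each topic ID at most once per word are all assumptions.

```go
package main

import (
	"strings"
	"unicode"
)

// Section and Topic are reduced stand-ins; the field names are assumptions.
type Section struct {
	Title   string
	Content string
}

type Topic struct {
	ID       string
	Title    string
	Content  string
	Tags     []string
	Sections []*Section
}

type searchIndex struct {
	topics map[string]*Topic   // topicID -> Topic
	index  map[string][]string // word -> []topicID
}

// newSearchIndex is a hypothetical constructor that allocates both maps.
func newSearchIndex() *searchIndex {
	return &searchIndex{topics: map[string]*Topic{}, index: map[string][]string{}}
}

// tokenize lowercases text, splits on non-alphanumeric runes, and drops
// single-character words.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var words []string
	for _, f := range fields {
		if len([]rune(f)) > 1 {
			words = append(words, f)
		}
	}
	return words
}

// add indexes the title, body, section titles and bodies, and tags of a topic.
// Assumption: a topic ID is stored at most once per word.
func (si *searchIndex) add(t *Topic) {
	si.topics[t.ID] = t
	texts := []string{t.Title, t.Content}
	for _, s := range t.Sections {
		texts = append(texts, s.Title, s.Content)
	}
	texts = append(texts, t.Tags...)

	seen := map[string]bool{}
	for _, text := range texts {
		for _, w := range tokenize(text) {
			if !seen[w] {
				seen[w] = true
				si.index[w] = append(si.index[w], t.ID)
			}
		}
	}
}
```

Storing each topic ID once per word keeps posting lists short; an index that counts repeated occurrences would need a different value type.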
## Tokenisation
The `tokenize` function splits text into lowercase words:
- Characters: letters and digits are kept
- Separators: everything else acts as a word boundary
- Minimum length: single-character words are discarded
```
"Getting Started with Go" -> ["getting", "started", "with", "go"]
"API Reference (v2)" -> ["api", "reference", "v2"]
```
This produces a flat list suitable for both indexing and query processing.
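A minimal sketch of a tokeniser that follows these rules is shown below. The body is an assumption based on the description above (only the name `tokenize` comes from this page), and it uses `strings.FieldsFunc` from the standard library to do the splitting.

```go
package main

import (
	"strings"
	"unicode"
	"unicode/utf8"
)

// tokenize lowercases text, treats every rune that is not a letter or digit
// as a separator, and discards tokens shorter than two runes.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	words := make([]string, 0, len(fields))
	for _, f := range fields {
		if utf8.RuneCountInString(f) >= 2 {
			words = append(words, f)
		}
	}
	return words
}
```

Counting runes rather than bytes means a single non-ASCII letter is still treated as a one-character word and dropped.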
## Scoring Algorithm
When `Search(query)` is called, the query is tokenised and each query word is scored against the index:
### Base Score
For each query word:
- **Exact match**: +1.0 per topic found in the index for that word
- **Prefix match**: +0.5 per topic where an indexed word starts with the query word (but is not an exact match)
### Boosts
After base scoring, additional boosts are applied:
| Condition | Boost |
|-----------|-------|
| Query word appears in topic title | +10.0 |
| Query word appears in the title of a matching section | +5.0 |
### Sorting
Results are sorted by descending score. Ties are broken alphabetically by topic title.
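The base score, boosts, and sort order described above can be combined as in the sketch below. The `doc` and `ranked` types and the `rank` function are hypothetical. The sketch also assumes that "appears in" means a case-insensitive substring test, and that the section boost is applied at most once per query word.

```go
package main

import (
	"sort"
	"strings"
)

// doc is a hypothetical, reduced view of a topic: only what scoring needs.
type doc struct {
	Title         string
	SectionTitles []string
}

// ranked pairs a topic ID with its title and final score.
type ranked struct {
	ID    string
	Title string
	Score float64
}

func containsWord(text, word string) bool {
	return strings.Contains(strings.ToLower(text), word)
}

// rank scores every topic reachable from the query words and sorts the result.
func rank(index map[string][]string, docs map[string]doc, query []string) []ranked {
	scores := map[string]float64{}
	for _, q := range query {
		for word, ids := range index {
			weight := 0.0
			switch {
			case word == q:
				weight = 1.0 // exact match
			case strings.HasPrefix(word, q):
				weight = 0.5 // prefix match that is not exact
			}
			if weight == 0 {
				continue
			}
			for _, id := range ids {
				scores[id] += weight
			}
		}
	}

	results := make([]ranked, 0, len(scores))
	for id, s := range scores {
		d := docs[id]
		for _, q := range query {
			if containsWord(d.Title, q) {
				s += 10.0 // title boost
			}
			for _, st := range d.SectionTitles {
				if containsWord(st, q) {
					s += 5.0 // section-title boost, once per query word (assumption)
					break
				}
			}
		}
		results = append(results, ranked{ID: id, Title: d.Title, Score: s})
	}

	// Descending score; ties broken alphabetically by title.
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].Title < results[j].Title
	})
	return results
}
```

Because the title boost is ten times the exact-match weight, a single title hit outweighs several body-only hits under these numbers.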
## Search Results
```go
type SearchResult struct {
	Topic   *Topic   // The matched topic
	Section *Section // Best matching section (nil if topic-level)
	Score   float64  // Relevance score
	Snippet string   // Context around match with highlighting
}
```
### Example
```go
results := catalog.Search("deploy docker")
for _, r := range results {
	section := ""
	if r.Section != nil {
		section = " > " + r.Section.Title
	}
	fmt.Printf("[%.1f] %s%s\n  %s\n", r.Score, r.Topic.Title, section, r.Snippet)
}
```
## Snippet Extraction
The search engine extracts a snippet of roughly 150 runes around the first match in the best-matching section:
1. **Find** the position of the first regex match in the content
2. **Extract** a window of 150 runes centred on the match
3. **Trim** to word boundaries (adding `...` prefix/suffix as needed)
4. **Highlight** all matches by wrapping them in `**bold**` markers
If no regex matches are found, the snippet falls back to the first non-empty, non-heading line of the section content.
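Steps 1 to 3 and the fallback can be sketched as below; highlighting (step 4) is left out to keep the example short. The function names, the way the regex is built from the query words, and the exact trimming rule are assumptions, not the go-help implementation.

```go
package main

import (
	"regexp"
	"strings"
	"unicode"
	"unicode/utf8"
)

const snippetRunes = 150

// firstTextLine returns the first non-empty line that is not a Markdown heading.
func firstTextLine(content string) string {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line != "" && !strings.HasPrefix(line, "#") {
			return line
		}
	}
	return ""
}

// extractSnippet returns a window of about snippetRunes runes centred on the
// first case-insensitive match of any query word, trimmed to word boundaries.
func extractSnippet(content string, words []string) string {
	quoted := make([]string, 0, len(words))
	for _, w := range words {
		if w != "" {
			quoted = append(quoted, regexp.QuoteMeta(w))
		}
	}
	if len(quoted) == 0 {
		return firstTextLine(content)
	}
	re := regexp.MustCompile("(?i)(?:" + strings.Join(quoted, "|") + ")")
	loc := re.FindStringIndex(content)
	if loc == nil {
		return firstTextLine(content)
	}

	// Convert the byte offsets of the match into rune offsets.
	runes := []rune(content)
	start := utf8.RuneCountInString(content[:loc[0]])
	end := start + utf8.RuneCountInString(content[loc[0]:loc[1]])

	// Centre the window on the match and clamp it to the content.
	lo := (start+end)/2 - snippetRunes/2
	if lo < 0 {
		lo = 0
	}
	hi := lo + snippetRunes
	if hi > len(runes) {
		hi = len(runes)
		lo = hi - snippetRunes
		if lo < 0 {
			lo = 0
		}
	}

	// Move the edges inwards to word boundaries, never cutting into the match.
	prefix, suffix := "", ""
	if lo > 0 {
		for lo < start && !unicode.IsSpace(runes[lo-1]) {
			lo++
		}
		prefix = "..."
	}
	if hi < len(runes) {
		for hi > end && !unicode.IsSpace(runes[hi]) {
			hi--
		}
		suffix = "..."
	}
	return prefix + strings.TrimSpace(string(runes[lo:hi])) + suffix
}
```

Working in rune offsets rather than byte offsets avoids slicing through a multi-byte character when the content is not plain ASCII.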
## Highlighting
The `highlight` function wraps matched text in `**` markers:
```
Input: "How to deploy with docker compose"
Query: "deploy docker"
Output: "How to **deploy** with **docker** compose"
```
Overlapping or adjacent matches are merged before highlighting to avoid nested markers.
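One way to implement the merge step is to collect the match ranges for each query word separately, sort them, and fold overlapping or touching ranges together before inserting markers. The sketch below takes that approach; the signature and the case-insensitive matching are assumptions.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

// highlight wraps every case-insensitive occurrence of the query words in
// "**" markers, merging overlapping or adjacent matches first.
func highlight(text string, words []string) string {
	// Collect the [start, end) byte range of every match, word by word.
	var ranges [][2]int
	for _, w := range words {
		if w == "" {
			continue
		}
		re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(w))
		for _, m := range re.FindAllStringIndex(text, -1) {
			ranges = append(ranges, [2]int{m[0], m[1]})
		}
	}
	sort.Slice(ranges, func(i, j int) bool {
		if ranges[i][0] != ranges[j][0] {
			return ranges[i][0] < ranges[j][0]
		}
		return ranges[i][1] < ranges[j][1]
	})

	// Fold each range into the previous one when they overlap or touch.
	var merged [][2]int
	for _, r := range ranges {
		n := len(merged)
		if n > 0 && r[0] <= merged[n-1][1] {
			if r[1] > merged[n-1][1] {
				merged[n-1][1] = r[1]
			}
			continue
		}
		merged = append(merged, r)
	}

	// Rebuild the string with markers around each merged range.
	var b strings.Builder
	last := 0
	for _, r := range merged {
		b.WriteString(text[last:r[0]])
		b.WriteString("**")
		b.WriteString(text[r[0]:r[1]])
		b.WriteString("**")
		last = r[1]
	}
	b.WriteString(text[last:])
	return b.String()
}
```

With the query words `data` and `base`, the text `database` yields `**database**` rather than `**data****base**`.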
## Best Match Selection
For each topic in the results, the engine selects the single best-matching section:
1. Score each section by counting query word matches in its title (weighted 2x) and content
2. The section with the highest combined score is selected
3. The snippet is extracted from that section's content
If no section matches, the snippet is extracted from the topic-level content.
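The selection rule can be sketched as follows. The `bestSection` name is hypothetical, and "counting query word matches" is interpreted here as counting case-insensitive substring occurrences, which may differ from what go-help actually does.

```go
package main

import "strings"

// Section is a reduced stand-in with only the fields the scoring needs.
type Section struct {
	Title   string
	Content string
}

// bestSection returns the section with the highest combined score, where a
// match in the title counts twice as much as a match in the content. It
// returns nil when no section contains any query word.
func bestSection(sections []*Section, words []string) *Section {
	var best *Section
	bestScore := 0
	for _, s := range sections {
		title := strings.ToLower(s.Title)
		content := strings.ToLower(s.Content)
		score := 0
		for _, w := range words {
			if w == "" {
				continue
			}
			score += 2*strings.Count(title, w) + strings.Count(content, w)
		}
		if score > bestScore {
			best, bestScore = s, score
		}
	}
	return best
}
```

A `nil` result corresponds to the topic-level case, where `SearchResult.Section` is `nil` and the snippet comes from the topic content. On equal scores this sketch keeps the earlier section.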
See also: [[Home]] | [[Topics-and-Catalog]]