Search Engine
go-help includes a full-text search engine built on an inverted index. It supports word tokenisation, prefix matching, title and section boosting, and snippet extraction with match highlighting.
Architecture
The search index is an inverted map from words to topic IDs:
    type searchIndex struct {
        topics map[string]*Topic   // topicID -> Topic
        index  map[string][]string // word -> []topicID
    }
When a topic is added via Catalog.Add, the following fields are tokenised and indexed:
- Title words (indexed for title-boost scoring)
- Content words (full body text)
- Section titles and content (each heading and its body)
- Tags (all tag values)
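The indexing step above can be sketched roughly as follows. This is a minimal illustration, not the actual go-help code: the `add` method, the `ID`, `Content`, `Sections` and `Tags` fields, and the rule that a topic ID is stored at most once per word are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Section and Topic are pared-down stand-ins; the field names are assumptions.
type Section struct {
	Title   string
	Content string
}

type Topic struct {
	ID       string
	Title    string
	Content  string
	Sections []Section
	Tags     []string
}

type searchIndex struct {
	topics map[string]*Topic   // topicID -> Topic
	index  map[string][]string // word -> []topicID
}

// tokenize lowercases text, splits on non-letter/non-digit runes,
// and drops single-character words.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	words := fields[:0]
	for _, f := range fields {
		if utf8.RuneCountInString(f) > 1 {
			words = append(words, f)
		}
	}
	return words
}

// add (hypothetical helper) indexes the title, body, sections and tags of t.
func (s *searchIndex) add(t *Topic) {
	s.topics[t.ID] = t

	texts := []string{t.Title, t.Content}
	for _, sec := range t.Sections {
		texts = append(texts, sec.Title, sec.Content)
	}
	texts = append(texts, t.Tags...)

	seen := make(map[string]bool) // words already recorded for this topic
	for _, text := range texts {
		for _, w := range tokenize(text) {
			if !seen[w] {
				seen[w] = true
				s.index[w] = append(s.index[w], t.ID)
			}
		}
	}
}

func main() {
	idx := &searchIndex{topics: map[string]*Topic{}, index: map[string][]string{}}
	idx.add(&Topic{
		ID:      "deploy",
		Title:   "Deploying",
		Content: "Deploy with docker.",
		Tags:    []string{"ops"},
	})
	fmt.Println(idx.index["docker"], idx.index["ops"])
}
```

Recording each topic once per word keeps the posting lists short; whether the real index deduplicates this way is not stated in this page.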
Tokenisation
The tokenize function splits text into lowercase words:
- Characters: letters and digits are kept
- Separators: everything else acts as a word boundary
- Minimum length: single-character words are discarded
"Getting Started with Go" -> ["getting", "started", "with", "go"]
"API Reference (v2)" -> ["api", "reference", "v2"]
This produces a flat list suitable for both indexing and query processing.
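A tokeniser following these rules might look like the sketch below. It is an illustration under the stated rules rather than the library's own source; the use of `strings.FieldsFunc` is a choice made here for brevity.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// tokenize lowercases text, treats every rune that is not a letter or digit
// as a separator, and discards single-character words.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	words := make([]string, 0, len(fields))
	for _, f := range fields {
		if utf8.RuneCountInString(f) > 1 {
			words = append(words, f)
		}
	}
	return words
}

func main() {
	fmt.Printf("%q\n", tokenize("Getting Started with Go"))
	fmt.Printf("%q\n", tokenize("API Reference (v2)"))
}
```

Because the same function serves both indexing and querying, a query such as "API (v2)" reduces to the same words that were stored at index time.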
Scoring Algorithm
When Search(query) is called, the query is tokenised and scored against the index:
Base Score
For each query word:
- Exact match: +1.0 per topic found in the index for that word
- Prefix match: +0.5 per topic where an indexed word starts with the query word (but is not an exact match)
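The base scoring rules can be illustrated with a short sketch. The `baseScores` function is hypothetical, and it assumes that a topic earns +0.5 once for every indexed word that has the query word as a proper prefix; the page does not say whether the real engine counts prefixes per word or per topic.

```go
package main

import (
	"fmt"
	"strings"
)

// baseScores (hypothetical) walks the inverted index once per query word and
// adds 1.0 for an exact match or 0.5 for a prefix match to every listed topic.
func baseScores(index map[string][]string, queryWords []string) map[string]float64 {
	scores := make(map[string]float64)
	for _, q := range queryWords {
		for word, ids := range index {
			var weight float64
			switch {
			case word == q:
				weight = 1.0
			case strings.HasPrefix(word, q):
				weight = 0.5
			default:
				continue
			}
			for _, id := range ids {
				scores[id] += weight
			}
		}
	}
	return scores
}

func main() {
	index := map[string][]string{
		"deploy":     {"a", "b"},
		"deployment": {"b"},
		"docker":     {"a"},
	}
	scores := baseScores(index, []string{"deploy", "docker"})
	fmt.Println("a:", scores["a"], "b:", scores["b"])
}
```

Scanning every indexed word for prefixes is linear in the vocabulary size, which is usually acceptable for a help catalogue; a sorted word list or a trie would be the usual alternative for larger indexes.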
Boosts
After base scoring, additional boosts are applied:
| Condition | Boost |
|---|---|
| Query word appears in topic title | +10.0 |
| Query word appears in matching section title | +5.0 |
Sorting
Results are sorted by descending score. Ties are broken alphabetically by topic title.
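The boost and sort steps could be sketched as below. The `result` type and both functions are hypothetical, and two details are assumptions: boosts are applied once per query word, and "appears in" is treated as a case-insensitive substring test.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// result is a simplified stand-in for a scored search hit.
type result struct {
	Title        string
	SectionTitle string // empty when no section matched
	Score        float64
}

// applyBoosts adds +10.0 per query word found in the topic title and
// +5.0 per query word found in the matching section title.
func applyBoosts(r *result, queryWords []string) {
	title := strings.ToLower(r.Title)
	section := strings.ToLower(r.SectionTitle)
	for _, q := range queryWords {
		if strings.Contains(title, q) {
			r.Score += 10.0
		}
		if section != "" && strings.Contains(section, q) {
			r.Score += 5.0
		}
	}
}

// rank boosts every result, then sorts by descending score,
// breaking ties alphabetically by topic title.
func rank(results []result, queryWords []string) {
	for i := range results {
		applyBoosts(&results[i], queryWords)
	}
	sort.SliceStable(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].Title < results[j].Title
	})
}

func main() {
	results := []result{
		{Title: "Containers", SectionTitle: "Docker setup", Score: 1.0},
		{Title: "Docker Basics", Score: 1.0},
		{Title: "Build Tools", SectionTitle: "Docker images", Score: 1.0},
	}
	rank(results, []string{"docker"})
	for _, r := range results {
		fmt.Printf("[%.1f] %s\n", r.Score, r.Title)
	}
}
```

With boosts this much larger than the base weights, a single title hit outranks several body-only matches, which is the behaviour the table implies.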
Search Results
    type SearchResult struct {
        Topic   *Topic   // The matched topic
        Section *Section // Best matching section (nil if topic-level)
        Score   float64  // Relevance score
        Snippet string   // Context around match with highlighting
    }
Example
    // Print each hit as "[score] Topic > Section", followed by its snippet.
    results := catalog.Search("deploy docker")
    for _, r := range results {
        section := ""
        if r.Section != nil {
            section = " > " + r.Section.Title
        }
        fmt.Printf("[%.1f] %s%s\n %s\n", r.Score, r.Topic.Title, section, r.Snippet)
    }
Snippet Extraction
The search engine extracts a snippet of roughly 150 runes around the first match in the best-matching section:
- Find the position of the first regex match in the content
- Extract a window of 150 runes centred on the match
- Trim to word boundaries (adding ... prefix/suffix as needed)
- Highlight all matches by wrapping them in **bold** markers
If no regex matches are found, the snippet falls back to the first non-empty, non-heading line of the section content.
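The windowing and fallback logic might be sketched as follows. The function names are hypothetical, the pattern is assumed to be a case-insensitive alternation of a non-empty list of query words, heading lines are assumed to start with `#`, and highlighting is left out to keep the example focused.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
	"unicode/utf8"
)

const snippetLen = 150 // window size in runes

// buildPattern joins the escaped query words into one case-insensitive regex.
func buildPattern(words []string) *regexp.Regexp {
	quoted := make([]string, len(words))
	for i, w := range words {
		quoted[i] = regexp.QuoteMeta(w)
	}
	return regexp.MustCompile("(?i)" + strings.Join(quoted, "|"))
}

// firstTextLine returns the first non-empty, non-heading line.
func firstTextLine(content string) string {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line != "" && !strings.HasPrefix(line, "#") {
			return line
		}
	}
	return ""
}

// extractSnippet returns a window of up to snippetLen runes centred on the
// first match, trimmed to word boundaries and marked with "..." where cut.
func extractSnippet(content string, re *regexp.Regexp) string {
	loc := re.FindStringIndex(content)
	if loc == nil {
		return firstTextLine(content)
	}
	runes := []rune(content)
	// Convert the byte offsets of the match into rune offsets.
	matchStart := utf8.RuneCountInString(content[:loc[0]])
	matchEnd := matchStart + utf8.RuneCountInString(content[loc[0]:loc[1]])

	start := (matchStart+matchEnd)/2 - snippetLen/2
	if start < 0 {
		start = 0
	}
	end := start + snippetLen
	if end > len(runes) {
		end = len(runes)
	}
	// Drop a partial word at either edge without cutting into the match.
	if start > 0 {
		for start < matchStart && !unicode.IsSpace(runes[start-1]) {
			start++
		}
	}
	if end < len(runes) {
		for end > matchEnd && !unicode.IsSpace(runes[end]) {
			end--
		}
	}
	snippet := strings.TrimSpace(string(runes[start:end]))
	if start > 0 {
		snippet = "..." + snippet
	}
	if end < len(runes) {
		snippet += "..."
	}
	return snippet
}

func main() {
	re := buildPattern([]string{"deploy", "docker"})
	fmt.Println(extractSnippet("How to deploy with docker compose", re))
	fmt.Println(extractSnippet("# Setup\n\nInstall the tools first.", re))
}
```

Working in runes rather than bytes keeps the window from splitting a multi-byte character, which is presumably why the steps above count runes.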
Highlighting
The highlight function wraps matched text in ** markers:
Input: "How to deploy with docker compose"
Query: "deploy docker"
Output: "How to **deploy** with **docker** compose"
Overlapping or adjacent matches are merged before highlighting to avoid nested markers.
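The merge-then-wrap approach can be sketched like this. It is an illustration only: the signature is hypothetical, and it assumes case-insensitive substring matching per query word, which is what makes overlapping spans possible in the first place.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// highlight wraps every occurrence of the query words in ** markers,
// merging overlapping or adjacent spans so markers never nest.
func highlight(text string, words []string) string {
	// Collect [start, end) byte spans for each word separately.
	var spans [][2]int
	for _, w := range words {
		if w == "" {
			continue
		}
		re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(w))
		for _, loc := range re.FindAllStringIndex(text, -1) {
			spans = append(spans, [2]int{loc[0], loc[1]})
		}
	}
	sort.Slice(spans, func(i, j int) bool { return spans[i][0] < spans[j][0] })

	// Merge spans that overlap or touch the previous one.
	var merged [][2]int
	for _, s := range spans {
		if n := len(merged); n > 0 && s[0] <= merged[n-1][1] {
			if s[1] > merged[n-1][1] {
				merged[n-1][1] = s[1]
			}
			continue
		}
		merged = append(merged, s)
	}

	// Rebuild the text with markers around each merged span.
	var b strings.Builder
	prev := 0
	for _, s := range merged {
		b.WriteString(text[prev:s[0]])
		b.WriteString("**" + text[s[0]:s[1]] + "**")
		prev = s[1]
	}
	b.WriteString(text[prev:])
	return b.String()
}

func main() {
	fmt.Println(highlight("How to deploy with docker compose", []string{"deploy", "docker"}))
	fmt.Println(highlight("A deployment guide", []string{"deploy", "deployment"}))
}
```

Without the merge step, the second call would produce nested markers such as `****deploy**ment**`, which Markdown renderers handle inconsistently.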
Best Match Selection
For each topic in the results, the engine selects the single best-matching section:
- Score each section by counting query word matches in its title (weighted 2x) and content
- The section with the highest combined score is selected
- The snippet is extracted from that section's content
If no section matches, the snippet is extracted from the topic-level content.
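Section selection along these lines could look like the sketch below. The `bestSection` and `countMatches` helpers are hypothetical, matches are assumed to be counted as case-insensitive substring occurrences, and the query words are assumed to be lowercase and non-empty (as the tokeniser would produce).

```go
package main

import (
	"fmt"
	"strings"
)

// Section is a pared-down stand-in; the Content field is an assumption.
type Section struct {
	Title   string
	Content string
}

// countMatches counts occurrences of every query word in text.
func countMatches(text string, words []string) int {
	lower := strings.ToLower(text)
	n := 0
	for _, w := range words {
		n += strings.Count(lower, w)
	}
	return n
}

// bestSection returns the section with the highest score, where title
// matches count double, or nil when no section matches at all.
func bestSection(sections []Section, words []string) *Section {
	var best *Section
	bestScore := 0
	for i := range sections {
		score := 2*countMatches(sections[i].Title, words) +
			countMatches(sections[i].Content, words)
		if score > bestScore {
			best, bestScore = &sections[i], score
		}
	}
	return best
}

func main() {
	sections := []Section{
		{Title: "Overview", Content: "This guide covers docker."},
		{Title: "Docker Setup", Content: "Install the engine."},
	}
	if best := bestSection(sections, []string{"docker"}); best != nil {
		fmt.Println(best.Title)
	}
}
```

A nil return corresponds to the topic-level fallback described above; the caller would then extract the snippet from the topic's own content.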
See also: Home | Topics-and-Catalog