Virgil edited this page 2026-02-19 16:58:56 +00:00

Search Engine

go-help includes a full-text search engine built on an inverted index. It supports word tokenisation, prefix matching, title and section boosting, and snippet extraction with match highlighting.

Architecture

The search index is an inverted map from words to topic IDs:

type searchIndex struct {
    topics map[string]*Topic   // topicID -> Topic
    index  map[string][]string // word -> []topicID
}

When a topic is added via Catalog.Add, the index tokenises and records the following fields:

  1. Title words (indexed for title-boost scoring)
  2. Content words (full body text)
  3. Section titles and content (each heading and its body)
  4. Tags (all tag values)
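The indexing steps above can be sketched roughly as follows. This is a minimal illustration, not go-help's actual code: the Topic and Section field names (ID, Content, Tags, Sections), the add method, and the choice to list each topic at most once per word are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Section and Topic are simplified stand-ins; the field names are assumptions.
type Section struct {
	Title   string
	Content string
}

type Topic struct {
	ID       string
	Title    string
	Content  string
	Tags     []string
	Sections []Section
}

type searchIndex struct {
	topics map[string]*Topic   // topicID -> Topic
	index  map[string][]string // word -> []topicID
}

// tokenize is a compact stand-in: lowercase, split on anything that is not
// a letter or digit, and drop single-character words.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	words := fields[:0]
	for _, w := range fields {
		if len([]rune(w)) > 1 {
			words = append(words, w)
		}
	}
	return words
}

// add (hypothetical name) indexes the title, content, sections, and tags of t.
func (s *searchIndex) add(t *Topic) {
	s.topics[t.ID] = t
	texts := []string{t.Title, t.Content}
	for _, sec := range t.Sections {
		texts = append(texts, sec.Title, sec.Content)
	}
	texts = append(texts, t.Tags...)

	seen := map[string]bool{} // assumption: one posting per word per topic
	for _, text := range texts {
		for _, w := range tokenize(text) {
			if !seen[w] {
				seen[w] = true
				s.index[w] = append(s.index[w], t.ID)
			}
		}
	}
}

func main() {
	idx := &searchIndex{topics: map[string]*Topic{}, index: map[string][]string{}}
	idx.add(&Topic{ID: "deploy", Title: "Deploying", Content: "Use docker compose.", Tags: []string{"ops"}})
	fmt.Println(idx.index["docker"], idx.index["ops"])
}
```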

Tokenisation

The tokenize function splits text into lowercase words:

  • Characters: letters and digits are kept
  • Separators: everything else acts as a word boundary
  • Minimum length: single-character words are discarded

"Getting Started with Go"  ->  ["getting", "started", "with", "go"]
"API Reference (v2)"       ->  ["api", "reference", "v2"]

This produces a flat list suitable for both indexing and query processing.

Scoring Algorithm

When Search(query) is called, the query is tokenised and scored against the index:

Base Score

For each query word:

  • Exact match: +1.0 per topic found in the index for that word
  • Prefix match: +0.5 per topic where an indexed word starts with the query word (but is not an exact match)
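As a rough sketch of the base score, assuming the index is the word -> []topicID map shown earlier: the baseScores helper is hypothetical, and so are two details the text leaves open, namely that prefix candidates are found by scanning every indexed word and that +0.5 is added once per matching indexed word rather than once per topic.

```go
package main

import (
	"fmt"
	"strings"
)

// baseScores returns topicID -> base score for already-tokenised query words.
func baseScores(index map[string][]string, queryWords []string) map[string]float64 {
	scores := map[string]float64{}
	for _, q := range queryWords {
		// Exact match: +1.0 for every topic listed under the query word.
		for _, id := range index[q] {
			scores[id] += 1.0
		}
		// Prefix match: +0.5 for topics under any longer word starting with q.
		for word, ids := range index {
			if word != q && strings.HasPrefix(word, q) {
				for _, id := range ids {
					scores[id] += 0.5
				}
			}
		}
	}
	return scores
}

func main() {
	index := map[string][]string{
		"deploy":     {"a"},
		"deployment": {"a", "b"},
		"docker":     {"b"},
	}
	fmt.Println(baseScores(index, []string{"deploy"}))
}
```

Here topic a receives 1.0 for the exact match on "deploy" plus 0.5 for the prefix match on "deployment", while topic b receives only the 0.5 prefix score.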

Boosts

After base scoring, additional boosts are applied:

Condition                                       Boost
Query word appears in topic title               +10.0
Query word appears in matching section title    +5.0

Sorting

Results are sorted by descending score. Ties are broken alphabetically by topic title.
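The boosts and the sort order could be combined along these lines. The result type and both helpers are hypothetical, and the sketch assumes that "appears in" means a case-insensitive substring test and that each boost is applied once per matching query word.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// result is a simplified stand-in for a scored search hit.
type result struct {
	Title        string  // topic title
	SectionTitle string  // title of the matching section, "" if none
	Score        float64 // base score, boosted in place
}

// applyBoosts adds +10.0 per query word found in the topic title and
// +5.0 per query word found in the matching section title.
func applyBoosts(rs []result, queryWords []string) {
	for i := range rs {
		title := strings.ToLower(rs[i].Title)
		section := strings.ToLower(rs[i].SectionTitle)
		for _, q := range queryWords {
			if strings.Contains(title, q) {
				rs[i].Score += 10.0
			}
			if section != "" && strings.Contains(section, q) {
				rs[i].Score += 5.0
			}
		}
	}
}

// sortResults orders by descending score, then alphabetically by title.
func sortResults(rs []result) {
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	rs := []result{
		{Title: "Zeta Docker", Score: 1},
		{Title: "Alpha Docker", Score: 1},
		{Title: "Other", SectionTitle: "Docker setup", Score: 2},
	}
	applyBoosts(rs, []string{"docker"})
	sortResults(rs)
	for _, r := range rs {
		fmt.Printf("[%.1f] %s\n", r.Score, r.Title)
	}
}
```

The two title hits tie at 11.0 and are ordered alphabetically; the section-title hit ends at 7.0 and sorts last.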

Search Results

type SearchResult struct {
    Topic   *Topic   // The matched topic
    Section *Section // Best matching section (nil if topic-level)
    Score   float64  // Relevance score
    Snippet string   // Context around match with highlighting
}

Example

results := catalog.Search("deploy docker")
for _, r := range results {
    section := ""
    if r.Section != nil {
        section = " > " + r.Section.Title
    }
    fmt.Printf("[%.1f] %s%s\n  %s\n", r.Score, r.Topic.Title, section, r.Snippet)
}

Snippet Extraction

The search engine extracts a snippet of roughly 150 runes around the first match in the best-matching section:

  1. Find the position of the first regex match in the content
  2. Extract a window of 150 runes centred on the match
  3. Trim to word boundaries (adding ... prefix/suffix as needed)
  4. Highlight all matches by wrapping them in **bold** markers

If no regex matches are found, the snippet falls back to the first non-empty, non-heading line of the section content.

Highlighting

The highlight function wraps matched text in ** markers:

Input:  "How to deploy with docker compose"
Query:  "deploy docker"
Output: "How to **deploy** with **docker** compose"

Overlapping or adjacent matches are merged before highlighting to avoid nested markers.
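One way to implement the merge-then-wrap behaviour is sketched below. It assumes one case-insensitive substring pattern per query word; the actual highlight function's signature and matching rules may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// span is a half-open byte range [start, end) within the text.
type span struct{ start, end int }

// highlight wraps every occurrence of the query words in ** markers,
// merging overlapping or adjacent ranges first so markers never nest.
func highlight(text string, words []string) string {
	var spans []span
	for _, w := range words {
		if w == "" {
			continue
		}
		re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(w))
		for _, loc := range re.FindAllStringIndex(text, -1) {
			spans = append(spans, span{loc[0], loc[1]})
		}
	}
	if len(spans) == 0 {
		return text
	}

	sort.Slice(spans, func(i, j int) bool { return spans[i].start < spans[j].start })

	merged := []span{spans[0]}
	for _, s := range spans[1:] {
		last := &merged[len(merged)-1]
		if s.start <= last.end { // overlapping or adjacent: extend
			if s.end > last.end {
				last.end = s.end
			}
		} else {
			merged = append(merged, s)
		}
	}

	var b strings.Builder
	pos := 0
	for _, s := range merged {
		b.WriteString(text[pos:s.start])
		b.WriteString("**" + text[s.start:s.end] + "**")
		pos = s.end
	}
	b.WriteString(text[pos:])
	return b.String()
}

func main() {
	fmt.Println(highlight("How to deploy with docker compose", []string{"deploy", "docker"}))
	// How to **deploy** with **docker** compose
	fmt.Println(highlight("dockerfile", []string{"docker", "file"}))
	// **dockerfile**
}
```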

Best Match Selection

For each topic in the results, the engine selects the single best-matching section:

  1. Score each section by counting query word matches in its title (weighted 2x) and content
  2. The section with the highest combined score is selected
  3. The snippet is extracted from that section's content

If no section matches, the snippet is extracted from the topic-level content.

See also: Home | Topics-and-Catalog