Add "Search-Engine"

2026-02-19 16:58:56 +00:00 · 2026-02-19 16:58:56 +00:00 · d0b53acff0
commit d0b53acff0
parent 81cefa9f31
1 changed files with 118 additions and 0 deletions
--- a/Search-Engine.-.md
+++ b/Search-Engine.-.md
@@ -0,0 +1,118 @@
+# Search Engine
+
+go-help includes a full-text search engine built on an inverted index. It supports word tokenisation, prefix matching, title and section boosting, and snippet extraction with match highlighting.
+
+## Architecture
+
+The search index is an inverted map from words to topic IDs:
+
+```go
+type searchIndex struct {
+    topics map[string]*Topic   // topicID -> Topic
+    index  map[string][]string // word -> []topicID
+}
+```
+
+When a topic is added via `Catalog.Add`, the index processes:
+
+1. **Title words** (indexed for title-boost scoring)
+2. **Content words** (full body text)
+3. **Section titles and content** (each heading and its body)
+4. **Tags** (all tag values)
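The indexing step above can be sketched roughly as follows. This is a minimal illustration, not the go-help implementation: the `Topic` fields, the `newSearchIndex`, `words`, and `add` helpers, and the choice to record each topic at most once per word are all assumptions made for the example. Section handling is omitted; section titles and bodies would be fed through the same loop.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Topic is a pared-down, hypothetical stand-in for the real Topic type.
type Topic struct {
	ID      string
	Title   string
	Content string
	Tags    []string
}

type searchIndex struct {
	topics map[string]*Topic   // topicID -> Topic
	index  map[string][]string // word -> []topicID
}

// newSearchIndex is a hypothetical constructor that initialises both maps.
func newSearchIndex() *searchIndex {
	return &searchIndex{
		topics: make(map[string]*Topic),
		index:  make(map[string][]string),
	}
}

// words lowercases text, splits on anything that is not a letter or digit,
// and drops single-character words.
func words(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var out []string
	for _, f := range fields {
		if len([]rune(f)) > 1 {
			out = append(out, f)
		}
	}
	return out
}

// add registers the topic and indexes its title, content, and tags.
// Assumption: a topic ID is stored at most once per word.
func (s *searchIndex) add(t *Topic) {
	s.topics[t.ID] = t
	parts := append([]string{t.Title, t.Content}, t.Tags...)
	seen := make(map[string]bool)
	for _, w := range words(strings.Join(parts, " ")) {
		if seen[w] {
			continue
		}
		seen[w] = true
		s.index[w] = append(s.index[w], t.ID)
	}
}

func main() {
	idx := newSearchIndex()
	idx.add(&Topic{
		ID:      "deploy",
		Title:   "Deploying with Docker",
		Content: "Build the image, then deploy it.",
		Tags:    []string{"docker", "ops"},
	})
	fmt.Println(idx.index["docker"]) // prints [deploy]
}
```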
+
+## Tokenisation
+
+The `tokenize` function splits text into lowercase words:
+
+- Characters: letters and digits are kept
+- Separators: everything else acts as a word boundary
+- Minimum length: single-character words are discarded
+
+```
+"Getting Started with Go"  ->  ["getting", "started", "with", "go"]
+"API Reference (v2)"       ->  ["api", "reference", "v2"]
+```
+
+This produces a flat list suitable for both indexing and query processing.
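A minimal sketch of a tokeniser that follows the three rules above. The function body is an assumption written for illustration (only the name `tokenize` comes from the page), and it treats "letters and digits" as Unicode letters and digits.

```go
package main

import (
	"fmt"
	"unicode"
)

// tokenize splits text into lowercase words. Letters and digits are kept,
// every other rune is a word boundary, and single-character words are dropped.
func tokenize(text string) []string {
	var words []string
	var cur []rune

	// flush emits the word collected so far if it has at least two runes.
	flush := func() {
		if len(cur) > 1 {
			words = append(words, string(cur))
		}
		cur = cur[:0]
	}

	for _, r := range text {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			cur = append(cur, unicode.ToLower(r))
		} else {
			flush()
		}
	}
	flush()
	return words
}

func main() {
	fmt.Printf("%q\n", tokenize("API Reference (v2)")) // prints ["api" "reference" "v2"]
}
```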
+
+## Scoring Algorithm
+
+When `Search(query)` is called, the query is tokenised and each resulting word is scored against the index. Scoring has two stages, a base score and boosts, followed by sorting.
+
+### Base Score
+
+For each query word:
+- **Exact match**: +1.0 per topic found in the index for that word
+- **Prefix match**: +0.5 per topic where an indexed word starts with the query word (but is not an exact match)
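The base-score rules can be sketched as below. This is an illustrative reading, not the actual implementation: `baseScores` is a hypothetical helper, and it assumes a topic earns +0.5 for every distinct indexed word that extends the query word, which the description above does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// baseScores returns topicID -> score for the given query words.
// index maps each indexed word to the IDs of the topics containing it.
func baseScores(index map[string][]string, queryWords []string) map[string]float64 {
	scores := make(map[string]float64)
	for _, q := range queryWords {
		// Exact match: +1.0 for every topic listed under the query word.
		for _, id := range index[q] {
			scores[id] += 1.0
		}
		// Prefix match: +0.5 for topics listed under a longer word that
		// starts with the query word.
		for word, ids := range index {
			if word != q && strings.HasPrefix(word, q) {
				for _, id := range ids {
					scores[id] += 0.5
				}
			}
		}
	}
	return scores
}

func main() {
	index := map[string][]string{
		"deploy":     {"a"},
		"deployment": {"a", "b"},
		"docker":     {"c"},
	}
	scores := baseScores(index, []string{"deploy"})
	fmt.Println(scores["a"], scores["b"]) // prints 1.5 0.5
}
```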
+
+### Boosts
+
+After base scoring, additional boosts are applied:
+
+| Condition | Boost |
+|-----------|-------|
+| Query word appears in topic title | +10.0 |
+| Query word appears in matching section title | +5.0 |
+
+### Sorting
+
+Results are sorted by descending score. Ties are broken alphabetically by topic title.
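A sketch of the boost and sort steps under stated assumptions: the `scored` type and both helpers are hypothetical, "appears in" is read as a case-insensitive substring test, and each boost is applied once per matching query word. The real engine may differ on any of these points.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scored is a hypothetical, flattened view of one search result.
type scored struct {
	Title        string
	SectionTitle string // empty when no section matched
	Score        float64
}

// applyBoosts adds +10.0 per query word found in the topic title and
// +5.0 per query word found in the matching section title.
func applyBoosts(r *scored, queryWords []string) {
	title := strings.ToLower(r.Title)
	section := strings.ToLower(r.SectionTitle)
	for _, q := range queryWords {
		if strings.Contains(title, q) {
			r.Score += 10.0
		}
		if section != "" && strings.Contains(section, q) {
			r.Score += 5.0
		}
	}
}

// sortResults orders by descending score, breaking ties by title.
func sortResults(rs []scored) {
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	rs := []scored{
		{Title: "Configuration", SectionTitle: "Docker settings", Score: 1.0},
		{Title: "Docker Basics", Score: 1.0},
		{Title: "Architecture", Score: 6.0},
	}
	for i := range rs {
		applyBoosts(&rs[i], []string{"docker"})
	}
	sortResults(rs)
	for _, r := range rs {
		fmt.Printf("%.1f %s\n", r.Score, r.Title)
	}
	// prints:
	// 11.0 Docker Basics
	// 6.0 Architecture
	// 6.0 Configuration
}
```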
+
+## Search Results
+
+```go
+type SearchResult struct {
+    Topic   *Topic   // The matched topic
+    Section *Section // Best matching section (nil if topic-level)
+    Score   float64  // Relevance score
+    Snippet string   // Context around match with highlighting
+}
+```
+
+### Example
+
+```go
+results := catalog.Search("deploy docker")
+for _, r := range results {
+    section := ""
+    if r.Section != nil {
+        section = " > " + r.Section.Title
+    }
+    fmt.Printf("[%.1f] %s%s\n  %s\n", r.Score, r.Topic.Title, section, r.Snippet)
+}
+```
+
+## Snippet Extraction
+
+The search engine extracts a snippet of about 150 runes around the first match in the best-matching section:
+
+1. **Find** the position of the first regex match in the content
+2. **Extract** a window of 150 runes centred on the match
+3. **Trim** to word boundaries (adding `...` prefix/suffix as needed)
+4. **Highlight** all matches by wrapping them in `**bold**` markers
+
+If no regex matches are found, the snippet falls back to the first non-empty, non-heading line of the section content.
+
+## Highlighting
+
+The `highlight` function wraps matched text in `**` markers:
+
+```
+Input:  "How to deploy with docker compose"
+Query:  "deploy docker"
+Output: "How to **deploy** with **docker** compose"
+```
+
+Overlapping or adjacent matches are merged before highlighting to avoid nested markers.
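The merge-then-wrap behaviour can be sketched like this. The signature and the matching rule are assumptions for the example: each query word is matched case-insensitively as a literal substring, which may not be how the real `highlight` function builds its patterns.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// highlight wraps every occurrence of a query word in ** markers,
// merging overlapping or adjacent matches first.
func highlight(text string, queryWords []string) string {
	// Collect the byte span of every match.
	var spans [][2]int
	for _, w := range queryWords {
		re := regexp.MustCompile("(?i)" + regexp.QuoteMeta(w))
		for _, loc := range re.FindAllStringIndex(text, -1) {
			spans = append(spans, [2]int{loc[0], loc[1]})
		}
	}
	if len(spans) == 0 {
		return text
	}

	// Sort by start offset, then merge spans that touch or overlap.
	sort.Slice(spans, func(i, j int) bool { return spans[i][0] < spans[j][0] })
	merged := [][2]int{spans[0]}
	for _, s := range spans[1:] {
		last := &merged[len(merged)-1]
		if s[0] <= last[1] {
			if s[1] > last[1] {
				last[1] = s[1]
			}
		} else {
			merged = append(merged, s)
		}
	}

	// Rebuild the text with markers around each merged span.
	var b strings.Builder
	pos := 0
	for _, m := range merged {
		b.WriteString(text[pos:m[0]])
		b.WriteString("**")
		b.WriteString(text[m[0]:m[1]])
		b.WriteString("**")
		pos = m[1]
	}
	b.WriteString(text[pos:])
	return b.String()
}

func main() {
	fmt.Println(highlight("How to deploy with docker compose", []string{"deploy", "docker"}))
	// prints: How to **deploy** with **docker** compose
	fmt.Println(highlight("dockerfile", []string{"docker", "file"}))
	// prints: **dockerfile**
}
```

Merging before wrapping is what keeps adjacent hits such as `docker` + `file` from producing `**docker****file**`.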
+
+## Best Match Selection
+
+For each topic in the results, the engine selects the single best-matching section:
+
+1. Score each section by counting query word matches in its title (weighted 2x) and content
+2. The section with the highest combined score is selected
+3. The snippet is extracted from that section's content
+
+If no section matches, the snippet is extracted from the topic-level content.
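A sketch of the selection rule above, assuming "query word matches" means case-insensitive substring occurrences. The `Section` type here is a reduced stand-in and `countMatches` and `bestSection` are hypothetical names; a `nil` result corresponds to the topic-level fallback.

```go
package main

import (
	"fmt"
	"strings"
)

// Section is a pared-down stand-in for the real Section type.
type Section struct {
	Title   string
	Content string
}

// countMatches counts occurrences of every query word in text.
func countMatches(text string, queryWords []string) int {
	text = strings.ToLower(text)
	n := 0
	for _, q := range queryWords {
		n += strings.Count(text, q)
	}
	return n
}

// bestSection returns the section with the highest combined score, where
// title matches count double. It returns nil when no section matches.
func bestSection(sections []Section, queryWords []string) *Section {
	var best *Section
	bestScore := 0
	for i := range sections {
		s := &sections[i]
		score := 2*countMatches(s.Title, queryWords) + countMatches(s.Content, queryWords)
		if score > bestScore {
			best, bestScore = s, score
		}
	}
	return best
}

func main() {
	sections := []Section{
		{Title: "Overview", Content: "General notes."},
		{Title: "Docker deploy", Content: "Run docker compose up."},
		{Title: "FAQ", Content: "How do I deploy?"},
	}
	if s := bestSection(sections, []string{"deploy", "docker"}); s != nil {
		fmt.Println(s.Title) // prints Docker deploy
	}
}
```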
+
+See also: [[Home]] | [[Topics-and-Catalog]]