# Architecture — go-help

Module: `forge.lthn.ai/core/go-help`

## Overview

go-help is a display-agnostic help content management library. It provides a YAML/Markdown catalog of help topics, full-text search with relevance scoring, HTTP serving (both HTML and JSON), goldmark-based Markdown rendering, a static site generator, and CLI help text ingestion. The package has no runtime dependency on a database or external service; it is entirely self-contained.

## Core Types

### Topic

`Topic` (`topic.go`) is the fundamental unit of content:

```go
type Topic struct {
    ID       string    // URL-safe slug, e.g. "getting-started"
    Title    string    // Human-readable title
    Path     string    // Source file path (empty for programmatic topics)
    Content  string    // Raw Markdown body (without frontmatter)
    Sections []Section // Parsed heading hierarchy
    Tags     []string  // Arbitrary classification labels
    Related  []string  // IDs of related topics
    Order    int       // Sort weight (lower = earlier)
}
```

`Section` captures one Markdown heading together with the body text that follows it, up to the next heading:

```go
type Section struct {
    ID      string // GenerateID(Title)
    Title   string
    Level   int    // 1–6, matching H1–H6
    Line    int    // 1-indexed line number in Content
    Content string // All text beneath this heading until the next heading
}
```

### Frontmatter

Topic files may begin with a YAML frontmatter block delimited by `---`. The `Frontmatter` struct maps to the supported fields:

```go
type Frontmatter struct {
    Title   string   `yaml:"title"`
    Tags    []string `yaml:"tags"`
    Related []string `yaml:"related"`
    Order   int      `yaml:"order"`
}
```

If frontmatter is absent, the parser falls back to the filename and the first H1 heading for the title and ID.

## Parser (`parser.go`)

`ParseTopic(path string, content []byte) (*Topic, error)` parses a Markdown file into a `Topic`:

1. `ExtractFrontmatter` matches the leading `---\n...\n---\n` block via a compiled regex, unmarshals it with `gopkg.in/yaml.v3`, and returns the remaining body.
2. `ExtractSections` iterates over body lines, matching `^(#{1,6})\s+(.+)$`, and accumulates the text between consecutive headings as `Section.Content`.
3. `GenerateID(title string) string` produces a URL-safe slug: lowercase, letters and digits preserved, spaces/hyphens/underscores collapsed to a single hyphen, leading and trailing hyphens trimmed.
4. `pathToTitle(path string) string` converts a filename (e.g. `getting-started.md`) to a title string (`Getting Started`) for use when frontmatter is absent.
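
The first two steps can be sketched with the standard library alone. This is a minimal illustration, not the library's code: `splitFrontmatter`, `splitSections`, and the lowercase `section` type are hypothetical names, the frontmatter pattern is an assumed equivalent of the compiled regex, and YAML unmarshalling is left out. Only the heading pattern is taken from the text above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// section mirrors the Section fields that this sketch fills in.
type section struct {
	Title   string
	Level   int
	Line    int
	Content string
}

var (
	// Assumed pattern for the leading frontmatter block; the real regex may differ.
	frontmatterRe = regexp.MustCompile(`(?s)^---\n(.*?)\n---\n`)
	// Heading pattern quoted in step 2.
	headingRe = regexp.MustCompile(`^(#{1,6})\s+(.+)$`)
)

// splitFrontmatter returns the raw YAML text and the remaining body.
func splitFrontmatter(src string) (yamlText, body string) {
	m := frontmatterRe.FindStringSubmatch(src)
	if m == nil {
		return "", src
	}
	return m[1], src[len(m[0]):]
}

// splitSections collects the text between consecutive headings.
// Text before the first heading is discarded in this sketch.
func splitSections(body string) []section {
	var out []section
	var buf []string
	flush := func() {
		if len(out) > 0 {
			out[len(out)-1].Content = strings.TrimSpace(strings.Join(buf, "\n"))
		}
		buf = buf[:0]
	}
	for i, line := range strings.Split(body, "\n") {
		if m := headingRe.FindStringSubmatch(line); m != nil {
			flush()
			out = append(out, section{
				Title: strings.TrimSpace(m[2]),
				Level: len(m[1]),
				Line:  i + 1, // 1-indexed within the body
			})
			continue
		}
		buf = append(buf, line)
	}
	flush()
	return out
}

func main() {
	src := "---\ntitle: Hi\n---\n# Intro\nhello\n\n## Install\nrun it\n"
	yamlText, body := splitFrontmatter(src)
	fmt.Printf("frontmatter: %q\n", yamlText)
	for _, s := range splitSections(body) {
		fmt.Printf("H%d %q at line %d: %q\n", s.Level, s.Title, s.Line, s.Content)
	}
}
```

Note that a line-by-line heading match like this one would also treat a `#` comment inside a fenced code block as a heading; whether the real parser guards against that is not stated here.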

### ID generation rules

- `"Getting Started"` → `"getting-started"`
- `"API / Rate Limits"` → `"api-rate-limits"`
- Non-letter, non-digit characters other than space, hyphen, and underscore are dropped silently.
- Consecutive separators collapse to a single hyphen.
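
A sketch that satisfies the rules listed above may help when reasoning about edge cases. `slugify` is a hypothetical stand-in for `GenerateID`; the real function may treat Unicode or other characters differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify lowercases the title, keeps letters and digits, collapses runs of
// space/hyphen/underscore into one hyphen, and drops every other character.
func slugify(title string) string {
	var b strings.Builder
	pendingSep := false
	for _, r := range strings.ToLower(title) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			// Emit a single hyphen for any separators seen since the last
			// kept character, but never at the very start.
			if pendingSep && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingSep = false
			b.WriteRune(r)
		case r == ' ' || r == '-' || r == '_':
			pendingSep = true
		default:
			// Any other character is dropped silently.
		}
	}
	// A trailing separator is never written, so no trimming is needed.
	return b.String()
}

func main() {
	for _, t := range []string{"Getting Started", "API / Rate Limits", "  --Hello__World!! "} {
		fmt.Printf("%q -> %q\n", t, slugify(t))
	}
}
```

Because the dropped `/` sits between two spaces, the separators on either side of it collapse into one hyphen, which is how `"API / Rate Limits"` becomes `"api-rate-limits"`.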

## Catalog (`catalog.go`)

`Catalog` is the in-memory store of all topics. It holds a `map[string]*Topic` keyed by topic ID and an attached `searchIndex`.

```go
func DefaultCatalog() *Catalog  // Returns a catalog pre-loaded with two built-in topics
func (c *Catalog) Add(t *Topic)
func (c *Catalog) List() []*Topic
func (c *Catalog) Get(id string) (*Topic, error)
func (c *Catalog) Search(query string) []*SearchResult
```

`Add` both stores the topic in the map and calls `searchIndex.Add` to index it. There is no lazy indexing; every `Add` is immediately reflected in search results.

## Search (`search.go`)

### Index structure

`searchIndex` maintains two maps:

- `topics map[string]*Topic` — a reference copy for snippet extraction and scoring passes
- `index map[string][]string` — inverted index: token → []topicID

`tokenize(text string) []string` lowercases the input, splits on non-alphanumeric characters, discards tokens shorter than two characters, and for each token also emits its stemmed form if the stem differs from the original. This means the index naturally holds both raw and stemmed variants.

### Scoring constants

| Constant | Value | Purpose |
|---|---|---|
| `scoreTitleBoost` | 10.0 | Query word appears in topic title |
| `scorePhraseBoost` | 8.0 | Exact quoted phrase present in topic |
| `scoreSectionBoost` | 5.0 | Query word appears in a section heading |
| `scoreTagBoost` | 3.0 | Query word matches a topic tag |
| `scoreAllWords` | 2.0 | All query words present (multi-word bonus) |
| `scoreExactWord` | 1.0 | Exact token match in inverted index |
| `scoreStemWord` | 0.7 | Stemmed variant match |
| `scorePrefixWord` | 0.5 | Prefix (partial) match |
| `scoreFuzzyWord` | 0.3 | Levenshtein fuzzy match |

### Search pipeline

1. `extractPhrases(query)` strips `"quoted strings"` from the query and returns them separately. The remaining text is tokenised normally.
2. Each query token goes through three lookups: an exact lookup in the inverted index; a prefix scan (the token is a prefix of an indexed word); and, for tokens of three or more characters, a Levenshtein fuzzy scan with a maximum edit distance of 2. A match on a stemmed token scores `scoreStemWord` rather than `scoreExactWord`.
3. After initial scoring, a second pass over matched topics applies title boost, tag boost, multi-word bonus, and section title boost.
4. Phrase matching scans the concatenated title + content + section content of every topic that received any score.
5. Results are sorted by score descending; ties are broken alphabetically by title.
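
Steps 1 and 5 are simple enough to sketch in isolation. The names `splitPhrases`, `result`, and `rank` are hypothetical, and the phrase pattern is an assumption about how quoted strings are recognised; scoring itself is omitted.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// phraseRe matches a double-quoted run of characters (assumed pattern).
var phraseRe = regexp.MustCompile(`"([^"]+)"`)

// splitPhrases returns the quoted phrases and the query with them removed.
func splitPhrases(query string) (phrases []string, rest string) {
	for _, m := range phraseRe.FindAllStringSubmatch(query, -1) {
		phrases = append(phrases, m[1])
	}
	stripped := phraseRe.ReplaceAllString(query, " ")
	rest = strings.Join(strings.Fields(stripped), " ")
	return phrases, rest
}

// result is a cut-down stand-in for SearchResult.
type result struct {
	Title string
	Score float64
}

// rank sorts by score descending and breaks ties alphabetically by title.
func rank(rs []result) {
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	phrases, rest := splitPhrases(`"rate limit" api docs`)
	fmt.Printf("phrases=%q rest=%q\n", phrases, rest)

	rs := []result{{"Beta", 2}, {"Alpha", 2}, {"Gamma", 5}}
	rank(rs)
	fmt.Println(rs)
}
```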

### Snippet extraction and highlighting

`findBestMatch` selects the section whose title and content best match the query words (title matches weighted double), then calls `extractSnippet` to pull a 150-character window centred on the first regex match. `highlight` wraps matched spans in `**...**` (Markdown bold), merging overlapping matches to avoid double-wrapping.
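
The span-merging idea behind `highlight` can be illustrated as follows. This is a sketch, not the library's implementation: `highlightWords` is a hypothetical name, and matching each query word case-insensitively as a substring is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// highlightWords wraps every match of any word in ** **, merging spans that
// overlap or touch so that no text is wrapped twice.
func highlightWords(text string, words []string) string {
	var spans [][2]int
	for _, w := range words {
		if w == "" {
			continue
		}
		re := regexp.MustCompile(`(?i)` + regexp.QuoteMeta(w))
		for _, loc := range re.FindAllStringIndex(text, -1) {
			spans = append(spans, [2]int{loc[0], loc[1]})
		}
	}
	if len(spans) == 0 {
		return text
	}
	sort.Slice(spans, func(i, j int) bool { return spans[i][0] < spans[j][0] })

	// Merge spans whose start falls at or before the end of the previous one.
	merged := [][2]int{spans[0]}
	for _, s := range spans[1:] {
		last := &merged[len(merged)-1]
		if s[0] <= last[1] {
			if s[1] > last[1] {
				last[1] = s[1]
			}
			continue
		}
		merged = append(merged, s)
	}

	var b strings.Builder
	pos := 0
	for _, s := range merged {
		b.WriteString(text[pos:s[0]])
		b.WriteString("**")
		b.WriteString(text[s[0]:s[1]])
		b.WriteString("**")
		pos = s[1]
	}
	b.WriteString(text[pos:])
	return b.String()
}

func main() {
	fmt.Println(highlightWords("install the installer", []string{"install", "installer"}))
}
```

Without the merge step, the second word would be wrapped once for `install` and again for `installer`, producing nested bold markers.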

### Stemmer (`stemmer.go`)

`stem(word string) string` implements a subset of Porter-style suffix stripping. Words shorter than four characters are returned unchanged. The result is always at least two characters. Two passes are applied:

- `stemInflectional`: handles `-sses`, `-ies`, `-eed`, `-ing`, `-ed`, `-s` (but not `-ss`).
- `stemDerivational`: longest-match suffix rules including `-fulness`, `-ational`, `-tional`, `-ously`, `-ively`, `-ingly`, `-ation`, `-ness`, `-ment`, `-ably`, `-ally`, `-izer`.
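
To make the inflectional pass concrete, here is a rough sketch covering the suffixes listed above. The replacement strings follow the classic Porter step 1 conventions (`-sses` to `-ss`, `-ies` to `-i`, `-eed` to `-ee`) without its measure conditions; that choice is an assumption, and the library's actual outputs may differ. `inflect` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// inflect strips one inflectional suffix. Words shorter than four characters
// are returned unchanged, and a result shorter than two characters is rejected.
func inflect(w string) string {
	if len(w) < 4 {
		return w
	}
	out := w
	switch {
	case strings.HasSuffix(w, "sses"):
		out = strings.TrimSuffix(w, "es") // classes -> class
	case strings.HasSuffix(w, "ies"):
		out = strings.TrimSuffix(w, "es") // queries -> queri
	case strings.HasSuffix(w, "eed"):
		out = strings.TrimSuffix(w, "d") // agreed -> agree
	case strings.HasSuffix(w, "ing"):
		out = strings.TrimSuffix(w, "ing") // searching -> search
	case strings.HasSuffix(w, "ed"):
		out = strings.TrimSuffix(w, "ed") // indexed -> index
	case strings.HasSuffix(w, "ss"):
		// Words ending in -ss keep their final s.
	case strings.HasSuffix(w, "s"):
		out = strings.TrimSuffix(w, "s") // topics -> topic
	}
	if len(out) < 2 {
		return w
	}
	return out
}

func main() {
	for _, w := range []string{"classes", "queries", "agreed", "searching", "indexed", "topics", "class", "cat", "sing"} {
		fmt.Printf("%s -> %s\n", w, inflect(w))
	}
}
```

The last input shows the minimum-length guard: stripping `-ing` from `sing` would leave a single character, so the word is returned as is.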

### Fuzzy matching

`levenshtein(a, b string) int` uses a two-row dynamic programming approach (O(min(m,n)) space). Fuzzy matching is only applied to index words that are neither an exact nor a prefix match for the query token, keeping the common-case query path fast.

## Markdown Rendering (`render.go`)

`RenderMarkdown(content string) (string, error)` converts Markdown to an HTML fragment using `github.com/yuin/goldmark` configured with:

- `extension.GFM` — GitHub Flavoured Markdown: tables, strikethrough, autolinks
- `extension.Typographer` — smart quotes and dashes
- `html.WithUnsafe()` — raw HTML in source is passed through (required for embedded code examples in catalog content)

The function returns a fragment only; `<html>` and `<body>` wrappers are provided by the templates.

## HTTP Server (`server.go`)

`NewServer(catalog *Catalog, addr string) *Server` creates an HTTP server and registers six routes on construction.

### Routes

| Method | Pattern | Handler | Response |
|---|---|---|---|
| `GET` | `/` | `handleIndex` | HTML — all topics grouped by first tag, sorted by Order then Title |
| `GET` | `/topics/{id}` | `handleTopic` | HTML — rendered Markdown body, section ToC, related topics; 404 if unknown |
| `GET` | `/search?q=` | `handleSearch` | HTML — ranked results with highlighted snippets; 400 if `q` is absent |
| `GET` | `/api/topics` | `handleAPITopics` | JSON array of all topics |
| `GET` | `/api/topics/{id}` | `handleAPITopic` | JSON single topic; 404 if unknown |
| `GET` | `/api/search?q=` | `handleAPISearch` | JSON array of `SearchResult`; 400 if `q` is absent |

All routes set `X-Content-Type-Options: nosniff`. HTML routes set `Content-Type: text/html; charset=utf-8`. JSON routes set `Content-Type: application/json`.

`Server` implements `http.Handler` via `ServeHTTP`, allowing it to be embedded into an existing mux or used standalone via `ListenAndServe`.

## Templates (`templates.go`)

Templates are embedded at compile time via `//go:embed templates/*.html`. The directory contains five files:

- `base.html` — shared layout: dark theme CSS (background `#0d1117`, foreground `#c9d1d9`, accent `#58a6ff`), navigation bar with search input, footer
- `index.html` — topic listing: cards grouped by first tag, topic count
- `topic.html` — single topic: rendered Markdown body, table of contents from sections, related topics sidebar
- `search.html` — search results: query echo, result count, ranked list with highlighted snippets, empty-state message
- `404.html` — not found page with search suggestion

Template functions available to all templates:

| Function | Signature | Purpose |
|---|---|---|
| `renderMarkdown` | `func(string) template.HTML` | Calls `RenderMarkdown`; returns empty paragraph on error |
| `truncate` | `func(string, int) string` | Strips Markdown headings, joins remaining lines, truncates to N runes |
| `pluralise` | `func(int, string, string) string` | Returns singular or plural form based on count |
| `multiply` | `func(int, int) int` | Integer multiplication for template arithmetic |
| `sub` | `func(int, int) int` | Integer subtraction for template arithmetic |
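
As an illustration of how such helpers plug into `html/template`, the sketch below registers three of them in a `template.FuncMap`. The bodies are guesses at the described behaviour, not the library's code: in particular, choosing the singular form only when the count is exactly 1, and truncating without appending an ellipsis, are assumptions.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// pluralise picks the singular form for a count of 1 and the plural otherwise.
func pluralise(n int, singular, plural string) string {
	if n == 1 {
		return singular
	}
	return plural
}

// truncate drops blank lines and Markdown heading lines, joins the rest with
// spaces, and cuts the result to at most n runes.
func truncate(s string, n int) string {
	var kept []string
	for _, line := range strings.Split(s, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		kept = append(kept, line)
	}
	r := []rune(strings.Join(kept, " "))
	if len(r) > n {
		r = r[:n]
	}
	return string(r)
}

var funcs = template.FuncMap{
	"pluralise": pluralise,
	"truncate":  truncate,
	"sub":       func(a, b int) int { return a - b },
}

// render executes a tiny template that uses pluralise.
func render(n int) (string, error) {
	t, err := template.New("count").Funcs(funcs).Parse(`{{.}} {{pluralise . "topic" "topics"}}`)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	err = t.Execute(&b, n)
	return b.String(), err
}

func main() {
	for _, n := range []int{1, 2} {
		out, err := render(n)
		fmt.Println(out, err)
	}
	fmt.Println(truncate("# Title\nHello world\nmore", 8))
}
```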

`groupTopicsByTag` groups topics by their first tag (falling back to `"other"`), sorts topics within each group by Order then Title, and sorts groups alphabetically by tag name.
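
The grouping and ordering rules can be expressed compactly. In this sketch `topic`, `group`, and `groupByFirstTag` are hypothetical names that mirror only the fields involved.

```go
package main

import (
	"fmt"
	"sort"
)

// topic carries only the fields that grouping needs.
type topic struct {
	Title string
	Tags  []string
	Order int
}

// group is one tag with its sorted topics.
type group struct {
	Tag    string
	Topics []topic
}

// groupByFirstTag buckets topics by first tag ("other" when untagged), sorts
// each bucket by Order then Title, and sorts buckets by tag name.
func groupByFirstTag(topics []topic) []group {
	byTag := map[string][]topic{}
	for _, t := range topics {
		tag := "other"
		if len(t.Tags) > 0 {
			tag = t.Tags[0]
		}
		byTag[tag] = append(byTag[tag], t)
	}
	groups := make([]group, 0, len(byTag))
	for tag, ts := range byTag {
		sort.Slice(ts, func(i, j int) bool {
			if ts[i].Order != ts[j].Order {
				return ts[i].Order < ts[j].Order
			}
			return ts[i].Title < ts[j].Title
		})
		groups = append(groups, group{Tag: tag, Topics: ts})
	}
	sort.Slice(groups, func(i, j int) bool { return groups[i].Tag < groups[j].Tag })
	return groups
}

func main() {
	groups := groupByFirstTag([]topic{
		{Title: "Zeta", Tags: []string{"cli"}, Order: 1},
		{Title: "Alpha", Tags: []string{"cli"}, Order: 1},
		{Title: "Intro"},
		{Title: "Build", Tags: []string{"cli", "dev"}},
	})
	for _, g := range groups {
		fmt.Print(g.Tag, ":")
		for _, t := range g.Topics {
			fmt.Print(" ", t.Title)
		}
		fmt.Println()
	}
}
```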

## Static Site Generator (`generate.go`)

`Generate(catalog *Catalog, outputDir string) error` writes a self-contained static site:

| Output file | Content |
|---|---|
| `index.html` | Index page rendered from `index.html` template |
| `topics/{id}.html` | One file per topic rendered from `topic.html` template |
| `search.html` | Search page with client-side JavaScript search appended |
| `search-index.json` | JSON array of `{id, title, tags, content}` (content truncated to 500 runes) |
| `404.html` | Not found page rendered from `404.html` template |

The client-side search JavaScript (`clientSearchScript` constant) is appended verbatim to `search.html`. It loads `search-index.json` on page load, intercepts the search form submit event, and scores results using the same title (+10), content (+1), and tag (+3) weightings as the server-side index. All DOM insertion uses `textContent` or `document.createElement` to prevent XSS; no `innerHTML` is used with user-supplied strings.

All CSS is inlined; no external stylesheets are required, making the output suitable for direct file serving or CDN deployment.

## CLI Help Text Ingestion (`ingest.go`)

`ParseHelpText(name string, text string) *Topic` converts raw CLI help output (Go flag-style or Cobra-style) into a `Topic`:

1. Extracts `See also:` lines and converts the comma-separated references into `Related` topic IDs via `GenerateID`.
2. `convertHelpToMarkdown` scans lines for section headers (`Usage:`, `Flags:`, `Options:`, `Examples:`, `Commands:`, `Available Commands:`) and wraps their content in Markdown code blocks or bullet lists. Descriptive paragraphs are passed through as plain Markdown.
3. Tags are set to `["cli", first-word-of-name]` (e.g. `["cli", "dev"]` for command `"dev commit"`).
4. `ExtractSections` is called on the generated Markdown to populate `Sections`.

`IngestCLIHelp(helpTexts map[string]string) *Catalog` batch-ingests a map of command name → help text and returns a populated `Catalog`.
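
Steps 1 and 3 of `ParseHelpText` are illustrated below in isolation. This is a sketch under assumptions: `tagsAndRelated` and `simpleID` are hypothetical, `simpleID` is a much-reduced substitute for `GenerateID`, and matching `See also:` only at the start of a trimmed line is a guess at the real behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// simpleID lowercases and hyphenates whitespace-separated words.
// It stands in for GenerateID and ignores punctuation handling.
func simpleID(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), "-")
}

// tagsAndRelated derives the tag pair from the command name and collects
// related topic IDs from "See also:" lines in the help text.
func tagsAndRelated(name, text string) (tags, related []string) {
	tags = []string{"cli"}
	if fields := strings.Fields(name); len(fields) > 0 {
		tags = append(tags, fields[0])
	}
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "See also:") {
			continue
		}
		refs := strings.TrimPrefix(line, "See also:")
		for _, ref := range strings.Split(refs, ",") {
			if id := simpleID(ref); id != "" {
				related = append(related, id)
			}
		}
	}
	return tags, related
}

func main() {
	help := "Commit staged changes.\n\nUsage:\n  dev commit [flags]\n\nSee also: dev push, dev pull\n"
	tags, related := tagsAndRelated("dev commit", help)
	fmt.Println(tags, related)
}
```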

## Dependencies

| Package | Purpose |
|---|---|
| `gopkg.in/yaml.v3` | YAML frontmatter parsing |
| `github.com/yuin/goldmark` | Markdown-to-HTML rendering |
| `github.com/stretchr/testify` | Test assertions (test-only) |

Apart from the HTTP listener opened by `Server` and the files written by `Generate`, the package has no runtime dependency on a network, database, file system (beyond the embedded templates), or operating system service.