go-help/docs/architecture.md
Snider 142567a8f5 docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:01:55 +00:00

Architecture — go-help

Module: forge.lthn.ai/core/go-help

Overview

go-help is a display-agnostic help content management library. It provides a YAML/Markdown catalog of help topics, full-text search with relevance scoring, HTTP serving (both HTML and JSON), goldmark-based Markdown rendering, a static site generator, and CLI help text ingestion. The package has no runtime dependency on a database or external service; it is entirely self-contained.

Core Types

Topic

Topic (topic.go) is the fundamental unit of content:

type Topic struct {
    ID       string    // URL-safe slug, e.g. "getting-started"
    Title    string    // Human-readable title
    Path     string    // Source file path (empty for programmatic topics)
    Content  string    // Raw Markdown body (without frontmatter)
    Sections []Section // Parsed heading hierarchy
    Tags     []string  // Arbitrary classification labels
    Related  []string  // IDs of related topics
    Order    int       // Sort weight (lower = earlier)
}

Section captures each Markdown heading with its nested body text:

type Section struct {
    ID      string // GenerateID(Title)
    Title   string
    Level   int    // 1-6, matching H1-H6
    Line    int    // 1-indexed line number in Content
    Content string // All text beneath this heading until the next heading
}

Frontmatter

Topic files may begin with a YAML frontmatter block delimited by ---. The Frontmatter struct maps to the supported fields:

type Frontmatter struct {
    Title   string   `yaml:"title"`
    Tags    []string `yaml:"tags"`
    Related []string `yaml:"related"`
    Order   int      `yaml:"order"`
}

If frontmatter is absent, the parser falls back to the filename and the first H1 heading for the title and ID.
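The split between frontmatter and body can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in parser.go: the helper name splitFrontmatter and the exact regular expression are hypothetical, and the YAML text is returned raw rather than unmarshalled into Frontmatter.

```go
package main

import (
	"fmt"
	"regexp"
)

// frontmatterRe matches a leading "---" block at the very start of the input.
// The pattern is an assumption for illustration; the compiled regex in
// parser.go may differ.
var frontmatterRe = regexp.MustCompile(`(?s)\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\z)`)

// splitFrontmatter (hypothetical helper) returns the raw YAML text and the
// remaining Markdown body. When no block is present, the YAML text is empty
// and the body is the whole input.
func splitFrontmatter(content string) (yamlText, body string) {
	m := frontmatterRe.FindStringSubmatchIndex(content)
	if m == nil {
		return "", content
	}
	// m[2]:m[3] is the captured YAML; m[1] is the end of the whole match.
	return content[m[2]:m[3]], content[m[1]:]
}

func main() {
	src := "---\ntitle: Getting Started\norder: 1\n---\n# Getting Started\n"
	yamlText, body := splitFrontmatter(src)
	fmt.Printf("yaml: %q\nbody: %q\n", yamlText, body)
}
```

In the real parser the captured text would then be passed to a YAML decoder; that step is left out here to keep the sketch dependency-free.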

Parser (parser.go)

ParseTopic(path string, content []byte) (*Topic, error) parses a Markdown file into a Topic. It relies on the following steps and helpers:

  1. ExtractFrontmatter matches the leading ---\n...\n---\n block via a compiled regex, unmarshals it with gopkg.in/yaml.v3, and returns the remaining body.
  2. ExtractSections iterates over body lines, matching ^(#{1,6})\s+(.+)$, and accumulates the text between consecutive headings as Section.Content.
  3. GenerateID(title string) string produces a URL-safe slug: lowercase, letters and digits preserved, spaces/hyphens/underscores collapsed to a single hyphen, leading and trailing hyphens trimmed.
  4. pathToTitle(path string) string converts a filename (e.g. getting-started.md) to a title string (Getting Started) for use when frontmatter is absent.
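The heading scan in step 2 might look roughly like the sketch below. The regular expression is the one quoted above; the type and function names are lowercase stand-ins, and trimming whitespace from the accumulated content is an assumption. The sketch does not special-case "#" lines inside fenced code blocks.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// section is a stand-in for the Section type described earlier (ID omitted).
type section struct {
	Title   string
	Level   int
	Line    int // 1-indexed line number within the body
	Content string
}

var headingRe = regexp.MustCompile(`^(#{1,6})\s+(.+)$`)

// extractSections walks the body line by line. Text between two headings is
// attached to the earlier heading; text before the first heading is dropped.
func extractSections(body string) []section {
	var out []section
	var buf []string
	flush := func() {
		if len(out) > 0 {
			out[len(out)-1].Content = strings.TrimSpace(strings.Join(buf, "\n"))
		}
		buf = buf[:0]
	}
	for i, line := range strings.Split(body, "\n") {
		if m := headingRe.FindStringSubmatch(line); m != nil {
			flush()
			out = append(out, section{
				Title: strings.TrimSpace(m[2]),
				Level: len(m[1]),
				Line:  i + 1,
			})
			continue
		}
		buf = append(buf, line)
	}
	flush()
	return out
}

func main() {
	body := "intro\n# One\nalpha\n\n## Two\nbeta\n"
	for _, s := range extractSections(body) {
		fmt.Printf("H%d line %d %q: %q\n", s.Level, s.Line, s.Title, s.Content)
	}
}
```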

ID generation rules

  • "Getting Started" → "getting-started"
  • "API / Rate Limits" → "api-rate-limits"
  • Non-letter, non-digit characters other than space, hyphen, and underscore are dropped silently.
  • Consecutive separators collapse to a single hyphen.
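The rules above can be sketched in a few lines. This is a reimplementation for illustration, written from the rules as stated rather than copied from parser.go, so the real GenerateID may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// generateID is a sketch of the slug rules: lowercase, keep letters and
// digits, collapse runs of space/hyphen/underscore into one hyphen, drop
// everything else, and never emit a leading or trailing hyphen.
func generateID(title string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(title) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			// Emit a deferred hyphen only between two kept characters.
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(r)
		case r == ' ' || r == '-' || r == '_':
			pendingHyphen = true
		}
		// Any other character is dropped silently.
	}
	return b.String()
}

func main() {
	fmt.Println(generateID("Getting Started"))   // getting-started
	fmt.Println(generateID("API / Rate Limits")) // api-rate-limits
}
```

Deferring the hyphen until the next kept character handles both collapsing and trimming without a second pass.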

Catalog (catalog.go)

Catalog is the in-memory store of all topics. It holds a map[string]*Topic keyed by topic ID and an attached searchIndex.

func DefaultCatalog() *Catalog  // Returns a catalog pre-loaded with two built-in topics
func (c *Catalog) Add(t *Topic)
func (c *Catalog) List() []*Topic
func (c *Catalog) Get(id string) (*Topic, error)
func (c *Catalog) Search(query string) []*SearchResult

Add both stores the topic in the map and calls searchIndex.Add to index it. There is no lazy indexing; every Add is immediately reflected in search results.
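A minimal sketch of this eager-indexing design follows. The types are lowercase stand-ins, the index is folded into the catalog for brevity, and only the title is tokenised (by whitespace); the error text returned by Get is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

type topic struct{ ID, Title string }

// catalog is a stand-in for Catalog with the search index inlined.
type catalog struct {
	topics map[string]*topic
	index  map[string][]string // token -> topic IDs
}

func newCatalog() *catalog {
	return &catalog{topics: map[string]*topic{}, index: map[string][]string{}}
}

// Add stores the topic and indexes it in the same call, so a search issued
// immediately afterwards already sees it.
func (c *catalog) Add(t *topic) {
	c.topics[t.ID] = t
	for _, tok := range strings.Fields(strings.ToLower(t.Title)) {
		ids := c.index[tok]
		if len(ids) > 0 && ids[len(ids)-1] == t.ID {
			continue // same token repeated within this title
		}
		c.index[tok] = append(ids, t.ID)
	}
}

// Get returns an error for unknown IDs, mirroring the signature above.
func (c *catalog) Get(id string) (*topic, error) {
	t, ok := c.topics[id]
	if !ok {
		return nil, fmt.Errorf("topic not found: %s", id)
	}
	return t, nil
}

func main() {
	c := newCatalog()
	c.Add(&topic{ID: "getting-started", Title: "Getting Started"})
	fmt.Println(c.index["started"])
	_, err := c.Get("missing")
	fmt.Println(err)
}
```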

Search (search.go)

Index structure

searchIndex maintains two maps:

  • topics map[string]*Topic — a reference copy for snippet extraction and scoring passes
  • index map[string][]string — inverted index: token → []topicID

tokenize(text string) []string lowercases the input, splits on non-alphanumeric characters, discards tokens shorter than two characters, and for each token also emits its stemmed form if the stem differs from the original. This means the index naturally holds both raw and stemmed variants.

Scoring constants

| Constant | Value | Purpose |
| --- | --- | --- |
| scoreTitleBoost | 10.0 | Query word appears in topic title |
| scorePhraseBoost | 8.0 | Exact quoted phrase present in topic |
| scoreSectionBoost | 5.0 | Query word appears in a section heading |
| scoreTagBoost | 3.0 | Query word matches a topic tag |
| scoreAllWords | 2.0 | All query words present (multi-word bonus) |
| scoreExactWord | 1.0 | Exact token match in inverted index |
| scoreStemWord | 0.7 | Stemmed variant match |
| scorePrefixWord | 0.5 | Prefix (partial) match |
| scoreFuzzyWord | 0.3 | Levenshtein fuzzy match |

Search pipeline

  1. extractPhrases(query) strips "quoted strings" from the query and returns them separately. The remaining text is tokenised normally.
  2. For each query token, three lookups run against the inverted index: an exact lookup; a prefix scan (the token is a prefix of an indexed word); and, for tokens of three or more characters, a Levenshtein fuzzy scan with a maximum edit distance of 2. A match on a stemmed token scores scoreStemWord rather than scoreExactWord.
  3. After initial scoring, a second pass over matched topics applies title boost, tag boost, multi-word bonus, and section title boost.
  4. Phrase matching scans the concatenated title + content + section content of every topic that received any score.
  5. Results are sorted by score descending; ties are broken alphabetically by title.
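Steps 1 and 5 of the pipeline are small enough to sketch directly. The names and the phrase regular expression are assumptions, as is lowercasing the extracted phrases; the scoring passes in between are omitted.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// phraseRe matches a double-quoted phrase; the real pattern may differ.
var phraseRe = regexp.MustCompile(`"([^"]+)"`)

// extractPhrases pulls quoted phrases out of the query and returns the
// remaining text with whitespace normalised.
func extractPhrases(query string) (phrases []string, rest string) {
	for _, m := range phraseRe.FindAllStringSubmatch(query, -1) {
		phrases = append(phrases, strings.ToLower(m[1]))
	}
	rest = strings.Join(strings.Fields(phraseRe.ReplaceAllString(query, " ")), " ")
	return phrases, rest
}

// result is a stand-in for SearchResult.
type result struct {
	Title string
	Score float64
}

// rank sorts by score descending and breaks ties alphabetically by title.
func rank(rs []result) {
	sort.SliceStable(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	phrases, rest := extractPhrases(`"rate limit" api keys`)
	fmt.Printf("%q %q\n", phrases, rest)

	rs := []result{{"Beta", 1}, {"Alpha", 1}, {"Gamma", 5}}
	rank(rs)
	fmt.Println(rs)
}
```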

Snippet extraction and highlighting

findBestMatch selects the section whose title and content best match the query words (title matches weighted double), then calls extractSnippet to pull a 150-character window centred on the first regex match. highlight wraps matched spans in **...** (Markdown bold), merging overlapping matches to avoid double-wrapping.
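The merge-then-wrap behaviour of highlight can be illustrated as follows. This is a sketch under assumptions: matching is case-insensitive on literal words, and spans that touch are merged as well as spans that overlap. The real function may build its regex differently.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// highlight wraps every occurrence of the given words in ** ... **,
// merging overlapping or touching matches so markers never nest.
func highlight(text string, words []string) string {
	var spans [][2]int
	for _, w := range words {
		if w == "" {
			continue
		}
		re := regexp.MustCompile(`(?i)` + regexp.QuoteMeta(w))
		for _, loc := range re.FindAllStringIndex(text, -1) {
			spans = append(spans, [2]int{loc[0], loc[1]})
		}
	}
	if len(spans) == 0 {
		return text
	}
	sort.Slice(spans, func(i, j int) bool { return spans[i][0] < spans[j][0] })

	// Merge pass: extend the previous span when the next one starts inside it.
	merged := [][2]int{spans[0]}
	for _, s := range spans[1:] {
		last := &merged[len(merged)-1]
		if s[0] <= last[1] {
			if s[1] > last[1] {
				last[1] = s[1]
			}
			continue
		}
		merged = append(merged, s)
	}

	var b strings.Builder
	pos := 0
	for _, s := range merged {
		b.WriteString(text[pos:s[0]])
		b.WriteString("**" + text[s[0]:s[1]] + "**")
		pos = s[1]
	}
	b.WriteString(text[pos:])
	return b.String()
}

func main() {
	fmt.Println(highlight("Rate limiting limits", []string{"limit", "limiting"}))
	// Rate **limiting** **limit**s
}
```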

Stemmer (stemmer.go)

stem(word string) string implements a subset of Porter-style suffix stripping. Words shorter than four characters are returned unchanged. The result is always at least two characters. Two passes are applied:

  • stemInflectional: handles -sses, -ies, -eed, -ing, -ed, -s (but not -ss).
  • stemDerivational: longest-match suffix rules including -fulness, -ational, -tional, -ously, -ively, -ingly, -ation, -ness, -ment, -ably, -ally, -izer.
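A simplified sketch of the inflectional pass is shown below. The suffix list and the length guards come from the description above; the replacement strings (for example -ies becoming -i) follow Porter's published step 1 and are an assumption about this package, and Porter's vowel and measure conditions are left out. Input is assumed to be lowercase ASCII.

```go
package main

import (
	"fmt"
	"strings"
)

// keepMin returns the stripped form unless it would be shorter than two
// characters, in which case the original word is kept.
func keepMin(orig, stripped string) string {
	if len(stripped) < 2 {
		return orig
	}
	return stripped
}

// stemInflectional strips common inflectional suffixes. Longer suffixes are
// tested first so that "-eed" is not consumed by the "-ed" rule.
func stemInflectional(w string) string {
	if len(w) < 4 {
		return w // short words are returned unchanged
	}
	switch {
	case strings.HasSuffix(w, "sses"):
		return w[:len(w)-2] // classes -> class
	case strings.HasSuffix(w, "ies"):
		return w[:len(w)-2] // queries -> queri
	case strings.HasSuffix(w, "eed"):
		return w[:len(w)-1] // agreed -> agree
	case strings.HasSuffix(w, "ing"):
		return keepMin(w, w[:len(w)-3])
	case strings.HasSuffix(w, "ed"):
		return keepMin(w, w[:len(w)-2])
	case strings.HasSuffix(w, "s") && !strings.HasSuffix(w, "ss"):
		return w[:len(w)-1]
	}
	return w
}

func main() {
	for _, w := range []string{"limits", "classes", "indexed", "glass", "ring"} {
		fmt.Println(w, "->", stemInflectional(w))
	}
}
```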

Fuzzy matching

levenshtein(a, b string) int uses a two-row dynamic programming approach (O(min(m,n)) space). Fuzzy matching is only applied to index words that are neither an exact nor a prefix match for the query token, keeping the common-case query path fast.
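The two-row technique is standard and can be sketched as follows. Operating on runes and swapping the arguments so that the rows track the shorter string are choices made for this example; the package's own function may differ.

```go
package main

import "fmt"

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes edit distance with two rows of length min(m,n)+1
// instead of a full m-by-n matrix.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	if len(ra) < len(rb) {
		ra, rb = rb, ra // keep rb as the shorter string
	}
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	fmt.Println(levenshtein("search", "serach"))  // 2
}
```

With a maximum edit distance of 2, a transposition such as "serach" still matches "search", since it counts as two substitutions.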

Markdown Rendering (render.go)

RenderMarkdown(content string) (string, error) converts Markdown to an HTML fragment using github.com/yuin/goldmark configured with:

  • extension.GFM — GitHub Flavoured Markdown: tables, strikethrough, autolinks
  • extension.Typographer — smart quotes and dashes
  • html.WithUnsafe() — raw HTML in source is passed through (required for embedded code examples in catalog content)

The function returns a fragment only; <html> and <body> wrappers are provided by the templates.

HTTP Server (server.go)

NewServer(catalog *Catalog, addr string) *Server creates an HTTP server and registers six routes on construction.

Routes

| Method | Pattern | Handler | Response |
| --- | --- | --- | --- |
| GET | / | handleIndex | HTML: all topics grouped by first tag, sorted by Order then Title |
| GET | /topics/{id} | handleTopic | HTML: rendered Markdown body, section ToC, related topics; 404 if unknown |
| GET | /search?q= | handleSearch | HTML: ranked results with highlighted snippets; 400 if q is absent |
| GET | /api/topics | handleAPITopics | JSON array of all topics |
| GET | /api/topics/{id} | handleAPITopic | JSON single topic; 404 if unknown |
| GET | /api/search?q= | handleAPISearch | JSON array of SearchResult; 400 if q is absent |

All routes set X-Content-Type-Options: nosniff. HTML routes set Content-Type: text/html; charset=utf-8. JSON routes set Content-Type: application/json.

Server implements http.Handler via ServeHTTP, allowing it to be embedded into an existing mux or used standalone via ListenAndServe.

Templates (templates.go)

Templates are embedded at compile time via //go:embed templates/*.html. The directory contains five files:

  • base.html — shared layout: dark theme CSS (background #0d1117, foreground #c9d1d9, accent #58a6ff), navigation bar with search input, footer
  • index.html — topic listing: cards grouped by first tag, topic count
  • topic.html — single topic: rendered Markdown body, table of contents from sections, related topics sidebar
  • search.html — search results: query echo, result count, ranked list with highlighted snippets, empty-state message
  • 404.html — not found page with search suggestion

Template functions available to all templates:

| Function | Signature | Purpose |
| --- | --- | --- |
| renderMarkdown | func(string) template.HTML | Calls RenderMarkdown; returns empty paragraph on error |
| truncate | func(string, int) string | Strips Markdown headings, joins remaining lines, truncates to N runes |
| pluralise | func(int, string, string) string | Returns singular or plural form based on count |
| multiply | func(int, int) int | Integer multiplication for template arithmetic |
| sub | func(int, int) int | Integer subtraction for template arithmetic |

groupTopicsByTag groups topics by their first tag (falling back to "other"), sorts topics within each group by Order then Title, and sorts groups alphabetically by tag name.
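The grouping and ordering rules can be sketched as below. The topic and group types are trimmed stand-ins, and the return shape (a slice of groups) is an assumption about the real function.

```go
package main

import (
	"fmt"
	"sort"
)

type topic struct {
	Title string
	Tags  []string
	Order int
}

type group struct {
	Tag    string
	Topics []topic
}

// groupTopicsByTag buckets topics by first tag ("other" when untagged),
// sorts each bucket by Order then Title, and sorts buckets by tag name.
func groupTopicsByTag(topics []topic) []group {
	byTag := map[string][]topic{}
	for _, t := range topics {
		tag := "other"
		if len(t.Tags) > 0 {
			tag = t.Tags[0]
		}
		byTag[tag] = append(byTag[tag], t)
	}
	groups := make([]group, 0, len(byTag))
	for tag, ts := range byTag {
		sort.SliceStable(ts, func(i, j int) bool {
			if ts[i].Order != ts[j].Order {
				return ts[i].Order < ts[j].Order
			}
			return ts[i].Title < ts[j].Title
		})
		groups = append(groups, group{Tag: tag, Topics: ts})
	}
	sort.Slice(groups, func(i, j int) bool { return groups[i].Tag < groups[j].Tag })
	return groups
}

func main() {
	gs := groupTopicsByTag([]topic{
		{Title: "Zeta", Tags: []string{"guide"}, Order: 1},
		{Title: "Alpha", Tags: []string{"guide"}, Order: 1},
		{Title: "Intro", Tags: []string{"guide"}, Order: 0},
		{Title: "Misc"},
	})
	for _, g := range gs {
		fmt.Println(g.Tag, g.Topics)
	}
}
```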

Static Site Generator (generate.go)

Generate(catalog *Catalog, outputDir string) error writes a self-contained static site:

| Output file | Content |
| --- | --- |
| index.html | Index page rendered from index.html template |
| topics/{id}.html | One file per topic rendered from topic.html template |
| search.html | Search page with client-side JavaScript search appended |
| search-index.json | JSON array of {id, title, tags, content} (content truncated to 500 runes) |
| 404.html | Not found page rendered from 404.html template |

The client-side search JavaScript (clientSearchScript constant) is appended verbatim to search.html. It loads search-index.json on page load, intercepts the search form submit event, and scores results using the same title (+10), content (+1), and tag (+3) weightings as the server-side index. All DOM insertion uses textContent or document.createElement to prevent XSS; no innerHTML is used with user-supplied strings.

All CSS is inlined; no external stylesheets are required, making the output suitable for direct file serving or CDN deployment.

CLI Help Text Ingestion (ingest.go)

ParseHelpText(name string, text string) *Topic converts raw CLI help output (Go flag-style or Cobra-style) into a Topic:

  1. Extracts See also: lines and converts the comma-separated references into Related topic IDs via GenerateID.
  2. convertHelpToMarkdown scans lines for section headers (Usage:, Flags:, Options:, Examples:, Commands:, Available Commands:) and wraps their content in Markdown code blocks or bullet lists. Descriptive paragraphs are passed through as plain Markdown.
  3. Tags are set to ["cli", first-word-of-name] (e.g. ["cli", "dev"] for command "dev commit").
  4. ExtractSections is called on the generated Markdown to populate Sections.

IngestCLIHelp(helpTexts map[string]string) *Catalog batch-ingests a map of command name → help text and returns a populated Catalog.

Dependencies

| Package | Purpose |
| --- | --- |
| gopkg.in/yaml.v3 | YAML frontmatter parsing |
| github.com/yuin/goldmark | Markdown-to-HTML rendering |
| github.com/stretchr/testify | Test assertions (test-only) |

The package has no runtime dependency on a network, database, file system (beyond the embedded templates), or operating system service.