Snider 142567a8f5 docs: graduate TODO/FINDINGS into production documentation

Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>

2026-02-20 15:01:55 +00:00


Architecture — go-help

Module: forge.lthn.ai/core/go-help

Overview

go-help is a display-agnostic help content management library. It provides a YAML/Markdown catalog of help topics, full-text search with relevance scoring, HTTP serving (both HTML and JSON), goldmark-based Markdown rendering, a static site generator, and CLI help text ingestion. The package has no runtime dependency on a database or external service; it is entirely self-contained.

Core Types

Topic

Topic (topic.go) is the fundamental unit of content:

type Topic struct {
    ID       string    // URL-safe slug, e.g. "getting-started"
    Title    string    // Human-readable title
    Path     string    // Source file path (empty for programmatic topics)
    Content  string    // Raw Markdown body (without frontmatter)
    Sections []Section // Parsed heading hierarchy
    Tags     []string  // Arbitrary classification labels
    Related  []string  // IDs of related topics
    Order    int       // Sort weight (lower = earlier)
}

Section captures each Markdown heading with its nested body text:

type Section struct {
    ID      string // GenerateID(Title)
    Title   string
    Level   int    // 1–6, matching H1–H6
    Line    int    // 1-indexed line number in Content
    Content string // All text beneath this heading until the next heading
}

Frontmatter

Topic files may begin with a YAML frontmatter block delimited by ---. The Frontmatter struct maps to the supported fields:

type Frontmatter struct {
    Title   string   `yaml:"title"`
    Tags    []string `yaml:"tags"`
    Related []string `yaml:"related"`
    Order   int      `yaml:"order"`
}

If frontmatter is absent, the parser falls back to the filename and the first H1 heading for the title and ID.
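
The frontmatter split can be illustrated with a minimal, standard-library-only sketch. The regex and the splitFrontmatter name are assumptions made for illustration; the pattern go-help actually compiles is not shown here, and the YAML unmarshalling step (gopkg.in/yaml.v3 in the real package) is deliberately left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// frontmatterRe is an assumed pattern: a leading "---" line, a lazily matched
// body, and a closing "---" line. It is not taken from the go-help source.
var frontmatterRe = regexp.MustCompile(`(?s)\A---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\z)`)

// splitFrontmatter (hypothetical helper) returns the raw YAML text and the
// remaining Markdown body. Without frontmatter, yamlText is empty and body is
// the unchanged input.
func splitFrontmatter(content string) (yamlText, body string) {
	m := frontmatterRe.FindStringSubmatchIndex(content)
	if m == nil {
		return "", content
	}
	// m[2]:m[3] is capture group 1; m[1] is the end of the whole match.
	return content[m[2]:m[3]], content[m[1]:]
}

func main() {
	doc := "---\ntitle: Getting Started\norder: 1\n---\n# Welcome\n"
	yamlText, body := splitFrontmatter(doc)
	fmt.Printf("frontmatter: %q\n", yamlText)
	fmt.Printf("body: %q\n", body)
}
```

Anchoring the pattern with \A means a --- rule further down the document is never mistaken for frontmatter.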

Parser (`parser.go`)

ParseTopic(path string, content []byte) (*Topic, error) parses a Markdown file into a Topic:

ExtractFrontmatter matches the leading ---\n...\n---\n block via a compiled regex, unmarshals it with gopkg.in/yaml.v3, and returns the remaining body.
ExtractSections iterates over body lines, matching ^(#{1,6})\s+(.+)$, and accumulates the text between consecutive headings as Section.Content.
GenerateID(title string) string produces a URL-safe slug: lowercase, letters and digits preserved, spaces/hyphens/underscores collapsed to a single hyphen, leading and trailing hyphens trimmed.
pathToTitle(path string) string converts a filename (e.g. getting-started.md) to a title string (Getting Started) for use when frontmatter is absent.
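
As a rough illustration of the section pass, the sketch below applies the heading regex quoted above line by line and attaches the text that follows to the most recent heading. The extractSections name and the trimmed-down Section struct are stand-ins, and details such as whitespace trimming or the handling of text before the first heading are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Section is a reduced version of the struct described earlier (no ID field).
type Section struct {
	Title   string
	Level   int
	Line    int // 1-indexed
	Content string
}

var headingRe = regexp.MustCompile(`^(#{1,6})\s+(.+)$`)

// extractSections is a hypothetical re-implementation for illustration only.
func extractSections(body string) []Section {
	var sections []Section
	var buf []string
	// flush assigns the buffered lines to the most recent heading.
	flush := func() {
		if len(sections) > 0 {
			sections[len(sections)-1].Content = strings.TrimSpace(strings.Join(buf, "\n"))
		}
		buf = buf[:0]
	}
	for i, line := range strings.Split(body, "\n") {
		if m := headingRe.FindStringSubmatch(line); m != nil {
			flush()
			sections = append(sections, Section{
				Title: strings.TrimSpace(m[2]),
				Level: len(m[1]),
				Line:  i + 1,
			})
			continue
		}
		buf = append(buf, line)
	}
	flush()
	return sections
}

func main() {
	for _, s := range extractSections("# Intro\nHello.\n\n## Install\nRun it.\n") {
		fmt.Printf("H%d %q (line %d): %q\n", s.Level, s.Title, s.Line, s.Content)
	}
}
```

A purely line-based scan like this one would also treat a # comment inside a fenced code block as a heading; whether go-help guards against that is not stated here.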

ID generation rules

"Getting Started" → "getting-started"
"API / Rate Limits" → "api-rate-limits"
Non-letter, non-digit characters other than space, hyphen, and underscore are dropped silently.
Consecutive separators collapse to a single hyphen.
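
The rules above can be sketched in a few lines. This generateID is an illustrative re-implementation written from the rules as listed, not the package's own code; it treats any Unicode letter or digit as preserved, which is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// generateID is a hypothetical slug function following the listed rules.
func generateID(title string) string {
	var b strings.Builder
	pendingSep := false // a separator was seen since the last kept rune
	for _, r := range strings.ToLower(title) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			// Emit one hyphen for any run of separators, but never a leading one.
			if pendingSep && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingSep = false
			b.WriteRune(r)
		case r == ' ' || r == '-' || r == '_':
			pendingSep = true
		}
		// Every other rune is dropped without acting as a separator.
	}
	return b.String() // trailing separators are never written
}

func main() {
	for _, title := range []string{"Getting Started", "API / Rate Limits", "  --Hello__World!  "} {
		fmt.Printf("%q -> %q\n", title, generateID(title))
	}
}
```

Because dropped characters do not count as separators, a title such as "foo/bar" would become "foobar" under these rules rather than "foo-bar".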

Catalog (`catalog.go`)

Catalog is the in-memory store of all topics. It holds a map[string]*Topic keyed by topic ID and an attached searchIndex.

func DefaultCatalog() *Catalog  // Returns a catalog pre-loaded with two built-in topics
func (c *Catalog) Add(t *Topic)
func (c *Catalog) List() []*Topic
func (c *Catalog) Get(id string) (*Topic, error)
func (c *Catalog) Search(query string) []*SearchResult

Add both stores the topic in the map and calls searchIndex.Add to index it. There is no lazy indexing; every Add is immediately reflected in search results.
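
A minimal sketch of the eager-indexing behaviour follows. The types are simplified stand-ins: the index field replaces the real searchIndex, NewCatalog is a hypothetical constructor, the error text is invented, and sorting List by ID is an assumption made only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type Topic struct {
	ID    string
	Title string
}

// Catalog here is a simplified stand-in for the type described above.
type Catalog struct {
	topics map[string]*Topic
	index  map[string][]string // token -> topic IDs (placeholder for searchIndex)
}

func NewCatalog() *Catalog {
	return &Catalog{topics: map[string]*Topic{}, index: map[string][]string{}}
}

// Add stores the topic and indexes it in the same call, so a search issued
// immediately afterwards already sees it.
func (c *Catalog) Add(t *Topic) {
	c.topics[t.ID] = t
	for _, tok := range strings.Fields(strings.ToLower(t.Title)) {
		c.index[tok] = append(c.index[tok], t.ID)
	}
}

// Get returns an error for unknown IDs, mirroring the documented signature.
func (c *Catalog) Get(id string) (*Topic, error) {
	t, ok := c.topics[id]
	if !ok {
		return nil, fmt.Errorf("topic not found: %s", id)
	}
	return t, nil
}

// List returns all topics; ordering by ID is an assumption of this sketch.
func (c *Catalog) List() []*Topic {
	out := make([]*Topic, 0, len(c.topics))
	for _, t := range c.topics {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	c := NewCatalog()
	c.Add(&Topic{ID: "getting-started", Title: "Getting Started"})
	fmt.Println(c.index["started"]) // indexed as soon as Add returns
	if _, err := c.Get("missing"); err != nil {
		fmt.Println("error:", err)
	}
}
```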

Search (`search.go`)

Index structure

searchIndex maintains two maps:

topics map[string]*Topic — a reference copy for snippet extraction and scoring passes
index map[string][]string — inverted index: token → []topicID

tokenize(text string) []string lowercases the input, splits it on non-alphanumeric characters, and discards tokens shorter than two characters. For each remaining token it also emits the stemmed form when the stem differs from the original, so the index holds both raw and stemmed variants.
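
The tokenising and indexing steps might look roughly like the sketch below. The stemmer is passed in as a parameter, and toyStem is a deliberately crude placeholder rather than the stemmer described later; the invertedIndex type and its add method are likewise hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// tokenize lowercases, splits on non-alphanumeric runes, drops tokens shorter
// than two characters, and emits the stem when it differs from the token.
func tokenize(text string, stem func(string) string) []string {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var tokens []string
	for _, w := range words {
		if utf8.RuneCountInString(w) < 2 {
			continue
		}
		tokens = append(tokens, w)
		if s := stem(w); s != w {
			tokens = append(tokens, s)
		}
	}
	return tokens
}

// toyStem is a placeholder: it only strips a single trailing "s".
func toyStem(w string) string {
	if len(w) > 3 && strings.HasSuffix(w, "s") && !strings.HasSuffix(w, "ss") {
		return w[:len(w)-1]
	}
	return w
}

// invertedIndex maps token -> topic IDs, as in the structure described above.
type invertedIndex map[string][]string

func (ix invertedIndex) add(topicID, text string) {
	for _, tok := range tokenize(text, toyStem) {
		seen := false
		for _, id := range ix[tok] {
			if id == topicID {
				seen = true
				break
			}
		}
		if !seen {
			ix[tok] = append(ix[tok], topicID)
		}
	}
}

func main() {
	ix := invertedIndex{}
	ix.add("search", "Search topics, a B!")
	fmt.Println(tokenize("Search topics, a B!", toyStem))
	fmt.Println(ix["topic"], ix["topics"])
}
```

Indexing both forms lets a query for "topic" reach a document that only contains "topics" with a plain map lookup.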

Scoring constants

Constant	Value	Purpose
`scoreTitleBoost`	10.0	Query word appears in topic title
`scorePhraseBoost`	8.0	Exact quoted phrase present in topic
`scoreSectionBoost`	5.0	Query word appears in a section heading
`scoreTagBoost`	3.0	Query word matches a topic tag
`scoreAllWords`	2.0	All query words present (multi-word bonus)
`scoreExactWord`	1.0	Exact token match in inverted index
`scoreStemWord`	0.7	Stemmed variant match
`scorePrefixWord`	0.5	Prefix (partial) match
`scoreFuzzyWord`	0.3	Levenshtein fuzzy match

Search pipeline

extractPhrases(query) strips "quoted strings" from the query and returns them separately. The remaining text is tokenised normally.
Each query token is matched against the inverted index in three ways: an exact lookup, a prefix scan (the token is a prefix of an indexed word), and a Levenshtein fuzzy scan with a maximum edit distance of 2, which runs only for tokens of three or more characters. Stemmed tokens score at scoreStemWord rather than scoreExactWord.
After initial scoring, a second pass over matched topics applies title boost, tag boost, multi-word bonus, and section title boost.
Phrase matching scans the concatenated title + content + section content of every topic that received any score.
Results are sorted by score descending; ties are broken alphabetically by title.
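
Two of the steps above, phrase extraction and final ordering, are simple enough to sketch in isolation. The regex, the result struct, and the choice to lowercase phrases are assumptions; the scoring passes in between are omitted.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// phraseRe matches double-quoted runs; the real pattern is not documented here.
var phraseRe = regexp.MustCompile(`"([^"]+)"`)

// extractPhrases returns the quoted phrases and the query with them removed.
func extractPhrases(query string) (phrases []string, rest string) {
	for _, m := range phraseRe.FindAllStringSubmatch(query, -1) {
		phrases = append(phrases, strings.ToLower(m[1]))
	}
	stripped := phraseRe.ReplaceAllString(query, " ")
	return phrases, strings.Join(strings.Fields(stripped), " ")
}

// result is a reduced stand-in for SearchResult.
type result struct {
	Title string
	Score float64
}

// sortResults orders by score descending, then by title ascending.
func sortResults(rs []result) {
	sort.SliceStable(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	phrases, rest := extractPhrases(`"rate limit" api errors`)
	fmt.Printf("phrases=%q rest=%q\n", phrases, rest)

	rs := []result{{"Zebra", 3}, {"Alpha", 3}, {"Beta", 9.5}}
	sortResults(rs)
	fmt.Println(rs)
}
```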

Snippet extraction and highlighting

findBestMatch selects the section whose title and content best match the query words (title matches weighted double), then calls extractSnippet to pull a 150-character window centred on the first regex match. highlight wraps matched spans in **...** (Markdown bold), merging overlapping matches to avoid double-wrapping.
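
The overlap-merging part of highlighting can be sketched as follows. This is a hypothetical version covering only the highlight step (no section selection and no 150-character window), and it assumes case-insensitive literal matching of each query word.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// highlight wraps every match of any word in **...**, merging overlapping or
// touching matches so that no span is wrapped twice.
func highlight(text string, words []string) string {
	var spans [][2]int
	for _, w := range words {
		if w == "" {
			continue
		}
		re := regexp.MustCompile(`(?i)` + regexp.QuoteMeta(w))
		for _, loc := range re.FindAllStringIndex(text, -1) {
			spans = append(spans, [2]int{loc[0], loc[1]})
		}
	}
	sort.Slice(spans, func(i, j int) bool { return spans[i][0] < spans[j][0] })

	var merged [][2]int
	for _, s := range spans {
		if n := len(merged); n > 0 && s[0] <= merged[n-1][1] {
			if s[1] > merged[n-1][1] {
				merged[n-1][1] = s[1] // extend the previous span
			}
			continue
		}
		merged = append(merged, s)
	}

	var b strings.Builder
	last := 0
	for _, s := range merged {
		b.WriteString(text[last:s[0]])
		b.WriteString("**" + text[s[0]:s[1]] + "**")
		last = s[1]
	}
	b.WriteString(text[last:])
	return b.String()
}

func main() {
	fmt.Println(highlight("Rate limiting rules", []string{"rate", "limit"}))
	fmt.Println(highlight("searching", []string{"search", "arch"}))
}
```

Collecting all spans first and merging them before writing output avoids nested markers such as \*\*se\*\*arch\*\*\*\* when two query words overlap.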

Stemmer (`stemmer.go`)

stem(word string) string implements a subset of Porter-style suffix stripping. Words shorter than four characters are returned unchanged. The result is always at least two characters. Two passes are applied:

stemInflectional: handles -sses, -ies, -eed, -ing, -ed, -s (but not -ss).
stemDerivational: longest-match suffix rules including -fulness, -ational, -tional, -ously, -ively, -ingly, -ation, -ness, -ment, -ably, -ally, -izer.
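
To make the inflectional pass concrete, here is a sketch under stated assumptions. The suffix list comes from the description above, but the replacements (-sses to -ss, -ies to -i, -eed to -ee) follow the classic Porter rules and are assumed, not confirmed for go-help. The derivational pass is not shown, and lengths are counted in bytes, which is only accurate for ASCII input.

```go
package main

import (
	"fmt"
	"strings"
)

// stemInflectional is a hypothetical version of the first stemming pass.
func stemInflectional(w string) string {
	if len(w) < 4 {
		return w // short words are returned unchanged
	}
	rules := []struct{ suffix, repl string }{
		{"sses", "ss"}, // assumed replacement (Porter step 1a)
		{"ies", "i"},   // assumed replacement (Porter step 1a)
		{"eed", "ee"},  // assumed replacement (Porter step 1b)
		{"ing", ""},
		{"ed", ""},
	}
	for _, r := range rules {
		if strings.HasSuffix(w, r.suffix) {
			s := strings.TrimSuffix(w, r.suffix) + r.repl
			if len(s) >= 2 { // never shrink below two characters
				return s
			}
			return w
		}
	}
	// Plain plural -s, but leave -ss endings alone.
	if strings.HasSuffix(w, "s") && !strings.HasSuffix(w, "ss") {
		return w[:len(w)-1]
	}
	return w
}

func main() {
	for _, w := range []string{"classes", "running", "topics", "class", "ring", "cat"} {
		fmt.Printf("%s -> %s\n", w, stemInflectional(w))
	}
}
```

The stems need not be dictionary words ("running" becomes "runn" here); they only have to be produced consistently at index time and query time.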

Fuzzy matching

levenshtein(a, b string) int uses a two-row dynamic programming approach (O(min(m,n)) space). Fuzzy matching is only applied to index words that are neither an exact nor a prefix match for the query token, keeping the common-case query path fast.
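
A two-row Levenshtein implementation and the match classification described above can be sketched like this. The matchKind helper and its string return values are illustrative inventions; the thresholds (tokens of three or more characters, edit distance at most 2) are the ones stated in the search pipeline.

```go
package main

import (
	"fmt"
	"strings"
)

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein keeps only two rows sized to the shorter input,
// giving O(min(m,n)) extra space.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	if len(ra) < len(rb) {
		ra, rb = rb, ra // rb is now the shorter string
	}
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// matchKind (hypothetical) tries the cheap checks first and only computes an
// edit distance when the word is neither an exact nor a prefix match.
func matchKind(token, word string) string {
	switch {
	case word == token:
		return "exact"
	case strings.HasPrefix(word, token):
		return "prefix"
	case len([]rune(token)) >= 3 && levenshtein(token, word) <= 2:
		return "fuzzy"
	}
	return ""
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
	for _, tok := range []string{"search", "sea", "serach", "zzz"} {
		fmt.Printf("%q vs \"search\": %q\n", tok, matchKind(tok, "search"))
	}
}
```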

Markdown Rendering (`render.go`)

RenderMarkdown(content string) (string, error) converts Markdown to an HTML fragment using github.com/yuin/goldmark configured with:

extension.GFM — GitHub Flavoured Markdown: tables, strikethrough, autolinks
extension.Typographer — smart quotes and dashes
html.WithUnsafe() — raw HTML in source is passed through (required for embedded code examples in catalog content)

The function returns a fragment only; <html> and <body> wrappers are provided by the templates.

HTTP Server (`server.go`)

NewServer(catalog *Catalog, addr string) *Server creates an HTTP server and registers six routes on construction.

Routes

Method	Pattern	Handler	Response
`GET`	`/`	`handleIndex`	HTML — all topics grouped by first tag, sorted by Order then Title
`GET`	`/topics/{id}`	`handleTopic`	HTML — rendered Markdown body, section ToC, related topics; 404 if unknown
`GET`	`/search?q=`	`handleSearch`	HTML — ranked results with highlighted snippets; 400 if `q` is absent
`GET`	`/api/topics`	`handleAPITopics`	JSON array of all topics
`GET`	`/api/topics/{id}`	`handleAPITopic`	JSON single topic; 404 if unknown
`GET`	`/api/search?q=`	`handleAPISearch`	JSON array of `SearchResult`; 400 if `q` is absent

All routes set X-Content-Type-Options: nosniff. HTML routes set Content-Type: text/html; charset=utf-8. JSON routes set Content-Type: application/json.

Server implements http.Handler via ServeHTTP, so it can be mounted on an existing mux or run standalone via ListenAndServe.

Templates (`templates.go`)

Templates are embedded at compile time via //go:embed templates/*.html. The directory contains five files:

base.html — shared layout: dark theme CSS (background #0d1117, foreground #c9d1d9, accent #58a6ff), navigation bar with search input, footer
index.html — topic listing: cards grouped by first tag, topic count
topic.html — single topic: rendered Markdown body, table of contents from sections, related topics sidebar
search.html — search results: query echo, result count, ranked list with highlighted snippets, empty-state message
404.html — not found page with search suggestion

Template functions available to all templates:

Function	Signature	Purpose
`renderMarkdown`	`func(string) template.HTML`	Calls `RenderMarkdown`; returns empty paragraph on error
`truncate`	`func(string, int) string`	Strips Markdown headings, joins remaining lines, truncates to N runes
`pluralise`	`func(int, string, string) string`	Returns singular or plural form based on count
`multiply`	`func(int, int) int`	Integer multiplication for template arithmetic
`sub`	`func(int, int) int`	Integer subtraction for template arithmetic

groupTopicsByTag groups topics by their first tag (falling back to "other"), sorts topics within each group by Order then Title, and sorts groups alphabetically by tag name.
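
The grouping rule can be sketched as below. The group struct and the exact return type are assumptions; only the ordering rules are taken from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

type Topic struct {
	Title string
	Tags  []string
	Order int
}

// group is a hypothetical container for one tag and its topics.
type group struct {
	Tag    string
	Topics []Topic
}

// groupTopicsByTag buckets by first tag ("other" when untagged), sorts each
// bucket by Order then Title, and sorts buckets by tag name.
func groupTopicsByTag(topics []Topic) []group {
	byTag := map[string][]Topic{}
	for _, t := range topics {
		tag := "other"
		if len(t.Tags) > 0 {
			tag = t.Tags[0]
		}
		byTag[tag] = append(byTag[tag], t)
	}
	groups := make([]group, 0, len(byTag))
	for tag, ts := range byTag {
		sort.SliceStable(ts, func(i, j int) bool {
			if ts[i].Order != ts[j].Order {
				return ts[i].Order < ts[j].Order
			}
			return ts[i].Title < ts[j].Title
		})
		groups = append(groups, group{Tag: tag, Topics: ts})
	}
	sort.Slice(groups, func(i, j int) bool { return groups[i].Tag < groups[j].Tag })
	return groups
}

func main() {
	topics := []Topic{
		{Title: "B", Tags: []string{"guide"}, Order: 2},
		{Title: "A", Tags: []string{"guide"}, Order: 2},
		{Title: "Z", Tags: []string{"guide"}, Order: 1},
		{Title: "Loose"},
		{Title: "Api", Tags: []string{"api", "guide"}},
	}
	for _, g := range groupTopicsByTag(topics) {
		fmt.Print(g.Tag, ":")
		for _, t := range g.Topics {
			fmt.Print(" ", t.Title)
		}
		fmt.Println()
	}
}
```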

Static Site Generator (`generate.go`)

Generate(catalog *Catalog, outputDir string) error writes a self-contained static site:

Output file	Content
`index.html`	Index page rendered from `index.html` template
`topics/{id}.html`	One file per topic rendered from `topic.html` template
`search.html`	Search page with client-side JavaScript search appended
`search-index.json`	JSON array of `{id, title, tags, content}` (content truncated to 500 runes)
`404.html`	Not found page rendered from `404.html` template

The client-side search JavaScript (clientSearchScript constant) is appended verbatim to search.html. It loads search-index.json on page load, intercepts the search form submit event, and scores results using the same title (+10), content (+1), and tag (+3) weightings as the server-side index. All DOM insertion uses textContent or document.createElement to prevent XSS; no innerHTML is used with user-supplied strings.

All CSS is inlined; no external stylesheets are required, making the output suitable for direct file serving or CDN deployment.

CLI Help Text Ingestion (`ingest.go`)

ParseHelpText(name string, text string) *Topic converts raw CLI help output (Go flag-style or Cobra-style) into a Topic:

Extracts See also: lines and converts the comma-separated references into Related topic IDs via GenerateID.
convertHelpToMarkdown scans lines for section headers (Usage:, Flags:, Options:, Examples:, Commands:, Available Commands:) and wraps their content in Markdown code blocks or bullet lists. Descriptive paragraphs are passed through as plain Markdown.
Tags are set to ["cli", first-word-of-name] (e.g. ["cli", "dev"] for command "dev commit").
ExtractSections is called on the generated Markdown to populate Sections.

IngestCLIHelp(helpTexts map[string]string) *Catalog batch-ingests a map of command name → help text and returns a populated Catalog.
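
Two of the ingestion steps, the See also extraction and the tag rule, are sketched below. The helper names are hypothetical, slug is a simplified stand-in for GenerateID, and matching the See also: prefix case-insensitively is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// slug is a simplified stand-in for GenerateID (lowercase, spaces to hyphens).
func slug(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), "-")
}

// relatedFromHelp collects topic IDs from "See also:" lines.
func relatedFromHelp(text string) []string {
	const prefix = "see also:"
	var related []string
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if len(line) < len(prefix) || !strings.EqualFold(line[:len(prefix)], prefix) {
			continue
		}
		for _, ref := range strings.Split(line[len(prefix):], ",") {
			if id := slug(ref); id != "" {
				related = append(related, id)
			}
		}
	}
	return related
}

// tagsFor returns ["cli", first word of the command name].
func tagsFor(name string) []string {
	tags := []string{"cli"}
	if words := strings.Fields(name); len(words) > 0 {
		tags = append(tags, words[0])
	}
	return tags
}

func main() {
	help := "Usage: dev commit [flags]\n\nSee also: dev push, Dev Status\n"
	fmt.Println(relatedFromHelp(help))
	fmt.Println(tagsFor("dev commit"))
}
```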

Dependencies

Package	Purpose
`gopkg.in/yaml.v3`	YAML frontmatter parsing
`github.com/yuin/goldmark`	Markdown-to-HTML rendering
`github.com/stretchr/testify`	Test assertions (test-only)

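The core of the package (parsing, catalog, search, and rendering) has no runtime dependency on a network, database, or operating system service, and the templates are embedded in the binary. Only two entry points reach outside the process: the HTTP server listens on its configured address, and Generate writes files to outputDir.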
