Replace internal task tracking (TODO.md, FINDINGS.md) with structured documentation in docs/. Trim CLAUDE.md to agent instructions only. Co-Authored-By: Virgil <virgil@lethean.io>
Architecture — go-help
Module: forge.lthn.ai/core/go-help
Overview
go-help is a display-agnostic help content management library. It provides a YAML/Markdown catalog of help topics, full-text search with relevance scoring, HTTP serving (both HTML and JSON), goldmark-based Markdown rendering, a static site generator, and CLI help text ingestion. The package has no runtime dependency on a database or external service; it is entirely self-contained.
Core Types
Topic
Topic (topic.go) is the fundamental unit of content:
    type Topic struct {
        ID       string    // URL-safe slug, e.g. "getting-started"
        Title    string    // Human-readable title
        Path     string    // Source file path (empty for programmatic topics)
        Content  string    // Raw Markdown body (without frontmatter)
        Sections []Section // Parsed heading hierarchy
        Tags     []string  // Arbitrary classification labels
        Related  []string  // IDs of related topics
        Order    int       // Sort weight (lower = earlier)
    }
Section captures each Markdown heading with its nested body text:
    type Section struct {
        ID      string // GenerateID(Title)
        Title   string
        Level   int    // 1–6, matching H1–H6
        Line    int    // 1-indexed line number in Content
        Content string // All text beneath this heading until the next heading
    }
Frontmatter
Topic files may begin with a YAML frontmatter block delimited by ---. The Frontmatter struct maps to the supported fields:
    type Frontmatter struct {
        Title   string   `yaml:"title"`
        Tags    []string `yaml:"tags"`
        Related []string `yaml:"related"`
        Order   int      `yaml:"order"`
    }
If frontmatter is absent, the parser falls back to the filename and the first H1 heading for the title and ID.
Parser (parser.go)
ParseTopic(path string, content []byte) (*Topic, error) parses a Markdown file into a Topic:
- `ExtractFrontmatter` matches the leading `---\n...\n---\n` block via a compiled regex, unmarshals it with `gopkg.in/yaml.v3`, and returns the remaining body.
- `ExtractSections` iterates over body lines, matching `^(#{1,6})\s+(.+)$`, and accumulates the text between consecutive headings as `Section.Content`.
- `GenerateID(title string) string` produces a URL-safe slug: lowercase, letters and digits preserved, spaces/hyphens/underscores collapsed to a single hyphen, leading and trailing hyphens trimmed.
- `pathToTitle(path string) string` converts a filename (e.g. `getting-started.md`) to a title string (`Getting Started`) for use when frontmatter is absent.
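The first two parser steps can be sketched as follows. This is a minimal illustration rather than the package's code: the names `splitFrontmatter`, `extractSections`, `section`, and the frontmatter pattern are assumptions, YAML unmarshalling is left out so the example needs only the standard library, and text that precedes the first heading is simply dropped.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// section mirrors the Section fields used in this sketch (hypothetical type).
type section struct {
	Title   string
	Level   int
	Line    int // 1-indexed line number in the body
	Content string
}

var (
	// Assumed frontmatter pattern; the source only says "a compiled regex".
	frontmatterRe = regexp.MustCompile(`(?s)\A---\n(.*?)\n---\n`)
	// Heading pattern as quoted in the parser description.
	headingRe = regexp.MustCompile(`^(#{1,6})\s+(.+)$`)
)

// splitFrontmatter returns the raw YAML block (without delimiters) and the
// remaining body. With no frontmatter the whole input is the body.
func splitFrontmatter(src string) (yamlBlock, body string) {
	m := frontmatterRe.FindStringSubmatchIndex(src)
	if m == nil {
		return "", src
	}
	return src[m[2]:m[3]], src[m[1]:]
}

// extractSections records each heading and accumulates the text up to the
// next heading as that section's content.
func extractSections(body string) []section {
	var out []section
	var buf []string
	flush := func() {
		if len(out) > 0 {
			out[len(out)-1].Content = strings.TrimSpace(strings.Join(buf, "\n"))
		}
		buf = nil
	}
	for i, line := range strings.Split(body, "\n") {
		if m := headingRe.FindStringSubmatch(line); m != nil {
			flush()
			out = append(out, section{Title: strings.TrimSpace(m[2]), Level: len(m[1]), Line: i + 1})
			continue
		}
		buf = append(buf, line)
	}
	flush()
	return out
}

func main() {
	src := "---\ntitle: Getting Started\norder: 1\n---\n# Intro\nHello.\n## Install\nRun it.\n"
	fm, body := splitFrontmatter(src)
	fmt.Printf("frontmatter: %q\n", fm)
	for _, s := range extractSections(body) {
		fmt.Printf("H%d %q at line %d: %q\n", s.Level, s.Title, s.Line, s.Content)
	}
}
```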
ID generation rules
- `"Getting Started"` → `"getting-started"`
- `"API / Rate Limits"` → `"api-rate-limits"`
- Non-letter, non-digit characters other than space, hyphen, and underscore are dropped silently.
- Consecutive separators collapse to a single hyphen.
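A minimal sketch of these rules, assuming a single pass over the lowercased title. The function name `generateID` and the pending-hyphen approach are illustrative choices, not necessarily how `GenerateID` is written.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// generateID applies the slug rules listed above: lowercase, keep letters and
// digits, collapse runs of space/hyphen/underscore into one hyphen, drop every
// other character, and never emit a leading or trailing hyphen.
func generateID(title string) string {
	var b strings.Builder
	pendingHyphen := false
	for _, r := range strings.ToLower(title) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			// Write the deferred hyphen only between two kept characters.
			if pendingHyphen && b.Len() > 0 {
				b.WriteByte('-')
			}
			pendingHyphen = false
			b.WriteRune(r)
		case r == ' ' || r == '-' || r == '_':
			pendingHyphen = true
		}
		// Any other rune is dropped and does not reset the pending hyphen.
	}
	return b.String()
}

func main() {
	for _, t := range []string{"Getting Started", "API / Rate Limits", "  --Hello__World!! "} {
		fmt.Printf("%q -> %q\n", t, generateID(t))
	}
}
```

Deferring the hyphen until the next kept character handles both collapsing and trimming without a separate clean-up pass.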
Catalog (catalog.go)
Catalog is the in-memory store of all topics. It holds a map[string]*Topic keyed by topic ID and an attached searchIndex.
    func DefaultCatalog() *Catalog // Returns a catalog pre-loaded with two built-in topics
    func (c *Catalog) Add(t *Topic)
    func (c *Catalog) List() []*Topic
    func (c *Catalog) Get(id string) (*Topic, error)
    func (c *Catalog) Search(query string) []*SearchResult
Add both stores the topic in the map and calls searchIndex.Add to index it. There is no lazy indexing; every Add is immediately reflected in search results.
Search (search.go)
Index structure
searchIndex maintains two maps:
- `topics map[string]*Topic`: a reference copy for snippet extraction and scoring passes
- `index map[string][]string`: inverted index mapping token → `[]topicID`
tokenize(text string) []string lowercases the input, splits on non-alphanumeric characters, discards tokens shorter than two characters, and for each token also emits its stemmed form if the stem differs from the original. This means the index naturally holds both raw and stemmed variants.
Scoring constants
| Constant | Value | Purpose |
|---|---|---|
| `scoreTitleBoost` | 10.0 | Query word appears in topic title |
| `scorePhraseBoost` | 8.0 | Exact quoted phrase present in topic |
| `scoreSectionBoost` | 5.0 | Query word appears in a section heading |
| `scoreTagBoost` | 3.0 | Query word matches a topic tag |
| `scoreAllWords` | 2.0 | All query words present (multi-word bonus) |
| `scoreExactWord` | 1.0 | Exact token match in inverted index |
| `scoreStemWord` | 0.7 | Stemmed variant match |
| `scorePrefixWord` | 0.5 | Prefix (partial) match |
| `scoreFuzzyWord` | 0.3 | Levenshtein fuzzy match |
Search pipeline
- `extractPhrases(query)` strips `"quoted strings"` from the query and returns them separately. The remaining text is tokenised normally.
- For each query token: exact lookup in the inverted index; prefix scan (token is a prefix of an indexed word); Levenshtein fuzzy scan for tokens of three or more characters with a maximum edit distance of 2. Stemmed tokens score at `scoreStemWord` rather than `scoreExactWord`.
- After initial scoring, a second pass over matched topics applies title boost, tag boost, multi-word bonus, and section title boost.
- Phrase matching scans the concatenated title + content + section content of every topic that received any score.
- Results are sorted by score descending; ties are broken alphabetically by title.
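The first and last pipeline steps are easy to show in isolation. The sketch below assumes a simple `"..."` pattern for phrases and a hypothetical `result` type; the real `extractPhrases` and `SearchResult` may differ in detail.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// phraseRe is an assumed pattern for double-quoted phrases.
var phraseRe = regexp.MustCompile(`"([^"]+)"`)

// extractPhrases returns the quoted phrases and the query with them removed,
// with leftover whitespace normalised to single spaces.
func extractPhrases(query string) (phrases []string, rest string) {
	for _, m := range phraseRe.FindAllStringSubmatch(query, -1) {
		phrases = append(phrases, m[1])
	}
	rest = strings.Join(strings.Fields(phraseRe.ReplaceAllString(query, " ")), " ")
	return phrases, rest
}

// result is a hypothetical stand-in for SearchResult.
type result struct {
	Title string
	Score float64
}

// rank sorts by score descending and breaks ties alphabetically by title.
func rank(rs []result) {
	sort.SliceStable(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Title < rs[j].Title
	})
}

func main() {
	phrases, rest := extractPhrases(`install "rate limit" guide`)
	fmt.Printf("%q %q\n", phrases, rest)

	rs := []result{{"Beta", 2}, {"Alpha", 2}, {"Gamma", 5}}
	rank(rs)
	fmt.Println(rs)
}
```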
Snippet extraction and highlighting
findBestMatch selects the section whose title and content best match the query words (title matches weighted double), then calls extractSnippet to pull a 150-character window centred on the first regex match. highlight wraps matched spans in **...** (Markdown bold), merging overlapping matches to avoid double-wrapping.
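The merge step in `highlight` is what prevents nested bold markers. One way to do it, sketched here with an assumed signature and case-insensitive literal matching, is to collect all match spans, sort them, and fuse any that overlap or touch before wrapping.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// highlight wraps every case-insensitive occurrence of the given words in
// **...**, merging overlapping or touching spans so no text is wrapped twice.
// The signature is an assumption made for this sketch.
func highlight(text string, words []string) string {
	var spans [][2]int
	for _, w := range words {
		if w == "" {
			continue
		}
		re := regexp.MustCompile(`(?i)` + regexp.QuoteMeta(w))
		for _, loc := range re.FindAllStringIndex(text, -1) {
			spans = append(spans, [2]int{loc[0], loc[1]})
		}
	}
	if len(spans) == 0 {
		return text
	}
	sort.Slice(spans, func(i, j int) bool { return spans[i][0] < spans[j][0] })

	// Fuse each span into the previous one when it starts inside or at its end.
	merged := [][2]int{spans[0]}
	for _, s := range spans[1:] {
		last := &merged[len(merged)-1]
		if s[0] <= last[1] {
			if s[1] > last[1] {
				last[1] = s[1]
			}
			continue
		}
		merged = append(merged, s)
	}

	var b strings.Builder
	prev := 0
	for _, s := range merged {
		b.WriteString(text[prev:s[0]])
		b.WriteString("**" + text[s[0]:s[1]] + "**")
		prev = s[1]
	}
	b.WriteString(text[prev:])
	return b.String()
}

func main() {
	fmt.Println(highlight("Rate limiting limits", []string{"limit", "limiting"}))
}
```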
Stemmer (stemmer.go)
stem(word string) string implements a subset of Porter-style suffix stripping. Words shorter than four characters are returned unchanged. The result is always at least two characters. Two passes are applied:
- `stemInflectional`: handles `-sses`, `-ies`, `-eed`, `-ing`, `-ed`, `-s` (but not `-ss`).
- `stemDerivational`: longest-match suffix rules including `-fulness`, `-ational`, `-tional`, `-ously`, `-ively`, `-ingly`, `-ation`, `-ness`, `-ment`, `-ably`, `-ally`, `-izer`.
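A rough sketch of the inflectional pass and the two length guards follows. The source lists the suffixes but not their replacements, so the mappings here follow Porter's step 1 (`-sses` → `-ss`, `-ies` → `-i`, `-eed` → `-ee`) as an assumption; returning the original word when the stem would be too short is likewise a guess at how the two-character minimum is enforced. The derivational pass is omitted, and input is assumed to be lowercase ASCII.

```go
package main

import (
	"fmt"
	"strings"
)

// stemInflectional strips one inflectional suffix. Replacements follow
// Porter's step 1 and are assumptions of this sketch.
func stemInflectional(w string) string {
	switch {
	case strings.HasSuffix(w, "sses"):
		return w[:len(w)-2] // classes -> class
	case strings.HasSuffix(w, "ies"):
		return w[:len(w)-2] // ponies -> poni
	case strings.HasSuffix(w, "eed"):
		return w[:len(w)-1] // agreed -> agree
	case strings.HasSuffix(w, "ing"):
		return w[:len(w)-3]
	case strings.HasSuffix(w, "ed"):
		return w[:len(w)-2]
	case strings.HasSuffix(w, "s") && !strings.HasSuffix(w, "ss"):
		return w[:len(w)-1]
	}
	return w
}

// stem applies the guards described above: short words pass through, and a
// stem shorter than two characters is rejected in favour of the original.
func stem(word string) string {
	if len(word) < 4 {
		return word
	}
	s := stemInflectional(word)
	if len(s) < 2 {
		return word
	}
	return s
}

func main() {
	for _, w := range []string{"classes", "ponies", "agreed", "jumped", "cats", "glass", "sing", "is"} {
		fmt.Printf("%s -> %s\n", w, stem(w))
	}
}
```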
Fuzzy matching
levenshtein(a, b string) int uses a two-row dynamic programming approach (O(min(m,n)) space). Fuzzy matching is only applied to index words that are neither an exact nor a prefix match for the query token, keeping the common-case query path fast.
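The two-row technique keeps only the previous and current rows of the edit-distance table. The sketch below also shows how the exact, prefix, and fuzzy tiers could be ordered so the Levenshtein call is reached last; `wordScore` is a hypothetical helper, while the constant names and values are taken from the scoring table.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	scoreExactWord  = 1.0
	scorePrefixWord = 0.5
	scoreFuzzyWord  = 0.3
)

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes edit distance with two rows sized to the shorter
// input, giving O(min(m,n)) space.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	if len(ra) < len(rb) {
		ra, rb = rb, ra // rb is now the shorter string
	}
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// wordScore is a hypothetical helper: it tries exact, then prefix, and only
// then fuzzy matching (tokens of 3+ characters, distance at most 2).
func wordScore(token, indexed string) float64 {
	switch {
	case indexed == token:
		return scoreExactWord
	case strings.HasPrefix(indexed, token):
		return scorePrefixWord
	case len([]rune(token)) >= 3 && levenshtein(token, indexed) <= 2:
		return scoreFuzzyWord
	}
	return 0
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))
	fmt.Println(wordScore("config", "config"), wordScore("conf", "config"), wordScore("confg", "config"))
}
```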
Markdown Rendering (render.go)
RenderMarkdown(content string) (string, error) converts Markdown to an HTML fragment using github.com/yuin/goldmark configured with:
- `extension.GFM`: GitHub Flavoured Markdown (tables, strikethrough, autolinks)
- `extension.Typographer`: smart quotes and dashes
- `html.WithUnsafe()`: raw HTML in source is passed through (required for embedded code examples in catalog content)
The function returns a fragment only; <html> and <body> wrappers are provided by the templates.
HTTP Server (server.go)
NewServer(catalog *Catalog, addr string) *Server creates an HTTP server and registers six routes on construction.
Routes
| Method | Pattern | Handler | Response |
|---|---|---|---|
| GET | `/` | `handleIndex` | HTML: all topics grouped by first tag, sorted by `Order` then `Title` |
| GET | `/topics/{id}` | `handleTopic` | HTML: rendered Markdown body, section ToC, related topics; 404 if unknown |
| GET | `/search?q=` | `handleSearch` | HTML: ranked results with highlighted snippets; 400 if `q` is absent |
| GET | `/api/topics` | `handleAPITopics` | JSON array of all topics |
| GET | `/api/topics/{id}` | `handleAPITopic` | JSON single topic; 404 if unknown |
| GET | `/api/search?q=` | `handleAPISearch` | JSON array of `SearchResult`; 400 if `q` is absent |
All routes set X-Content-Type-Options: nosniff. HTML routes set Content-Type: text/html; charset=utf-8. JSON routes set Content-Type: application/json.
Server implements http.Handler via ServeHTTP, allowing it to be embedded into an existing mux or used standalone via ListenAndServe.
Templates (templates.go)
Templates are embedded at compile time via //go:embed templates/*.html. The directory contains five files:
- `base.html`: shared layout with dark theme CSS (background `#0d1117`, foreground `#c9d1d9`, accent `#58a6ff`), a navigation bar with search input, and a footer
- `index.html`: topic listing with cards grouped by first tag and a topic count
- `topic.html`: single topic with rendered Markdown body, table of contents from sections, and related topics sidebar
- `search.html`: search results with query echo, result count, ranked list with highlighted snippets, and empty-state message
- `404.html`: not found page with search suggestion
Template functions available to all templates:
| Function | Signature | Purpose |
|---|---|---|
| `renderMarkdown` | `func(string) template.HTML` | Calls `RenderMarkdown`; returns empty paragraph on error |
| `truncate` | `func(string, int) string` | Strips Markdown headings, joins remaining lines, truncates to N runes |
| `pluralise` | `func(int, string, string) string` | Returns singular or plural form based on count |
| `multiply` | `func(int, int) int` | Integer multiplication for template arithmetic |
| `sub` | `func(int, int) int` | Integer subtraction for template arithmetic |
groupTopicsByTag groups topics by their first tag (falling back to "other"), sorts topics within each group by Order then Title, and sorts groups alphabetically by tag name.
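The grouping rule can be sketched as below. The `topic` and `group` types and the function name `groupByFirstTag` are hypothetical; only the ordering rules come from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// topic and group are hypothetical, trimmed-down types for this sketch.
type topic struct {
	Title string
	Tags  []string
	Order int
}

type group struct {
	Tag    string
	Topics []topic
}

// groupByFirstTag buckets topics by first tag (or "other"), sorts each bucket
// by Order then Title, and sorts the buckets alphabetically by tag.
func groupByFirstTag(topics []topic) []group {
	byTag := map[string][]topic{}
	for _, t := range topics {
		tag := "other"
		if len(t.Tags) > 0 {
			tag = t.Tags[0]
		}
		byTag[tag] = append(byTag[tag], t)
	}
	groups := make([]group, 0, len(byTag))
	for tag, ts := range byTag {
		sort.SliceStable(ts, func(i, j int) bool {
			if ts[i].Order != ts[j].Order {
				return ts[i].Order < ts[j].Order
			}
			return ts[i].Title < ts[j].Title
		})
		groups = append(groups, group{Tag: tag, Topics: ts})
	}
	sort.Slice(groups, func(i, j int) bool { return groups[i].Tag < groups[j].Tag })
	return groups
}

func main() {
	gs := groupByFirstTag([]topic{
		{Title: "Beta", Tags: []string{"guide"}, Order: 1},
		{Title: "Alpha", Tags: []string{"guide"}, Order: 1},
		{Title: "Zeta", Tags: []string{"guide"}, Order: 0},
		{Title: "Misc"},
		{Title: "Commit", Tags: []string{"cli", "dev"}},
	})
	for _, g := range gs {
		fmt.Println(g.Tag, len(g.Topics))
	}
}
```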
Static Site Generator (generate.go)
Generate(catalog *Catalog, outputDir string) error writes a self-contained static site:
| Output file | Content |
|---|---|
| `index.html` | Index page rendered from `index.html` template |
| `topics/{id}.html` | One file per topic rendered from `topic.html` template |
| `search.html` | Search page with client-side JavaScript search appended |
| `search-index.json` | JSON array of `{id, title, tags, content}` (content truncated to 500 runes) |
| `404.html` | Not found page rendered from `404.html` template |
The client-side search JavaScript (clientSearchScript constant) is appended verbatim to search.html. It loads search-index.json on page load, intercepts the search form submit event, and scores results using the same title (+10), content (+1), and tag (+3) weightings as the server-side index. All DOM insertion uses textContent or document.createElement to prevent XSS; no innerHTML is used with user-supplied strings.
All CSS is inlined; no external stylesheets are required, making the output suitable for direct file serving or CDN deployment.
CLI Help Text Ingestion (ingest.go)
ParseHelpText(name string, text string) *Topic converts raw CLI help output (Go flag-style or Cobra-style) into a Topic:
- Extracts `See also:` lines and converts the comma-separated references into `Related` topic IDs via `GenerateID`.
- `convertHelpToMarkdown` scans lines for section headers (`Usage:`, `Flags:`, `Options:`, `Examples:`, `Commands:`, `Available Commands:`) and wraps their content in Markdown code blocks or bullet lists. Descriptive paragraphs are passed through as plain Markdown.
- Tags are set to `["cli", first-word-of-name]` (e.g. `["cli", "dev"]` for command `"dev commit"`).
- `ExtractSections` is called on the generated Markdown to populate `Sections`.
IngestCLIHelp(helpTexts map[string]string) *Catalog batch-ingests a map of command name → help text and returns a populated Catalog.
Dependencies
| Package | Purpose |
|---|---|
| `gopkg.in/yaml.v3` | YAML frontmatter parsing |
| `github.com/yuin/goldmark` | Markdown-to-HTML rendering |
| `github.com/stretchr/testify` | Test assertions (test-only) |
The package has no runtime dependency on a database, an external network service, or an operating system service. File-system access is limited to the embedded templates, apart from the output directory that `Generate` writes to.