Snider 944cad006b feat(help): Phase 2 — HTTP server, rendering, static site generator, CLI ingestion
Add complete HTTP server and rendering layer for the help catalog:

- render.go: Markdown-to-HTML via goldmark (GFM, typographer, raw HTML)
- server.go: HTTP server with 6 routes (HTML index/topic/search + JSON API)
- templates.go: Embedded HTML templates with dark theme (bg #0d1117)
- templates/: base, index, topic, search, 404 page templates
- generate.go: Static site generator with client-side JS search
- ingest.go: CLI help text parser (Usage/Flags/Examples/Commands sections)

320 tests passing, 95.5% coverage, race-clean, vet-clean.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:50:10 +00:00


TODO.md -- go-help

Dispatched from core/go orchestration. Pick up tasks in order.


Phase 0: Hardening & Test Coverage

  • Expand parser tests -- Parser at 100%. Tests cover: empty input, frontmatter-only, malformed YAML (3 variants), frontmatter not at start, deeply nested headings (H4-H6 with content), Unicode (CJK, emoji, diacritics, mixed scripts), 10K+ line document, empty sections, headings without space, consecutive headings, GenerateID edge cases, path-derived IDs.
  • Expand search tests -- Added tests for: empty query (4 variants), no results (3 variants), case sensitivity (4 variants), multi-word queries (4 variants), special characters (@, dots, underscores), overlapping matches, scoring boundary cases, nil-topic guard, snippet edge cases (headings-only, whitespace-only), duplicate topic IDs, catalog integration.
  • Add catalog tests -- Created catalog_test.go covering: DefaultCatalog, Add, List, Search, Get (found/not-found), score tie-breaking.
  • Benchmark search -- search_bench_test.go with 8 benchmarks: single word, multi-word, no results, partial match, 500-topic catalog, 1000-topic catalog, Add indexing, tokenize. Uses b.Loop() (Go 1.24+) and b.ReportAllocs().
  • go vet ./... clean -- No warnings.
  • Coverage: 100% -- Up from 92.1%.

Phase 1: Search Improvements

  • Fuzzy matching -- Levenshtein distance with max edit distance of 2. Words under 3 chars skip fuzzy. Score: +0.3 per fuzzy match (below prefix +0.5 and exact +1.0).
  • English stemming — Add a lightweight Porter-style stemmer for English search terms. Pure Go, no external deps.
    • Create stemmer.go — Implement stem(word string) string covering the most impactful English suffix rules:
      • Step 1: Plurals and -ed/-ing forms (-sses → -ss, -ies → -i, -s → "", -eed → -ee, -ed → "", -ing → "")
      • Step 2: Derivational suffixes (-ational → -ate, -tional → -tion, -fulness → -ful, -ness → "", -ment → "", -ation → -ate, -ously → -ous, -ively → -ive, -ably → -able, -ally → -al, -izer → -ize, -ingly → -ing)
      • Guard: words under 4 chars are returned unchanged; result must be at least 2 chars
      • Use a simple suffix-stripping approach (not the full Porter algorithm — we don't need morphological analysis for a help catalog)
    • Modify tokenize() — Add stemmed variants: for each word, compute stem(word). If the stem differs from the word, return BOTH the original word AND the stem. This ensures exact matches still work while adding stemmed coverage.
    • Modify Search() — Stem query words before matching. Add scoreStemWord = 0.7 constant (between exact 1.0 and prefix 0.5) for stem-only matches.
    • Integration: When indexing (Add), tokenize() already produces stemmed variants, so the index naturally contains stems. When searching, stem the query words and match both raw and stemmed forms against the index.
    • Tests — (a) stem() unit tests for all suffix rules (15+ cases), (b) short words unchanged, (c) search "running" matches topic containing "run", (d) search "configurations" matches "configure", (e) plural "servers" matches "server", (f) existing tests still pass (no regression), (g) benchmark BenchmarkStem with 100 words
  • Phrase search -- Quoted multi-word queries via extractPhrases(). Phrase boost: +8.0. Searches title, content, and section content.
  • Improved scoring weights -- Named constants: title +10, section +5, tag +3, phrase +8, all-words bonus +2, exact +1.0, prefix +0.5, fuzzy +0.3.
  • Tag boost -- Query words matching tags add +3.0 per matching tag.
  • Multi-word bonus -- All query words present in topic adds +2.0.
  • Tests for all new features -- Levenshtein, min3, extractPhrases, fuzzy search, phrase search, tag boost, multi-word bonus, scoring constants, phrase highlighting, section phrase matching.
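A minimal sketch of the suffix-stripping stem() from the stemming item, applying the Step 1 and Step 2 rules in the order listed. One rule goes beyond the list and is an assumption: a Porter-style "-ss → -ss" guard in Step 1, because without it the bare "-s" rule strips the final "s" from words like "happiness" and the "-ness"/"-fulness" rules can never fire. The names suffixRule and applyFirst are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// suffixRule replaces a matching suffix with repl.
type suffixRule struct{ suffix, repl string }

// step1: plurals and -ed/-ing in the listed order, plus an added Porter-style
// "ss" -> "ss" guard (an assumption) so "-ness" words keep their final "s".
var step1 = []suffixRule{
	{"sses", "ss"}, {"ss", "ss"}, {"ies", "i"}, {"s", ""},
	{"eed", "ee"}, {"ed", ""}, {"ing", ""},
}

// step2: derivational suffixes; longer forms precede their substrings
// ("-ational" before "-ation", "-fulness" before "-ness").
var step2 = []suffixRule{
	{"ational", "ate"}, {"tional", "tion"}, {"fulness", "ful"}, {"ness", ""},
	{"ment", ""}, {"ation", "ate"}, {"ously", "ous"}, {"ively", "ive"},
	{"ably", "able"}, {"ally", "al"}, {"izer", "ize"}, {"ingly", "ing"},
}

// applyFirst applies only the first rule whose suffix matches. If the result
// would be shorter than 2 runes, the word is returned unchanged.
func applyFirst(word string, rules []suffixRule) string {
	for _, r := range rules {
		if strings.HasSuffix(word, r.suffix) {
			out := strings.TrimSuffix(word, r.suffix) + r.repl
			if utf8.RuneCountInString(out) < 2 {
				return word
			}
			return out
		}
	}
	return word
}

// stem expects a lowercase token; words under 4 runes are returned as-is.
func stem(word string) string {
	if utf8.RuneCountInString(word) < 4 {
		return word
	}
	return applyFirst(applyFirst(word, step1), step2)
}

func main() {
	for _, w := range []string{"servers", "running", "configurations", "movements", "hopefulness"} {
		fmt.Printf("%s -> %s\n", w, stem(w))
	}
}
```

With these rules "running" stems to "runn" rather than "run", and "configurations" to "configurate"; tests (c) and (d) would likely have to rely on the fuzzy matcher (both pairs are within two edits) or on an extra rule such as consonant undoubling.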
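The fuzzy-matching item can be sketched as a rune-wise Levenshtein distance that keeps only two rows of the dynamic-programming table, combined with the 3-character cut-off and the maximum distance of 2. The constant names and fuzzyMatch are hypothetical; min3 mirrors the helper named in the tests item, though its real signature is not given here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	maxFuzzyDistance = 2 // max edits for a fuzzy match
	minFuzzyLength   = 3 // shorter query words skip fuzzy matching
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the number of insertions, deletions and substitutions
// needed to turn a into b, compared rune by rune.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from "" to rb[:j]
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // prev now holds the row just computed
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether an index word is close enough to a query word.
func fuzzyMatch(query, word string) bool {
	if utf8.RuneCountInString(query) < minFuzzyLength {
		return false
	}
	return levenshtein(query, word) <= maxFuzzyDistance
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	fmt.Println(fuzzyMatch("confg", "config"))    // true
	fmt.Println(fuzzyMatch("go", "do"))           // false: query too short
}
```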

Phase 2: HTTP Server & Rendering

2.1 Markdown Rendering (render.go)

  • Add github.com/yuin/goldmark dependency — CommonMark-compliant Markdown renderer. Run go get github.com/yuin/goldmark.
  • Create render.go — Markdown-to-HTML conversion:
    • func RenderMarkdown(content string) (string, error) — converts Markdown to HTML using goldmark. Configure with:
      • html.WithUnsafe() — allow raw HTML in source (needed for embedded code examples)
      • extension.GFM — GitHub Flavoured Markdown (tables, strikethrough, autolinks)
      • extension.Typographer — smart quotes and dashes
    • Returns HTML fragment (no <html>/<body> wrapper — the server templates handle that)
  • Create render_test.go — Tests:
    • (a) heading hierarchy (H1-H6) produces correct <h1>-<h6> tags
    • (b) fenced code blocks (```go) produce <pre><code class="language-go">
    • (c) inline code backticks produce <code>
    • (d) lists (ordered + unordered)
    • (e) links and images
    • (f) tables (GFM extension)
    • (g) empty input returns empty string
    • (h) special characters are properly escaped

2.2 HTTP Server (server.go)

  • Create server.go — HTTP server for the help catalog:

    • type Server struct { catalog *Catalog; addr string; mux *http.ServeMux } — holds catalog reference and HTTP mux
    • func NewServer(catalog *Catalog, addr string) *Server — constructor. Creates mux with all routes registered.
    • func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) — delegates to mux (implements http.Handler)
    • func (s *Server) ListenAndServe() error — starts listening

    Routes:

    • GET / — Index page: rendered HTML listing all topics grouped by tags, sorted by Order then Title
    • GET /topics/{id} — Topic page: rendered Markdown content + table of contents from Sections + related topics sidebar
    • GET /search?q={query} — Search results page: rendered HTML with highlighted snippets, scores
    • GET /api/topics — JSON array of all topics (without rendered HTML — raw content field)
    • GET /api/topics/{id} — JSON single topic
    • GET /api/search?q={query} — JSON array of SearchResult

    Response format:

    • HTML routes: render using Go html/template with embedded templates (see 2.3)
    • JSON routes: application/json with json.NewEncoder
    • All routes: set Content-Type and X-Content-Type-Options: nosniff
    • Search routes: return 400 if query is empty
  • Create server_test.go — Tests using httptest.NewServer:

    • (a) GET / returns 200 with HTML containing topic titles
    • (b) GET /topics/getting-started returns 200 with rendered content
    • (c) GET /topics/nonexistent returns 404
    • (d) GET /search?q=install returns 200 with results
    • (e) GET /search (no query) returns 400
    • (f) GET /api/topics returns JSON array
    • (g) GET /api/topics/getting-started returns JSON object
    • (h) GET /api/topics/nonexistent returns 404
    • (i) GET /api/search?q=config returns JSON results
    • (j) Content-Type headers correct for HTML vs JSON routes

2.3 HTML Templates (templates.go)

  • Create templates.go — Embedded HTML templates using embed.FS:

    • Use html/template with a //go:embed templates/*.html directive
    • Create templates/ directory with:
      • base.html — shared layout: <!DOCTYPE html>, dark theme CSS (reuse go-session's colour palette: --bg: #0d1117; --fg: #c9d1d9; --accent: #58a6ff), nav bar with search input, footer
      • index.html — topic listing: cards grouped by first tag, topic count, search bar
      • topic.html — single topic: rendered Markdown body, table of contents (from Sections), related topics, prev/next navigation
      • search.html — search results: query echo, result count, ranked list with snippets, "no results" state with fuzzy suggestions
      • 404.html — not found page with search suggestion
    • Template functions: renderMarkdown (calls RenderMarkdown), truncate, pluralise
    • All templates use the dark theme. Monospace font. Clean, minimal.
  • Create templates_test.go — Tests:

    • (a) all templates parse without error
    • (b) index template renders topic titles
    • (c) topic template renders Markdown content
    • (d) search template renders results with snippets
    • (e) 404 template contains "not found"

2.4 Static Site Generator (generate.go)

  • Create generate.go — Generate a static site from the catalog:

    • func Generate(catalog *Catalog, outputDir string) error — writes static HTML files:
      • {outputDir}/index.html — index page (rendered from index template)
      • {outputDir}/topics/{id}.html — one file per topic (rendered from topic template)
      • {outputDir}/search.html — search page with client-side JS search
      • {outputDir}/search-index.json — JSON search index for client-side search:
        [{"id":"getting-started","title":"Getting Started","tags":["intro"],"content":"...truncated..."}]
        
      • {outputDir}/404.html — not found page
    • Client-side search JS: inline <script> in search.html that:
      • Loads search-index.json on page load
      • Filters topics by title/content matching (simple substring for static site)
      • Updates DOM with results (no server round-trip)
    • All CSS inlined (no external stylesheets for CDN simplicity)
  • Create generate_test.go — Tests:

    • (a) generates expected file structure in temp dir
    • (b) index.html contains all topic titles
    • (c) topic files contain rendered Markdown
    • (d) search-index.json is valid JSON with all topics
    • (e) 404.html exists
    • (f) generates into empty dir successfully
    • (g) overwrites existing files without error

2.5 CLI Help Text Ingestion (ingest.go)

  • Create ingest.go — Parse standard Go CLI help output into Topics:

    • func ParseHelpText(name string, text string) *Topic — parses help text format:
      • Title: derived from name (e.g., "dev commit" → "Dev Commit")
      • ID: GenerateID(name) (e.g., "dev-commit")
      • Content: the full help text, converted to Markdown:
        • Lines starting with Usage: → ## Usage + code block
        • Lines starting with Flags: or Options: → ## Flags + code block
        • Lines starting with Examples: → ## Examples + code block
        • Descriptive paragraphs → plain Markdown text
        • Subcommand listings → bulleted list with links
      • Tags: ["cli", first-word-of-name] (e.g., ["cli", "dev"])
      • Related: extracted from "See also:" lines if present
    • func IngestCLIHelp(helpTexts map[string]string) *Catalog — batch ingest: create catalog, parse each help text, add all topics. Key = command name (e.g., "dev commit"), Value = help text.
  • Create ingest_test.go — Tests:

    • (a) standard Go flag-style help text (Usage + Flags + description)
    • (b) cobra-style help text (with subcommand listing)
    • (c) minimal help text (single line description)
    • (d) help text with examples section
    • (e) batch ingest creates catalog with all topics
    • (f) "See also" line populates Related field
    • (g) empty help text produces topic with empty content
  • Embed help topics into go-rag collections for semantic search
  • Add vector similarity fallback when keyword search returns no results
  • Support natural language queries ("how do I push all repos?")

Workflow

  1. Virgil in core/go writes tasks here after research
  2. This repo's dedicated session picks up tasks in phase order
  3. Mark [x] when done, note commit hash