go-html/docs/architecture.md
Snider 53edd80476 docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:01:55 +00:00

Architecture

go-html is an HLCRF (Header, Left, Content, Right, Footer) DOM compositor with grammar pipeline integration. It is a pure-Go, type-safe HTML rendering library designed for server-side generation, with an optional lightweight WASM client module.

Module path: forge.lthn.ai/core/go-html

Node Interface

All renderable units implement a single interface:

type Node interface {
    Render(ctx *Context) string
}

Every node type is a private struct with a public constructor. The API surface is intentionally small: nine public constructors plus Attr() and Render() helpers.

| Constructor | Description |
| --- | --- |
| El(tag, ...Node) | HTML element with children |
| Attr(Node, key, value) | Set attribute on an El node; chainable |
| Text(key, ...any) | Translated, HTML-escaped text via go-i18n |
| Raw(content) | Unescaped trusted content |
| If(cond, Node) | Conditional render |
| Unless(cond, Node) | Inverse conditional render |
| Each[T](items, fn) | Type-safe iteration with generics |
| Switch(selector, cases) | Runtime dispatch to named cases |
| Entitled(feature, Node) | Entitlement-gated render; deny-by-default |
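
To make the conditional, iteration and entitlement constructors concrete, here is a minimal self-contained sketch. It is not go-html's source: the lower-case names (renderCtx, node, ifNode, unlessNode, each, entitled) are hypothetical stand-ins, and passing the condition as a function of the context is an assumption. Only the behaviour described in the table is modelled.

```go
package main

import (
	"fmt"
	"strings"
)

// renderCtx is a hypothetical stand-in for the library's Context.
type renderCtx struct {
	Entitlements func(feature string) bool
}

// node mirrors the single-method Node interface described above.
type node interface {
	Render(ctx *renderCtx) string
}

// rawNode emits its content unchanged (trusted content only).
type rawNode struct{ content string }

func (r rawNode) Render(*renderCtx) string { return r.content }

// condNode renders its child only when cond reports true.
type condNode struct {
	cond  func(*renderCtx) bool
	child node
}

func (c condNode) Render(ctx *renderCtx) string {
	if c.cond(ctx) {
		return c.child.Render(ctx)
	}
	return ""
}

func ifNode(cond func(*renderCtx) bool, n node) node { return condNode{cond, n} }

func unlessNode(cond func(*renderCtx) bool, n node) node {
	return condNode{func(ctx *renderCtx) bool { return !cond(ctx) }, n}
}

// eachNode maps every item to a node and concatenates the results.
type eachNode[T any] struct {
	items []T
	fn    func(T) node
}

func (e eachNode[T]) Render(ctx *renderCtx) string {
	var b strings.Builder
	for _, item := range e.items {
		b.WriteString(e.fn(item).Render(ctx))
	}
	return b.String()
}

func each[T any](items []T, fn func(T) node) node { return eachNode[T]{items, fn} }

// entitled denies by default: a nil context or a nil Entitlements
// function renders nothing.
func entitled(feature string, n node) node {
	return condNode{func(ctx *renderCtx) bool {
		return ctx != nil && ctx.Entitlements != nil && ctx.Entitlements(feature)
	}, n}
}

func main() {
	list := each([]string{"a", "b"}, func(s string) node { return rawNode{"<li>" + s + "</li>"} })
	gated := entitled("premium", rawNode{"<b>pro</b>"})

	fmt.Println(list.Render(&renderCtx{}))          // <li>a</li><li>b</li>
	fmt.Println(gated.Render(&renderCtx{}) == "") // true: no entitlement function set
}
```

Because every combinator is itself a node, the pieces compose freely: an entitled node can wrap an each node, which can in turn produce conditional children.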

Safety guarantees

  • Text nodes are always HTML-escaped. XSS via user-supplied strings fed through Text() is not possible.
  • Raw is an explicit escape hatch for trusted content only. Its name signals intent.
  • Entitled returns an empty string when no entitlement function is set on the context. Access is denied by default, not granted.
  • El attributes are sorted alphabetically before output, producing deterministic HTML regardless of insertion order.
  • Void elements (br, img, input, etc.) never emit a closing tag.
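
The guarantees about escaping, attribute order and void elements can be illustrated with a small standalone sketch. The helper names (renderElement, escapeText, voidTags) are hypothetical and are not part of go-html. The void-element list is deliberately partial, and escaping attribute values is an assumption the text above does not state.

```go
package main

import (
	"fmt"
	"html"
	"sort"
	"strings"
)

// voidTags lists a few elements that never take a closing tag (partial list).
var voidTags = map[string]bool{"br": true, "hr": true, "img": true, "input": true}

// renderElement is a hypothetical helper, not go-html's implementation.
// children are assumed to be already-rendered HTML fragments.
func renderElement(tag string, attrs map[string]string, children ...string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sorting makes output stable

	var b strings.Builder
	b.WriteString("<" + tag)
	for _, k := range keys {
		// Escaping attribute values is an assumption of this sketch.
		b.WriteString(" " + k + `="` + html.EscapeString(attrs[k]) + `"`)
	}
	b.WriteString(">")
	if voidTags[tag] {
		return b.String() // void element: no children, no closing tag
	}
	for _, c := range children {
		b.WriteString(c)
	}
	b.WriteString("</" + tag + ">")
	return b.String()
}

// escapeText shows the kind of escaping applied to text content.
func escapeText(s string) string { return html.EscapeString(s) }

func main() {
	attrs := map[string]string{"href": "/x", "class": "nav"}
	fmt.Println(renderElement("a", attrs, escapeText("<script>")))
	// <a class="nav" href="/x">&lt;script&gt;</a>
	fmt.Println(renderElement("br", nil))
	// <br>
}
```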

HLCRF Layout

The Layout type is a compositor for five named slots: Header, Left, Content, Right, Footer. Each slot maps to a specific semantic HTML element and ARIA role:

| Slot | Element | ARIA role |
| --- | --- | --- |
| H | <header> | banner |
| L | <aside> | complementary |
| C | <main> | main |
| R | <aside> | complementary |
| F | <footer> | contentinfo |

A layout variant string selects which slots are rendered and in which order:

NewLayout("HLCRF")   // all five slots
NewLayout("HCF")     // header, content, footer — no sidebars
NewLayout("C")       // content only

Each rendered slot receives a deterministic data-block attribute encoding its position in the tree. The root layout produces IDs in the form {slot}-0 (e.g., H-0, C-0). Nested layouts extend the parent's block ID as a path prefix: a Layout placed inside the L slot of a root layout will produce inner slot IDs like L-0-H-0, L-0-C-0.

This path scheme is computed with plain string concatenation rather than fmt.Sprintf, which keeps fmt out of the WASM import graph.
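
As a rough illustration of how a variant string and the block ID scheme fit together, the sketch below walks a variant in order and builds each data-block value by concatenation only. The names (slotMeta, blockID, renderVariant) are hypothetical. The exact markup go-html emits is not shown in this document, so the output format here is an assumption, as is the choice to emit every slot named in the variant even when it is empty.

```go
package main

import (
	"os"
	"strings"
)

// slotMeta pairs a slot letter with its element and ARIA role.
type slotMeta struct{ tag, role string }

var slots = map[byte]slotMeta{
	'H': {"header", "banner"},
	'L': {"aside", "complementary"},
	'C': {"main", "main"},
	'R': {"aside", "complementary"},
	'F': {"footer", "contentinfo"},
}

// blockID builds a data-block value by concatenation only (no fmt).
// path is the parent's block ID, or "" for a root layout.
func blockID(path string, slot byte) string {
	id := string(slot) + "-0"
	if path == "" {
		return id
	}
	return path + "-" + id
}

// renderVariant is a hypothetical helper: it walks the variant string
// in order and emits one wrapper per recognised slot letter.
func renderVariant(variant, path string, content map[byte]string) string {
	var b strings.Builder
	for i := 0; i < len(variant); i++ {
		meta, ok := slots[variant[i]]
		if !ok {
			continue // unknown letters are skipped in this sketch
		}
		b.WriteString("<" + meta.tag +
			` data-block="` + blockID(path, variant[i]) +
			`" role="` + meta.role + `">`)
		b.WriteString(content[variant[i]])
		b.WriteString("</" + meta.tag + ">")
	}
	return b.String()
}

func main() {
	out := renderVariant("HCF", "", map[byte]string{'H': "top", 'C': "body", 'F': "foot"})
	os.Stdout.WriteString(out + "\n")
	os.Stdout.WriteString(blockID("L-0", 'H') + "\n") // nested: L-0-H-0
}
```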

Nested layouts

Layout implements Node, so it can be placed inside any slot of another layout. At render time, nested layouts are cloned and their path field is set to the parent's block ID. This clone-on-render approach avoids shared mutation and is safe for concurrent use.

inner := NewLayout("HCF").H(Raw("nav")).C(Raw("body")).F(Raw("links"))
outer := NewLayout("HLCRF").H(Raw("top")).L(inner).C(Raw("main")).F(Raw("foot"))
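
The clone-on-render idea can be sketched in isolation. The layout type below is a heavily reduced, hypothetical stand-in with one nested layout in the L slot and only an H slot of its own. It exists only to show that the nested value is copied before its path is set, so the value the caller built is never written to during rendering.

```go
package main

import "fmt"

// layout is a hypothetical, much-reduced stand-in for the Layout type.
type layout struct {
	path  string  // parent's block ID; "" for a root layout
	inner *layout // optional layout nested in the L slot (sketch only)
}

// blockIDs returns the ID this layout would emit for its H slot and,
// recursively, the IDs of a layout nested in its L slot.
func (l *layout) blockIDs() []string {
	prefix := ""
	if l.path != "" {
		prefix = l.path + "-"
	}
	ids := []string{prefix + "H-0"}
	if l.inner != nil {
		clone := *l.inner           // copy; the shared value is never mutated
		clone.path = prefix + "L-0" // the copy carries the parent's block ID
		ids = append(ids, clone.blockIDs()...)
	}
	return ids
}

func main() {
	inner := &layout{}
	outer := &layout{inner: inner}

	fmt.Println(outer.blockIDs()) // [H-0 L-0-H-0]
	fmt.Println(inner.path == "") // true: rendering left inner unchanged
}
```

Since rendering only reads the shared layout and writes to a local copy, the same inner value could be reused in several parents or rendered from several goroutines in this sketch.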

Fluent builder

All slot methods return the *Layout for chaining. Multiple nodes may be appended to the same slot across multiple calls:

NewLayout("HCF").
    H(El("h1", Text("Title"))).
    C(El("p", Text("Content")), Raw("<hr>")).
    F(El("small", Text("Copyright")))

Responsive Compositor

Responsive wraps multiple named Layout variants for breakpoint-aware rendering. Each variant renders inside a <div data-variant="name"> container, giving CSS media queries or JavaScript a stable hook for show/hide logic.

NewResponsive().
    Variant("desktop", NewLayout("HLCRF")). // slot content elided
    Variant("tablet", NewLayout("HCF")).    // slot content elided
    Variant("mobile", NewLayout("C"))       // slot content elided

Responsive itself implements Node and may be passed to Imprint() for cross-variant semantic analysis.

Note: Responsive.Variant() accepts only *Layout, not arbitrary Node values. Arbitrary subtrees must be wrapped in a layout first.
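
The wrapping behaviour can be pictured with a short standalone sketch. The responsive and variant types below are hypothetical and take already-rendered HTML strings rather than *Layout values. Rendering variants in insertion order is an assumption.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// variant pairs a name with already-rendered layout HTML.
type variant struct{ name, body string }

// responsive is a hypothetical stand-in that keeps variants in insertion order.
type responsive struct{ variants []variant }

// add appends a variant and returns the receiver for chaining.
func (r *responsive) add(name, body string) *responsive {
	r.variants = append(r.variants, variant{name, body})
	return r
}

// render wraps each variant in a container carrying its name.
func (r *responsive) render() string {
	var b strings.Builder
	for _, v := range r.variants {
		b.WriteString(`<div data-variant="` + html.EscapeString(v.name) + `">`)
		b.WriteString(v.body)
		b.WriteString("</div>")
	}
	return b.String()
}

func main() {
	r := (&responsive{}).
		add("desktop", "<main>wide</main>").
		add("mobile", "<main>narrow</main>")
	fmt.Println(r.render())
}
```

A stylesheet can then target the containers with attribute selectors such as [data-variant="mobile"] inside the appropriate media query.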

Rendering Context

Context carries per-request state through the entire node tree:

type Context struct {
    Identity     string
    Locale       string
    Entitlements func(feature string) bool
    Data         map[string]any
    service      *i18n.Service  // private; set via NewContextWithService()
}

The service field is intentionally unexported. A custom i18n service can only be injected through NewContextWithService(svc), which prevents callers from setting or swapping it inconsistently after construction.

When ctx.service is nil, Text nodes fall back to the global i18n.T() default service.
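
The fallback rule can be sketched without the real i18n package. Everything below is hypothetical: the translator interface, the echo and bracket services, and the lower-case context type stand in for the library's types purely to show the nil check. In a single-file example the unexported field is still reachable, so the encapsulation only takes effect across package boundaries.

```go
package main

import "fmt"

// translator is a hypothetical interface standing in for the i18n service.
type translator interface {
	T(key string) string
}

// echo plays the role of the package-level default service.
type echo struct{}

func (echo) T(key string) string { return key }

// bracket is a custom service used to show injection.
type bracket struct{}

func (bracket) T(key string) string { return "[" + key + "]" }

var defaultService translator = echo{}

// context mirrors the shape described above: public per-request
// fields plus an unexported service set only by a constructor.
type context struct {
	Locale  string
	service translator
}

func newContext() *context { return &context{} }

func newContextWithService(svc translator) *context { return &context{service: svc} }

// translate uses the injected service when present, else the default.
func (c *context) translate(key string) string {
	if c != nil && c.service != nil {
		return c.service.T(key)
	}
	return defaultService.T(key)
}

func main() {
	fmt.Println(newContext().translate("greeting"))                     // greeting
	fmt.Println(newContextWithService(bracket{}).translate("greeting")) // [greeting]
}
```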

Grammar Pipeline

The grammar pipeline is a server-side-only feature. It is guarded with //go:build !js and absent from all WASM builds.

StripTags

StripTags(html string) string converts rendered HTML to plain text. Tag boundaries are collapsed to single spaces; the result is trimmed. The implementation is a single-pass rune scanner: no regex, no allocations beyond the output builder. It does not attempt to elide <script> or <style> content because go-html never generates those elements.
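
A single-pass scanner of this kind might look like the sketch below. It is a hypothetical re-implementation written from the description above, not the library's code. It also collapses runs of ordinary whitespace, which the description does not explicitly promise.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTags is a hypothetical re-implementation of the described
// behaviour: one pass over the runes, one output builder.
func stripTags(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	inTag := false
	pendingSpace := false // a tag or whitespace was seen since the last text rune
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
			pendingSpace = true
		case r == '>' && inTag:
			inTag = false
		case inTag:
			// skip tag names and attributes
		case r == ' ' || r == '\n' || r == '\t' || r == '\r':
			pendingSpace = true
		default:
			// Emit at most one space before text, and never at the start,
			// so the result needs no separate trim step.
			if pendingSpace && b.Len() > 0 {
				b.WriteByte(' ')
			}
			pendingSpace = false
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(stripTags("<p>Hello <b>world</b></p><p>Bye</p>")) // Hello world Bye
}
```

Deferring the space until the next text rune is what keeps leading and trailing whitespace out of the result without a second pass.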

Imprint

Imprint(node Node, ctx *Context) reversal.GrammarImprint runs the full render-to-analysis pipeline:

  1. Call node.Render(ctx) to produce HTML.
  2. Pass HTML through StripTags to extract plain text.
  3. Pass plain text through go-i18n/reversal.Tokeniser to produce a token sequence.
  4. Wrap tokens in a reversal.GrammarImprint for structural analysis.

The resulting GrammarImprint exposes TokenCount, UniqueVerbs, and a Similar() method for pairwise semantic similarity scoring. This bridges the rendering layer to the privacy and analytics layers of the Lethean stack.

CompareVariants

CompareVariants(r *Responsive, ctx *Context) map[string]float64 runs Imprint on each named layout variant in a Responsive and returns pairwise similarity scores. Keys are "name1:name2". This enables detection of semantically divergent responsive variants — for example, a mobile layout that strips critical information that appears in the desktop variant.
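
The pairwise keying can be shown with a standalone sketch. The similarity function here is a plain word-set Jaccard index, used only as a stand-in and unrelated to the grammar-based score that GrammarImprint computes. The helper names are hypothetical, and scoring each unordered pair once, with names in sorted order, is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// wordSet lower-cases text and returns its distinct words.
func wordSet(text string) map[string]bool {
	set := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		set[w] = true
	}
	return set
}

// jaccard is a stand-in similarity (word-set overlap), NOT the
// score produced by the reversal package.
func jaccard(a, b string) float64 {
	setA, setB := wordSet(a), wordSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 1
	}
	shared := 0
	for w := range setA {
		if setB[w] {
			shared++
		}
	}
	return float64(shared) / float64(len(setA)+len(setB)-shared)
}

// compareTexts scores every unordered pair once, keyed "name1:name2".
func compareTexts(texts map[string]string) map[string]float64 {
	names := make([]string, 0, len(texts))
	for n := range texts {
		names = append(names, n)
	}
	sort.Strings(names) // stable keys regardless of map iteration order

	scores := map[string]float64{}
	for i := 0; i < len(names); i++ {
		for j := i + 1; j < len(names); j++ {
			scores[names[i]+":"+names[j]] = jaccard(texts[names[i]], texts[names[j]])
		}
	}
	return scores
}

func main() {
	scores := compareTexts(map[string]string{
		"desktop": "Delete your account permanently",
		"mobile":  "Delete account",
	})
	fmt.Println(scores["desktop:mobile"]) // 0.5
}
```

A low score for a pair is the signal described above: one variant carries text that the other has dropped.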

Server/Client Split

The binary split is enforced by Go build tags.

| File | Build tag | Reason for exclusion from WASM |
| --- | --- | --- |
| pipeline.go | //go:build !js | Imports go-i18n/reversal (~250 KB gzip) |
| cmd/wasm/register.go | //go:build !js | Imports encoding/json (~200 KB gzip) and text/template (~125 KB gzip) |

The WASM binary includes only: node types, layout, responsive, context, render, path, and go-i18n core (translation). No codegen, no pipeline, no JSON, no templates, no fmt.

WASM Module

The WASM entry point is cmd/wasm/main.go, compiled with GOOS=js GOARCH=wasm.

It exposes a single JavaScript function on window.gohtml:

gohtml.renderToString(variant, locale, slots)
  • variant: HLCRF variant string, e.g. "HCF".
  • locale: BCP 47 locale string for i18n, e.g. "en-GB".
  • slots: object with optional keys H, L, C, R, F containing HTML strings.

Slot content is injected via Raw(). The caller is responsible for sanitisation. This is intentional: the WASM module is a rendering engine for trusted content produced server-side or by the application's own templates.

Size gate

cmd/wasm/size_test.go contains TestWASMBinarySize_Good, a build-gated test that:

  1. Builds the WASM binary with -ldflags=-s -w.
  2. Gzip-compresses the output at best compression.
  3. Asserts the compressed size is below 1,048,576 bytes (1 MiB).
  4. Asserts the raw size is below 3,145,728 bytes (3 MiB).

This test is skipped under go test -short. It is guarded with //go:build !js so it does not run within the WASM environment itself. Current measured size: 2.90 MB raw, 842 KB gzip.
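
The compression and assertion steps might be sketched as follows. This is not the contents of size_test.go: the helper names (gzipSize, checkSize) are hypothetical, the build step is omitted, and the input is a placeholder byte slice rather than a real .wasm file.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

const (
	maxGzipBytes = 1 << 20 // 1,048,576
	maxRawBytes  = 3 << 20 // 3,145,728
)

// gzipSize returns the size of data after gzip at best compression.
func gzipSize(data []byte) (int, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return 0, err
	}
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil { // Close flushes the final block
		return 0, err
	}
	return buf.Len(), nil
}

// checkSize reports an error when either budget is met or exceeded.
func checkSize(raw []byte) error {
	if len(raw) >= maxRawBytes {
		return fmt.Errorf("raw size %d is not below %d", len(raw), maxRawBytes)
	}
	gz, err := gzipSize(raw)
	if err != nil {
		return err
	}
	if gz >= maxGzipBytes {
		return fmt.Errorf("gzip size %d is not below %d", gz, maxGzipBytes)
	}
	return nil
}

func main() {
	// In a real test the bytes would come from the freshly built binary.
	placeholder := bytes.Repeat([]byte("wasm"), 1000)
	fmt.Println(checkSize(placeholder)) // <nil>
}
```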

Codegen CLI

cmd/codegen/main.go is a build-time tool for generating Web Component JavaScript bundles from HLCRF slot assignments. It reads a JSON slot map from stdin and writes the generated JS to stdout.

echo '{"H":"nav-bar","C":"main-content"}' | go run ./cmd/codegen/ > components.js

The codegen package generates ES2022 class definitions with closed Shadow DOM. The generated pattern per component:

  • A class extending HTMLElement with a private #shadow field.
  • constructor() attaches a closed shadow root (mode: "closed").
  • connectedCallback() dispatches a wc-ready custom event with the tag name and slot.
  • render(html) sets shadow content from a <template> clone.
  • customElements.define() registration.

Closed Shadow DOM provides style isolation. Content is set via the DOM API, never via innerHTML directly on the element.

Tag names must contain a hyphen (Web Components specification requirement). TagToClassName() converts kebab-case tags to PascalCase class names: nav-bar becomes NavBar.
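
A kebab-case to PascalCase conversion of this kind could be written as below. The lower-case tagToClassName and validTag are hypothetical re-implementations for illustration. They assume ASCII tag names and check only the hyphen rule mentioned above, not the full custom element naming rules.

```go
package main

import (
	"fmt"
	"strings"
)

// tagToClassName is a hypothetical re-implementation of the conversion:
// split on hyphens, upper-case the first byte of each part, and join.
// ASCII tag names are assumed.
func tagToClassName(tag string) string {
	var b strings.Builder
	for _, part := range strings.Split(tag, "-") {
		if part == "" {
			continue // tolerate doubled or trailing hyphens
		}
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return b.String()
}

// validTag applies only the hyphen requirement.
func validTag(tag string) bool { return strings.Contains(tag, "-") }

func main() {
	fmt.Println(tagToClassName("nav-bar"))      // NavBar
	fmt.Println(tagToClassName("main-content")) // MainContent
	fmt.Println(validTag("nav"))                // false
}
```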

The codegen CLI uses encoding/json and text/template, which are excluded from the WASM build. Consumers generate the JS bundle at build time, not at runtime.

Block ID Path Scheme

path.go exports ParseBlockID(id string) []byte, which extracts the slot letter sequence from a data-block attribute value.

Format: each slot letter is followed by -0, and successive segments are joined with -. The value L-0-C-0 decodes to ['L', 'C'], meaning the content slot of a layout nested inside the left slot.

This scheme is deterministic and human-readable. It enables server-side or client-side code to locate a specific block in the rendered tree by path.
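
Decoding such an ID is straightforward. The sketch below is a hypothetical re-implementation based only on the format described here, not the code in path.go. Returning nil for malformed input is an assumption about error handling.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBlockID is a hypothetical decoder for IDs such as "L-0-C-0".
// It expects alternating slot letters and "0" markers and returns
// nil when the input does not match that shape.
func parseBlockID(id string) []byte {
	parts := strings.Split(id, "-")
	if len(parts)%2 != 0 {
		return nil // every slot letter must be paired with a "0"
	}
	slots := make([]byte, 0, len(parts)/2)
	for i := 0; i < len(parts); i += 2 {
		letter, marker := parts[i], parts[i+1]
		if len(letter) != 1 || !strings.Contains("HLCRF", letter) || marker != "0" {
			return nil
		}
		slots = append(slots, letter[0])
	}
	return slots
}

func main() {
	fmt.Println(string(parseBlockID("L-0-C-0"))) // LC
	fmt.Println(parseBlockID("X-0") == nil)      // true
}
```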

Dependency Graph

go-html
├── forge.lthn.ai/core/go-i18n          (direct, all builds)
│   └── forge.lthn.ai/core/go-inference (indirect)
├── forge.lthn.ai/core/go-i18n/reversal (server builds only, !js)
└── github.com/stretchr/testify         (test only)

go-i18n and go-html are developed in parallel, so the go.mod uses a replace directive pointing to ../go-i18n. Both repositories must be present side by side on the local filesystem for builds and tests.