---
title: Architecture
description: Internals of the go-html HLCRF DOM compositor, covering the node interface, layout system, responsive wrapper, grammar pipeline, WASM module, and codegen CLI.
---
# Architecture
`go-html` is structured around a single interface, a layout compositor, and a server-side analysis pipeline. Everything renders to `string` -- there is no virtual DOM, no diffing, and no retained state between renders.
## Node Interface
Every renderable unit implements one method:
```go
type Node interface {
	Render(ctx *Context) string
}
```
All concrete node types are unexported structs with exported constructor functions. The public API surface consists of nine constructors plus the `Attr()` and `Render()` helpers:
| Constructor | Behaviour |
|-------------|-----------|
| `El(tag, ...Node)` | HTML element with children. Void elements (`br`, `img`, `input`, etc.) never emit a closing tag. |
| `Attr(Node, key, value)` | Sets an attribute on an `El` node. Traverses through `If`, `Unless`, `Entitled`, `Switch`, and iterator wrappers. Returns the node for chaining. |
| `Text(key, ...any)` | Translated text via `go-i18n`. Output is always HTML-escaped. |
| `Raw(content)` | Unescaped trusted content. Explicit escape hatch. |
| `If(cond, Node)` | Renders the child only when the condition function returns true. |
| `Unless(cond, Node)` | Renders the child only when the condition function returns false. |
| `Each[T](items, fn)` | Iterates a slice and renders each item via a mapping function. Generic over `T`. |
| `EachSeq[T](items, fn)` | Same as `Each` but accepts an `iter.Seq[T]` instead of a slice. |
| `Switch(selector, cases)` | Renders one of several named cases based on a runtime selector function. Returns empty string when no case matches. |
| `Entitled(feature, Node)` | Renders the child only when the context's entitlement function grants the named feature. Deny-by-default: returns empty string when no entitlement function is set. |
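
The wrappers in the table compose because each one is just another `Node` that decides whether, or which, child to render. The sketch below illustrates that idea with a self-contained, cut-down re-implementation. The lowercase names (`raw`, `ifNode`, `unless`, `switchOn`) and the one-field `Context` are hypothetical stand-ins, not go-html's unexported types.

```go
package main

import "fmt"

// Context is a cut-down stand-in for go-html's rendering context.
type Context struct {
	Locale string
}

// Node is the single-method interface described above.
type Node interface {
	Render(ctx *Context) string
}

// rawNode returns its content unchanged (trusted content only).
type rawNode struct{ content string }

func (r rawNode) Render(*Context) string { return r.content }

// condNode renders its child only when cond(ctx) equals want.
type condNode struct {
	cond  func(*Context) bool
	want  bool
	child Node
}

func (c condNode) Render(ctx *Context) string {
	if c.cond(ctx) == c.want {
		return c.child.Render(ctx)
	}
	return ""
}

// switchNode renders the case chosen by selector, or "" when none matches.
type switchNode struct {
	selector func(*Context) string
	cases    map[string]Node
}

func (s switchNode) Render(ctx *Context) string {
	if n, ok := s.cases[s.selector(ctx)]; ok {
		return n.Render(ctx)
	}
	return ""
}

// Hypothetical constructors loosely mirroring Raw, If, Unless and Switch.
func raw(s string) Node                            { return rawNode{s} }
func ifNode(cond func(*Context) bool, n Node) Node { return condNode{cond, true, n} }
func unless(cond func(*Context) bool, n Node) Node { return condNode{cond, false, n} }
func switchOn(sel func(*Context) string, cases map[string]Node) Node {
	return switchNode{sel, cases}
}

func main() {
	isWelsh := func(ctx *Context) bool { return ctx.Locale == "cy" }
	greeting := switchOn(func(ctx *Context) string { return ctx.Locale }, map[string]Node{
		"en": raw("Hello"),
		"cy": raw("Helo"),
	})
	ctx := &Context{Locale: "cy"}
	fmt.Println(ifNode(isWelsh, raw("[cy] ")).Render(ctx) + greeting.Render(ctx)) // [cy] Helo
	fmt.Printf("%q\n", unless(isWelsh, raw("fallback")).Render(ctx))            // ""
}
```

Because every wrapper returns a `Node`, wrappers can be nested to any depth without special cases.
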

Helpers are also provided for common accessibility and internationalisation attributes:
- `AriaLabel(node, label)`
- `AriaDescribedBy(node, ids...)`
- `AriaLabelledBy(node, ids...)`
- `Role(node, role)`
- `Lang(node, locale)`
- `Dir(node, direction)`
- `Alt(node, text)`
- `AriaHidden(node, hidden)`
- `TabIndex(node, index)`
- `AutoFocus(node)`
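
Each helper is, in effect, a named shortcut for one attribute. A minimal sketch of that pattern follows, assuming the helpers reduce to attribute setters. The lowercase `el`, `attr`, `dir`, `lang`, `tabIndex` and `ariaDescribedBy` functions are hypothetical, and joining IDs with spaces follows the ARIA convention for ID lists rather than confirmed go-html behaviour.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// element is a hypothetical stand-in for an El node: just a tag and attributes.
type element struct {
	tag   string
	attrs map[string]string
}

func el(tag string) *element { return &element{tag: tag, attrs: map[string]string{}} }

// attr sets one attribute and returns the node for chaining, like Attr.
func attr(e *element, key, value string) *element {
	e.attrs[key] = value
	return e
}

// Named helpers are thin wrappers over attr.
func dir(e *element, direction string) *element { return attr(e, "dir", direction) }
func lang(e *element, locale string) *element   { return attr(e, "lang", locale) }
func tabIndex(e *element, i int) *element       { return attr(e, "tabindex", strconv.Itoa(i)) }

// ariaDescribedBy joins IDs with spaces, the separator ARIA uses for ID lists.
func ariaDescribedBy(e *element, ids ...string) *element {
	return attr(e, "aria-describedby", strings.Join(ids, " "))
}

// openTag renders the start tag with attribute keys in sorted order.
// Escaping is omitted here to keep the focus on the helpers.
func openTag(e *element) string {
	keys := make([]string, 0, len(e.attrs))
	for k := range e.attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString("<" + e.tag)
	for _, k := range keys {
		b.WriteString(" " + k + `="` + e.attrs[k] + `"`)
	}
	b.WriteString(">")
	return b.String()
}

func main() {
	p := tabIndex(ariaDescribedBy(lang(dir(el("p"), "rtl"), "ar"), "hint", "err"), 0)
	fmt.Println(openTag(p)) // <p aria-describedby="hint err" dir="rtl" lang="ar" tabindex="0">
}
```
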
### Safety Guarantees
- **XSS prevention**: `Text()` nodes always HTML-escape their output via `html.EscapeString()`. User-supplied strings passed through `Text()` cannot inject HTML.
- **Attribute escaping**: Attribute values are escaped with `html.EscapeString()`, handling `&`, `<`, `>`, `"`, and `'`.
- **Deterministic output**: Attribute keys on `El` nodes are sorted alphabetically before rendering, producing identical output regardless of insertion order.
- **Void elements**: A lookup table of 13 void elements (`area`, `base`, `br`, `col`, `embed`, `hr`, `img`, `input`, `link`, `meta`, `source`, `track`, `wbr`) ensures these never emit a closing tag.
- **Deny-by-default entitlements**: `Entitled` returns an empty string when the context is nil, when no entitlement function is set, or when the function returns false. Content is absent from the DOM, not merely hidden.
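
The escaping, ordering and void-element rules above can be seen together in a small standalone renderer. `renderElement` is a hypothetical helper written for illustration, not `El`'s actual implementation, and it takes children as pre-rendered strings.

```go
package main

import (
	"fmt"
	"html"
	"sort"
	"strings"
)

// voidElements mirrors the 13-entry table described above.
var voidElements = map[string]bool{
	"area": true, "base": true, "br": true, "col": true, "embed": true,
	"hr": true, "img": true, "input": true, "link": true, "meta": true,
	"source": true, "track": true, "wbr": true,
}

// renderElement writes attributes in sorted key order, escapes their values,
// and emits neither children nor a closing tag for void elements.
func renderElement(tag string, attrs map[string]string, children ...string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output regardless of insertion order

	var b strings.Builder
	b.WriteString("<" + tag)
	for _, k := range keys {
		b.WriteString(" " + k + `="` + html.EscapeString(attrs[k]) + `"`)
	}
	b.WriteString(">")
	if voidElements[tag] {
		return b.String() // void elements never get a closing tag
	}
	for _, c := range children {
		b.WriteString(c)
	}
	b.WriteString("</" + tag + ">")
	return b.String()
}

func main() {
	// Escaping user input before it becomes a child is what Text() guarantees.
	text := html.EscapeString(`<script>alert("x")</script>`)
	fmt.Println(renderElement("p", map[string]string{"title": `a "quote"`, "class": "note"}, text))
	fmt.Println(renderElement("img", map[string]string{"src": "a.png", "alt": "A & B"})) // <img alt="A &amp; B" src="a.png">
}
```

Note that Go's `html.EscapeString` writes `"` as `&#34;` and `'` as `&#39;` rather than as named entities.
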
## Rendering Context
The `Context` struct carries per-request state through the node tree during rendering:
```go
type Context struct {
	Identity     string                    // e.g. user ID or session identifier
	Locale       string                    // BCP 47 locale string
	Entitlements func(feature string) bool // feature gate callback
	Data         map[string]any            // arbitrary per-request data
	service      *i18n.Service             // unexported; set via constructor
}
```
Two constructors are provided:
- `NewContext()` creates a context with sensible defaults and an empty `Data` map.
- `NewContextWithService(svc)` creates a context backed by a specific `i18n.Service` instance.

The `service` field is intentionally unexported so that callers cannot change the service inconsistently after construction. When it is nil, `Text` nodes fall back to the global `i18n.T()` default.
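
The deny-by-default rule for `Entitled` depends only on the context's `Entitlements` callback. A sketch of that check follows, using a trimmed, hypothetical `Context` and an `entitled` helper that returns a string rather than a `Node`.

```go
package main

import "fmt"

// Context is a trimmed stand-in holding only the fields this sketch needs.
type Context struct {
	Identity     string
	Entitlements func(feature string) bool
}

// entitled mirrors the deny-by-default rule: a nil context, a nil callback,
// or a false answer all yield "".
func entitled(ctx *Context, feature, content string) string {
	if ctx == nil || ctx.Entitlements == nil || !ctx.Entitlements(feature) {
		return "" // absent from the output, not merely hidden
	}
	return content
}

func main() {
	pro := &Context{
		Identity:     "user-42",
		Entitlements: func(feature string) bool { return feature == "reports" },
	}
	fmt.Printf("%q\n", entitled(pro, "reports", "<section>Reports</section>"))        // "<section>Reports</section>"
	fmt.Printf("%q\n", entitled(pro, "billing", "<section>Billing</section>"))        // ""
	fmt.Printf("%q\n", entitled(&Context{}, "reports", "<section>Reports</section>")) // ""
}
```
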
## HLCRF Layout
The `Layout` type is a compositor for five named slots:
| Slot Letter | Semantic Element | ARIA Role | Accessor |
|-------------|-----------------|-----------|----------|
| H | `<header>` | `banner` | `layout.H(...)` |
| L | `<aside>` | `complementary` | `layout.L(...)` |
| C | `<main>` | `main` | `layout.C(...)` |
| R | `<aside>` | `complementary` | `layout.R(...)` |
| F | `<footer>` | `contentinfo` | `layout.F(...)` |
### Variant String
The variant string passed to `NewLayout()` determines which slots render and in which order:
```go
NewLayout("HLCRF") // all five slots
NewLayout("HCF")   // header, content, footer (no sidebars)
NewLayout("C")     // content only
NewLayout("LC")    // left sidebar and content
```
Slot letters not present in the variant string are ignored, even if nodes have been appended to those slots. Unrecognised characters (lowercase, digits, special characters) are silently skipped during rendering, but `ValidateLayoutVariant()` and `Layout.VariantError()` report the invalid input.
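
The difference between lenient rendering and strict validation can be sketched as two passes over the variant string. `renderOrder` and `validateVariant` are hypothetical analogues; the real `ValidateLayoutVariant()` signature and error text may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// validSlots lists the five HLCRF slot letters.
const validSlots = "HLCRF"

// renderOrder keeps recognised slot letters in variant order and silently
// drops everything else, as rendering does.
func renderOrder(variant string) string {
	var b strings.Builder
	for _, r := range variant {
		if strings.ContainsRune(validSlots, r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// validateVariant reports the first unrecognised character instead of skipping it.
func validateVariant(variant string) error {
	for i, r := range variant {
		if !strings.ContainsRune(validSlots, r) {
			return fmt.Errorf("invalid slot %q at byte %d", r, i)
		}
	}
	return nil
}

func main() {
	fmt.Println(renderOrder("HcF1"))     // HF
	fmt.Println(validateVariant("HcF1")) // invalid slot 'c' at byte 1
}
```
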
### Deterministic Block IDs
Each rendered slot receives a `data-block` attribute encoding its position in the layout tree. At the root level, IDs follow the pattern `{slot}-0`:
```html
<header role="banner" data-block="H-0">...</header>
<main role="main" data-block="C-0">...</main>
<footer role="contentinfo" data-block="F-0">...</footer>
```
Block IDs are constructed by simple string concatenation (no `fmt.Sprintf`) to keep the `fmt` package out of the WASM import graph.
### Nested Layouts
`Layout` implements `Node`, so a layout can be placed inside any slot of another layout. At render time, a nested layout is cloned and its internal `path` field is set to the block ID of the slot that contains it; that path then prefixes every block ID the nested layout emits. This produces hierarchical paths:
```go
inner := html.NewLayout("HCF").
	H(html.Raw("nav")).
	C(html.Raw("body")).
	F(html.Raw("links"))

outer := html.NewLayout("HLCRF").
	H(html.Raw("top")).
	L(inner). // inner layout nested in the Left slot
	C(html.Raw("main")).
	F(html.Raw("foot"))
```
The inner layout's slots render with prefixed block IDs: `L-0-H-0`, `L-0-C-0`, `L-0-F-0`. At 10 levels of nesting, the deepest block ID becomes `C-0-C-0-C-0-C-0-C-0-C-0-C-0-C-0-C-0-C-0` (tested in `edge_test.go`).

Because nested layouts are cloned at render time, the original layout is never mutated, which makes it safe to render the same layout concurrently.
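
Block IDs for nested layouts can be produced by plain concatenation of the parent path and the slot's own `{slot}-0` segment. A minimal sketch follows, assuming the index segment is always `0` as in the examples above; `blockID` is a hypothetical helper.

```go
package main

import "fmt"

// blockID joins a parent path and a slot letter with plain concatenation
// (no fmt.Sprintf). A root-level slot has an empty parent.
func blockID(parent string, slot byte) string {
	id := string(slot) + "-0"
	if parent == "" {
		return id
	}
	return parent + "-" + id
}

func main() {
	// An HCF layout nested in the outer layout's L slot.
	parent := blockID("", 'L')
	for _, s := range []byte("HCF") {
		fmt.Println(blockID(parent, s)) // L-0-H-0, then L-0-C-0, then L-0-F-0
	}
}
```
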
### Fluent Builder
All slot methods return `*Layout` for chaining. Multiple nodes can be appended to the same slot across multiple calls:
```go
html.NewLayout("HCF").
	H(html.El("h1", html.Text("page.title"))).
	C(html.El("p", html.Text("intro"))).
	C(html.El("p", html.Text("body"))). // appends to the same C slot
	F(html.El("small", html.Text("footer")))
```
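
A miniature of the builder can make the chaining and slot-ordering rules concrete. In this hypothetical sketch, slots hold pre-rendered strings, only the `H`, `C` and `F` methods are defined, and empty slots are skipped (an assumption; the real `Layout` may emit them). The element, role and `data-block` values follow the tables and examples above.

```go
package main

import (
	"fmt"
	"strings"
)

// slotMeta maps each slot letter to its element and ARIA role.
var slotMeta = map[rune][2]string{
	'H': {"header", "banner"},
	'L': {"aside", "complementary"},
	'C': {"main", "main"},
	'R': {"aside", "complementary"},
	'F': {"footer", "contentinfo"},
}

// layout is a hypothetical miniature of Layout; slots hold pre-rendered HTML.
type layout struct {
	variant string
	slots   map[rune][]string
}

func newLayout(variant string) *layout {
	return &layout{variant: variant, slots: map[rune][]string{}}
}

// add appends to a slot and returns the receiver so calls can be chained.
func (l *layout) add(slot rune, html string) *layout {
	l.slots[slot] = append(l.slots[slot], html)
	return l
}

func (l *layout) H(html string) *layout { return l.add('H', html) }
func (l *layout) C(html string) *layout { return l.add('C', html) }
func (l *layout) F(html string) *layout { return l.add('F', html) }

// render follows the variant string: slots missing from it are ignored,
// unknown characters are skipped, and (in this sketch) empty slots are omitted.
func (l *layout) render() string {
	var b strings.Builder
	for _, r := range l.variant {
		meta, ok := slotMeta[r]
		if !ok || len(l.slots[r]) == 0 {
			continue
		}
		tag, role := meta[0], meta[1]
		b.WriteString("<" + tag + ` role="` + role + `" data-block="` + string(r) + `-0">`)
		b.WriteString(strings.Join(l.slots[r], ""))
		b.WriteString("</" + tag + ">")
	}
	return b.String()
}

func main() {
	page := newLayout("HCF").
		H("<h1>Title</h1>").
		C("<p>intro</p>").
		C("<p>body</p>"). // appends to the same C slot
		F("<small>footer</small>")
	fmt.Println(page.render())
}
```
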
### Block ID Parsing
`ParseBlockID()` in `path.go` extracts the slot letter sequence from a `data-block` attribute value:
```go
ParseBlockID("L-0-C-0")     // returns ['L', 'C']
ParseBlockID("C-0-C-0-C-0") // returns ['C', 'C', 'C']
ParseBlockID("H-0")         // returns ['H']
ParseBlockID("")            // returns nil
```
This enables server-side or client-side code to locate a specific block in the rendered tree by its structural path.
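
One possible implementation of the parsing step is sketched below, assuming well-formed `{slot}-{index}` pairs and a `[]byte` result. The actual return type of `ParseBlockID()` is not stated here, and `parseBlockID` is a hypothetical stand-in.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBlockID returns the slot letters of a data-block value, or nil for
// an empty string. Even-indexed segments are slots; odd ones are indexes.
func parseBlockID(id string) []byte {
	if id == "" {
		return nil
	}
	parts := strings.Split(id, "-")
	slots := make([]byte, 0, len(parts)/2+1)
	for i := 0; i < len(parts); i += 2 {
		if len(parts[i]) == 1 {
			slots = append(slots, parts[i][0])
		}
	}
	return slots
}

func main() {
	fmt.Println(string(parseBlockID("L-0-C-0")))     // LC
	fmt.Println(string(parseBlockID("C-0-C-0-C-0"))) // CCC
	fmt.Println(parseBlockID("") == nil)             // true
}
```
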
## Responsive Compositor
`Responsive` wraps multiple named `Layout` variants for breakpoint-aware rendering:
```go
html.NewResponsive().
	Variant("desktop", html.NewLayout("HLCRF").
		H(html.Raw("header")).L(html.Raw("nav")).C(html.Raw("main")).
		R(html.Raw("aside")).F(html.Raw("footer"))).
	Variant("tablet", html.NewLayout("HCF").
		H(html.Raw("header")).C(html.Raw("main")).F(html.Raw("footer"))).
	Variant("mobile", html.NewLayout("C").
		C(html.Raw("main")))
```
Each variant renders inside a `<div data-variant="name">` container. Variants render in insertion order. CSS media queries or JavaScript can target these containers for show/hide logic.

`Responsive` implements `Node`, so it can be passed to `Render()` or `Imprint()`. The `Variant()` method accepts `*Layout` specifically, not arbitrary `Node` values.

Each variant maintains independent block ID namespaces -- nesting a layout inside a responsive variant does not conflict with the same layout structure in another variant.
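
The container wrapping can be sketched with an ordered slice, which is one way to guarantee insertion order (ranging over a Go map would not). `renderResponsive` and `variant` are hypothetical. Both variants in the example contain a `C-0` block without conflict because each sits in its own container.

```go
package main

import (
	"fmt"
	"strings"
)

// variant pairs a name with already-rendered layout HTML.
type variant struct {
	name string
	html string
}

// renderResponsive wraps each variant in a <div data-variant="name">
// container, in slice (insertion) order. Names are assumed to be trusted,
// attribute-safe identifiers.
func renderResponsive(vs []variant) string {
	var b strings.Builder
	for _, v := range vs {
		b.WriteString(`<div data-variant="` + v.name + `">`)
		b.WriteString(v.html)
		b.WriteString("</div>")
	}
	return b.String()
}

func main() {
	fmt.Println(renderResponsive([]variant{
		{"desktop", `<main role="main" data-block="C-0">main</main>`},
		{"mobile", `<main role="main" data-block="C-0">main</main>`},
	}))
}
```
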
Two helpers support CSS targeting:
```go
VariantSelector("desktop")       // [data-variant="desktop"]
ScopeVariant("desktop", ".nav") // [data-variant="desktop"] .nav
```
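
Both helpers are plain string builders, so their output can be dropped straight into a stylesheet. The sketch below uses hypothetical lowercase equivalents that reproduce the outputs shown above, plus an example media query whose breakpoint is arbitrary.

```go
package main

import "fmt"

// variantSelector builds an attribute selector for one responsive variant.
func variantSelector(name string) string {
	return `[data-variant="` + name + `"]`
}

// scopeVariant prefixes a selector with its variant container using the
// descendant combinator.
func scopeVariant(name, selector string) string {
	return variantSelector(name) + " " + selector
}

func main() {
	css := "@media (max-width: 767px) {\n" +
		"  " + variantSelector("desktop") + " { display: none; }\n" +
		"}\n" +
		scopeVariant("desktop", ".nav") + " { position: sticky; }\n"
	fmt.Print(css)
}
```
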
## Grammar Pipeline (Server-Side Only)
The grammar pipeline is excluded from WASM builds via `//go:build !js` on `pipeline.go`. It bridges the rendering layer to the semantic analysis layer.
### StripTags
```go
func StripTags(html string) string
```
Converts rendered HTML to plain text. Tag boundaries are collapsed into single spaces; the result is trimmed. The implementation is a single-pass rune scanner with no regular expressions and no allocations beyond the output `strings.Builder`. It does not handle `<script>` or `<style>` content because `go-html` never generates those elements.
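
A single-pass scanner of this kind can be sketched as follows. `stripTags` is a hypothetical stand-in, not the library function: besides collapsing tag boundaries, it also folds runs of whitespace in text, which may differ from the real `StripTags()`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripTags drops everything between '<' and '>' and turns each tag boundary
// (and each whitespace run) into at most one space, then trims the result.
func stripTags(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	inTag := false
	lastSpace := true // true at the start so no leading separator is written
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case inTag:
			if r == '>' {
				inTag = false
				if !lastSpace {
					b.WriteByte(' ')
					lastSpace = true
				}
			}
		case unicode.IsSpace(r):
			if !lastSpace {
				b.WriteByte(' ')
				lastSpace = true
			}
		default:
			b.WriteRune(r)
			lastSpace = false
		}
	}
	return strings.TrimSpace(b.String())
}

func main() {
	fmt.Println(stripTags(`<main role="main"><h1>Title</h1><p>Body text</p></main>`)) // Title Body text
}
```
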
### Imprint
```go
func Imprint(node Node, ctx *Context) reversal.GrammarImprint
```
Runs the full render-to-analysis pipeline:
1. Renders the node tree to HTML via `node.Render(ctx)`.
2. Strips HTML tags via `StripTags()` to extract plain text.
3. Tokenises the text via `go-i18n/reversal.NewTokeniser().Tokenise()`.
4. Wraps tokens in a `reversal.GrammarImprint` for structural analysis.

The resulting `GrammarImprint` exposes `TokenCount`, `UniqueVerbs`, and a `Similar()` method for pairwise semantic similarity scoring.

A nil context is handled gracefully: `Imprint` creates a default context internally.
### CompareVariants
```go
func CompareVariants(r *Responsive, ctx *Context) map[string]float64
```
Runs `Imprint` independently on each named layout variant in a `Responsive` and returns pairwise similarity scores. Keys are formatted as `"name1:name2"`.
This enables detection of semantically divergent responsive variants -- for example, a mobile layout that strips critical information present in the desktop variant. Same-content variants with different layout structures (e.g. `HLCRF` vs `HCF`) score above 0.8 similarity.
A single-variant `Responsive` produces an empty score map (no pairs to compare).
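
The pairwise loop behind `CompareVariants` can be sketched independently of the grammar layer. Here `jaccard` (token-set overlap) stands in for `GrammarImprint.Similar()`, whose algorithm is not reproduced; the key ordering (`name1` being the earlier variant) is an assumption, and `comparePairs` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// jaccard is a stand-in similarity: |A∩B| / |A∪B| over whitespace tokens.
func jaccard(a, b string) float64 {
	set := func(s string) map[string]bool {
		m := map[string]bool{}
		for _, t := range strings.Fields(s) {
			m[t] = true
		}
		return m
	}
	sa, sb := set(a), set(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 1
	}
	inter := 0
	for t := range sa {
		if sb[t] {
			inter++
		}
	}
	return float64(inter) / float64(len(sa)+len(sb)-inter)
}

// comparePairs scores every unordered pair once, keyed "name1:name2".
// A single name yields an empty map because there are no pairs.
func comparePairs(names []string, text map[string]string) map[string]float64 {
	scores := make(map[string]float64)
	for i := 0; i < len(names); i++ {
		for j := i + 1; j < len(names); j++ {
			scores[names[i]+":"+names[j]] = jaccard(text[names[i]], text[names[j]])
		}
	}
	return scores
}

func main() {
	names := []string{"desktop", "tablet", "mobile"}
	text := map[string]string{
		"desktop": "header nav main aside footer",
		"tablet":  "header main footer",
		"mobile":  "main",
	}
	fmt.Println(comparePairs(names, text))
}
```
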
## WASM Module
The WASM entry point at `cmd/wasm/main.go` is compiled with `GOOS=js GOARCH=wasm` and exposes a single JavaScript function:
```js
gohtml.renderToString(variant, locale, slots)
```
**Parameters:**
- `variant` (string): HLCRF variant string, e.g. `"HCF"`.
- `locale` (string): BCP 47 locale string for i18n, e.g. `"en-GB"`.
- `slots` (object): Optional keys `H`, `L`, `C`, `R`, `F` containing HTML strings.

Slot content is injected via `Raw()`. The caller is responsible for sanitisation -- the WASM module is a rendering engine for trusted content produced server-side or by the application's own templates.
### Size Budget
The WASM binary has a size gate enforced by `cmd/wasm/size_test.go`:
| Metric | Limit | Current |
|--------|-------|---------|
| Raw binary | 3.5 MB | ~2.90 MB |
| Gzip compressed | 1 MB | ~842 KB |

The test builds the WASM binary as a subprocess and is skipped under `go test -short`. The Makefile `wasm` target performs the same build with size checking.
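
The gate reduces to two comparisons: raw length and gzip-compressed length. Below is a hypothetical in-memory version using `compress/gzip`. The limits come from the table and are interpreted as decimal megabytes, and because the compression level used by `size_test.go` is not stated, `gzip.BestCompression` is an assumption.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// Limits from the size-budget table (decimal megabytes assumed).
const (
	rawLimit  = 3_500_000 // 3.5 MB
	gzipLimit = 1_000_000 // 1 MB
)

// checkSize fails if the raw bytes or their gzip-compressed form exceed the limits.
func checkSize(bin []byte, rawMax, gzMax int) error {
	if len(bin) > rawMax {
		return fmt.Errorf("raw size %d exceeds %d bytes", len(bin), rawMax)
	}
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return err
	}
	if _, err := zw.Write(bin); err != nil {
		return err
	}
	if err := zw.Close(); err != nil { // Close flushes the gzip footer
		return err
	}
	if buf.Len() > gzMax {
		return fmt.Errorf("gzip size %d exceeds %d bytes", buf.Len(), gzMax)
	}
	return nil
}

func main() {
	// A real check would read the built binary from disk instead.
	fake := bytes.Repeat([]byte("wasm"), 1000)
	fmt.Println(checkSize(fake, rawLimit, gzipLimit)) // <nil>
}
```
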
### Server/Client Split
The binary split is enforced by Go build tags:
| File | Build Tag | Reason for WASM Exclusion |
|------|-----------|--------------------------|
| `pipeline.go` | `!js` | Imports `go-i18n/reversal` |
| `cmd/wasm/register.go` | `!js` | Imports `encoding/json` and `text/template` |

The WASM binary includes only: node types, layout, responsive, context, render, path, and `go-i18n` core translation. No codegen, no pipeline, no JSON, no templates, no `fmt`.
## Codegen CLI
`cmd/codegen/main.go` generates Web Component JavaScript bundles from HLCRF slot assignments at build time:
```bash
echo '{"H":"nav-bar","C":"main-content","F":"page-footer"}' | go run ./cmd/codegen/ > components.js
```
The `codegen` package (`codegen/codegen.go`) generates ES2022 class definitions with closed Shadow DOM. For each custom element tag, it produces:
1. A class extending `HTMLElement` with a private `#shadow` field.
2. `constructor()` attaching a closed shadow root (`mode: "closed"`).
3. `connectedCallback()` dispatching a `wc-ready` custom event with the tag name and slot.
4. `render(html)` method that sets shadow content from a `<template>` clone.
5. A `customElements.define()` registration call.

Tag names must contain a hyphen (Web Components specification requirement). `TagToClassName()` converts kebab-case to PascalCase: `nav-bar` becomes `NavBar`, `my-super-widget` becomes `MySuperWidget`.
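
A sketch of the kebab-case to PascalCase conversion with the hyphen check folded in. `tagToClassName` is hypothetical: whether the real `TagToClassName()` validates input or returns an error is not stated, and this version handles ASCII names only.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// tagToClassName rejects names without a hyphen and upper-cases the first
// letter of each hyphen-separated part.
func tagToClassName(tag string) (string, error) {
	if !strings.Contains(tag, "-") {
		return "", errors.New("custom element name must contain a hyphen: " + tag)
	}
	var b strings.Builder
	for _, part := range strings.Split(tag, "-") {
		if part == "" {
			continue
		}
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return b.String(), nil
}

func main() {
	fmt.Println(tagToClassName("my-super-widget")) // MySuperWidget <nil>
}
```
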
`GenerateBundle()` deduplicates tags -- if the same tag is assigned to multiple slots, only one class definition is emitted.
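
Deduplication and template-driven emission can be sketched together with `text/template`. Everything here is hypothetical: the template is an abbreviated class following the five parts listed above, the `wc-ready` event `detail` shape is assumed, and slots are visited in `HLCRF` order so the output is deterministic.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// classTmpl is an abbreviated, hypothetical class template.
var classTmpl = template.Must(template.New("wc").Parse(`class {{.Class}} extends HTMLElement {
  #shadow;
  constructor() {
    super();
    this.#shadow = this.attachShadow({ mode: "closed" });
  }
  connectedCallback() {
    this.dispatchEvent(new CustomEvent("wc-ready", { detail: { tag: "{{.Tag}}", slot: "{{.Slot}}" } }));
  }
  render(html) {
    const tpl = document.createElement("template");
    tpl.innerHTML = html;
    this.#shadow.replaceChildren(tpl.content.cloneNode(true));
  }
}
customElements.define("{{.Tag}}", {{.Class}});
`))

// pascal converts kebab-case to PascalCase (ASCII-only sketch).
func pascal(tag string) string {
	var b strings.Builder
	for _, part := range strings.Split(tag, "-") {
		if part != "" {
			b.WriteString(strings.ToUpper(part[:1]) + part[1:])
		}
	}
	return b.String()
}

// bundle emits one class per distinct tag; a tag seen in an earlier slot is skipped.
func bundle(slots map[string]string) (string, error) {
	var b strings.Builder
	seen := map[string]bool{}
	for _, slot := range []string{"H", "L", "C", "R", "F"} {
		tag, ok := slots[slot]
		if !ok || seen[tag] {
			continue
		}
		seen[tag] = true
		data := map[string]string{"Class": pascal(tag), "Tag": tag, "Slot": slot}
		if err := classTmpl.Execute(&b, data); err != nil {
			return "", err
		}
	}
	return b.String(), nil
}

func main() {
	js, err := bundle(map[string]string{"H": "nav-bar", "C": "main-content", "F": "nav-bar"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(js) // two classes: NavBar and MainContent
}
```
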
The codegen CLI uses `encoding/json` and `text/template`, which are excluded from the WASM build. Consumers generate the JS bundle at build time and serve it as a static asset.
## Data Flow Summary
```
                       Server-Side
                  +-------------------+
                  |                   |
Node tree ------> | Render(ctx)       |-----> HTML string
                  |                   |
                  | StripTags()       |-----> plain text
                  |                   |
                  | Imprint()         |-----> GrammarImprint
                  |                   |         .TokenCount
                  | CompareVariants() |         .UniqueVerbs
                  |                   |         .Similar()
                  +-------------------+

                       WASM Client
                  +-------------------+
                  |                   |
JS call --------> | renderToString()  |-----> HTML string
(variant, locale, |                   |
 slots object)    +-------------------+

                       Build Time
                  +-------------------+
                  |                   |
JSON slot map --> | cmd/codegen/      |-----> Web Component JS
(stdin)           |                   |       (stdout)
                  +-------------------+
```