From 42024ef4760cfd359c6bd7c6286db0243a01e141 Mon Sep 17 00:00:00 2001
From: Snider
Date: Wed, 11 Mar 2026 13:02:40 +0000
Subject: [PATCH] docs: add human-friendly documentation

Co-Authored-By: Claude Opus 4.6
---
 docs/architecture.md | 252 +++++++++++++++++++++++++++++++++++++++++++
 docs/development.md  | 160 +++++++++++++++++++++++++++
 docs/index.md        | 122 +++++++++++++++++++++
 3 files changed, 534 insertions(+)
 create mode 100644 docs/architecture.md
 create mode 100644 docs/development.md
 create mode 100644 docs/index.md

diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..a36207a
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,252 @@
+---
+title: Architecture
+description: Internal design of go-infra -- shared HTTP client, provider clients, configuration model, and CLI command structure.
+---
+
+# Architecture
+
+go-infra is organised into four layers: a shared HTTP client, provider-specific API clients, a declarative configuration parser, and CLI commands that tie them together.
+
+```
+cmd/prod/        CLI commands (setup, status, dns, lb, ssh)
+cmd/monitor/     CLI commands (security finding aggregation)
+     |
+     v
+config.go        YAML config parser (infra.yaml)
+hetzner.go       Hetzner Cloud + Robot API clients
+cloudns.go       CloudNS DNS API client
+     |
+     v
+client.go        Shared APIClient (retry, backoff, rate-limit)
+     |
+     v
+net/http         Go standard library
+```
+
+## Shared HTTP Client (`client.go`)
+
+All provider-specific clients delegate HTTP requests to `APIClient`, which provides:
+
+- **Exponential backoff with jitter** -- retries on 5xx errors and network failures
+- **Rate-limit compliance** -- honours `Retry-After` headers on 429 responses
+- **Configurable authentication** -- each provider injects its own auth function
+- **Context-aware cancellation** -- all waits respect `context.Context` deadlines
+
+### Key Types
+
+```go
+type APIClient struct {
+	client       *http.Client
+	retry        RetryConfig
+	authFn       func(req *http.Request)
+	prefix       string // error message prefix, e.g. "hcloud API"
+	mu           sync.Mutex
+	blockedUntil time.Time // rate-limit backoff window
+}
+
+type RetryConfig struct {
+	MaxRetries     int           // 0 = no retries
+	InitialBackoff time.Duration // delay before first retry
+	MaxBackoff     time.Duration // upper bound on backoff duration
+}
+```
+
+### Configuration via Options
+
+`APIClient` uses the functional options pattern:
+
+```go
+client := infra.NewAPIClient(
+	infra.WithHTTPClient(customHTTPClient),
+	infra.WithAuth(func(req *http.Request) {
+		req.Header.Set("Authorization", "Bearer "+token)
+	}),
+	infra.WithRetry(infra.RetryConfig{
+		MaxRetries:     5,
+		InitialBackoff: 200 * time.Millisecond,
+		MaxBackoff:     10 * time.Second,
+	}),
+	infra.WithPrefix("my-api"),
+)
+```
+
+Default configuration (from `DefaultRetryConfig()`): 3 retries, 100ms initial backoff, 5s maximum backoff.
+
+### Request Flow
+
+The `Do(req, result)` and `DoRaw(req)` methods follow this flow, repeating steps 1-4 for each attempt:
+
+1. **Rate-limit check** -- if a previous 429 response set `blockedUntil`, wait until that time passes (or the context is cancelled).
+2. **Apply authentication** -- call `authFn(req)` to inject credentials.
+3. **Execute request** -- send via the underlying `http.Client`.
+4. **Handle response**:
+   - **429 Too Many Requests** -- parse the `Retry-After` header, set `blockedUntil`, and retry.
+   - **5xx Server Error** -- retryable; sleep with exponential backoff + jitter.
+   - **4xx Client Error** (except 429) -- not retried; return the error immediately.
+   - **2xx Success** -- if `result` is non-nil, JSON-decode the body into it.
+5. If all attempts are exhausted, return the last error.
+
+The backoff calculation uses `base = initialBackoff * 2^attempt`, capped at `maxBackoff`, with jitter applied as a random factor between 50% and 100% of the calculated value.
+
+### Do vs DoRaw
+
+- `Do(req, result)` -- decodes the response body as JSON into `result`. Pass `nil` when the response body is not needed (e.g. DELETE); errors are still returned.
+- `DoRaw(req)` -- returns the raw `[]byte` response body. Used by CloudNS, whose responses need manual parsing due to inconsistent JSON shapes.
+
+## Hetzner Clients (`hetzner.go`)
+
+Two separate clients cover Hetzner's two distinct APIs.
+
+### HCloudClient (Hetzner Cloud API)
+
+Manages cloud servers, load balancers, and snapshots via `https://api.hetzner.cloud/v1`. Uses bearer token authentication.
+
+```go
+hc := infra.NewHCloudClient("your-token")
+```
+
+**Operations:**
+
+| Method | Description |
+|--------|-------------|
+| `ListServers(ctx)` | List all cloud servers |
+| `ListLoadBalancers(ctx)` | List all load balancers |
+| `GetLoadBalancer(ctx, id)` | Get a load balancer by ID |
+| `CreateLoadBalancer(ctx, req)` | Create a load balancer from a typed request struct |
+| `DeleteLoadBalancer(ctx, id)` | Delete a load balancer by ID |
+| `CreateSnapshot(ctx, serverID, description)` | Create a server snapshot |
+
+**Data model hierarchy:**
+
+```
+HCloudServer
+  +-- HCloudPublicNet --> HCloudIPv4
+  +-- []HCloudPrivateNet
+  +-- HCloudServerType (name, cores, memory, disk)
+  +-- HCloudDatacenter
+
+HCloudLoadBalancer
+  +-- HCloudLBPublicNet --> HCloudIPv4
+  +-- HCloudLBAlgorithm
+  +-- []HCloudLBService
+  |     +-- HCloudLBHTTP (optional)
+  |     +-- HCloudLBHealthCheck --> HCloudLBHCHTTP (optional)
+  +-- []HCloudLBTarget
+        +-- HCloudLBTargetIP (optional)
+        +-- HCloudLBTargetServer (optional)
+        +-- []HCloudLBHealthStatus
+```
+
+### HRobotClient (Hetzner Robot API)
+
+Manages dedicated (bare-metal) servers via `https://robot-ws.your-server.de`. Uses HTTP Basic authentication.
+
+```go
+hr := infra.NewHRobotClient("user", "password")
+```
+
+**Operations:**
+
+| Method | Description |
+|--------|-------------|
+| `ListServers(ctx)` | List all dedicated servers |
+| `GetServer(ctx, ip)` | Get a server by IP address |
+
+The Robot API wraps each server object in a `{"server": {...}}` envelope. `HRobotClient` unwraps this automatically.
+
+## CloudNS Client (`cloudns.go`)
+
+Manages DNS zones and records via `https://api.cloudns.net`. Uses query-parameter authentication (`auth-id` + `auth-password`).
+
+```go
+dns := infra.NewCloudNSClient("12345", "password")
+```
+
+**Operations:**
+
+| Method | Description |
+|--------|-------------|
+| `ListZones(ctx)` | List all DNS zones |
+| `ListRecords(ctx, domain)` | List all records in a zone (returns `map[id]CloudNSRecord`) |
+| `CreateRecord(ctx, domain, host, type, value, ttl)` | Create a record; returns the new record ID |
+| `UpdateRecord(ctx, domain, id, host, type, value, ttl)` | Update an existing record |
+| `DeleteRecord(ctx, domain, id)` | Delete a record by ID |
+| `EnsureRecord(ctx, domain, host, type, value, ttl)` | Idempotent create-or-update; returns whether a change was made |
+| `SetACMEChallenge(ctx, domain, value)` | Create a `_acme-challenge` TXT record with 60s TTL |
+| `ClearACMEChallenge(ctx, domain)` | Delete all `_acme-challenge` TXT records in a zone |
+
+**CloudNS quirks handled internally:**
+
+- Empty zone lists come back as `{}` (an object) instead of `[]` (an array). `ListZones` handles this gracefully.
+- All mutations use POST with query parameters (not request bodies).
+- Response status is checked via a `"status": "Success"` field in the JSON body, not HTTP status codes alone.
+
+## Configuration Model (`config.go`)
+
+The `Config` struct represents the full infrastructure topology, parsed from an `infra.yaml` file. It covers:
+
+```
+Config
+  +-- Hosts (map[string]*Host)             Servers with SSH details, role, and services
+  +-- LoadBalancer                         Hetzner managed LB (name, type, backends, listeners, health)
+  +-- Network                              Private network CIDR
+  +-- DNS                                  Provider config + zone records
+  +-- SSL                                  Wildcard certificate settings
+  +-- Database                             Galera/MariaDB cluster nodes + backup config
+  +-- Cache                                Redis/Dragonfly cluster nodes
+  +-- Containers (map[string]*Container)   Container deployments (image, replicas, depends_on)
+  +-- S3                                   Object storage endpoint + buckets
+  +-- CDN                                  CDN provider and zones
+  +-- CICD                                 CI/CD provider, runner, registry
+  +-- Monitoring                           Health endpoints and alert thresholds
+  +-- Backups                              Daily and weekly backup jobs
+```
+
+### Loading
+
+Two functions load configuration:
+
+- `Load(path)` -- reads and parses a specific file. Expands `~` in SSH key paths and defaults the SSH port to 22.
+- `Discover(startDir)` -- walks up from `startDir` looking for `infra.yaml`, then calls `Load`. Returns the config, the path found, and any error.
+
+### Host Queries
+
+```go
+// Get all hosts with a specific role
+appServers := cfg.HostsByRole("app")
+
+// Shorthand for HostsByRole("app")
+appServers = cfg.AppServers()
+```
+
+## CLI Commands
+
+### `core prod` (`cmd/prod/`)
+
+The production command group reads `infra.yaml` (auto-discovered or specified via `--config`) and provides:
+
+| Subcommand | Description |
+|------------|-------------|
+| `status` | Parallel SSH health check of all hosts. Checks Docker, Galera cluster size, Redis, Traefik, Coolify, and the Forgejo runner. Also queries Hetzner Cloud for load balancer health if `HCLOUD_TOKEN` is set. |
+| `setup` | Runs a three-step foundation pipeline: **discover** (enumerate Hetzner Cloud + Robot servers), **lb** (create load balancer from config), **dns** (ensure DNS records via CloudNS). Supports `--dry-run` and `--step` for partial runs. |
+| `dns list [zone]` | List DNS records for a zone (defaults to `host.uk.com`). |
+| `dns set ` | Idempotent create-or-update of a DNS record. |
+| `lb status` | Display load balancer details and per-target health status. |
+| `lb create` | Create the load balancer defined in `infra.yaml`. |
+| `ssh ` | Look up a host by name in `infra.yaml` and `exec` into an SSH session. |
+
+The `status` command uses `go-ansible`'s `SSHClient` to connect to each host in parallel, then runs shell commands to probe service state (Docker containers, MariaDB cluster, Redis ping, etc.).
+
+### `core monitor` (`cmd/monitor/`)
+
+Aggregates security findings from GitHub's Security tab using the `gh` CLI:
+
+- **Code scanning alerts** -- from Semgrep, Trivy, Gitleaks, CodeQL, etc.
+- **Dependabot alerts** -- dependency vulnerability alerts.
+- **Secret scanning alerts** -- exposed secrets/credentials (always classified as critical).
+
+Findings are normalised to a common `Finding` struct, sorted by severity (critical first), and output as either a formatted table or JSON.
+
+## Licence
+
+EUPL-1.2
diff --git a/docs/development.md b/docs/development.md
new file mode 100644
index 0000000..aff587f
--- /dev/null
+++ b/docs/development.md
@@ -0,0 +1,160 @@
+---
+title: Development
+description: How to build, test, and contribute to go-infra.
+---
+
+# Development
+
+## Prerequisites
+
+- **Go 1.26+**
+- **Go workspace** -- this module is part of the workspace at `~/Code/go.work`. After cloning, run `go work sync` if module resolution fails.
+- **`gh` CLI** (optional) -- required only for `core monitor` commands.
+
+## Building
+
+The library package (`infra`) produces no binary. The CLI commands in `cmd/prod/` and `cmd/monitor/` are compiled into the `core` binary via the `forge.lthn.ai/core/cli` module -- they are not standalone binaries.
+
+To verify the package compiles:
+
+```bash
+cd go-infra  # repository root
+go build ./...
+```
+
+## Running Tests
+
+```bash
+# All tests
+go test ./...
+
+# With race detector
+go test -race ./...
+
+# A specific test
+go test -run TestAPIClient_Do_Good_Success
+
+# Verbose output
+go test -v ./...
+```
+
+If the `core` CLI is available:
+
+```bash
+core go test
+core go test --run TestAPIClient_Do_Good_Success
+```
+
+### Test Organisation
+
+Tests follow the `_Good`, `_Bad`, `_Ugly` suffix convention:
+
+| Suffix | Purpose | Example |
+|--------|---------|---------|
+| `_Good` | Happy path -- expected successful behaviour | `TestAPIClient_Do_Good_Success` |
+| `_Bad` | Expected error conditions -- invalid input, auth failures, exhausted retries | `TestAPIClient_Do_Bad_ClientError` |
+| `_Ugly` | Edge cases -- context cancellation, malformed data, panics | `TestAPIClient_Do_Ugly_ContextCancelled` |
+
+### Test Approach
+
+All API client tests use `net/http/httptest.Server` to mock HTTP responses. No real API calls are made during tests. The test servers simulate:
+
+- Successful JSON responses
+- HTTP error codes (400, 401, 403, 404, 500, 502, 503)
+- Rate limiting (429 with `Retry-After` header)
+- Transient failures that succeed after retries
+- Authentication verification (bearer tokens, basic auth, query parameters)
+
+The config tests use `Discover()` to find a real `infra.yaml` in parent directories (skipped if not present) and also test error paths with nonexistent and malformed files.
+
+### Test Coverage by File
+
+| File | Tests | Coverage Focus |
+|------|-------|----------------|
+| `client_test.go` | 20 tests | Constructor defaults/options, `Do` JSON decoding, `DoRaw` raw responses, retry on 5xx, no retry on 4xx, rate-limit handling, context cancellation, `parseRetryAfter`, integration with HCloud/CloudNS clients |
+| `hetzner_test.go` | 10 tests | HCloud/HRobot constructors, `ListServers`, JSON deserialisation of servers/load balancers/Robot servers, auth header verification, error responses |
+| `cloudns_test.go` | 16 tests | Constructor, auth params, raw HTTP calls, zone/record JSON parsing, CRUD round-trips, ACME challenge helpers, `EnsureRecord` logic (already correct / needs update / needs create), edge cases (empty body, empty map) |
+| `config_test.go` | 4 tests | `Load` with real config, missing file, invalid YAML, `expandPath` with tilde/absolute/relative paths |
+
+## Code Style
+
+- **UK English** in all documentation, comments, and user-facing strings (colour, organisation, centre, serialisation).
+- **Strict typing** -- all function parameters and return values have explicit types.
+- **Error wrapping** -- use `fmt.Errorf("context: %w", err)` to preserve error chains.
+- **Formatting** -- standard `gofmt`. Run `go fmt ./...` or `core go fmt` before committing.
+
+## Adding a New Provider Client
+
+To add support for a new infrastructure provider:
+
+1. Create a new file (e.g. `vultr.go`) in the package root.
+2. Define a client struct that embeds or holds an `*APIClient`:
+
+```go
+type VultrClient struct {
+	apiKey  string
+	baseURL string
+	api     *APIClient
+}
+
+func NewVultrClient(apiKey string) *VultrClient {
+	c := &VultrClient{
+		apiKey:  apiKey,
+		baseURL: "https://api.vultr.com/v2",
+	}
+	c.api = NewAPIClient(
+		WithAuth(func(req *http.Request) {
+			req.Header.Set("Authorization", "Bearer "+c.apiKey)
+		}),
+		WithPrefix("vultr API"),
+	)
+	return c
+}
+```
+
+3. Add internal helper methods (`get`, `post`, `delete`) that delegate to `c.api.Do(req, result)`.
+4. Write tests using `httptest.NewServer` -- never call real APIs in tests.
+5. Follow the `_Good`/`_Bad`/`_Ugly` test naming convention.
+
+## Adding CLI Commands
+
+CLI commands live in subdirectories of `cmd/`. Each command package:
+
+1. Calls `cli.RegisterCommands(AddXyzCommands)` in an `init()` function (see `cmd/prod/cmd_commands.go`).
+2. Defines a root `*cli.Command` with subcommands.
+3. Uses `loadConfig()` to auto-discover `infra.yaml` when needed.
+
+The `core` binary picks up these commands via blank imports in its main package.
+
+## Project Structure
+
+```
+go-infra/
+  client.go              Shared APIClient
+  client_test.go         APIClient tests (20 tests)
+  config.go              YAML config types + parser
+  config_test.go         Config tests (4 tests)
+  hetzner.go             HCloudClient + HRobotClient
+  hetzner_test.go        Hetzner tests (10 tests)
+  cloudns.go             CloudNSClient
+  cloudns_test.go        CloudNS tests (16 tests)
+  cmd/
+    prod/
+      cmd_commands.go    Command registration
+      cmd_prod.go        Root 'prod' command + flags
+      cmd_status.go      Parallel host health checks
+      cmd_setup.go       Foundation setup pipeline (discover, lb, dns)
+      cmd_dns.go         DNS record management
+      cmd_lb.go          Load balancer management
+      cmd_ssh.go         SSH into production hosts
+    monitor/
+      cmd_commands.go    Command registration
+      cmd_monitor.go     Security finding aggregation
+  go.mod
+  go.sum
+  CLAUDE.md
+```
+
+## Licence
+
+EUPL-1.2
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..54545c3
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,122 @@
+---
+title: go-infra
+description: Infrastructure provider API clients and YAML-based configuration for managing production environments.
+---
+
+# go-infra
+
+`forge.lthn.ai/core/go-infra` provides typed Go clients for infrastructure provider APIs (Hetzner Cloud, Hetzner Robot, CloudNS) and a declarative YAML configuration layer for describing production topology. It also ships CLI commands for production management (`core prod`) and security monitoring (`core monitor`).
+
+The core `infra` package has no framework dependencies beyond the Go standard library, YAML parsing, and testify for tests. All HTTP communication goes through a shared `APIClient` that handles retries, exponential backoff, and rate-limit compliance automatically.
+
+## Module Path
+
+```
+forge.lthn.ai/core/go-infra
+```
+
+Requires **Go 1.26+**.
+
+## Quick Start
+
+### Using the API Clients Directly
+
+```go
+import "forge.lthn.ai/core/go-infra"
+
+// Hetzner Cloud -- list all servers
+hc := infra.NewHCloudClient(os.Getenv("HCLOUD_TOKEN"))
+servers, err := hc.ListServers(ctx)
+
+// Hetzner Robot -- list dedicated servers
+hr := infra.NewHRobotClient(user, password)
+dedicated, err := hr.ListServers(ctx)
+
+// CloudNS -- ensure a DNS record exists
+dns := infra.NewCloudNSClient(authID, authPassword)
+changed, err := dns.EnsureRecord(ctx, "example.com", "www", "A", "1.2.3.4", 300)
+```
+
+### Loading Infrastructure Configuration
+
+```go
+import "forge.lthn.ai/core/go-infra"
+
+// Auto-discover infra.yaml by walking up from the current directory
+cfg, path, err := infra.Discover(".")
+
+// Or load a specific file
+cfg, err = infra.Load("/path/to/infra.yaml")
+
+// Query the configuration
+appServers := cfg.AppServers()
+for name, host := range appServers {
+	fmt.Printf("%s: %s (%s)\n", name, host.IP, host.Role)
+}
+```
+
+### CLI Commands
+
+When registered with the `core` CLI binary, go-infra provides two command groups:
+
+```bash
+# Production infrastructure management
+core prod status                  # Health check all hosts, services, and load balancer
+core prod setup                   # Phase 1 foundation: discover topology, create LB, configure DNS
+core prod setup --dry-run         # Preview what setup would do
+core prod setup --step=dns        # Run a single setup step
+core prod dns list                # List DNS records for a zone
+core prod dns set www A 1.2.3.4   # Create or update a DNS record
+core prod lb status               # Show load balancer status and target health
+core prod lb create               # Create load balancer from infra.yaml
+core prod ssh noc                 # SSH into a named host
+
+# Security monitoring (aggregates GitHub Security findings)
+core monitor                      # Scan current repo
+core monitor --all                # Scan all repos in registry
+core monitor --repo core-php      # Scan a specific repo
+core monitor --severity high      # Filter by severity
+core monitor --json               # JSON output
+```
+
+## Package Layout
+
+| Path | Description |
+|------|-------------|
+| `client.go` | Shared HTTP API client with retry, exponential backoff, and rate-limit handling |
+| `config.go` | YAML infrastructure configuration parser and typed config structs |
+| `hetzner.go` | Hetzner Cloud API (servers, load balancers, snapshots) and Hetzner Robot API (dedicated servers) |
+| `cloudns.go` | CloudNS DNS API (zones, records, ACME challenge helpers) |
+| `cmd/prod/` | CLI commands for production infrastructure management (`core prod`) |
+| `cmd/monitor/` | CLI commands for security finding aggregation (`core monitor`) |
+
+## Dependencies
+
+### Direct
+
+| Module | Purpose |
+|--------|---------|
+| `forge.lthn.ai/core/cli` | CLI framework (cobra-based command registration) |
+| `forge.lthn.ai/core/go-ansible` | SSH client used by `core prod status` for host health checks |
+| `forge.lthn.ai/core/go-i18n` | Internationalisation strings for the monitor command |
+| `forge.lthn.ai/core/go-io` | Filesystem abstraction used by monitor's registry lookup |
+| `forge.lthn.ai/core/go-log` | Structured error logging |
+| `forge.lthn.ai/core/go-scm` | Repository registry for multi-repo monitoring |
+| `gopkg.in/yaml.v3` | YAML parsing for `infra.yaml` |
+| `github.com/stretchr/testify` | Test assertions |
+
+The core library files (`config.go`, `client.go`, `hetzner.go`, `cloudns.go`) depend only on the standard library and `gopkg.in/yaml.v3`. The heavier dependencies (`cli`, `go-ansible`, `go-scm`, etc.) are confined to the `cmd/` packages.
+
+## Environment Variables
+
+| Variable | Used by | Description |
+|----------|---------|-------------|
+| `HCLOUD_TOKEN` | `prod setup`, `prod status`, `prod lb` | Hetzner Cloud API bearer token |
+| `HETZNER_ROBOT_USER` | `prod setup` | Hetzner Robot API username |
+| `HETZNER_ROBOT_PASS` | `prod setup` | Hetzner Robot API password |
+| `CLOUDNS_AUTH_ID` | `prod setup`, `prod dns` | CloudNS sub-auth user ID |
+| `CLOUDNS_AUTH_PASSWORD` | `prod setup`, `prod dns` | CloudNS auth password |
+
+## Licence
+
+EUPL-1.2