docs: add domain expert guide, task queue, and research notes

CLAUDE.md: architecture guide for ansible/build/infra/release (29K LOC)
TODO.md: 5-phase task queue (test coverage, ansible, infra, release, devkit)
FINDINGS.md: package inventory, test gaps, config ecosystem

Co-Authored-By: Virgil <virgil@lethean.io>
Snider 2026-02-20 00:59:01 +00:00
parent 392ad68047
commit 9b55b97b28
3 changed files with 335 additions and 0 deletions

CLAUDE.md Normal file

@@ -0,0 +1,208 @@
# CLAUDE.md — go-devops Domain Expert Guide
You are a dedicated domain expert for `forge.lthn.ai/core/go-devops`. Virgil (in core/go) orchestrates your work via TODO.md. Pick up tasks in phase order, mark `[x]` when done, commit and push.
## What This Package Does
Infrastructure management, build automation, and release pipelines. ~29K LOC across 118 Go files. Provides:
- **Ansible engine** — Native Go playbook executor (not shelling out to `ansible-playbook`). SSH, modules, facts, handlers.
- **Build system** — Plugin-based builders (Go, Wails, Docker, C++, LinuxKit, Taskfile). Cross-compilation, code signing (macOS/GPG/Windows).
- **Release automation** — Version detection, changelog from git history, multi-target publishing (GitHub, Docker, Homebrew, AUR, Scoop, Chocolatey, npm).
- **Infrastructure APIs** — Hetzner Cloud, Hetzner Robot (bare metal), CloudNS DNS. DigitalOcean is referenced in the config types only; no client is implemented yet.
- **Container/VM management** — LinuxKit images on QEMU (Linux) or Hyperkit (macOS).
- **SDK generation** — OpenAPI spec parsing, TypeScript/Python/Go/PHP client generation, breaking change detection.
- **Developer toolkit** — Code quality metrics, TODO detection, coverage reports, dependency graphs.
## Commands
```bash
go test ./... # Run all tests
go test -v -run TestName ./... # Single test
go test -race ./... # Race detector
go vet ./... # Static analysis
```
## Local Dependencies
| Module | Local Path | Notes |
|--------|-----------|-------|
| `forge.lthn.ai/core/go` | `../core` | Framework (core.E, io.Medium, config, i18n, log) |
**Do NOT change the replace directive path.** Use go.work for local resolution if needed.
## Architecture
### ansible/ — Playbook Execution Engine (~3,162 LOC)
| File | LOC | Purpose |
|------|-----|---------|
| `executor.go` | 1,021 | Playbook runner: task/handler/fact tracking, become/sudo |
| `modules.go` | 1,434 | Module implementations: service, file, template, command, copy, apt, yum |
| `parser.go` | 438 | YAML playbook + inventory parser |
| `ssh.go` | 451 | SSH client connection management |
| `types.go` | 258 | Core types: Play, Task, Handler, Inventory, Facts |
Executes Ansible playbooks natively in Go. Supports: `when` conditionals, `register` variables, `notify` handlers, `become` privilege escalation, `loop` iteration, fact gathering.
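To make the interplay of `when`, `register`, and `notify` concrete, here is a minimal sketch of a play loop. It is not the code in `executor.go`: the `Task` struct, the `RunPlay` function, and the use of Go closures for conditions are all hypothetical simplifications (real playbooks express `when` as an expression string, and tasks run over SSH).

```go
package main

import "fmt"

// Task is a hypothetical, simplified stand-in for the package's Task type.
type Task struct {
	Name     string
	When     func(vars map[string]any) bool // nil means "always run"
	Register string                         // variable name that receives the result
	Notify   string                         // handler to queue if the task changed something
	Run      func() (changed bool, err error)
}

// RunPlay executes tasks in order, then runs each notified handler once,
// in the order the handlers were first notified.
// It returns the names of everything that was executed.
func RunPlay(tasks []Task, handlers map[string]func() error) ([]string, error) {
	vars := map[string]any{}
	queued := map[string]bool{}
	var executed, pending []string

	for _, t := range tasks {
		if t.When != nil && !t.When(vars) {
			continue // condition is false: skip the task
		}
		changed, err := t.Run()
		if err != nil {
			return executed, fmt.Errorf("task %q: %w", t.Name, err)
		}
		executed = append(executed, t.Name)
		if t.Register != "" {
			vars[t.Register] = changed
		}
		if changed && t.Notify != "" && !queued[t.Notify] {
			queued[t.Notify] = true
			pending = append(pending, t.Notify)
		}
	}
	for _, name := range pending {
		h, ok := handlers[name]
		if !ok {
			return executed, fmt.Errorf("handler %q not defined", name)
		}
		if err := h(); err != nil {
			return executed, fmt.Errorf("handler %q: %w", name, err)
		}
		executed = append(executed, "handler:"+name)
	}
	return executed, nil
}

// demoPlay notifies the same handler twice and skips one task via When.
func demoPlay() ([]string, error) {
	tasks := []Task{
		{Name: "write config", Register: "cfg", Notify: "restart nginx",
			Run: func() (bool, error) { return true, nil }},
		{Name: "only if config unchanged",
			When: func(v map[string]any) bool { return v["cfg"] == false },
			Run:  func() (bool, error) { return false, nil }},
		{Name: "write vhost", Notify: "restart nginx",
			Run: func() (bool, error) { return true, nil }},
	}
	handlers := map[string]func() error{"restart nginx": func() error { return nil }}
	return RunPlay(tasks, handlers)
}

func main() {
	fmt.Println(demoPlay())
}
```

The point of the sketch is the ordering guarantee: handlers run after all tasks, and a handler notified by several tasks still runs only once.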
### build/ — Project Building & Cross-Compilation (~3,617 LOC)
**Root** (797 LOC): Project type detection, archive creation (tar.gz/xz/zip via Borg compression), config from `.core/build.yaml`, SHA checksums.
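As an illustration of the archive and checksum step, the sketch below packs files into a tar.gz in memory, unpacks them again, and hashes the result. It uses only the standard library rather than Borg, and it assumes SHA-256 as the checksum algorithm; the function names `WriteTarGz`, `ReadTarGz`, and `Checksum` are hypothetical and do not come from `build/`.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// WriteTarGz packs name -> content pairs into an in-memory tar.gz archive.
func WriteTarGz(files map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for name, data := range files {
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	// Close the tar writer first, then gzip, so both trailers are flushed.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// ReadTarGz unpacks an archive produced by WriteTarGz.
func ReadTarGz(archive []byte) (map[string][]byte, error) {
	gz, err := gzip.NewReader(bytes.NewReader(archive))
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	out := map[string][]byte{}
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		out[hdr.Name] = data
	}
}

// Checksum returns the hex-encoded SHA-256 digest of data.
func Checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	archive, err := WriteTarGz(map[string][]byte{"bin/app": []byte("binary")})
	if err != nil {
		panic(err)
	}
	files, err := ReadTarGz(archive)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d file(s), archive sha256 %s\n", len(files), Checksum(archive))
}
```

A round trip like this is also a convenient shape for an archive test: create, extract, compare.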
**builders/** (1,390 LOC): Plugin interface `Builder.Build()`.
| Builder | LOC | Notes |
|---------|-----|-------|
| `go.go` | — | Go cross-compilation |
| `wails.go` | 247 | Wails desktop app |
| `docker.go` | 215 | Docker image build |
| `cpp.go` | 253 | CMake C++ |
| `linuxkit.go` | 270 | LinuxKit VM image |
| `taskfile.go` | 275 | Taskfile automation |
**signing/** (377 LOC): Signer interface. macOS `codesign`, GPG, Windows `signtool`.
**buildcmd/** (1,053 LOC): CLI handlers for `core build`, `core build pwa`, `core build sdk`, `core release`.
### container/ — LinuxKit VM Management (~1,208 LOC)
| File | LOC | Purpose |
|------|-----|---------|
| `container.go` | 106 | Manager interface + Container model |
| `linuxkit.go` | 462 | LinuxKitManager: Run, Stop, List |
| `hypervisor.go` | 273 | Abstraction: QEMU (Linux) / Hyperkit (macOS) |
| `state.go` | 172 | Container state persistence (`~/.core/state.json`) |
| `templates.go` | 301 | Packer/LinuxKit template rendering |
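State persistence to a JSON file can be sketched as follows. This is an assumption-laden illustration, not the contents of `state.go`: the `Container` and `State` shapes, the field names, and the write-then-rename strategy are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Container and State are hypothetical shapes, not the package's real model.
type Container struct {
	ID     string `json:"id"`
	Name   string `json:"name"`
	Status string `json:"status"`
}

type State struct {
	Containers map[string]Container `json:"containers"`
}

// LoadState reads the state file; a missing file yields an empty state.
func LoadState(path string) (*State, error) {
	s := &State{Containers: map[string]Container{}}
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return s, nil
	}
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, s); err != nil {
		return nil, fmt.Errorf("parse %s: %w", path, err)
	}
	if s.Containers == nil {
		s.Containers = map[string]Container{}
	}
	return s, nil
}

// SaveState writes to a temporary file and renames it into place, which
// reduces the chance of leaving a half-written state file after a crash.
func SaveState(path string, s *State) error {
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// demo round-trips one container through a state file in a temp directory.
func demo() error {
	dir, err := os.MkdirTemp("", "state-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, ".core", "state.json")

	s, err := LoadState(path) // the file does not exist yet
	if err != nil {
		return err
	}
	s.Containers["c1"] = Container{ID: "c1", Name: "dev", Status: "running"}
	if err := SaveState(path, s); err != nil {
		return err
	}
	again, err := LoadState(path)
	if err != nil {
		return err
	}
	if again.Containers["c1"].Status != "running" {
		return errors.New("state did not survive the round trip")
	}
	return nil
}

func main() {
	fmt.Println("round trip error:", demo())
}
```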
### devops/ — Portable Dev Environment (~1,216 LOC)
| File | LOC | Purpose |
|------|-----|---------|
| `devops.go` | 243 | Manager: install, boot, stop, status |
| `config.go` | 90 | Config from `~/.core/config.yaml` |
| `images.go` | 198 | ImageManager: download from GitHub/CDN/registry |
| `shell.go` | 74 | Shell execution wrapper |
| `test.go` | 188 | Test execution helpers |
| `serve.go` | 109 | Dev environment HTTP server |
| `claude.go` | 143 | Claude/AI integration |
| `ssh_utils.go` | 68 | SSH key scanning |
**sources/** (218 LOC): `ImageSource` interface. GitHub Releases + S3/CDN download sources.
### infra/ — Infrastructure APIs (~953 LOC)
| File | LOC | Purpose |
|------|-----|---------|
| `config.go` | 300 | `infra.yaml` types: Host, LoadBalancer, Network, DNS, Database, Cache |
| `hetzner.go` | 381 | Hetzner Cloud API (VPS) + Hetzner Robot API (bare metal) |
| `cloudns.go` | 272 | CloudNS DNS: zones, records, ACME DNS-01 challenges |
### release/ — Release Automation (~4,008 LOC)
**Root** (1,398 LOC): Release orchestrator (version → build → changelog → publish), config from `.core/release.yaml`, git-based changelog, semver detection.
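The changelog step can be pictured as grouping conventional-commit subjects by type. The sketch below is illustrative only: the regular expression, the `Changelog` function, and the choice to list just `feat` and `fix` are assumptions, not the behaviour of the release package.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subjectRe matches "type(scope)!: description"; scope and "!" are optional.
var subjectRe = regexp.MustCompile(`^(\w+)(?:\(([^)]+)\))?(!)?: (.+)$`)

// Changelog groups conventional-commit subjects under Markdown headings.
// Subjects that do not parse, or whose type is not listed, are skipped.
func Changelog(subjects []string) string {
	order := []string{"feat", "fix"}
	titles := map[string]string{"feat": "Features", "fix": "Bug Fixes"}
	groups := map[string][]string{}

	for _, s := range subjects {
		m := subjectRe.FindStringSubmatch(strings.TrimSpace(s))
		if m == nil {
			continue
		}
		typ, scope, desc := m[1], m[2], m[4]
		if _, ok := titles[typ]; !ok {
			continue
		}
		line := "- " + desc
		if scope != "" {
			line = "- **" + scope + "**: " + desc
		}
		groups[typ] = append(groups[typ], line)
	}

	var b strings.Builder
	for _, typ := range order {
		if len(groups[typ]) == 0 {
			continue
		}
		b.WriteString("### " + titles[typ] + "\n")
		for _, l := range groups[typ] {
			b.WriteString(l + "\n")
		}
	}
	return b.String()
}

func main() {
	fmt.Print(Changelog([]string{
		"feat(ansible): add loop support",
		"chore: tidy go.mod",
		"fix: handle empty inventory",
	}))
}
```

Taking subjects as a plain string slice keeps the function testable with canned `git log` output.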
**publishers/** (2,610 LOC): `Publisher` interface.
| Publisher | LOC | Notes |
|-----------|-----|-------|
| `github.go` | 233 | GitHub Releases |
| `docker.go` | 278 | Docker image build + push |
| `homebrew.go` | 371 | Homebrew formula |
| `npm.go` | 265 | npm registry |
| `aur.go` | 313 | Arch Linux AUR |
| `scoop.go` | 284 | Windows Scoop |
| `chocolatey.go` | 294 | Windows Chocolatey |
| `linuxkit.go` | 300 | LinuxKit image |
### sdk/ — OpenAPI SDK Generation (~931 LOC)
Auto-detect OpenAPI spec, generate typed clients in 4 languages, detect breaking changes via oasdiff.
**generators/** (437 LOC): TypeScript, Python, Go, PHP generators.
### devkit/ — Developer Toolkit (~560 LOC)
Code quality analysis: TODOs/FIXMEs, coverage reports, race conditions, vulnerability detection, secret leak scanning, cyclomatic complexity, dependency graphs.
### deploy/ — Deployment Integrations (~366 LOC)
- **python/** — Embedded Python 3.13 runtime (kluctl/go-embed-python)
- **coolify/** — Coolify PaaS API client via Python Swagger
## Key Interfaces
```go
// build/builders/
type Builder interface {
	Name() string
	Detect(fs io.Medium, dir string) (bool, error)
	Build(ctx context.Context, cfg *Config, targets []Target) ([]Artifact, error)
}

// release/publishers/
type Publisher interface {
	Name() string
	Publish(ctx context.Context, release *Release, pubCfg PublisherConfig, relCfg ReleaseConfig, dryRun bool) error
}

// container/
type Hypervisor interface {
	Name() string
	Available() bool
	Run(ctx context.Context, opts RunOptions) (*process.Handle, error)
}

// devops/sources/
type ImageSource interface {
	Name() string
	Available() bool
	Download(ctx context.Context, name, version string, progress func(downloaded, total int64)) (string, error)
}

// build/signing/
type Signer interface {
	Name() string
	Available() bool
	Sign(filePath, keyID string) ([]byte, error)
}

// sdk/generators/
type Generator interface {
	Language() string
	Generate(ctx context.Context, spec, outputDir string, config *Config) error
}
```
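To show how a `Detect`-style interface is typically consumed, here is a small selection loop. It is a sketch under stated assumptions: `Detector` is a reduced, hypothetical version of `Builder` that swaps `io.Medium` for the standard library's `fs.FS`, and the marker files and the Wails-before-Go ordering are guesses for illustration, not the package's actual detection rules.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// Detector is a reduced, hypothetical version of the Builder interface.
type Detector interface {
	Name() string
	Detect(fsys fs.FS, dir string) (bool, error)
}

// markerBuilder detects a project type by the presence of one marker file.
type markerBuilder struct{ name, marker string }

func (m markerBuilder) Name() string { return m.name }

func (m markerBuilder) Detect(fsys fs.FS, dir string) (bool, error) {
	_, err := fs.Stat(fsys, path.Join(dir, m.marker))
	if err == nil {
		return true, nil
	}
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	return false, err
}

// ErrNoBuilder is returned when no builder recognises the directory.
var ErrNoBuilder = errors.New("no builder matched")

// Select returns the first builder whose Detect reports a match.
func Select(builders []Detector, fsys fs.FS, dir string) (Detector, error) {
	for _, b := range builders {
		ok, err := b.Detect(fsys, dir)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", b.Name(), err)
		}
		if ok {
			return b, nil
		}
	}
	return nil, ErrNoBuilder
}

// detectIn builds an in-memory file system containing the given files and
// returns the name of the selected builder, or "" if none matched.
func detectIn(files ...string) string {
	fsys := fstest.MapFS{}
	for _, f := range files {
		fsys[f] = &fstest.MapFile{Data: []byte("x")}
	}
	// More specific project types are listed first (assumed ordering).
	builders := []Detector{
		markerBuilder{"wails", "wails.json"},
		markerBuilder{"go", "go.mod"},
		markerBuilder{"taskfile", "Taskfile.yml"},
	}
	b, err := Select(builders, fsys, ".")
	if err != nil {
		return ""
	}
	return b.Name()
}

func main() {
	fmt.Println(detectIn("go.mod", "wails.json"))
}
```

Because detection only needs a file system handle, the same pattern works for testing each builder with matching and non-matching directory layouts.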
## External Dependencies
| Package | Purpose |
|---------|---------|
| `github.com/Snider/Borg` | Compression (xz) for archives. **Not** Secure/Blob/Pointer. |
| `github.com/getkin/kin-openapi` | OpenAPI 3.x spec parsing |
| `github.com/oasdiff/oasdiff` | API breaking change detection |
| `github.com/kluctl/go-embed-python` | Embedded Python 3.13 runtime |
| `github.com/spf13/cobra` | CLI framework for build/release commands |
| `golang.org/x/crypto` | SSH connections (ansible/) |
## Configuration Files
- `.core/build.yaml` — Build targets, ldflags, signing, archive format
- `.core/release.yaml` — Version source, changelog style, SDK langs, publisher configs
- `infra.yaml` — Host inventory, DNS zones, cloud provider settings
- `~/.core/config.yaml` — Local dev environment config
## Coding Standards
- **UK English**: colour, organisation, centre
- **Tests**: testify assert/require, `_Good`/`_Bad`/`_Ugly` naming convention
- **Conventional commits**: `feat(ansible):`, `fix(infra):`, `refactor(build):`
- **Co-Author**: `Co-Authored-By: Virgil <virgil@lethean.io>`
- **Licence**: EUPL-1.2
- **Imports**: stdlib → forge.lthn.ai → third-party, each group separated by blank line
## Forge
- **Repo**: `forge.lthn.ai/core/go-devops`
- **Push via SSH**: `git push forge main` (remote: `ssh://git@forge.lthn.ai:2223/core/go-devops.git`)
## Task Queue
See `TODO.md` for prioritised work. See `FINDINGS.md` for research notes.

FINDINGS.md Normal file

@@ -0,0 +1,78 @@
# FINDINGS.md — go-devops Research & Discovery
## 2026-02-20: Initial Analysis (Virgil)
### Origin
Extracted from `core/go` on 16 Feb 2026 (commit `392ad68`). Single extraction commit — fresh repo.
### Package Inventory
| Package | Files | Source LOC | Test Files | Notes |
|---------|-------|-----------|-----------|-------|
| `ansible/` | 5 | 3,162 | 1 | Playbook executor, SSH, modules, parser |
| `build/` | 6 | 797 | 4 | Project detection, archives, checksums, config |
| `build/builders/` | 6 | 1,390 | — | Go, Wails, Docker, C++, LinuxKit, Taskfile |
| `build/signing/` | 5 | 377 | — | macOS, GPG, Windows signtool |
| `build/buildcmd/` | 6 | 1,053 | — | CLI command handlers |
| `container/` | 5 | 1,208 | 4 | LinuxKit VMs, hypervisor abstraction, state |
| `deploy/python/` | 1 | 147 | — | Embedded Python 3.13 |
| `deploy/coolify/` | 1 | 219 | — | Coolify PaaS API client |
| `devkit/` | 1 | 560 | 1 | Code quality metrics |
| `devops/` | 8 | 1,216 | 8 | Dev environment manager |
| `devops/sources/` | 3 | 218 | — | GitHub/CDN image sources |
| `infra/` | 3 | 953 | 1 | Hetzner, CloudNS, config |
| `release/` | 5 | 1,398 | 5 | Release orchestrator |
| `release/publishers/` | 9 | 2,610 | 9 | 8 target platforms |
| `sdk/` | 3 | 494 | 3 | OpenAPI detection + diff |
| `sdk/generators/` | 5 | 437 | 5 | 4-language SDK gen |
**Total**: ~29K LOC across 71 source files + 47 test files
### Key Observations
1. **ansible/modules.go is the largest file** — 1,434 LOC implementing Ansible modules in pure Go. Zero tests. Highest-priority testing gap.
2. **Borg dependency is compression-only** — `github.com/Snider/Borg` is used for xz archive creation in `build/archive.go`. It does NOT use the Secure/Blob/Pointer features.
3. **Python 3.13 embedded** — `deploy/python/` embeds a full Python runtime via kluctl/go-embed-python. It is used exclusively for the Coolify API client (Python Swagger). Consider replacing it with a native Go HTTP client to remove the 50MB+ Python dependency.
4. **DigitalOcean gap** — Referenced in `infra/config.go` types but no `digitalocean.go` implementation exists. Either implement or remove the dead types.
5. **Single-commit repo** — Entire codebase arrived in one `feat: extract` commit. No git history for individual components. This makes blame/bisect impossible for bugs originating before extraction.
6. **Hypervisor platform detection** — `container/hypervisor.go` auto-selects QEMU on Linux, Hyperkit on macOS. Both are platform-specific, so tests may need build tags or mocking.
7. **CLI via Cobra** — `build/buildcmd/` uses Cobra directly (not core/go's CLI framework). May need alignment.
8. **8 release publishers** — GitHub, Docker, Homebrew, npm, AUR, Scoop, Chocolatey, LinuxKit. All implement the `Publisher` interface. Each is ~230-370 LOC. All have test files.
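For the hypervisor observation above, one way to avoid build tags in tests is to pass the operating system name in as a parameter. The function below is a hypothetical sketch; `hypervisorFor` and its return values are not taken from `container/hypervisor.go`.

```go
package main

import (
	"fmt"
	"runtime"
)

// hypervisorFor maps an operating system name (as in runtime.GOOS) to a
// hypervisor name. Taking goos as a parameter lets tests cover every
// platform branch from a single machine.
func hypervisorFor(goos string) (string, error) {
	switch goos {
	case "linux":
		return "qemu", nil
	case "darwin":
		return "hyperkit", nil
	default:
		return "", fmt.Errorf("no hypervisor available for %s", goos)
	}
}

func main() {
	fmt.Println(hypervisorFor(runtime.GOOS))
}
```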
### Test Coverage Gaps
| Package | Gap Severity | Notes |
|---------|-------------|-------|
| `ansible/modules.go` | **Critical** | 1,434 LOC, zero tests |
| `ansible/executor.go` | **Critical** | 1,021 LOC, zero tests |
| `ansible/parser.go` | High | 438 LOC, zero tests |
| `infra/hetzner.go` | High | 381 LOC, zero tests — API calls untested |
| `infra/cloudns.go` | High | 272 LOC, zero tests — DNS ops untested |
| `build/builders/*` | Medium | 1,390 LOC, no individual builder tests |
| `build/signing/*` | Medium | 377 LOC, signing logic untested |
| `deploy/*` | Low | 366 LOC, Python/Coolify integration |
### Integration Points
- **core/go** → Framework (core.E, io.Medium, config, logging)
- **core/go-crypt** → SSH key management (ansible/ssh.go uses golang.org/x/crypto directly, could use go-crypt)
- **core/cli** → Build/release commands registered via Cobra
- **DevOps repo** — `infra.yaml` config used by Ansible playbooks in `/Users/snider/Code/DevOps`
### Config File Ecosystem
| File | Location | Purpose |
|------|----------|---------|
| `.core/build.yaml` | Project root | Build targets, signing, archives |
| `.core/release.yaml` | Project root | Version, changelog, publishers |
| `infra.yaml` | Project root | Host inventory, DNS, cloud providers |
| `~/.core/config.yaml` | User home | Local dev environment config |
| `~/.core/state.json` | User home | Container/VM state persistence |

TODO.md Normal file

@@ -0,0 +1,49 @@
# TODO.md — go-devops
Dispatched from core/go orchestration. Pick up tasks in order.
---
## Phase 0: Test Coverage & Hardening
- [ ] **Expand ansible/ tests** — Only `ssh_test.go` exists. Add: `executor_test.go` (run a minimal playbook with mock SSH, verify task order + handler notification), `modules_test.go` (test each module: service start/stop, file copy, template render, command exec — use mocked SSH session), `parser_test.go` (parse valid playbook YAML, invalid YAML, empty plays, nested vars), `types_test.go` (Facts merge, Inventory host grouping).
- [ ] **Expand infra/ tests** — Only `config_test.go` exists. Add: `hetzner_test.go` (mock HTTP responses for server list/create/delete, load balancer ops, snapshot management), `cloudns_test.go` (mock DNS zone/record CRUD, ACME challenge create/cleanup, error responses). Use `httptest.NewServer` for API mocking.
- [ ] **Expand build/ tests** — Add: builder detection tests (each builder's `Detect()` with matching/non-matching directory structures), archive round-trip (create tar.gz → extract → compare), signing mock tests (verify `Sign()` called with correct paths).
- [ ] **Expand release/ tests** — Add: version detection from git tags / package.json / go.mod, changelog generation from conventional commits (mock git log output), publisher dry-run tests.
- [ ] **Race condition tests** — Run `go test -race ./...` across all packages. The Ansible executor runs concurrent handlers, so verify thread safety.
- [ ] **`go vet ./...` clean** — Fix any warnings.
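The `httptest.NewServer` approach mentioned for the infra tests could look roughly like this. The `Client` type, the `/servers` path, the JSON shape, and the bearer-token check are illustrative assumptions; the real types and endpoints in `hetzner.go` may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Server and Client are hypothetical; the real types in hetzner.go may differ.
type Server struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

type Client struct {
	BaseURL string
	Token   string
	HTTP    *http.Client
}

// ListServers fetches BaseURL + "/servers" and decodes {"servers": [...]}.
func (c *Client) ListServers() ([]Server, error) {
	req, err := http.NewRequest(http.MethodGet, c.BaseURL+"/servers", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+c.Token)
	resp, err := c.HTTP.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("list servers: unexpected status %d", resp.StatusCode)
	}
	var body struct {
		Servers []Server `json:"servers"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, fmt.Errorf("list servers: decode: %w", err)
	}
	return body.Servers, nil
}

// listAgainstMock starts a throwaway API server that answers with the given
// status and payload, then calls ListServers against it.
func listAgainstMock(status int, payload string) ([]Server, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer test-token" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(status)
		fmt.Fprint(w, payload)
	}))
	defer srv.Close()

	c := &Client{BaseURL: srv.URL, Token: "test-token", HTTP: srv.Client()}
	return c.ListServers()
}

func main() {
	fmt.Println(listAgainstMock(http.StatusOK, `{"servers":[{"id":1,"name":"web-1"}]}`))
}
```

Making the base URL a field on the client is the design choice that matters here: production code points it at the real API, tests point it at the mock.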
## Phase 1: Ansible Engine Hardening
- [ ] **Module test coverage** — `modules.go` is 1,434 LOC with zero tests. Each module (service, file, template, command, copy, apt, yum) needs unit tests with mocked SSH sessions.
- [ ] **Error propagation** — Verify all SSH errors are wrapped with `core.E()` including host context. Currently some errors may lose the host identifier.
- [ ] **Fact gathering** — Test fact collection from different Linux distros (Ubuntu, CentOS, Alpine). Mock `/etc/os-release` parsing.
- [ ] **Become/sudo** — Test privilege escalation paths. Verify password prompt handling.
- [ ] **Idempotency checks** — Modules should report `changed: false` when no action needed. Verify for file, service, template modules.
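For the fact-gathering task, parsing `/etc/os-release` from a string makes distro-specific fixtures easy to write. This is a simplified sketch: `ParseOSRelease` is a hypothetical helper, and it strips surrounding quotes without handling shell escape sequences.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ParseOSRelease turns os-release content into key/value pairs.
// Blank lines, comments, and lines without "=" are ignored.
func ParseOSRelease(content string) map[string]string {
	facts := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		// Values may be wrapped in single or double quotes.
		facts[key] = strings.Trim(val, `"'`)
	}
	return facts
}

func main() {
	ubuntu := "ID=ubuntu\nVERSION_ID=\"24.04\"\n"
	facts := ParseOSRelease(ubuntu)
	fmt.Println(facts["ID"], facts["VERSION_ID"])
}
```

A test can then feed canned content for Ubuntu, CentOS, and Alpine through the same function, with the SSH session mocked to return that content.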
## Phase 2: Infrastructure API Robustness
- [ ] **Retry logic** — Add configurable retry with exponential backoff for Hetzner Cloud/Robot and CloudNS API calls, so that transient failures do not abort a run.
- [ ] **Rate limiting** — Hetzner Cloud enforces rate limits. Detect HTTP 429 responses, then queue and retry the request.
- [ ] **DigitalOcean support** — Currently referenced in config but no implementation. Either implement or remove.
- [ ] **API client abstraction** — Extract common HTTP client pattern from hetzner.go and cloudns.go into shared infra client.
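Retry with exponential backoff and 429 handling can be sketched as below. The `RetryConfig` type, the rule that 429 and 5xx are retryable, and the treatment of every transport error as retryable are assumptions for illustration; a real implementation would probably also honour a `Retry-After` header and a context.

```go
package main

import (
	"fmt"
	"time"
)

// RetryConfig is a hypothetical configuration type for this sketch.
type RetryConfig struct {
	MaxAttempts int
	BaseDelay   time.Duration
	Sleep       func(time.Duration) // injectable so tests do not have to wait
}

// retryable reports whether a status suggests trying again later.
func retryable(status int) bool { return status == 429 || status >= 500 }

// Do calls fn until it returns a non-retryable status or attempts run out,
// doubling the delay after each retryable failure.
func (c RetryConfig) Do(fn func() (int, error)) (int, error) {
	delay := c.BaseDelay
	var lastErr error
	for attempt := 1; attempt <= c.MaxAttempts; attempt++ {
		status, err := fn()
		if err == nil && !retryable(status) {
			return status, nil
		}
		if err != nil {
			lastErr = err
		} else {
			lastErr = fmt.Errorf("retryable status %d", status)
		}
		if attempt < c.MaxAttempts {
			c.Sleep(delay)
			delay *= 2
		}
	}
	return 0, fmt.Errorf("gave up after %d attempts: %w", c.MaxAttempts, lastErr)
}

// simulate replays a fixed sequence of statuses (at most three are consumed)
// and records the delays, in milliseconds, that Do asked to sleep.
func simulate(statuses ...int) (int, []int64, error) {
	var delays []int64
	cfg := RetryConfig{
		MaxAttempts: 3,
		BaseDelay:   100 * time.Millisecond,
		Sleep:       func(d time.Duration) { delays = append(delays, d.Milliseconds()) },
	}
	i := 0
	status, err := cfg.Do(func() (int, error) {
		s := statuses[i]
		i++
		return s, nil
	})
	return status, delays, err
}

func main() {
	fmt.Println(simulate(429, 503, 200))
}
```

In production the `Sleep` field would be `time.Sleep`; injecting it keeps the backoff schedule observable in tests.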
## Phase 3: Release Pipeline Testing
- [ ] **Publisher integration tests** — Mock GitHub API for release creation, Docker registry for image push, Homebrew tap for formula update. Verify dry-run mode produces correct output without side effects.
- [ ] **SDK generation tests** — Generate TypeScript/Go/Python clients from a test OpenAPI spec. Verify output compiles/type-checks.
- [ ] **Breaking change detection** — Test oasdiff integration: modify a spec with breaking change, verify detection and failure mode.
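A dry-run test usually asserts two things: the output describes what would happen, and no side effect occurred. The fake below is a hypothetical sketch with a reduced `Publish` signature; it does not implement the package's `Publisher` interface.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// fakePublisher is a hypothetical test double with a reduced signature.
type fakePublisher struct {
	out     io.Writer
	uploads int // counts real side effects
}

// Publish reports each artifact; in dry-run mode it only describes the work.
func (p *fakePublisher) Publish(version string, artifacts []string, dryRun bool) error {
	if version == "" {
		return errors.New("version is required")
	}
	for _, a := range artifacts {
		if dryRun {
			fmt.Fprintf(p.out, "[dry-run] would upload %s for %s\n", a, version)
			continue
		}
		p.uploads++
		fmt.Fprintf(p.out, "uploaded %s for %s\n", a, version)
	}
	return nil
}

// runPublish publishes one artifact and returns the captured output
// together with the number of side effects that happened.
func runPublish(dryRun bool) (string, int, error) {
	var buf bytes.Buffer
	p := &fakePublisher{out: &buf}
	err := p.Publish("v1.2.0", []string{"app_linux_amd64.tar.gz"}, dryRun)
	return buf.String(), p.uploads, err
}

func main() {
	out, uploads, err := runPublish(true)
	fmt.Printf("%suploads=%d err=%v\n", out, uploads, err)
}
```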
## Phase 4: DevKit Expansion
- [ ] **Vulnerability scanning** — Integrate `govulncheck` output parsing into devkit findings.
- [ ] **Complexity thresholds** — Configurable cyclomatic complexity threshold. Flag functions exceeding it.
- [ ] **Coverage trending** — Store coverage snapshots, detect regressions between runs.
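For the complexity threshold, a rough score can be computed with the standard `go/ast` package. The counting rule here (1 plus one per `if`, `for`, `range`, non-default `case`, `&&`, and `||`) is a common approximation chosen for this sketch, not necessarily what devkit uses; `Complexity`, `Flag`, and `sample` are hypothetical names, and methods that share a name would collide in the result map.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// sample is the source analysed in main: simple scores 1, branchy scores 5.
const sample = `package p

func simple() int { return 1 }

func branchy(a, b int) int {
	if a > 0 && b > 0 {
		return 1
	}
	for i := 0; i < a; i++ {
		switch i {
		case 1:
			return 2
		default:
		}
	}
	return 0
}
`

// Complexity returns an approximate cyclomatic complexity per function name.
func Complexity(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	out := map[string]int{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		score := 1
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			switch x := n.(type) {
			case *ast.IfStmt, *ast.ForStmt, *ast.RangeStmt:
				score++
			case *ast.CaseClause:
				if x.List != nil { // a default clause has a nil list
					score++
				}
			case *ast.BinaryExpr:
				if x.Op == token.LAND || x.Op == token.LOR {
					score++
				}
			}
			return true
		})
		out[fn.Name.Name] = score
	}
	return out, nil
}

// Flag returns the sorted names of functions whose score exceeds threshold.
func Flag(src string, threshold int) ([]string, error) {
	scores, err := Complexity(src)
	if err != nil {
		return nil, err
	}
	var names []string
	for name, score := range scores {
		if score > threshold {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	fmt.Println(Flag(sample, 3))
}
```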
---
## Workflow
1. Virgil in core/go writes tasks here after research
2. This repo's dedicated session picks up tasks in phase order
3. Mark `[x]` when done, note commit hash