docs: add human-friendly documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snider 2026-03-11 13:02:40 +00:00
parent af78c9db18
commit a97bbc4ae2
3 changed files with 614 additions and 0 deletions

docs/architecture.md

@ -0,0 +1,277 @@
---
title: Architecture
description: Internal design of go-io — the Medium interface, backend implementations, sigil transformation pipeline, and security model.
---
# Architecture
This document explains how `go-io` is structured internally, how the key types relate to one another, and how data flows through the system.
## Design Principles
1. **One interface, many backends.** All storage operations go through `Medium`. Business logic never imports a specific backend.
2. **Path sandboxing by default.** The local backend validates every path against symlink escapes. Sandboxed mediums cannot reach outside their root.
3. **Composable transforms.** The `sigil` package lets you chain encoding, compression, and encryption steps into reversible pipelines.
4. **No CGO.** The SQLite driver (`modernc.org/sqlite`) is pure Go. The entire module compiles without a C toolchain.
## Core Types
### Medium (root package)
The `Medium` interface is defined in `io.go`. It is the only type that consuming code needs to know about. The root package also provides:
- **`io.Local`** — a package-level variable initialised in `init()` via `local.New("/")`. This gives unsandboxed access to the host filesystem, mirroring the behaviour of the standard `os` package.
- **`io.NewSandboxed(root)`** — creates a `local.Medium` restricted to `root`. All path resolution is confined within that directory.
- **`io.Copy(src, srcPath, dst, dstPath)`** — copies a file between any two mediums by reading from one and writing to the other.
- **`io.MockMedium`** — a fully functional in-memory implementation for unit tests. It tracks files, directories, and modification times in plain maps.
### FileInfo and DirEntry (root package)
Simple struct implementations of `fs.FileInfo` and `fs.DirEntry` are exported from the root package for use in mocks and tests. Each backend also defines its own unexported equivalents internally.
## Backend Implementations
### local.Medium
**File:** `local/client.go`
The local backend wraps the standard `os` package with two layers of path protection:
1. **`path(p string)`** — normalises paths using `filepath.Clean("/" + p)` before joining with the root. This neutralises `..` traversal at the string level. When root is `"/"`, absolute paths pass through unchanged and relative paths resolve against the working directory.
2. **`validatePath(p string)`** — walks each path component, calling `filepath.EvalSymlinks` at every step. If any resolved component lands outside the root, it logs a security event to stderr (including timestamp, root, attempted path, and OS username) and returns `os.ErrPermission`.
Delete operations refuse to remove `"/"` or the user's home directory as an additional safety rail.
```
caller -> path(p) -> validatePath(p) -> os.ReadFile / os.WriteFile / ...
```
### s3.Medium
**File:** `s3/s3.go`
The S3 backend translates `Medium` operations into AWS SDK calls. Key design decisions:
- **Key construction:** `key(p)` uses `path.Clean("/" + p)` to sandbox traversal, then prepends the optional prefix. This means `../secret` resolves to `secret`, not an escape.
- **Directory semantics:** S3 has no real directories. `EnsureDir` is a no-op. `IsDir` and `Exists` for directory-like paths use `ListObjectsV2` with `MaxKeys: 1` to check for objects under the prefix.
- **Rename:** Implemented as copy-then-delete, since S3 has no atomic rename.
- **Append:** Downloads existing content, appends in memory, re-uploads on `Close()`. This read-modify-write cycle follows from S3's immutable-object model, so every append costs a full download and re-upload.
- **Testability:** The `s3API` interface (unexported) abstracts the six SDK methods used. Tests inject a `mockS3` that stores objects in a `map[string][]byte` with a `sync.RWMutex`.
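The key-construction rule can be illustrated with the standard `path` package. This is a minimal sketch under stated assumptions: `objectKey` is a hypothetical stand-in for the unexported `key(p)`, and the way the prefix is joined (plain concatenation after trimming the leading slash) is assumed, not taken from `s3/s3.go`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// objectKey cleans p as if it were rooted at "/", drops the leading
// slash, and prepends the prefix. Hypothetical helper; the prefix
// handling is an assumption.
func objectKey(prefix, p string) string {
	cleaned := strings.TrimPrefix(path.Clean("/"+p), "/")
	return prefix + cleaned
}

func main() {
	fmt.Println(objectKey("", "../secret"))                 // secret
	fmt.Println(objectKey("uploads/", "a/../../photo.jpg")) // uploads/photo.jpg
}
```

Because the path is anchored at `/` before cleaning, any surplus `..` elements are discarded at the root and never climb above the prefix.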
### sqlite.Medium
**File:** `sqlite/sqlite.go`
Stores files and directories as rows in a single SQLite table:
```sql
CREATE TABLE IF NOT EXISTS files (
path TEXT PRIMARY KEY,
content BLOB NOT NULL,
mode INTEGER DEFAULT 420, -- 0644
is_dir BOOLEAN DEFAULT FALSE,
mtime DATETIME DEFAULT CURRENT_TIMESTAMP
)
```
- **WAL mode** is enabled at connection time for better concurrent read performance.
- **Path cleaning** uses the same `path.Clean("/" + p)` pattern as other backends.
- **Rename** is transactional: it reads the source row, inserts at the destination, deletes the source, and moves all children (if it is a directory) within a single transaction.
- **Custom tables** are supported via `WithTable("name")` to allow multiple logical filesystems in one database.
- **`:memory:`** databases work out of the box for tests.
### node.Node
**File:** `node/node.go`
A pure in-memory filesystem that implements both `Medium` and the standard library's `fs.FS`, `fs.StatFS`, `fs.ReadDirFS`, and `fs.ReadFileFS` interfaces. Directories are implicit -- they exist whenever a stored file path contains a `"/"`.
Key capabilities beyond `Medium`:
- **`ToTar()` / `FromTar()`** — serialise the entire tree to a tar archive and back. This enables snapshotting, transport, and archival.
- **`Walk()` with `WalkOptions`** — extends `fs.WalkDir` with `MaxDepth`, `Filter`, and `SkipErrors` controls.
- **`CopyFile(src, dst, perm)`** — copies a file from the in-memory tree to the real filesystem.
- **`CopyTo(target Medium, src, dst)`** — copies a file or directory tree to any other `Medium`.
- **`ReadFile(name)`** — returns a defensive copy of file content, preventing callers from mutating internal state.
### datanode.Medium
**File:** `datanode/client.go`
A thread-safe `Medium` backed by Borg's `DataNode` (an in-memory `fs.FS` with tar serialisation). It adds:
- **`sync.RWMutex`** on every operation for concurrent safety.
- **Explicit directory tracking** via a `map[string]bool`, since `DataNode` only stores files.
- **`Snapshot()` / `Restore(data)`** — serialise and deserialise the entire filesystem as a tarball.
- **`DataNode()`** — exposes the underlying Borg DataNode for integration with TIM containers.
- **File deletion** is handled by rebuilding the DataNode without the target file, since the upstream type does not expose a `Remove` method.
## store Package
**Files:** `store/store.go`, `store/medium.go`
The store package provides two complementary APIs:
### Store (key-value)
A group-namespaced key-value store backed by SQLite:
```sql
CREATE TABLE IF NOT EXISTS kv (
grp TEXT NOT NULL,
key TEXT NOT NULL,
value TEXT NOT NULL,
PRIMARY KEY (grp, key)
)
```
Operations: `Get`, `Set`, `Delete`, `Count`, `DeleteGroup`, `GetAll`, `Render`.
The `Render` method loads all key-value pairs from a group into a `map[string]string` and executes a Go `text/template` against them:
```go
s.Set("user", "pool", "pool.lthn.io:3333")
s.Set("user", "wallet", "iz...")
out, _ := s.Render(`{"pool":"{{ .pool }}"}`, "user")
// out: {"pool":"pool.lthn.io:3333"}
```
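For readers unfamiliar with `text/template`, the rendering step can be reproduced with the standard library alone. The sketch below assumes nothing about the `Store` internals; `render` is a hypothetical helper that takes the already-loaded `map[string]string` directly.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes a text/template against a map of values, which is the
// technique Render is described as using. Hypothetical helper.
func render(tmpl string, values map[string]string) (string, error) {
	t, err := template.New("render").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, values); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := render(`{"pool":"{{ .pool }}"}`, map[string]string{"pool": "pool.lthn.io:3333"})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out) // {"pool":"pool.lthn.io:3333"}
}
```

Note that `text/template` performs no escaping, so a stored value containing a double quote would produce invalid JSON in a template like this one.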
### store.Medium (Medium adapter)
Wraps a `Store` to satisfy the `Medium` interface. Paths are split as `group/key`:
- `Read("config/theme")` calls `Get("config", "theme")`
- `List("")` returns all groups as directories
- `List("config")` returns all keys in the `config` group as files
- `IsDir("config")` returns true if the group has entries
You can create it directly (`NewMedium(":memory:")`) or adapt an existing store (`store.AsMedium()`).
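The `group/key` mapping can be sketched as a small path-splitting function. `splitGroupKey` is a hypothetical name, and treating everything after the first slash as the key is an assumption; `store/medium.go` may handle deeper paths differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// splitGroupKey turns a Medium-style path into a store group and key.
// An empty key means the path names a group (a "directory").
// Hypothetical helper; nested paths keep the remainder as the key.
func splitGroupKey(p string) (group, key string) {
	cleaned := strings.TrimPrefix(path.Clean("/"+p), "/")
	group, key, _ = strings.Cut(cleaned, "/")
	return group, key
}

func main() {
	fmt.Println(splitGroupKey("config/theme")) // config theme
	group, key := splitGroupKey("config")
	fmt.Printf("group=%q key=%q\n", group, key) // group="config" key=""
}
```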
## sigil Package
**Files:** `sigil/sigil.go`, `sigil/sigils.go`, `sigil/crypto_sigil.go`
The sigil package implements composable, reversible data transformations.
### Interface
```go
type Sigil interface {
In(data []byte) ([]byte, error) // forward transform
Out(data []byte) ([]byte, error) // reverse transform
}
```
Contracts:
- Reversible sigils: `Out(In(x)) == x`
- Irreversible sigils (hashes): `Out` returns input unchanged
- Symmetric sigils (reverse): `In(x) == Out(x)`
- `nil` input returns `nil` without error
- Empty input returns empty without error
### Available Sigils
Created via `NewSigil(name)`:
| Name | Type | Reversible |
|------|------|------------|
| `reverse` | Byte reversal | Yes (symmetric) |
| `hex` | Hexadecimal encoding | Yes |
| `base64` | Base64 encoding | Yes |
| `gzip` | Gzip compression | Yes |
| `json` | JSON compaction | No (`Out` is passthrough) |
| `json-indent` | JSON pretty-printing | No (`Out` is passthrough) |
| `md4`, `md5`, `sha1` | Legacy hashes | No |
| `sha224` .. `sha512` | SHA-2 family | No |
| `sha3-224` .. `sha3-512` | SHA-3 family | No |
| `sha512-224`, `sha512-256` | Truncated SHA-512 | No |
| `ripemd160` | RIPEMD-160 | No |
| `blake2s-256` | BLAKE2s | No |
| `blake2b-256` .. `blake2b-512` | BLAKE2b | No |
### Pipeline Functions
```go
// Apply sigils left-to-right.
encoded, _ := sigil.Transmute(data, []sigil.Sigil{gzipSigil, hexSigil})
// Reverse sigils right-to-left.
original, _ := sigil.Untransmute(encoded, []sigil.Sigil{gzipSigil, hexSigil})
```
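The ordering rule (forward left-to-right, reverse right-to-left) is easy to get wrong, so a standalone sketch may help. The functions `transmute` and `untransmute` below are hypothetical re-implementations of the idea, and `funcSigil` is an adapter invented for this example; nil handling is omitted for brevity.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

type Sigil interface {
	In(data []byte) ([]byte, error)
	Out(data []byte) ([]byte, error)
}

// funcSigil adapts a pair of functions to the Sigil interface.
// Hypothetical type, used only to keep this sketch short.
type funcSigil struct {
	in, out func([]byte) ([]byte, error)
}

func (f funcSigil) In(d []byte) ([]byte, error)  { return f.in(d) }
func (f funcSigil) Out(d []byte) ([]byte, error) { return f.out(d) }

var hexSigil = funcSigil{
	in:  func(d []byte) ([]byte, error) { return []byte(hex.EncodeToString(d)), nil },
	out: func(d []byte) ([]byte, error) { return hex.DecodeString(string(d)) },
}

var base64Sigil = funcSigil{
	in:  func(d []byte) ([]byte, error) { return []byte(base64.StdEncoding.EncodeToString(d)), nil },
	out: func(d []byte) ([]byte, error) { return base64.StdEncoding.DecodeString(string(d)) },
}

// transmute applies each sigil's In in slice order.
func transmute(data []byte, sigils []Sigil) ([]byte, error) {
	var err error
	for _, s := range sigils {
		if data, err = s.In(data); err != nil {
			return nil, err
		}
	}
	return data, nil
}

// untransmute applies each sigil's Out in reverse slice order, so the
// same slice can be passed to both functions.
func untransmute(data []byte, sigils []Sigil) ([]byte, error) {
	var err error
	for i := len(sigils) - 1; i >= 0; i-- {
		if data, err = sigils[i].Out(data); err != nil {
			return nil, err
		}
	}
	return data, nil
}

func main() {
	chain := []Sigil{base64Sigil, hexSigil}
	encoded, _ := transmute([]byte("hi"), chain)
	decoded, _ := untransmute(encoded, chain)
	fmt.Printf("%s %s\n", encoded, decoded) // 61476b3d hi
}
```

Passing the same slice to both functions only works because the reverse pass iterates from the end; applying `Out` in forward order would try to base64-decode hex text and fail.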
### Authenticated Encryption: ChaChaPolySigil
`ChaChaPolySigil` provides XChaCha20-Poly1305 authenticated encryption with a pre-obfuscation layer. It implements the `Sigil` interface, so it composes naturally into pipelines.
**Encryption flow:**
```
plaintext -> obfuscate(nonce) -> XChaCha20-Poly1305 encrypt -> [nonce || ciphertext || tag]
```
**Decryption flow:**
```
[nonce || ciphertext || tag] -> decrypt -> deobfuscate(nonce) -> plaintext
```
The pre-obfuscation layer is intended to keep raw plaintext patterns from reaching the CPU encryption routines directly, as defence-in-depth against side-channel attacks. It is applied before the AEAD step and does not replace it. Two obfuscators are provided:
- **`XORObfuscator`** (default) — XORs data with a SHA-256 counter-mode key stream derived from the nonce.
- **`ShuffleMaskObfuscator`** — applies XOR masking followed by a deterministic Fisher-Yates byte shuffle, making both value and position analysis more difficult.
```go
key := make([]byte, 32)
rand.Read(key) // crypto/rand, not math/rand
s, _ := sigil.NewChaChaPolySigil(key)
ciphertext, _ := s.In([]byte("secret"))
plaintext, _ := s.Out(ciphertext)
// With stronger obfuscation:
s2, _ := sigil.NewChaChaPolySigilWithObfuscator(key, &sigil.ShuffleMaskObfuscator{})
```
Each call to `In` generates a fresh random nonce, so encrypting the same plaintext twice produces different ciphertexts.
## workspace Package
**File:** `workspace/service.go`
A higher-level service that integrates with the Core DI container (`forge.lthn.ai/core/go`). It manages encrypted workspaces stored under `~/.core/workspaces/`.
Each workspace:
- Is identified by a SHA-256 hash of the user-provided identifier
- Contains subdirectories: `config/`, `log/`, `data/`, `files/`, `keys/`
- Has a PGP keypair generated via the Core crypt service
- Supports file get/set operations on the `files/` subdirectory
- Handles IPC events (`workspace.create`, `workspace.switch`) for integration with the Core message bus
The workspace service implements `core.Workspace` and uses `io.Local` as its storage medium.
## Data Flow Summary
```
Application code
|
v
io.Medium (interface)
|
+-- local.Medium --> os package (with sandbox validation)
+-- s3.Medium --> AWS SDK S3 client
+-- sqlite.Medium --> modernc.org/sqlite
+-- node.Node --> in-memory map + tar serialisation
+-- datanode.Medium --> Borg DataNode + sync.RWMutex
+-- store.Medium --> store.Store (SQLite KV) --> Medium adapter
+-- MockMedium --> map[string]string (for tests)
```
Every backend normalises paths using the same `Clean("/" + p)` pattern (`filepath.Clean` in the local backend, `path.Clean` elsewhere), ensuring consistent behaviour regardless of which backend is in use.

docs/development.md

@ -0,0 +1,216 @@
---
title: Development
description: How to build, test, and contribute to go-io.
---
# Development
This guide covers everything needed to work on `go-io` locally.
## Prerequisites
- **Go 1.26.0** or later
- **No C compiler required** -- all dependencies (including SQLite) are pure Go
- The module is part of the Go workspace at `~/Code/go.work`. If you are working outside that workspace, ensure `GOPRIVATE=forge.lthn.ai/*` is set so the Go toolchain can fetch private dependencies.
## Building
`go-io` is a library with no binary output. To verify it compiles:
```bash
cd /path/to/go-io
go build ./...
```
If using the Core CLI:
```bash
core go fmt # format
core go vet # static analysis
core go lint # linter
core go test # run all tests
core go qa # fmt + vet + lint + test
core go qa full # + race detector, vulnerability scan, security audit
```
## Running Tests
All packages have thorough test suites. Tests use `testify/assert` and `testify/require` for assertions.
```bash
# All tests
go test ./...
# A single package
go test ./sigil/
# A single test by name
go test ./local/ -run TestValidatePath_Security
# With race detector
go test -race ./...
# With coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
```
Or via the Core CLI:
```bash
core go test
core go test --run TestChaChaPolySigil_Good_RoundTrip
core go cov --open
```
## Test Naming Convention
Tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
| Suffix | Meaning |
|--------|---------|
| `_Good` | Happy path -- the operation succeeds as expected |
| `_Bad` | Expected error conditions -- missing files, invalid input, permission denied |
| `_Ugly` | Edge cases and boundary conditions -- nil input, empty paths, panics |
Example:
```go
func TestDelete_Good(t *testing.T) { /* deletes a file successfully */ }
func TestDelete_Bad_NotFound(t *testing.T) { /* returns error for missing file */ }
func TestDelete_Bad_DirNotEmpty(t *testing.T) { /* returns error for non-empty dir */ }
```
## Writing Tests Against Medium
Use `MockMedium` from the root package for unit tests that need a storage backend but should not touch disk:
```go
func TestMyFeature(t *testing.T) {
m := io.NewMockMedium()
m.Files["config.yaml"] = "key: value"
m.Dirs["data"] = true
// Your code under test receives m as an io.Medium
_, err := myFunction(m) // myFunction is a placeholder; its result is not needed here
assert.NoError(t, err)
assert.Equal(t, "expected", m.Files["output.txt"])
}
```
For tests that need a real but ephemeral filesystem, use `local.New` with `t.TempDir()`:
```go
func TestWithRealFS(t *testing.T) {
m, err := local.New(t.TempDir())
require.NoError(t, err)
_ = m.Write("file.txt", "hello")
content, _ := m.Read("file.txt")
assert.Equal(t, "hello", content)
}
```
For SQLite-backed tests, use `:memory:`:
```go
func TestWithSQLite(t *testing.T) {
m, err := sqlite.New(":memory:")
require.NoError(t, err)
defer m.Close()
_ = m.Write("file.txt", "hello")
}
```
## Adding a New Backend
To add a new `Medium` implementation:
1. Create a new package directory (e.g., `sftp/`).
2. Define a struct that implements all 18 methods of `io.Medium`.
3. Add a compile-time check at the top of your file:
```go
var _ coreio.Medium = (*Medium)(nil)
```
4. Normalise paths using `path.Clean("/" + p)` to prevent traversal escapes. This is the convention followed by every existing backend.
5. Handle `nil` and empty input consistently: check how `MockMedium` and `local.Medium` behave and match that behaviour.
6. Write tests using the `_Good` / `_Bad` / `_Ugly` naming convention.
7. Add your package to the table in `docs/index.md`.
## Adding a New Sigil
To add a new data transformation:
1. Create a struct in `sigil/` that implements the `Sigil` interface (`In` and `Out`).
2. Handle `nil` input by returning `nil, nil`.
3. Handle empty input by returning `[]byte{}, nil`.
4. Register it in the `NewSigil` factory function in `sigils.go`.
5. Add tests covering `_Good` (round-trip), `_Bad` (invalid input), and `_Ugly` (nil/empty edge cases).
## Code Style
- **UK English** in comments and documentation: colour, organisation, centre, serialise, defence.
- **Explicit types** (the Go analogue of PHP's `declare(strict_types=1)`): all functions have explicit parameter and return types.
- Errors use the `go-log` helper: `coreerr.E("package.Method", "what failed", underlyingErr)`.
- No blank imports except for database drivers (`_ "modernc.org/sqlite"`).
- Formatting: standard `gofmt` / `goimports`.
## Project Structure
```
go-io/
├── io.go # Medium interface, helpers, MockMedium
├── client_test.go # Tests for MockMedium and helpers
├── bench_test.go # Benchmarks
├── go.mod
├── local/
│ ├── client.go # Local filesystem backend
│ └── client_test.go
├── s3/
│ ├── s3.go # S3 backend
│ └── s3_test.go
├── sqlite/
│ ├── sqlite.go # SQLite virtual filesystem
│ └── sqlite_test.go
├── node/
│ ├── node.go # In-memory fs.FS + Medium
│ └── node_test.go
├── datanode/
│ ├── client.go # Borg DataNode Medium wrapper
│ └── client_test.go
├── store/
│ ├── store.go # KV store
│ ├── medium.go # Medium adapter for KV store
│ ├── store_test.go
│ └── medium_test.go
├── sigil/
│ ├── sigil.go # Sigil interface, Transmute/Untransmute
│ ├── sigils.go # Built-in sigils (hex, base64, gzip, hash, etc.)
│ ├── crypto_sigil.go # ChaChaPolySigil + obfuscators
│ ├── sigil_test.go
│ └── crypto_sigil_test.go
├── workspace/
│ ├── service.go # Encrypted workspace service
│ └── service_test.go
├── docs/ # This documentation
└── .core/
├── build.yaml # Build configuration
└── release.yaml # Release configuration
```
## Licence
EUPL-1.2

docs/index.md

@ -0,0 +1,121 @@
---
title: go-io
description: Unified storage abstraction for Go with pluggable backends — local filesystem, S3, SQLite, in-memory, and key-value.
---
# go-io
`forge.lthn.ai/core/go-io` is a storage abstraction library that provides a single `Medium` interface for reading and writing files across different backends. Write your code against `Medium` once, then swap between local disk, S3, SQLite, or in-memory storage without changing a line of business logic.
The library also includes `sigil`, a composable data-transformation pipeline for encoding, compression, hashing, and authenticated encryption.
## Quick Start
```go
import (
io "forge.lthn.ai/core/go-io"
"forge.lthn.ai/core/go-io/s3"
"forge.lthn.ai/core/go-io/node"
)
// Use the pre-initialised local filesystem (unsandboxed, rooted at "/").
content, _ := io.Local.Read("/etc/hostname")
// Create a sandboxed medium restricted to a single directory.
sandbox, _ := io.NewSandboxed("/var/data/myapp")
_ = sandbox.Write("config.yaml", "key: value")
// In-memory filesystem with tar serialisation.
mem := node.New()
mem.AddData("hello.txt", []byte("world"))
tarball, _ := mem.ToTar()
// S3 backend (requires an *s3.Client from the AWS SDK).
bucket, _ := s3.New("my-bucket", s3.WithClient(awsClient), s3.WithPrefix("uploads/"))
_ = bucket.Write("photo.jpg", rawData)
```
## Package Layout
| Package | Import Path | Purpose |
|---------|-------------|---------|
| `io` (root) | `forge.lthn.ai/core/go-io` | `Medium` interface, helper functions, `MockMedium` for tests |
| `local` | `forge.lthn.ai/core/go-io/local` | Local filesystem backend with path sandboxing and symlink-escape protection |
| `s3` | `forge.lthn.ai/core/go-io/s3` | Amazon S3 / S3-compatible backend (Garage, MinIO, etc.) |
| `sqlite` | `forge.lthn.ai/core/go-io/sqlite` | SQLite-backed virtual filesystem (pure Go driver, no CGO) |
| `node` | `forge.lthn.ai/core/go-io/node` | In-memory filesystem implementing both `Medium` and `fs.FS`, with tar round-tripping |
| `datanode` | `forge.lthn.ai/core/go-io/datanode` | Thread-safe in-memory `Medium` backed by Borg's DataNode, with snapshot/restore |
| `store` | `forge.lthn.ai/core/go-io/store` | Group-namespaced key-value store (SQLite), with a `Medium` adapter and Go template rendering |
| `sigil` | `forge.lthn.ai/core/go-io/sigil` | Composable data transformations: encoding, compression, hashing, XChaCha20-Poly1305 encryption |
| `workspace` | `forge.lthn.ai/core/go-io/workspace` | Encrypted workspace service integrated with the Core DI container |
## The Medium Interface
Every storage backend implements the same 18-method interface:
```go
type Medium interface {
// Content operations
Read(path string) (string, error)
Write(path, content string) error
FileGet(path string) (string, error) // alias for Read
FileSet(path, content string) error // alias for Write
// Streaming (for large files)
ReadStream(path string) (io.ReadCloser, error)
WriteStream(path string) (io.WriteCloser, error)
Open(path string) (fs.File, error)
Create(path string) (io.WriteCloser, error)
Append(path string) (io.WriteCloser, error)
// Directory operations
EnsureDir(path string) error
List(path string) ([]fs.DirEntry, error)
// Metadata
Stat(path string) (fs.FileInfo, error)
Exists(path string) bool
IsFile(path string) bool
IsDir(path string) bool
// Mutation
Delete(path string) error
DeleteAll(path string) error
Rename(oldPath, newPath string) error
}
```
All backends implement this interface fully. Backends where a method has no natural equivalent (e.g., `EnsureDir` on S3) provide a safe no-op.
## Cross-Medium Operations
The root package provides helper functions that accept any `Medium`:
```go
// Copy a file between any two backends.
err := io.Copy(localMedium, "source.txt", s3Medium, "dest.txt")
// Read/Write wrappers that take an explicit medium.
content, err := io.Read(medium, "path")
err = io.Write(medium, "path", "content")
```
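The read-then-write approach behind `Copy` can be sketched against a two-method subset of `Medium`. Everything here is illustrative: `readWriter`, `memMedium`, and `copyFile` are hypothetical names, and the real helper may wrap errors differently or stream large files.

```go
package main

import (
	"errors"
	"fmt"
)

// readWriter is the subset of Medium this sketch needs.
type readWriter interface {
	Read(path string) (string, error)
	Write(path, content string) error
}

// memMedium is a toy map-backed backend used only for this example.
type memMedium map[string]string

func (m memMedium) Read(p string) (string, error) {
	content, ok := m[p]
	if !ok {
		return "", errors.New("not found: " + p)
	}
	return content, nil
}

func (m memMedium) Write(p, content string) error {
	m[p] = content
	return nil
}

// copyFile reads from one backend and writes to another. Neither side
// needs to know what kind of storage the other one is.
func copyFile(src readWriter, srcPath string, dst readWriter, dstPath string) error {
	content, err := src.Read(srcPath)
	if err != nil {
		return fmt.Errorf("copy: read %s: %w", srcPath, err)
	}
	if err := dst.Write(dstPath, content); err != nil {
		return fmt.Errorf("copy: write %s: %w", dstPath, err)
	}
	return nil
}

func main() {
	src := memMedium{"source.txt": "hello"}
	dst := memMedium{}
	if err := copyFile(src, "source.txt", dst, "dest.txt"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(dst["dest.txt"])
}
```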
## Dependencies
| Dependency | Role |
|------------|------|
| `forge.lthn.ai/core/go-log` | Structured error helper (`E()`) |
| `forge.lthn.ai/Snider/Borg` | DataNode in-memory FS (used by `datanode` package) |
| `github.com/aws/aws-sdk-go-v2` | S3 client (used by `s3` package) |
| `golang.org/x/crypto` | BLAKE2, SHA-3, RIPEMD-160, XChaCha20-Poly1305 (used by `sigil`) |
| `modernc.org/sqlite` | Pure Go SQLite driver (used by `sqlite` and `store`) |
| `github.com/stretchr/testify` | Test assertions |
Go version: **1.26.0**
Licence: **EUPL-1.2**