---
title: Architecture
description: Internal design of go-io — the Medium interface, backend implementations, sigil transformation pipeline, and security model.
---
# Architecture
This document explains how `go-io` is structured internally, how the key types relate to one another, and how data flows through the system.
## Design Principles
1. **One interface, many backends.** All storage operations go through `Medium`. Business logic never imports a specific backend.
2. **Path sandboxing by default.** The local backend validates every path against symlink escapes. Sandboxed mediums cannot reach outside their root.
3. **Composable transforms.** The `sigil` package lets you chain encoding, compression, and encryption steps into reversible pipelines.
4. **No CGO.** The SQLite driver (`modernc.org/sqlite`) is pure Go. The entire module compiles without a C toolchain.
## Core Types
### Medium (root package)
The `Medium` interface is defined in `io.go`. It is the only type that consuming code needs to know about. The root package also provides:
- **`io.Local`** — a package-level variable initialised in `init()` via `local.New("/")`. This gives unsandboxed access to the host filesystem, mirroring the behaviour of the standard `os` package.
- **`io.NewSandboxed(root)`** — creates a `local.Medium` restricted to `root`. All path resolution is confined within that directory.
- **`io.Copy(src, srcPath, dst, dstPath)`** — copies a file between any two mediums by reading from one and writing to the other.
- **`io.NewMemoryMedium()`** — a fully functional in-memory implementation for unit tests. It tracks files, directories, and modification times in plain maps.
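
A cut-down interface is enough to show the "one interface, many backends" idea and why `io.Copy` works between any two mediums. This is a minimal, self-contained illustration, not the package's code. The two-method `Medium`, the map-backed `memoryMedium`, and `copyFile` are all hypothetical. The assumption that `Read` returns file content as a `string` is inferred from `MemoryMedium` being backed by `map[string]string`.

```go
package main

import (
	"fmt"
	"io/fs"
)

// Medium is a hypothetical two-method subset of go-io's Medium interface,
// used only to illustrate the shape of the abstraction.
type Medium interface {
	Read(path string) (string, error)
	Write(path, content string) error
}

// memoryMedium is a hypothetical map-backed Medium, similar in spirit to
// io.NewMemoryMedium (which also tracks directories and modification times).
type memoryMedium struct {
	files map[string]string
}

func newMemoryMedium() *memoryMedium {
	return &memoryMedium{files: make(map[string]string)}
}

func (m *memoryMedium) Read(path string) (string, error) {
	content, ok := m.files[path]
	if !ok {
		return "", &fs.PathError{Op: "read", Path: path, Err: fs.ErrNotExist}
	}
	return content, nil
}

func (m *memoryMedium) Write(path, content string) error {
	m.files[path] = content
	return nil
}

// copyFile mirrors the idea behind io.Copy: read from one medium and write
// to another. It depends only on the interface, so any two backends work.
func copyFile(src Medium, srcPath string, dst Medium, dstPath string) error {
	content, err := src.Read(srcPath)
	if err != nil {
		return err
	}
	return dst.Write(dstPath, content)
}

func main() {
	src, dst := newMemoryMedium(), newMemoryMedium()
	_ = src.Write("notes/todo.txt", "ship it")
	if err := copyFile(src, "notes/todo.txt", dst, "backup/todo.txt"); err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	content, _ := dst.Read("backup/todo.txt")
	fmt.Println(content) // ship it
}
```

Because `copyFile` only touches the interface, replacing either argument with an S3 or SQLite backend would not change it. The design principles rely on exactly this property.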
### FileInfo and DirEntry (root package)
Simple struct implementations of `fs.FileInfo` and `fs.DirEntry` are exported from the root package for use in mocks and tests. Each backend also defines its own unexported equivalents internally.
## Backend Implementations
### local.Medium
**File:** `local/medium.go`
The local backend wraps the standard `os` package with two layers of path protection:
1. **`path(p string)`** — normalises paths using `filepath.Clean("/" + p)` before joining with the root. This neutralises `..` traversal at the string level, so `../../etc/passwd` becomes `<root>/etc/passwd`. The unsandboxed medium is the exception. When root is `"/"`, absolute paths pass through unchanged and relative paths resolve against the working directory, as they would with the `os` package.
2. **`validatePath(p string)`** — walks each path component, calling `filepath.EvalSymlinks` at every step. If any resolved component lands outside the root, it logs a security event to stderr (including timestamp, root, attempted path, and OS username) and returns `os.ErrPermission`.
Delete operations refuse to remove `"/"` or the user's home directory as an additional safety rail.
```
caller -> path(p) -> validatePath(p) -> os.ReadFile / os.WriteFile / ...
```
### s3.Medium
**File:** `s3/s3.go`
The S3 backend translates `Medium` operations into AWS SDK calls. Key design decisions:
- **Key construction:** `key(p)` uses `path.Clean("/" + p)` to sandbox traversal, then prepends the optional prefix. This means `../secret` resolves to `secret`, not an escape.
- **Directory semantics:** S3 has no real directories. `EnsureDir` is a no-op. `IsDir` and `Exists` for directory-like paths use `ListObjectsV2` with `MaxKeys: 1` to check for objects under the prefix.
- **Rename:** Implemented as copy-then-delete, since S3 has no atomic rename.
- **Append:** Downloads existing content, appends in memory, re-uploads on `Close()`. This is the only viable approach given S3's immutable-object model.
- **Testability:** The `Client` interface abstracts the six SDK methods used. Tests inject a `mockS3` that stores objects in a `map[string][]byte` with a `sync.RWMutex`.
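
The standard `path` package can express the key construction in a few lines. This sketch rests on two assumptions: the prefix is passed as a parameter, and it is joined with `path.Join`. The real `key(p)` presumably reads the prefix from the medium instead.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// key sketches S3 key construction: clean against a virtual "/" so ".."
// cannot climb out, strip the leading slash (S3 keys conventionally have
// none), then prepend the optional prefix.
func key(prefix, p string) string {
	cleaned := strings.TrimPrefix(path.Clean("/"+p), "/")
	if prefix == "" {
		return cleaned
	}
	return path.Join(prefix, cleaned)
}

func main() {
	fmt.Println(key("", "../secret"))                // secret
	fmt.Println(key("tenant-a", "../../etc/passwd")) // tenant-a/etc/passwd
}
```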
### sqlite.Medium
**File:** `sqlite/sqlite.go`
Stores files and directories as rows in a single SQLite table:
```sql
CREATE TABLE IF NOT EXISTS files (
    path    TEXT PRIMARY KEY,
    content BLOB NOT NULL,
    mode    INTEGER DEFAULT 420, -- 0644
    is_dir  BOOLEAN DEFAULT FALSE,
    mtime   DATETIME DEFAULT CURRENT_TIMESTAMP
)
```
- **WAL mode** is enabled at connection time for better concurrent read performance.
- **Path cleaning** uses the same `path.Clean("/" + p)` pattern as other backends.
- **Rename** is transactional: it reads the source row, inserts at the destination, deletes the source, and moves all children (if it is a directory) within a single transaction.
- **Custom tables** are supported via `sqlite.Options{Path: ":memory:", Table: "name"}` to allow multiple logical filesystems in one database.
- **`:memory:`** databases work out of the box for tests.
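
The subtle part of a transactional rename is deciding which child rows move. The hypothetical `renamedPath` helper below isolates that rule. In the real backend, the equivalent update runs inside the SQL transaction, and this document does not show its exact statements.

```go
package main

import (
	"fmt"
	"strings"
)

// renamedPath reports where the row stored at p ends up when src is renamed
// to dst: p moves if it is src itself or lies beneath src. Matching on
// src+"/" stops a rename of "docs" from also catching "docs-old".
func renamedPath(p, src, dst string) (string, bool) {
	switch {
	case p == src:
		return dst, true
	case strings.HasPrefix(p, src+"/"):
		return dst + strings.TrimPrefix(p, src), true
	default:
		return p, false
	}
}

func main() {
	rows := []string{"docs", "docs/a.txt", "docs/img/b.png", "docs-old/c.txt"}
	for _, p := range rows {
		if moved, ok := renamedPath(p, "docs", "archive"); ok {
			fmt.Println(p, "->", moved)
		}
	}
}
```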
### node.Node
**File:** `node/node.go`
A pure in-memory filesystem that implements both `Medium` and the standard library's `fs.FS`, `fs.StatFS`, `fs.ReadDirFS`, and `fs.ReadFileFS` interfaces. Directories are implicit: a directory such as `a/b` exists whenever some stored file path begins with `a/b/`.
Key capabilities beyond `Medium`:
- **`ToTar()` / `FromTar()`** — serialise the entire tree to a tar archive and back. This enables snapshotting, transport, and archival.
- **`Walk()` with `WalkOptions`** — extends `fs.WalkDir` with `MaxDepth`, `Filter`, and `SkipErrors` controls.
- **`CopyFile(src, dst, perm)`** — copies a file from the in-memory tree to the real filesystem.
- **`CopyTo(target Medium, src, dst)`** — copies a file or directory tree to any other `Medium`.
- **`ReadFile(name)`** — returns a defensive copy of file content, preventing callers from mutating internal state.
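
A plain map is enough to sketch both implicit directories and defensive copies. The `tree` type and its methods are hypothetical stand-ins for `node.Node`'s storage. The linear scan in `isDir` is a simplification, and the real implementation may index paths differently.

```go
package main

import (
	"fmt"
	"strings"
)

// tree is a hypothetical stand-in for node.Node's storage: only files are
// stored, keyed by slash-separated path.
type tree map[string][]byte

// isDir treats a directory as existing whenever some stored file path has it
// as a prefix. The root ("" or ".") always exists.
func (t tree) isDir(name string) bool {
	name = strings.Trim(name, "/")
	if name == "" || name == "." {
		return true
	}
	for p := range t {
		if strings.HasPrefix(p, name+"/") {
			return true
		}
	}
	return false
}

// readFile returns a defensive copy, so callers cannot mutate stored bytes.
func (t tree) readFile(name string) ([]byte, bool) {
	data, ok := t[name]
	if !ok {
		return nil, false
	}
	out := make([]byte, len(data))
	copy(out, data)
	return out, true
}

func main() {
	files := tree{"docs/guide/intro.md": []byte("# Intro")}
	fmt.Println(files.isDir("docs"), files.isDir("docs/guide"), files.isDir("docs/guide/intro.md")) // true true false
}
```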
### datanode.Medium
**File:** `datanode/medium.go`
A thread-safe `Medium` backed by Borg's `DataNode` (an in-memory `fs.FS` with tar serialisation). It adds:
- **`sync.RWMutex`** locking around every operation, making the medium safe for concurrent use.
- **Explicit directory tracking** via a `map[string]bool`, since `DataNode` only stores files.
- **`Snapshot()` / `Restore(data)`** — serialise and deserialise the entire filesystem as a tarball.
- **`DataNode()`** — exposes the underlying Borg DataNode for integration with TIM containers.
- **File deletion** is handled by rebuilding the DataNode without the target file, since the upstream type does not expose a `Remove` method.
## store Package
**Files:** `store/store.go`, `store/medium.go`
The store package provides two complementary APIs:
### KeyValueStore (key-value)
A group-namespaced key-value store backed by SQLite:
```sql
CREATE TABLE IF NOT EXISTS kv (
    grp   TEXT NOT NULL,
    key   TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (grp, key)
)
```
Operations: `Get`, `Set`, `Delete`, `Count`, `DeleteGroup`, `GetAll`, `Render`.
The `Render` method loads all key-value pairs from a group into a `map[string]string` and executes a Go `text/template` against them:
```go
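// In a test; t is *testing.T, assert and require come from testify.
keyValueStore, err := store.New(store.Options{Path: ":memory:"})
require.NoError(t, err)
keyValueStore.Set("user", "pool", "pool.lthn.io:3333")
keyValueStore.Set("user", "wallet", "iz...")
// Every key in the "user" group becomes a template field, e.g. {{ .pool }}.
renderedText, err := keyValueStore.Render(`{"pool":"{{ .pool }}"}`, "user")
require.NoError(t, err)
assert.Equal(t, `{"pool":"pool.lthn.io:3333"}`, renderedText)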
```
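Underneath, this amounts to standard `text/template` execution against the group's `map[string]string`. The sketch below makes two simplifications: `render` is a hypothetical helper, and the map is passed in directly rather than loaded from SQLite.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render is a hypothetical helper mirroring what Render does once a group
// has been loaded: parse the template, then execute it against the map.
func render(tmpl string, values map[string]string) (string, error) {
	t, err := template.New("render").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, values); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	group := map[string]string{"pool": "pool.lthn.io:3333", "wallet": "iz..."}
	text, err := render(`{"pool":"{{ .pool }}"}`, group)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(text) // {"pool":"pool.lthn.io:3333"}
}
```

With the default options, a key missing from the group renders as `<no value>`. Setting `template.Option("missingkey=error")` would turn that into an execution error instead. This document does not say which behaviour `Render` uses.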
### store.Medium (Medium adapter)
Wraps a `KeyValueStore` to satisfy the `Medium` interface. Paths are split as `group/key`:
- `Read("config/theme")` calls `Get("config", "theme")`
- `List("")` returns all groups as directories
- `List("config")` returns all keys in the `config` group as files
- `IsDir("config")` returns true if the group has entries
You can create it directly (`store.NewMedium(store.Options{Path: ":memory:"})`) or adapt an existing store (`keyValueStore.AsMedium()`).
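
The `group/key` mapping can be sketched with `strings.Cut`. The `splitPath` helper is hypothetical. It also assumes that everything after the first `/` forms the key, so `config/a/b` maps to key `a/b`. The document does not say how the adapter handles deeper paths.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// splitPath is a hypothetical sketch of mapping a Medium path onto
// (group, key): clean the path, then treat the first segment as the group
// and the remainder as the key. An empty key means a group-level operation.
func splitPath(p string) (group, key string) {
	cleaned := strings.TrimPrefix(path.Clean("/"+p), "/")
	group, key, _ = strings.Cut(cleaned, "/")
	return group, key
}

func main() {
	for _, p := range []string{"config/theme", "config", ""} {
		group, key := splitPath(p)
		fmt.Printf("%q -> group %q, key %q\n", p, group, key)
	}
}
```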
## sigil Package
**Files:** `sigil/sigil.go`, `sigil/sigils.go`, `sigil/crypto_sigil.go`
The sigil package implements composable, reversible data transformations.
### Interface
```go
type Sigil interface {
	In(data []byte) ([]byte, error)
	Out(data []byte) ([]byte, error)
}
```
Contracts:
- Reversible sigils: `Out(In(x)) == x`
- Irreversible sigils (hashes): `Out` returns input unchanged
- Symmetric sigils (reverse): `In(x) == Out(x)`
- `nil` input returns `nil` without error
- Empty input returns empty without error
### Available Sigils
Created via `NewSigil(name)`:
| Name | Type | Reversible |
|------|------|------------|
| `reverse` | Byte reversal | Yes (symmetric) |
| `hex` | Hexadecimal encoding | Yes |
| `base64` | Base64 encoding | Yes |
| `gzip` | Gzip compression | Yes |
| `json` | JSON compaction | No (`Out` is passthrough) |
| `json-indent` | JSON pretty-printing | No (`Out` is passthrough) |
| `md4`, `md5`, `sha1` | Legacy hashes | No |
| `sha224` .. `sha512` | SHA-2 family | No |
| `sha3-224` .. `sha3-512` | SHA-3 family | No |
| `sha512-224`, `sha512-256` | Truncated SHA-512 | No |
| `ripemd160` | RIPEMD-160 | No |
| `blake2s-256` | BLAKE2s | No |
| `blake2b-256` .. `blake2b-512` | BLAKE2b | No |
### Pipeline Functions
```go
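// Transmute applies In left to right (gzip, then hex); gzipSigil and hexSigil are e.g. built with NewSigil.
encoded, _ := sigil.Transmute(data, []sigil.Sigil{gzipSigil, hexSigil})
// Untransmute takes the same order and applies Out in reverse (hex-decode, then gunzip).
original, _ := sigil.Untransmute(encoded, []sigil.Sigil{gzipSigil, hexSigil})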
```
### Authenticated Encryption: ChaChaPolySigil
`ChaChaPolySigil` provides XChaCha20-Poly1305 authenticated encryption with a pre-obfuscation layer. It implements the `Sigil` interface, so it composes naturally into pipelines.
**Encryption flow:**
```
plaintext -> obfuscate(nonce) -> XChaCha20-Poly1305 encrypt -> [nonce || ciphertext || tag]
```
**Decryption flow:**
```
[nonce || ciphertext || tag] -> decrypt -> deobfuscate(nonce) -> plaintext
```
The pre-obfuscation layer ensures that raw plaintext patterns are never passed directly to the cipher routines, as a defence-in-depth measure against side-channel attacks. Two obfuscators are provided:
- **`XORObfuscator`** (default) — XORs data with a SHA-256 counter-mode key stream derived from the nonce.
- **`ShuffleMaskObfuscator`** — applies XOR masking followed by a deterministic Fisher-Yates byte shuffle, making both value and position analysis more difficult.
```go
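key := make([]byte, 32) // XChaCha20-Poly1305 takes a 256-bit key
if _, err := rand.Read(key); err != nil { // crypto/rand, never math/rand
	panic(err)
}
cipherSigil, _ := sigil.NewChaChaPolySigil(key, nil) // nil selects the default XORObfuscator
ciphertext, _ := cipherSigil.In([]byte("secret"))
plaintext, _ := cipherSigil.Out(ciphertext)
// Same key, but with the shuffle-and-mask obfuscator instead of the default.
shuffleCipherSigil, _ := sigil.NewChaChaPolySigil(key, &sigil.ShuffleMaskObfuscator{})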
```
Each call to `In` generates a fresh random nonce, so encrypting the same plaintext twice produces different ciphertexts.
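
The standard library is enough to sketch the default obfuscation step. The block layout, `SHA-256(nonce || counter)` with a big-endian 64-bit counter, is an assumption about how the key stream is derived. `xorKeyStream` and `xorObfuscate` are hypothetical names. The XChaCha20-Poly1305 step itself (for example `chacha20poly1305.NewX` from `golang.org/x/crypto`) lives outside the standard library and is not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// xorKeyStream is a hypothetical SHA-256 counter-mode key stream:
// block i is SHA-256(nonce || big-endian uint64 i), truncated to n bytes.
func xorKeyStream(nonce []byte, n int) []byte {
	stream := make([]byte, 0, n+sha256.Size)
	var counter [8]byte
	for i := uint64(0); len(stream) < n; i++ {
		binary.BigEndian.PutUint64(counter[:], i)
		h := sha256.New()
		h.Write(nonce)
		h.Write(counter[:])
		stream = h.Sum(stream) // appends the 32-byte block
	}
	return stream[:n]
}

// xorObfuscate XORs data with the key stream. XOR is its own inverse, so the
// same function both obfuscates and deobfuscates.
func xorObfuscate(data, nonce []byte) []byte {
	stream := xorKeyStream(nonce, len(data))
	out := make([]byte, len(data))
	for i := range data {
		out[i] = data[i] ^ stream[i]
	}
	return out
}

func main() {
	nonce := make([]byte, 24) // XChaCha20-Poly1305 nonces are 24 bytes
	masked := xorObfuscate([]byte("secret"), nonce)
	fmt.Printf("%x\n", masked)
	fmt.Println(string(xorObfuscate(masked, nonce))) // secret
}
```

In this sketch the stream depends only on the nonce, which travels in the clear, so the masking adds no secrecy of its own. Confidentiality and integrity come from the AEAD layer.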
## workspace Package
**File:** `workspace/service.go`
A higher-level service that integrates with the Core DI container (`forge.lthn.ai/core/go`). It manages encrypted workspaces stored under `~/.core/workspaces/`.
Each workspace:
- Is identified by a SHA-256 hash of the user-provided identifier
- Contains subdirectories: `config/`, `log/`, `data/`, `files/`, `keys/`
- Has a PGP keypair generated via the Core crypt service
- Supports file get/set operations on the `files/` subdirectory
- Handles IPC events (`workspace.create`, `workspace.switch`) for integration with the Core message bus
The workspace service implements `core.Workspace` and uses `io.Local` as its storage medium.
## Data Flow Summary
```
Application code
|
v
io.Medium (interface)
|
+-- local.Medium    --> os package (with sandbox validation)
+-- s3.Medium       --> AWS SDK S3 client
+-- sqlite.Medium   --> modernc.org/sqlite
+-- node.Node       --> in-memory map + tar serialisation
+-- datanode.Medium --> Borg DataNode + sync.RWMutex
+-- store.Medium    --> store.KeyValueStore (SQLite KV)
+-- MemoryMedium    --> map[string]string (for tests)
```
Every backend normalises paths with the same clean-against-root pattern (`path.Clean("/" + p)`, or `filepath.Clean` in the local backend), so `..` traversal behaves consistently regardless of which backend is in use.