diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..7145304 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,277 @@ +--- +title: Architecture +description: Internal design of go-io — the Medium interface, backend implementations, sigil transformation pipeline, and security model. +--- + +# Architecture + +This document explains how `go-io` is structured internally, how the key types relate to one another, and how data flows through the system. + + +## Design Principles + +1. **One interface, many backends.** All storage operations go through `Medium`. Business logic never imports a specific backend. +2. **Path sandboxing by default.** The local backend validates every path against symlink escapes. Sandboxed mediums cannot reach outside their root. +3. **Composable transforms.** The `sigil` package lets you chain encoding, compression, and encryption steps into reversible pipelines. +4. **No CGO.** The SQLite driver (`modernc.org/sqlite`) is pure Go. The entire module compiles without a C toolchain. + + +## Core Types + +### Medium (root package) + +The `Medium` interface is defined in `io.go`. It is the only type that consuming code needs to know about. The root package also provides: + +- **`io.Local`** — a package-level variable initialised in `init()` via `local.New("/")`. This gives unsandboxed access to the host filesystem, mirroring the behaviour of the standard `os` package. +- **`io.NewSandboxed(root)`** — creates a `local.Medium` restricted to `root`. All path resolution is confined within that directory. +- **`io.Copy(src, srcPath, dst, dstPath)`** — copies a file between any two mediums by reading from one and writing to the other. +- **`io.MockMedium`** — a fully functional in-memory implementation for unit tests. It tracks files, directories, and modification times in plain maps. 
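The "one interface, many backends" idea and the read-then-write behaviour described for `io.Copy` can be illustrated with a small sketch. Everything here is hypothetical and kept local so the example compiles on its own: `Medium` is cut down to two methods, `memMedium` is a stand-in loosely modelled on the description of `MockMedium`, and `copyFile` is not the library's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Medium is a hypothetical two-method subset of the real interface,
// kept small so the sketch stays self-contained.
type Medium interface {
	Read(path string) (string, error)
	Write(path, content string) error
}

// memMedium is an illustrative in-memory backend that tracks files in a
// plain map. It is an assumption, not the library's MockMedium.
type memMedium struct {
	files map[string]string
}

func newMemMedium() *memMedium {
	return &memMedium{files: make(map[string]string)}
}

func (m *memMedium) Read(path string) (string, error) {
	content, ok := m.files[path]
	if !ok {
		return "", errors.New("file not found: " + path)
	}
	return content, nil
}

func (m *memMedium) Write(path, content string) error {
	m.files[path] = content
	return nil
}

// Compile-time check that memMedium satisfies Medium.
var _ Medium = (*memMedium)(nil)

// copyFile reads from one medium and writes to the other. It only sees
// the interface, so any pair of backends would work.
func copyFile(src Medium, srcPath string, dst Medium, dstPath string) error {
	content, err := src.Read(srcPath)
	if err != nil {
		return fmt.Errorf("copy: read %q: %w", srcPath, err)
	}
	if err := dst.Write(dstPath, content); err != nil {
		return fmt.Errorf("copy: write %q: %w", dstPath, err)
	}
	return nil
}

func main() {
	src, dst := newMemMedium(), newMemMedium()
	_ = src.Write("source.txt", "hello")
	if err := copyFile(src, "source.txt", dst, "dest.txt"); err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	content, _ := dst.Read("dest.txt")
	fmt.Println(content)
}
```

Because `copyFile` depends only on the interface, swapping either side for a different backend would not change the calling code, which is the property the design principles aim for.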
+ +### FileInfo and DirEntry (root package) + +Simple struct implementations of `fs.FileInfo` and `fs.DirEntry` are exported from the root package for use in mocks and tests. Each backend also defines its own unexported equivalents internally. + + +## Backend Implementations + +### local.Medium + +**File:** `local/client.go` + +The local backend wraps the standard `os` package with two layers of path protection: + +1. **`path(p string)`** — normalises paths using `filepath.Clean("/" + p)` before joining with the root. This neutralises `..` traversal at the string level. When root is `"/"`, absolute paths pass through unchanged and relative paths resolve against the working directory. + +2. **`validatePath(p string)`** — walks each path component, calling `filepath.EvalSymlinks` at every step. If any resolved component lands outside the root, it logs a security event to stderr (including timestamp, root, attempted path, and OS username) and returns `os.ErrPermission`. + +Delete operations refuse to remove `"/"` or the user's home directory as an additional safety rail. + +``` +caller -> path(p) -> validatePath(p) -> os.ReadFile / os.WriteFile / ... +``` + +### s3.Medium + +**File:** `s3/s3.go` + +The S3 backend translates `Medium` operations into AWS SDK calls. Key design decisions: + +- **Key construction:** `key(p)` uses `path.Clean("/" + p)` to sandbox traversal, then prepends the optional prefix. This means `../secret` resolves to `secret`, not an escape. +- **Directory semantics:** S3 has no real directories. `EnsureDir` is a no-op. `IsDir` and `Exists` for directory-like paths use `ListObjectsV2` with `MaxKeys: 1` to check for objects under the prefix. +- **Rename:** Implemented as copy-then-delete, since S3 has no atomic rename. +- **Append:** Downloads existing content, appends in memory, re-uploads on `Close()`. This is the only viable approach given S3's immutable-object model. 
+- **Testability:** The `s3API` interface (unexported) abstracts the six SDK methods used. Tests inject a `mockS3` that stores objects in a `map[string][]byte` with a `sync.RWMutex`. + +### sqlite.Medium + +**File:** `sqlite/sqlite.go` + +Stores files and directories as rows in a single SQLite table: + +```sql +CREATE TABLE IF NOT EXISTS files ( + path TEXT PRIMARY KEY, + content BLOB NOT NULL, + mode INTEGER DEFAULT 420, -- 0644 + is_dir BOOLEAN DEFAULT FALSE, + mtime DATETIME DEFAULT CURRENT_TIMESTAMP +) +``` + +- **WAL mode** is enabled at connection time for better concurrent read performance. +- **Path cleaning** uses the same `path.Clean("/" + p)` pattern as other backends. +- **Rename** is transactional: it reads the source row, inserts at the destination, deletes the source, and moves all children (if it is a directory) within a single transaction. +- **Custom tables** are supported via `WithTable("name")` to allow multiple logical filesystems in one database. +- **`:memory:`** databases work out of the box for tests. + +### node.Node + +**File:** `node/node.go` + +A pure in-memory filesystem that implements both `Medium` and the standard library's `fs.FS`, `fs.StatFS`, `fs.ReadDirFS`, and `fs.ReadFileFS` interfaces. Directories are implicit -- they exist whenever a stored file path contains a `"/"`. + +Key capabilities beyond `Medium`: + +- **`ToTar()` / `FromTar()`** — serialise the entire tree to a tar archive and back. This enables snapshotting, transport, and archival. +- **`Walk()` with `WalkOptions`** — extends `fs.WalkDir` with `MaxDepth`, `Filter`, and `SkipErrors` controls. +- **`CopyFile(src, dst, perm)`** — copies a file from the in-memory tree to the real filesystem. +- **`CopyTo(target Medium, src, dst)`** — copies a file or directory tree to any other `Medium`. +- **`ReadFile(name)`** — returns a defensive copy of file content, preventing callers from mutating internal state. 
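To make the tar round trip concrete, here is a minimal sketch using only the standard library's `archive/tar`. It assumes the tree is a flat `map[string][]byte` keyed by path; the helper names `toTar` and `fromTar` are hypothetical and are not claimed to match how `node.Node` implements `ToTar()` / `FromTar()`.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"sort"
)

// toTar serialises a path -> content map into a tar archive.
func toTar(files map[string][]byte) ([]byte, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic entry order

	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		data := files[name]
		hdr := &tar.Header{
			Name:     name,
			Mode:     0o644,
			Size:     int64(len(data)),
			Typeflag: tar.TypeReg,
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	// Close flushes the archive trailer; without it the tar is truncated.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// fromTar rebuilds the map from a tar archive, skipping non-regular entries.
func fromTar(archive []byte) (map[string][]byte, error) {
	files := make(map[string][]byte)
	tr := tar.NewReader(bytes.NewReader(archive))
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return files, nil
		}
		if err != nil {
			return nil, err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		files[hdr.Name] = data
	}
}

func main() {
	tree := map[string][]byte{
		"hello.txt":      []byte("world"),
		"docs/readme.md": []byte("# hi"),
	}
	archive, err := toTar(tree)
	if err != nil {
		fmt.Println("serialise failed:", err)
		return
	}
	restored, err := fromTar(archive)
	if err != nil {
		fmt.Println("restore failed:", err)
		return
	}
	fmt.Printf("%d files restored, hello.txt=%q\n", len(restored), restored["hello.txt"])
}
```

Only file entries are written, which matches the implicit-directory model described above: a directory such as `docs/` exists after restore simply because a stored path contains it.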
+ +### datanode.Medium + +**File:** `datanode/client.go` + +A thread-safe `Medium` backed by Borg's `DataNode` (an in-memory `fs.FS` with tar serialisation). It adds: + +- **`sync.RWMutex`** on every operation for concurrent safety. +- **Explicit directory tracking** via a `map[string]bool`, since `DataNode` only stores files. +- **`Snapshot()` / `Restore(data)`** — serialise and deserialise the entire filesystem as a tarball. +- **`DataNode()`** — exposes the underlying Borg DataNode for integration with TIM containers. +- **File deletion** is handled by rebuilding the DataNode without the target file, since the upstream type does not expose a `Remove` method. + + +## store Package + +**Files:** `store/store.go`, `store/medium.go` + +The store package provides two complementary APIs: + +### Store (key-value) + +A group-namespaced key-value store backed by SQLite: + +```sql +CREATE TABLE IF NOT EXISTS kv ( + grp TEXT NOT NULL, + key TEXT NOT NULL, + value TEXT NOT NULL, + PRIMARY KEY (grp, key) +) +``` + +Operations: `Get`, `Set`, `Delete`, `Count`, `DeleteGroup`, `GetAll`, `Render`. + +The `Render` method loads all key-value pairs from a group into a `map[string]string` and executes a Go `text/template` against them: + +```go +s.Set("user", "pool", "pool.lthn.io:3333") +s.Set("user", "wallet", "iz...") +out, _ := s.Render(`{"pool":"{{ .pool }}"}`, "user") +// out: {"pool":"pool.lthn.io:3333"} +``` + +### store.Medium (Medium adapter) + +Wraps a `Store` to satisfy the `Medium` interface. Paths are split as `group/key`: + +- `Read("config/theme")` calls `Get("config", "theme")` +- `List("")` returns all groups as directories +- `List("config")` returns all keys in the `config` group as files +- `IsDir("config")` returns true if the group has entries + +You can create it directly (`NewMedium(":memory:")`) or adapt an existing store (`store.AsMedium()`). 
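The two ideas in this section, splitting a path into `group/key` and rendering a template against a group's values, can be sketched with the standard library alone. The helpers `splitPath` and `render` below are hypothetical names chosen for illustration; the real `store` package reads its values from SQLite and may handle edge cases differently.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// splitPath separates "group/key" at the first slash. The key is empty
// when the path names only a group.
func splitPath(p string) (group, key string) {
	p = strings.Trim(p, "/")
	group, key, _ = strings.Cut(p, "/")
	return group, key
}

// render executes a text/template against one group's key-value pairs.
func render(tmpl string, values map[string]string) (string, error) {
	t, err := template.New("render").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, values); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	group, key := splitPath("config/theme")
	fmt.Println(group, key)

	// Stand-in for the rows a real store would load for the "user" group.
	user := map[string]string{"pool": "pool.lthn.io:3333"}
	out, err := render(`{"pool":"{{ .pool }}"}`, user)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```

Note that `text/template` performs no escaping, so values containing quotes would produce invalid JSON in a template like this one; callers rendering structured formats may want to validate the output.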
+ + +## sigil Package + +**Files:** `sigil/sigil.go`, `sigil/sigils.go`, `sigil/crypto_sigil.go` + +The sigil package implements composable, reversible data transformations. + +### Interface + +```go +type Sigil interface { + In(data []byte) ([]byte, error) // forward transform + Out(data []byte) ([]byte, error) // reverse transform +} +``` + +Contracts: +- Reversible sigils: `Out(In(x)) == x` +- Irreversible sigils (hashes): `Out` returns input unchanged +- Symmetric sigils (reverse): `In(x) == Out(x)` +- `nil` input returns `nil` without error +- Empty input returns empty without error + +### Available Sigils + +Created via `NewSigil(name)`: + +| Name | Type | Reversible | +|------|------|------------| +| `reverse` | Byte reversal | Yes (symmetric) | +| `hex` | Hexadecimal encoding | Yes | +| `base64` | Base64 encoding | Yes | +| `gzip` | Gzip compression | Yes | +| `json` | JSON compaction | No (`Out` is passthrough) | +| `json-indent` | JSON pretty-printing | No (`Out` is passthrough) | +| `md4`, `md5`, `sha1` | Legacy hashes | No | +| `sha224` .. `sha512` | SHA-2 family | No | +| `sha3-224` .. `sha3-512` | SHA-3 family | No | +| `sha512-224`, `sha512-256` | Truncated SHA-512 | No | +| `ripemd160` | RIPEMD-160 | No | +| `blake2s-256` | BLAKE2s | No | +| `blake2b-256` .. `blake2b-512` | BLAKE2b | No | + +### Pipeline Functions + +```go +// Apply sigils left-to-right. +encoded, _ := sigil.Transmute(data, []sigil.Sigil{gzipSigil, hexSigil}) + +// Reverse sigils right-to-left. +original, _ := sigil.Untransmute(encoded, []sigil.Sigil{gzipSigil, hexSigil}) +``` + +### Authenticated Encryption: ChaChaPolySigil + +`ChaChaPolySigil` provides XChaCha20-Poly1305 authenticated encryption with a pre-obfuscation layer. It implements the `Sigil` interface, so it composes naturally into pipelines. 
+ +**Encryption flow:** + +``` +plaintext -> obfuscate(nonce) -> XChaCha20-Poly1305 encrypt -> [nonce || ciphertext || tag] +``` + +**Decryption flow:** + +``` +[nonce || ciphertext || tag] -> decrypt -> deobfuscate(nonce) -> plaintext +``` + +The pre-obfuscation layer ensures that raw plaintext patterns are never sent directly to CPU encryption routines, providing defence-in-depth against side-channel attacks. Two obfuscators are provided: + +- **`XORObfuscator`** (default) — XORs data with a SHA-256 counter-mode key stream derived from the nonce. +- **`ShuffleMaskObfuscator`** — applies XOR masking followed by a deterministic Fisher-Yates byte shuffle, making both value and position analysis more difficult. + +```go +key := make([]byte, 32) +rand.Read(key) + +s, _ := sigil.NewChaChaPolySigil(key) +ciphertext, _ := s.In([]byte("secret")) +plaintext, _ := s.Out(ciphertext) + +// With stronger obfuscation: +s2, _ := sigil.NewChaChaPolySigilWithObfuscator(key, &sigil.ShuffleMaskObfuscator{}) +``` + +Each call to `In` generates a fresh random nonce, so encrypting the same plaintext twice produces different ciphertexts. + + +## workspace Package + +**File:** `workspace/service.go` + +A higher-level service that integrates with the Core DI container (`forge.lthn.ai/core/go`). It manages encrypted workspaces stored under `~/.core/workspaces/`. + +Each workspace: +- Is identified by a SHA-256 hash of the user-provided identifier +- Contains subdirectories: `config/`, `log/`, `data/`, `files/`, `keys/` +- Has a PGP keypair generated via the Core crypt service +- Supports file get/set operations on the `files/` subdirectory +- Handles IPC events (`workspace.create`, `workspace.switch`) for integration with the Core message bus + +The workspace service implements `core.Workspace` and uses `io.Local` as its storage medium. 
+ + +## Data Flow Summary + +``` +Application code + | + v + io.Medium (interface) + | + +-- local.Medium --> os package (with sandbox validation) + +-- s3.Medium --> AWS SDK S3 client + +-- sqlite.Medium --> modernc.org/sqlite + +-- node.Node --> in-memory map + tar serialisation + +-- datanode.Medium --> Borg DataNode + sync.RWMutex + +-- store.Medium --> store.Store (SQLite KV) --> Medium adapter + +-- MockMedium --> map[string]string (for tests) +``` + +Every backend normalises paths using the same `path.Clean("/" + p)` pattern, ensuring consistent behaviour regardless of which backend is in use. diff --git a/docs/development.md b/docs/development.md new file mode 100644 index 0000000..5c63913 --- /dev/null +++ b/docs/development.md @@ -0,0 +1,216 @@ +--- +title: Development +description: How to build, test, and contribute to go-io. +--- + +# Development + +This guide covers everything needed to work on `go-io` locally. + + +## Prerequisites + +- **Go 1.26.0** or later +- **No C compiler required** -- all dependencies (including SQLite) are pure Go +- The module is part of the Go workspace at `~/Code/go.work`. If you are working outside that workspace, ensure `GOPRIVATE=forge.lthn.ai/*` is set so the Go toolchain can fetch private dependencies. + + +## Building + +`go-io` is a library with no binary output. To verify it compiles: + +```bash +cd /path/to/go-io +go build ./... +``` + +If using the Core CLI: + +```bash +core go fmt # format +core go vet # static analysis +core go lint # linter +core go test # run all tests +core go qa # fmt + vet + lint + test +core go qa full # + race detector, vulnerability scan, security audit +``` + + +## Running Tests + +All packages have thorough test suites. Tests use `testify/assert` and `testify/require` for assertions. + +```bash +# All tests +go test ./... + +# A single package +go test ./sigil/ + +# A single test by name +go test ./local/ -run TestValidatePath_Security + +# With race detector +go test -race ./... 
+ +# With coverage +go test -coverprofile=coverage.out ./... +go tool cover -html=coverage.out +``` + +Or via the Core CLI: + +```bash +core go test +core go test --run TestChaChaPolySigil_Good_RoundTrip +core go cov --open +``` + + +## Test Naming Convention + +Tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern: + +| Suffix | Meaning | +|--------|---------| +| `_Good` | Happy path -- the operation succeeds as expected | +| `_Bad` | Expected error conditions -- missing files, invalid input, permission denied | +| `_Ugly` | Edge cases and boundary conditions -- nil input, empty paths, panics | + +Example: + +```go +func TestDelete_Good(t *testing.T) { /* deletes a file successfully */ } +func TestDelete_Bad_NotFound(t *testing.T) { /* returns error for missing file */ } +func TestDelete_Bad_DirNotEmpty(t *testing.T) { /* returns error for non-empty dir */ } +``` + + +## Writing Tests Against Medium + +Use `MockMedium` from the root package for unit tests that need a storage backend but should not touch disk: + +```go +func TestMyFeature(t *testing.T) { + m := io.NewMockMedium() + m.Files["config.yaml"] = "key: value" + m.Dirs["data"] = true + + // Your code under test receives m as an io.Medium. + // The return value is discarded here; an unused variable would not compile. + _, err := myFunction(m) + assert.NoError(t, err) + assert.Equal(t, "expected", m.Files["output.txt"]) +} +``` + +For tests that need a real but ephemeral filesystem, use `local.New` with `t.TempDir()`: + +```go +func TestWithRealFS(t *testing.T) { + m, err := local.New(t.TempDir()) + require.NoError(t, err) + + _ = m.Write("file.txt", "hello") + content, _ := m.Read("file.txt") + assert.Equal(t, "hello", content) +} +``` + +For SQLite-backed tests, use `:memory:`: + +```go +func TestWithSQLite(t *testing.T) { + m, err := sqlite.New(":memory:") + require.NoError(t, err) + defer m.Close() + + _ = m.Write("file.txt", "hello") +} +``` + + +## Adding a New Backend + +To add a new `Medium` implementation: + +1. Create a new package directory (e.g., `sftp/`). +2. 
Define a struct that implements all 18 methods of `io.Medium`. +3. Add a compile-time check at the top of your file: + +```go +var _ coreio.Medium = (*Medium)(nil) +``` + +4. Normalise paths using `path.Clean("/" + p)` to prevent traversal escapes. This is the convention followed by every existing backend. +5. Handle `nil` and empty input consistently: check how `MockMedium` and `local.Medium` behave and match that behaviour. +6. Write tests using the `_Good` / `_Bad` / `_Ugly` naming convention. +7. Add your package to the table in `docs/index.md`. + + +## Adding a New Sigil + +To add a new data transformation: + +1. Create a struct in `sigil/` that implements the `Sigil` interface (`In` and `Out`). +2. Handle `nil` input by returning `nil, nil`. +3. Handle empty input by returning `[]byte{}, nil`. +4. Register it in the `NewSigil` factory function in `sigils.go`. +5. Add tests covering `_Good` (round-trip), `_Bad` (invalid input), and `_Ugly` (nil/empty edge cases). + + +## Code Style + +- **UK English** in comments and documentation: colour, organisation, centre, serialise, defence. +- **`declare(strict_types=1)`** equivalent: all functions have explicit parameter and return types. +- Errors use the `go-log` helper: `coreerr.E("package.Method", "what failed", underlyingErr)`. +- No blank imports except for database drivers (`_ "modernc.org/sqlite"`). +- Formatting: standard `gofmt` / `goimports`. 
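The checklist under "Adding a New Sigil" (nil in, nil out; empty in, empty out; `Out(In(x)) == x` for reversible transforms) can be sketched as follows. The `Sigil` interface is copied from the architecture document; `reverseSigil`, `hexSigil`, `transmute`, and `untransmute` are local, hypothetical re-implementations written for this example and are not the package's own code. Registration in `NewSigil` is omitted because its internals are not shown in the docs.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Sigil mirrors the interface shown in the architecture document.
type Sigil interface {
	In(data []byte) ([]byte, error)  // forward transform
	Out(data []byte) ([]byte, error) // reverse transform
}

// reverseSigil is symmetric: In and Out perform the same byte reversal.
type reverseSigil struct{}

func (reverseSigil) In(data []byte) ([]byte, error) {
	if data == nil {
		return nil, nil // nil in, nil out
	}
	out := make([]byte, len(data)) // empty in, empty (non-nil) out
	for i, b := range data {
		out[len(data)-1-i] = b
	}
	return out, nil
}

func (s reverseSigil) Out(data []byte) ([]byte, error) { return s.In(data) }

// hexSigil is reversible: Out decodes what In encoded.
type hexSigil struct{}

func (hexSigil) In(data []byte) ([]byte, error) {
	if data == nil {
		return nil, nil
	}
	out := make([]byte, hex.EncodedLen(len(data)))
	hex.Encode(out, data)
	return out, nil
}

func (hexSigil) Out(data []byte) ([]byte, error) {
	if data == nil {
		return nil, nil
	}
	out := make([]byte, hex.DecodedLen(len(data)))
	n, err := hex.Decode(out, data)
	if err != nil {
		return nil, err
	}
	return out[:n], nil
}

// transmute applies each sigil's In, left to right.
func transmute(data []byte, sigils []Sigil) ([]byte, error) {
	var err error
	for _, s := range sigils {
		if data, err = s.In(data); err != nil {
			return nil, err
		}
	}
	return data, nil
}

// untransmute applies each sigil's Out, right to left.
func untransmute(data []byte, sigils []Sigil) ([]byte, error) {
	var err error
	for i := len(sigils) - 1; i >= 0; i-- {
		if data, err = sigils[i].Out(data); err != nil {
			return nil, err
		}
	}
	return data, nil
}

func main() {
	chain := []Sigil{reverseSigil{}, hexSigil{}}
	encoded, _ := transmute([]byte("abc"), chain)
	decoded, _ := untransmute(encoded, chain)
	fmt.Printf("%s -> %s\n", encoded, decoded)
}
```

A `_Good` test for a sigil like this would assert the round trip, a `_Bad` test would feed `Out` malformed input (for example non-hex bytes), and an `_Ugly` test would cover the nil and empty cases.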
+ + +## Project Structure + +``` +go-io/ +├── io.go # Medium interface, helpers, MockMedium +├── client_test.go # Tests for MockMedium and helpers +├── bench_test.go # Benchmarks +├── go.mod +├── local/ +│ ├── client.go # Local filesystem backend +│ └── client_test.go +├── s3/ +│ ├── s3.go # S3 backend +│ └── s3_test.go +├── sqlite/ +│ ├── sqlite.go # SQLite virtual filesystem +│ └── sqlite_test.go +├── node/ +│ ├── node.go # In-memory fs.FS + Medium +│ └── node_test.go +├── datanode/ +│ ├── client.go # Borg DataNode Medium wrapper +│ └── client_test.go +├── store/ +│ ├── store.go # KV store +│ ├── medium.go # Medium adapter for KV store +│ ├── store_test.go +│ └── medium_test.go +├── sigil/ +│ ├── sigil.go # Sigil interface, Transmute/Untransmute +│ ├── sigils.go # Built-in sigils (hex, base64, gzip, hash, etc.) +│ ├── crypto_sigil.go # ChaChaPolySigil + obfuscators +│ ├── sigil_test.go +│ └── crypto_sigil_test.go +├── workspace/ +│ ├── service.go # Encrypted workspace service +│ └── service_test.go +├── docs/ # This documentation +└── .core/ + ├── build.yaml # Build configuration + └── release.yaml # Release configuration +``` + + +## Licence + +EUPL-1.2 diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..bd33262 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,121 @@ +--- +title: go-io +description: Unified storage abstraction for Go with pluggable backends — local filesystem, S3, SQLite, in-memory, and key-value. +--- + +# go-io + +`forge.lthn.ai/core/go-io` is a storage abstraction library that provides a single `Medium` interface for reading and writing files across different backends. Write your code against `Medium` once, then swap between local disk, S3, SQLite, or in-memory storage without changing a line of business logic. + +The library also includes `sigil`, a composable data-transformation pipeline for encoding, compression, hashing, and authenticated encryption. 
+ + +## Quick Start + +```go +import ( + io "forge.lthn.ai/core/go-io" + "forge.lthn.ai/core/go-io/s3" + "forge.lthn.ai/core/go-io/node" +) + +// Use the pre-initialised local filesystem (unsandboxed, rooted at "/"). +content, _ := io.Local.Read("/etc/hostname") + +// Create a sandboxed medium restricted to a single directory. +sandbox, _ := io.NewSandboxed("/var/data/myapp") +_ = sandbox.Write("config.yaml", "key: value") + +// In-memory filesystem with tar serialisation. +mem := node.New() +mem.AddData("hello.txt", []byte("world")) +tarball, _ := mem.ToTar() + +// S3 backend (requires an *s3.Client from the AWS SDK). +bucket, _ := s3.New("my-bucket", s3.WithClient(awsClient), s3.WithPrefix("uploads/")) +_ = bucket.Write("photo.jpg", rawData) +``` + + +## Package Layout + +| Package | Import Path | Purpose | +|---------|-------------|---------| +| `io` (root) | `forge.lthn.ai/core/go-io` | `Medium` interface, helper functions, `MockMedium` for tests | +| `local` | `forge.lthn.ai/core/go-io/local` | Local filesystem backend with path sandboxing and symlink-escape protection | +| `s3` | `forge.lthn.ai/core/go-io/s3` | Amazon S3 / S3-compatible backend (Garage, MinIO, etc.) 
| +| `sqlite` | `forge.lthn.ai/core/go-io/sqlite` | SQLite-backed virtual filesystem (pure Go driver, no CGO) | +| `node` | `forge.lthn.ai/core/go-io/node` | In-memory filesystem implementing both `Medium` and `fs.FS`, with tar round-tripping | +| `datanode` | `forge.lthn.ai/core/go-io/datanode` | Thread-safe in-memory `Medium` backed by Borg's DataNode, with snapshot/restore | +| `store` | `forge.lthn.ai/core/go-io/store` | Group-namespaced key-value store (SQLite), with a `Medium` adapter and Go template rendering | +| `sigil` | `forge.lthn.ai/core/go-io/sigil` | Composable data transformations: encoding, compression, hashing, XChaCha20-Poly1305 encryption | +| `workspace` | `forge.lthn.ai/core/go-io/workspace` | Encrypted workspace service integrated with the Core DI container | + + +## The Medium Interface + +Every storage backend implements the same 18-method interface: + +```go +type Medium interface { + // Content operations + Read(path string) (string, error) + Write(path, content string) error + FileGet(path string) (string, error) // alias for Read + FileSet(path, content string) error // alias for Write + + // Streaming (for large files) + ReadStream(path string) (io.ReadCloser, error) + WriteStream(path string) (io.WriteCloser, error) + Open(path string) (fs.File, error) + Create(path string) (io.WriteCloser, error) + Append(path string) (io.WriteCloser, error) + + // Directory operations + EnsureDir(path string) error + List(path string) ([]fs.DirEntry, error) + + // Metadata + Stat(path string) (fs.FileInfo, error) + Exists(path string) bool + IsFile(path string) bool + IsDir(path string) bool + + // Mutation + Delete(path string) error + DeleteAll(path string) error + Rename(oldPath, newPath string) error +} +``` + +All backends implement this interface fully. Backends where a method has no natural equivalent (e.g., `EnsureDir` on S3) provide a safe no-op. 
+ + +## Cross-Medium Operations + +The root package provides helper functions that accept any `Medium`: + +```go +// Copy a file between any two backends. +err := io.Copy(localMedium, "source.txt", s3Medium, "dest.txt") + +// Read/Write wrappers that take an explicit medium. +content, err := io.Read(medium, "path") +err = io.Write(medium, "path", "content") +``` + + +## Dependencies + +| Dependency | Role | +|------------|------| +| `forge.lthn.ai/core/go-log` | Structured error helper (`E()`) | +| `forge.lthn.ai/Snider/Borg` | DataNode in-memory FS (used by `datanode` package) | +| `github.com/aws/aws-sdk-go-v2` | S3 client (used by `s3` package) | +| `golang.org/x/crypto` | BLAKE2, SHA-3, RIPEMD-160, XChaCha20-Poly1305 (used by `sigil`) | +| `modernc.org/sqlite` | Pure Go SQLite driver (used by `sqlite` and `store`) | +| `github.com/stretchr/testify` | Test assertions | + +Go version: **1.26.0** + +Licence: **EUPL-1.2**