diff --git a/CLAUDE.md b/CLAUDE.md index 91ae743..928c671 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -9,56 +9,13 @@ ```bash go test ./... # Run all tests go test -run TestName ./... # Single test +go test -race ./... # Race detector (required before any PR) +go test -short ./... # Skip integration tests go test -cover ./node # Coverage for node package go test -bench . ./... # Benchmarks go vet ./... # Static analysis ``` -## Architecture - -Three packages: - -### node/ — P2P Mesh -- **identity.go**: Ed25519 keypair, PEM serialisation, X25519 ECDH, challenge-response auth -- **transport.go**: Encrypted WebSocket (gorilla/websocket + Borg SMSG), handshake, keepalive, dedup, rate limiting -- **peer.go**: Registry with KD-tree scoring (Poindexter), persistence, auth modes (open/allowlist) -- **message.go**: 15 typed protocol messages (handshake, ping, stats, miner, deploy, logs, error) -- **protocol.go**: Response handler with validation and typed parsing -- **worker.go**: Command handlers (ping, stats, miner start/stop, deploy, logs) -- **controller.go**: Remote node operations (connect, command, disconnect) -- **dispatcher.go**: UEPS packet routing skeleton (STUB — needs implementation) -- **bundle.go**: TIM encryption, tarball extraction with Zip Slip defence - -### ueps/ — Wire Protocol (RFC-021) -- **packet.go**: PacketBuilder with TLV encoding and HMAC-SHA256 signing -- **reader.go**: Stream parser with integrity verification -- TLV tags: 0x01-0x05 (header), 0x06 (HMAC), 0xFF (payload marker) -- Header: Version (0x09), CurrentLayer, TargetLayer, IntentID, ThreatScore - -### logging/ — Structured Logger -- Levelled (DEBUG/INFO/WARN/ERROR) with key-value pairs and component scoping - -## Dependencies - -- `github.com/Snider/Borg` — STMF crypto, SMSG encryption, TIM -- `github.com/Snider/Poindexter` — KD-tree for peer selection -- `github.com/Snider/Enchantrix` — Secure environment (via Borg) -- `github.com/gorilla/websocket` — WebSocket transport -- 
`github.com/google/uuid` — Peer/message IDs -- Lethean codenames: Borg (Secure/Blob), Poindexter (Secure/Pointer), Enchantrix (Secure/Environment) - -## Coding Standards - -- UK English (colour, organisation, centre) -- All types annotated -- Tests use `testify` assert/require -- Licence: EUPL-1.2 -- Security-first: HMAC on all wire traffic, challenge-response auth, Zip Slip defence, rate limiting - -## Test Conventions - -Use table-driven subtests with `t.Run()`. - ## Key Interfaces ```go @@ -75,3 +32,26 @@ type ProfileManager interface { ApplyProfile(name string, data []byte) error } ``` + +## Coding Standards + +- UK English (colour, organisation, centre) +- All parameters and return types explicitly annotated +- Tests use `testify` assert/require; table-driven subtests with `t.Run()` +- Licence: EUPL-1.2 +- Security-first: do not weaken HMAC, challenge-response, Zip Slip defence, or rate limiting +- Use `logging` package only — no `fmt.Println` or `log.Printf` in library code + +## Commit Format + +``` +type(scope): description + +Co-Authored-By: Virgil +``` + +## Documentation + +- `docs/architecture.md` — full package and component reference +- `docs/development.md` — build, test, benchmark, standards guide +- `docs/history.md` — completed phases, known limitations, bugs fixed diff --git a/FINDINGS.md b/FINDINGS.md deleted file mode 100644 index cdf71f0..0000000 --- a/FINDINGS.md +++ /dev/null @@ -1,66 +0,0 @@ -# Findings - -## Code Quality - -- **100 tests in node/, all pass** — 71.9% statement coverage (dispatcher adds 10 test funcs / 17 subtests) -- **logging/ fully tested** (12 tests, 100% coverage) -- **UEPS 88.5% coverage** — wire protocol tests added in Phase 1 -- **`go vet` clean** — no static analysis warnings -- **`go test -race` clean** — no data races (GracefulClose race fixed, see below) -- **Zero TODOs/FIXMEs** in codebase - -## Security Posture (Strong) - -- X25519 ECDH key exchange with Borg STMF -- Challenge-response authentication 
(HMAC-SHA256) -- TLS 1.2+ with hardened cipher suites -- Message deduplication (5-min TTL, prevents amplification) -- Per-peer rate limiting (100 burst, 50 msg/sec) -- Tarball extraction: Zip Slip defence, 100 MB per-file limit, symlink/hardlink rejection -- Peer auth modes: open or public-key allowlist -- UEPS threat circuit breaker: packets with ThreatScore > 50,000 dropped before intent routing - -## Architecture Strengths - -- Clean separation: identity / transport / peers / protocol / worker / controller -- KD-tree peer selection via Poindexter: [PingMS × 1.0, Hops × 0.7, GeoKM × 0.2, (100-Score) × 1.2] -- Debounced persistence (5s coalesce window for peer registry) -- Buffer pool for JSON encoding (reduces GC pressure) -- Decoupled MinerManager/ProfileManager interfaces -- UEPS dispatcher: functional IntentHandler type, RWMutex-protected handler map, sentinel errors - -## Critical Test Gaps - -| File | Lines | Tests | Coverage | -|------|-------|-------|----------| -| identity.go | 290 | 5 tests | Good | -| peer.go | 708 | 19 tests | Good | -| message.go | 237 | 8 tests | Good | -| worker.go | 402 | 10 tests | Good | -| bundle.go | 355 | 9 tests | Good | -| protocol.go | 88 | 5 tests | Good | -| transport.go | 934 | 11 tests | Good (Phase 2) | -| controller.go | 327 | 14 tests | Good (Phase 3) | -| dispatcher.go | 120 | 10 tests (17 subtests) | 100% (Phase 4) | -| ueps/packet.go | 124 | 9 tests | Good (Phase 1) | -| ueps/reader.go | 138 | 9 tests | Good (Phase 1) | - -## Known Issues - -1. ~~**dispatcher.go is a stub**~~ — Fully implemented (Phase 4). Threat circuit breaker and intent routing operational. -2. **UEPS 0xFF payload length ambiguous** — Relies on external TCP framing, not self-delimiting. Comments note this but no solution implemented. -3. **~~Potential race in controller.go~~** — ~~`transport.OnMessage(c.handleResponse)` called during init~~ — Not a real issue. 
The pending map is initialised in `NewController` before `OnMessage` is called, and `handleResponse` uses a mutex. No panic possible. -4. **No resource cleanup on some error paths** — transport.handleWSUpgrade doesn't clean up on handshake timeout; transport.Connect doesn't clean up temp connection on error. -5. ~~**Threat score semantics undefined**~~ — ThreatScoreThreshold (50,000) defined in dispatcher. Packets above threshold dropped and logged. Intent routing implemented for 0x01/0x20/0x30/0xFF. - -## Phase 4 Design Decisions - -1. **IntentHandler as func type, not interface** — Matches the codebase's `MessageHandler` pattern in transport.go. Lighter weight than an interface for a single-method contract. -2. **Sentinel errors over silent drops** — The stub comments suggested silent drops, but returning typed errors (`ErrThreatScoreExceeded`, `ErrUnknownIntent`, `ErrNilPacket`) gives callers the option to inspect outcomes. The dispatcher still logs at WARN level regardless. -3. **Threat check before intent routing** — A high-threat packet with an unknown intent returns `ErrThreatScoreExceeded`, not `ErrUnknownIntent`. The circuit breaker is the first line of defence; no packet metadata is inspected beyond ThreatScore before the drop. -4. **Threshold at 50,000 (not configurable)** — Kept as a constant to match the original stub. Can be made configurable via functional options if needed later. -5. **RWMutex for handler map** — Read-heavy workload (dispatches far outnumber registrations), so RWMutex is appropriate. Registration takes a write lock, dispatch takes a read lock. - -## Bugs Fixed - -1. **P2P-RACE-1: GracefulClose data race** (Phase 3) — `GracefulClose` called `pc.Conn.SetWriteDeadline()` outside of `writeMu`, racing with concurrent `Send()` calls that also modify the write deadline. Fixed by removing the bare `SetWriteDeadline` call and relying on `Send()` which already manages deadlines under the lock. Detected by `go test -race`. 
diff --git a/TODO.md b/TODO.md deleted file mode 100644 index 736abeb..0000000 --- a/TODO.md +++ /dev/null @@ -1,93 +0,0 @@ -# TODO.md — go-p2p Task Queue - -Dispatched from core/go orchestration. Pick up tasks in phase order. - ---- - -## Phase 1: UEPS Wire Protocol Tests — COMPLETE (88.5% coverage) - -All crypto wire protocol tests implemented. Commit `2bc53ba`. - -- [x] **PacketBuilder round-trip** — Basic, binary, threat score, large payload variants -- [x] **HMAC verification** — Payload tampering + header tampering both caught -- [x] **Wrong shared secret** — HMAC mismatch detected -- [x] **Empty payload** — Nil and empty slice both produce valid packets -- [x] **Max ThreatScore boundary** — uint16 max round-trips correctly -- [x] **Missing HMAC tag** — Error returned -- [x] **TLV value too large** — writeTLV error for >255 bytes -- [x] **Truncated packet** — EOF mid-TLV detected at multiple cut points -- [x] **Unknown TLV tag** — Reader skips unknown tags, included in signature - -## Phase 2: Transport Tests — COMPLETE (node/ 42% → 63.5%) - -All transport layer tests implemented with real WebSocket connections. Commit `3ee5553`. 
- -- [x] **Test pair setup helper** — Reusable helper for identities + registries + transports -- [x] **Full handshake** — Challenge-response completes, shared secret derived -- [x] **Handshake rejection: wrong protocol version** — Rejection before disconnect -- [x] **Handshake rejection: allowlist** — "not authorized" rejection -- [x] **Encrypted message round-trip** — SMSG encrypt/decrypt verified -- [x] **Message deduplication** — Duplicate ID dropped silently -- [x] **Rate limiting** — Burst >100 messages, drops after token bucket empties -- [x] **MaxConns enforcement** — 503 rejection on overflow -- [x] **Keepalive timeout** — Connection cleaned up after PingInterval+PongTimeout -- [x] **Graceful close** — MsgDisconnect sent before close -- [x] **Concurrent sends** — No races (writeMu protects) - -## Phase 3: Controller Tests — COMPLETE (node/ 63.5% → 72.1%) - -All controller tests implemented with real WebSocket transport pairs. 14 tests total. Commit `33eda7b`. -Also fixed pre-existing data race in GracefulClose (P2P-RACE-1). - -- [x] **Request-response correlation** — Send request, worker replies with ReplyTo set, controller matches correctly. -- [x] **Request timeout** — No response within deadline, returns timeout error. -- [x] **Auto-connect** — Peer not connected, controller auto-connects via transport before sending. -- [x] **GetAllStats** — Multiple connected peers, verify parallel stat collection completes. -- [x] **PingPeer RTT** — Send ping, receive pong, RTT calculated and peer metrics updated. -- [x] **Concurrent requests** — Multiple requests in flight to different peers, correct correlation. -- [x] **Dead peer cleanup** — Response channel cleaned up after timeout (no goroutine/memory leak). - -## Phase 4: Dispatcher Implementation — COMPLETE (dispatcher 100% coverage) - -UEPS packet dispatcher with threat circuit breaker and intent routing. Commit `a60dfdf`. 
- -- [x] **Uncomment and implement DispatchUEPS** — Dispatcher struct with RegisterHandler/Dispatch, IntentHandler func type, sentinel errors. -- [x] **Threat circuit breaker** — Drop packets with ThreatScore > 50000. Logged at WARN level with threat_score, threshold, intent_id, version fields. -- [x] **Intent router** — Route by IntentID: 0x01 handshake, 0x20 compute, 0x30 rehab, 0xFF custom. Unknown intents logged and dropped. -- [x] **Dispatcher tests** — 10 test functions, 17 subtests: register/dispatch, threat boundary (at/above/max/zero), unknown intent, multi-handler routing, nil/empty payload, concurrent dispatch, concurrent register+dispatch, handler replacement, threat-before-routing ordering, intent constant verification. - -## Phase 5: Integration & Benchmarks — COMPLETE - -All integration tests, benchmarks, and bufpool tests implemented. Race-free under `-race`. - -- [x] **Full integration test** — Two nodes on localhost: identity creation, handshake, encrypted message exchange, controller ping/pong, UEPS packet routing via dispatcher, threat circuit breaker, graceful shutdown with disconnect message. 3 integration test functions. -- [x] **Benchmarks** — 13 benchmark functions across node/ and ueps/: identity keygen (217us), shared secret derivation (53us), message serialise (4us), SMSG encrypt+decrypt (4.7us), challenge sign+verify (505ns), peer scoring (KD-tree select 349ns, rebuild 2.5us), UEPS marshal (621ns), UEPS read+verify (1us), bufpool get/put (8ns zero-alloc), challenge generation (211ns). -- [x] **bufpool.go tests** — 9 test functions: get/put round-trip, buffer reuse verification, large buffer eviction (>64KB not pooled), concurrent get/put (100 goroutines x 50 iterations), buffer independence, MarshalJSON correctness (7 types), independent copy verification, HTML escaping disabled, concurrent MarshalJSON. - ---- - -## Known Issues - -1. **UEPS 0xFF payload has no length prefix** — Relies on external TCP framing (io.ReadAll reads to EOF). 
Not self-delimiting. -2. **Potential race in controller.go** — `transport.OnMessage(c.handleResponse)` called during init; message arriving before pending map is ready could theoretically panic. -3. **Resource cleanup gaps** — transport.handleWSUpgrade doesn't clean up on handshake timeout; transport.Connect doesn't clean up temp connection on error. -4. ~~**Threat score semantics undefined**~~ — Dispatcher now defines ThreatScoreThreshold (50,000) and drops packets exceeding it. Routing by IntentID implemented. - -## Wiki Inconsistencies Found (Charon, 19 Feb 2026) - -Fixed in wiki update: -- ~~Node-Identity page says PublicKey is "hex-encoded"~~ — Code says base64 (identity.go:63) -- ~~Protocol-Messages page uses `Sender` field~~ — Code uses `From`/`To` (message.go:66-67) -- ~~Peer-Discovery page says Score is 0.0–1.0~~ — Code uses float64 range 0-100 (peer.go:31) - -## Platform - -- **OS**: Ubuntu (linux/amd64) — snider-linux -- **Co-located with**: go-rocm, go-rag - -## Workflow - -1. Charon dispatches tasks here after review -2. Pick up tasks in phase order -3. Mark `[x]` when done, note commit hash -4. New discoveries → add notes, flag in FINDINGS.md diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..2915608 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,266 @@ +# Architecture — go-p2p + +`go-p2p` is the P2P networking layer for the Lethean network. Module path: `forge.lthn.ai/core/go-p2p`. + +## Package Structure + +Three packages compose the library: + +``` +go-p2p/ +├── node/ — P2P mesh: identity, transport, peers, protocol, workers, controller, dispatcher +├── ueps/ — UEPS wire protocol (RFC-021): packet builder and stream reader +└── logging/ — Structured levelled logger with component scoping +``` + +## node/ — P2P Mesh + +### identity.go — Node Identity + +Each node holds an Ed25519 keypair generated via Borg STMF (X25519 curve). 
The private key is stored at `~/.local/share/lethean-desktop/node/private.key` (mode 0600) and the public identity JSON at `~/.config/lethean-desktop/node.json`. + +`NodeIdentity` carries: +- `ID` — 32-character hex string derived from SHA-256 of the public key (first 16 bytes) +- `PublicKey` — base64-encoded X25519 public key +- `Role` — `controller`, `worker`, or `dual` + +Shared secrets are derived via X25519 ECDH and then hashed with SHA-256, producing a 32-byte symmetric key used for all subsequent SMSG encryption on that connection. + +Challenge-response authentication uses HMAC-SHA256 over a 32-byte random challenge. The challenger generates the nonce, the responder signs it with the shared secret, and the challenger verifies with `hmac.Equal` to prevent timing attacks. + +### transport.go — Encrypted WebSocket Transport + +The `Transport` manages a WebSocket server (gorilla/websocket) and outbound connections. All post-handshake messages are encrypted with Borg SMSG using the per-connection shared secret. + +**Configuration** (`TransportConfig`): + +| Field | Default | Purpose | +|-------|---------|---------| +| `ListenAddr` | `:9091` | HTTP bind address | +| `WSPath` | `/ws` | WebSocket endpoint | +| `MaxConns` | 100 | Maximum concurrent connections | +| `MaxMessageSize` | 1 MB | Read limit per message | +| `PingInterval` | 30 s | Keepalive ping period | +| `PongTimeout` | 10 s | Maximum time to wait for pong | + +**TLS hardening**: When `TLSCertPath` and `TLSKeyPath` are set the server enforces TLS 1.2 minimum with a curated cipher suite (AES-128-GCM, AES-256-GCM, ChaCha20-Poly1305) and curve preferences (X25519, P-256). + +**Connection lifecycle**: + +1. Client dials WebSocket, sends unencrypted `MsgHandshake` containing its `NodeIdentity`, a 32-byte random challenge, and the protocol version string. +2. 
Server checks `IsProtocolVersionSupported`, derives the shared secret from the client's public key, checks `IsPeerAllowed` (open or allowlist mode), and replies with `MsgHandshakeAck` containing its own identity and an HMAC-SHA256 signature of the challenge. +3. Client verifies the challenge response, stores the shared secret, and transitions to encrypted mode. +4. Subsequent messages are SMSG-encrypted binary WebSocket frames. + +**Deduplication**: A `MessageDeduplicator` with a 5-minute TTL tracks seen message UUIDs. Duplicate messages arriving within the window are dropped silently, preventing amplification attacks. + +**Rate limiting**: Each `PeerConnection` holds a `PeerRateLimiter` (token bucket: 100 burst, 50 tokens/second refill). Messages from rate-limited peers are dropped in the read loop. + +**MaxConns enforcement**: The handler tracks `pendingConns` (atomic counter) during the handshake phase in addition to established connections, preventing races where a surge of simultaneous inbounds could exceed the limit. + +**Keepalive**: A goroutine per connection ticks at `PingInterval`. If `LastActivity` has not been updated within `PingInterval + PongTimeout`, the connection is removed. + +**Graceful close**: `GracefulClose` sends `MsgDisconnect` before closing the underlying WebSocket. Write deadlines are managed exclusively inside `Send()` under `writeMu` to prevent the race (P2P-RACE-1) where a bare `SetWriteDeadline` call could race with concurrent sends. + +**Buffer pool**: `MarshalJSON` uses a `sync.Pool` of `bytes.Buffer` (initial capacity 1 KB, maximum pooled size 64 KB) to reduce allocation pressure in the message serialisation hot path. HTML escaping is disabled to match `json.Marshal` semantics. + +### peer.go — Peer Registry with KD-Tree Selection + +`PeerRegistry` maintains the set of known remote nodes and selects optimal peers via a 4-dimensional KD-tree (Poindexter library). 
+ +**Peer fields persisted**: +- `ID`, `Name`, `PublicKey`, `Address`, `Role`, `AddedAt`, `LastSeen` +- `PingMS`, `Hops`, `GeoKM`, `Score` (float64, 0–100) + +**KD-tree dimensions** (lower is better in all axes): + +| Dimension | Weight | Rationale | +|-----------|--------|-----------| +| `PingMS` | 1.0 | Latency dominates interactive performance | +| `Hops` | 0.7 | Network hop count (routing cost) | +| `GeoKM` | 0.2 | Geographic distance (minor factor) | +| `100 - Score` | 1.2 | Reliability (inverted so lower = better peer) | + +`SelectOptimalPeer()` queries the tree for the point nearest to the origin (ideal: zero latency, zero hops, zero distance, maximum score). `SelectNearestPeers(n)` returns the n best. + +**Persistence**: Writes are debounced with a 5-second coalesce window (`scheduleSave`). The actual write uses an atomic rename pattern (write to `.tmp`, then `os.Rename`) to prevent partial file corruption. `Close()` flushes any pending dirty state synchronously. + +**Auth modes**: +- `PeerAuthOpen` — any connecting peer is accepted (default). +- `PeerAuthAllowlist` — only pre-registered peer IDs or explicitly allowlisted public keys are accepted. + +**Score bookkeeping**: + +| Event | Delta | +|-------|-------| +| Success | +1.0 (capped at 100) | +| Failure | −5.0 (floored at 0) | +| Timeout | −3.0 (floored at 0) | +| Default (new peer) | 50.0 | + +**Peer name validation**: Names must be 1–64 characters, start and end with an alphanumeric character, and contain only alphanumeric, hyphen, underscore, or space characters. 
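The dimension weights and score bookkeeping above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the registry's code: a linear scan stands in for the Poindexter KD-tree query, the distance metric is assumed to be Euclidean, and the `peer` type and every function name are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// peer holds only the four metrics used for selection.
// Hypothetical type; field names follow the persisted peer fields.
type peer struct {
	Name   string
	PingMS float64
	Hops   float64
	GeoKM  float64
	Score  float64 // 0-100, higher is better
}

// weightedPoint scales each axis by the weight from the dimensions table.
// Score is inverted so that lower is better on every axis.
func weightedPoint(p peer) [4]float64 {
	return [4]float64{
		p.PingMS * 1.0,
		p.Hops * 0.7,
		p.GeoKM * 0.2,
		(100 - p.Score) * 1.2,
	}
}

// distanceToOrigin assumes a Euclidean metric (an assumption of this sketch).
func distanceToOrigin(pt [4]float64) float64 {
	var sum float64
	for _, v := range pt {
		sum += v * v
	}
	return math.Sqrt(sum)
}

// selectOptimal returns the peer nearest the origin using a linear scan,
// which stands in for the KD-tree nearest-neighbour query.
func selectOptimal(peers []peer) (peer, bool) {
	if len(peers) == 0 {
		return peer{}, false
	}
	best := peers[0]
	bestDist := distanceToOrigin(weightedPoint(best))
	for _, p := range peers[1:] {
		if d := distanceToOrigin(weightedPoint(p)); d < bestDist {
			best, bestDist = p, d
		}
	}
	return best, true
}

// applyDelta adds a score delta and clamps the result to [0, 100].
func applyDelta(score, delta float64) float64 {
	return math.Max(0, math.Min(100, score+delta))
}

func main() {
	peers := []peer{
		{Name: "near-but-flaky", PingMS: 10, Hops: 1, GeoKM: 50, Score: 40},
		{Name: "far-but-reliable", PingMS: 25, Hops: 2, GeoKM: 100, Score: 95},
	}
	if best, ok := selectOptimal(peers); ok {
		fmt.Println("selected:", best.Name)
	}
	fmt.Println("score after a timeout:", applyDelta(50.0, -3.0))
}
```

Because the inverted score carries the largest weight (1.2), a reliable peer can beat a closer one with a poor score, which is the trade-off the table describes.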
+ +### message.go — Protocol Messages + +`Message` is the top-level envelope for all node-to-node communication: + +```go +type Message struct { + ID string // UUID v4 + Type MessageType + From string // Sender node ID + To string // Recipient node ID (empty = broadcast) + Timestamp time.Time + Payload json.RawMessage + ReplyTo string // Set on responses; correlates to original message ID +} +``` + +**15 message types** across five categories: + +| Category | Types | +|----------|-------| +| Connection lifecycle | `handshake`, `handshake_ack`, `ping`, `pong`, `disconnect` | +| Miner operations | `get_stats`, `stats`, `start_miner`, `stop_miner`, `miner_ack` | +| Deployment | `deploy`, `deploy_ack` | +| Logs | `get_logs`, `logs` | +| Error | `error` | + +Protocol version negotiation is performed during handshake. `SupportedProtocolVersions` lists all accepted versions (currently `["1.0"]`). + +### protocol.go — Response Validation + +`ParseResponse` and `ValidateResponse` provide typed helpers for correlating request/response pairs. They check that the response message type matches the expected type and unmarshal the payload into a typed struct. + +### worker.go — Command Handlers + +`Worker` handles incoming requests on behalf of a node. It processes miner start/stop, stats retrieval, log fetching, and deployment via two decoupled interfaces: + +```go +type MinerManager interface { + StartMiner(config map[string]any) error + StopMiner(id string) error + GetStats() map[string]any + GetLogs(id string, lines int) ([]string, error) +} + +type ProfileManager interface { + ApplyProfile(name string, data []byte) error +} +``` + +These interfaces allow the worker to be driven by any concrete miner implementation without importing it directly. 
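To illustrate the decoupling, here is a minimal in-memory stand-in that satisfies `MinerManager`, of the kind a test might pass to a worker. The `fakeMiner` type is hypothetical, and the `"id"` key it reads from the config map is an assumption of this sketch; the real config schema is not specified in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// MinerManager is copied from the interface quoted above.
type MinerManager interface {
	StartMiner(config map[string]any) error
	StopMiner(id string) error
	GetStats() map[string]any
	GetLogs(id string, lines int) ([]string, error)
}

// fakeMiner is a hypothetical in-memory implementation for tests.
// It is not safe for concurrent use.
type fakeMiner struct {
	running map[string]bool
	logs    map[string][]string
}

func newFakeMiner() *fakeMiner {
	return &fakeMiner{running: map[string]bool{}, logs: map[string][]string{}}
}

// StartMiner assumes the config carries a non-empty string under "id".
func (f *fakeMiner) StartMiner(config map[string]any) error {
	id, ok := config["id"].(string)
	if !ok || id == "" {
		return errors.New("fake miner: config missing string id")
	}
	f.running[id] = true
	f.logs[id] = append(f.logs[id], "started")
	return nil
}

// StopMiner fails if the miner was never started or is already stopped.
func (f *fakeMiner) StopMiner(id string) error {
	if !f.running[id] {
		return fmt.Errorf("fake miner: %q is not running", id)
	}
	delete(f.running, id)
	f.logs[id] = append(f.logs[id], "stopped")
	return nil
}

// GetStats reports only the number of running miners.
func (f *fakeMiner) GetStats() map[string]any {
	return map[string]any{"running": len(f.running)}
}

// GetLogs returns the last `lines` entries for id, or all of them when
// lines is non-positive or exceeds the number of entries.
func (f *fakeMiner) GetLogs(id string, lines int) ([]string, error) {
	entries, ok := f.logs[id]
	if !ok {
		return nil, fmt.Errorf("fake miner: no logs for %q", id)
	}
	if lines > 0 && lines < len(entries) {
		entries = entries[len(entries)-lines:]
	}
	return entries, nil
}

// Compile-time check that fakeMiner satisfies the interface.
var _ MinerManager = (*fakeMiner)(nil)

func main() {
	var m MinerManager = newFakeMiner()
	if err := m.StartMiner(map[string]any{"id": "miner-0"}); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println(m.GetStats())
}
```

The compile-time assertion is the useful part: if the interface changes, the stand-in stops compiling rather than failing at run time.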
+ +### controller.go — Remote Node Operations + +`Controller` issues requests to remote peers and correlates responses using a pending-map pattern: + +```go +pending map[string]chan *Message // message ID -> response channel +``` + +`sendRequest` registers a response channel, sends the message, and blocks with a `context.WithTimeout` until the response arrives or the deadline expires. `handleResponse` (registered as the transport `OnMessage` handler) routes incoming replies to the correct channel by matching `msg.ReplyTo`. + +Auto-connect: if the target peer is not yet connected, `sendRequest` calls `transport.Connect` transparently before sending. + +`GetAllStats` collects statistics from all connected peers in parallel using goroutines. + +### dispatcher.go — UEPS Intent Routing + +`Dispatcher` sits between the transport layer and application logic. It routes verified UEPS packets to registered intent handlers after enforcing the threat circuit breaker. + +**Threat circuit breaker**: Any packet with `ThreatScore > ThreatScoreThreshold` (50,000) is dropped and logged at WARN level before intent routing begins. The threshold sits at approximately 76% of the `uint16` maximum (50,000 / 65,535), providing headroom for legitimately elevated-risk traffic. + +**Intent routing**: Handlers are registered per `IntentID` (1:1 mapping). A `sync.RWMutex` protects the handler map: registration takes a write lock; dispatch takes a read lock (read-heavy workload). 
+ +**Well-known intents**: + +| Constant | Value | Meaning | +|----------|-------|---------| +| `IntentHandshake` | `0x01` | Connection establishment | +| `IntentCompute` | `0x20` | Compute job request | +| `IntentRehab` | `0x30` | Benevolent intervention (pause execution) | +| `IntentCustom` | `0xFF` | Application-level sub-protocols | + +**Sentinel errors**: +- `ErrThreatScoreExceeded` — threat circuit breaker fired +- `ErrUnknownIntent` — no handler registered for the `IntentID` +- `ErrNilPacket` — nil packet passed to `Dispatch` + +### bundle.go — TIM Deployment Bundles + +`Bundle` wraps an encrypted deployment artefact (profile JSON or miner binary + config). Encryption uses Borg TIM (`tim.ToSigil` / `tim.FromSigil`) with a password-derived key. Integrity is verified with a SHA-256 checksum stored alongside the encrypted data. + +Tarball extraction (`extractTarball`) defends against: +- **Zip Slip** — rejects absolute paths and entries containing `..` traversal sequences; verifies every resolved path is still within the destination directory. +- **Decompression bombs** — limits each file to 100 MB. +- **Symlink attacks** — silently skips `tar.TypeSymlink` and `tar.TypeLink` entries. + +## ueps/ — UEPS Wire Protocol (RFC-021) + +The Unified Encrypted Packet Structure defines a TLV-encoded binary frame authenticated with HMAC-SHA256. + +### Packet Format + +``` +[0x01][len][Version] Header: Version (0x09 = IPv9) +[0x02][len][CurrentLayer] Header: Current network layer +[0x03][len][TargetLayer] Header: Target network layer +[0x04][len][IntentID] Header: Semantic routing token +[0x05][0x02][ThreatScore] Header: uint16, big-endian +[0x06][0x20][HMAC-SHA256] Signature: 32 bytes, covers header TLVs + payload data +[0xFF][...payload...] Data: no length prefix (relies on external framing) +``` + +**HMAC coverage**: The signature is computed over the serialised header TLVs (tags 0x01–0x05) concatenated with the raw payload bytes. 
The HMAC TLV itself (tag 0x06) and the payload tag byte (0xFF) are excluded from the signed data. + +### PacketBuilder + +`NewBuilder(intentID, payload)` creates a builder with sensible defaults (Version 0x09, layer 5/application, ThreatScore 0). `MarshalAndSign(sharedSecret)` serialises the frame and appends the HMAC. + +### ReadAndVerify + +`ReadAndVerify(r *bufio.Reader, sharedSecret)` reads a stream, decodes the TLV fields in order, reconstructs the signed data buffer, and verifies the HMAC with `hmac.Equal`. Unknown TLV tags are accumulated into the signed data buffer (forward-compatible extension mechanism) but their semantics are ignored. + +**Known limitation**: Tag 0xFF carries no length prefix. The reader calls `io.ReadAll` on the remaining stream, which requires external TCP framing (e.g. a 4-byte length prefix on the enclosing connection) to delimit the packet boundary. The packet is not self-delimiting. + +## logging/ — Structured Logger + +`Logger` writes structured lines to any `io.Writer` (default: `os.Stderr`) at four levels: DEBUG, INFO, WARN, ERROR. Each line carries a timestamp, level tag, optional component tag, the message string, and key-value pairs. + +Format: `2006/01/02 15:04:05 [LEVEL] [component] message | key=value key=value` + +A global logger instance is available via `logging.Debug(...)`, `logging.Info(...)`, etc. `logging.New(Config{...})` constructs a scoped logger for use within specific components (e.g., the dispatcher creates one with `Component: "dispatcher"`). 
+ +## Concurrency Model + +| Resource | Protection | +|----------|------------| +| `Transport.conns` | `sync.RWMutex` | +| `Transport.handler` | `sync.RWMutex` | +| `PeerConnection` writes | `sync.Mutex` (`writeMu`) | +| `PeerConnection` close | `sync.Once` (`closeOnce`) | +| `PeerRegistry.peers` + KD-tree | `sync.RWMutex` | +| `PeerRegistry.allowedPublicKeys` | separate `sync.RWMutex` | +| `PeerRegistry.saveTimer` / `dirty` | `sync.Mutex` (`saveMu`) | +| `Controller.pending` | `sync.RWMutex` | +| `MessageDeduplicator.seen` | `sync.RWMutex` | +| `Dispatcher.handlers` | `sync.RWMutex` | +| `Transport.pendingConns` | `atomic.Int32` | + +The codebase is verified race-free under `go test -race`. + +## Dependency Graph + +``` +node/ ──► ueps/ +node/ ──► logging/ +node/ ──► github.com/Snider/Borg (STMF crypto, SMSG encryption, TIM) +node/ ──► github.com/Snider/Poindexter (KD-tree peer selection) +node/ ──► github.com/gorilla/websocket +node/ ──► github.com/google/uuid +ueps/ ──► (stdlib only) +logging/ ──► (stdlib only) +``` + +Borg transitively pulls in Enchantrix (secure environment) and ProtonMail go-crypto. diff --git a/docs/development.md b/docs/development.md new file mode 100644 index 0000000..42045f9 --- /dev/null +++ b/docs/development.md @@ -0,0 +1,260 @@ +# Development Guide — go-p2p + +## Prerequisites + +- Go 1.25.5 or later (the module declares `go 1.25.5`) +- Network access to `forge.lthn.ai` for private dependencies (Borg, Poindexter, Enchantrix) +- SSH key configured for `git@forge.lthn.ai:2223` (HTTPS auth is not supported on Forge) + +Private modules are hosted at `forge.lthn.ai`. Set `GOPRIVATE=forge.lthn.ai` so the Go toolchain neither fetches them through `proxy.golang.org` nor verifies them against the public checksum database. `GONOPROXY` and `GONOSUMDB` default to the value of `GOPRIVATE`, so set them separately only if you need different behaviour for one of them. + +## Build and Test + +```bash +# Run all tests +go test ./... + +# Run a single test by name +go test -run TestName ./... 
+ +# Run tests with race detector (required before any PR) +go test -race ./... + +# Skip integration tests (they bind real TCP ports) +go test -short ./... + +# Run benchmarks +go test -bench . ./... +go test -bench BenchmarkName ./... + +# Coverage per package +go test -cover ./node +go test -cover ./ueps +go test -cover ./logging + +# Coverage report (HTML) +go test -coverprofile=cover.out ./... && go tool cover -html=cover.out + +# Static analysis +go vet ./... +``` + +## Test Patterns + +### Table-Driven Subtests + +All tests use table-driven subtests with `t.Run()`. A test that does not follow this pattern should be refactored before merging. + +```go +func TestFoo(t *testing.T) { + cases := []struct { + name string + input string + want string + wantErr bool + }{ + {name: "valid input", input: "abc", want: "ABC"}, + {name: "empty input", input: "", wantErr: true}, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + got, err := Foo(tc.input) + if tc.wantErr { + require.Error(t, err) + return + } + require.NoError(t, err) + assert.Equal(t, tc.want, got) + }) + } +} +``` + +### Test Naming Suffixes + +Inherited from the wider go-p2p test tradition: + +| Suffix | Meaning | +|--------|---------| +| `_Good` | Happy path | +| `_Bad` | Expected error conditions | +| `_Ugly` | Panic or edge-case conditions | + +### Assertions + +Use `github.com/stretchr/testify`. Import both `assert` (non-fatal) and `require` (fatal on failure): + +```go +import ( + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) +``` + +Use `require` for setup steps and preconditions. Use `assert` for verification steps where partial results are still informative. 
+ +### Transport Test Helper + +The `node` package provides a reusable helper for tests that need two live transport endpoints: + +```go +tp := setupTestTransportPair(t) +// tp.Server, tp.Client — *Transport +// tp.ServerNode, tp.ClientNode — *NodeManager +// tp.ServerReg, tp.ClientReg — *PeerRegistry + +pc := tp.connectClient(t) // performs handshake, returns *PeerConnection +``` + +`setupTestTransportPairWithConfig` accepts custom `TransportConfig` for each side, useful for testing keepalive and rate limiting behaviours. + +The helper registers a `t.Cleanup` function that calls `Stop()` on both transports, so tests do not need to manage teardown. + +### Integration Tests + +Integration tests are gated with `testing.Short()`: + +```go +if testing.Short() { + t.Skip("skipping integration test in short mode") +} +``` + +Run them explicitly with `go test ./...` (without `-short`). They bind real localhost TCP ports and are safe to run in parallel with distinct transports because each test uses an ephemeral listen address (`:0`-style via `net/http/httptest` internally). + +### Benchmark Structure + +Benchmarks live in `bench_test.go` files within each package. They follow the standard Go benchmark pattern: + +```go +func BenchmarkFoo(b *testing.B) { + // setup outside loop + b.ResetTimer() + for i := 0; i < b.N; i++ { + Foo() + } +} +``` + +Run with `-benchmem` to track allocations: + +```bash +go test -bench . -benchmem ./... 
+```
+
+Reference timings (Apple M-series, 2025):
+
+| Benchmark | Time | Allocs |
+|-----------|------|--------|
+| Identity keygen | 217 µs | — |
+| Shared secret derivation | 53 µs | — |
+| Message serialise | 4 µs | — |
+| SMSG encrypt+decrypt | 4.7 µs | — |
+| Challenge sign+verify | 505 ns | — |
+| KD-tree peer select | 349 ns | — |
+| KD-tree rebuild | 2.5 µs | — |
+| UEPS marshal | 621 ns | — |
+| UEPS read+verify | 1 µs | — |
+| bufpool get/put | 8 ns | 0 |
+| Challenge generation | 211 ns | — |
+
+## Coding Standards
+
+### UK English
+
+All identifiers, comments, log messages, and documentation must use UK English spellings:
+
+- colour (not color)
+- organisation (not organization)
+- centre (not center)
+- behaviour (not behavior)
+- recognise (not recognize)
+
+### Strict Types
+
+All parameters and return types must carry explicit type annotations. Avoid `interface{}` except where a generic pool or JSON-raw interface genuinely requires it; prefer `any` (the Go 1.18 alias) if you must. Do not use blank identifiers to discard typed return values without good reason.
+
+### Error Handling
+
+- Never discard errors silently.
+- Wrap errors with context using `fmt.Errorf("context: %w", err)`.
+- Return typed sentinel errors for conditions callers need to inspect programmatically.
+
+### Licence Header
+
+Every new file must carry the EUPL-1.2 licence identifier. The module's `LICENSE` file governs the package. Do not include the full licence text in each file; a short SPDX identifier comment at the top is sufficient for new files:
+
+```go
+// SPDX-License-Identifier: EUPL-1.2
+```
+
+### Security-First
+
+- HMAC verification is required on all wire traffic (UEPS frames, not negotiable).
+- Challenge-response authentication must not be weakened or bypassed in tests; use the `setupTestTransportPair` helper, which performs a real handshake.
+- Any code that extracts archives must use `extractTarball` (or equivalent defensive logic) with Zip Slip defence, symlink rejection, and a size limit.
+- Rate limiting and deduplication are not optional features; they are core to the security posture.
+
+### Logging
+
+Use the `logging` package throughout. Do not use `fmt.Println` or `log.Printf` in library code.
+
+```go
+logging.Debug("connected to peer", logging.Fields{"peer_id": pc.Peer.ID})
+logging.Warn("peer rate limited", logging.Fields{"peer_id": pc.Peer.ID})
+```
+
+For hot paths (read loop), use the debug log sampling pattern already established in `transport.go` to avoid flooding logs:
+
+```go
+if debugLogCounter.Add(1)%debugLogInterval == 0 {
+	logging.Debug("received message", logging.Fields{...})
+}
+```
+
+## Conventional Commits
+
+All commits follow the Conventional Commits specification:
+
+```
+type(scope): short description
+
+Body (optional): longer explanation of the why, not the what.
+
+Co-Authored-By: Virgil
+```
+
+**Types**: `feat`, `fix`, `test`, `refactor`, `docs`, `chore`, `perf`, `ci`
+
+**Scopes**: `node`, `ueps`, `logging`, `transport`, `peer`, `dispatcher`, `identity`, `bundle`, `controller`
+
+Examples:
+
+```
+feat(dispatcher): implement UEPS threat circuit breaker
+
+test(transport): add keepalive timeout and MaxConns enforcement tests
+
+fix(peer): prevent data race in GracefulClose (P2P-RACE-1)
+```
+
+## Forge Remote
+
+The canonical remote is:
+
+```
+ssh://git@forge.lthn.ai:2223/core/go-p2p.git
+```
+
+Push to `forge` remote only. GitHub remotes are disabled for push.
+
+## Dependency Management
+
+After adding or removing a dependency:
+
+```bash
+go mod tidy
+go work sync # if working within the go-p2p workspace
+```
+
+Do not vendor dependencies. The module uses the standard module proxy for public packages and Forge for private ones.
+
diff --git a/docs/history.md b/docs/history.md
new file mode 100644
index 0000000..02f5819
--- /dev/null
+++ b/docs/history.md
@@ -0,0 +1,119 @@
+# Project History — go-p2p
+
+## Phases
+
+### Phase 1 — UEPS Wire Protocol Tests
+
+Commit `2bc53ba`. Coverage: ueps/ 88.5%.
+
+Implemented the complete test suite for the UEPS binary framing layer. Tests covered every aspect of the TLV encoding and HMAC-SHA256 signing:
+
+- PacketBuilder round-trip: basic, binary payload, elevated threat score, large payload
+- HMAC verification: payload tampering detected, header tampering detected, wrong shared secret detected
+- Boundary conditions: nil payload, empty slice payload, `uint16` max ThreatScore (65,535), TLV value exceeding 255 bytes (`writeTLV` error path)
+- Stream robustness: truncated packets detected at multiple cut points (EOF mid-tag, mid-length, mid-value), missing HMAC tag, unknown TLV tags skipped and included in signed data
+
+The 11.5% gap from 100% coverage is the reader's `io.ReadAll` error path, which requires a contrived broken `io.Reader` to exercise.
+
+### Phase 2 — Transport Tests
+
+Commit `3ee5553`. Coverage: node/ 42% to 63.5%.
+
+Implemented transport layer tests with real WebSocket connections (no mocks). A reusable `setupTestTransportPair` helper creates two live transports on ephemeral ports and performs identity generation.
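As a rough illustration of the ephemeral-port idea, the sketch below starts two local HTTP servers with the standard library's `net/http/httptest`, which binds each one to a free port on 127.0.0.1. It is not the project's `setupTestTransportPair`: the `endpointPair` type and the `newEndpointPair` and `distinctAndLive` functions are hypothetical names invented for this example, and plain HTTP stands in for the WebSocket transport.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// endpointPair is a hypothetical stand-in for a pair of live transports.
// Each server listens on its own ephemeral localhost port.
type endpointPair struct {
	A *httptest.Server
	B *httptest.Server
}

// newEndpointPair starts two servers that answer every request with 200 OK.
func newEndpointPair() *endpointPair {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return &endpointPair{
		A: httptest.NewServer(handler),
		B: httptest.NewServer(handler),
	}
}

// Close stops both servers and releases their ports.
func (p *endpointPair) Close() {
	p.A.Close()
	p.B.Close()
}

// distinctAndLive reports whether the two endpoints have different
// addresses and both respond with 200 OK.
func distinctAndLive(p *endpointPair) (bool, error) {
	if p.A.URL == p.B.URL {
		return false, nil
	}
	for _, u := range []string{p.A.URL, p.B.URL} {
		resp, err := http.Get(u)
		if err != nil {
			return false, fmt.Errorf("probe %s: %w", u, err)
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	pair := newEndpointPair()
	defer pair.Close()

	ok, err := distinctAndLive(pair)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("endpoints distinct and live:", ok)
}
```

In a real test the `Close` call would typically be registered with `t.Cleanup`, so teardown still runs when an assertion fails part-way through.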
+
+Tests covered:
+- Full handshake: challenge-response completes, 32-byte shared secret derived
+- Handshake rejection: incompatible protocol version (rejection message sent before disconnect)
+- Handshake rejection: allowlist mode, peer not authorised
+- Encrypted message round-trip: SMSG encrypt on one side, decrypt on other
+- Message deduplication: duplicate UUID dropped silently
+- Rate limiting: burst of more than 100 messages, subsequent drops after token bucket empties
+- MaxConns enforcement: 503 HTTP rejection when limit is reached
+- Keepalive timeout: connection cleaned up after `PingInterval + PongTimeout` elapses
+- Graceful close: `MsgDisconnect` sent before underlying WebSocket close
+- Concurrent sends: no data races under `go test -race` (`writeMu` protects all writes)
+
+### Phase 3 — Controller Tests
+
+Commit `33eda7b`. Coverage: node/ 63.5% to 72.1%. 14 test functions.
+
+Also fixed bug P2P-RACE-1 (see Bugs Fixed).
+
+Tests covered:
+- Request-response correlation: message sent, worker replies with `ReplyTo` set, controller matches by ID
+- Request timeout: no response within deadline, `sendRequest` returns timeout error, pending channel cleaned up
+- Auto-connect: peer not yet connected, controller calls `transport.Connect` transparently
+- GetAllStats: multiple connected peers, parallel stat collection, all results collected
+- PingPeer RTT: ping sent, pong received, RTT calculated in milliseconds, peer metrics updated in registry
+- Concurrent requests: multiple in-flight requests to different peers, correct correlation under load
+- Dead peer cleanup: response channel closed and removed from pending map after timeout (no goroutine leak)
+
+### Phase 4 — Dispatcher Implementation
+
+Commit `a60dfdf`. Coverage: dispatcher.go 100%.
+
+Replaced the dispatcher stub with a complete implementation. 10 test functions, 17 subtests.
+
+Design decisions recorded at the time:
+
+1. `IntentHandler` as a `func` type rather than an interface, matching the `MessageHandler` pattern already used in `transport.go`. Lighter weight for a single-method contract.
+2. Sentinel errors (`ErrThreatScoreExceeded`, `ErrUnknownIntent`, `ErrNilPacket`) rather than silent drops. Callers can inspect outcomes; the dispatcher still logs at WARN level regardless.
+3. Threat check occurs before intent routing. A high-threat packet with an unknown intent returns `ErrThreatScoreExceeded`, not `ErrUnknownIntent`. The circuit breaker is the first line of defence.
+4. Threshold fixed at 50,000 (a constant, not configurable) to match the original stub specification. The value sits at approximately 76% of `uint16` max.
+5. `sync.RWMutex` for the handler map. Registration is infrequent (write lock); dispatch is read-heavy (read lock).
+
+Tests covered: register/dispatch, threat boundary conditions (at threshold, above threshold, `uint16` max, zero), unknown intent, multi-handler routing, nil packet, empty payload, concurrent dispatch (50 goroutines), concurrent register-and-dispatch, handler replacement, threat-before-routing ordering, intent constant value verification.
+
+### Phase 5 — Integration Tests and Benchmarks
+
+Coverage: race-free under `go test -race`.
+
+Three integration tests (`TestIntegration_*`) exercise the full stack end-to-end:
+
+- `TestIntegration_TwoNodeHandshakeAndMessage`: two nodes on localhost, identity creation, handshake, encrypted message exchange, controller ping/pong with RTT measurement, UEPS packet routing via dispatcher, threat circuit breaker verification, graceful shutdown with disconnect message.
+- `TestIntegration_SharedSecretAgreement`: verifies that two independently created nodes derive identical 32-byte shared secrets via X25519 ECDH (fundamental correctness property).
+- `TestIntegration_GetRemoteStats_EndToEnd`: full stats retrieval across a real WebSocket connection with worker and controller wired together.
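The shared-secret property checked by `TestIntegration_SharedSecretAgreement` can be sketched with the standard library alone. This example assumes Go 1.20 or later and uses `crypto/ecdh` directly rather than the Borg-based key exchange the project relies on; `deriveBoth` and `secretsAgree` are hypothetical helper names, not part of the `node` package.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// deriveBoth generates two independent X25519 key pairs and returns the
// secret each side computes from its own private key and the peer's public key.
func deriveBoth() (a, b []byte, err error) {
	curve := ecdh.X25519()

	alice, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, fmt.Errorf("generate first key: %w", err)
	}
	bob, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, fmt.Errorf("generate second key: %w", err)
	}

	a, err = alice.ECDH(bob.PublicKey())
	if err != nil {
		return nil, nil, fmt.Errorf("first side ECDH: %w", err)
	}
	b, err = bob.ECDH(alice.PublicKey())
	if err != nil {
		return nil, nil, fmt.Errorf("second side ECDH: %w", err)
	}
	return a, b, nil
}

// secretsAgree reports whether both sides derived the same 32-byte secret.
func secretsAgree() (bool, error) {
	a, b, err := deriveBoth()
	if err != nil {
		return false, err
	}
	return len(a) == 32 && bytes.Equal(a, b), nil
}

func main() {
	ok, err := secretsAgree()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("secrets agree:", ok)
}
```

Neither private key leaves its owner in this sketch; only the public halves are exchanged, which is the property that lets two independently created nodes arrive at the same secret.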
+
+13 benchmark functions across `node/` and `ueps/`:
+- Identity operations: keygen, shared secret derivation, challenge generation, challenge sign+verify
+- Message operations: serialise
+- Transport operations: SMSG encrypt+decrypt
+- Peer registry: KD-tree select, KD-tree rebuild
+- UEPS: marshal, read+verify
+- Buffer pool: get/put (zero allocations confirmed)
+
+9 buffer pool tests (`bufpool_test.go`): get/put round-trip, buffer reuse verification, large buffer eviction (buffers exceeding 64 KB are not returned to the pool), concurrent get/put (100 goroutines × 50 iterations), buffer independence, `MarshalJSON` correctness for 7 payload types, independent copy verification, HTML escaping disabled, concurrent `MarshalJSON`.
+
+## Known Limitations
+
+### UEPS 0xFF Payload Not Self-Delimiting
+
+The `TagPayload` (0xFF) field carries no length prefix. `ReadAndVerify` calls `io.ReadAll` on the remaining stream, which means the packet format relies on external TCP framing to delimit the packet boundary. The enclosing transport must provide a length-prefixed frame before calling `ReadAndVerify`. This is noted in comments in both `packet.go` and `reader.go` but no solution is implemented.
+
+Consequence: UEPS packets cannot be chained in a raw stream without an outer framing protocol. The current WebSocket transport encapsulates each UEPS frame in a single WebSocket message, which provides the necessary boundary implicitly.
+
+### No Resource Cleanup on Some Error Paths
+
+`transport.handleWSUpgrade` does not clean up on handshake timeout (the `pendingConns` counter is decremented correctly via `defer`, but the underlying WebSocket connection may linger briefly before the read deadline fires). `transport.Connect` does not clean up the temporary connection object on handshake failure (the raw WebSocket `conn` is closed, but there is no registry or metrics cleanup for the partially constructed `PeerConnection`).
+
+These are low-severity gaps. They do not cause goroutine leaks under the current implementation because the connection's read loop is not started until after a successful handshake.
+
+### Controller Race (Resolved)
+
+The originally identified risk — that `transport.OnMessage(c.handleResponse)` is called during `NewController` initialisation and a message arriving before the pending map is ready could cause a panic — was confirmed to be a false alarm. The pending map is initialised in `NewController` before `OnMessage` is called, and `handleResponse` uses a mutex on all map access. No panic is possible.
+
+## Bugs Fixed
+
+### P2P-RACE-1 — GracefulClose Data Race (Phase 3)
+
+`GracefulClose` previously called `pc.Conn.SetWriteDeadline()` outside of `writeMu`, racing with concurrent `Send()` calls that also set the write deadline. Detected by `go test -race`.
+
+Fix: removed the bare `SetWriteDeadline` call from `GracefulClose`. The method now relies entirely on `Send()`, which manages write deadlines under `writeMu`. This is documented in a comment in `transport.go` to prevent the pattern from being reintroduced.
+
+## Wiki Corrections (19 February 2026)
+
+Three wiki inconsistencies were identified and corrected:
+
+- The Node-Identity page stated `PublicKey` is hex-encoded. The code uses base64 (`identity.go:63`).
+- The Protocol-Messages page used a `Sender` field. The code uses `From` and `To` (`message.go:66-67`).
+- The Peer-Discovery page stated `Score` is in the range 0.0–1.0. The code uses a float64 range of 0–100 (`peer.go:31`).