go-p2p/docs/history.md
Snider 3dd204705a docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:01:55 +00:00

8.4 KiB
Raw Blame History

Project History — go-p2p

Phases

Phase 1 — UEPS Wire Protocol Tests

Commit 2bc53ba. Coverage: ueps/ 88.5%.

Implemented the complete test suite for the UEPS binary framing layer. Tests covered every aspect of the TLV encoding and HMAC-SHA256 signing:

  • PacketBuilder round-trip: basic, binary payload, elevated threat score, large payload
  • HMAC verification: payload tampering detected, header tampering detected, wrong shared secret detected
  • Boundary conditions: nil payload, empty slice payload, uint16 max ThreatScore (65,535), TLV value exceeding 255 bytes (writeTLV error path)
  • Stream robustness: truncated packets detected at multiple cut points (EOF mid-tag, mid-length, mid-value), missing HMAC tag, unknown TLV tags skipped and included in signed data

The 11.5% gap from 100% coverage is the reader's io.ReadAll error path, which requires a contrived broken io.Reader to exercise.

Phase 2 — Transport Tests

Commit 3ee5553. Coverage: node/ 42% to 63.5%.

Implemented transport layer tests with real WebSocket connections (no mocks). A reusable setupTestTransportPair helper creates two live transports on ephemeral ports and performs identity generation.

Tests covered:

  • Full handshake: challenge-response completes, 32-byte shared secret derived
  • Handshake rejection: incompatible protocol version (rejection message sent before disconnect)
  • Handshake rejection: allowlist mode, peer not authorised
  • Encrypted message round-trip: SMSG encrypt on one side, decrypt on other
  • Message deduplication: duplicate UUID dropped silently
  • Rate limiting: burst of more than 100 messages, subsequent drops after token bucket empties
  • MaxConns enforcement: 503 HTTP rejection when limit is reached
  • Keepalive timeout: connection cleaned up after PingInterval + PongTimeout elapses
  • Graceful close: MsgDisconnect sent before underlying WebSocket close
  • Concurrent sends: no data races under go test -race (writeMu protects all writes)

Phase 3 — Controller Tests

Commit 33eda7b. Coverage: node/ 63.5% to 72.1%. 14 test functions.

Also fixed bug P2P-RACE-1 (see Known Issues).

Tests covered:

  • Request-response correlation: message sent, worker replies with ReplyTo set, controller matches by ID
  • Request timeout: no response within deadline, sendRequest returns timeout error, pending channel cleaned up
  • Auto-connect: peer not yet connected, controller calls transport.Connect transparently
  • GetAllStats: multiple connected peers, parallel stat collection, all results collected
  • PingPeer RTT: ping sent, pong received, RTT calculated in milliseconds, peer metrics updated in registry
  • Concurrent requests: multiple in-flight requests to different peers, correct correlation under load
  • Dead peer cleanup: response channel closed and removed from pending map after timeout (no goroutine leak)

Phase 4 — Dispatcher Implementation

Commit a60dfdf. Coverage: dispatcher.go 100%.

Replaced the dispatcher stub with a complete implementation. 10 test functions, 17 subtests.

Design decisions recorded at the time:

  1. IntentHandler as a func type rather than an interface, matching the MessageHandler pattern already used in transport.go. Lighter weight for a single-method contract.
  2. Sentinel errors (ErrThreatScoreExceeded, ErrUnknownIntent, ErrNilPacket) rather than silent drops. Callers can inspect outcomes; the dispatcher still logs at WARN level regardless.
  3. Threat check occurs before intent routing. A high-threat packet with an unknown intent returns ErrThreatScoreExceeded, not ErrUnknownIntent. The circuit breaker is the first line of defence.
  4. Threshold fixed at 50,000 (a constant, not configurable) to match the original stub specification. The value sits at approximately 76% of uint16 max.
  5. sync.RWMutex for the handler map. Registration is infrequent (write lock); dispatch is read-heavy (read lock).

Tests covered: register/dispatch, threat boundary conditions (at threshold, above threshold, uint16 max, zero), unknown intent, multi-handler routing, nil packet, empty payload, concurrent dispatch (50 goroutines), concurrent register-and-dispatch, handler replacement, threat-before-routing ordering, intent constant value verification.

Phase 5 — Integration Tests and Benchmarks

Coverage: race-free under go test -race.

Three integration tests (TestIntegration_*) exercise the full stack end-to-end:

  • TestIntegration_TwoNodeHandshakeAndMessage: two nodes on localhost, identity creation, handshake, encrypted message exchange, controller ping/pong with RTT measurement, UEPS packet routing via dispatcher, threat circuit breaker verification, graceful shutdown with disconnect message.
  • TestIntegration_SharedSecretAgreement: verifies that two independently created nodes derive identical 32-byte shared secrets via X25519 ECDH (fundamental correctness property).
  • TestIntegration_GetRemoteStats_EndToEnd: full stats retrieval across a real WebSocket connection with worker and controller wired together.

13 benchmark functions across node/ and ueps/:

  • Identity operations: keygen, shared secret derivation, challenge generation, challenge sign+verify
  • Message operations: serialise
  • Transport operations: SMSG encrypt+decrypt
  • Peer registry: KD-tree select, KD-tree rebuild
  • UEPS: marshal, read+verify
  • Buffer pool: get/put (zero allocations confirmed)

9 buffer pool tests (bufpool_test.go): get/put round-trip, buffer reuse verification, large buffer eviction (buffers exceeding 64 KB are not returned to the pool), concurrent get/put (100 goroutines × 50 iterations), buffer independence, MarshalJSON correctness for 7 payload types, independent copy verification, HTML escaping disabled, concurrent MarshalJSON.

Known Limitations

UEPS 0xFF Payload Not Self-Delimiting

The TagPayload (0xFF) field carries no length prefix. ReadAndVerify calls io.ReadAll on the remaining stream, which means the packet format relies on external TCP framing to delimit the packet boundary. The enclosing transport must provide a length-prefixed frame before calling ReadAndVerify. This is noted in comments in both packet.go and reader.go but no solution is implemented.

Consequence: UEPS packets cannot be chained in a raw stream without an outer framing protocol. The current WebSocket transport encapsulates each UEPS frame in a single WebSocket message, which provides the necessary boundary implicitly.

No Resource Cleanup on Some Error Paths

transport.handleWSUpgrade does not clean up on handshake timeout (the pendingConns counter is decremented correctly via defer, but the underlying WebSocket connection may linger briefly before the read deadline fires). transport.Connect does not clean up the temporary connection object on handshake failure (the raw WebSocket conn is closed, but there is no registry or metrics cleanup for the partially constructed PeerConnection).

These are low-severity gaps. They do not cause goroutine leaks under the current implementation because the connection's read loop is not started until after a successful handshake.

Controller Race (Resolved)

The originally identified risk — that transport.OnMessage(c.handleResponse) is called during NewController initialisation and a message arriving before the pending map is ready could cause a panic — was confirmed to be a false alarm. The pending map is initialised in NewController before OnMessage is called, and handleResponse uses a mutex on all map access. No panic is possible.

Bugs Fixed

P2P-RACE-1 — GracefulClose Data Race (Phase 3)

GracefulClose previously called pc.Conn.SetWriteDeadline() outside of writeMu, racing with concurrent Send() calls that also set the write deadline. Detected by go test -race.

Fix: removed the bare SetWriteDeadline call from GracefulClose. The method now relies entirely on Send(), which manages write deadlines under writeMu. This is documented in a comment in transport.go to prevent the pattern from being reintroduced.

Wiki Corrections (19 February 2026)

Three wiki inconsistencies were identified and corrected:

  • The Node-Identity page stated PublicKey is hex-encoded. The code uses base64 (identity.go:63).
  • The Protocol-Messages page used a Sender field. The code uses From and To (message.go:66-67).
  • The Peer-Discovery page stated Score is in the range 0.01.0. The code uses a float64 range of 0100 (peer.go:31).