go-p2p/docs/discovery.md
Virgil 2d63a8ba18
Some checks failed
Security Scan / security (push) Successful in 9s
Test / test (push) Failing after 52s
refactor(node): add AX-native aliases for component and path APIs
Co-Authored-By: Virgil <virgil@lethean.io>
2026-03-30 19:32:26 +00:00

6.9 KiB

title description
Peer Discovery KD-tree peer selection across four weighted dimensions, score tracking, and allowlist authentication.

Peer Discovery

The PeerRegistry manages known peers with intelligent selection via a 4-dimensional KD-tree (powered by Poindexter). Peers are scored based on network metrics and reliability, persisted with debounced writes, and gated by configurable authentication modes.

Peer Structure

type Peer struct {
    ID        string    `json:"id"`        // Node ID (derived from public key)
    Name      string    `json:"name"`      // Human-readable name
    PublicKey string    `json:"publicKey"` // X25519 public key (base64)
    Address   string    `json:"address"`   // host:port for WebSocket connection
    Role      NodeRole  `json:"role"`      // controller, worker, or dual
    AddedAt   time.Time `json:"addedAt"`
    LastSeen  time.Time `json:"lastSeen"`

    // Poindexter metrics (updated dynamically)
    PingMS float64 `json:"pingMs"` // Latency in milliseconds
    Hops   int     `json:"hops"`   // Network hop count
    GeoKM  float64 `json:"geoKm"`  // Geographic distance in kilometres
    Score  float64 `json:"score"`  // Reliability score 0--100

    Connected bool `json:"-"` // Not persisted
}

Authentication Modes

const (
    PeerAuthOpen      PeerAuthMode = iota // Accept any peer that authenticates
    PeerAuthAllowlist                      // Only accept pre-approved peers
)
Mode Behaviour
PeerAuthOpen Any node that completes the challenge-response handshake is accepted
PeerAuthAllowlist Only nodes whose public keys appear on the allowlist, or that are already registered, are accepted

Allowlist Management

registry.SetAuthMode(node.PeerAuthAllowlist)

// Add/revoke public keys
registry.AllowPublicKey(peerPublicKeyBase64)
registry.RevokePublicKey(peerPublicKeyBase64)

// Check
ok := registry.IsPublicKeyAllowed(peerPublicKeyBase64)

// List all allowed keys
keys := registry.ListAllowedPublicKeys()

// Iterate
for key := range registry.AllowedPublicKeys() {
    // ...
}

The IsPeerAllowed(peerID, publicKey) method is called during the transport handshake. It returns true if:

  • Auth mode is PeerAuthOpen, or
  • The peer ID is already registered in the registry, or
  • The public key is on the allowlist.

Peer Name Validation

Peer names are validated on AddPeer():

  • 1--64 characters
  • Must start and end with alphanumeric characters
  • May contain alphanumeric, hyphens, underscores, and spaces
  • Empty names are permitted (the field is optional)

KD-Tree Peer Selection

The registry maintains a 4-dimensional KD-tree for optimal peer selection. Each peer is represented as a weighted point:

Dimension Source Weight Direction
Latency PingMS 1.0 Lower is better
Hops Hops 0.7 Lower is better
Geographic distance GeoKM 0.2 Lower is better
Reliability 100 - Score 1.2 Inverted so lower is better

The score dimension is inverted so that the "ideal peer" target point [0, 0, 0, 0] represents zero latency, zero hops, zero distance, and maximum reliability (score 100).

Selecting Peers

// Best single peer (nearest to ideal in 4D space)
best := registry.SelectOptimalPeer()

// Top N peers
top3 := registry.SelectNearestPeers(3)

// All peers sorted by score (highest first)
ranked := registry.GetPeersByScore()

// Iterator over peers by score
for peer := range registry.PeersByScore() {
    // ...
}

The KD-tree uses Euclidean distance (configured via poindexter.WithMetric(poindexter.EuclideanDistance{})). It is rebuilt whenever peers are added, removed, or their metrics change.

Score Tracking

Peer reliability is tracked with a score between 0 and 100 (default 50 for new peers).

const (
    ScoreSuccessIncrement = 1.0   // +1 per successful interaction
    ScoreFailureDecrement = 5.0   // -5 per failure
    ScoreTimeoutDecrement = 3.0   // -3 per timeout
)
registry.RecordSuccess(peerID)  // Score += 1, capped at 100
registry.RecordFailure(peerID)  // Score -= 5, floored at 0
registry.RecordTimeout(peerID)  // Score -= 3, floored at 0

// Direct score update (clamped to 0--100)
registry.UpdateScore(peerID, 75.0)

The asymmetric adjustments (+1 success vs -5 failure) mean that a peer must sustain consistent reliability to maintain a high score. A single failure costs five successful interactions to recover.

Metric Updates

registry.UpdateMetrics(peerID, pingMS, geoKM, hops)

This also updates LastSeen and triggers a KD-tree rebuild.

Registry Operations

// Create
registry, err := node.NewPeerRegistry()             // XDG paths
registry, err := node.NewPeerRegistryFromPath(path)  // Custom path (testing)

// CRUD
err := registry.AddPeer(peer)
err := registry.UpdatePeer(peer)
err := registry.RemovePeer(peerID)
peer := registry.GetPeer(peerID)    // Returns a copy

// Lists and iterators
peers := registry.ListPeers()
count := registry.Count()

for peer := range registry.Peers() {
    // Each peer is a copy to prevent mutation
}

// Connection state
registry.SetConnected(peerID, true)
connected := registry.GetConnectedPeers()

for peer := range registry.ConnectedPeers() {
    // ...
}

Persistence

Peers are persisted to ~/.config/lethean-desktop/peers.json as a JSON array.

Debounced Writes

To avoid excessive disk I/O, saves are debounced with a 5-second coalesce interval. Multiple mutations within that window produce a single disk write. The write uses an atomic rename pattern (write to .tmp, then rename) to prevent corruption on crash.

// Flush pending changes on shutdown
err := registry.Close()

Always call Close() before shutdown to ensure unsaved changes are flushed.

Peer Lifecycle

Discovery          Authentication         Active              Stale
   |                    |                   |                   |
   |  handshake    +---------+     +--------+------+     +-----+
   +-------------->| Auth    |---->| Score  | Ping |---->| Evict|
                   | Check   |     | Update | Loop |     | or   |
                   +---------+     +--------+------+     | Retry|
                        |                                 +-----+
                   [rejected]
  1. Discovery -- New peer connects or is discovered via mesh communication.
  2. Authentication -- Challenge-response handshake (see identity.md). Allowlist checked if in allowlist mode.
  3. Active -- Metrics updated via ping, score adjusted on success/failure, eligible for task assignment.
  4. Stale -- No response after keepalive timeout; connection removed, Connected set to false.

Thread Safety

PeerRegistry is safe for concurrent use. A sync.RWMutex protects the peer map and KD-tree. Allowlist operations use a separate sync.RWMutex. All public methods that return peers return copies to prevent external mutation.