Poindexter/docs/kdtree-multidimensional.md

289 lines
8.7 KiB
Markdown
Raw Normal View History

# KDTree: MultiDimensional Search (DHT peers)
This example extends the singledimension "best ping" demo to a realistic multidimensional selection:
- ping_ms (lower is better)
- hop_count (lower is better)
- geo_distance_km (lower is better)
- score (higher is better — e.g., capacity/reputation)
We will:
- Build 4D points over these features
- Run `Nearest`, `KNearest`, and `Radius` queries
- Show subsets: ping+hop (2D) and ping+hop+geo (3D)
- Demonstrate weighting/normalization to balance disparate units
> Tip: KDTree distances are geometric. Mixing units (ms, hops, km, arbitrary score) requires scaling so that each axis contributes proportionally to your decision policy.
## Dataset
Well use a small, madeup set of DHT peers in each runnable example below. Each example declares its own `Peer` type and dataset so you can copypaste and run independently.
## Normalization and weights
To make heterogeneous units comparable (ms, hops, km, score), use the library helpers which:
- Minmax normalize each axis to [0,1] over your provided dataset
- Optionally invert axes where “higher is better” so they become “lower cost”
- Apply peraxis weights so you can emphasize what matters
Build 4D points and query them with helpers (full program):
```go
package main
import (
"fmt"
poindexter "github.com/Snider/Poindexter"
)
type Peer struct {
ID string
PingMS float64
Hops float64
GeoKM float64
Score float64
}
var peers = []Peer{
{ID: "A", PingMS: 22, Hops: 3, GeoKM: 1200, Score: 0.86},
{ID: "B", PingMS: 34, Hops: 2, GeoKM: 800, Score: 0.91},
{ID: "C", PingMS: 15, Hops: 4, GeoKM: 4500, Score: 0.70},
{ID: "D", PingMS: 55, Hops: 1, GeoKM: 300, Score: 0.95},
{ID: "E", PingMS: 18, Hops: 2, GeoKM: 2200, Score: 0.80},
}
func main() {
// Build 4D KDTree using Euclidean (L2)
weights4 := [4]float64{1.0, 0.7, 0.2, 1.2}
invert4 := [4]bool{false, false, false, true} // invert score (higher is better)
pts, err := poindexter.Build4D(
peers,
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
func(p Peer) float64 { return p.GeoKM },
func(p Peer) float64 { return p.Score },
weights4, invert4,
)
if err != nil { panic(err) }
tree, _ := poindexter.NewKDTree(pts, poindexter.WithMetric(poindexter.EuclideanDistance{}))
// Query target preferences (construct a query in normalized/weighted space)
// Example: seek very low ping, low hops, moderate geo, high score (low score_cost)
query := []float64{weights4[0]*0.0, weights4[1]*0.2, weights4[2]*0.3, weights4[3]*0.0}
// 1NN
best, dist, ok := tree.Nearest(query)
if ok {
fmt.Printf("Best peer: %s (dist=%.4f)\n", best.ID, dist)
}
// kNN (top 3)
neigh, dists := tree.KNearest(query, 3)
for i := range neigh {
fmt.Printf("%d) %s dist=%.4f\n", i+1, neigh[i].ID, dists[i])
}
// Radius query
within, wd := tree.Radius(query, 0.35)
fmt.Printf("Within radius 0.35: ")
for i := range within {
fmt.Printf("%s(%.3f) ", within[i].ID, wd[i])
}
fmt.Println()
}
```
## 2D: Ping + Hop
Sometimes you want a strict tradeoff between just latency and path length. Build 2D points using helpers:
```go
package main
import (
"fmt"
poindexter "github.com/Snider/Poindexter"
)
type Peer struct {
ID string
PingMS float64
Hops float64
}
var peers = []Peer{
{ID: "A", PingMS: 22, Hops: 3},
{ID: "B", PingMS: 34, Hops: 2},
{ID: "C", PingMS: 15, Hops: 4},
{ID: "D", PingMS: 55, Hops: 1},
{ID: "E", PingMS: 18, Hops: 2},
}
func main() {
weights2 := [2]float64{1.0, 1.0}
invert2 := [2]bool{false, false}
pts2, err := poindexter.Build2D(
peers,
func(p Peer) string { return p.ID }, // id
func(p Peer) float64 { return p.PingMS },// f1: ping
func(p Peer) float64 { return p.Hops }, // f2: hops
weights2, invert2,
)
if err != nil { panic(err) }
tree2, _ := poindexter.NewKDTree(pts2, poindexter.WithMetric(poindexter.ManhattanDistance{})) // L1 favors axisaligned tradeoffs
// Prefer very low ping, modest hops
query2 := []float64{weights2[0]*0.0, weights2[1]*0.3}
best2, _, _ := tree2.Nearest(query2)
fmt.Println("2D best (ping+hop):", best2.ID)
}
```
## 3D: Ping + Hop + Geo
Add geography to discourage far peers when latency is similar. Use the 3D helper:
```go
package main
import (
"fmt"
poindexter "github.com/Snider/Poindexter"
)
type Peer struct {
ID string
PingMS float64
Hops float64
GeoKM float64
}
var peers = []Peer{
{ID: "A", PingMS: 22, Hops: 3, GeoKM: 1200},
{ID: "B", PingMS: 34, Hops: 2, GeoKM: 800},
{ID: "C", PingMS: 15, Hops: 4, GeoKM: 4500},
{ID: "D", PingMS: 55, Hops: 1, GeoKM: 300},
{ID: "E", PingMS: 18, Hops: 2, GeoKM: 2200},
}
func main() {
weights3 := [3]float64{1.0, 0.7, 0.3}
invert3 := [3]bool{false, false, false}
pts3, err := poindexter.Build3D(
peers,
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
func(p Peer) float64 { return p.GeoKM },
weights3, invert3,
)
if err != nil { panic(err) }
tree3, _ := poindexter.NewKDTree(pts3, poindexter.WithMetric(poindexter.EuclideanDistance{}))
// Prefer low ping/hop, modest geo
query3 := []float64{weights3[0]*0.0, weights3[1]*0.2, weights3[2]*0.4}
top3, _, _ := tree3.Nearest(query3)
fmt.Println("3D best (ping+hop+geo):", top3.ID)
}
```
## Dynamic updates
Your routing table changes constantly. Insert/remove peers. For consistent normalization, compute and reuse your min/max stats (preferred) or rebuild points when the candidate set changes.
Tip: Use the WithStats helpers to reuse normalization across updates:
```go
// Compute once over your baseline
stats := poindexter.ComputeNormStats2D(peers,
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
)
// Build now or later using the same stats
ts, _ := poindexter.Build2DWithStats(
peers,
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
[2]float64{1,1}, [2]bool{false,false}, stats,
)
```
```go
package main
import (
"fmt"
poindexter "github.com/Snider/Poindexter"
)
type Peer struct {
ID string
PingMS float64
Hops float64
}
var peers = []Peer{
{ID: "A", PingMS: 22, Hops: 3},
{ID: "B", PingMS: 34, Hops: 2},
{ID: "C", PingMS: 15, Hops: 4},
}
func main() {
// Initial 2D build (ping + hops)
weights2 := [2]float64{1.0, 1.0}
invert2 := [2]bool{false, false}
// Compute normalization stats once over your baseline set
stats := poindexter.ComputeNormStats2D(
peers,
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
)
// Build using the precomputed stats so future inserts share the same scale
pts, _ := poindexter.Build2DWithStats(
peers,
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
weights2, invert2, stats,
)
tree, _ := poindexter.NewKDTree(pts)
// Insert a new peer: reuse the same normalization stats to keep scale consistent
newPeer := Peer{ID: "Z", PingMS: 12, Hops: 2}
addPts, _ := poindexter.Build2DWithStats(
[]Peer{newPeer},
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
weights2, invert2, stats,
)
_ = tree.Insert(addPts[0])
// Verify nearest now prefers Z for low ping target
best, _, _ := tree.Nearest([]float64{0, 0})
fmt.Println("Best after insert:", best.ID)
// Delete by ID when peer goes offline
_ = tree.DeleteByID("Z")
}
```
## Choosing a metric
- Euclidean (L2): smooth tradeoffs across axes; good default for blended preferences
- Manhattan (L1): emphasizes peraxis absolute differences; useful when each unit of ping/hop matters equally
- Chebyshev (L∞): minmax style; dominated by the worst axis (e.g., reject any peer with too many hops regardless of ping)
## Notes on production use
- Keep and reuse normalization parameters (min/max or mean/std) rather than recomputing per query to avoid drift.
- Consider capping outliers (e.g., clamp geo distances > 5000 km).
- For large N (≥ 1e5) and low dims (≤ 8), consider swapping the internal engine to `gonum.org/v1/gonum/spatial/kdtree` behind the same API for faster queries.