Poindexter/docs/kdtree-multidimensional.md

288 lines
8.7 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# KDTree: MultiDimensional Search (DHT peers)
This example extends the singledimension "best ping" demo to a realistic multidimensional selection:
- ping_ms (lower is better)
- hop_count (lower is better)
- geo_distance_km (lower is better)
- score (higher is better — e.g., capacity/reputation)
We will:
- Build 4D points over these features
- Run `Nearest`, `KNearest`, and `Radius` queries
- Show subsets: ping+hop (2D) and ping+hop+geo (3D)
- Demonstrate weighting/normalization to balance disparate units
> Tip: KDTree distances are geometric. Mixing units (ms, hops, km, arbitrary score) requires scaling so that each axis contributes proportionally to your decision policy.
## Dataset
Well use a small, madeup set of DHT peers in each runnable example below. Each example declares its own `Peer` type and dataset so you can copypaste and run independently.
## Normalization and weights
To make heterogeneous units comparable (ms, hops, km, score), use the library helpers which:
- Minmax normalize each axis to [0,1] over your provided dataset
- Optionally invert axes where “higher is better” so they become “lower cost”
- Apply peraxis weights so you can emphasize what matters
Build 4D points and query them with helpers (full program):
```go
package main
import (
"fmt"
poindexter "github.com/Snider/Poindexter"
)
type Peer struct {
ID string
PingMS float64
Hops float64
GeoKM float64
Score float64
}
var peers = []Peer{
{ID: "A", PingMS: 22, Hops: 3, GeoKM: 1200, Score: 0.86},
{ID: "B", PingMS: 34, Hops: 2, GeoKM: 800, Score: 0.91},
{ID: "C", PingMS: 15, Hops: 4, GeoKM: 4500, Score: 0.70},
{ID: "D", PingMS: 55, Hops: 1, GeoKM: 300, Score: 0.95},
{ID: "E", PingMS: 18, Hops: 2, GeoKM: 2200, Score: 0.80},
}
func main() {
// Build 4D KDTree using Euclidean (L2)
weights4 := [4]float64{1.0, 0.7, 0.2, 1.2}
invert4 := [4]bool{false, false, false, true} // invert score (higher is better)
pts, err := poindexter.Build4D(
peers,
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
func(p Peer) float64 { return p.GeoKM },
func(p Peer) float64 { return p.Score },
weights4, invert4,
)
if err != nil { panic(err) }
tree, _ := poindexter.NewKDTree(pts, poindexter.WithMetric(poindexter.EuclideanDistance{}))
// Query target preferences (construct a query in normalized/weighted space)
// Example: seek very low ping, low hops, moderate geo, high score (low score_cost)
query := []float64{weights4[0]*0.0, weights4[1]*0.2, weights4[2]*0.3, weights4[3]*0.0}
// 1NN
best, dist, ok := tree.Nearest(query)
if ok {
fmt.Printf("Best peer: %s (dist=%.4f)\n", best.ID, dist)
}
// kNN (top 3)
neigh, dists := tree.KNearest(query, 3)
for i := range neigh {
fmt.Printf("%d) %s dist=%.4f\n", i+1, neigh[i].ID, dists[i])
}
// Radius query
within, wd := tree.Radius(query, 0.35)
fmt.Printf("Within radius 0.35: ")
for i := range within {
fmt.Printf("%s(%.3f) ", within[i].ID, wd[i])
}
fmt.Println()
}
```
## 2D: Ping + Hop
Sometimes you want a strict tradeoff between just latency and path length. Build 2D points using helpers:
```go
package main
import (
"fmt"
poindexter "github.com/Snider/Poindexter"
)
type Peer struct {
ID string
PingMS float64
Hops float64
}
var peers = []Peer{
{ID: "A", PingMS: 22, Hops: 3},
{ID: "B", PingMS: 34, Hops: 2},
{ID: "C", PingMS: 15, Hops: 4},
{ID: "D", PingMS: 55, Hops: 1},
{ID: "E", PingMS: 18, Hops: 2},
}
func main() {
weights2 := [2]float64{1.0, 1.0}
invert2 := [2]bool{false, false}
pts2, err := poindexter.Build2D(
peers,
func(p Peer) string { return p.ID }, // id
func(p Peer) float64 { return p.PingMS },// f1: ping
func(p Peer) float64 { return p.Hops }, // f2: hops
weights2, invert2,
)
if err != nil { panic(err) }
tree2, _ := poindexter.NewKDTree(pts2, poindexter.WithMetric(poindexter.ManhattanDistance{})) // L1 favors axisaligned tradeoffs
// Prefer very low ping, modest hops
query2 := []float64{weights2[0]*0.0, weights2[1]*0.3}
best2, _, _ := tree2.Nearest(query2)
fmt.Println("2D best (ping+hop):", best2.ID)
}
```
## 3D: Ping + Hop + Geo
Add geography to discourage far peers when latency is similar. Use the 3D helper:
```go
package main
import (
"fmt"
poindexter "github.com/Snider/Poindexter"
)
type Peer struct {
ID string
PingMS float64
Hops float64
GeoKM float64
}
var peers = []Peer{
{ID: "A", PingMS: 22, Hops: 3, GeoKM: 1200},
{ID: "B", PingMS: 34, Hops: 2, GeoKM: 800},
{ID: "C", PingMS: 15, Hops: 4, GeoKM: 4500},
{ID: "D", PingMS: 55, Hops: 1, GeoKM: 300},
{ID: "E", PingMS: 18, Hops: 2, GeoKM: 2200},
}
func main() {
weights3 := [3]float64{1.0, 0.7, 0.3}
invert3 := [3]bool{false, false, false}
pts3, err := poindexter.Build3D(
peers,
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
func(p Peer) float64 { return p.GeoKM },
weights3, invert3,
)
if err != nil { panic(err) }
tree3, _ := poindexter.NewKDTree(pts3, poindexter.WithMetric(poindexter.EuclideanDistance{}))
// Prefer low ping/hop, modest geo
query3 := []float64{weights3[0]*0.0, weights3[1]*0.2, weights3[2]*0.4}
top3, _, _ := tree3.Nearest(query3)
fmt.Println("3D best (ping+hop+geo):", top3.ID)
}
```
## Dynamic updates
Your routing table changes constantly. Insert/remove peers. For consistent normalization, compute and reuse your min/max stats (preferred) or rebuild points when the candidate set changes.
Tip: Use the WithStats helpers to reuse normalization across updates:
```go
// Compute once over your baseline
stats := poindexter.ComputeNormStats2D(peers,
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
)
// Build now or later using the same stats
ts, _ := poindexter.Build2DWithStats(
peers,
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
[2]float64{1,1}, [2]bool{false,false}, stats,
)
```
```go
package main
import (
"fmt"
poindexter "github.com/Snider/Poindexter"
)
type Peer struct {
ID string
PingMS float64
Hops float64
}
var peers = []Peer{
{ID: "A", PingMS: 22, Hops: 3},
{ID: "B", PingMS: 34, Hops: 2},
{ID: "C", PingMS: 15, Hops: 4},
}
func main() {
// Initial 2D build (ping + hops)
weights2 := [2]float64{1.0, 1.0}
invert2 := [2]bool{false, false}
// Compute normalization stats once over your baseline set
stats := poindexter.ComputeNormStats2D(
peers,
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
)
// Build using the precomputed stats so future inserts share the same scale
pts, _ := poindexter.Build2DWithStats(
peers,
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
weights2, invert2, stats,
)
tree, _ := poindexter.NewKDTree(pts)
// Insert a new peer: reuse the same normalization stats to keep scale consistent
newPeer := Peer{ID: "Z", PingMS: 12, Hops: 2}
addPts, _ := poindexter.Build2DWithStats(
[]Peer{newPeer},
func(p Peer) string { return p.ID },
func(p Peer) float64 { return p.PingMS },
func(p Peer) float64 { return p.Hops },
weights2, invert2, stats,
)
_ = tree.Insert(addPts[0])
// Verify nearest now prefers Z for low ping target
best, _, _ := tree.Nearest([]float64{0, 0})
fmt.Println("Best after insert:", best.ID)
// Delete by ID when peer goes offline
_ = tree.DeleteByID("Z")
}
```
## Choosing a metric
- Euclidean (L2): smooth tradeoffs across axes; good default for blended preferences
- Manhattan (L1): emphasizes peraxis absolute differences; useful when each unit of ping/hop matters equally
- Chebyshev (L∞): minmax style; dominated by the worst axis (e.g., reject any peer with too many hops regardless of ping)
## Notes on production use
- Keep and reuse normalization parameters (min/max or mean/std) rather than recomputing per query to avoid drift.
- Consider capping outliers (e.g., clamp geo distances > 5000 km).
- For large N (≥ 1e5) and low dims (≤ 8), consider swapping the internal engine to `gonum.org/v1/gonum/spatial/kdtree` behind the same API for faster queries.