2025-11-03 18:26:40 +00:00
# Performance: KDTree benchmarks and guidance
2025-11-04 01:44:16 +00:00
This page summarizes how to measure KDTree performance in this repository and how to compare the two internal backends (Linear vs Gonum) that you can select at build/runtime.
2025-11-03 18:26:40 +00:00
## How benchmarks are organized
2025-11-04 01:44:16 +00:00
- Micro-benchmarks live in `bench_kdtree_test.go` and `bench_kdtree_dual_test.go` and cover:
2025-11-03 18:26:40 +00:00
- `Nearest` in 2D and 4D with N = 1k, 10k
2025-11-04 01:44:16 +00:00
- `KNearest(k=10)` in 2D/4D with N = 1k, 10k
- `Radius` (mid radius r≈0.5 after normalization) in 2D/4D with N = 1k, 10k
- Datasets: Uniform and 3-cluster synthetic generators in normalized [0,1] spaces.
- Backends: Linear (always available) and Gonum (enabled when built with `-tags=gonum` ).
2025-11-03 18:26:40 +00:00
Run them locally:
```bash
2025-11-04 01:44:16 +00:00
# Linear backend (default)
2025-11-03 18:26:40 +00:00
go test -bench . -benchmem -run=^$ ./...
2025-11-04 01:44:16 +00:00
# Gonum backend (optimized KD; requires build tag)
go test -tags=gonum -bench . -benchmem -run=^$ ./...
2025-11-03 18:26:40 +00:00
```
2025-11-04 01:44:16 +00:00
GitHub Actions publishes benchmark artifacts on every push/PR:
- Linear job: artifact `bench-linear.txt`
- Gonum job: artifact `bench-gonum.txt`
## Backend selection and defaults
- Default backend is Linear.
- If you build with `-tags=gonum` , the default switches to the optimized KD backend.
- You can override at runtime:
```
// Force Linear
kdt, _ := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendLinear))
// Force Gonum (requires build tag)
kdt, _ := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendGonum))
```
Supported metrics in the optimized backend: L2 (Euclidean), L1 (Manhattan), L∞ (Chebyshev). Cosine/Weighted-Cosine currently use the Linear backend.
2025-11-03 18:26:40 +00:00
## What to expect (rule of thumb)
2025-11-04 01:44:16 +00:00
- Linear backend: O(n) per query; fast for small-to-medium datasets (≤10k), especially in low dims (≤4).
- Gonum backend: typically sub-linear for prunable datasets and dims ≤ ~8, with noticeable gains as N grows (≥10k– 100k), especially on uniform or moderately clustered data and moderate radii.
- For large radii (many points within r) or highly correlated/pathological data, pruning may be less effective and behavior approaches O(n) even with KD-trees.
2025-11-03 18:26:40 +00:00
## Interpreting results
Benchmarks output something like:
```
2025-11-04 01:44:16 +00:00
BenchmarkNearest_10k_4D_Gonum_Uniform-8 50000 12,300 ns/op 0 B/op 0 allocs/op
2025-11-03 18:26:40 +00:00
```
- `ns/op` : lower is better (nanoseconds per operation)
- `B/op` and `allocs/op` : memory behavior; fewer is better
2025-11-04 01:44:16 +00:00
- `KNearest` incurs extra work due to sorting; `Radius` cost scales with the number of hits.
2025-11-03 18:26:40 +00:00
## Improving performance
2025-11-04 01:44:16 +00:00
- Normalize and weight features once; reuse across queries (see `Build*WithStats` helpers).
- Choose a metric aligned with your policy: L2 usually a solid default; L1 for per-axis penalties; L∞ for hard-threshold dominated objectives.
- Batch queries to benefit from CPU caches.
- Prefer the Gonum backend for larger N and dims ≤ ~8; stick to Linear for tiny datasets or when using Cosine metrics.
2025-11-03 18:26:40 +00:00
## Reproducing and tracking performance
2025-11-04 01:44:16 +00:00
- Local (Linear): `go test -bench . -benchmem -run=^$ ./...`
- Local (Gonum): `go test -tags=gonum -bench . -benchmem -run=^$ ./...`
- CI artifacts: download `bench-linear.txt` and `bench-gonum.txt` from the latest workflow run.
- Optional: add historical trend graphs via Benchstat or Codecov integration.
## Sample results (from a recent local run)
Results vary by machine, Go version, and dataset seed. The following run was captured locally and is provided as a reference point.
- Machine: darwin/arm64, Apple M3 Ultra
- Package: `github.com/Snider/Poindexter`
- Command: `go test -bench . -benchmem -run=^$ ./... | tee bench.txt`
Full output:
```
goos: darwin
goarch: arm64
pkg: github.com/Snider/Poindexter
BenchmarkNearest_Linear_Uniform_1k_2D-32 409321 3001 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Uniform_1k_2D-32 413823 2888 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Uniform_10k_2D-32 43053 27809 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Uniform_10k_2D-32 42996 27936 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Uniform_1k_4D-32 326492 3746 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Uniform_1k_4D-32 338983 3857 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Uniform_10k_4D-32 35661 32985 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Uniform_10k_4D-32 35678 33388 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Clustered_1k_2D-32 425220 2874 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Clustered_1k_2D-32 420080 2849 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Clustered_10k_2D-32 43242 27776 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Clustered_10k_2D-32 42392 27889 ns/op 0 B/op 0 allocs/op
BenchmarkKNN10_Linear_Uniform_10k_2D-32 1206 977599 ns/op 164492 B/op 6 allocs/op
BenchmarkKNN10_Gonum_Uniform_10k_2D-32 1239 972501 ns/op 164488 B/op 6 allocs/op
BenchmarkKNN10_Linear_Clustered_10k_2D-32 1219 973242 ns/op 164492 B/op 6 allocs/op
BenchmarkKNN10_Gonum_Clustered_10k_2D-32 1214 971017 ns/op 164488 B/op 6 allocs/op
BenchmarkRadiusMid_Linear_Uniform_10k_2D-32 1279 917692 ns/op 947529 B/op 23 allocs/op
BenchmarkRadiusMid_Gonum_Uniform_10k_2D-32 1299 918176 ns/op 947529 B/op 23 allocs/op
BenchmarkRadiusMid_Linear_Clustered_10k_2D-32 1059 1123281 ns/op 1217866 B/op 24 allocs/op
BenchmarkRadiusMid_Gonum_Clustered_10k_2D-32 1063 1149507 ns/op 1217871 B/op 24 allocs/op
BenchmarkNearest_1k_2D-32 401595 2964 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_10k_2D-32 42129 28229 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_1k_4D-32 365626 3642 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_10k_4D-32 36298 33176 ns/op 0 B/op 0 allocs/op
BenchmarkKNearest10_1k_2D-32 20348 59568 ns/op 17032 B/op 6 allocs/op
BenchmarkKNearest10_10k_2D-32 1224 969093 ns/op 164488 B/op 6 allocs/op
BenchmarkRadiusMid_1k_2D-32 21867 53273 ns/op 77512 B/op 16 allocs/op
BenchmarkRadiusMid_10k_2D-32 1302 933791 ns/op 955720 B/op 23 allocs/op
PASS
ok github.com/Snider/Poindexter 40.102s
PASS
ok github.com/Snider/Poindexter/examples/dht_ping_1d 0.348s
PASS
ok github.com/Snider/Poindexter/examples/kdtree_2d_ping_hop 0.266s
PASS
ok github.com/Snider/Poindexter/examples/kdtree_3d_ping_hop_geo 0.272s
PASS
ok github.com/Snider/Poindexter/examples/kdtree_4d_ping_hop_geo_score 0.269s
```
Notes:
- The first block shows dual-backend benchmarks (Linear vs Gonum) for uniform and clustered datasets at 2D/4D with N=1k/10k.
- The final block includes the legacy single-backend benchmarks for additional sizes; both are useful for comparison.
To compare against the optimized KD backend explicitly, build with `-tags=gonum` and/or download `bench-gonum.txt` from CI artifacts.