# Performance: KDTree benchmarks and guidance

This page summarizes how to measure KDTree performance in this repository and how to compare the two internal backends (Linear vs Gonum) that you can select at build/runtime.

## How benchmarks are organized

- Micro-benchmarks live in `bench_kdtree_test.go` and `bench_kdtree_dual_test.go` and cover (a sketch of their general shape follows this list):
  - `Nearest` in 2D and 4D with N = 1k, 10k
  - `KNearest(k=10)` in 2D/4D with N = 1k, 10k
  - `Radius` (mid radius r ≈ 0.5 after normalization) in 2D/4D with N = 1k, 10k
- Datasets: Uniform and 3-cluster synthetic generators in normalized [0,1] spaces.
- Backends: Linear (always available) and Gonum (enabled when built with `-tags=gonum`).

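For orientation, the dual-backend benchmarks follow the standard Go `testing.B` pattern: generate a synthetic dataset, build one tree per backend, then time repeated queries. The sketch below shows that shape only; the `makeUniform` helper, the `Backend` type name, the `[][]float64` point representation, and the exact `Nearest` signature are assumptions for illustration, not the repository's actual benchmark code.

```go
package poindexter_test

import (
	"math/rand"
	"testing"

	poindexter "github.com/Snider/Poindexter"
)

// makeUniform is a hypothetical helper: n points drawn uniformly from [0,1]^dims.
func makeUniform(n, dims int, seed int64) [][]float64 {
	r := rand.New(rand.NewSource(seed))
	pts := make([][]float64, n)
	for i := range pts {
		p := make([]float64, dims)
		for d := range p {
			p[d] = r.Float64()
		}
		pts[i] = p
	}
	return pts
}

// benchNearest builds one tree for the requested backend and times Nearest queries.
func benchNearest(b *testing.B, backend poindexter.Backend, n, dims int) {
	pts := makeUniform(n, dims, 1)
	queries := makeUniform(256, dims, 2)
	kdt, err := poindexter.NewKDTree(pts, poindexter.WithBackend(backend))
	if err != nil {
		b.Fatal(err)
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		kdt.Nearest(queries[i%len(queries)]) // signature assumed
	}
}

func BenchmarkNearest_Linear_Uniform_10k_2D(b *testing.B) {
	benchNearest(b, poindexter.BackendLinear, 10_000, 2)
}

func BenchmarkNearest_Gonum_Uniform_10k_2D(b *testing.B) {
	benchNearest(b, poindexter.BackendGonum, 10_000, 2)
}
```
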
Run them locally:

```bash
# Linear backend (default)
go test -bench . -benchmem -run=^$ ./...

# Gonum backend (optimized KD backend; requires the build tag)
go test -tags=gonum -bench . -benchmem -run=^$ ./...
```

GitHub Actions publishes benchmark artifacts on every push/PR:

- Linear job: artifact `bench-linear.txt`
- Gonum job: artifact `bench-gonum.txt`

## Backend selection and defaults

- The default backend is Linear.
- If you build with `-tags=gonum`, the default switches to the optimized KD backend (see the build-tag sketch at the end of this section).
- You can override the backend at runtime:

```go
// Force Linear
kdt, _ := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendLinear))

// Force Gonum (requires the gonum build tag)
kdt, _ := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendGonum))
```

Supported metrics in the optimized backend: L2 (Euclidean), L1 (Manhattan), and L∞ (Chebyshev). Cosine and Weighted-Cosine currently use the Linear backend.

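On how the build tag can flip the default: a common Go pattern is to compile a different package-level default depending on a build constraint. The sketch below illustrates that pattern only; the file name and the `defaultBackend` variable are hypothetical, not the repository's actual internals. A matching file guarded by `//go:build !gonum` would set `defaultBackend = BackendLinear`.

```go
// backend_gonum.go (hypothetical file name, for illustration only)
//go:build gonum

package poindexter

// When the package is compiled with -tags=gonum, this file is included and the
// package-level default becomes the optimized KD backend. defaultBackend is a
// hypothetical unexported variable name.
var defaultBackend = BackendGonum
```
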
## What to expect (rule of thumb)

- Linear backend: O(n) per query; fast for small-to-medium datasets (≤10k points), especially in low dimensions (≤4).
- Gonum backend: typically sub-linear for prunable datasets and dimensions up to ~8, with noticeable gains as N grows (≥10k–100k), especially on uniform or moderately clustered data and moderate radii.
- For large radii (many points within r) or highly correlated/pathological data, pruning may be less effective and behavior approaches O(n) even with a KD-tree.

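These are heuristics, so it is worth measuring on your own data before committing to a backend. One lightweight way to do that outside a `_test.go` file is the standard library's `testing.Benchmark` helper; the sketch below assumes the same hypothetical point representation and `Nearest` signature as the benchmark sketch earlier on this page.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"

	poindexter "github.com/Snider/Poindexter"
)

// uniform returns n random points in [0,1]^dims (helper for this sketch only).
func uniform(r *rand.Rand, n, dims int) [][]float64 {
	pts := make([][]float64, n)
	for i := range pts {
		p := make([]float64, dims)
		for d := range p {
			p[d] = r.Float64()
		}
		pts[i] = p
	}
	return pts
}

func main() {
	r := rand.New(rand.NewSource(1))
	for _, n := range []int{1_000, 10_000, 100_000} {
		pts := uniform(r, n, 4)
		queries := uniform(r, 256, 4)
		kdt, err := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendLinear))
		if err != nil {
			panic(err)
		}
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				kdt.Nearest(queries[i%len(queries)]) // signature assumed
			}
		})
		// If ns/op grows roughly 10x for each 10x increase in n, the backend is
		// behaving linearly on this data; sub-linear growth means pruning is working.
		fmt.Printf("n=%-7d %d ns/op\n", n, res.NsPerOp())
	}
}
```
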
## Interpreting results

Benchmarks output lines like:

```
BenchmarkNearest_Gonum_Uniform_10k_4D-8    50000    12300 ns/op    0 B/op    0 allocs/op
```

- `ns/op`: nanoseconds per operation; lower is better.
- `B/op` and `allocs/op`: memory behavior per operation; fewer is better.
- `KNearest` incurs extra work due to sorting; `Radius` cost scales with the number of hits.

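As a worked example of reading these numbers: in the sample run later on this page, `BenchmarkNearest_Linear_Uniform_1k_2D` reports 3001 ns/op while `BenchmarkNearest_Linear_Uniform_10k_2D` reports 27809 ns/op, i.e. roughly a 9× cost increase for 10× the points, which is the near-linear per-query scaling you would expect from the Linear backend.
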
## Improving performance

- Normalize and weight features once and reuse them across queries (see the `Build*WithStats` helpers).
- Choose a metric aligned with your policy: L2 is usually a solid default; L1 suits per-axis penalties; L∞ suits hard-threshold-dominated objectives.
- Batch queries against a single prebuilt tree to benefit from CPU caches (see the sketch below).
- Prefer the Gonum backend for larger N and dimensions up to ~8; stick with Linear for tiny datasets or when using Cosine metrics.

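A minimal sketch of the build-once, query-many pattern recommended above. As before, the point representation and the `Nearest` signature are assumptions; the intent is only to show that the tree is constructed a single time and then shared across a batch of queries rather than rebuilt per query.

```go
package main

import (
	"fmt"

	poindexter "github.com/Snider/Poindexter"
)

func main() {
	// Build the tree once, after features have been normalized/weighted.
	pts := [][]float64{
		{0.12, 0.80}, {0.33, 0.10}, {0.55, 0.45}, {0.91, 0.72},
	}
	kdt, err := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendLinear))
	if err != nil {
		panic(err)
	}

	// Reuse the same tree for the whole batch of queries.
	queries := [][]float64{{0.50, 0.50}, {0.10, 0.90}, {0.85, 0.70}}
	for _, q := range queries {
		nearest := kdt.Nearest(q) // signature assumed for illustration
		fmt.Println(q, "->", nearest)
	}
}
```
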
## Reproducing and tracking performance

- Local (Linear): `go test -bench . -benchmem -run=^$ ./...`
- Local (Gonum): `go test -tags=gonum -bench . -benchmem -run=^$ ./...`
- CI artifacts: download `bench-linear.txt` and `bench-gonum.txt` from the latest workflow run.
- Optional: compare runs with `benchstat` (from `golang.org/x/perf/cmd/benchstat`), e.g. `benchstat bench-linear.txt bench-gonum.txt`, or add historical trend graphs via Codecov integration.

## Sample results (from a recent local run)

Results vary by machine, Go version, and dataset seed. The following run was captured locally and is provided as a reference point.

- Machine: darwin/arm64, Apple M3 Ultra
- Package: `github.com/Snider/Poindexter`
- Command: `go test -bench . -benchmem -run=^$ ./... | tee bench.txt`

Full output:

```
goos: darwin
goarch: arm64
pkg: github.com/Snider/Poindexter
BenchmarkNearest_Linear_Uniform_1k_2D-32        409321    3001 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Gonum_Uniform_1k_2D-32         413823    2888 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Linear_Uniform_10k_2D-32        43053   27809 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Gonum_Uniform_10k_2D-32         42996   27936 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Linear_Uniform_1k_4D-32        326492    3746 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Gonum_Uniform_1k_4D-32         338983    3857 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Linear_Uniform_10k_4D-32        35661   32985 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Gonum_Uniform_10k_4D-32         35678   33388 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Linear_Clustered_1k_2D-32      425220    2874 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Gonum_Clustered_1k_2D-32       420080    2849 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Linear_Clustered_10k_2D-32      43242   27776 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_Gonum_Clustered_10k_2D-32       42392   27889 ns/op        0 B/op    0 allocs/op
BenchmarkKNN10_Linear_Uniform_10k_2D-32           1206  977599 ns/op   164492 B/op    6 allocs/op
BenchmarkKNN10_Gonum_Uniform_10k_2D-32            1239  972501 ns/op   164488 B/op    6 allocs/op
BenchmarkKNN10_Linear_Clustered_10k_2D-32         1219  973242 ns/op   164492 B/op    6 allocs/op
BenchmarkKNN10_Gonum_Clustered_10k_2D-32          1214  971017 ns/op   164488 B/op    6 allocs/op
BenchmarkRadiusMid_Linear_Uniform_10k_2D-32       1279  917692 ns/op   947529 B/op   23 allocs/op
BenchmarkRadiusMid_Gonum_Uniform_10k_2D-32        1299  918176 ns/op   947529 B/op   23 allocs/op
BenchmarkRadiusMid_Linear_Clustered_10k_2D-32     1059 1123281 ns/op  1217866 B/op   24 allocs/op
BenchmarkRadiusMid_Gonum_Clustered_10k_2D-32      1063 1149507 ns/op  1217871 B/op   24 allocs/op
BenchmarkNearest_1k_2D-32                       401595    2964 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_10k_2D-32                       42129   28229 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_1k_4D-32                       365626    3642 ns/op        0 B/op    0 allocs/op
BenchmarkNearest_10k_4D-32                       36298   33176 ns/op        0 B/op    0 allocs/op
BenchmarkKNearest10_1k_2D-32                     20348   59568 ns/op    17032 B/op    6 allocs/op
BenchmarkKNearest10_10k_2D-32                     1224  969093 ns/op   164488 B/op    6 allocs/op
BenchmarkRadiusMid_1k_2D-32                      21867   53273 ns/op    77512 B/op   16 allocs/op
BenchmarkRadiusMid_10k_2D-32                      1302  933791 ns/op   955720 B/op   23 allocs/op
PASS
ok  github.com/Snider/Poindexter  40.102s
PASS
ok  github.com/Snider/Poindexter/examples/dht_ping_1d  0.348s
PASS
ok  github.com/Snider/Poindexter/examples/kdtree_2d_ping_hop  0.266s
PASS
ok  github.com/Snider/Poindexter/examples/kdtree_3d_ping_hop_geo  0.272s
PASS
ok  github.com/Snider/Poindexter/examples/kdtree_4d_ping_hop_geo_score  0.269s
```

Notes:

- The first block shows the dual-backend benchmarks (Linear vs Gonum) for uniform and clustered datasets at 2D/4D with N = 1k/10k; in this particular run the two backends land within a few percent of each other at these sizes.
- The final block includes the legacy single-backend benchmarks for additional sizes; both are useful for comparison.

To compare against the optimized KD backend explicitly, build with `-tags=gonum` and/or download `bench-gonum.txt` from the CI artifacts.