Poindexter/docs/perf.md

133 lines
7.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Performance: KDTree benchmarks and guidance
This page summarizes how to measure KDTree performance in this repository and how to compare the two internal backends (Linear vs Gonum) that you can select at build/runtime.
## How benchmarks are organized
- Micro-benchmarks live in `bench_kdtree_test.go` and `bench_kdtree_dual_test.go` and cover:
- `Nearest` in 2D and 4D with N = 1k, 10k
- `KNearest(k=10)` in 2D/4D with N = 1k, 10k
- `Radius` (mid radius r≈0.5 after normalization) in 2D/4D with N = 1k, 10k
- Datasets: Uniform and 3-cluster synthetic generators in normalized [0,1] spaces.
- Backends: Linear (always available) and Gonum (enabled when built with `-tags=gonum`).
Run them locally:
```bash
# Linear backend (default)
go test -bench . -benchmem -run=^$ ./...
# Gonum backend (optimized KD; requires build tag)
go test -tags=gonum -bench . -benchmem -run=^$ ./...
```
GitHub Actions publishes benchmark artifacts on every push/PR:
- Linear job: artifact `bench-linear.txt`
- Gonum job: artifact `bench-gonum.txt`
## Backend selection and defaults
- Default backend is Linear.
- If you build with `-tags=gonum`, the default switches to the optimized KD backend.
- You can override at runtime:
```
// Force Linear
kdt, _ := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendLinear))
// Force Gonum (requires build tag)
kdt, _ := poindexter.NewKDTree(pts, poindexter.WithBackend(poindexter.BackendGonum))
```
Supported metrics in the optimized backend: L2 (Euclidean), L1 (Manhattan), L∞ (Chebyshev). Cosine/Weighted-Cosine currently use the Linear backend.
## What to expect (rule of thumb)
- Linear backend: O(n) per query; fast for small-to-medium datasets (≤10k), especially in low dims (≤4).
- Gonum backend: typically sub-linear for prunable datasets and dims ≤ ~8, with noticeable gains as N grows (≥10k100k), especially on uniform or moderately clustered data and moderate radii.
- For large radii (many points within r) or highly correlated/pathological data, pruning may be less effective and behavior approaches O(n) even with KD-trees.
## Interpreting results
Benchmarks output something like:
```
BenchmarkNearest_10k_4D_Gonum_Uniform-8 50000 12,300 ns/op 0 B/op 0 allocs/op
```
- `ns/op`: lower is better (nanoseconds per operation)
- `B/op` and `allocs/op`: memory behavior; fewer is better
- `KNearest` incurs extra work due to sorting; `Radius` cost scales with the number of hits.
## Improving performance
- Normalize and weight features once; reuse across queries (see `Build*WithStats` helpers).
- Choose a metric aligned with your policy: L2 usually a solid default; L1 for per-axis penalties; L∞ for hard-threshold dominated objectives.
- Batch queries to benefit from CPU caches.
- Prefer the Gonum backend for larger N and dims ≤ ~8; stick to Linear for tiny datasets or when using Cosine metrics.
## Reproducing and tracking performance
- Local (Linear): `go test -bench . -benchmem -run=^$ ./...`
- Local (Gonum): `go test -tags=gonum -bench . -benchmem -run=^$ ./...`
- CI artifacts: download `bench-linear.txt` and `bench-gonum.txt` from the latest workflow run.
- Optional: add historical trend graphs via Benchstat or Codecov integration.
## Sample results (from a recent local run)
Results vary by machine, Go version, and dataset seed. The following run was captured locally and is provided as a reference point.
- Machine: darwin/arm64, Apple M3 Ultra
- Package: `github.com/Snider/Poindexter`
- Command: `go test -bench . -benchmem -run=^$ ./... | tee bench.txt`
Full output:
```
goos: darwin
goarch: arm64
pkg: github.com/Snider/Poindexter
BenchmarkNearest_Linear_Uniform_1k_2D-32 409321 3001 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Uniform_1k_2D-32 413823 2888 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Uniform_10k_2D-32 43053 27809 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Uniform_10k_2D-32 42996 27936 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Uniform_1k_4D-32 326492 3746 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Uniform_1k_4D-32 338983 3857 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Uniform_10k_4D-32 35661 32985 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Uniform_10k_4D-32 35678 33388 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Clustered_1k_2D-32 425220 2874 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Clustered_1k_2D-32 420080 2849 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Linear_Clustered_10k_2D-32 43242 27776 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_Gonum_Clustered_10k_2D-32 42392 27889 ns/op 0 B/op 0 allocs/op
BenchmarkKNN10_Linear_Uniform_10k_2D-32 1206 977599 ns/op 164492 B/op 6 allocs/op
BenchmarkKNN10_Gonum_Uniform_10k_2D-32 1239 972501 ns/op 164488 B/op 6 allocs/op
BenchmarkKNN10_Linear_Clustered_10k_2D-32 1219 973242 ns/op 164492 B/op 6 allocs/op
BenchmarkKNN10_Gonum_Clustered_10k_2D-32 1214 971017 ns/op 164488 B/op 6 allocs/op
BenchmarkRadiusMid_Linear_Uniform_10k_2D-32 1279 917692 ns/op 947529 B/op 23 allocs/op
BenchmarkRadiusMid_Gonum_Uniform_10k_2D-32 1299 918176 ns/op 947529 B/op 23 allocs/op
BenchmarkRadiusMid_Linear_Clustered_10k_2D-32 1059 1123281 ns/op 1217866 B/op 24 allocs/op
BenchmarkRadiusMid_Gonum_Clustered_10k_2D-32 1063 1149507 ns/op 1217871 B/op 24 allocs/op
BenchmarkNearest_1k_2D-32 401595 2964 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_10k_2D-32 42129 28229 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_1k_4D-32 365626 3642 ns/op 0 B/op 0 allocs/op
BenchmarkNearest_10k_4D-32 36298 33176 ns/op 0 B/op 0 allocs/op
BenchmarkKNearest10_1k_2D-32 20348 59568 ns/op 17032 B/op 6 allocs/op
BenchmarkKNearest10_10k_2D-32 1224 969093 ns/op 164488 B/op 6 allocs/op
BenchmarkRadiusMid_1k_2D-32 21867 53273 ns/op 77512 B/op 16 allocs/op
BenchmarkRadiusMid_10k_2D-32 1302 933791 ns/op 955720 B/op 23 allocs/op
PASS
ok github.com/Snider/Poindexter 40.102s
PASS
ok github.com/Snider/Poindexter/examples/dht_ping_1d 0.348s
PASS
ok github.com/Snider/Poindexter/examples/kdtree_2d_ping_hop 0.266s
PASS
ok github.com/Snider/Poindexter/examples/kdtree_3d_ping_hop_geo 0.272s
PASS
ok github.com/Snider/Poindexter/examples/kdtree_4d_ping_hop_geo_score 0.269s
```
Notes:
- The first block shows dual-backend benchmarks (Linear vs Gonum) for uniform and clustered datasets at 2D/4D with N=1k/10k.
- The final block includes the legacy single-backend benchmarks for additional sizes; both are useful for comparison.
To compare against the optimized KD backend explicitly, build with `-tags=gonum` and/or download `bench-gonum.txt` from CI artifacts.