Performance Benchmarks

All benchmarks were run on Apple M-series (ARM64) hardware, single-threaded unless noted. HNSW parameters: M=16, ef_construction=200. Criterion benchmarks use release mode; recall metrics use debug mode (conservative).

Index Performance (ST-HNSW)

Recall@10

Dataset	Metric	ef_search	Recall@10	Reachability
1K D=32	L2	50	1.000	100%
1K D=128	L2	50	0.977	100%
10K D=128	L2	200	0.966	100%
1K D=32	Cosine	50	1.000	100%
1K D=128	Cosine	50	0.989	100%

Key observations:

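100% reachability in every configuration tested (heuristic neighbor selection, RFC-002-03)
Recall degrades gracefully with dimensionality (1.000 at D=32 vs. 0.977 to 0.989 at D=128 for 1K vectors, ef_search=50) and is recoverable by increasing ef_search
Cosine and L2 show comparable recall at the same parameters (within 0.012 at 1K D=128)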
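
For readers unfamiliar with the metric, the following is a minimal sketch of how Recall@10 is conventionally computed: the fraction of the exact top-k neighbors that also appear in the approximate top-k. The function name `recall_at_k` and the use of `u32` IDs are assumptions for illustration, not the signatures used in `cvx-index`.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k neighbor IDs that also appear in the
/// approximate top-k. Both slices are assumed to be ordered by increasing
/// distance and to contain no duplicate IDs (hypothetical helper).
fn recall_at_k(exact: &[u32], approx: &[u32], k: usize) -> f64 {
    let truth: HashSet<u32> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 1.0; // nothing to find, so nothing was missed
    }
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}

fn main() {
    // Made-up IDs: the approximate search misses one true neighbor (10).
    let exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42];
    println!("recall@10 = {:.3}", recall_at_k(&exact, &approx, 10));
}
```

A reported value such as 0.977 would then be this quantity averaged over all queries in the test set.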

Temporal Filtering (ST-HNSW)

Configuration	Recall@10
1K D=32, alpha=1.0 (pure semantic)	1.000
1K D=32, Range[25K,75K], alpha=1.0	1.000
1K D=32, alpha=0.5 (mixed)	0.791

Lower recall at alpha=0.5 is expected: the composite distance weights temporal proximity equally with semantic similarity, so some semantically nearest neighbors are displaced by temporally closer ones, trading recall for temporal relevance.

Alpha vs Temporal Proximity

Alpha	Avg Temporal Distance
1.00 (pure semantic)	115,400 µs
0.75	85,900 µs
0.50	85,900 µs
0.25	44,000 µs
0.00 (pure temporal)	32,100 µs

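As alpha decreases, results become temporally closer to the query timestamp: average temporal distance falls from 115,400 µs at alpha=1.00 to 32,100 µs at alpha=0.00, with no change between 0.75 and 0.50. This is consistent with the composite distance formula $d_{ST} = \alpha \cdot d_{sem} + (1-\alpha) \cdot d_{time}$.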
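
The effect of alpha on ranking can be sketched directly from the formula. The snippet below is illustrative only: `st_distance` is a hypothetical name, the candidate distances are made up, and it assumes the semantic and temporal distances have already been normalized to a comparable scale (the normalization used by ST-HNSW is not described here).

```rust
/// Composite distance: alpha * d_sem + (1 - alpha) * d_time.
/// Hypothetical helper; assumes both inputs are on a comparable scale.
fn st_distance(alpha: f64, d_sem: f64, d_time: f64) -> f64 {
    assert!((0.0..=1.0).contains(&alpha), "alpha must lie in [0, 1]");
    alpha * d_sem + (1.0 - alpha) * d_time
}

fn main() {
    // Made-up candidates as (semantic distance, temporal distance).
    let near_in_meaning = (0.10, 0.90);
    let near_in_time = (0.40, 0.05);

    for alpha in [1.0, 0.5, 0.0] {
        let a = st_distance(alpha, near_in_meaning.0, near_in_meaning.1);
        let b = st_distance(alpha, near_in_time.0, near_in_time.1);
        println!("alpha={alpha:.2}: near-in-meaning={a:.3}, near-in-time={b:.3}");
    }
}
```

With these example values the semantically closer candidate ranks first at alpha=1.0, while the temporally closer one ranks first at alpha=0.5 and alpha=0.0, which is the kind of reordering that lowers semantic recall in the mixed configuration.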

Distance Kernel Performance

Dimension	Latency	Throughput
D=32	4.5 ns	222M pairs/sec
D=128	14.7 ns	68M pairs/sec
D=256	33.3 ns	30M pairs/sec
D=768	128.7 ns	7.8M pairs/sec
D=1536	305.1 ns	3.3M pairs/sec

Dimension	Latency	Throughput
D=32	10.4 ns	96M pairs/sec
D=128	35.3 ns	28M pairs/sec
D=256	87.2 ns	11M pairs/sec
D=768	328.6 ns	3.0M pairs/sec
D=1536	787.1 ns	1.3M pairs/sec

Dimension	Latency	Throughput
D=32	3.9 ns	256M pairs/sec
D=128	11.7 ns	85M pairs/sec
D=256	28.0 ns	36M pairs/sec
D=768	110.9 ns	9.0M pairs/sec
D=1536	266.8 ns	3.7M pairs/sec

10K cosine pairs at D=768: 3.39 ms (2.95M pairs/sec batch throughput)
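
The throughput figures follow from the latencies by simple arithmetic: pairs per second is the reciprocal of the per-pair latency. The sketch below reproduces two of the numbers above; the function names are hypothetical and are not part of the benchmark harness.

```rust
/// Pairs per second implied by a per-pair latency in nanoseconds.
fn pairs_per_sec(latency_ns: f64) -> f64 {
    1.0e9 / latency_ns
}

/// Pairs per second for a batch of `pairs` that took `elapsed_ms` milliseconds.
fn batch_pairs_per_sec(pairs: u64, elapsed_ms: f64) -> f64 {
    pairs as f64 / (elapsed_ms / 1.0e3)
}

fn main() {
    // 4.5 ns per pair at D=32, first kernel table.
    println!("{:.0}M pairs/sec", pairs_per_sec(4.5) / 1.0e6);
    // 10K cosine pairs at D=768 in 3.39 ms.
    println!("{:.2}M pairs/sec", batch_pairs_per_sec(10_000, 3.39) / 1.0e6);
}
```

The batch figure corresponds to roughly 339 ns per pair, slightly above the 328.6 ns single-pair latency listed for D=768 in the second table.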

Memory Efficiency

Roaring Bitmap (Temporal Filter Index)

Vectors	Memory	Bytes/Vector
1,000	2,016 B	2.016
10,000	8,208 B	0.821
100,000	16,408 B	0.164

Memory per vector drops below one byte from 10,000 vectors upward; Roaring Bitmap compression becomes more effective as cardinality grows.

Cold Storage (Product Quantization)

Configuration	Original	PQ code	Compression Ratio
D=768, M=8, K=256	3,072 bytes/vector	8 bytes/vector	384×
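
The 384× figure can be checked with a short calculation. This sketch assumes the standard product-quantization reading of the parameters (M sub-quantizers, K centroids each, so each sub-vector is stored as a log2(K)-bit index) and `f32` storage for the original vectors, which matches the 3,072-byte figure. The function names are hypothetical, and codebook storage is not counted.

```rust
/// Bytes for one uncompressed vector of `dim` f32 components.
fn raw_bytes(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}

/// Bytes for one PQ code: `m` centroid indices of ceil(log2(k)) bits each,
/// rounded up to whole bytes. Assumes k >= 1.
fn pq_code_bytes(m: usize, k: usize) -> usize {
    assert!(k >= 1, "k must be at least 1");
    // Bits needed to represent indices 0..k-1.
    let bits_per_index = (usize::BITS - (k - 1).leading_zeros()) as usize;
    (m * bits_per_index + 7) / 8
}

fn main() {
    let original = raw_bytes(768); // 768 * 4 bytes
    let code = pq_code_bytes(8, 256); // 8 indices * 8 bits
    println!("{} B -> {} B, ratio {}x", original, code, original / code);
}
```

Note that M here is the number of PQ sub-quantizers, not the HNSW connectivity parameter M=16 mentioned at the top of the page.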

Analytics Performance

Change Point Detection (PELT)

Scenario	Result
Stationary (200 pts, D=3)	0 false positives
1 planted CP (200 pts)	Detected, severity=1.000
3 planted CPs (200 pts)	Precision=1.0, Recall=1.0, F1=1.0

Online Detection (EWMA)

Metric	Value
Stationary FPR (500 obs)	0.0000
Detection latency	0 observations

Stochastic Characterization

Process	Hurst Exponent	Expected	ADF Statistic
Brownian motion (n=2000)	0.542	≈0.5	-3.015
OU process (n=2000, θ=0.3)	0.589	<0.5	-22.227

Test Suite Summary

Crate	Tests	Coverage
cvx-core	47	Types, traits, config, errors
cvx-index	81	HNSW, distance, temporal, concurrent
cvx-storage	71	Memory, hot, warm, cold, WAL, tiered
cvx-analytics	79	ODE, PELT, BOCPD, calculus, ML
cvx-query	11	8 query types
cvx-ingest	20	Validation pipeline
cvx-api	18	REST integration (all endpoints)
cvx-python	22	PyO3 bindings (pytest)
Total	349	—

Reproducing Benchmarks

# Distance kernel benchmarks (Criterion)
cargo bench -p cvx-index --bench distance_kernels

# HNSW search benchmarks (Criterion)
cargo bench -p cvx-index --bench hnsw_search

# Analytics benchmarks (Criterion)
cargo bench -p cvx-analytics --bench analytics_bench

# Recall & metrics report
cargo test -p cvx-index --test metrics_report -- --nocapture
cargo test -p cvx-analytics --test metrics_report -- --nocapture