
RFC-008: Temporal Index Architecture

See full RFC: design/CVX_RFC_008_Temporal_Index_Architecture.md

Three architectural improvements that transform CVX from a temporal vector index into a temporally-aware storage engine:

| Improvement | What it solves |
| --- | --- |
| Temporal LSH (T-LSH) | Eliminates the 4× over-fetch penalty for composite distance queries (α < 1.0) |
| Time-Partitioned Sharding | Sub-linear query latency on historical data via partition pruning |
| Streaming Window Index | High-throughput ingestion with hot buffer + compaction pipeline |

  1. Over-fetch penalty: When α < 1.0, the HNSW graph is still organized by semantic distance only, so temporally relevant but semantically distant nodes are never explored. The 4× over-fetch is a heuristic, and its recall stays bounded (about 70% recall@10).

  2. O(N) filter bitmap: build_filter_bitmap() iterates ALL timestamps on every temporally-filtered query. For 10M points, this is a 10M-iteration loop.

  3. No partition pruning: A query for “last 24 hours” traverses the same full HNSW graph as “last 10 years.”

| Metric | Current | Target |
| --- | --- | --- |
| “Last 24h” on 1-year data | Full HNSW + 10M filter scan | 1 partition (~30K pts) |
| Composite distance recall@10 | ~70% (4× over-fetch) | ~90% (T-LSH multi-probe) |
| Write throughput | ~800 pts/sec | ~50K pts/sec (WAL + buffer) |

T-LSH: Dual-Hash for Spatiotemporal Candidates


Hash function: h_ST(v, t) = h_sem(v) ⊕ h_time(t), where ⊕ here denotes concatenation: the semantic random-hyperplane bits are followed by the temporal bucket bits. The split between semantic and temporal bits follows α: for α = 0.5, the two parts get the same number of bits.
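The dual hash can be illustrated with a minimal sketch. It is not the CVX implementation: the function names, the 16-bit code width, the one-hour bucket size, and the reading of α as the fraction of bits given to the semantic part are all assumptions made for illustration.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def h_sem(v, planes):
    """Semantic hash: one sign bit per random hyperplane."""
    bits = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, v))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def h_time(t, bucket_seconds, n_bits):
    """Temporal hash: bucket index of timestamp t, truncated to n_bits."""
    return int(t // bucket_seconds) & ((1 << n_bits) - 1)

def split_bits(total_bits, alpha):
    """Assumption: alpha is the share of bits given to the semantic part."""
    sem_bits = round(alpha * total_bits)
    return sem_bits, total_bits - sem_bits

def h_st(v, t, planes, bucket_seconds, time_bits):
    """Concatenate semantic bits (high) with temporal bucket bits (low)."""
    return (h_sem(v, planes) << time_bits) | h_time(t, bucket_seconds, time_bits)

# Example: 16-bit code at alpha = 0.5, so 8 semantic + 8 temporal bits.
sem_bits, time_bits = split_bits(16, 0.5)
planes = make_hyperplanes(dim=4, n_bits=sem_bits, seed=1)
code = h_st([1.0, 0.0, 0.0, 0.0], t=5 * 3600 + 10, planes=planes,
            bucket_seconds=3600, time_bits=time_bits)
```

Two points land in the same bucket only when they agree on both the hyperplane signs and the time bucket, which is what lets a single lookup return candidates that are close in both senses. A multi-probe query would additionally visit neighbouring codes; that step is omitted here.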

Time-Partitioned Sharding: fixed-duration partitions (default: 7 days), each with its own HNSW graph. Query routing prunes every partition that does not overlap the query's time range. Partitions move through a Hot → Warm → Cold lifecycle.
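The routing step can be sketched as simple integer arithmetic on timestamps. This is an illustrative sketch under stated assumptions, not the CVX code: timestamps are taken to be integer seconds, partitions are identified by `timestamp // width`, and both function names are hypothetical.

```python
WEEK_SECONDS = 7 * 24 * 3600  # default partition width from the text: 7 days

def partition_id(t, width=WEEK_SECONDS):
    """Map a timestamp (integer seconds, assumed) to its partition number."""
    return t // width

def overlapping_partitions(t_start, t_end, existing, width=WEEK_SECONDS):
    """Return the existing partitions that intersect [t_start, t_end].

    Every other partition is pruned and its HNSW graph is never touched.
    """
    lo = partition_id(t_start, width)
    hi = partition_id(t_end, width)
    return [p for p in sorted(existing) if lo <= p <= hi]

# Example: one year of weekly partitions, query covering the last 24 hours.
existing = set(range(52))
t_end = 30 * WEEK_SECONDS + 3 * 86400
hits = overlapping_partitions(t_end - 86400, t_end, existing)
```

With this layout a 24-hour query touches one partition, or two when the window straddles a partition boundary, regardless of how much history the index holds.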

Streaming Window Index: Write-Ahead Log → Hot Buffer (flat brute-force) → Compacted HNSW (partitioned). Queries search both the buffer and the compacted index, then merge the results.
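The query path over the two tiers can be sketched as follows. This is a simplified illustration, not the actual `HotBuffer` from the RFC: the class below is a hypothetical stand-in, the compacted index is represented by any callable returning `(distance, id)` pairs, the distance is plain Euclidean, and the write-ahead log and compaction steps are left out. It also assumes a point lives in exactly one tier at a time, so no de-duplication is needed.

```python
import heapq
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class HotBuffer:
    """Flat, brute-force buffer for recently ingested points (illustrative)."""

    def __init__(self):
        self.points = []  # list of (point_id, vector)

    def insert(self, point_id, vector):
        self.points.append((point_id, vector))

    def search(self, query, k):
        # Linear scan: cheap while the buffer is small.
        scored = ((l2(query, vec), pid) for pid, vec in self.points)
        return heapq.nsmallest(k, scored)

def merged_search(query, k, buffer, compacted_search):
    """Search both tiers and keep the k closest results overall.

    compacted_search is a stand-in for the partitioned HNSW search.
    """
    candidates = buffer.search(query, k) + compacted_search(query, k)
    return heapq.nsmallest(k, candidates)

# Example: a second flat buffer plays the role of the compacted index.
buf = HotBuffer()
buf.insert("b1", [0.0, 0.0])
buf.insert("b2", [5.0, 5.0])
compacted = HotBuffer()
compacted.insert("c1", [1.0, 0.0])
compacted.insert("c2", [9.0, 9.0])
result = merged_search([0.0, 0.0], 2, buf, compacted.search)
```

Asking each tier for k results before merging guarantees the true top k across both tiers is retained, provided each tier's own search is exact; with an approximate HNSW tier, the merged result inherits that tier's recall.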

3 phases:

  1. feat/time-partitions (P0) — PartitionedTemporalHnsw, global entity index, parallel fan-out — 18 tests
  2. feat/temporal-lsh (P1) — T-LSH auxiliary index, multi-probe query, α-adaptive bit allocation — 15 tests
  3. feat/streaming-index (P2) — HotBuffer + compaction pipeline + TemporalIndexAccess — 13 tests

Status: All 3 phases implemented and merged. 46 total tests.

| Metric | Result |
| --- | --- |
| Ingestion (ef=25) | 915 pts/sec |
| Ingestion (ef=200) | 482 pts/sec |
| Search latency (all time) | 0.35ms mean, 0.47ms p99 |
| Search latency (temporal filter) | 0.28-0.33ms mean |
| Temporal filter speedup | 10-20% faster than unfiltered |

See notebooks/T_Benchmark_Partitioned.ipynb for full results.