
RFC-002: Correctness & Performance

This RFC documents 10 identified weaknesses in the current ChronosVector implementation and proposes concrete fixes for each. Issues span durability, algorithmic correctness, concurrency, numerical stability, and API robustness.

| # | Severity | Issue | Effort | Status |
|---|----------|-------|--------|--------|
| 01 | CRITICAL | WAL fsync not guaranteed on commit | Low | ✅ Done |
| 02 | HIGH | NaN panic in distance sort | Low | ✅ Done |
| 03 | HIGH | Heuristic neighbor condition too permissive | Medium | ✅ Done |
| 04 | HIGH | Single-writer RwLock bottleneck | High | ✅ Done |
| 05 | HIGH | gRPC delivery semantics undefined | High | ✅ Done |
| 06 | MEDIUM | PELT candidate set unbounded growth | Medium | ✅ Done |
| 07 | MEDIUM | Random level generation unbounded | Low | ✅ Done |
| 08 | MEDIUM | Warm store O(N) range scans | Medium | ✅ Done |
| 09 | MEDIUM | PQ codebook initialization bias | Medium | ✅ Done |
| 10 | LOW | ODE solver no stiffness detection | High | Deferred |

RFC-002-01, 002-02, 002-07 — Less than 20 lines total. Prevent data loss and panics.

RFC-002-03, 002-06 — Fix heuristic neighbor selection (Malkov §4.2 Algorithm 4) and PELT pruning bounds.

RFC-002-04, 002-05, 002-08, 002-09 — Concurrent insert architecture, gRPC exactly-once semantics, warm store zone maps, PQ k-means++ initialization.

RFC-002-10 — ODE stiffness detection. Triggers when burn ML backend is integrated.


## RFC-002-01: WAL fsync Not Guaranteed on Commit

Severity: CRITICAL | Effort: Low

`Wal::commit()` calls `BufWriter::flush()` (which only pushes data to the OS page cache) and `persist_meta()` (atomic rename), but never calls `sync_all()` on the segment file. If the machine crashes or loses power before the kernel writes the page cache back to disk, committed entries are lost.

`wal/mod.rs`, current:

```rust
pub fn commit(&mut self, sequence: u64) -> Result<(), StorageError> {
    self.meta.committed_sequence = sequence;
    self.flush()?;        // BufWriter::flush — NOT fsync
    self.persist_meta()?; // rename — NOT directory fsync
    Ok(())
}
```

Proposed:

```rust
pub fn commit(&mut self, sequence: u64) -> Result<(), StorageError> {
    self.meta.committed_sequence = sequence;
    if let Some(ref mut writer) = self.current_writer {
        writer.flush()?;
        writer.get_ref().sync_all()?; // Sync segment to disk
    }
    self.persist_meta()?;
    let dir = std::fs::File::open(&self.dir)?;
    dir.sync_all()?; // Sync directory to make rename durable
    Ok(())
}
```
  • Pillai et al., “All File Systems Are Not Created Equal” (OSDI 2014)
  • Zheng et al., “BtrBlk: Efficient B-tree Logging” (SIGMOD 2022)

## RFC-002-02: NaN Panic in Distance Sort

Severity: HIGH | Effort: Low

There are 6 occurrences of `partial_cmp(&b.1).unwrap()` across the HNSW modules. If any distance is NaN, `partial_cmp` returns `None` and the `unwrap` panics the server.

```rust
// Replace all occurrences:
result_vec.sort_by(|a, b| a.1.total_cmp(&b.1));
```

`f32::total_cmp` (stable since Rust 1.62) implements a total ordering that includes NaN, so the comparator can never fail.
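As a standalone sketch of the fix (the function name here is illustrative, not the actual call site), `total_cmp` gives NaN a defined position in the order instead of panicking:

```rust
/// Sort (id, distance) pairs without panicking on NaN distances.
fn sort_results(results: &mut Vec<(usize, f32)>) {
    // total_cmp implements IEEE 754 totalOrder: positive NaN sorts after
    // all finite values, so the comparator is total and never fails.
    results.sort_by(|a, b| a.1.total_cmp(&b.1));
}

fn main() {
    let mut v = vec![(0, 0.5_f32), (1, f32::NAN), (2, 0.1)];
    sort_results(&mut v);
    assert_eq!((v[0].0, v[1].0), (2, 0)); // finite values first, ascending
    assert!(v[2].1.is_nan());             // NaN lands last, no panic
    println!("ok");
}
```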

  • IEEE 754-2019, §5.10: totalOrder predicate

## RFC-002-03: Heuristic Neighbor Condition

Severity: HIGH | Effort: Medium

The current condition uses `<=` (non-strict); Malkov & Yashunin's Algorithm 4 specifies strict `<`:

```rust
// Current (too permissive for equidistant candidates):
cand_dist <= dist_to_selected
// Correct (per paper):
cand_dist < dist_to_selected
```

Expected impact: 1–3% recall improvement at high dimensionality (D ≥ 128).
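The corrected condition in context can be sketched as follows; the function shape and names are illustrative, not the actual module API. `candidates` is sorted ascending by distance to the query, and `dist` computes the distance between two stored points:

```rust
// Sketch of heuristic neighbor selection (Malkov Algorithm 4).
fn select_neighbors_heuristic<F>(
    candidates: &[(usize, f32)],
    m: usize,
    dist: F,
) -> Vec<(usize, f32)>
where
    F: Fn(usize, usize) -> f32,
{
    let mut result: Vec<(usize, f32)> = Vec::new();
    for &(cand, cand_dist) in candidates {
        if result.len() >= m {
            break;
        }
        // Strict `<` per the paper: a candidate exactly equidistant to an
        // already-selected neighbor is discarded, not kept.
        if result.iter().all(|&(sel, _)| cand_dist < dist(cand, sel)) {
            result.push((cand, cand_dist));
        }
    }
    result
}

fn main() {
    // Three collinear 1-D points at 0.0, 1.0, 2.0; query at 0.0.
    let pos = [0.0_f32, 1.0, 2.0];
    let cands = [(0, 0.0_f32), (1, 1.0), (2, 2.0)];
    let picked = select_neighbors_heuristic(&cands, 3, |a, b| (pos[a] - pos[b]).abs());
    // 1 and 2 are each as close to node 0 as to the query, so `<` drops them.
    assert_eq!(picked, vec![(0, 0.0)]);
    println!("{:?}", picked);
}
```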

  • Malkov & Yashunin, “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs” (IEEE TPAMI 2018), §4.2
  • Fu et al., “Fast ANN Search with the Navigating Spreading-out Graph” (VLDB 2019)

## RFC-002-04: Concurrent Insert Architecture

Severity: HIGH | Effort: High

A single `RwLock<TemporalHnsw>` serializes all inserts and blocks all searches for the duration of each insert.

| Option | Approach | Latency Impact | Complexity |
|--------|----------|----------------|------------|
| A (recommended) | Insert queue + batch commit | Eventual visibility (~10 ms) | Medium |
| B | Per-node locks | Immediate visibility | High |
| C | Segmented graph | Immediate, partitioned | High |
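Option A can be sketched with an insert queue drained by a single committer thread; the types and channel choice here are assumptions for illustration, not the actual architecture:

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;

// Producers enqueue points on a channel; one committer thread drains the
// queue and takes the write lock once per batch, so searches are only
// blocked briefly rather than during every individual insert.
struct Point {
    id: u64,
}

#[derive(Default)]
struct TemporalHnsw {
    points: Vec<Point>,
}

fn run_batch_commit(n: u64) -> usize {
    let index = Arc::new(RwLock::new(TemporalHnsw::default()));
    let (tx, rx) = mpsc::channel::<Point>();

    let committer = {
        let index = Arc::clone(&index);
        thread::spawn(move || {
            while let Ok(first) = rx.recv() {
                // Greedily drain whatever else is queued into one batch.
                let mut batch = vec![first];
                while let Ok(p) = rx.try_recv() {
                    batch.push(p);
                }
                let mut guard = index.write().unwrap(); // one lock per batch
                guard.points.extend(batch);
            }
        })
    };

    for id in 0..n {
        tx.send(Point { id }).unwrap();
    }
    drop(tx); // closing the channel stops the committer
    committer.join().unwrap();
    let count = index.read().unwrap().points.len();
    count
}

fn main() {
    assert_eq!(run_batch_commit(100), 100);
    println!("ok");
}
```

The eventual-visibility window in the table corresponds to the time a point spends queued before its batch takes the lock.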
  • Singh et al., “FreshDiskANN” (arXiv 2021) — per-node concurrent insert
  • Subramanya et al., “DiskANN” (NeurIPS 2019) — lock-free graph traversal

## RFC-002-05: gRPC Delivery Semantics

Severity: HIGH | Effort: High

The streaming insert RPC currently leaves delivery semantics undefined:

  • No deduplication on re-sent points
  • Only last ACK returned (partial failures invisible)
  • No backpressure mechanism

Proposal: the client assigns sequence numbers; the server ACKs every N points with the committed WAL sequence; on reconnect, the client resumes from the last ACK.
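A minimal sketch of the server-side bookkeeping under this scheme (names are illustrative, not the actual service types). Re-sent points at or below the last committed sequence are skipped, making replay after a reconnect idempotent:

```rust
// Server-side dedup state for one client stream.
struct StreamState {
    last_committed_seq: u64, // highest sequence durably in the WAL
}

impl StreamState {
    /// True if this point has not yet been committed and should be applied.
    fn should_apply(&self, seq: u64) -> bool {
        seq > self.last_committed_seq
    }

    /// Record a WAL commit covering everything up to `seq`.
    fn commit(&mut self, seq: u64) {
        self.last_committed_seq = self.last_committed_seq.max(seq);
    }
}

fn main() {
    let mut s = StreamState { last_committed_seq: 0 };
    for seq in 1..=5 {
        assert!(s.should_apply(seq));
        s.commit(seq);
    }
    // Client missed the last ACK, reconnects, and replays 4..=7:
    let applied: Vec<u64> = (4..=7).filter(|&q| s.should_apply(q)).collect();
    assert_eq!(applied, vec![6, 7]); // 4 and 5 are deduplicated
    println!("applied after replay: {:?}", applied);
}
```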

  • Kreps et al., “Kafka: A Distributed Messaging System” (NetDB 2011)

## RFC-002-06: PELT Candidate Set Unbounded Growth

Severity: MEDIUM | Effort: Medium

The pruning fallback `f[t] == f64::INFINITY` retains all candidates during early iterations, allowing O(N) candidate-set growth.

Remove infinity fallback and add explicit cap on candidate set size.
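A sketch of the proposed pruning step, assuming each candidate carries its accumulated cost F(s) + C(s..t) (names, shapes, and the cap value are illustrative):

```rust
// Prune PELT candidates: apply the standard inequality, then cap the set
// size so it stays bounded even while f_t is still infinite/large.
fn prune(candidates: &mut Vec<(usize, f64)>, f_t: f64, penalty: f64, max_candidates: usize) {
    // Killick et al., Theorem 3.1: s can be discarded once
    // F(s) + C(s..t) + K > F(t).
    candidates.retain(|&(_, cost)| cost + penalty <= f_t);
    if candidates.len() > max_candidates {
        // Keep only the cheapest candidates up to the cap.
        candidates.sort_by(|a, b| a.1.total_cmp(&b.1));
        candidates.truncate(max_candidates);
    }
}

fn main() {
    let mut cands = vec![(0, 1.0), (1, 5.0), (2, 2.0)];
    prune(&mut cands, 4.0, 0.0, 1);
    // (1, 5.0) fails the inequality; the cap then keeps only the cheapest.
    assert_eq!(cands, vec![(0, 1.0)]);
    println!("{:?}", cands);
}
```

Note the cap trades a small amount of optimality for a hard memory bound; the inequality alone preserves exactness.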

  • Killick et al., “Optimal Detection of Changepoints with Linear Computational Cost” (JASA 2012), Theorem 3.1

## RFC-002-07: Unbounded Random Level Generation

Severity: MEDIUM | Effort: Low

As the uniform draw `r` approaches 0, `-r.ln()` grows without bound, so the assigned level (and its per-level link allocations) can be arbitrarily large. Cap it:

```rust
let level = (-r.ln() * self.config.level_mult).floor() as usize;
level.min(32) // Cap at theoretical max for 10^38 nodes
```
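A standalone illustration of why the cap matters; the multiplier here assumes M = 16 and is illustrative:

```rust
// Without `min(MAX_LEVEL)`, a draw of r near 0 yields an arbitrarily
// large level and correspondingly large link allocations.
const MAX_LEVEL: usize = 32;

fn capped_level(r: f64, level_mult: f64) -> usize {
    let level = (-r.ln() * level_mult).floor() as usize;
    level.min(MAX_LEVEL)
}

fn main() {
    let level_mult = 1.0 / (16.0_f64).ln(); // illustrative: M = 16
    assert_eq!(capped_level(1e-300, level_mult), MAX_LEVEL); // 249 uncapped
    assert_eq!(capped_level(0.5, level_mult), 0);            // typical draw
    println!("ok");
}
```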
  • Malkov & Yashunin (2018), §4.1

## RFC-002-08: Warm Store O(N) Range Scans

Severity: MEDIUM | Effort: Medium

Range queries currently deserialize all points and then filter in memory, which is O(N) even for highly selective queries.

Proposal: keep a per-chunk manifest with (min_ts, max_ts) and skip chunks that do not overlap the query range.
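A sketch of the manifest check, with assumed field names; a range query consults the manifest first and only deserializes chunks whose interval intersects the query:

```rust
// Zone-map entry for one serialized chunk of the warm store.
struct ChunkMeta {
    min_ts: i64,
    max_ts: i64,
}

/// Indices of chunks whose [min_ts, max_ts] overlaps [start, end].
fn chunks_for_range(manifest: &[ChunkMeta], start: i64, end: i64) -> Vec<usize> {
    manifest
        .iter()
        .enumerate()
        // Interval-overlap test: non-overlapping chunks are never touched.
        .filter(|(_, c)| c.max_ts >= start && c.min_ts <= end)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let manifest = [
        ChunkMeta { min_ts: 0, max_ts: 9 },
        ChunkMeta { min_ts: 10, max_ts: 19 },
        ChunkMeta { min_ts: 20, max_ts: 29 },
    ];
    // A selective query touches one chunk instead of scanning all three.
    assert_eq!(chunks_for_range(&manifest, 12, 15), vec![1]);
    println!("ok");
}
```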

  • Lamb et al., “The Vertica Analytic Database” (VLDB 2012) — zone maps

## RFC-002-09: PQ Codebook Initialization Bias

Severity: MEDIUM | Effort: Medium

Sequential initialization from temporally ordered data biases centroids toward early observations.

k-means++ initialization: sample each new centroid with probability proportional to D²(x), its squared distance to the nearest centroid chosen so far.
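A 1-D sketch of the D²-weighted draw for one subquantizer; `u` is a uniform draw in [0, 1) passed in explicitly so the example is deterministic, where a real implementation would use an RNG:

```rust
/// Squared distance from x to the nearest existing centroid.
fn d2(x: f64, centroids: &[f64]) -> f64 {
    centroids
        .iter()
        .map(|c| (x - c) * (x - c))
        .fold(f64::INFINITY, f64::min)
}

/// Pick the next centroid with probability proportional to D^2(x).
fn next_centroid(data: &[f64], centroids: &[f64], u: f64) -> f64 {
    let weights: Vec<f64> = data.iter().map(|&x| d2(x, centroids)).collect();
    let total: f64 = weights.iter().sum();
    let mut target = u * total;
    for (&x, &w) in data.iter().zip(&weights) {
        if target < w {
            return x;
        }
        target -= w;
    }
    *data.last().unwrap()
}

fn main() {
    let data = [0.0, 1.0, 10.0];
    // With one centroid at 0.0, weights are [0, 1, 100]: the far point at
    // 10.0 is picked for almost any draw, countering temporal-order bias.
    assert_eq!(next_centroid(&data, &[0.0], 0.5), 10.0);
    println!("ok");
}
```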

  • Arthur & Vassilvitskii, “k-means++: The Advantages of Careful Seeding” (SODA 2007)
  • Jégou et al., “Product Quantization for Nearest Neighbor Search” (IEEE TPAMI 2011)

## RFC-002-10: ODE Solver Stiffness Detection

Severity: LOW | Effort: High | Status: Deferred

Dormand-Prince (explicit RK45) cannot handle stiff systems efficiently. Deferred until Neural ODE training (burn backend) is implemented.
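To illustrate why an explicit method struggles (this is not the solver's actual API), consider the stability bound on the scalar test equation y' = λy; forward Euler is used here for the simplest bound:

```rust
// For y' = λy with real λ < 0, forward Euler's update multiplies the
// solution by (1 + hλ) each step; the method is stable only while
// |1 + hλ| < 1, i.e. h < 2/|λ|. A stiffness detector can flag steps
// repeatedly pinned at that ceiling by accuracy-unrelated rejections.
fn forward_euler_stable(h: f64, lambda: f64) -> bool {
    (1.0 + h * lambda).abs() < 1.0
}

fn main() {
    let lambda = -1000.0; // stiff eigenvalue
    assert!(forward_euler_stable(0.001, lambda)); // h < 2/|λ| = 0.002
    assert!(!forward_euler_stable(0.01, lambda)); // 10x too large: unstable
    println!("ok");
}
```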

  • Hairer & Wanner, “Solving ODEs II: Stiff Problems” (Springer, 2002)
  • Chen et al., “Neural Ordinary Differential Equations” (NeurIPS 2018)

## References

  1. Arthur & Vassilvitskii. “k-means++: The Advantages of Careful Seeding.” SODA 2007.
  2. Chen et al. “Neural Ordinary Differential Equations.” NeurIPS 2018.
  3. Fu et al. “Fast ANN Search with the Navigating Spreading-out Graph.” VLDB 2019.
  4. Ge et al. “Optimized Product Quantization.” IEEE TPAMI 2014.
  5. Hairer & Wanner. “Solving Ordinary Differential Equations II.” Springer, 2002.
  6. IEEE 754-2019. “Standard for Floating-Point Arithmetic.”
  7. Jégou, Douze & Schmid. “Product Quantization for Nearest Neighbor Search.” IEEE TPAMI 2011.
  8. Kidger. “On Neural Differential Equations.” PhD Thesis, Oxford, 2022.
  9. Killick, Fearnhead & Eckley. “Optimal Detection of Changepoints with Linear Computational Cost.” JASA 2012.
  10. Kreps, Narkhede & Rao. “Kafka: A Distributed Messaging System.” NetDB 2011.
  11. Lamb et al. “The Vertica Analytic Database: C-Store 7 Years Later.” VLDB 2012.
  12. Maidstone et al. “On Optimal Multiple Changepoint Algorithms for Large Data.” Statistics & Computing 2017.
  13. Malkov & Yashunin. “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.” IEEE TPAMI 2018.
  14. Pillai et al. “All File Systems Are Not Created Equal.” OSDI 2014.
  15. Singh et al. “FreshDiskANN: Streaming Similarity Search.” arXiv 2021.
  16. Subramanya et al. “DiskANN: Billion-point Nearest Neighbor Search.” NeurIPS 2019.
  17. Zheng et al. “BtrBlk: Efficient B-tree Logging for Database Storage.” SIGMOD 2022.