# RFC-012: Performance, Correctness & Agent Memory Architecture
Status: Proposed
## Part A: Parallel HNSW Construction
### Problem
Building an HNSW index for 1.3M points × D=768 takes ~30 min on a single core.
`bulk_insert` is fully sequential — each insertion performs a greedy search plus neighbor connection.
### Proposed Solution
Two-phase parallel construction with `rayon`:
- Sequential node allocation (fast): assign node IDs and levels, add vectors to storage
- Parallel neighbor connection (the slow part): partition nodes into chunks; each thread connects neighbors using `RwLock` on the graph adjacency lists
### Expected speedup
~4-6x on 8 cores. The bottleneck is distance computation during neighbor search (O(ef_construction × D) per node), which is embarrassingly parallel across nodes in the same level.
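As a language-agnostic sketch of the two-phase idea (sequential allocation, then lock-guarded parallel neighbor connection), here is a toy Python version. It uses threads and brute-force nearest-neighbor search in place of CVX's greedy HNSW search; all names are illustrative, not the CVX API:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def parallel_build(vectors, k=2, workers=4):
    # Phase 1 (sequential): allocate node IDs and empty adjacency lists
    ids = list(range(len(vectors)))
    adjacency = {i: [] for i in ids}
    locks = {i: threading.Lock() for i in ids}

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Phase 2 (parallel): each task connects one node to its k nearest,
    # taking per-node locks when mutating shared adjacency lists
    def connect(i):
        nearest = sorted(ids, key=lambda j: sq_dist(vectors[i], vectors[j]))
        for j in [n for n in nearest if n != i][:k]:
            with locks[i]:
                adjacency[i].append(j)
            with locks[j]:
                adjacency[j].append(i)

    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(connect, ids))
    return adjacency
```

Because phase 2 only reads immutable vectors and mutates per-node adjacency under fine-grained locks, the distance computations run fully in parallel, which is where the expected 4-6x comes from.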
### Alternatives considered
| Approach | Speedup | Complexity | Trade-off |
|---|---|---|---|
| Parallel rayon insertion | 4-6x | Medium | Lock contention on shared neighbors |
| PCA pre-reduction (768→128) | ~6x | Low | Loses precision for anchor projections |
| Scalar quantization during build | ~2x | Low (already implemented) | Approximate distances |
| Bottom-up batch construction | ~10x | High | Requires restructuring the graph builder |
### Implementation notes
- `rayon` is already in the workspace dependencies
- `ConcurrentTemporalHnsw` already uses `RwLock` — extend this to the build phase
- Must maintain insertion-order determinism for reproducibility (optional flag)
## Part B: Native Embedding Space Centering (Anisotropy Correction)
### Problem
Modern sentence embedding models (MentalRoBERTa, sentence-transformers, OpenAI, Cohere) produce embeddings that occupy a narrow cone in the high-dimensional space — a phenomenon known as representation anisotropy. All vectors share a dominant component (the “average text” direction), and the discriminative signal is compressed into a small residual.
Empirically observed in CVX with MentalRoBERTa (D=768) on eRisk data:
| Metric | Before centering | After centering |
|---|---|---|
| Depression user → depressed_mood anchor | cosine sim 0.975 | cosine sim 0.42 |
| Control user → depressed_mood anchor | cosine sim 0.964 | cosine sim 0.09 |
| Discriminative gap | 0.011 | 0.33 |
The gap increases 30× after centering. Without centering, anchor projections, drift measurements, and similarity searches all operate on a signal buried under shared bias.
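The effect is easy to reproduce on synthetic data. The following toy, pure-Python illustration (synthetic numbers, not the eRisk measurements) builds vectors that share a large common component plus a tiny discriminative residual, and shows how mean-centering widens the cosine gap:

```python
# Toy demonstration of anisotropy correction, not CVX code.
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return dot(a, a) ** 0.5
def cosine(a, b): return dot(a, b) / (norm(a) * norm(b))

# Shared "average text" direction plus a tiny discriminative residual
base = [10.0, 10.0, 10.0, 10.0]
u = [b + d for b, d in zip(base, [0.5, 0.0, 0.0, 0.0])]       # matches the anchor
v = [b + d for b, d in zip(base, [0.0, 0.0, 0.0, 0.5])]       # does not
anchor = [b + d for b, d in zip(base, [0.5, 0.0, 0.0, 0.0])]

raw_gap = cosine(u, anchor) - cosine(v, anchor)  # tiny: both near 1.0

# Centering: subtract the corpus mean (here, the mean of u, v, anchor)
mean = [sum(xs) / 3 for xs in zip(u, v, anchor)]
def centered(x): return [xi - mi for xi, mi in zip(x, mean)]
c_gap = cosine(centered(u), centered(anchor)) - cosine(centered(v), centered(anchor))
# c_gap is orders of magnitude larger than raw_gap
```

The shared component dominates every raw cosine, so all similarities cluster near 1.0; removing the mean leaves only the residual, where the discriminative signal lives.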
### Background & References
The anisotropy problem in contextual embeddings is well-documented:
- Ethayarajh (2019) — “How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2”. EMNLP 2019. First systematic measurement showing BERT embeddings are anisotropic — all representations occupy a narrow cone, with average cosine similarity between random sentences > 0.95.
- Li et al. (2020) — “On the Sentence Embeddings from Pre-trained Language Models”. EMNLP 2020. Shows that BERT sentence embeddings have a dominant direction that accounts for most of the variance. Proposes BERT-flow (a normalizing-flow transformation) to correct the distribution.
- Su et al. (2021) — “Whitening Sentence Representations for Better Semantics and Faster Retrieval”. ACL 2021. Proposes whitening (centering + rotation to decorrelate dimensions) as a simpler alternative to flow-based correction. Shows that even simple mean-centering significantly improves semantic similarity tasks.
- Huang et al. (2021) — “WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach”. EMNLP Findings 2021. Confirms that centering + optional whitening improves STS benchmarks without any fine-tuning, across multiple models.
- Rajaee & Pilehvar (2021) — “A Cluster-based Approach for Improving Isotropy in Contextual Embedding Space”. ACL 2021. Analyzes the geometric structure of the anisotropic cone and proposes cluster-based correction.
The consistent finding across all papers: subtracting the mean embedding vector is the single most impactful correction, often recovering 70-90% of the performance gap between anisotropic and isotropic representations.
### Relevance to CVX
CVX computes temporal analytics (drift, velocity, changepoints, anchor projections) on embedding trajectories. All of these operations use cosine distance. In an anisotropic space, cosine distances are compressed into a narrow range, causing:
- Anchor projections (`project_to_anchors`): all posts appear equidistant to all anchors
- Drift measurements (`drift`, `velocity`): signal-to-noise ratio degraded
- HNSW search (`search`): nearest-neighbor quality reduced (many false ties)
- Changepoint detection (`detect_changepoints`): reduced sensitivity to real regime changes
- Region quality (`regions`, `region_assignments`): regions semantically less meaningful
Centering is a universal fix that benefits all downstream operations regardless of the specific embedding model used.
### Proposed API
```rust
// In TemporalHnsw
pub struct TemporalHnsw<D: DistanceMetric> {
    // ... existing fields ...
    centroid: Option<Vec<f32>>, // NEW: global mean for centering
}
```

#### Option 1: Manual centroid

```python
# Python API
centroid = index.compute_centroid()  # O(N×D) single pass over stored vectors
index.set_centroid(centroid)         # All subsequent operations use centered distances

# Or provide an external centroid (e.g., from a larger corpus)
index.set_centroid(precomputed_centroid)
```

#### Option 2: Auto-centering on build

```python
index = cvx.TemporalIndex(m=16, ef_construction=200, centering=True)
index.bulk_insert(entity_ids, timestamps, vectors)
# Centroid computed automatically from inserted vectors
# Stored alongside the index in the .cvx file
```

#### Option 3: Centering as distance metric wrapper

```rust
pub struct CenteredCosine {
    inner: CosineDistance,
    centroid: Vec<f32>,
}

impl DistanceMetric for CenteredCosine {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        // Center both vectors, then compute cosine
        let a_c: Vec<f32> = a.iter().zip(&self.centroid).map(|(x, c)| x - c).collect();
        let b_c: Vec<f32> = b.iter().zip(&self.centroid).map(|(x, c)| x - c).collect();
        self.inner.distance(&a_c, &b_c)
    }
}
```

#### Recommended approach
Option 1 (manual centroid) for the initial implementation:

- Simplest, no breaking changes
- `compute_centroid()`: single O(N×D) pass
- `set_centroid()`: stores the centroid in the struct, serialized with the index
- All functions that compute distances check `self.centroid.is_some()` and center before computing
- `project_to_anchors` centers both the trajectory vectors AND the anchor vectors
This is non-invasive: existing indices without a centroid continue to work unchanged.
### Affected functions
| Function | Current behavior | With centering |
|---|---|---|
| `project_to_anchors` | cosine(raw_vec, raw_anchor) | cosine(vec - μ, anchor - μ) |
| `drift` | cosine(raw_v1, raw_v2) | cosine(v1 - μ, v2 - μ) |
| `velocity` | Δ(raw vectors) / Δt | Δ(centered vectors) / Δt |
| `search` | kNN on raw space | kNN on centered space |
| `detect_changepoints` | PELT on raw distances | PELT on centered distances |
| `region_assignments` | assign_region on raw vectors | assign_region on centered vectors |
### Beyond centering: whitening
Full whitening (centering + rotation by inverse covariance) would further decorrelate dimensions, but requires computing and storing a D×D matrix. For D=768, this is 589,824 floats (~2.4MB as f32). The marginal improvement over centering alone is typically small (Su et al. 2021 report ~2-5% STS improvement from whitening vs centering-only).
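For reference, a minimal ZCA-whitening sketch in Python with NumPy (assuming NumPy is available; `whitening_transform` is an illustrative name, not the proposed CVX API):

```python
import numpy as np

def whitening_transform(X, eps=1e-8):
    # Center, then rotate/scale by the inverse square root of the covariance.
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / len(X)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # ZCA: W = C^{-1/2}
    return mu, W

# Correlated synthetic data: whitening makes the covariance ≈ identity
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))
mu, W = whitening_transform(X)
Xw = (X - mu) @ W
cov_w = Xw.T @ Xw / len(Xw)
```

The stored state is exactly what the paragraph above describes: the mean vector (D floats) plus the D×D transform, which is why whitening costs ~2.4MB at D=768 while centering alone costs only 3KB.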
Recommendation: Implement centering first. Add whitening as optional enhancement later if benchmarks show meaningful improvement on CVX-specific tasks.
### Implementation plan
| Phase | Work | Complexity |
|---|---|---|
| 1 | compute_centroid() + set_centroid() + serialize in snapshot | Low |
| 2 | Center vectors in project_to_anchors, drift, velocity | Low |
| 3 | Center in search and assign_region (affects HNSW traversal) | Medium |
| 4 | Auto-centering mode in bulk_insert | Low |
| 5 | Python bindings + notebook validation | Low |
| 6 | Optional whitening (compute_whitening_transform()) | Medium |
## Part C: Architecture Review — Gaps & Refactoring Priorities
### Context
An architecture audit was conducted evaluating CVX as a tool for AI agent long-term memory — specifically for storing and retrieving successful action sequences dependent on context. This section documents the findings.
### Current capabilities for agent memory
CVX already supports episodic memory via `episode_encoding.rs`:
```rust
// entity_id = (episode_id << 16) | step_index
// Max ~281 trillion episodes × 65,536 steps each
encode_entity_id(episode_id, step_index) -> u64
decode_entity_id(entity_id) -> (episode_id, step_index)
episode_range(episode_id) -> (start_id, end_id)
```

Validated in notebooks E1–E4:
| Experiment | Task | Baseline | CVX Memory | Improvement |
|---|---|---|---|---|
| E1 (code gen) | MBPP → HumanEval | — | 77.8% pass@1 | Episodic retrieval works |
| E3 (ALFWorld) | Interactive RL | 3.3% completion | 20.0% completion | 6× with causal retrieval |
| E4 (debugging) | APPS retries | 28.0% | 31.0% | +3 rescued problems |
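The bit-packing scheme above (16 low bits for the step index, the rest for the episode) can be sketched and sanity-checked in a few lines of Python:

```python
# Pure-Python sketch of the episode encoding scheme (16 bits for step_index).
STEP_BITS = 16
STEP_MASK = (1 << STEP_BITS) - 1  # 0xFFFF

def encode_entity_id(episode_id, step_index):
    assert 0 <= step_index <= STEP_MASK, "step_index must fit in 16 bits"
    return (episode_id << STEP_BITS) | step_index

def decode_entity_id(entity_id):
    return entity_id >> STEP_BITS, entity_id & STEP_MASK

def episode_range(episode_id):
    # All entity IDs of one episode form a contiguous u64 range,
    # which is what makes per-episode scans cheap.
    start = episode_id << STEP_BITS
    return start, start | STEP_MASK
```

Because each episode occupies a contiguous ID range, retrieving every step of an episode is a simple range query rather than a scan.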
### Identified architectural gaps
#### Gap 1: No outcome awareness
CVX stores vectors but has no concept of success or failure. An agent searching “what did I do in similar states?” retrieves ALL experiences without distinguishing successful from failed ones.
Impact: Retrieval noise — failed strategies pollute the result set.
Proposed extension:
```rust
// New field in TemporalPoint or as indexed metadata
pub struct OutcomeAnnotation {
    reward: f32,              // Continuous reward signal
    success: bool,            // Binary outcome
    outcome_vector: Vec<f32>, // Optional: embedding of the final state
}
```

Python API:
```python
index.insert(entity_id, timestamp, vector, reward=1.0)
results = index.search(query, k=5, min_reward=0.5)  # Only successful experiences
```

Complexity: Low. Reward is a float stored alongside the vector; filtered via bitmap like temporal filtering.
#### Gap 2: Causal continuation not exposed
The most valuable pattern for agents: “given a similar state, what steps came AFTER?”. `TemporalGraphIndex` (RFC-010) implements this with predecessor/successor edges, but:
- `ConcurrentTemporalHnsw` wraps `TemporalHnsw`, NOT `TemporalGraphIndex`
- `causal_search` is not available in the Python API
- The temporal edge layer is invisible to end users
Impact: The primary agent memory pattern requires manual multi-step reconstruction in Python instead of a single native call.
Proposed fix: Restructure the wrapper chain:
```
Current:  ConcurrentTemporalHnsw<D> → RwLock<TemporalHnsw<D>>
Proposed: ConcurrentTemporalHnsw<D> → RwLock<TemporalGraphIndex<D>>
                                        ├── inner: TemporalHnsw<D>
                                        └── edges: TemporalEdgeLayer
```

Expose in Python:
```python
results = index.causal_search(
    query=embedding,
    k=5,
    continuation_steps=10,  # Return next 10 steps from each match
)
# results[i] = {
#     "match": (entity_id, timestamp, score),
#     "continuation": [(entity_id, timestamp, vector), ...],
# }
```

Complexity: Medium. The data structures exist; this needs wiring.
#### Gap 3: No structured context filtering
An agent needs: “in situations similar to X when the goal was Y”. Currently, context is mixed into the embedding — there’s no way to filter by goal, task type, or environment state.
Metadata exists (`HashMap<String, String>`) but is post-filtered (over-fetch 4k candidates → filter → take k). It is not indexed.
Proposed extension: Inverted index on metadata keys:
```rust
pub struct IndexedMetadata {
    // key → value → RoaringBitmap of matching node_ids
    indices: HashMap<String, HashMap<String, RoaringBitmap>>,
}
```

This allows O(1) membership checks during HNSW traversal, the same mechanism as temporal filtering: pre-filter instead of post-filter.
Python API:
```python
index.insert(entity_id, ts, vec, metadata={"goal": "clean", "room": "kitchen"})
results = index.search(query, k=5, metadata={"goal": "clean"})  # Pre-filtered
```

Complexity: Medium. Mirrors the temporal bitmap pattern.
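The inverted-index idea can be sketched in Python, with plain sets standing in for RoaringBitmap (all names here are illustrative, not the CVX API):

```python
from collections import defaultdict

class IndexedMetadata:
    """key → value → set of node ids (set ≈ RoaringBitmap in the sketch)."""

    def __init__(self):
        self.indices = defaultdict(lambda: defaultdict(set))

    def insert(self, node_id, metadata):
        for key, value in metadata.items():
            self.indices[key][value].add(node_id)

    def allowed(self, filters):
        # Intersect the id sets for all requested key=value pairs;
        # during HNSW traversal this becomes an O(1) membership check.
        sets = [self.indices[k][v] for k, v in filters.items()]
        return set.intersection(*sets) if sets else set()

idx = IndexedMetadata()
idx.insert(1, {"goal": "clean", "room": "kitchen"})
idx.insert(2, {"goal": "clean", "room": "bathroom"})
idx.insert(3, {"goal": "fetch", "room": "kitchen"})
```

Calling `idx.allowed({"goal": "clean", "room": "kitchen"})` yields only node 1, so the search visits candidates already known to match instead of over-fetching and discarding.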
#### Gap 4: Memory consolidation — deferred to roadmap
Biological memory consolidates repeated experiences into prototypes. At scale (10M+ episodes), unconsolidated accumulation may degrade retrieval quality through noise.
However, consolidation introduces serious risks:
- Destroys episodic structure: A centroid of 10 episodes has no predecessor/successor edges — causal search breaks on prototypes
- Loses variance: Edge cases (often the most informative) are averaged away
- Consistency is hard: Updating prototypes when source episodes change or are re-evaluated requires complex invalidation policies
Decision: Defer consolidation. For current and near-term scale (1-10M episodes), improving retrieval quality (centering, metadata filtering, outcome weighting) is more impactful and less risky than consolidation.
See Part D for a future design (complementary prototypes with tiered fidelity) when scale demands it.
#### Gap 5: No recency-weighted retrieval
More recent experiences should be more accessible by default. CVX has `time_decay_weight` in `temporal_edges.rs` but does not use it in general search scoring.
Proposed extension: an optional recency factor in the composite distance:

```
d_final = α·d_semantic + β·d_temporal + γ·recency_penalty(age)
```

where `recency_penalty(age) = 1 - exp(-λ·age)` (older = higher penalty).
Complexity: Low. Adds one term to the scoring function.
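A minimal sketch of this scoring term (the coefficient defaults are illustrative, not proposed values):

```python
import math

def recency_penalty(age, lam=0.1):
    # 0 for brand-new experiences, asymptotically 1 for very old ones
    return 1.0 - math.exp(-lam * age)

def composite_distance(d_semantic, d_temporal, age,
                       alpha=0.7, beta=0.2, gamma=0.1):
    # One extra additive term on top of the existing semantic/temporal mix
    return (alpha * d_semantic
            + beta * d_temporal
            + gamma * recency_penalty(age))
```

Because the penalty saturates at 1, old memories are de-prioritized but never pushed to infinite distance, so they remain retrievable when semantically dominant.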
### Design pattern observations
#### Strengths
- Composition over inheritance: each layer (HnswGraph → TemporalHnsw → TemporalGraphIndex → ConcurrentWrapper) adds responsibility without modifying the inner layer. Idiomatic Rust decorator pattern.
- SmallVec for neighbor lists: `SmallVec<[u32; 16]>` keeps neighbor lists inline (no heap allocation) for the default M=16.
- RoaringBitmap temporal filtering: sub-byte per vector, O(1) membership check. Excellent for 1M+ scale.
- Postcard serialization: compact binary format with separate snapshot structs — serialization logic doesn’t pollute domain structs.
- Trait-based polymorphism: `DistanceMetric`, `TemporalIndexAccess`, `StorageBackend` enable real loose coupling and testability.
#### Weaknesses identified
Section titled “Weaknesses identified”-
TemporalIndexAccessis a god trait: 12 methods with empty defaults. Violates Interface Segregation. Should split into:TemporalSearch(search_raw, search_with_metadata)TrajectoryAccess(trajectory, vector, entity_id, timestamp)RegionAccess(regions, region_members, region_assignments, region_trajectory)
-
Python API bypasses query engine:
cvx-pythoncallscvx-indexandcvx-analyticsdirectly, not throughcvx-query. Features must be exposed twice. The query engine’sTemporalQueryenum (15 query types) is richer than what Python exposes. -
No snapshot versioning: A struct field change silently breaks deserialization. Needs
version: u32inTemporalSnapshot. -
Composite distance scale mismatch:
α·d_semantic + (1-α)·d_temporalassumes comparable scales, but cosine ∈ [0,2] vs temporal ∈ [0,1]. With α=0.5, semantic has 2× the effective weight. -
Entity ID is untyped: A user, document, and episode are all
u64. No way to distinguish entity types at the index level.
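The scale-mismatch weakness has a simple arithmetic fix worth sketching: normalize cosine distance into [0, 1] before mixing, so α expresses the intended weighting (illustrative helper, not the CVX API):

```python
def normalized_composite(d_cosine, d_temporal, alpha=0.5):
    # Cosine distance lives in [0, 2]; map it to [0, 1] so both terms
    # share a scale and alpha means what it says.
    d_semantic = d_cosine / 2.0
    return alpha * d_semantic + (1 - alpha) * d_temporal
```

With both terms on [0, 1], α=0.5 now weights semantic and temporal distance equally instead of giving the semantic term double the effective influence.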
### Refactoring priority matrix
| # | Refactoring | Impact | Effort | Benefit for AI agents |
|---|---|---|---|---|
| 1 | Native centering (Part B) | 30× discrimination | Low | More precise memory retrieval |
| 2 | Expose causal_search in Python | Enables continuation pattern | Medium | Primary agent memory pattern |
| 3 | Indexed metadata filtering | Context-dependent retrieval | Medium | “Similar state + same goal” |
| 4 | Outcome-aware search | Filter by success/reward | Low | Only retrieve what worked |
| 5 | Snapshot versioning | Robustness | Low | Avoid silent data corruption |
| 6 | Trait segregation | Maintainability | Medium | Cleaner extension points |
| 7 | Recency-weighted search | Temporal relevance | Low | Prefer recent experiences |
| 8 | Distance scale normalization | Correctness | Low | Balanced semantic/temporal weighting |
| 9 | Parallel HNSW build (Part A) | 4-6× build speedup | Medium | 30min→5min for researchers |
| 10 | Procrustes model alignment | Cross-model robustness | Medium | Preserve memory across model changes |
Memory consolidation has been deferred to roadmap (see Part D). At current scale, improving retrieval quality through items 1-4 is higher impact and lower risk than lossy consolidation.
### Relationship to other RFCs
- RFC-010 (Temporal Graph Index): provides the `causal_search` infrastructure. Gap 2 is about exposing it, not reimplementing it.
- RFC-004 (Semantic Regions): `region_assignments` provides the clustering infrastructure that future consolidation would build on.
- RFC-005 (Region Members): temporal filtering on regions enables time-scoped analytics and future tiered consolidation.
## Part D: Future Directions
### D.1 Memory Consolidation via Tiered Fidelity
#### Problem
At scale (10M+ episodes), unconsolidated accumulation increases retrieval noise. But naive consolidation (replacing episodes with centroids) destroys episodic structure — a centroid has no steps, no predecessor/successor edges, no causal continuation capability.
#### Design: Complementary Prototypes (not substitutive)
Prototypes complement episodes; they never replace them. A prototype is an additional node in the HNSW with metadata linking to its source episodes.
```
COLD = original episodes (append-only, immutable ground truth)
WARM = original episodes + derived prototypes (marked as synthetic)
HOT  = recent episodes + most-consulted prototypes
```

Key principles:
- Cold is immutable: original data is never modified or deleted. This is the source of truth for re-derivation if consolidation introduces artifacts.
- Prototypes are traceable: each prototype stores `{type: "prototype", source_episodes: [A, B, C], n_sources: 10}`. If retrieval returns a prototype and the agent needs more detail, it follows the links to the source episodes in cold.
- Fidelity degrades gracefully: hot (fast, possibly consolidated) → warm (moderate, mixed) → cold (slow, always original). An agent can “zoom in” from a prototype match to the actual episodes.
- Consistency policy: prototypes are invalidated when their source episodes’ metadata changes (e.g., reward updated). Re-derivation is triggered lazily on next access or eagerly via background compaction.
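The traceability and consistency principles can be sketched with a hypothetical prototype record (all names here are illustrative, not a committed design):

```python
def make_prototype(source_episodes, vectors):
    # A prototype is a derived node: centroid vector plus links back
    # to its (immutable) source episodes in cold storage.
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return {
        "type": "prototype",
        "source_episodes": list(source_episodes),
        "n_sources": len(source_episodes),
        "vector": centroid,
        "valid": True,
    }

def on_episode_updated(prototypes, episode_id):
    # Consistency policy: invalidate any prototype derived from this
    # episode; re-derivation happens lazily on next access.
    for p in prototypes:
        if episode_id in p["source_episodes"]:
            p["valid"] = False
```

Because the source episodes are never mutated, an invalidated prototype can always be rebuilt from cold storage, which is what makes the lazy policy safe.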
#### Why not now
- Current scale (1-10M episodes) doesn’t require consolidation
- HNSW search is O(log N) — even 10M is fast
- Retrieval quality improvements (centering, metadata filtering, outcome weighting) have higher impact at current scale
- The consolidation algorithm itself is a research question: what to consolidate, when, how to preserve episodic structure in prototypes
#### Prerequisites
- Gap 1 (outcome awareness): Need reward annotations to know which episodes are worth consolidating
- Gap 3 (metadata indexing): Need to mark prototypes as synthetic and link to sources
- Tiered storage wiring: Cold tier PQ code exists but is not connected
### D.2 Auxiliary Structures — Evaluation
#### Question
Should CVX incorporate structures beyond HNSW — specifically Bayesian networks, knowledge graphs, or causal DAGs?
#### What HNSW cannot represent
- Causal relationships: “action A caused outcome B” is a directed edge, not a distance
- Conditional dependencies: “strategy X works IF condition Y holds” requires structured inference, not similarity search
- Compositional knowledge: “tool A is-a instrument” — taxonomic relations are discrete and transitive
- Probabilistic reasoning: P(state | observations) requires belief propagation
#### Potential auxiliary structures
| Structure | What it adds | Integration point | Use case |
|---|---|---|---|
| Knowledge graph | Typed entities + relations | Indexed metadata | Compositional planning |
| Bayesian network | Conditional probabilities | Region transitions as CPTs | Decision under uncertainty |
| Causal DAG | Directed cause-effect | Granger causality → edges | Counterfactual reasoning |
#### Recommendation
Defer to a future RFC. The gaps in Part C (outcome awareness, context filtering, causal continuation) are prerequisites. Auxiliary structures become valuable only after primary retrieval is reliable and context-aware.
If/when needed, the most natural integrations are:
- Knowledge graph as metadata index: Entity relations stored as indexed metadata, leveraging Gap 3’s infrastructure
- Bayesian network as post-retrieval scorer: Lightweight BN scoring P(success | region, context) over HNSW candidates
- Causal DAG from Granger tests: Materialize Granger causality results (already computed) as persistent directed graph
These would be companion crates (`cvx-graph`, `cvx-bayes`), not modifications to the core index.
### D.3 Documentation Debt — Architecture vs Implementation
An audit identified significant gaps between architecture documentation and actual implementation. The following components are documented as features but have no implementation:
| Component | Docs status | Implementation | Action |
|---|---|---|---|
| Data Virtualization | 10+ sections | 0% | Move to roadmap “Production Ingestion” |
| Distributed Deployment | Full architecture | 0% | Move to roadmap “Phase 5+” |
| Observability (Prometheus/OTLP) | Detailed | Only tracing crate | Mark as planned |
| Temporal ML (Burn/Torch backends) | 3 backends | Only AnalyticBackend | Mark differentiable as future |
| Multi-Scale Alignment | 4 methods | Only resample() | Keep Procrustes, remove rest |
| Interpretability | 6 artifacts | Only drift attribution | Document what exists |
| gRPC QueryStream | Documented | IngestStream + WatchDrift only | Sufficient for now |
| Cold Storage | PQ codebook | Code exists, not wired | Wire when scale demands |
Conversely, the following implemented features are poorly documented or absent from architecture pages:
| Feature | Implementation | Current docs | Action |
|---|---|---|---|
| `region_assignments()` O(N) | Complete in temporal.rs + Python | Only in examples overview | Add to temporal-index.md |
| Episodic memory data model | episode_encoding.rs, E1-E4 validated | Only in research section | Add to data-model.md and analytics-engine.md |
| Anchor projection pipeline | anchor.rs, anchor_index.rs, Python project_to_anchors() | Only in RFC-006 | Add to analytics-engine.md |
| Centering / anisotropy correction | Manual in notebooks, native planned (RFC-012 Part B) | Only in RFC-012 | Add to analytics-engine.md when implemented |
| Metadata filtering | MetadataStore, MetadataFilter, search_with_metadata() | Not in architecture | Add to temporal-index.md |
| Temporal edges / causal search | TemporalEdgeLayer, TemporalGraphIndex (RFC-010) | Not in temporal-index.md | Add temporal edges section |
| `region_trajectory()` EMA smoothing | Complete in temporal.rs + Python | Not documented | Add to temporal-index.md |
| Scalar quantization | enable_quantization() / disable_quantization() | Mentioned in RFC-002 only | Add to temporal-index.md |
Action plan:
- Update intro/vision with actual state (done)
- Mark unimplemented architecture sections with badges (done)
- Update `temporal-index.md`: add temporal edges, metadata, regions, SQ
- Update `analytics-engine.md`: add anchor projection, episodic encoding
- Update `data-model.md`: add episode encoding scheme, metadata model
- Move enterprise features (distributed, data virtualization) to roadmap