Multi-Space & Multi-Scale
The Problem: Embeddings Are Not Homogeneous
Real-world embedding systems are not single-model, single-frequency affairs. A production system might need to store and correlate:
- Text embeddings (e.g., BERT) updated daily
- Image embeddings (e.g., CLIP) updated hourly
- User behavior embeddings (e.g., recommendation model) updated in real time
- Graph embeddings (e.g., TransE) updated weekly
These embeddings live in different spaces (different dimensionality, scale, update frequency, distance metric) but represent the same entities or related entities. An ML engineer wants to ask: “When did the text and image representations of this product diverge?” A researcher wants to know: “Does textual semantic evolution predict visual evolution?”
ChronosVector, as a temporal vector database (VDB), is uniquely positioned to solve this: it not only stores multiple representations but can also analyze how they evolve relative to each other over time.
Core Concepts
Embedding Space
An EmbeddingSpace is a registered vector space in CVX with defined properties:
```rust
pub struct EmbeddingSpace {
    pub space_id: u32,
    pub name: String,                                 // e.g., "text-bert-768"
    pub dimensionality: u32,
    pub metric: DistanceMetricType,                   // cosine, euclidean, etc.
    pub typical_frequency: Option<TemporalFrequency>,
    pub normalization: Normalization,                 // UnitNorm, None, Custom
}
```
Each space gets its own ST-HNSW index, because vectors of different dimensionalities cannot share an index. The storage key extends from (entity_id, timestamp) to (entity_id, space_id, timestamp). If no space is specified, the default space_id = 0 is used for backward compatibility.
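The composite key can be illustrated with a byte-level encoding. The sketch below is hypothetical: it assumes a u64 entity ID, an i64 Unix timestamp, and a big-endian layout chosen so that byte order matches (entity, space, time) order. CVX's actual on-disk key format is not specified here.

```rust
/// Hypothetical 20-byte key for (entity_id, space_id, timestamp).
/// Big-endian fields make lexicographic byte order equal numeric order,
/// so one entity's vectors in one space are contiguous and sorted by time.
fn encode_key(entity_id: u64, space_id: u32, timestamp: i64) -> [u8; 20] {
    let mut key = [0u8; 20];
    key[0..8].copy_from_slice(&entity_id.to_be_bytes());
    key[8..12].copy_from_slice(&space_id.to_be_bytes());
    // Flip the sign bit so negative timestamps sort before positive ones.
    let ts = (timestamp as u64) ^ (1u64 << 63);
    key[12..20].copy_from_slice(&ts.to_be_bytes());
    key
}

/// Writes that specify no space fall back to the default space_id = 0.
fn encode_default_space_key(entity_id: u64, timestamp: i64) -> [u8; 20] {
    encode_key(entity_id, 0, timestamp)
}
```

With this layout, a prefix scan over the first 12 bytes returns one entity's trajectory in one space, already in time order.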
Multi-Space Entities
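An entity can have vectors in multiple spaces, each with its own temporal trajectory. The fundamental tuple becomes:
$$(\text{entity\_id},\ \text{space\_id},\ t,\ \mathbf{v}), \qquad \mathbf{v} \in \mathbb{R}^{d_{\text{space\_id}}}$$
where $d_{\text{space\_id}}$ is the dimensionality registered for that space.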
This enables a product to have simultaneous text, image, and behavioral trajectories, each evolving independently but analyzable together.
Alignment Methods
An alignment function measures the coherence between two spaces for the same entity over time. It does not compare vectors directly, since they live in different spaces; instead, it compares behaviors.
1. Structural Alignment (Topology Preservation)
Question: “Do the neighbors of entity X in space A match its neighbors in space B?”
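The algorithm computes the kNN of the entity in each space at each timestamp and measures the Jaccard similarity of the two neighbor sets:
$$J_t(e) = \frac{\lvert N_k^A(e, t) \cap N_k^B(e, t) \rvert}{\lvert N_k^A(e, t) \cup N_k^B(e, t) \rvert}$$
where $N_k^S(e, t)$ is the set of the $k$ nearest neighbors of entity $e$ in space $S$ at timestamp $t$.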
Advantages:
- Works across different dimensionalities (no projection needed)
- Captures whether the entity’s “role” is consistent across modalities
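Cost: $2T$ kNN queries (one per space per timestamp) plus $T$ Jaccard computations on sets of size $k$, where $T$ is the number of timestamps and $k$ is the neighbor count.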
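A minimal sketch of the Jaccard step, assuming the kNN queries have already been run against each space's own index and returned neighbor entity IDs as u64 values (with the query entity itself excluded). The function names are illustrative, not part of the CVX API.

```rust
use std::collections::HashSet;

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two sets of neighbor entity IDs.
fn jaccard(a: &HashSet<u64>, b: &HashSet<u64>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0; // no neighbors in either space: treat as no agreement
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Per-timestamp structural alignment series.
/// neighbors_a[t] and neighbors_b[t] hold the IDs of the entity's k nearest
/// neighbors at timestamp t in spaces A and B respectively.
fn structural_alignment(neighbors_a: &[Vec<u64>], neighbors_b: &[Vec<u64>]) -> Vec<f64> {
    neighbors_a
        .iter()
        .zip(neighbors_b)
        .map(|(na, nb)| {
            let sa: HashSet<u64> = na.iter().copied().collect();
            let sb: HashSet<u64> = nb.iter().copied().collect();
            jaccard(&sa, &sb)
        })
        .collect()
}
```

Because only entity IDs are compared, the two spaces may have any dimensionality. How the per-timestamp series is summarized (for example, by its mean) is left open here.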
2. Behavioral Alignment (Drift Correlation)
Question: “When entity X changes fast in space A, does it also change fast in space B?”
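This method computes the per-step drift magnitude in each space, then correlates the two drift time series using Pearson, Spearman, or Kendall-Tau correlation:
$$d_t^A = \operatorname{dist}_A\!\left(\mathbf{v}_{t-1}^A, \mathbf{v}_t^A\right), \qquad d_t^B = \operatorname{dist}_B\!\left(\mathbf{v}_{t-1}^B, \mathbf{v}_t^B\right), \qquad r = \operatorname{corr}\!\left(d^A, d^B\right)$$
Result: a value in $[-1, 1]$. Positive values mean the spaces evolve together; negative values mean one space is stable while the other changes.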
Advantages:
- Scale-invariant — different dimensionalities and magnitudes do not matter
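- Cheap to compute: one pass over each trajectory to build the drift series, then a single correlation over $T - 1$ values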
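A minimal sketch of behavioral alignment using Pearson correlation. It assumes both trajectories have already been resampled onto the same time grid and measures drift with Euclidean distance (an assumption; a space could use its registered metric). Spearman or Kendall-Tau would replace pearson with a rank-based statistic. Function names are illustrative.

```rust
/// Per-step drift: Euclidean distance between consecutive vectors.
fn drift_series(traj: &[Vec<f64>]) -> Vec<f64> {
    traj.windows(2)
        .map(|w| {
            w[0].iter()
                .zip(&w[1])
                .map(|(a, b)| (a - b).powi(2))
                .sum::<f64>()
                .sqrt()
        })
        .collect()
}

/// Pearson correlation; None with fewer than 2 points or zero variance.
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    let n = x.len().min(y.len());
    if n < 2 {
        return None;
    }
    let mean_x = x[..n].iter().sum::<f64>() / n as f64;
    let mean_y = y[..n].iter().sum::<f64>() / n as f64;
    let (mut sxy, mut sxx, mut syy) = (0.0_f64, 0.0_f64, 0.0_f64);
    for i in 0..n {
        let (dx, dy) = (x[i] - mean_x, y[i] - mean_y);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return None;
    }
    Some(sxy / (sxx.sqrt() * syy.sqrt()))
}

/// Correlate the drift series of two trajectories sampled on the same grid;
/// the trajectories may have different dimensionalities.
fn behavioral_alignment(traj_a: &[Vec<f64>], traj_b: &[Vec<f64>]) -> Option<f64> {
    pearson(&drift_series(traj_a), &drift_series(traj_b))
}
```

Because only drift magnitudes are correlated, a 2-D trajectory and a 3-D trajectory whose step sizes differ by a constant factor score (numerically) 1, which is the scale invariance noted above.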
3. Procrustes Alignment (Geometric)
Question: “What is the best rotation that aligns these two trajectory shapes?”
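When spaces share the same dimensionality (or are projected to a common one), Orthogonal Procrustes finds the rotation matrix $R$ that minimizes the Frobenius norm between the two trajectories, stacked as $T \times d$ matrices $A$ and $B$:
$$R^* = \arg\min_{R^\top R = I} \lVert A R - B \rVert_F$$
The solution uses SVD: given $A^\top B = U \Sigma V^\top$, the optimal rotation is $R^* = U V^\top$. The residual error $\lVert A R^* - B \rVert_F$ after alignment measures misalignment.
Cost: $O(d^3)$ for the SVD of the $d \times d$ matrix $A^\top B$, plus $O(T d^2)$ to form it.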
This is the same method used for model version alignment, but applied to cross-modal comparison rather than cross-version comparison.
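For $T \times d$ trajectory matrices $A$ and $B$, the optimal rotation $U V^\top$ (where $A^\top B = U \Sigma V^\top$) is exactly the orthogonal polar factor of $A^\top B$. The sketch below uses that fact to avoid a linear-algebra crate: it swaps the explicit SVD for a Newton–Schulz polar iteration, which converges to $U V^\top$ when $A^\top B$ is nonsingular. Row-major Vec<Vec<f64>> matrices and all function names are assumptions; for large $d$ a real implementation would more likely call an SVD routine.

```rust
type Mat = Vec<Vec<f64>>;

fn transpose(m: &Mat) -> Mat {
    (0..m[0].len())
        .map(|j| m.iter().map(|row| row[j]).collect::<Vec<f64>>())
        .collect()
}

fn matmul(a: &Mat, b: &Mat) -> Mat {
    let (n, k, p) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; p]; n];
    for i in 0..n {
        for l in 0..k {
            for j in 0..p {
                out[i][j] += a[i][l] * b[l][j];
            }
        }
    }
    out
}

/// Orthogonal polar factor of a nonsingular square matrix M, i.e. U V^T
/// for M = U Σ V^T, via Newton–Schulz: X <- X (3I - X^T X) / 2.
fn polar_factor(m: &Mat, iters: usize) -> Mat {
    let d = m.len();
    // Scale by the Frobenius norm so every singular value lies in (0, 1].
    let fro = m.iter().flatten().map(|v| v * v).sum::<f64>().sqrt();
    let mut x: Mat = m
        .iter()
        .map(|row| row.iter().map(|v| v / fro).collect::<Vec<f64>>())
        .collect();
    for _ in 0..iters {
        let mut t = matmul(&transpose(&x), &x);
        for i in 0..d {
            for j in 0..d {
                let three_i = if i == j { 3.0 } else { 0.0 };
                t[i][j] = 0.5 * (three_i - t[i][j]);
            }
        }
        x = matmul(&x, &t);
    }
    x
}

/// Orthogonal Procrustes: returns (R*, ||A R* - B||_F).
fn procrustes(a: &Mat, b: &Mat) -> (Mat, f64) {
    let r = polar_factor(&matmul(&transpose(a), b), 100);
    let ar = matmul(a, &r);
    let mut sq = 0.0_f64;
    for (row_ar, row_b) in ar.iter().zip(b) {
        for (p, q) in row_ar.iter().zip(row_b) {
            sq += (p - q) * (p - q);
        }
    }
    (r, sq.sqrt())
}
```

Newton–Schulz converges quadratically once the singular values are near 1, so 100 iterations are ample for well-conditioned inputs. A nearly singular $A^\top B$ needs many more iterations, and an exactly singular one never reaches an orthogonal matrix, which is one reason to prefer an explicit SVD in practice.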
4. Canonical Correlation Analysis (CCA)
Question: “What subspaces of A and B are maximally correlated?”
For spaces with different dimensionalities, CCA finds projection matrices $W_A \in \mathbb{R}^{d_A \times m}$ and $W_B \in \mathbb{R}^{d_B \times m}$ that maximize the correlation between the projected trajectories. This produces:
- Canonical correlations $\rho_1 \geq \rho_2 \geq \dots \geq \rho_m$: sorted from highest to lowest, showing how many dimensions are shared between the spaces
- Projection matrices $W_A, W_B$: enabling cross-space comparison and kNN
- Effective alignment dimensionality: how many dimensions are meaningfully correlated
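Cost: $O\big(T (d_A + d_B)^2 + d_A^3 + d_B^3\big)$ for the standard approach of forming the covariance matrices and decomposing them.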
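For reference, a standard formulation of the first canonical correlation (not necessarily the exact one CVX implements), where $\Sigma_{AA}$ and $\Sigma_{BB}$ are the covariance matrices of the centered trajectories in each space and $\Sigma_{AB}$ is their cross-covariance:

$$\rho_1 = \max_{\mathbf{w}_A,\, \mathbf{w}_B} \frac{\mathbf{w}_A^\top \Sigma_{AB}\, \mathbf{w}_B}{\sqrt{\mathbf{w}_A^\top \Sigma_{AA}\, \mathbf{w}_A}\,\sqrt{\mathbf{w}_B^\top \Sigma_{BB}\, \mathbf{w}_B}}$$

Equivalently, if $\Sigma_{AA}^{-1/2}\,\Sigma_{AB}\,\Sigma_{BB}^{-1/2} = U S V^\top$, the canonical correlations are the diagonal entries of $S$ (at most $\min(d_A, d_B)$ of them are nonzero), and keeping the first $m$ columns of $\Sigma_{AA}^{-1/2} U$ and $\Sigma_{BB}^{-1/2} V$ gives the projection matrices $W_A$ and $W_B$. The covariance matrices must be invertible, which requires more timestamps than dimensions; in practice they are often regularized as $\Sigma + \lambda I$.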
Choosing an Alignment Method
| Method | Different dims? | What it measures | Best for |
|---|---|---|---|
| Structural | Yes | Neighborhood consistency | “Do the same entities cluster together in both spaces?” |
| Behavioral | Yes | Change correlation | “Do the spaces react to the same events?” |
| Procrustes | No* | Geometric fit | Model version alignment, same-dim cross-modal |
| CCA | Yes | Subspace correlation | Finding shared structure across heterogeneous spaces |
*Requires same dimensionality or prior projection.
Temporal Resampling
Embeddings from different sources update at different frequencies. Text embeddings might update daily, image embeddings hourly. To analyze cross-space alignment, the timelines need to be brought to a common temporal scale.
Interpolation Methods
| Method | How it works | Best for |
|---|---|---|
| LastValue | Zero-order hold (use the last known value) | Sparse updates, no assumptions |
| Linear | Linear interpolation between known points | Short gaps, general purpose |
| Slerp | Spherical linear interpolation | Cosine-metric spaces (preserves unit sphere geometry) |
| NeuralOde | Use trained Neural ODE for continuous interpolation | Most accurate, but expensive (requires Layer 10) |
For downsampling (multiple values in one bin), three aggregation strategies are available: Last (most recent value), Mean (average), and MostRecent (highest confidence).
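A minimal sketch of upsampling onto a common time grid with the LastValue and Linear methods from the table. The Interp enum and resample function are hypothetical; samples are assumed sorted by timestamp, grid points before the first sample yield None, and Linear holds the last value past the final sample rather than extrapolating.

```rust
#[derive(Clone, Copy, Debug)]
enum Interp {
    LastValue, // zero-order hold
    Linear,    // linear interpolation between the two surrounding samples
}

/// Resample (timestamp, vector) samples, sorted by timestamp, onto `grid`.
fn resample(samples: &[(i64, Vec<f64>)], grid: &[i64], method: Interp) -> Vec<Option<Vec<f64>>> {
    grid.iter()
        .map(|&t| {
            // Number of samples at or before t; the last of them is the left neighbor.
            let idx = samples.partition_point(|(ts, _)| *ts <= t);
            if idx == 0 {
                return None; // grid point precedes all data
            }
            let (t0, v0) = &samples[idx - 1];
            match method {
                Interp::LastValue => Some(v0.clone()),
                Interp::Linear => {
                    // Exact hit, or past the last sample: hold the known value.
                    if *t0 == t || idx == samples.len() {
                        return Some(v0.clone());
                    }
                    let (t1, v1) = &samples[idx];
                    let alpha = (t - t0) as f64 / (t1 - t0) as f64;
                    Some(v0.iter().zip(v1).map(|(a, b)| a + alpha * (b - a)).collect())
                }
            }
        })
        .collect()
}
```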
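Slerp deserves special attention for unit-normalized embeddings. Unlike linear interpolation, which can produce vectors with $\lVert \mathbf{v} \rVert < 1$, Slerp interpolates along the great circle on the unit sphere:
$$\operatorname{slerp}(\mathbf{v}_0, \mathbf{v}_1; \tau) = \frac{\sin\big((1 - \tau)\,\Omega\big)}{\sin \Omega}\,\mathbf{v}_0 + \frac{\sin(\tau\,\Omega)}{\sin \Omega}\,\mathbf{v}_1, \qquad \tau \in [0, 1]$$
where $\Omega = \arccos(\mathbf{v}_0 \cdot \mathbf{v}_1)$ is the angle between the two unit vectors.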
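A direct transcription of the Slerp formula, as a sketch: it assumes both inputs are unit vectors and, when they are nearly parallel (so that sin Ω ≈ 0, with Ω the angle between them), falls back to normalized linear interpolation. Exactly antiparallel inputs have no unique great circle and are not handled.

```rust
/// Spherical linear interpolation between unit vectors v0 and v1, tau in [0, 1].
fn slerp(v0: &[f64], v1: &[f64], tau: f64) -> Vec<f64> {
    let dot = v0.iter().zip(v1).map(|(a, b)| a * b).sum::<f64>().clamp(-1.0, 1.0);
    let omega = dot.acos(); // angle between the two unit vectors
    let sin_omega = omega.sin();
    if sin_omega < 1e-6 {
        // Nearly parallel: normalized linear interpolation is accurate here.
        // (Antiparallel inputs would give a near-zero vector; not handled.)
        let lerp: Vec<f64> = v0.iter().zip(v1).map(|(a, b)| (1.0 - tau) * a + tau * b).collect();
        let norm = lerp.iter().map(|x| x * x).sum::<f64>().sqrt();
        return lerp.into_iter().map(|x| x / norm).collect();
    }
    let w0 = ((1.0 - tau) * omega).sin() / sin_omega;
    let w1 = (tau * omega).sin() / sin_omega;
    v0.iter().zip(v1).map(|(a, b)| w0 * a + w1 * b).collect()
}
```

For example, halfway between [1, 0] and [0, 1], linear interpolation gives [0.5, 0.5] with norm about 0.707, whereas Slerp gives [√2/2, √2/2], which stays on the unit circle.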
Multi-Scale Drift Analysis
Drift can appear different at different temporal scales. Daily noise might mask weekly trends, or amplify them. Multi-scale analysis examines drift at multiple granularities simultaneously.
Analysis Protocol
- Resample the trajectory to each target scale (hourly, daily, weekly, monthly)
- Compute the drift time series at each scale
- Detect change points at each scale
- Compare results across scales
At each scale, a ScaleDriftReport provides:
- Mean drift rate and variance
- Trend: accelerating, decelerating, stable, or oscillating
- Change points detected at that scale
- Signal-to-noise ratio (SNR): drift signal vs measurement noise
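The first two protocol steps, together with the mean and variance fields of the report, can be sketched as follows. This is hypothetical: timestamps are assumed to be Unix seconds sorted in ascending order, downsampling uses the Last aggregation (the latest vector in each bin wins), drift is Euclidean, and statistics are taken per step between consecutive non-empty bins. ScaleStats is an illustrative stand-in for ScaleDriftReport; trend, change points, and SNR are omitted.

```rust
use std::collections::BTreeMap;

/// Mean and variance of per-step drift at one temporal scale.
struct ScaleStats {
    bin_seconds: i64,
    steps: usize,
    mean_drift: f64,
    drift_variance: f64,
}

fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Downsample with Last aggregation: later samples in a bin overwrite earlier ones.
fn downsample_last(samples: &[(i64, Vec<f64>)], bin_seconds: i64) -> Vec<Vec<f64>> {
    let mut bins: BTreeMap<i64, &Vec<f64>> = BTreeMap::new();
    for (ts, v) in samples {
        bins.insert(ts.div_euclid(bin_seconds), v);
    }
    bins.into_values().cloned().collect() // bins come out in time order
}

fn scale_stats(samples: &[(i64, Vec<f64>)], bin_seconds: i64) -> ScaleStats {
    let series = downsample_last(samples, bin_seconds);
    let drifts: Vec<f64> = series.windows(2).map(|w| euclidean(&w[0], &w[1])).collect();
    let n = drifts.len();
    let (mean, var) = if n == 0 {
        (0.0, 0.0)
    } else {
        let m = drifts.iter().sum::<f64>() / n as f64;
        (m, drifts.iter().map(|d| (d - m).powi(2)).sum::<f64>() / n as f64)
    };
    ScaleStats { bin_seconds, steps: n, mean_drift: mean, drift_variance: var }
}

/// Run the same analysis at several scales, e.g. [3600, 86_400, 604_800].
fn multiscale_stats(samples: &[(i64, Vec<f64>)], scales: &[i64]) -> Vec<ScaleStats> {
    scales.iter().map(|&s| scale_stats(samples, s)).collect()
}
```

On a 1-D series that flips between 0 and 1 every hour for two days, the hourly scale reports a drift of 1 at every step, while the daily scale reports a mean of only 0.5, because most of the hourly back-and-forth is invisible at that granularity.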
Cross-Scale Coherence
The key insight: change points that persist across multiple scales are high-confidence. Change points that appear only at fine scales are likely noise.
A RobustChangePoint is a change point detected at multiple scales:
```json
{
  "timestamp": 1650000000,
  "severity": 0.85,
  "scale_count": 3,
  "scales_detected": ["hourly", "daily", "weekly"]
}
```
Cross-scale coherence also identifies the optimal analysis scale: the temporal granularity at which the signal-to-noise ratio is maximized. This tells the user, for example: “For this entity, weekly analysis gives you the clearest signal.”
The fine-to-coarse correlation measures whether drift patterns at fine scales predict drift at coarser scales. High correlation suggests a consistent underlying process; low correlation suggests scale-dependent dynamics.
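A simplified sketch of how per-scale change points could be merged into RobustChangePoint-style records. The grouping rule is an assumption: detections are sorted by time and greedily grouped while they fall within a fixed tolerance window of the group's first detection; a group is kept if it spans at least min_scales distinct scales (mirroring the min_scales parameter of the robust-changepoints endpoint), and its timestamp is the earliest detection. Severity is omitted.

```rust
#[derive(Debug, PartialEq)]
struct RobustChangePoint {
    timestamp: i64,
    scale_count: usize,
    scales_detected: Vec<String>,
}

/// per_scale: (scale name, change-point timestamps detected at that scale).
fn robust_change_points(
    per_scale: &[(&str, Vec<i64>)],
    tolerance: i64,
    min_scales: usize,
) -> Vec<RobustChangePoint> {
    // Flatten to (timestamp, scale) events in time order.
    let mut events: Vec<(i64, &str)> = per_scale
        .iter()
        .flat_map(|(scale, ts)| ts.iter().map(move |&t| (t, *scale)))
        .collect();
    events.sort();

    let mut out = Vec::new();
    let mut i = 0;
    while i < events.len() {
        let start = events[i].0;
        let mut scales: Vec<String> = Vec::new();
        let mut j = i;
        // Greedy window: absorb every event within `tolerance` of the first one.
        while j < events.len() && events[j].0 - start <= tolerance {
            let s = events[j].1.to_string();
            if !scales.contains(&s) {
                scales.push(s);
            }
            j += 1;
        }
        if scales.len() >= min_scales {
            out.push(RobustChangePoint { timestamp: start, scale_count: scales.len(), scales_detected: scales });
        }
        i = j;
    }
    out
}
```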
API Endpoints
Space Management
| Endpoint | Method | Description |
|---|---|---|
| /v1/spaces | POST | Register a new embedding space |
| /v1/spaces | GET | List all registered spaces |
| /v1/spaces/{name} | GET | Get space details |
Cross-Space Alignment
| Endpoint | Method | Description |
|---|---|---|
| /v1/alignment/entities/{id} | GET | Alignment score between two spaces for an entity |
| /v1/alignment/cohort | POST | Alignment analysis for multiple entities |
| /v1/alignment/entities/{id}/cross-prediction | GET | Predict evolution in one space from another |
Parameters for alignment queries include space_a, space_b, method (structural, behavioral, procrustes, cca), time range, and resampling frequency.
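As an illustration of these query parameters, a hypothetical client-side helper that builds an alignment request URL. The base URL, the u64 entity ID, the example space name "image-clip-512", and the use of space names as space_a/space_b values are assumptions. The time-range and resampling-frequency parameters are omitted because their names are not specified above, and values are assumed to be URL-safe, so no percent-encoding is done.

```rust
/// Values accepted by the `method` query parameter.
#[derive(Clone, Copy, Debug)]
enum AlignmentMethod {
    Structural,
    Behavioral,
    Procrustes,
    Cca,
}

impl AlignmentMethod {
    fn as_param(self) -> &'static str {
        match self {
            AlignmentMethod::Structural => "structural",
            AlignmentMethod::Behavioral => "behavioral",
            AlignmentMethod::Procrustes => "procrustes",
            AlignmentMethod::Cca => "cca",
        }
    }
}

/// Builds {base}/v1/alignment/entities/{id}?space_a=..&space_b=..&method=..
fn alignment_url(base: &str, entity_id: u64, space_a: &str, space_b: &str, method: AlignmentMethod) -> String {
    format!(
        "{}/v1/alignment/entities/{}?space_a={}&space_b={}&method={}",
        base.trim_end_matches('/'),
        entity_id,
        space_a,
        space_b,
        method.as_param()
    )
}
```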
Multi-Scale Analysis
| Endpoint | Method | Description |
|---|---|---|
| /v1/multiscale/entities/{id}/drift | GET | Drift analysis at multiple temporal scales |
| /v1/multiscale/entities/{id}/robust-changepoints | GET | Change points that persist across scales |
The robust-changepoints endpoint accepts a min_scales parameter (default: 2) that controls the minimum number of scales at which a change point must appear to be considered robust.
Performance Targets
| Operation | Target |
|---|---|
| Space registration | < 1ms |
| Behavioral alignment (2 spaces, 1K timestamps) | < 50ms |
| Structural alignment (2 spaces, 1K timestamps, ) | < 500ms |
| Procrustes alignment (, 1K timestamps) | < 200ms |
| CCA (, , 1K timestamps) | < 1s |
| Temporal resampling (10K to 1K points) | < 10ms |
| Multi-scale drift (3 scales, 10K points) | < 500ms |