
Quick Start

pip install chronos-vector

Standard vector databases store embeddings as static points. CVX stores them as trajectories — ordered sequences of embeddings for the same entity over time.

Each point in CVX has three components:

| Component | Type | Meaning |
| --- | --- | --- |
| `entity_id` | `u64` | Who: the entity this vector belongs to (user, document, episode) |
| `timestamp` | `i64` | When: Unix timestamp (seconds) |
| `vector` | `[f32]` | What: the embedding at this moment in time |

This enables questions that static stores cannot answer: “How is this entity changing?”, “When did its behavior shift?”, “What trajectory shape does it follow?”

We create three entities with distinct temporal behaviors — each models a real phenomenon:

| Entity | Process | Models |
| --- | --- | --- |
| 0 | Ornstein-Uhlenbeck | Stable concept with slow evolution |
| 1 | Regime shift (onset at day 40) | Disease onset, crisis event |
| 2 | Periodic oscillation (14-day) | Circadian/weekly patterns |

import chronos_vector as cvx
import numpy as np

np.random.seed(42)
D = 32
index = cvx.TemporalIndex(m=16, ef_construction=200, ef_search=50)

# Entity 0: Ornstein-Uhlenbeck, mean-reverting with slow drift
theta = 0.1
ou_mean, ou_state = np.zeros(D, dtype=np.float32), np.zeros(D, dtype=np.float32)
for t in range(100):
    ou_mean += np.sin(np.arange(D) * 0.05 + t * 0.02).astype(np.float32) * 0.02
    ou_state += theta * (ou_mean - ou_state) + np.random.randn(D).astype(np.float32) * 0.05
    index.insert(0, t * 86400, ou_state.tolist())

# Entity 1: regime shift, gradual onset after day 40
for t in range(100):
    if t < 40:
        vec = np.ones(D, dtype=np.float32) * 0.3 + np.random.randn(D).astype(np.float32) * 0.03
    else:
        severity = (t - 40) / 60.0
        vec = np.ones(D, dtype=np.float32) * (0.3 - severity * 0.8) + \
            np.random.randn(D).astype(np.float32) * (0.03 + severity * 0.1)
    index.insert(1, t * 86400, vec.tolist())

# Entity 2: periodic oscillation, 14-day cycle
for t in range(100):
    phase = 2 * np.pi * t / 14
    vec = np.sin(np.arange(D) * 0.3 + phase).astype(np.float32) * 0.3 + \
        np.random.randn(D).astype(np.float32) * 0.02
    index.insert(2, t * 86400, vec.tolist())

print(f"{len(index)} points inserted")
300 points inserted

The raw signals show the three distinct behaviors: the OU process is bounded and smooth, the onset entity starts drifting after day 40 with growing noise, and the periodic entity oscillates with a 14-day cycle.

The HNSW index parameters control the speed/accuracy trade-off:

| Parameter | Effect | Typical value |
| --- | --- | --- |
| `m` | Connections per node: higher = better recall, more memory | 16 |
| `ef_construction` | Search width during build: higher = better graph, slower build | 100-200 |
| `ef_search` | Search width during query: higher = better recall, slower query | 50-200 |

CVX search combines semantic distance (cosine/L2 between vectors) with temporal distance (how far apart in time), controlled by the $\alpha$ parameter:

$$d_{ST} = \alpha \cdot d_{\text{semantic}} + (1 - \alpha) \cdot d_{\text{temporal}}$$

  • $\alpha = 1.0$: pure semantic (ignore time)
  • $\alpha = 0.5$: balanced, prefer recent AND similar
  • $\alpha = 0.0$: pure temporal (ignore content)

query = np.random.randn(32).astype(np.float32).tolist()
results = index.search(query, k=5, alpha=1.0)
for entity_id, timestamp, score in results:
    print(f" entity={entity_id}, day={timestamp//86400}, score={score:.4f}")
entity=1, day=981, score=27.2262
entity=1, day=458, score=28.8013
entity=7, day=104, score=30.5205
entity=2, day=122, score=30.8634
entity=5, day=199, score=31.3905
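
The blended distance $d_{ST}$ defined above can be illustrated with a few lines of plain Python. This is only a sketch, not CVX's implementation: it assumes cosine distance for the semantic term and an absolute time gap divided by a hypothetical `time_scale` (in seconds) for the temporal term. The function names and the `time_scale` parameter are illustrative and do not come from the library.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def spatiotemporal_distance(query_vec, query_ts, vec, ts, alpha, time_scale):
    """alpha * d_semantic + (1 - alpha) * d_temporal.

    time_scale is a hypothetical normalizer (seconds) that puts the time gap
    on a scale comparable to the semantic distance.
    """
    d_semantic = cosine_distance(query_vec, vec)
    d_temporal = abs(query_ts - ts) / time_scale
    return alpha * d_semantic + (1.0 - alpha) * d_temporal

DAY = 86400
query, candidate = [1.0, 0.0], [0.0, 1.0]  # orthogonal, so d_semantic = 1.0
for alpha in (1.0, 0.5, 0.0):
    d = spatiotemporal_distance(query, 10 * DAY, candidate, 5 * DAY,
                                alpha, time_scale=10 * DAY)
    print(f"alpha={alpha}: d_ST={d:.2f}")
```

With `alpha=1.0` only the angle between the vectors matters. With `alpha=0.0` two points five days apart score the same regardless of their content.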

A trajectory is the ordered sequence of all embeddings for a single entity:

traj = index.trajectory(entity_id=0)
print(f"Entity 0 has {len(traj)} points")
Entity 0 has 118 points
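
To make the definition concrete, here is a small sketch of how raw `(entity_id, timestamp, vector)` points could be grouped into trajectories. It is not how CVX stores data internally. The `(timestamp, vector)` pair layout is an assumption, and `build_trajectories` is a hypothetical helper written for illustration.

```python
from collections import defaultdict

def build_trajectories(points):
    """Group (entity_id, timestamp, vector) triples by entity, sorted by time."""
    by_entity = defaultdict(list)
    for entity_id, timestamp, vector in points:
        by_entity[entity_id].append((timestamp, vector))
    for trajectory in by_entity.values():
        trajectory.sort(key=lambda item: item[0])  # order by timestamp
    return dict(by_entity)

# Points may arrive out of order and interleaved across entities.
points = [
    (0, 172800, [0.3, 0.1]),
    (1, 0, [0.9, 0.9]),
    (0, 0, [0.1, 0.1]),
    (0, 86400, [0.2, 0.1]),
]
trajectories = build_trajectories(points)
print([ts // 86400 for ts, _ in trajectories[0]])
```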

Velocity: $\frac{\partial \mathbf{v}}{\partial t}$


The rate of change of the embedding vector at a specific timestamp, computed via finite differences:

$$\mathbf{v}'(t) \approx \frac{\mathbf{v}(t + \Delta t) - \mathbf{v}(t - \Delta t)}{2\Delta t}$$

A high velocity indicates rapid semantic change — the entity’s representation is shifting fast.

The OU process (green) shows stable low velocity — mean reversion keeps changes bounded. The onset entity (red) shows increasing velocity after day 40 as the regime shift deepens. The periodic entity (blue) shows regular velocity oscillations matching its 14-day cycle.
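
The central difference used for velocity is easy to reproduce by hand. The sketch below assumes a trajectory is a time-ordered list of `(timestamp, vector)` pairs and uses the two neighbouring samples of an interior point. `velocity_at` and `speed` are hypothetical helpers, not part of the CVX API.

```python
import math

def velocity_at(trajectory, i):
    """Central-difference velocity at interior index i of a time-ordered trajectory."""
    if not 0 < i < len(trajectory) - 1:
        raise IndexError("central difference needs a neighbour on each side")
    (t_prev, v_prev), (t_next, v_next) = trajectory[i - 1], trajectory[i + 1]
    dt = t_next - t_prev  # equals 2 * delta_t for evenly spaced samples
    return [(b - a) / dt for a, b in zip(v_prev, v_next)]

def speed(velocity):
    """Euclidean norm of a velocity vector: a scalar rate of semantic change."""
    return math.sqrt(sum(c * c for c in velocity))

# Three samples, one time unit apart; only the first dimension moves.
trajectory = [(0, [0.0, 1.0]), (1, [1.0, 1.0]), (2, [4.0, 1.0])]
v = velocity_at(trajectory, 1)
print(v, speed(v))
```

Timestamps in the index are in seconds, so a velocity computed this way is per second. Multiply by 86400 for a per-day rate.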

The Hurst exponent $H$ characterizes the long-term memory of a trajectory:

$$H \in [0, 1]$$

| Value | Meaning | Implication |
| --- | --- | --- |
| $H > 0.5$ | Persistent (trending) | Past direction predicts future direction |
| $H = 0.5$ | Random walk | No long-term memory |
| $H < 0.5$ | Anti-persistent (mean-reverting) | Past direction predicts opposite future |

Computed via rescaled range analysis: $\mathbb{E}[R(n)/S(n)] \propto n^H$.

h = cvx.hurst_exponent(traj)
print(f"Hurst exponent: {h:.4f}")
Hurst exponent: 0.7074
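
For intuition, rescaled range analysis can be sketched on a scalar series in plain Python. How CVX reduces a vector trajectory to a scalar series is not stated here, so this example works directly on a list of numbers (for instance, step-to-step increments). It estimates $H$ as the slope of $\log(R/S)$ against $\log n$ over dyadic chunk sizes. `rescaled_range` and `hurst_rs` are illustrative names, not library functions.

```python
import math
import random

def rescaled_range(chunk):
    """R/S statistic of one chunk, or None if the chunk is constant."""
    n = len(chunk)
    mean = sum(chunk) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    if std == 0.0:
        return None
    cumulative, lowest, highest = 0.0, 0.0, 0.0
    for x in chunk:
        cumulative += x - mean          # running sum of deviations from the mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    return (highest - lowest) / std     # range R divided by standard deviation S

def hurst_rs(series, min_chunk=8):
    """Slope of log(mean R/S) versus log(chunk size), fitted by least squares."""
    log_n, log_rs = [], []
    size = min_chunk
    while size <= len(series) // 2:
        values = []
        for start in range(0, len(series) - size + 1, size):
            rs = rescaled_range(series[start:start + size])
            if rs is not None:
                values.append(rs)
        if values:
            log_n.append(math.log(size))
            log_rs.append(math.log(sum(values) / len(values)))
        size *= 2
    if len(log_n) < 2:
        raise ValueError("series too short for a slope estimate")
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_n)
    return num / den

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(256)]
print(f"white noise: H ~ {hurst_rs(noise):.2f}")
print(f"alternating: H ~ {hurst_rs(alternating):.2f}")
```

A strictly alternating series is the extreme anti-persistent case: its R/S does not grow with chunk size, so the fitted slope is zero. Uncorrelated noise lands near 0.5, with a known upward bias on short series.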

PELT (Pruned Exact Linear Time, Killick et al. 2012) detects structural breaks — moments where the statistical properties of the trajectory change abruptly.

The algorithm minimizes a penalized cost:

$$\sum_{i=1}^{m+1} \left[ \mathcal{C}(\mathbf{y}_{\tau_{i-1}+1:\tau_i}) + \beta \right]$$

where $\mathcal{C}$ is the segment cost and $\beta$ is the penalty per change point. A higher $\beta$ yields fewer, more significant change points.

traj1 = index.trajectory(entity_id=1)
cps = cvx.detect_changepoints(1, traj1, min_segment_len=5)
print(f"{len(cps)} change point(s) detected")
1 change point(s) detected
day=50, severity=0.963

PELT places the regime change at day 50, ten days after the gradual onset begins at day 40, as the entity's embedding moves from positive toward negative values.
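
The penalized cost and the pruning step can be seen in a compact PELT sketch for a scalar series. This is not CVX's implementation. It assumes the segment cost $\mathcal{C}$ is the sum of squared deviations from the segment mean (a change-in-mean model), and `pelt_mean_shift` is a hypothetical name. Each returned index is the first sample of a new segment.

```python
def pelt_mean_shift(y, beta):
    """Change points of a 1-D series under a squared-error segment cost."""
    n = len(y)
    # Prefix sums give O(1) segment costs.
    s1, s2 = [0.0], [0.0]
    for value in y:
        s1.append(s1[-1] + value)
        s2.append(s2[-1] + value * value)

    def cost(a, b):
        """Sum of squared deviations from the mean over y[a:b]."""
        seg_sum = s1[b] - s1[a]
        return (s2[b] - s2[a]) - seg_sum * seg_sum / (b - a)

    best = [0.0] * (n + 1)   # best[t]: minimal penalized cost of y[:t]
    best[0] = -beta          # so the first segment carries no penalty
    prev = [0] * (n + 1)     # prev[t]: start of the last segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        seg_costs = {tau: best[tau] + cost(tau, t) for tau in candidates}
        tau_star = min(seg_costs, key=seg_costs.get)
        best[t] = seg_costs[tau_star] + beta
        prev[t] = tau_star
        # Pruning: a tau that is already worse than best[t] without the
        # penalty can never become optimal for a later t.
        candidates = [tau for tau in candidates if seg_costs[tau] <= best[t]]
        candidates.append(t)

    changepoints = []
    t = n
    while prev[t] > 0:       # walk back through the optimal segmentation
        t = prev[t]
        changepoints.append(t)
    return changepoints[::-1]

signal = [0.0] * 20 + [5.0] * 20
print(pelt_mean_shift(signal, beta=1.0))     # one break, at index 20
print(pelt_mean_shift(signal, beta=1000.0))  # penalty too high: no breaks
```

Raising `beta` from 1 to 1000 makes a single break more expensive than fitting one mean to the whole series, which is the "fewer, more significant change points" effect described above.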

Drift quantifies the displacement between two vectors:

l2, cosine, top_dims = cvx.drift(traj[0][1], traj[-1][1], top_n=3)
print(f"L2 magnitude: {l2:.4f}")
print(f"Cosine drift: {cosine:.4f}")
L2 magnitude: 7.6693
Cosine drift: 0.9442
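
The quantities reported by the drift call can be approximated by hand. The sketch below assumes "L2 magnitude" is the Euclidean norm of the difference, "cosine drift" is one minus cosine similarity, and the top dimensions are those with the largest absolute change. These definitions are inferred from the names rather than confirmed by the library, and `drift_sketch` is a hypothetical function.

```python
import math

def drift_sketch(v_start, v_end, top_n=3):
    """Return (l2, cosine_drift, top_dims) between two equal-length vectors."""
    delta = [b - a for a, b in zip(v_start, v_end)]
    l2 = math.sqrt(sum(d * d for d in delta))

    dot = sum(a * b for a, b in zip(v_start, v_end))
    norm_start = math.sqrt(sum(a * a for a in v_start))
    norm_end = math.sqrt(sum(b * b for b in v_end))
    cosine_drift = 1.0 - dot / (norm_start * norm_end)

    # Dimensions ranked by how much they moved, largest change first.
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    top_dims = [(i, delta[i]) for i in ranked[:top_n]]
    return l2, cosine_drift, top_dims

l2, cosine, top_dims = drift_sketch([1.0, 0.0, 0.0], [0.0, 2.0, 0.0], top_n=2)
print(f"L2 magnitude: {l2:.4f}")
print(f"Cosine drift: {cosine:.4f}")
print(f"Top dimensions: {top_dims}")
```

The two measures answer different questions: L2 grows with the size of the displacement, while cosine drift only reflects a change in direction.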

HNSW’s multi-level hierarchy forms natural clusters. Nodes at higher levels are hub nodes — they act as cluster centroids. Every node at level 0 is assigned to its nearest hub via greedy descent.

regions = index.regions(level=1)
print(f"{len(regions)} regions at level 1")
assignments = index.region_assignments(level=1)
total = sum(len(m) for m in assignments.values())
print(f"{total} points assigned across {len(assignments)} regions")
17 regions at level 1
1000 points assigned across 17 regions

Three synthetic entities show distinct behaviors in embedding space:

💡 Color = time
Points are colored by time (dark = early, light = late). The smooth drift entity traces a continuous path. The regime change entity shows two distinct clusters. The noise entity fills a cloud around the origin.

index.save("my_index") # Directory: my_index/index.bin + temporal_edges.bin
loaded = cvx.TemporalIndex.load("my_index")
print(f"Loaded {len(loaded)} points")
Loaded 1000 points