Temporal Analytics

CVX provides 27+ analytical functions that treat embedding trajectories as mathematical objects — differentiable curves, stochastic processes, and topological spaces. This tutorial covers each category with theory, code, and visualizations.

Setup: Three Stochastic Processes

Each entity models a real phenomenon using well-known stochastic processes:

Entity	Process	Models	Expected behavior
0	Ornstein-Uhlenbeck	Stable concept with slow evolution	Mean-reverting, bounded drift
1	Regime shift (onset at $t=40$)	Disease onset, crisis event	Structural break, accelerating drift
2	Periodic oscillation (14-day)	Circadian/weekly patterns	Cyclical velocity, oscillating drift

import chronos_vector as cvx
import numpy as np

np.random.seed(42)
D = 32

index = cvx.TemporalIndex(m=16, ef_construction=100)

# Entity 0: Ornstein-Uhlenbeck — mean-reverting with slow drift
theta = 0.1  # mean reversion speed
ou_mean, ou_state = np.zeros(D, dtype=np.float32), np.zeros(D, dtype=np.float32)
for t in range(100):
    ou_mean += np.sin(np.arange(D) * 0.05 + t * 0.02).astype(np.float32) * 0.02
    ou_state += theta * (ou_mean - ou_state) + np.random.randn(D).astype(np.float32) * 0.05
    index.insert(0, t * 86400, ou_state.tolist())

# Entity 1: Regime shift — gradual onset after day 40
for t in range(100):
    if t < 40:
        vec = np.ones(D, dtype=np.float32) * 0.3 + np.random.randn(D).astype(np.float32) * 0.03
    else:
        severity = (t - 40) / 60.0
        vec = np.ones(D, dtype=np.float32) * (0.3 - severity * 0.8) + \
              np.random.randn(D).astype(np.float32) * (0.03 + severity * 0.1)
    index.insert(1, t * 86400, vec.tolist())

# Entity 2: Periodic oscillation — 14-day cycle
for t in range(100):
    phase = 2 * np.pi * t / 14
    vec = np.sin(np.arange(D) * 0.3 + phase).astype(np.float32) * 0.3 + \
          np.random.randn(D).astype(np.float32) * 0.02
    index.insert(2, t * 86400, vec.tolist())

traj0, traj1, traj2 = index.trajectory(0), index.trajectory(1), index.trajectory(2)

The raw signals — vector norm and first 3 dimensions over time — show the structure that CVX analytics will detect:

Top row: Vector norm over time. The OU process is bounded (~0.3-0.5), the onset entity drops sharply after day 40, and the periodic entity oscillates regularly. Bottom row: First 3 dimensions showing the underlying dynamics — mean reversion, regime shift, and sinusoidal oscillation.

1. Vector Differential Calculus

Drift: Displacement Between States

Drift measures the total displacement between two embedding vectors, decomposed into L2 magnitude, cosine distance, and per-dimension contributions:

$\text{drift}_{L2} = \|\mathbf{v}(t_2) - \mathbf{v}(t_1)\|_2 \qquad \text{drift}_{\cos} = 1 - \frac{\mathbf{v}(t_1) \cdot \mathbf{v}(t_2)}{\|\mathbf{v}(t_1)\| \cdot \|\mathbf{v}(t_2)\|}$

l2, cosine, top_dims = cvx.drift(traj1[0][1], traj1[-1][1], top_n=3)
print(f"Regime entity drift: L2={l2:.3f}, cosine={cosine:.4f}")

Regime entity drift: L2=5.627, cosine=1.9959

The onset entity (red) shows accelerating drift after day 40 as the regime shift deepens. The OU process (green) drifts slowly but is bounded by mean reversion. The periodic entity (blue) oscillates — drift increases and decreases with each cycle.
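The two drift formulas can be checked by hand. The sketch below is a dependency-free reference, not CVX's implementation. The helper name `drift_reference` and the ranking of dimensions by absolute change are assumptions made for illustration.

```python
import math

def drift_reference(v1, v2, top_n=3):
    """Return (L2 drift, cosine drift, top changing dimensions).

    Hypothetical helper: mirrors the formulas in this section, not cvx.drift itself.
    """
    diff = [b - a for a, b in zip(v1, v2)]
    l2 = math.sqrt(sum(d * d for d in diff))

    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    cosine = 1.0 - dot / (norm1 * norm2)  # undefined if either vector is all zeros

    # Assumption: rank dimensions by the absolute per-dimension change.
    ranked = sorted(range(len(diff)), key=lambda i: abs(diff[i]), reverse=True)
    top_dims = [(i, abs(diff[i])) for i in ranked[:top_n]]
    return l2, cosine, top_dims

l2, cosine, top_dims = drift_reference([1.0, 0.0, 0.0], [0.0, 2.0, 0.0], top_n=2)
print(f"L2={l2:.3f}, cosine={cosine:.3f}, top dims={top_dims}")
# L2=2.236, cosine=1.000, top dims=[(1, 2.0), (0, 1.0)]
```

Cosine drift is 0 for parallel vectors, 1 for orthogonal ones, and 2 for opposite ones. The regime entity's value of 1.9959 therefore means its last vector points almost exactly opposite to its first, consistent with its components moving from about 0.3 to negative values.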

Velocity: Instantaneous Rate of Change

$\mathbf{v}'(t) \approx \frac{\mathbf{v}(t + \Delta t) - \mathbf{v}(t - \Delta t)}{2\Delta t}$

The magnitude $\|\mathbf{v}'(t)\|$ indicates how fast the entity’s semantic representation is shifting. See the Quick Start velocity plot for a comparison across entity types.
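As a rough illustration of the central difference, the sketch below computes it on a toy trajectory. It assumes a trajectory is a list of `(timestamp, vector)` pairs with timestamps in seconds, as the indexing in this tutorial suggests. `central_velocity` is a hypothetical helper, not a CVX function.

```python
import math

def central_velocity(traj, i):
    """Central-difference velocity at interior index i of [(timestamp, vector), ...]."""
    if not 0 < i < len(traj) - 1:
        raise IndexError("central difference needs a neighbour on each side")
    (t_prev, v_prev), (t_next, v_next) = traj[i - 1], traj[i + 1]
    dt = t_next - t_prev  # equals 2 * delta_t for evenly spaced samples
    return [(b - a) / dt for a, b in zip(v_prev, v_next)]

# Toy trajectory sampled daily: v(d) = (d, d^2) with d in days.
DAY = 86400
traj = [(d * DAY, [float(d), float(d * d)]) for d in range(5)]
vel = central_velocity(traj, 2)                      # units: per second
speed_per_day = math.sqrt(sum(c * c for c in vel)) * DAY
print([round(c * DAY, 6) for c in vel], round(speed_per_day, 3))
```

At day 2 the exact derivative of `(d, d^2)` is `(1, 4)` per day, and the central difference recovers it because the scheme is exact for quadratics. Velocities come out per second here, so multiply by 86400 to read them per day.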

Hurst Exponent: Persistence vs Mean-Reversion

The rescaled range statistic:

$\mathbb{E}\left[\frac{R(n)}{S(n)}\right] \sim c \cdot n^H \quad \text{as } n \to \infty$

where $R(n)$ is the range of cumulative deviations and $S(n)$ is the standard deviation over window $n$.

$H$ value	Behavior	Example
$H \approx 0.5$	Random walk	Gaussian noise trajectory
$H > 0.5$	Persistent — trends continue	Gradual semantic drift, disease progression
$H < 0.5$	Anti-persistent — trends reverse	Mean-reverting oscillation

for traj, name in [(traj0, "OU"), (traj1, "Regime"), (traj2, "Periodic")]:
    print(f"  {name}: H={cvx.hurst_exponent(traj):.4f}")

  OU: H=0.5592
  Regime: H=0.7224
  Periodic: H=0.7227
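To make the rescaled-range definition concrete, here is a minimal classical R/S estimator for a one-dimensional series. This tutorial does not say how CVX reduces a $D$-dimensional trajectory to a scalar series, so the sketch takes a plain list of numbers. The function names and window sizes are assumptions.

```python
import math
import random

def rescaled_range(x):
    """R/S of one window: range of cumulative deviations divided by the std."""
    n = len(x)
    mean = sum(x) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for value in x:
        cum += value - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return (hi - lo) / std if std > 0 else None

def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate H as the least-squares slope of log mean(R/S) against log n."""
    xs, ys = [], []
    for n in window_sizes:
        stats = []
        for start in range(0, len(series) - n + 1, n):  # non-overlapping windows
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                stats.append(rs)
        if stats:
            xs.append(math.log(n))
            ys.append(math.log(sum(stats) / len(stats)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]  # uncorrelated values
walk, total = [], 0.0
for e in noise:                                        # strongly persistent values
    total += e
    walk.append(total)

h_noise, h_walk = hurst_rs(noise), hurst_rs(walk)
print(f"white noise: H={h_noise:.2f}, random-walk values: H={h_walk:.2f}")
```

R/S estimates on short windows are biased upward, so uncorrelated noise usually lands somewhat above 0.5 rather than exactly on it. Keep this in mind when reading values close to 0.5.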

2. Change Point Detection (PELT)

PELT (Pruned Exact Linear Time) minimizes a penalized cost function over all possible segmentations:

$\min_{\tau_1, \ldots, \tau_m} \sum_{i=1}^{m+1} \left[ \mathcal{C}(\mathbf{y}_{\tau_{i-1}+1:\tau_i}) + \beta \right]$

$\mathcal{C}$: segment cost (Gaussian negative log-likelihood on L2 distances between consecutive vectors)
$\beta$: penalty per change point (default: BIC = $\frac{D \cdot \ln(n)}{2}$)

cps = cvx.detect_changepoints(1, traj1, min_segment_len=5)
for ts, severity in cps:
    print(f"  Day {ts // 86400}: severity={severity:.3f}")

  Day 50: severity=0.963

💡 BIC penalty for high dimensions
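For $D > 100$, the default BIC penalty $\frac{D \cdot \ln(n)}{2}$ grows linearly with $D$ and may be too conservative, so real change points can be missed. For example, $D = 128$ and $n = 100$ give a penalty of about 295, against about 14 for penalty = 3 * np.log(n). The smaller fixed penalty makes detection more sensitive; raise it again if false positives appear.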
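The penalized objective can be illustrated with optimal partitioning, the unpruned dynamic program that PELT speeds up. This sketch is not CVX's implementation. It assumes a one-dimensional series and a squared-error segment cost (a Gaussian mean-change model with fixed variance), and the function names are hypothetical.

```python
import math

def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean of y[start:end]."""
    n = end - start
    s = prefix[end] - prefix[start]
    sq = prefix_sq[end] - prefix_sq[start]
    return sq - s * s / n

def optimal_partition(y, penalty, min_segment_len=1):
    """Change point indices minimizing total segment cost + penalty per change point."""
    n = len(y)
    prefix, prefix_sq = [0.0], [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)
        prefix_sq.append(prefix_sq[-1] + value * value)

    best = [math.inf] * (n + 1)  # best[t]: optimal penalized cost of y[:t]
    best[0] = -penalty           # so that the first segment pays no penalty
    last = [0] * (n + 1)         # last[t]: start index of the final segment of y[:t]
    for end in range(min_segment_len, n + 1):
        for start in range(0, end - min_segment_len + 1):
            if best[start] == math.inf:
                continue         # y[:start] cannot be split into valid segments
            cost = best[start] + segment_cost(prefix, prefix_sq, start, end) + penalty
            if cost < best[end]:
                best[end], last[end] = cost, start

    changepoints, t = [], n
    while t > 0:                 # walk back through the stored segment starts
        t = last[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)

# A level shift at index 20 with a small deterministic wiggle.
y = [0.1 * ((i % 3) - 1) for i in range(20)] + [5.0 + 0.1 * ((i % 3) - 1) for i in range(20)]
print(optimal_partition(y, penalty=3 * math.log(len(y)), min_segment_len=5))
# [20]
```

The double loop costs $O(n^2)$. PELT reaches the same optimum while discarding candidate segment starts that can no longer win, which is where the near-linear running time comes from.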

3. Path Signatures

From rough path theory (Lyons 1998), the truncated path signature is a universal, reparametrization-invariant descriptor of a trajectory’s shape.

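For a path $\mathbf{X}: [0,T] \to \mathbb{R}^D$, the level-$k$ terms of the signature are the iterated integrals below. The depth-$k$ truncated signature collects all terms up to level $k$: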

$S(\mathbf{X})^{i_1, \ldots, i_k} = \int_0^T \int_0^{t_k} \cdots \int_0^{t_2} dX^{i_1}_{t_1} \cdots dX^{i_k}_{t_k}$

Depth	Features	Captures
1	$D$	Net displacement per dimension
2	$D + D^2$	Displacement + signed area (rotation, order)
3	$D + D^2 + D^3$	Higher-order interactions

sig0 = cvx.path_signature(traj0, depth=2)
sig1 = cvx.path_signature(traj1, depth=2)
print(f"Signature dimension: {len(sig0)} (D={D}, depth=2)")

print(f"  OU vs regime: {cvx.signature_distance(sig0, sig1):.3f}")

Signature dimension: 1056 (D=32, depth=2)
  OU vs regime: 18.148

The distance matrix shows that the OU and regime trajectories have the most different shapes: the regime shift after day 40 produces a very different signature from the slow, mean-reverting drift of the OU process.
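For intuition about what the depth-2 terms measure, the sketch below computes the signature of a piecewise-linear path by applying Chen's identity one segment at a time. It is a reference for the definition, not CVX's implementation. The names `signature_depth2` and `levy_area` are hypothetical.

```python
def signature_depth2(points):
    """Depth-2 signature of the piecewise-linear path through `points`.

    Returns D level-1 terms followed by D*D level-2 terms in row-major order.
    """
    dim = len(points[0])
    level1 = [0.0] * dim
    level2 = [[0.0] * dim for _ in range(dim)]
    for prev, curr in zip(points, points[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one straight segment:
        # S2 += S1 (outer) delta + 0.5 * delta (outer) delta
        for i in range(dim):
            for j in range(dim):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(dim):
            level1[i] += delta[i]
    return level1 + [x for row in level2 for x in row]

def levy_area(sig, dim, i=0, j=1):
    """Signed area swept in the (i, j) plane: half the antisymmetric level-2 part."""
    return 0.5 * (sig[dim + i * dim + j] - sig[dim + j * dim + i])

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
ccw = signature_depth2(square)
cw = signature_depth2(square[::-1])
print(len(ccw), levy_area(ccw, 2), levy_area(cw, 2))
# 6 1.0 -1.0
```

Both loops end where they start, so their depth-1 terms are all zero and cannot tell them apart. The depth-2 terms differ in sign, which is the "signed area (rotation, order)" entry in the table above.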

Fréchet Distance

The dog-walking distance — the minimum leash length needed to traverse both trajectories simultaneously, preserving order:

$d_F(\mathbf{X}, \mathbf{Y}) = \inf_{\alpha, \beta} \max_{t \in [0,1]} \|\mathbf{X}(\alpha(t)) - \mathbf{Y}(\beta(t))\|$
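For sampled trajectories, a common stand-in is the discrete Fréchet distance, which restricts the leash to sample points and is computed with the Eiter-Mannila dynamic program. The sketch below assumes that variant; the tutorial does not say which one `frechet_distance()` uses.

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two point sequences (Eiter-Mannila DP)."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: best leash covering p[:i+1], q[:j+1]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance on p, on q, or on both, whichever keeps the leash shortest.
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]

p = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
q = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(round(discrete_frechet(p, q), 3), round(discrete_frechet(p, q[::-1]), 3))
# 1.0 2.236
```

Reversing `q` leaves the set of points unchanged but more than doubles the distance, because both walkers must move forward along their own path. This order sensitivity is what separates Fréchet from set-based measures such as Hausdorff distance.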

4. Point Process Analysis

When only event timing is available (no embeddings), point process features characterize temporal patterns:

Feature	Formula	Interpretation
Burstiness	$B = \frac{\sigma_\tau - \mu_\tau}{\sigma_\tau + \mu_\tau}$	$B > 0$: bursty, $B = 0$: Poisson, $B < 0$: periodic
Memory	$M = \text{corr}(\tau_i, \tau_{i+1})$	Correlation between consecutive inter-event gaps
Entropy	$H = -\sum p_i \log p_i$	Uniformity of event distribution over time bins
Gap CV	$\text{CV} = \sigma_\tau / \mu_\tau$	Coefficient of variation of inter-event gaps

Bursty events show high burstiness and gap CV; uniform events show near-zero burstiness and high entropy.
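The formulas in the table can be sketched with the standard library alone. Here $\tau_i$ are the gaps between consecutive events, and $\mu_\tau$ and $\sigma_\tau$ are their mean and standard deviation. `event_features_reference`, the bin count, and the use of natural logarithms are assumptions, not CVX's actual choices.

```python
import math
from statistics import mean, pstdev

def event_features_reference(timestamps, n_bins=10):
    """Burstiness, memory, gap CV and entropy from sorted event timestamps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mu, sigma = mean(gaps), pstdev(gaps)
    burstiness = (sigma - mu) / (sigma + mu)
    gap_cv = sigma / mu

    # Memory: Pearson correlation between each gap and the next one.
    x, y = gaps[:-1], gaps[1:]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    scale = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    memory = cov / scale if scale > 0 else 0.0  # constant gaps: correlation undefined

    # Entropy (in nats) of event counts over equal-width time bins.
    t0, t1 = timestamps[0], timestamps[-1]
    counts = [0] * n_bins
    for t in timestamps:
        idx = min(int((t - t0) / (t1 - t0) * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(timestamps)
    entropy = -sum(c / total * math.log(c / total) for c in counts if c > 0)
    return {"burstiness": burstiness, "memory": memory, "gap_cv": gap_cv, "entropy": entropy}

regular = event_features_reference(list(range(0, 100, 5)))  # one event every 5 units
bursty = event_features_reference([0, 1, 2, 3, 100, 101, 102, 103, 200, 201, 202, 203])
for name, feats in [("regular", regular), ("bursty", bursty)]:
    print(name, {k: round(v, 3) for k, v in feats.items()})
```

Perfectly regular events have zero gap variance, so $B = -1$ and the bin entropy reaches its maximum of $\ln 10$. The clustered sequence has $B > 0$ and lower entropy.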

5. Distributional Distances

For comparing probability distributions (e.g., region membership proportions):

Fisher-Rao Distance

The geodesic distance on the statistical manifold of categorical distributions:

$d_{FR}(\mathbf{p}, \mathbf{q}) = 2 \arccos\left(\sum_i \sqrt{p_i \cdot q_i}\right)$

Bounded in $[0, \pi]$. A true metric: symmetric and satisfying the triangle inequality (unlike KL divergence).

Hellinger Distance

$d_H(\mathbf{p}, \mathbf{q}) = \frac{1}{\sqrt{2}} \sqrt{\sum_i \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}$

Bounded in $[0, 1]$. Related to Fisher-Rao: $d_H = \sqrt{2} \sin(d_{FR}/4)$, because $\sum_i \sqrt{p_i q_i} = \cos(d_{FR}/2)$ and $d_H^2 = 1 - \sum_i \sqrt{p_i q_i}$.

Wasserstein (Earth Mover’s Distance)

The optimal transport cost between distributions, respecting the geometry of the underlying space (region centroids). Unlike Hellinger/Fisher-Rao, Wasserstein accounts for how far apart categories are, not just their probabilities.

p, q = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
print(f"Fisher-Rao: {cvx.fisher_rao_distance(p, q):.4f}")
print(f"Hellinger:  {cvx.hellinger_distance(p, q):.4f}")

Fisher-Rao: 0.4510
Hellinger:  0.2228
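To see how these distances behave at their extremes, the sketch below implements the Fisher-Rao and Hellinger formulas exactly as written in this section, plus a one-dimensional Wasserstein-1 distance. The Wasserstein helper assumes categories sit at positions 0, 1, 2, ... on a line, a stand-in for the region-centroid geometry. None of these helpers are CVX functions, and CVX may normalize differently.

```python
import math

def fisher_rao(p, q):
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, bc))              # clamp rounding slightly above 1

def hellinger(p, q):
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0)

def wasserstein_1d(p, q):
    """W1 between histograms whose categories sit at 0, 1, 2, ... (assumed geometry)."""
    total, cdf_gap = 0.0, 0.0
    for a, b in zip(p, q):
        cdf_gap += a - b       # running difference of the two CDFs
        total += abs(cdf_gap)  # mass that must cross to the next category
    return total

a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
for name, other in [("a vs b", b), ("a vs c", c)]:
    print(name, round(fisher_rao(a, other), 4), hellinger(a, other), wasserstein_1d(a, other))
# a vs b 3.1416 1.0 1.0
# a vs c 3.1416 1.0 2.0
```

Fisher-Rao and Hellinger hit their upper bounds whenever the supports are disjoint, however near or far apart the categories are. Only the Wasserstein distance registers that moving mass from the first category to the third costs twice as much as moving it to the second.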

6. Persistent Homology

Topological data analysis tracks how the connectivity structure of a point cloud changes across spatial scales.

The Betti number $\beta_0(r)$ counts the number of connected components at radius $r$. As $r$ increases, components merge. The radii at which they appear and disappear form a persistence diagram.

Metric	Meaning
Total persistence	Sum of all lifetimes — overall topological complexity
Max persistence	Longest-lived feature — most robust structure
Persistence entropy	Shannon entropy of lifetimes — uniformity of scales

vecs = [v for _, v in traj0]
topo = cvx.topological_features(vecs, n_radii=20, persistence_threshold=0.1)
print(f"Total persistence: {topo['total_persistence']:.4f}")
print(f"Max persistence:   {topo['max_persistence']:.4f}")

The Betti curve shows how connected components merge as the radius grows. A sharp drop indicates well-separated clusters.
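A $\beta_0$ curve can be sketched with a union-find structure. The convention assumed here links two points once their distance is at most $r$; some libraries grow balls of radius $r$ around each point instead, which halves the scale. `betti0_curve` is a hypothetical helper, not part of CVX.

```python
import math

def betti0_curve(points, radii):
    """Number of connected components when points within distance r are linked.

    `radii` must be sorted in increasing order.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    curve, components, k = [], n, 0
    for r in radii:
        # Add every edge no longer than r; each successful union merges two components.
        while k < len(edges) and edges[k][0] <= r:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
            k += 1
        curve.append(components)
    return curve

cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(betti0_curve(cloud, [0.05, 0.2, 1.0, 8.0]))
# [6, 2, 2, 1]
```

The curve falls from 6 to 2 as soon as the radius exceeds the within-cluster spacing, then stays at 2 until the radius spans the gap between clusters. That long plateau is the signature of two well-separated clusters.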

Summary: Choosing the Right Tool

Question	Function	Output
How fast is it changing?	`velocity()`	Vector — direction + magnitude
How much has it changed?	`drift()`	Scalar + top dimensions
Is it trending or reverting?	`hurst_exponent()`	$H \in [0,1]$
When did behavior change?	`detect_changepoints()`	Timestamps + severity
What shape is the trajectory?	`path_signature()`	Fixed-size descriptor
Are two trajectories similar?	`signature_distance()`, `frechet_distance()`	Scalar distance
How do distributions differ?	`fisher_rao_distance()`, `wasserstein_drift()`	Scalar distance
Is the data clustered?	`topological_features()`	Betti curves + persistence
What’s the temporal pattern?	`event_features()`	Burstiness, memory, entropy