Temporal Analytics

CVX provides 27+ analytical functions that treat embedding trajectories as mathematical objects — differentiable curves, stochastic processes, and topological spaces. This tutorial covers each category with theory, code, and visualizations.

Each entity models a real phenomenon using well-known stochastic processes:

| Entity | Process | Models | Expected behavior |
|---|---|---|---|
| 0 | Ornstein-Uhlenbeck | Stable concept with slow evolution | Mean-reverting, bounded drift |
| 1 | Regime shift (onset at $t=40$) | Disease onset, crisis event | Structural break, accelerating drift |
| 2 | Periodic oscillation (14-day) | Circadian/weekly patterns | Cyclical velocity, oscillating drift |

import chronos_vector as cvx
import numpy as np

np.random.seed(42)
D = 32  # embedding dimension
index = cvx.TemporalIndex(m=16, ef_construction=100)

# Entity 0: Ornstein-Uhlenbeck, mean-reverting with slow drift
theta = 0.1  # mean reversion speed
ou_mean, ou_state = np.zeros(D, dtype=np.float32), np.zeros(D, dtype=np.float32)
for t in range(100):
    ou_mean += np.sin(np.arange(D) * 0.05 + t * 0.02).astype(np.float32) * 0.02
    ou_state += theta * (ou_mean - ou_state) + np.random.randn(D).astype(np.float32) * 0.05
    index.insert(0, t * 86400, ou_state.tolist())  # one point per day (86400 s)

# Entity 1: Regime shift, gradual onset after day 40
for t in range(100):
    if t < 40:
        vec = np.ones(D, dtype=np.float32) * 0.3 + np.random.randn(D).astype(np.float32) * 0.03
    else:
        severity = (t - 40) / 60.0
        vec = np.ones(D, dtype=np.float32) * (0.3 - severity * 0.8) + \
            np.random.randn(D).astype(np.float32) * (0.03 + severity * 0.1)
    index.insert(1, t * 86400, vec.tolist())

# Entity 2: Periodic oscillation, 14-day cycle
for t in range(100):
    phase = 2 * np.pi * t / 14
    vec = np.sin(np.arange(D) * 0.3 + phase).astype(np.float32) * 0.3 + \
        np.random.randn(D).astype(np.float32) * 0.02
    index.insert(2, t * 86400, vec.tolist())

traj0, traj1, traj2 = index.trajectory(0), index.trajectory(1), index.trajectory(2)

The raw signals — vector norm and first 3 dimensions over time — show the structure that CVX analytics will detect:

Top row: Vector norm over time. The OU process is bounded (~0.3-0.5), the onset entity drops sharply after day 40, and the periodic entity oscillates regularly. Bottom row: First 3 dimensions showing the underlying dynamics — mean reversion, regime shift, and sinusoidal oscillation.


Drift measures the total displacement between two embedding vectors, decomposed into L2 magnitude, cosine distance, and per-dimension contributions:

$$\text{drift}_{L2} = \|\mathbf{v}(t_2) - \mathbf{v}(t_1)\|_2 \qquad \text{drift}_{\cos} = 1 - \frac{\mathbf{v}(t_1) \cdot \mathbf{v}(t_2)}{\|\mathbf{v}(t_1)\| \cdot \|\mathbf{v}(t_2)\|}$$

l2, cosine, top_dims = cvx.drift(traj1[0][1], traj1[-1][1], top_n=3)
print(f"Regime entity drift: L2={l2:.3f}, cosine={cosine:.4f}")
Regime entity drift: L2=5.627, cosine=1.9959
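
The two drift formulas can also be checked by hand. The sketch below is plain Python and does not call CVX. The helper names `drift_l2`, `drift_cosine`, and `top_changed_dims` are hypothetical, and ranking dimensions by absolute change is only an assumption about what "per-dimension contributions" means.

```python
import math

def drift_l2(v1, v2):
    """Euclidean norm of the displacement v2 - v1."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(v1, v2)))

def drift_cosine(v1, v2):
    """Cosine distance: 1 - cosine similarity, a value in [0, 2]."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return 1.0 - dot / (norm1 * norm2)

def top_changed_dims(v1, v2, top_n=3):
    """Assumed ranking: dimensions sorted by absolute change, largest first."""
    changes = [(i, abs(b - a)) for i, (a, b) in enumerate(zip(v1, v2))]
    return sorted(changes, key=lambda pair: pair[1], reverse=True)[:top_n]

start = [0.3, 0.3, 0.3]
end = [-0.5, -0.4, -0.3]
print(drift_l2(start, end))
print(drift_cosine(start, end))
print(top_changed_dims(start, end, top_n=2))
```

Cosine distance lies in $[0, 2]$, so a value close to 2 means the two vectors point in nearly opposite directions.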

The onset entity (red) shows accelerating drift after day 40 as the regime shift deepens. The OU process (green) drifts slowly but is bounded by mean reversion. The periodic entity (blue) oscillates — drift increases and decreases with each cycle.

Velocity is the rate of change of the embedding, estimated with a central finite difference:

$$\mathbf{v}'(t) \approx \frac{\mathbf{v}(t + \Delta t) - \mathbf{v}(t - \Delta t)}{2\Delta t}$$

The magnitude $\|\mathbf{v}'(t)\|$ indicates how fast the entity’s semantic representation is shifting. See the Quick Start velocity plot for a comparison across entity types.
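
A minimal sketch of the central difference, assuming a trajectory is a list of `(timestamp, vector)` pairs, which is the layout the tutorial's indexing suggests. The functions `central_velocity` and `speed` are hypothetical helpers, not part of CVX.

```python
def central_velocity(traj, i):
    """Central-difference velocity at interior index i.

    Dividing by t[i+1] - t[i-1] matches the 2*dt denominator
    when samples are evenly spaced.
    """
    if not 0 < i < len(traj) - 1:
        raise IndexError("central difference needs a neighbor on each side")
    (t_prev, v_prev), (t_next, v_next) = traj[i - 1], traj[i + 1]
    span = t_next - t_prev
    return [(b - a) / span for a, b in zip(v_prev, v_next)]

def speed(velocity):
    """Euclidean magnitude of a velocity vector."""
    return sum(c * c for c in velocity) ** 0.5

# A straight-line trajectory sampled once per day (86400 s).
traj = [(day * 86400, [2.0 * day, -1.0 * day]) for day in range(5)]
per_second = central_velocity(traj, 2)
per_day = [c * 86400 for c in per_second]
print(per_day, speed(per_day))
```

Because timestamps are in seconds, the raw velocity is per second. Multiplying by 86400 gives change per day, which is usually easier to read.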

Hurst Exponent: Persistence vs Mean-Reversion

The Hurst exponent $H$ is estimated from how the rescaled range statistic scales with window size:

$$\mathbb{E}\left[\frac{R(n)}{S(n)}\right] \sim c \cdot n^H \quad \text{as } n \to \infty$$

where $R(n)$ is the range of cumulative deviations and $S(n)$ is the standard deviation over a window of length $n$.

| $H$ value | Behavior | Example |
|---|---|---|
| $H \approx 0.5$ | Random walk | Gaussian noise trajectory |
| $H > 0.5$ | Persistent: trends continue | Gradual semantic drift, disease progression |
| $H < 0.5$ | Anti-persistent: trends reverse | Mean-reverting oscillation |

for traj, name in [(traj0, "OU"), (traj1, "Regime"), (traj2, "Periodic")]:
    print(f" {name}: H={cvx.hurst_exponent(traj):.4f}")
OU: H=0.5592
Regime: H=0.7224
Periodic: H=0.7227
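
The rescaled range fit can be sketched for a scalar series in plain Python. This is an illustration of classical R/S analysis, not CVX's implementation: the source does not say how `hurst_exponent` reduces a vector trajectory to a scalar series, and the window sizes and the helpers `rescaled_range` and `hurst_rs` are assumptions.

```python
import math
import random

def rescaled_range(chunk):
    """R/S for one window: range of cumulative deviations over the std."""
    mean = sum(chunk) / len(chunk)
    cumulative, running = [], 0.0
    for x in chunk:
        running += x - mean
        cumulative.append(running)
    spread = max(cumulative) - min(cumulative)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / len(chunk))
    return spread / std if std > 0 else None  # constant windows are skipped

def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Least-squares slope of log(mean R/S) against log(window size).

    The series must be long enough to fill at least two window sizes.
    """
    xs, ys = [], []
    for n in window_sizes:
        chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        values = [rs for rs in map(rescaled_range, chunks) if rs is not None]
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    slope_num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope_den = sum((x - x_mean) ** 2 for x in xs)
    return slope_num / slope_den

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(1024)]
print(f"white noise: H={hurst_rs(noise):.2f}")
print(f"alternating: H={hurst_rs(alternating):.2f}")
```

A strictly alternating series is the extreme anti-persistent case: its R/S value is the same at every window size, so the fitted slope is zero. R/S estimates on short windows are known to be biased upward, so independent noise typically lands somewhat above 0.5.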

Change point detection minimizes a penalized cost function over all possible segmentations:

$$\min_{\tau_1, \ldots, \tau_m} \sum_{i=1}^{m+1} \left[ \mathcal{C}(\mathbf{y}_{\tau_{i-1}+1:\tau_i}) + \beta \right]$$

with $\tau_0 = 0$ and $\tau_{m+1} = n$, where:

  • $\mathcal{C}$: segment cost (Gaussian negative log-likelihood on L2 distances between consecutive vectors)
  • $\beta$: penalty per change point (default: BIC = $\frac{D \cdot \ln(n)}{2}$)

cps = cvx.detect_changepoints(1, traj1, min_segment_len=5)
for ts, severity in cps:
    print(f" Day {ts // 86400}: severity={severity:.3f}")
Day 50: severity=0.963
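
The penalized objective can be minimized exactly with a simple dynamic program. The sketch below uses optimal partitioning, which is $O(n^2)$, on a scalar series. It is not CVX's algorithm: the squared-error segment cost (a Gaussian with fixed variance) and the function name `segment_changepoints` are assumptions made for illustration.

```python
import math

def segment_changepoints(y, penalty):
    """Optimal partitioning: exact minimizer of sum(cost(segment) + penalty).

    Cost is the within-segment sum of squared deviations from the mean.
    Returns the indices at which a new segment starts.
    """
    n = len(y)
    # Prefix sums make any segment cost an O(1) lookup.
    s1, s2 = [0.0], [0.0]
    for value in y:
        s1.append(s1[-1] + value)
        s2.append(s2[-1] + value * value)

    def cost(a, b):  # cost of y[a:b]
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] + [math.inf] * n  # best[t]: optimal objective for y[:t]
    prev = [0] * (n + 1)           # prev[t]: start of the last segment in y[:t]
    for t in range(1, n + 1):
        for s in range(t):
            candidate = best[s] + cost(s, t) + penalty
            if candidate < best[t]:
                best[t], prev[t] = candidate, s

    # Walk back through the segment starts.
    changepoints, t = [], n
    while prev[t] > 0:
        changepoints.append(prev[t])
        t = prev[t]
    return sorted(changepoints)

signal = [0.0] * 20 + [5.0] * 20
print(segment_changepoints(signal, penalty=3 * math.log(len(signal))))  # prints [20]
```

A larger penalty makes each extra segment more expensive, so fewer change points are reported. A smaller penalty has the opposite effect.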

💡 BIC penalty for high dimensions
For $D > 100$, the default BIC penalty may be too sensitive. Use penalty = 3 * np.log(n) to reduce false positives.


From rough path theory (Lyons 1998), the truncated path signature is a universal, reparametrization-invariant descriptor of a trajectory’s shape.

For a path $\mathbf{X}: [0,T] \to \mathbb{R}^D$, the depth-$k$ signature is:

$$S(\mathbf{X})^{i_1, \ldots, i_k} = \int_0^T \int_0^{t_k} \cdots \int_0^{t_2} dX^{i_1}_{t_1} \cdots dX^{i_k}_{t_k}$$

| Depth | Features | Captures |
|---|---|---|
| 1 | $D$ | Net displacement per dimension |
| 2 | $D + D^2$ | Displacement + signed area (rotation, order) |
| 3 | $D + D^2 + D^3$ | Higher-order interactions |

sig0 = cvx.path_signature(traj0, depth=2)
sig1 = cvx.path_signature(traj1, depth=2)
print(f"Signature dimension: {len(sig0)} (D={D}, depth=2)")
print(f" OU vs regime: {cvx.signature_distance(sig0, sig1):.3f}")
Signature dimension: 1056 (D=32, depth=2)
OU vs regime: 18.148

The distance matrix shows that the OU and regime trajectories have the most different shapes: the regime shift produces a signature very different from that of the OU process's slow, bounded drift.
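
For a piecewise-linear path, the depth-2 signature has a closed form that can be accumulated one segment at a time using Chen's identity. The sketch below is a hand-rolled illustration, not CVX's implementation. `signature_depth2` is a hypothetical helper that takes a plain list of points, for example `[v for _, v in traj]`.

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path given as a list of points.

    Returns D first-level terms followed by D*D second-level terms
    (row-major), so the result has D + D**2 entries.
    """
    dim = len(path[0])
    level1 = [0.0] * dim                        # running increment X_t - X_0
    level2 = [[0.0] * dim for _ in range(dim)]  # iterated integrals S^{ij}
    for start, end in zip(path, path[1:]):
        step = [b - a for a, b in zip(start, end)]
        for i in range(dim):
            for j in range(dim):
                # Chen's identity for appending one straight segment.
                level2[i][j] += level1[i] * step[j] + 0.5 * step[i] * step[j]
        for i in range(dim):
            level1[i] += step[i]
    return level1 + [value for row in level2 for value in row]

# Move right, then up: a path that encloses area with its chord.
sig = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print(sig)  # prints [1.0, 1.0, 0.5, 1.0, 0.0, 0.5]
signed_area = 0.5 * (sig[3] - sig[4])  # (S^{01} - S^{10}) / 2
print(signed_area)  # prints 0.5
```

The first two entries are the net displacement. The antisymmetric part of the second level is the signed area between the path and its chord, which is why depth 2 distinguishes "right then up" from "up then right".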

The Fréchet distance, also called the dog-walking distance, is the minimum leash length needed to traverse both trajectories simultaneously while preserving order:

$$d_F(\mathbf{X}, \mathbf{Y}) = \inf_{\alpha, \beta} \max_{t \in [0,1]} \|\mathbf{X}(\alpha(t)) - \mathbf{Y}(\beta(t))\|$$

where $\alpha$ and $\beta$ range over continuous, non-decreasing reparametrizations of $[0,1]$.
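
On sampled trajectories the infimum is usually approximated by the discrete Fréchet distance, which only considers the sample points and is computed with a dynamic program. The sketch below shows that discrete variant. Whether CVX's `frechet_distance` uses the same approximation is not stated in the source, and `discrete_frechet` is a hypothetical helper.

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two point sequences, O(len(p) * len(q))."""
    rows, cols = len(p), len(q)
    # table[i][j]: shortest leash that lets the walkers reach p[i] and q[j].
    table = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            gap = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                table[i][j] = gap
            elif i == 0:
                table[i][j] = max(table[0][j - 1], gap)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], gap)
            else:
                reach = min(table[i - 1][j], table[i - 1][j - 1], table[i][j - 1])
                table[i][j] = max(reach, gap)
    return table[-1][-1]

a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(a, b))                  # prints 1.0
print(discrete_frechet(a, list(reversed(a))))  # prints 2.0
```

The second call shows that order matters: the same set of points traversed in the opposite direction is a full path length apart.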


When only event timing is available (no embeddings), point process features characterize temporal patterns:

| Feature | Formula | Interpretation |
|---|---|---|
| Burstiness | $B = \frac{\sigma_\tau - \mu_\tau}{\sigma_\tau + \mu_\tau}$ | $B > 0$: bursty, $B = 0$: Poisson, $B < 0$: periodic |
| Memory | $M = \text{corr}(\tau_i, \tau_{i+1})$ | Correlation between consecutive inter-event gaps |
| Entropy | $H = -\sum p_i \log p_i$ | Uniformity of event distribution over time bins |
| Gap CV | $\text{CV} = \sigma_\tau / \mu_\tau$ | Coefficient of variation of inter-event gaps |

Bursty events show high burstiness and gap CV; uniform events show near-zero burstiness and high entropy.
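
The four features can be computed directly from a sorted list of timestamps. The sketch below is a plain-Python illustration and not the CVX implementation. The name `event_gap_features`, the use of population standard deviation, and the equal-width binning with `n_bins=10` are assumptions.

```python
import math
import statistics

def event_gap_features(timestamps, n_bins=10):
    """Burstiness, memory, entropy and gap CV from sorted event timestamps.

    Needs at least three events and distinct first and last timestamps.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mu, sigma = statistics.fmean(gaps), statistics.pstdev(gaps)
    burstiness = (sigma - mu) / (sigma + mu)

    # Memory: Pearson correlation between each gap and the next one.
    x, y = gaps[:-1], gaps[1:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    memory = cov / math.sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else 0.0

    # Entropy of the share of events falling in each equal-width time bin.
    t0, span = timestamps[0], timestamps[-1] - timestamps[0]
    counts = [0] * n_bins
    for t in timestamps:
        counts[min(int((t - t0) / span * n_bins), n_bins - 1)] += 1
    shares = [c / len(timestamps) for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in shares)

    return {"burstiness": burstiness, "memory": memory,
            "entropy": entropy, "gap_cv": sigma / mu}

regular = list(range(100))                                    # one event per tick
bursty = [c * 100 + k for c in range(3) for k in range(4)]    # three tight clusters
print(event_gap_features(regular))
print(event_gap_features(bursty))
```

Perfectly regular events have zero gap variance, which gives $B = -1$ and a gap CV of 0, the periodic extreme of the table.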


For comparing probability distributions (e.g., region membership proportions):

The Fisher-Rao distance is the geodesic distance on the statistical manifold of categorical distributions:

$$d_{FR}(\mathbf{p}, \mathbf{q}) = 2 \arccos\left(\sum_i \sqrt{p_i \cdot q_i}\right)$$

Bounded in $[0, \pi]$. A true metric: symmetric and satisfies the triangle inequality (unlike KL divergence).

The Hellinger distance is:

$$d_H(\mathbf{p}, \mathbf{q}) = \frac{1}{\sqrt{2}} \sqrt{\sum_i \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}$$

Bounded in $[0, 1]$. Related to Fisher-Rao: since $d_H^2 = 1 - \sum_i \sqrt{p_i q_i}$ and $\sum_i \sqrt{p_i q_i} = \cos(d_{FR}/2)$, it follows that $d_H = \sqrt{2} \sin(d_{FR}/4)$.

The Wasserstein distance is the optimal transport cost between distributions, respecting the geometry of the underlying space (region centroids). Unlike Hellinger/Fisher-Rao, Wasserstein accounts for how far apart categories are, not just their probabilities.

p, q = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
print(f"Fisher-Rao: {cvx.fisher_rao_distance(p, q):.4f}")
print(f"Hellinger: {cvx.hellinger_distance(p, q):.4f}")
Fisher-Rao: 0.4510
Hellinger: 0.2228
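
Both closed-form distances depend on the data only through the coefficient $\sum_i \sqrt{p_i q_i}$, which makes them easy to reproduce by hand. The sketch below implements the two formulas exactly as written in this section, in plain Python. The helper names are hypothetical, and CVX's own functions may apply a different normalization, so the printed values need not match the library output.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i), clamped to [0, 1] against rounding error."""
    return min(1.0, max(0.0, sum(math.sqrt(a * b) for a, b in zip(p, q))))

def fisher_rao(p, q):
    """Geodesic distance 2 * arccos(BC), a value in [0, pi]."""
    return 2.0 * math.acos(bhattacharyya_coefficient(p, q))

def hellinger(p, q):
    """Hellinger distance, a value in [0, 1]."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)

p, q = [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
d_fr, d_h = fisher_rao(p, q), hellinger(p, q)
print(f"Fisher-Rao={d_fr:.4f}  Hellinger={d_h:.4f}")

# The shared coefficient links the two: d_H = sqrt(2) * sin(d_FR / 4).
print(math.isclose(d_h, math.sqrt(2.0) * math.sin(d_fr / 4.0), rel_tol=1e-9))  # prints True
```

Distributions with disjoint support hit both upper bounds at once: the coefficient is 0, so Fisher-Rao is $\pi$ and Hellinger is 1.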

Topological data analysis tracks how the connectivity structure of a point cloud changes across spatial scales.

The Betti number $\beta_0(r)$ counts the number of connected components at radius $r$. As $r$ increases, components merge. The radii at which they appear and disappear form a persistence diagram.

| Metric | Meaning |
|---|---|
| Total persistence | Sum of all lifetimes: overall topological complexity |
| Max persistence | Longest-lived feature: most robust structure |
| Persistence entropy | Shannon entropy of lifetimes: uniformity of scales |

vecs = [v for _, v in traj0]
topo = cvx.topological_features(vecs, n_radii=20, persistence_threshold=0.1)
print(f"Total persistence: {topo['total_persistence']:.4f}")
print(f"Max persistence: {topo['max_persistence']:.4f}")

The Betti curve shows how connected components merge as the radius grows. A sharp drop indicates well-separated clusters.
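
For connected components, the merge radii can be found with a union-find pass over the sorted pairwise distances, which is the same computation as single-linkage clustering. The sketch below is an illustration under two assumptions: two points are linked once their distance is at most $r$ (some conventions use $2r$), and every component is born at radius 0, so its lifetime equals its merge radius. The helper names are hypothetical and none of this is CVX's implementation.

```python
import math

def h0_death_radii(points):
    """Radii at which connected components merge, in increasing order."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(len(points)) for j in range(i + 1, len(points))
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge joins two components
            parent[root_i] = root_j
            deaths.append(length)
    return deaths

def betti0(points, radius):
    """Components left once all pairs within `radius` of each other are linked."""
    return len(points) - sum(1 for d in h0_death_radii(points) if d <= radius)

def persistence_entropy(lifetimes):
    """Shannon entropy of the normalized lifetimes."""
    total = sum(lifetimes)
    return -sum((l / total) * math.log(l / total) for l in lifetimes if l > 0)

cloud = [(0.0,), (1.0,), (10.0,), (11.0,)]  # two well-separated pairs
deaths = h0_death_radii(cloud)
print(deaths, [betti0(cloud, r) for r in (0.5, 1.0, 9.0)])  # prints [1.0, 1.0, 9.0] [4, 2, 1]
print(sum(deaths), max(deaths), persistence_entropy(deaths))
```

The long-lived merge at radius 9 is the gap between the two clusters. It dominates both total and max persistence, which is the signature of well-separated groups.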


| Question | Function | Output |
|---|---|---|
| How fast is it changing? | velocity() | Vector: direction + magnitude |
| How much has it changed? | drift() | Scalar + top dimensions |
| Is it trending or reverting? | hurst_exponent() | $H \in [0,1]$ |
| When did behavior change? | detect_changepoints() | Timestamps + severity |
| What shape is the trajectory? | path_signature() | Fixed-size descriptor |
| Are two trajectories similar? | signature_distance(), frechet_distance() | Scalar distance |
| How do distributions differ? | fisher_rao_distance(), wasserstein_drift() | Scalar distance |
| Is the data clustered? | topological_features() | Betti curves + persistence |
| What’s the temporal pattern? | event_features() | Burstiness, memory, entropy |