Temporal Analytics Toolkit

ChronosVector provides 19 analytical functions for extracting insights from temporal vector data. Each function is grounded in a specific mathematical framework and applicable across multiple domains.

| Category | Functions | Mathematical Basis |
|---|---|---|
| Differential Calculus | velocity, drift, temporal_features | Finite differences, feature engineering |
| Stochastic Characterization | hurst_exponent, detect_changepoints | R/S analysis, PELT algorithm |
| Path Signatures | path_signature, log_signature, signature_distance | Rough path theory (Lyons, 1998) |
| Trajectory Comparison | frechet_distance | Computational geometry |
| Distributional Distances | wasserstein_drift, fisher_rao_distance, hellinger_distance | Optimal transport, information geometry |
| Point Process Analysis | event_features | Temporal point processes |
| Topological Analysis | topological_features | Persistent homology (TDA) |
| Anchor Projection | project_to_anchors, anchor_summary | Coordinate system change to reference frame |
| Prediction | predict | Linear extrapolation / Neural ODE |

Theory: Instantaneous rate of change $\frac{dv}{dt}$ via central finite differences. Measures how fast an entity is moving through the embedding space.

Output: Velocity vector (same dimensionality as input).
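
To make the central-difference estimate concrete, here is a minimal pure-Python sketch. It is an illustration of the technique, not the ChronosVector implementation: the function name `central_difference_velocity` and the explicit `timestamps` argument are assumptions, and only interior points are estimated.

```python
def central_difference_velocity(trajectory, timestamps):
    """Estimate dv/dt at each interior point with central finite differences.

    Hypothetical helper for illustration; not part of the cvx API.
    """
    if len(trajectory) != len(timestamps) or len(trajectory) < 3:
        raise ValueError("need at least 3 timestamped points")
    velocities = []
    for i in range(1, len(trajectory) - 1):
        # Central difference: (v[i+1] - v[i-1]) / (t[i+1] - t[i-1])
        dt = timestamps[i + 1] - timestamps[i - 1]
        velocities.append(
            [(b - a) / dt for a, b in zip(trajectory[i - 1], trajectory[i + 1])]
        )
    return velocities


traj = [[0.0, 0.0], [1.0, 2.0], [4.0, 4.0], [9.0, 6.0]]
ts = [0.0, 1.0, 2.0, 3.0]
vel = central_difference_velocity(traj, ts)  # one velocity vector per interior point
```

Each returned vector has the same dimensionality as the input points, matching the output described above.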

Applications:

  • Mental health: Rate of linguistic change — accelerating drift may signal crisis
  • Finance: Speed of portfolio evolution — regime transitions show velocity spikes
  • Molecular dynamics: Conformational transition speed between states
  • MAP-Elites: Rate of archive exploration — stagnation detection

Theory: Total displacement between two vectors, decomposed by dimension. Identifies what changed and how much.

Output: (l2_magnitude, cosine_drift, top_dimensions) — the overall magnitude, angular change, and the dimensions contributing most.
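
The three output components can be sketched as follows. This is an assumption-laden illustration: `drift_report` is a hypothetical name, cosine drift is taken to mean one minus cosine similarity, and "top dimensions" are ranked by absolute per-dimension change. The library may define any of these differently.

```python
import math


def drift_report(v_start, v_end, top_k=3):
    """Return (l2_magnitude, cosine_drift, top_dimensions) for two vectors.

    Hypothetical helper; cosine_drift = 1 - cosine similarity is an assumption.
    """
    deltas = [b - a for a, b in zip(v_start, v_end)]
    l2_magnitude = math.sqrt(sum(d * d for d in deltas))
    dot = sum(a * b for a, b in zip(v_start, v_end))
    norm_start = math.sqrt(sum(a * a for a in v_start))
    norm_end = math.sqrt(sum(b * b for b in v_end))
    cosine_drift = 1.0 - dot / (norm_start * norm_end)
    # Rank dimensions by how much they changed in absolute terms.
    ranked = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    top_dimensions = [(i, deltas[i]) for i in ranked[:top_k]]
    return l2_magnitude, cosine_drift, top_dimensions


magnitude, angle, top = drift_report([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], top_k=2)
```

Here the vector rotates from the first axis to the second, so the two changed dimensions are reported and the untouched third one is not.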

Theory: Fixed-size summary vector combining mean, std, velocity statistics, Hurst exponent, and change point count. Designed for downstream ML classification.

Output: Vector of size $2D + 5$ where $D$ = input dimensionality.


Theory: The Hurst exponent $H \in (0, 1)$ characterizes the long-range dependence of a time series via R/S (rescaled range) analysis:

  • $H = 0.5$: random walk (no memory)
  • $H > 0.5$: persistent (trending); past increases predict future increases
  • $H < 0.5$: anti-persistent (mean-reverting); past increases predict future decreases
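
A minimal sketch of R/S analysis, assuming a one-dimensional series of increments and a fixed set of window sizes (both are illustrative choices, and `hurst_rs` is a hypothetical name). The estimate is the slope of log(mean R/S) against log(window size); classical R/S is known to be biased upward on short windows, so treat the numbers as rough.

```python
import math
import random


def rescaled_range(chunk):
    """R/S statistic of one window: range of cumulative deviations over the std."""
    mean = sum(chunk) / len(chunk)
    running, cumulative = 0.0, []
    for x in chunk:
        running += x - mean
        cumulative.append(running)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / len(chunk))
    if std == 0.0:
        return None  # a constant window carries no information
    return (max(cumulative) - min(cumulative)) / std


def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    for n in window_sizes:
        stats = [rescaled_range(series[i:i + n])
                 for i in range(0, len(series) - n + 1, n)]
        stats = [s for s in stats if s is not None]
        if stats:
            xs.append(math.log(n))
            ys.append(math.log(sum(stats) / len(stats)))
    if len(xs) < 2:
        raise ValueError("series too short for the chosen window sizes")
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
# Differencing white noise yields a strongly anti-persistent series.
anti_persistent = [b - a for a, b in zip(noise, noise[1:])]
h_noise = hurst_rs(noise)
h_anti = hurst_rs(anti_persistent)
```

The memoryless series should land near the middle of the range, and the differenced series should score lower, in line with the interpretation of the three regimes above.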

Applications:

  • Mental health: Depression users show anti-persistent topical dynamics ($H < 0.5$, $d = -0.41$ on CLPsych), i.e. erratic topic switching
  • Finance: $H > 0.5$ indicates trending markets; $H < 0.5$ indicates mean-reversion opportunities
  • Evolutionary optimization: $H \to 0.5$ indicates convergence (search becoming random)

cvx.detect_changepoints(entity_id, trajectory, penalty, min_segment_len)

Theory: PELT (Pruned Exact Linear Time) algorithm for offline change point detection. Minimizes:

$$\sum_{i=1}^{K+1} C(y_{t_{i-1}:t_i}) + K \cdot \beta$$

where $C$ is the segment cost and $\beta$ is the penalty per change point.

Output: List of (timestamp, severity) pairs marking regime transitions.
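
The penalized objective can be minimized exactly with the PELT recursion. The sketch below is a simplified illustration under stated assumptions: a one-dimensional signal, a squared-error segment cost (sum of squared deviations from the segment mean), no minimum segment length, and change point positions only (no severity). `pelt_changepoints` is a hypothetical name, not the cvx function.

```python
def pelt_changepoints(y, penalty):
    """Indices where a new segment starts, minimizing cost + penalty per change."""
    n = len(y)
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(start, end):
        # Sum of squared deviations from the mean over y[start:end].
        total = prefix[end] - prefix[start]
        return (prefix_sq[end] - prefix_sq[start]) - total * total / (end - start)

    best = [-penalty] + [0.0] * n   # best[t]: optimal objective for y[:t]
    previous = [0] * (n + 1)        # start index of the last segment ending at t
    candidates = [0]
    for t in range(1, n + 1):
        scored = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], previous[t] = min(scored)
        # PELT pruning: drop s once it can no longer be the optimal last change.
        candidates = [s for value, s in scored if value - penalty <= best[t]]
        candidates.append(t)

    changepoints, t = [], n
    while previous[t] > 0:
        t = previous[t]
        changepoints.append(t)
    return sorted(changepoints)


signal = [0.0] * 20 + [5.0] * 20
changes = pelt_changepoints(signal, penalty=5.0)
```

A larger `penalty` makes each additional change point more expensive, so fewer regime transitions are reported.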

Applications:

  • Molecular dynamics: Conformational state transitions
  • MLOps: Model drift onset detection
  • Social media: Behavioral regime changes in users

Based on rough path theory (Lyons, 1998). See Path Signatures for full mathematical treatment.

cvx.path_signature(trajectory, depth=2, time_augmentation=False)

Theory: The truncated path signature is the universal nonlinear feature of sequential data. Any continuous function of a path can be approximated by a linear function of its signature. It captures ordered, multi-scale structure:

  • Depth 1 ($K$ features): Net displacement, i.e. where the entity moved
  • Depth 2 ($K + K^2$ features): Signed areas, i.e. how it moved (rotation, correlation, volatility). Distinguishes “right-then-up” from “up-then-right”
  • Depth 3 ($K + K^2 + K^3$ features): Higher-order interactions

Key property (Chen’s Identity): $S(\alpha \ast \beta) = S(\alpha) \otimes S(\beta)$. When a new point is appended, the depth-2 signature updates in $O(K^2)$ instead of requiring a full recomputation in $O(N \cdot K^2)$. This makes path signatures native to incremental-insert databases.
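
The incremental update can be sketched for a piecewise-linear path truncated at depth 2. This is an illustrative sketch, not the library code: `append_point` and `signature_depth2` are hypothetical names, and the path is assumed to be linearly interpolated between samples. A single linear segment with increment $\Delta$ has signature $(\Delta, \tfrac{1}{2}\Delta \otimes \Delta)$, and Chen's identity combines it with the running signature.

```python
def append_point(level1, level2, last_point, new_point):
    """Chen's identity update for one appended linear segment, O(K^2)."""
    delta = [b - a for a, b in zip(last_point, new_point)]
    k = len(delta)
    new_level2 = [
        [level2[i][j] + level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
         for j in range(k)]
        for i in range(k)
    ]
    new_level1 = [level1[i] + delta[i] for i in range(k)]
    return new_level1, new_level2


def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path, built point by point."""
    k = len(path[0])
    level1 = [0.0] * k
    level2 = [[0.0] * k for _ in range(k)]
    for previous, current in zip(path, path[1:]):
        level1, level2 = append_point(level1, level2, previous, current)
    return level1, level2


right_then_up = signature_depth2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
up_then_right = signature_depth2([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Both paths have the same depth-1 term (net displacement), but their depth-2 terms differ in the off-diagonal entries, which is how the signature separates “right-then-up” from “up-then-right”.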

Practical note: Applied on region trajectories ($K \sim 80$ at HNSW level 3), not raw embeddings ($D = 768$), keeping the computation tractable.

cvx.log_signature(trajectory, depth=2, time_augmentation=False)

Theory: The log-signature removes redundant symmetric components via the Baker-Campbell-Hausdorff formula. Contains the same information in fewer dimensions: $K + K(K-1)/2$ instead of $K + K^2$ at depth 2.
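
At depth 2 the reduction is easy to see: the symmetric part of the second level is determined by the first level, so only the antisymmetric part (the signed Lévy areas) needs to be kept. A minimal sketch, assuming the depth-2 signature is already available as a vector and a $K \times K$ matrix (`log_signature_depth2` is a hypothetical helper, and the ordering of the area terms is an arbitrary choice):

```python
def log_signature_depth2(level1, level2):
    """Keep the displacement plus the antisymmetric part of level 2.

    Output length is K + K(K-1)/2.
    """
    k = len(level1)
    areas = [
        0.5 * (level2[i][j] - level2[j][i])
        for i in range(k)
        for j in range(i + 1, k)
    ]
    return list(level1) + areas


# Depth-2 signature of the path (0,0) -> (1,0) -> (1,1).
level1 = [1.0, 1.0]
level2 = [[0.5, 1.0], [0.0, 0.5]]
log_sig = log_signature_depth2(level1, level2)
```

For $K = 2$ this gives 3 numbers instead of 6: the two displacement components and one signed area.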

Theory: $L_2$ distance between path signatures. A fast trajectory similarity measure: $O(K^2)$ per comparison at depth 2, capturing order-dependent temporal dynamics up to the truncation depth.

Applications (all domains):

  • Molecular dynamics: Find simulations with similar conformational dynamics
  • Drug discovery: Identify optimization campaigns that followed similar paths through chemical space
  • MAP-Elites: Find solutions with similar evolutionary trajectories
  • Mental health: Detect users with similar temporal patterns to known at-risk cases

Theory: The discrete Fréchet distance is the smallest achievable value of the largest distance between paired points, taken over all ways of traversing both paths monotonically. Informally: the shortest leash needed if you walk along path A while your dog walks along path B, both moving only forward.

$$d_F(A, B) = \inf_{\alpha, \beta} \max_t \| A(\alpha(t)) - B(\beta(t)) \|$$

Complexity: $O(n \times m)$ via dynamic programming.
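
The dynamic program can be sketched as a table where entry $(i, j)$ holds the shortest leash needed to reach point $i$ on path A and point $j$ on path B. This is a generic illustration with Euclidean distance (`discrete_frechet` is a hypothetical name; the library may use another metric):

```python
import math


def discrete_frechet(path_a, path_b):
    """Discrete Fréchet distance between two point sequences, O(n * m)."""
    n, m = len(path_a), len(path_b)
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(path_a[i], path_b[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                # Either walker (or both) advanced by one step to get here.
                reach = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
                table[i][j] = max(reach, d)
    return table[-1][-1]


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shifted = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
detour = [(0.0, 0.0), (1.0, 3.0), (2.0, 0.0)]
parallel_distance = discrete_frechet(line, shifted)
detour_distance = discrete_frechet(line, detour)
```

The parallel path stays one unit away throughout, while the detour forces the leash to stretch to the peak of the excursion even though both endpoints coincide.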

When to use vs. signature distance:

  • Signature distance ($O(K^2)$): universal features, very fast; the recommended default
  • Fréchet distance ($O(nm)$): exact geometric comparison, for when the precise geometric shape of the path matters

These operate on region distributions — histograms over HNSW graph regions that describe an entity’s topical composition at a given time.

cvx.wasserstein_drift(dist_a, dist_b, centroids, n_projections=50)

Theory: The Wasserstein (optimal transport) distance measures the minimum cost of transforming one distribution into another, respecting the geometry of the space. Unlike $L_2$ between histograms, Wasserstein accounts for which regions are close vs far:

$$W_1(p, q) = \inf_{\gamma \in \Pi(p,q)} \int \|x - y\| \, d\gamma(x, y)$$

Implemented as Sliced Wasserstein (random 1D projections) for computational efficiency.

Key insight: Shifting mass between neighboring regions costs less than between distant ones. Verified: distant shift (1.2) > adjacent shift (0.7) in tests.
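
A minimal sketch of the sliced approach, assuming histograms `p` and `q` over the same set of region centroids. The names `wasserstein_1d` and `sliced_wasserstein`, the Gaussian sampling of directions, and the `seed` argument are illustrative assumptions, not the library's internals. Each random direction reduces the problem to one dimension, where $W_1$ is the area between the two cumulative distributions.

```python
import math
import random


def wasserstein_1d(positions, p, q):
    """W1 between two histograms supported on the same 1D positions."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    total, cdf_gap = 0.0, 0.0
    for a, b in zip(order, order[1:]):
        cdf_gap += p[a] - q[a]
        total += abs(cdf_gap) * (positions[b] - positions[a])
    return total


def sliced_wasserstein(p, q, centroids, n_projections=50, seed=0):
    """Average 1D W1 over random unit directions."""
    rng = random.Random(seed)
    dim = len(centroids[0])
    total = 0.0
    for _ in range(n_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in direction))
        direction = [x / norm for x in direction]
        projected = [sum(c * d for c, d in zip(centroid, direction))
                     for centroid in centroids]
        total += wasserstein_1d(projected, p, q)
    return total / n_projections


centroids = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
base = [1.0, 0.0, 0.0]
adjacent_shift = sliced_wasserstein(base, [0.0, 1.0, 0.0], centroids)
distant_shift = sliced_wasserstein(base, [0.0, 0.0, 1.0], centroids)
```

Moving all mass to the neighboring region costs less than moving it to the far region, which a plain $L_2$ comparison of the two histograms would not distinguish.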

Applications:

  • MLOps: Detect concept drift that respects feature space geometry
  • Drug discovery: Measure how a campaign’s chemical space focus shifted
  • Mental health: Track topical migration between related vs. unrelated topics

Theory: The Fisher-Rao metric is the unique Riemannian metric on the statistical manifold that is invariant under sufficient statistics (Chentsov’s theorem, 1982). For categorical distributions, it has a closed form via the Bhattacharyya angle:

$$d_{FR}(p, q) = 2 \arccos\left(\sum_i \sqrt{p_i \cdot q_i}\right)$$

Properties:

  • Symmetric, metric (satisfies triangle inequality)
  • Bounded: $[0, \pi]$ ($0$ = identical, $\pi$ = disjoint support)
  • Invariant under reparametrization of the probability space
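
The closed form is short enough to sketch directly. This assumes both inputs are normalized histograms of equal length; the clamp guards against floating-point sums that land marginally outside $[0, 1]$ (`fisher_rao` is a hypothetical name):

```python
import math


def fisher_rao(p, q):
    """Fisher-Rao distance between categorical distributions p and q."""
    bhattacharyya = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp before arccos to absorb rounding error.
    return 2.0 * math.acos(min(1.0, max(0.0, bhattacharyya)))


same = fisher_rao([0.5, 0.5], [0.5, 0.5])
disjoint = fisher_rao([1.0, 0.0], [0.0, 1.0])
partial = fisher_rao([0.7, 0.3], [0.4, 0.6])
```

Identical distributions give the lower bound, distributions with disjoint support give the upper bound, and everything else falls in between.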

When to use vs. Wasserstein: Fisher-Rao is the mathematically natural distance between distributions. Wasserstein additionally incorporates the geometry of the support space (region centroids). Use Fisher-Rao for pure distributional comparison; Wasserstein when the spatial structure of regions matters.

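Theory: Related to Fisher-Rao through the Bhattacharyya coefficient $BC(p, q) = \sum_i \sqrt{p_i \cdot q_i}$: $H(p, q) = \sqrt{\frac{1 - BC(p,q)}{2}}$. Note that with the factor of $\frac{1}{2}$ under the root, the maximum (disjoint support, $BC = 0$) is $1/\sqrt{2}$; the $[0, 1]$ bound corresponds to the standard normalization $H(p, q) = \sqrt{1 - BC(p,q)}$.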


Theory: Extracts features from the timing of events, independent of vector content. The inter-event intervals themselves encode behavioral patterns modeled by temporal point process theory.

Output: Dictionary with 11 features:

| Feature | Formula | Interpretation |
|---|---|---|
| burstiness | $B = \frac{\sigma - \mu}{\sigma + \mu}$ | $-1$ = perfectly regular, $0$ = Poisson (random), $+1$ = maximally bursty |
| memory | Autocorrelation at lag 1 | $> 0$: short gaps follow short gaps (clustering). $< 0$: alternating |
| temporal_entropy | $H = -\sum p_i \log p_i$ | Higher = more unpredictable timing |
| intensity_trend | Slope of event rate | Positive = accelerating, negative = decelerating |
| gap_cv | $\sigma_{gap} / \mu_{gap}$ | Coefficient of variation of intervals |
| circadian_strength | 24h Fourier amplitude | $0$ = no daily rhythm, $1$ = strong rhythm |
| max_gap | Longest silence | Withdrawal period duration |
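
Several of these features depend only on the inter-event gaps, so they can be sketched in a few lines. The code below is an illustration under assumptions: `interval_features` and `lag1_correlation` are hypothetical names, timestamps are sorted and strictly increasing, population statistics are used, and memory is defined as 0 when consecutive gaps have no variance.

```python
import math


def lag1_correlation(gaps):
    """Pearson correlation between each gap and the one that follows it."""
    first, second = gaps[:-1], gaps[1:]
    if len(first) < 2:
        return 0.0
    mean_1, mean_2 = sum(first) / len(first), sum(second) / len(second)
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    if var_1 == 0.0 or var_2 == 0.0:
        return 0.0  # convention chosen for this sketch
    return cov / math.sqrt(var_1 * var_2)


def interval_features(timestamps):
    """Burstiness, memory, gap CV and max gap from sorted event times."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean = sum(gaps) / len(gaps)
    std = math.sqrt(sum((g - mean) ** 2 for g in gaps) / len(gaps))
    return {
        "burstiness": (std - mean) / (std + mean),
        "memory": lag1_correlation(gaps),
        "gap_cv": std / mean,
        "max_gap": max(gaps),
    }


regular = interval_features([0, 1, 2, 3, 4, 5])
alternating = interval_features([0, 1, 6, 7, 12, 13, 18])
```

Perfectly regular events sit at the lower end of the burstiness scale, and a short-long-short-long rhythm produces a negative memory coefficient, matching the interpretations in the table.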

Applications:

  • Mental health: Night posting ratio ($d = 0.534$, $p < 0.001$ on eRisk) and burstiness are behavioral biomarkers of depression. Circadian disruption is a known clinical marker.
  • Cybersecurity: Bursty network activity patterns indicate automated attacks or data exfiltration
  • Evolutionary computation: Accelerating event rate indicates convergence; stagnation shows flattening
  • Finance: Trading bursts precede volatility spikes; memory coefficient detects algorithmic trading patterns

References: Goh & Barabási (2008) introduced burstiness/memory. Hawkes (1971) formalized self-exciting processes.


cvx.topological_features(points, n_radii=20, persistence_threshold=0.1)

Theory: Persistent homology from Topological Data Analysis (TDA) tracks how the topology of a point cloud changes as a filtration radius grows. For dimension 0 (connected components):

  • Build a Vietoris-Rips complex: connect points within radius $r$
  • Track when components appear (birth) and merge (death)
  • Betti number $\beta_0(r)$ = number of connected components at radius $r$

Implemented via Union-Find on the pairwise distance graph (single-linkage equivalent). Applied on region centroids ($K \sim 80$) for tractability.

Output: Dictionary with:

  • n_components: significant clusters (persistence $>$ threshold)
  • max_persistence: most prominent topological gap
  • persistence_entropy: $-\sum \frac{p_i}{P} \log \frac{p_i}{P}$, the uniformity of feature lifetimes
  • betti_curve: $\beta_0(r)$ sampled at n_radii points
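
The dimension-0 computation can be sketched with Union-Find. This is an illustration under assumptions: Euclidean distance, every component born at radius 0, two points connected once their distance is at most $r$, and hypothetical helper names (`h0_persistence`, `betti0`, `persistence_entropy`). Thresholding and radius sampling are left out.

```python
import math


def h0_persistence(points):
    """Death radii of connected components (all born at radius 0)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one of them dies
            deaths.append(length)
    return deaths  # n - 1 finite lifetimes; one component never dies


def betti0(deaths, radius):
    """Number of connected components at the given radius."""
    return 1 + sum(1 for d in deaths if d > radius)


def persistence_entropy(deaths):
    """Entropy of the normalized finite lifetimes."""
    total = sum(deaths)
    return -sum((d / total) * math.log(d / total) for d in deaths if d > 0)


cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
deaths = h0_persistence(cloud)
entropy = persistence_entropy(deaths)
```

The long-lived component (the gap between the two pairs) dominates the lifetimes, so the entropy is below that of three equal lifetimes, reflecting one clear cluster separation.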

What topology reveals that geometry doesn’t:

  • Increasing $\beta_0$ over time → fragmentation (topic space splitting)
  • Decreasing $\beta_0$ → convergence (topics merging)
  • High persistence entropy → uniform cluster structure (no dominant gap)
  • Low persistence entropy → clear cluster separation (one dominant gap)

Applications:

  • Drug discovery: Track how the active chemical space fragments or converges during optimization campaigns
  • Molecular dynamics: Detect when the conformational landscape develops new basins (new $\beta_0$)
  • MAP-Elites: Monitor archive topology — is the solution space connected or fragmented?
  • MLOps: Detect structural changes in embedding space topology (not just distributional shifts)

References: Edelsbrunner & Harer (2010). Zigzag persistence for temporal networks (EPJ Data Science, 2023).


See RFC-006: Anchor Projection for design rationale and clinical validation.

cvx.project_to_anchors(trajectory, anchors, metric='cosine')

Theory: Projects a trajectory from absolute embedding space $\mathbb{R}^D$ into an anchor-relative coordinate system $\mathbb{R}^K$, where $K$ is the number of anchors. Each output dimension $k$ is the distance (cosine or L2) from the trajectory point to anchor $k$:

$$\text{projected}_t[k] = d(\mathbf{x}_t, \mathbf{a}_k), \quad k = 1, \ldots, K$$

This is a coordinate system change, not a new analytics paradigm. The output is itself a trajectory, so all existing CVX functions (velocity, hurst_exponent, detect_changepoints, path_signature, etc.) compose with it natively.

Output: $(T, K)$ array, i.e. a trajectory in $\mathbb{R}^K$ where each dimension represents distance to the corresponding anchor.
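
A minimal sketch of the projection with the cosine metric, assuming cosine distance means one minus cosine similarity and that no vector has zero norm. `project_onto_anchors` is a hypothetical stand-in written for illustration, not the library function.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def project_onto_anchors(trajectory, anchors):
    """Map T points in R^D to T rows of K anchor distances."""
    return [[cosine_distance(x, a) for a in anchors] for x in trajectory]


trajectory = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
anchors = [[1.0, 0.0], [0.0, 1.0]]
projected = project_onto_anchors(trajectory, anchors)
```

The result has one row per time step and one column per anchor; in this toy example the entity starts on the first anchor, passes midway between the two, and ends on the second.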

Applications:

  • Mental health: Measure drift toward/away from clinical poles (depression, anxiety, neutral). Anchor-relative features improved F1 from 0.600 to 0.781 on eRisk
  • Finance: Track portfolio proximity to sector archetypes (tech-heavy, defensive, balanced)
  • Drug discovery: Monitor compound evolution relative to known active/toxic/selective reference molecules
  • MLOps: Measure model embedding drift relative to known-good and known-bad reference distributions

Theory: Aggregates per-anchor distance dynamics into a fixed-size summary. For each anchor $k$, computes statistics over the projected trajectory’s $k$-th dimension.

Output: Dictionary per anchor with:

| Statistic | Formula | Interpretation |
|---|---|---|
| mean | $\bar{d}_k = \frac{1}{T}\sum_t d_k(t)$ | Average proximity to anchor $k$ |
| min | $\min_t d_k(t)$ | Closest approach to anchor $k$ |
| trend | Linear slope of $d_k(t)$ | Positive = drifting away, negative = approaching |
| last | $d_k(T)$ | Current distance to anchor $k$ |
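
The four statistics can be sketched for a projected trajectory given as a list of rows (one per time step, one column per anchor). Assumptions in this sketch: time steps are evenly spaced, so the trend is the least-squares slope against the step index, and the result is a list with one dictionary per anchor. `summarize_anchors` is a hypothetical name.

```python
def summarize_anchors(projected):
    """Per-anchor mean, min, trend (least-squares slope) and last value."""
    steps = len(projected)
    if steps < 2:
        raise ValueError("need at least 2 time steps to fit a slope")
    t_mean = (steps - 1) / 2.0
    t_var = sum((t - t_mean) ** 2 for t in range(steps))
    summary = []
    for k in range(len(projected[0])):
        distances = [row[k] for row in projected]
        d_mean = sum(distances) / steps
        slope = sum((t - t_mean) * (d - d_mean)
                    for t, d in enumerate(distances)) / t_var
        summary.append({
            "mean": d_mean,
            "min": min(distances),
            "trend": slope,
            "last": distances[-1],
        })
    return summary


projected = [[0.9, 0.2], [0.7, 0.2], [0.5, 0.2], [0.3, 0.2]]
stats = summarize_anchors(projected)
```

In this toy input the entity steadily approaches the first anchor (negative trend) while its distance to the second anchor stays flat.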

Applications:

  • Mental health: Trend toward depression anchor with decreasing min signals progressive deterioration
  • Finance: last vs mean reveals whether current positioning is typical or extreme
  • MLOps: Rising trend across all anchors indicates the model is entering an out-of-distribution region

Theory: Linear extrapolation from trajectory history. For Neural ODE prediction, use TemporalIndex(model_path="model.pt").predict().
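
One way to read "linear extrapolation from trajectory history" is a per-dimension least-squares line evaluated at a future time. The source does not say which fit is used, so the sketch below is an assumption throughout, including the name `linear_extrapolate` and its arguments.

```python
def linear_extrapolate(timestamps, trajectory, target_time):
    """Fit a least-squares line per dimension and evaluate it at target_time."""
    n = len(timestamps)
    if n < 2:
        raise ValueError("need at least 2 points to fit a line")
    t_mean = sum(timestamps) / n
    t_var = sum((t - t_mean) ** 2 for t in timestamps)
    prediction = []
    for dim in range(len(trajectory[0])):
        values = [point[dim] for point in trajectory]
        v_mean = sum(values) / n
        slope = sum((t - t_mean) * (v - v_mean)
                    for t, v in zip(timestamps, values)) / t_var
        prediction.append(v_mean + slope * (target_time - t_mean))
    return prediction


history_times = [0.0, 1.0, 2.0]
history = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
forecast = linear_extrapolate(history_times, history, target_time=4.0)
```

A straight-line fit like this ignores curvature in the trajectory, so the forecast horizon should stay short relative to the observed history.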


| Function | Mental Health | Finance | Drug Discovery | MAP-Elites | Molecular Dynamics | MLOps |
|---|---|---|---|---|---|---|
| velocity | Crisis speed | Regime transition | Optimization speed | Exploration rate | Transition speed | Drift rate |
| hurst_exponent | Anti-persistence (d=-0.41) | Trending vs reverting | Convergence | Stagnation | Basin stability | Drift persistence |
| detect_changepoints | Behavioral regime change | Market regime | Campaign phase | Archive reorganization | Conformational transition | Drift onset |
| path_signature | Trajectory pattern matching | Path-dependent options | Campaign similarity | Solution lineage | Folding pathway | Training dynamics |
| frechet_distance | Similar user trajectories | Portfolio path comparison | Campaign comparison | Archive comparison | Trajectory similarity | Model comparison |
| wasserstein_drift | Topical migration | Sector rotation | Chemical space shift | Niche redistribution | State redistribution | Feature drift |
| fisher_rao_distance | Distribution shift | Risk profile change | Activity profile change | Diversity metric | State population change | Class balance drift |
| event_features | Night posting, burstiness | Trading patterns | Experiment cadence | Generation dynamics | Simulation events | Request patterns |
| topological_features | Topic fragmentation | Market structure | Landscape topology | Archive connectivity | Conformational basins | Embedding structure |
| project_to_anchors | Drift toward clinical poles | Sector proximity | Reference compound distance | Archetype tracking | State proximity | Distribution drift |
| anchor_summary | Deterioration trend | Position vs. norm | Campaign summary | Exploration summary | Basin residence | Drift summary |