Interpretability & Visualization

ChronosVector produces rich temporal signals — velocity, acceleration, change points, drift, trajectories, predictions. But these signals are raw data. An ML engineer monitoring drift does not want a 768-dimensional velocity vector; they want to know which dimensions changed, why those dimensions matter, and what action to take.

The interpretability layer transforms raw analytics into structured artifacts that humans can understand and that any frontend can consume: Grafana, Jupyter, React, Streamlit.

“Data for interpretation, not graphics.” The cvx-explain crate does not render SVGs or HTML. It produces structured JSON/protobuf that any frontend can render. CVX’s responsibility ends at producing the data; rendering is the consumer’s job.

cvx-explain is a library crate that transforms outputs from cvx-analytics and cvx-query into interpretable artifacts. It does not access the index directly — it consumes query results and transforms them, maintaining clean separation between computation and presentation.

Drift Attribution

Question it answers: “My entity drifted — but which dimensions drove the change?”

Given a drift between timestamps t₁ and t₂, this artifact identifies which embedding dimensions contributed most. The algorithm computes the per-dimension absolute delta |v_{t₂}[d] − v_{t₁}[d]| for each dimension d, ranks dimensions by their contribution to the total drift, and calculates a cumulative Pareto distribution.
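The ranking and Pareto steps can be sketched in a few lines of Rust. This is an illustrative sketch, not the cvx-explain API: `drift_attribution` and `Contribution` are invented names, and snapshots are assumed to be plain `Vec<f64>` embeddings.

```rust
/// Per-dimension contribution to the drift between two snapshots.
/// Illustrative sketch; not the actual cvx-explain types.
struct Contribution {
    dimension_index: usize,
    contribution_fraction: f64, // fraction of total |delta| mass
    direction: f64,             // signed delta, for "increased/decreased"
}

/// Returns contributions ranked largest-first, plus the Pareto-80 count:
/// how many top dimensions cover 80% of the total drift.
fn drift_attribution(v_t1: &[f64], v_t2: &[f64]) -> (Vec<Contribution>, usize) {
    // Signed per-dimension delta and total L1 drift mass.
    let deltas: Vec<f64> = v_t1.iter().zip(v_t2).map(|(a, b)| b - a).collect();
    let total: f64 = deltas.iter().map(|d| d.abs()).sum();

    let mut contributions: Vec<Contribution> = deltas
        .iter()
        .enumerate()
        .map(|(i, &d)| Contribution {
            dimension_index: i,
            contribution_fraction: if total > 0.0 { d.abs() / total } else { 0.0 },
            direction: d,
        })
        .collect();

    // Rank by contribution, largest first.
    contributions.sort_by(|a, b| {
        b.contribution_fraction
            .partial_cmp(&a.contribution_fraction)
            .unwrap()
    });

    // Cumulative Pareto: smallest prefix covering 80% of the drift.
    let mut cumulative = 0.0;
    let mut pareto_80 = contributions.len();
    for (i, c) in contributions.iter().enumerate() {
        cumulative += c.contribution_fraction;
        if cumulative >= 0.8 {
            pareto_80 = i + 1;
            break;
        }
    }
    (contributions, pareto_80)
}
```

Both passes are linear in D, which is where the O(D) complexity below comes from.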

Example output:

{
  "entity_id": 42,
  "from_timestamp": 1640000000,
  "to_timestamp": 1700000000,
  "total_magnitude": 0.47,
  "pareto_80_count": 23,
  "dimension_contributions": [
    { "dimension_index": 42, "label": "medical", "contribution_fraction": 0.12, "direction": 0.34 },
    { "dimension_index": 157, "label": "technology", "contribution_fraction": 0.09, "direction": -0.21 },
    { "dimension_index": 384, "label": null, "contribution_fraction": 0.07, "direction": 0.15 }
  ]
}

Interpretation: “80% of the drift is concentrated in 23 of 768 dimensions. Dimensions [42, 157, 384] are the largest contributors.” When dimension labels are available, this becomes: “Medical dimensions increased by 340%.”

Complexity: O(D), where D is the embedding dimensionality. Target latency: < 5 ms for D = 768.

Trajectory Projection

Question it answers: “What does this entity’s evolution look like?”

Projects a high-dimensional trajectory to 2D or 3D for visualization. Two methods are available:

  • PCA — deterministic, fast, linear. Good default. Variance explained is returned so you know how much information is preserved.
  • UMAP — non-linear, preserves local neighborhood structure. Better for discovering clusters but stochastic and slower.

Each projected point retains its timestamp, enabling animated trajectory rendering. Optionally, kNN neighbors at each timestamp can be included to show how the entity’s neighborhood evolves.

{
  "entity_id": 42,
  "projection_method": "pca",
  "target_dims": 2,
  "variance_explained": [0.42, 0.18],
  "points": [
    { "timestamp": 1640000000, "coords": [0.12, -0.34] },
    { "timestamp": 1650000000, "coords": [0.45, -0.22] },
    { "timestamp": 1660000000, "coords": [0.78, 0.11] }
  ]
}

Interpretation: “You can see how ‘machine learning’ moved from the ‘statistics’ region toward the ‘deep learning’ region between 2015 and 2020.”
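For intuition, here is a minimal version of the PCA path: center the trajectory, find the top two principal directions by power iteration with deflation, and report coordinates plus the fraction of variance each component explains. This is a sketch under simplifying assumptions (`pca_project_2d` is an invented name; a production implementation would likely use an SVD from a linear-algebra crate and handle degenerate inputs).

```rust
/// Project a trajectory (one embedding per timestamp) onto its top-2
/// principal components. Sketch only: power iteration with deflation,
/// deterministic start vectors, no handling of zero-variance input.
fn pca_project_2d(points: &[Vec<f64>]) -> (Vec<[f64; 2]>, [f64; 2]) {
    let n = points.len();
    let d = points[0].len();

    // Center the trajectory around its mean embedding.
    let mut mean = vec![0.0; d];
    for p in points {
        for (m, x) in mean.iter_mut().zip(p) { *m += x / n as f64; }
    }
    let centered: Vec<Vec<f64>> = points
        .iter()
        .map(|p| p.iter().zip(&mean).map(|(x, m)| x - m).collect())
        .collect();

    let total_var: f64 = centered.iter().flatten().map(|x| x * x).sum::<f64>() / n as f64;

    let mut components: Vec<Vec<f64>> = Vec::new();
    let mut variances = [0.0; 2];
    let mut data = centered.clone();
    for k in 0..2 {
        let mut v = vec![0.0; d];
        v[k] = 1.0; // deterministic start vector
        for _ in 0..200 {
            // w = C v = (1/n) X^T (X v), as two matrix-vector products.
            let scores: Vec<f64> = data.iter().map(|row| dot(row, &v)).collect();
            let mut w = vec![0.0; d];
            for (row, s) in data.iter().zip(&scores) {
                for (wi, xi) in w.iter_mut().zip(row) { *wi += s * xi / n as f64; }
            }
            let norm = dot(&w, &w).sqrt();
            if norm < 1e-12 { break; } // no variance left to extract
            for (vi, wi) in v.iter_mut().zip(&w) { *vi = wi / norm; }
        }
        // Variance explained = mean squared score, as a fraction of total.
        let scores: Vec<f64> = data.iter().map(|row| dot(row, &v)).collect();
        variances[k] = scores.iter().map(|s| s * s).sum::<f64>() / n as f64 / total_var;
        // Deflate: remove this component before finding the next one.
        for (row, s) in data.iter_mut().zip(&scores) {
            for (xi, vi) in row.iter_mut().zip(&v) { *xi -= s * vi; }
        }
        components.push(v);
    }

    let coords = centered
        .iter()
        .map(|row| [dot(row, &components[0]), dot(row, &components[1])])
        .collect();
    (coords, variances)
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

The returned variance fractions are what populate `variance_explained` in the example output above, so a consumer can tell how faithful the 2D picture is.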

Change-Point Narrative

Question it answers: “When did this entity change, and what happened at each change point?”

Transforms a raw list of change points into a timeline annotated with interpretive context. For each change point, the artifact includes:

  • Severity score and z-score (relative to the entity’s historical volatility)
  • Top-K dimensions that changed
  • kNN neighbors before and after the change (showing how the entity’s context shifted)
  • An optional human-readable narrative (when dimension labels are available)

Periods between change points are reported as “stable segments” with summary statistics.

{
  "entity_id": 42,
  "change_points": [
    {
      "timestamp": 1584230400,
      "severity": 0.92,
      "z_score": 3.7,
      "method": "pelt",
      "top_dimensions": [
        { "dimension_index": 42, "label": "health", "absolute_delta": 0.34 }
      ],
      "neighbors_before": [
        { "entity_id": 101, "label": "beer", "distance": 0.12 }
      ],
      "neighbors_after": [
        { "entity_id": 201, "label": "COVID", "distance": 0.08 }
      ],
      "narrative": "Severe change in March 2020. Health-related dimensions increased dramatically. Nearest neighbors shifted from [beer, royal, solar] to [COVID, pandemic, virus]."
    }
  ],
  "stable_segments": [
    { "from_timestamp": 1577836800, "to_timestamp": 1584230400, "mean_drift_rate": 0.02, "volatility": 0.005 }
  ]
}
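The z-score and stable-segment computations can be sketched as follows, assuming a per-step drift-magnitude series and change points given as indices into it. `summarize` and `Segment` are invented names for illustration, not the cvx-explain API.

```rust
/// Summary statistics for a period between change points.
struct Segment {
    from: usize,
    to: usize,
    mean_drift_rate: f64,
    volatility: f64, // std of the drift rate within the segment
}

/// For each change point, the z-score of its drift step relative to the
/// entity's historical volatility; plus stats for each stable segment.
fn summarize(drift: &[f64], change_points: &[usize]) -> (Vec<f64>, Vec<Segment>) {
    // Historical mean and volatility over the whole series.
    let n = drift.len() as f64;
    let mean = drift.iter().sum::<f64>() / n;
    let std = (drift.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n).sqrt();

    let z_scores = change_points.iter().map(|&i| (drift[i] - mean) / std).collect();

    // Stable segments are the stretches between consecutive change points.
    let mut bounds = vec![0];
    bounds.extend_from_slice(change_points);
    bounds.push(drift.len());
    let segments = bounds
        .windows(2)
        .map(|w| {
            let slice = &drift[w[0]..w[1]];
            let m = slice.iter().sum::<f64>() / slice.len() as f64;
            let v = (slice.iter().map(|x| (x - m).powi(2)).sum::<f64>()
                / slice.len() as f64)
                .sqrt();
            Segment { from: w[0], to: w[1], mean_drift_rate: m, volatility: v }
        })
        .collect();
    (z_scores, segments)
}
```

A z-score like the 3.7 in the example means the change was 3.7 historical standard deviations above the entity's typical drift step.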

Cohort Divergence

Question it answers: “When did these entities start diverging from each other?”

For a set of entities, computes pairwise distance time series and detects divergence/convergence events using PELT on the distance series.

{
  "entity_ids": [101, 102, 103],
  "events": [
    {
      "entity_a": 101,
      "entity_b": 102,
      "timestamp": 1640000000,
      "event_type": "Divergence",
      "magnitude": 0.35
    }
  ]
}

Interpretation: “‘ML’ and ‘AI’ were converging until 2022, then diverged significantly. ‘Deep Learning’ and ‘Neural Networks’ remain stably related.”

For large cohorts (>100 entities), representative entities per cluster are computed first to keep costs manageable.
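The first stage, building the pairwise distance series, can be sketched like this. It is an illustrative helper, not the cvx-explain API: the cohort is assumed to be aligned snapshots (`cohort[e][t]` is entity e's embedding at time step t), and the PELT pass over each series is not shown.

```rust
/// Euclidean distance time series for every entity pair in a cohort.
/// Change-point detection (PELT) then runs on each returned series to
/// find divergence/convergence events. Sketch only.
fn pairwise_distance_series(cohort: &[Vec<Vec<f64>>]) -> Vec<((usize, usize), Vec<f64>)> {
    let mut series = Vec::new();
    for a in 0..cohort.len() {
        for b in (a + 1)..cohort.len() {
            // Distance between entities a and b at each aligned time step.
            let dists: Vec<f64> = cohort[a]
                .iter()
                .zip(&cohort[b])
                .map(|(va, vb)| {
                    va.iter().zip(vb).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
                })
                .collect();
            series.push(((a, b), dists));
        }
    }
    series
}
```

The pair count grows quadratically with cohort size, which is why the representative-entity shortcut above matters for cohorts beyond ~100 entities.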

Dimension Heatmap

Question it answers: “Which dimensions are active at which times?”

Produces a matrix (time bins × dimensions) showing the intensity of change per dimension over time. Three variants are available:

| Variant | What it measures |
|---|---|
| Absolute change | Magnitude of delta per dimension per time bin |
| Relative change | Delta normalized by that dimension’s historical std |
| Cumulative change | Running sum of absolute deltas |

The result is a heatmap where “hot bands” reveal which aspects of the embedding are active in each period.
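The absolute-change variant reduces to binning consecutive deltas, which can be sketched as follows. The signature is invented for illustration; bins are assumed to be fixed-width over the snapshot time range, and the relative variant would additionally divide each cell by that dimension's historical std.

```rust
/// Absolute-change heatmap: sum of |delta| per dimension per time bin.
/// `snapshots` are (timestamp, embedding) pairs in time order. Sketch only.
fn absolute_change_heatmap(
    snapshots: &[(u64, Vec<f64>)],
    num_bins: usize,
) -> Vec<Vec<f64>> {
    let t0 = snapshots.first().unwrap().0;
    let t1 = snapshots.last().unwrap().0 + 1;
    let width = ((t1 - t0) as f64 / num_bins as f64).ceil() as u64;
    let dims = snapshots[0].1.len();

    // heat[bin][dim] accumulates change intensity.
    let mut heat = vec![vec![0.0; dims]; num_bins];
    for pair in snapshots.windows(2) {
        let (_, prev) = &pair[0];
        let (ts, cur) = &pair[1];
        // Attribute each delta to the bin of its later timestamp.
        let bin = (((ts - t0) / width) as usize).min(num_bins - 1);
        for d in 0..dims {
            heat[bin][d] += (cur[d] - prev[d]).abs();
        }
    }
    heat
}
```

Rows with consistently large values are the “hot bands” the rendered heatmap reveals.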

Prediction Explanation

Question it answers: “How confident is this Neural ODE prediction, and where is it uncertain?”

Makes the Neural ODE output interpretable by providing:

  • Fan chart data: historical trajectory + prediction cone with expanding confidence intervals
  • Per-dimension uncertainty: which dimensions are most/least certain in the prediction
  • Baseline comparison: Neural ODE vs linear extrapolation vs historical mean
  • Trajectory dynamics: is the entity accelerating, decelerating, or stable?

Interpretation: “The prediction for ‘transformer’ in 2027 has high confidence in syntactic dimensions (±0.02) but low confidence in application dimensions (±0.15). The trajectory shows deceleration — the concept is stabilizing.”
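One common way to obtain per-dimension uncertainty is the spread of an ensemble of sampled predictions (e.g. Monte Carlo rollouts). The sketch below shows that reduction; `per_dimension_uncertainty` is an invented name, and the real artifact additionally carries fan-chart quantiles and the baseline comparisons listed above.

```rust
/// Per-dimension standard deviation across an ensemble of predicted
/// embeddings for the same target timestamp. Sketch only.
fn per_dimension_uncertainty(samples: &[Vec<f64>]) -> Vec<f64> {
    let s = samples.len() as f64;
    let dims = samples[0].len();
    (0..dims)
        .map(|d| {
            // Mean and std of dimension d across all sampled predictions.
            let mean = samples.iter().map(|p| p[d]).sum::<f64>() / s;
            (samples.iter().map(|p| (p[d] - mean).powi(2)).sum::<f64>() / s).sqrt()
        })
        .collect()
}
```

Sorting the result identifies the most and least certain dimensions; evaluating it at successive prediction horizons yields the expanding cone of the fan chart.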

API Endpoints

All endpoints are under the /v1/explain/ prefix:

| Endpoint | Method | Artifact |
|---|---|---|
| /v1/explain/entities/{id}/drift-attribution | GET | DriftAttribution |
| /v1/explain/entities/{id}/trajectory-projection | GET | ProjectedTrajectory |
| /v1/explain/entities/{id}/changepoint-narrative | GET | AnnotatedTimeline |
| /v1/explain/entities/{id}/dimension-heatmap | GET | DimensionHeatmap |
| /v1/explain/entities/{id}/prediction | GET | PredictionExplanation |
| /v1/explain/cohort-divergence | POST | CohortDivergenceMap |

A gRPC streaming endpoint WatchDriftExplained is also available for real-time drift attribution as events are detected.

Dimension Labels

Several artifacts are enriched with optional dimension labels — semantic names for each embedding dimension (e.g., dim[42] = “medical”). Without them, outputs use numeric indices. With them, narratives become human-readable: “Medical dimensions increased 340%” instead of “dim[42] increased 0.34”.

Labels are provided via configuration or entity schema and stored in the metadata column family.
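The fallback behavior amounts to a simple lookup; `dimension_name` below is an invented helper for illustration, assuming labels arrive as a map from dimension index to name.

```rust
use std::collections::HashMap;

/// Resolve a dimension's display name, falling back to the numeric
/// index form when no semantic label is configured. Sketch only.
fn dimension_name(labels: &HashMap<usize, String>, d: usize) -> String {
    labels
        .get(&d)
        .cloned()
        .unwrap_or_else(|| format!("dim[{}]", d))
}
```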

Performance Targets

| Operation | Latency target |
|---|---|
| Drift attribution (D = 768) | < 5 ms |
| PCA projection (1K points, D = 768) | < 50 ms |
| UMAP projection (1K points, D = 768) | < 2 s |
| Heatmap (365 days, daily, D = 768) | < 100 ms |
| Cohort divergence (10 entities, 365 days) | < 1 s |
| Prediction explanation | < 10 ms |