
Unified Theory: Temporal Vector Analytics for Intelligent Memory

Time converts vectors into trajectories. Trajectories have mathematical structure that static vectors cannot capture. Exploiting this structure produces better decisions than similarity-based retrieval alone.

ChronosVector (CVX) implements this thesis across seven layers (0–6), from raw storage to probabilistic reasoning. Each layer builds on the previous, and no layer alone is sufficient.


A temporal point $(e, t, \mathbf{v})$ captures an entity $e$ observed at time $t$ with embedding $\mathbf{v} \in \mathbb{R}^D$. A trajectory is a time-ordered sequence:

$$\mathcal{T}_e = \{(t_1, \mathbf{v}_1), (t_2, \mathbf{v}_2), \ldots, (t_n, \mathbf{v}_n)\}, \quad t_1 < t_2 < \cdots < t_n$$

Standard vector databases store only $\mathbf{v}$ — they discard $t$ and $e$. CVX preserves all three, enabling every subsequent layer.

  • HNSW index: $O(\log N)$ approximate nearest-neighbor search with SIMD distance kernels
  • Temporal filtering: RoaringBitmap pre-filter by time range (< 1 byte per vector)
  • Episode encoding: `entity_id = episode_id × 10000 + step_index` groups steps into episodes
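The episode encoding above can be sketched as a simple pack/unpack pair; the constant 10000 comes from the formula, while the helper names are illustrative:

```python
# Sketch of the episode encoding scheme: the constant 10000 caps
# episodes at 10,000 steps each. Helper names are illustrative.
STEPS_PER_EPISODE = 10_000

def encode_entity_id(episode_id: int, step_index: int) -> int:
    """Pack (episode, step) into a single integer entity_id."""
    assert 0 <= step_index < STEPS_PER_EPISODE
    return episode_id * STEPS_PER_EPISODE + step_index

def decode_entity_id(entity_id: int) -> tuple[int, int]:
    """Recover (episode_id, step_index) from an entity_id."""
    return divmod(entity_id, STEPS_PER_EPISODE)

# Example: step 42 of episode 7
eid = encode_entity_id(7, 42)            # → 70042
assert decode_entity_id(eid) == (7, 42)
```

Because the encoding is a plain integer, a RoaringBitmap range filter over entity ids doubles as an episode filter.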

Modern embedding models produce vectors in a narrow cone — all pairwise cosine similarities are $\sim 0.96$. Centering (subtracting the global mean $\boldsymbol{\mu}$) amplifies the discriminative signal 30×:

$$\mathbf{v}_{\text{centered}} = \mathbf{v} - \boldsymbol{\mu}, \quad \boldsymbol{\mu} = \frac{1}{N}\sum_{i=1}^N \mathbf{v}_i$$

This is not optional — without centering, all downstream analytics operate on noise (Ethayarajh, EMNLP 2019; Su et al., ACL 2021).
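A minimal sketch of the centering step, using synthetic anisotropic embeddings to show the effect (the dimension, cone magnitude, and seed are illustrative):

```python
import numpy as np

# Centering sketch: subtract the corpus mean so cosine similarity measures
# deviation from the shared "cone" direction rather than the cone itself.
def center(vectors: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    mu = vectors.mean(axis=0)
    return vectors - mu, mu

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Simulate anisotropy: one large shared component + small per-item signal.
shared = rng.normal(size=384)
X = shared + 0.05 * rng.normal(size=(100, 384))

raw_sim = cosine(X[0], X[1])          # ≈ 1.0: everything looks alike
Xc, mu = center(X)
centered_sim = cosine(Xc[0], Xc[1])   # near 0: discriminative signal exposed
assert raw_sim > 0.95
assert abs(centered_sim) < 0.3
```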


3. Layer 1: Differential Calculus on Trajectories


Treating trajectories as differentiable curves in $\mathbb{R}^D$ enables kinematic analysis:

$$\mathbf{v}'(t) \approx \frac{\mathbf{v}(t + \Delta t) - \mathbf{v}(t - \Delta t)}{2\Delta t}$$

The magnitude $\|\mathbf{v}'(t)\|$ measures the rate of semantic change. High velocity = rapid behavioral shift.

$$\text{drift}_{L2}(t_1, t_2) = \|\mathbf{v}(t_2) - \mathbf{v}(t_1)\|_2$$

Cumulative displacement from an initial state. When projected onto anchor vectors, drift measures proximity to domain-specific concepts (e.g., DSM-5 symptoms).
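Both kinematic quantities reduce to a few lines on a discretely sampled trajectory. This sketch assumes an `(n, D)` NumPy array; the helper names are mine, not CVX's API:

```python
import numpy as np

# Kinematics sketch on a sampled trajectory (assumed helpers, not CVX's API).
def velocity(traj: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Central-difference v'(t) at the interior points of an (n, D) trajectory."""
    return (traj[2:] - traj[:-2]) / (times[2:] - times[:-2])[:, None]

def drift_l2(traj: np.ndarray, i: int, j: int) -> float:
    """Cumulative displacement ||v(t_j) - v(t_i)||_2."""
    return float(np.linalg.norm(traj[j] - traj[i]))

# Straight-line motion at constant speed: velocity magnitude stays constant.
t = np.arange(5.0)
traj = np.outer(t, np.array([3.0, 4.0]))   # moves (3, 4) per unit time
speeds = np.linalg.norm(velocity(traj, t), axis=1)
assert np.allclose(speeds, 5.0)            # |(3, 4)| = 5 at every step
assert drift_l2(traj, 0, 4) == 20.0        # 4 time units × speed 5
```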

The Hurst exponent $H$, estimated via rescaled-range analysis, characterizes whether drift trends persist:

$$\mathbb{E}\left[\frac{R(n)}{S(n)}\right] \sim c \cdot n^H$$

$H > 0.5$: the trajectory is persistent (trends continue). $H < 0.5$: anti-persistent (trends reverse). $H = 0.5$: random walk.
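A rescaled-range sketch of the Hurst estimate, assuming a 1-D series such as per-step drift magnitudes (helper names and window sizes are illustrative; note that the classic R/S estimator runs slightly above 0.5 on short windows of white noise):

```python
import numpy as np

# Rescaled-range (R/S) sketch for the Hurst exponent of a 1-D series.
def rescaled_range(x: np.ndarray) -> float:
    """R/S statistic for one window."""
    y = np.cumsum(x - x.mean())   # mean-adjusted cumulative sum
    r = y.max() - y.min()         # range of the cumulative deviations
    return r / x.std()

def hurst(x: np.ndarray, window_sizes=(16, 32, 64, 128, 256)) -> float:
    """Slope of log E[R/S] vs log n estimates H."""
    log_n, log_rs = [], []
    for n in window_sizes:
        chunks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean([rescaled_range(c) for c in chunks])))
    return float(np.polyfit(log_n, log_rs, 1)[0])

rng = np.random.default_rng(1)
white = rng.normal(size=4096)   # increments of a random walk
h = hurst(white)                # ≈ 0.5-0.6 (small-sample bias runs high)
assert 0.3 < h < 0.75
```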

Changepoint detection finds the segmentation that minimizes a penalized cost:

$$\min_{\tau_1, \ldots, \tau_m} \sum_{i=1}^{m+1} \left[\mathcal{C}(\mathbf{y}_{\tau_{i-1}+1:\tau_i}) + \beta\right]$$

Detects moments when the statistical properties of the trajectory change abruptly — regime transitions, onset events, behavioral shifts (Killick et al., 2012).
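The penalized objective can be minimized exactly by dynamic programming over segment boundaries. This optimal-partitioning sketch uses within-segment squared error as the cost $\mathcal{C}$; PELT (Killick et al., 2012) adds pruning to reach linear cost:

```python
import numpy as np

# Optimal-partitioning sketch: exact DP minimization of the penalized
# segmentation cost, with C = within-segment sum of squared deviations.
def seg_cost(y: np.ndarray, i: int, j: int) -> float:
    """Cost of segment y[i:j] (half-open)."""
    seg = y[i:j]
    return float(((seg - seg.mean()) ** 2).sum())

def changepoints(y: np.ndarray, beta: float) -> list[int]:
    n = len(y)
    best = np.full(n + 1, np.inf)
    best[0] = -beta                  # so the first segment pays one beta
    prev = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + seg_cost(y, i, j) + beta
            if c < best[j]:
                best[j], prev[j] = c, i
    cps, j = [], n                   # backtrack the segment boundaries
    while j > 0:
        j = prev[j]
        if j:
            cps.append(j)
    return sorted(cps)

# Two clear regimes: the mean jumps from 0 to 5 at index 50.
y = np.concatenate([np.zeros(50), 5 * np.ones(50)])
assert changepoints(y, beta=1.0) == [50]
```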


The depth-$k$ truncated signature of a path $\mathbf{X}: [0,T] \to \mathbb{R}^D$ is:

$$S(\mathbf{X})^{i_1, \ldots, i_k} = \int_0^T \int_0^{t_k} \cdots \int_0^{t_2} dX^{i_1}_{t_1} \cdots dX^{i_k}_{t_k}$$

This is a universal, reparametrization-invariant descriptor: two trajectories with the same shape (regardless of speed) produce similar signatures. At depth 2, the $D + D^2$ features capture both displacement and signed area (rotation/order).
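For a piecewise-linear path the depth-2 signature has a closed form. This sketch (an assumed helper, not CVX's API) verifies the displacement and signed-area claims on a unit square:

```python
import numpy as np

# Depth-2 signature sketch for a piecewise-linear path. Level 1 is the total
# displacement; the antisymmetric part of level 2 is the signed (Levy) area.
def signature_depth2(path: np.ndarray) -> np.ndarray:
    """Depth-2 signature of an (n, D) path: D + D^2 features."""
    dx = np.diff(path, axis=0)        # piecewise-linear increments
    D = path.shape[1]
    s1 = dx.sum(axis=0)               # level 1: displacement
    s2 = np.zeros((D, D))             # level 2: iterated integrals S^{ij}
    pos = np.zeros(D)                 # displacement accumulated so far
    for step in dx:
        s2 += np.outer(pos, step) + 0.5 * np.outer(step, step)
        pos += step
    return np.concatenate([s1, s2.ravel()])

# Counter-clockwise unit square, traversed back to the start.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
sig = signature_depth2(square)
s12, s21 = sig[3], sig[4]                  # S^{12}, S^{21} for D = 2
assert np.allclose(sig[:2], 0.0)           # closed path: zero net displacement
assert np.isclose(0.5 * (s12 - s21), 1.0)  # Levy area = enclosed area = 1
```

Resampling the square with more points per edge leaves the signature unchanged, which is the reparametrization invariance in action.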

Persistent homology tracks the connectivity structure of trajectory point clouds:

$$\beta_0(r) = \text{number of connected components at radius } r$$

The persistence diagram encodes the birth/death of topological features across scales — more robust than single-scale clustering.
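A $\beta_0$ sketch via union-find on the radius-$r$ neighborhood graph (single-linkage); a real implementation would sweep $r$ and record component birth/death to build the persistence diagram:

```python
import numpy as np

# Beta_0 sketch: count connected components of the radius-r neighborhood
# graph with a union-find structure (0-dimensional persistence, one scale).
def betti0(points: np.ndarray, r: float) -> int:
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= r:
                parent[find(i)] = find(j)   # union: edge exists at scale <= r
    return len({find(i) for i in range(n)})

# Two tight clusters far apart: components merge only at large radius.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
assert betti0(pts, r=0.01) == 4   # every point isolated
assert betti0(pts, r=0.5) == 2    # each cluster is one component
assert betti0(pts, r=11.0) == 1   # clusters merge
```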


When trajectories are projected onto regions or anchors, they become probability distributions over discrete states. The geometry of distributions requires specialized metrics:

The geodesic on the statistical manifold of categorical distributions:

$$d_{FR}(\mathbf{p}, \mathbf{q}) = 2 \arccos\left(\sum_i \sqrt{p_i \cdot q_i}\right)$$

Unlike KL divergence, this is a true metric (symmetric, satisfies the triangle inequality). Range: $[0, \pi]$.
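A direct sketch of the formula; the clip guards against floating-point rounding at the boundaries:

```python
import numpy as np

# Fisher-Rao sketch on the probability simplex (assumed helper).
def fisher_rao(p: np.ndarray, q: np.ndarray) -> float:
    """Geodesic distance between categorical distributions; range [0, pi]."""
    bc = np.sqrt(p * q).sum()                            # Bhattacharyya coefficient
    return float(2 * np.arccos(np.clip(bc, 0.0, 1.0)))  # clip: rounding guard

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
assert np.isclose(fisher_rao(p, p), 0.0)               # identity of indiscernibles
assert np.isclose(fisher_rao(p, q), np.pi)             # disjoint supports: max
assert np.isclose(fisher_rao(p, q), fisher_rao(q, p))  # symmetric
```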

The optimal transport cost, respecting the geometry of the underlying space:

$$W(\mathbf{p}, \mathbf{q}) = \inf_{\gamma \in \Pi(\mathbf{p}, \mathbf{q})} \int \|\mathbf{x} - \mathbf{y}\| \, d\gamma(\mathbf{x}, \mathbf{y})$$

Unlike Fisher-Rao, Wasserstein accounts for how far apart the categories are — moving mass between nearby regions costs less than between distant ones.
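In one dimension, over ordered bins, $W_1$ reduces to the $L_1$ distance between CDFs, which makes the contrast with Fisher-Rao easy to see (the bin layout here is illustrative):

```python
import numpy as np

# 1-D Wasserstein-1 sketch over ordered, unit-spaced bins: W1 equals the
# L1 distance between CDFs, so it sees *how far* mass has to move.
def wasserstein_1d(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

# All mass in bin 0, vs bin 1 (near), vs bin 9 (far):
p    = np.array([1.0] + [0.0] * 9)
near = np.roll(p, 1)
far  = np.roll(p, 9)
assert wasserstein_1d(p, near) == 1.0   # move mass one bin over
assert wasserstein_1d(p, far) == 9.0    # nine bins over: 9x the cost
# Fisher-Rao (and KL) would score both moves identically, since the
# supports are disjoint either way.
```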


Each entity’s trajectory has an intrinsic order — step $n$ precedes step $n+1$. Temporal edges (TemporalEdgeLayer) encode this:

  • `successor(node)` → what happened next for this entity
  • `predecessor(node)` → what happened before
  • `causal_search(query, k, temporal_context)` → find similar states + walk forward/backward

This enables the continuation pattern: “find someone who was in my situation, then show me what they did next.”
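The successor/predecessor operations reduce to "same entity, next step". This in-memory sketch is illustrative, not the TemporalEdgeLayer implementation (which walks edges on top of the HNSW index):

```python
# Illustrative in-memory sketch of temporal succession (not CVX's API).
class TemporalEdges:
    def __init__(self):
        self.steps = {}   # entity -> time-ordered list of node ids

    def add(self, entity: int, node: str) -> None:
        self.steps.setdefault(entity, []).append(node)

    def successor(self, entity: int, node: str):
        """What happened next for this entity, or None at the end."""
        seq = self.steps.get(entity, [])
        i = seq.index(node)
        return seq[i + 1] if i + 1 < len(seq) else None

    def predecessor(self, entity: int, node: str):
        """What happened before, or None at the start."""
        seq = self.steps.get(entity, [])
        i = seq.index(node)
        return seq[i - 1] if i > 0 else None

edges = TemporalEdges()
for node in ["open_fridge", "take_apple", "heat_apple"]:
    edges.add(entity=7, node=node)
assert edges.successor(7, "take_apple") == "heat_apple"
assert edges.predecessor(7, "take_apple") == "open_fridge"
assert edges.successor(7, "heat_apple") is None
```

The continuation pattern is then: similarity search finds `take_apple`-like states, and `successor` reveals what those entities did next.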

Beyond temporal succession, typed edges encode relational structure:

| Edge Type | Meaning | Example |
| --- | --- | --- |
| CausedSuccess | This action contributed to a win | retrieve + follow → win → edge |
| CausedFailure | This action was present during failure | retrieve + follow → fail → edge |
| SameActionType | Same abstract action in different contexts | “navigate” in episode A ↔ “navigate” in episode B |
| RegionTransition | Observed movement between semantic clusters | Region 5 → Region 12 with probability 0.7 |

For cross-entity causal discovery:

$$X \xrightarrow{\text{Granger}} Y \iff P(Y_t \mid Y_{<t}, X_{<t}) \neq P(Y_t \mid Y_{<t})$$

“Does entity A’s trajectory history improve prediction of entity B’s future?” — implemented as F-test on lagged regression residuals.
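A sketch of the lagged-regression F-statistic: fit $Y_t$ on its own lags (restricted) and on its own plus $X$'s lags (unrestricted), then compare residuals. The helper name and lag count are assumptions:

```python
import numpy as np

# Granger F-statistic sketch: restricted (Y's own lags) vs unrestricted
# (Y's and X's lags) least-squares fits, compared via residual sums.
def granger_f(x: np.ndarray, y: np.ndarray, lags: int = 2) -> float:
    n = len(y) - lags
    Y = y[lags:]
    own = np.column_stack([y[lags - k:-k] for k in range(1, lags + 1)])
    cross = np.column_stack([x[lags - k:-k] for k in range(1, lags + 1)])
    ones = np.ones((n, 1))

    def rss(A: np.ndarray) -> float:
        beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
        return float(((Y - A @ beta) ** 2).sum())

    rss_r = rss(np.hstack([ones, own]))           # restricted model
    rss_u = rss(np.hstack([ones, own, cross]))    # unrestricted model
    df1, df2 = lags, n - 2 * lags - 1
    return (rss_r - rss_u) / df1 / (rss_u / df2)  # F statistic

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()    # X drives Y at lag 1
z = rng.normal(size=500)                          # unrelated series
assert granger_f(x, y) > 100.0                    # strong evidence
assert granger_f(z, y) < granger_f(x, y)          # unrelated: far weaker
```

The F value is then compared against the $F(\text{df}_1, \text{df}_2)$ critical value to decide significance.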


HNSW regions define a discrete state space. Observed trajectories define transitions. The result is a Markov Decision Process:

$$P(s' \mid s, a) = \frac{\text{count}(s \xrightarrow{a} s')}{\sum_{s''} \text{count}(s \xrightarrow{a} s'')}$$

$$P(\text{success} \mid s, a) = \frac{1 + \sum[\text{reward} > 0.5]}{2 + n} \quad \text{(Beta prior)}$$

This answers: “in states like mine, which action type has the highest success probability?”
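Both formulas are tabular counts. A minimal sketch with assumed class and method names:

```python
from collections import Counter, defaultdict

# Tabular MDP sketch: transition counts over (region, action) pairs plus a
# Beta(1,1) prior on success, matching the two formulas above.
class RegionMDP:
    def __init__(self):
        self.trans = defaultdict(Counter)        # (s, a) -> Counter of s'
        self.succ = defaultdict(lambda: [0, 0])  # (s, a) -> [wins, trials]

    def observe(self, s, a, s_next, reward: float) -> None:
        self.trans[(s, a)][s_next] += 1
        wins, n = self.succ[(s, a)]
        self.succ[(s, a)] = [wins + (reward > 0.5), n + 1]

    def p_transition(self, s, a, s_next) -> float:
        c = self.trans[(s, a)]
        return c[s_next] / sum(c.values())

    def p_success(self, s, a) -> float:
        wins, n = self.succ[(s, a)]
        return (1 + wins) / (2 + n)              # Beta(1,1) posterior mean

mdp = RegionMDP()
for r in [1.0, 1.0, 0.0]:            # 2 wins, 1 loss for (s=5, a="go")
    mdp.observe(5, "go", 12, r)
assert mdp.p_transition(5, "go", 12) == 1.0
assert mdp.p_success(5, "go") == 3 / 5           # (1 + 2) / (2 + 3)
```

The prior keeps unvisited pairs at 0.5 rather than undefined, so exploration never divides by zero.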

When variables have conditional dependencies that a linear scorer cannot capture:

$$P(\text{success} \mid \text{task}, \text{region}, \text{action}) = \frac{P(\text{task}, \text{region}, \text{action}, \text{success})}{P(\text{task}, \text{region}, \text{action})}$$

The network factorizes the joint distribution via the DAG structure:

$$P(\mathbf{X}) = \prod_{i} P(X_i \mid \text{parents}(X_i))$$

Each CPT is learned from observations with Laplace smoothing. Inference computes posteriors via variable elimination.
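For the single query above, the ratio reduces to smoothed joint counts; a full network factorizes over the DAG and runs variable elimination. This sketch uses Laplace smoothing with $\alpha = 1$ (the class and helper names are assumptions):

```python
from collections import Counter

# Conditional-probability sketch via Laplace-smoothed joint counts; the
# query P(success | task, region, action) reduces to this ratio.
class ConditionalTable:
    def __init__(self, alpha: float = 1.0):
        self.counts = Counter()    # (task, region, action, outcome) -> count
        self.alpha = alpha         # Laplace smoothing strength
        self.outcomes = (True, False)

    def observe(self, task, region, action, success: bool) -> None:
        self.counts[(task, region, action, success)] += 1

    def p_success(self, task, region, action) -> float:
        num = self.counts[(task, region, action, True)] + self.alpha
        den = (sum(self.counts[(task, region, action, o)] for o in self.outcomes)
               + self.alpha * len(self.outcomes))
        return num / den

cpt = ConditionalTable()
for outcome in (True, True, False):          # 2 successes, 1 failure
    cpt.observe("heat", 5, "take", outcome)
assert cpt.p_success("heat", 5, "take") == 3 / 5   # (2+1) / (3+2)
# Unseen contexts fall back to the uniform prior, not a division by zero:
assert cpt.p_success("clean", 9, "go") == 1 / 2
```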

The discovery from E7c/E7d/E7e: blind reward decay destroys useful experts. Context-aware decay only penalizes experts when:

  1. Task type matches the failed game
  2. The agent actually followed the expert’s action
  3. The expert is in a low-quality region (informed by MDP)
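The three conditions compose into a single guard evaluated before any penalty; the field names and quality threshold here are illustrative:

```python
# Context-aware decay sketch: penalize an expert only when all three
# conditions above hold. Names and threshold are illustrative.
def should_decay(expert: dict, failed_task: str, followed_expert: bool,
                 region_quality: float, quality_threshold: float = 0.3) -> bool:
    return (expert["task_type"] == failed_task       # 1. task type matches
            and followed_expert                      # 2. agent used the advice
            and region_quality < quality_threshold)  # 3. weak region (from MDP)

expert = {"task_type": "heat_then_place", "weight": 1.0}
# Failure on a different task type: the expert is untouched.
assert not should_decay(expert, "clean_then_place", True, 0.1)
# Matching failure, advice followed, weak region: decay applies.
assert should_decay(expert, "heat_then_place", True, 0.1)
# A high-quality region shields the expert even after a matching failure.
assert not should_decay(expert, "heat_then_place", True, 0.8)
```

Blind decay is the degenerate case where this guard always returns true, which is what destroyed useful experts in E7c/E7d.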

Encodes compositional structure that neither vectors nor probabilities capture:

  • Task plans: heat_then_place requires find → take → heat → take → put
  • Shared sub-plans: heat and clean both start with find → take
  • Constraints: after take, valid next actions are go/use/put, not take again
  • Transfer: if I know how to find → take for cleaning, I can reuse it for heating

The graph enables structural guidance during retrieval: the agent knows what step comes next from the graph, and uses CVX to find the best concrete realization.
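A sketch of plans as ordered step lists plus a constraint table; the plan contents mirror the examples above, while the API is illustrative:

```python
# Task-graph sketch: plans, constraints, and sub-plan reuse. The plan
# contents come from the examples above; the API is illustrative.
PLANS = {
    "heat_then_place": ["find", "take", "heat", "take", "put"],
    "clean_then_place": ["find", "take", "clean", "take", "put"],
}
VALID_AFTER = {"take": {"go", "use", "put"}}   # constraint: no take-after-take

def next_step(task: str, steps_done: int) -> str:
    """What the graph says comes next; CVX retrieves a concrete realization."""
    return PLANS[task][steps_done]

def shared_prefix(a: str, b: str) -> list[str]:
    """Sub-plan reuse: the longest common prefix of two plans."""
    out = []
    for x, y in zip(PLANS[a], PLANS[b]):
        if x != y:
            break
        out.append(x)
    return out

assert next_step("heat_then_place", 2) == "heat"
assert shared_prefix("heat_then_place", "clean_then_place") == ["find", "take"]
assert "take" not in VALID_AFTER["take"]
```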


The seven layers compose into a closed-loop active memory:

```
OBSERVATION → embed → HNSW search → candidates
    ↓
Bayesian scoring (Layer 5)
    ├── Similarity (Layer 0)
    ├── Recency (Layer 1)
    ├── Reward (Layer 4 — typed edges)
    ├── P(success | context) (Layer 5 — BN/MDP)
    └── Task plan step (Layer 6 — KG)
    ↓
LLM chooses action
    ↓
OUTCOME → update
    ├── Win → add to index (Layer 0)
    ├── Win → add CausedSuccess edges (Layer 4)
    ├── Fail → context-aware decay (Layer 5)
    ├── Update MDP transitions (Layer 5)
    ├── Update BN posteriors (Layer 5)
    └── Update KG if new structure learned (Layer 6)
```
| Layer | What it provides | What it cannot do |
| --- | --- | --- |
| 0 (HNSW) | Find similar states | Distinguish success from failure |
| 1 (Calculus) | Measure change speed | Predict next action |
| 2 (Signatures) | Compare trajectory shapes | Reason about task structure |
| 3 (Distributions) | Compare population-level patterns | Make individual decisions |
| 4 (Causality) | Attribute outcomes to actions | Estimate probabilities |
| 5 (Bayesian) | Compute conditional probabilities | Represent compositional knowledge |
| 6 (Knowledge) | Encode task structure | Score candidates numerically |

Each layer addresses a specific limitation of the layers below it. The full system is more than the sum of its parts.


| Experiment | What it tested | Result |
| --- | --- | --- |
| B2 (clinical anchoring) | Layer 0+1: centered drift detection | F1 = 0.744 on eRisk depression |
| B8 (ParlaMint) | Layer 0+3: rhetorical profiling | F1 = 0.94 predicting speaker gender |
| E5 (ALFWorld GPT-4o) | Layer 0+4: causal retrieval | 20% → 43.3% task completion |
| E7b (online learning) | Layer 0+4+5: reward decay | 6.7% → 26.7% across 3 rounds |
| E7e (context-aware) | Layer 5: conditional decay | Peak 30%, plateau 19.5% (vs 14.8%) |

  1. Ethayarajh (2019). “How Contextual are Contextualized Word Representations?” EMNLP.
  2. Su et al. (2021). “Whitening Sentence Representations for Better Semantics and Faster Retrieval.” ACL.
  3. Lyons (1998). Differential Equations Driven by Rough Signals.
  4. Carlsson (2009). “Topology and Data.” Bulletin of the AMS.
  5. Killick et al. (2012). “Optimal Detection of Changepoints with a Linear Computational Cost.” JASA.
  6. Amari & Nagaoka (2000). Methods of Information Geometry.
  7. Pearl (1988). Probabilistic Reasoning in Intelligent Systems.
  8. Koller & Friedman (2009). Probabilistic Graphical Models.
  9. Hogan et al. (2021). “Knowledge Graphs.” ACM Computing Surveys.
  10. Park et al. (2023). “Generative Agents.” UIST.
  11. Shinn et al. (2023). “Reflexion.” NeurIPS.
  12. Chen et al. (2021). “Decision Transformer.” NeurIPS.
  13. Hafner et al. (2023). “DreamerV3.” arXiv.
  14. Bareinboim et al. (2022–2024). Causal Reinforcement Learning.
  15. Villani (2008). Optimal Transport: Old and New.