RFC-004: Graph-Based Semantic Regions

This RFC proposes using the HNSW graph’s hierarchical structure to define semantic regions — natural clusters derived from graph topology — transforming noisy per-post trajectories into smooth, interpretable temporal signals.

Individual post embeddings are too noisy for temporal analysis. A depressed user still writes about sports, food, weather — topics that dominate the embedding trajectory. Direct analysis on D=384 raw embeddings yields:

| Metric | Result (eRisk) |
|---|---|
| Velocity Cohen’s d | -0.03 (no discrimination) |
| PELT change points | 0 detected |
| Hurst p-value | 0.25 (not significant) |
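For reference, Cohen's d is the difference between two group means divided by their pooled standard deviation, so a value near zero means the two groups are practically indistinguishable on that metric. A minimal sketch follows; `cohens_d` is a hypothetical helper name and the inputs are made-up toy values, not eRisk data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy values for illustration only (not taken from any dataset).
toy_a = [1.0, 2.0, 3.0]
toy_b = [2.0, 3.0, 4.0]
d = cohens_d(toy_a, toy_b)
```

With these toy inputs the means differ by exactly one pooled standard deviation, which is a large effect compared with the -0.03 reported in the table.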

The HNSW graph already is a hierarchical clustering:

  • Level 0: all nodes
  • Level L: ~N/M^L “hub” nodes (natural centroids)
  • Choosing level controls granularity
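To make the level sizes concrete, the sketch below evaluates the ~N/M^L estimate for a few levels and picks the levels whose expected hub count falls in a target range. It assumes M is the HNSW connectivity parameter; the values N = 1,000,000 and M = 16, and the helper names, are illustrative assumptions rather than figures from this RFC.

```python
def expected_hubs(n_nodes: int, m: int, level: int) -> float:
    """Expected number of nodes present at `level`, using the ~N / M^L estimate."""
    return n_nodes / (m ** level)


def levels_in_range(n_nodes: int, m: int, low: int = 30, high: int = 500,
                    max_level: int = 16) -> list:
    """Levels whose expected hub count lies within [low, high]."""
    return [
        lvl for lvl in range(max_level + 1)
        if low <= expected_hubs(n_nodes, m, lvl) <= high
    ]


# Illustrative index size and connectivity (assumed values).
sizes = {lvl: round(expected_hubs(1_000_000, 16, lvl)) for lvl in range(5)}
candidate_levels = levels_in_range(1_000_000, 16)
```

Under these assumptions the estimate drops from about 3,906 hubs at level 2 to about 244 at level 3 and about 15 at level 4, so only level 3 lands in a 30-500 region range. Granularity changes in coarse steps of a factor of M per level.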

  1. Extract regions from HNSW level L (e.g., ~30-500 regions)
  2. Assign each post to the nearest region hub (O(log N) via graph descent)
  3. Compute temporal distribution per user over regions in sliding windows
  4. Smooth with EMA: $s_t = \alpha \cdot x_t + (1-\alpha) \cdot s_{t-1}$
  5. Apply CVX analytics (velocity, CPD, Hurst) on smooth K-dimensional signal
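Steps 2 to 4 can be sketched in plain Python as follows. This is a simplified illustration, not the CVX implementation: hub assignment uses a brute-force Euclidean scan instead of graph descent, empty windows carry the previous smoothed state forward, and all function names and the `step_days` parameter are hypothetical.

```python
from math import dist


def assign_region(post, hubs):
    """Step 2 (simplified): index of the nearest hub by brute-force scan."""
    return min(range(len(hubs)), key=lambda k: dist(post, hubs[k]))


def window_distributions(days, regions, k, window_days=7, step_days=7):
    """Step 3: fraction of posts per region in each window.

    days[i] is the integer day offset of post i, regions[i] its region index.
    Windows start every `step_days` and cover `window_days`; an empty window
    yields None.
    """
    out, start, last = [], 0, max(days)
    while start <= last:
        counts = [0] * k
        for day, region in zip(days, regions):
            if start <= day < start + window_days:
                counts[region] += 1
        total = sum(counts)
        out.append([c / total for c in counts] if total else None)
        start += step_days
    return out


def ema(distributions, alpha=0.3):
    """Step 4: s_t = alpha * x_t + (1 - alpha) * s_{t-1}, applied per region."""
    smoothed, state = [], None
    for x in distributions:
        if x is not None:
            if state is None:
                state = list(x)  # first observed window initialises the state
            else:
                state = [alpha * xi + (1 - alpha) * si for xi, si in zip(x, state)]
        smoothed.append(state)  # empty windows repeat the previous state
    return smoothed


# Toy 2-D example with two hubs (illustrative values only).
hubs = [(0.0, 0.0), (10.0, 10.0)]
posts = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (10.0, 9.0)]
days = [0, 3, 8, 10]
regions = [assign_region(p, hubs) for p in posts]
raw = window_distributions(days, regions, k=len(hubs))
smooth = ema(raw, alpha=0.3)
```

In the toy data the raw distribution jumps from all of region 0 in the first week to all of region 1 in the second, while the smoothed signal moves only part of the way. That damping is what makes the K-dimensional trajectory usable for the analytics in step 5.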

Instead of “dim_127 changed by 0.3”, we get: “user shifted from 40% social to 60% emotional distress over 3 weeks”.
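One way such a summary could be produced is to compare two region distributions and report the regions whose share changed most. The sketch below assumes each region has already been given a human-readable label; the labels, the numbers, and the `largest_shifts` helper are all hypothetical.

```python
def largest_shifts(before, after, labels, top=2):
    """Return (label, before_share, after_share) for the biggest changes."""
    changes = list(zip(labels, before, after))
    changes.sort(key=lambda c: abs(c[2] - c[1]), reverse=True)
    return changes[:top]


# Hypothetical labels and shares, for illustration only.
labels = ["social", "emotional distress", "sports"]
before = [0.40, 0.20, 0.40]
after = [0.25, 0.60, 0.15]

shifts = largest_shifts(before, after, labels)
for label, b, a in shifts:
    print(f"{label}: {b:.0%} -> {a:.0%}")
```

Because each coordinate is a share of posts in a named region, the output reads directly as a statement about the user's topics, which is not possible with raw embedding dimensions.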

```python
regions = index.regions(level=2)                  # graph-derived clusters
traj = index.region_trajectory(uid, level=2, window_days=7, alpha=0.3)
members = index.region_members(region_id=7)       # single region
all_members = index.region_assignments(level=2)   # all regions in one O(N) pass
```
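The distinction between a per-region lookup and a bulk pass can be illustrated with a small stand-in. This is not the proposed API: `members_of` and `group_by_region` are hypothetical helpers over a plain list of region ids, and the return shapes of the real methods are not specified here.

```python
from collections import defaultdict


def members_of(node_regions, region_id):
    """Scan every node to collect one region's members: O(N) per call."""
    return [node for node, region in enumerate(node_regions) if region == region_id]


def group_by_region(node_regions):
    """Collect the members of every region in a single O(N) pass."""
    groups = defaultdict(list)
    for node, region in enumerate(node_regions):
        groups[region].append(node)
    return dict(groups)


# node_regions[i] is the region id assigned to node i (toy values).
node_regions = [7, 2, 7, 2, 5]
single = members_of(node_regions, 7)
grouped = group_by_region(node_regions)
```

If inspection needs all K regions, calling the single-region lookup K times would cost O(K·N) in this stand-in, whereas the grouped pass touches each node once. That is the likely motivation for offering both calls.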

| Phase | Scope | Effort |
|---|---|---|
| 1 | Graph region extraction (nodes_at_level, assign_region) | Low |
| 2 | Region trajectory computation (sliding window + EMA) | Medium |
| 3 | Region inspection (region_members, region_assignments) | Low |
| 4 | Python bindings | Low |
| 5 | Tutorial B1 rewrite with real results | Medium |

See the full RFC in design/CVX_RFC_004_Graph_Semantic_Regions.md.