RFC-004: Graph-Based Semantic Regions
This RFC proposes using the HNSW graph’s hierarchical structure to define semantic regions — natural clusters derived from graph topology — transforming noisy per-post trajectories into smooth, interpretable temporal signals.
Problem
Individual post embeddings are too noisy for temporal analysis. A depressed user still writes about sports, food, and weather, and these everyday topics dominate the embedding trajectory. Direct analysis of the raw D=384 embeddings yields:
| Metric | Result (eRisk) |
|---|---|
| Velocity Cohen’s d | -0.03 (no discrimination) |
| PELT change points | 0 detected |
| Hurst p-value | 0.25 (not significant) |
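Cohen's d expresses the gap between two group means in units of their pooled standard deviation, so |d| ≈ 0.03 means the two groups' velocity distributions are practically indistinguishable. A minimal sketch of the standard pooled-SD formula follows. It assumes per-user velocity scores for each group are already computed; how CVX defines velocity is not shown here, and the numbers below are invented inputs, not eRisk results.

```python
import math
from statistics import mean, variance
from typing import List


def cohens_d(group_a: List[float], group_b: List[float]) -> float:
    """Standardized mean difference: (mean_a - mean_b) / pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Illustrative per-user velocity scores, not eRisk data.
depressed = [0.41, 0.38, 0.44, 0.40]
control = [0.42, 0.39, 0.43, 0.41]
print(f"d = {cohens_d(depressed, control):.2f}")
```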
Key Insight
The HNSW graph is already a hierarchical clustering:
- Level 0: all N nodes
- Level L: ~N/M^L “hub” nodes (natural centroids), where M is the HNSW connectivity parameter
- Choosing the level controls granularity: higher levels give fewer, coarser regions
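HNSW draws each node's top level from an exponential distribution. With the normalization mL = 1/ln(M) from the original HNSW paper, a node reaches level L with probability M^-L, which is where the ~N/M^L hub count comes from. The small simulation below makes this concrete; `sample_levels` and `nodes_per_level` are hypothetical helpers, and whether CVX uses exactly this normalization is an assumption. With N = 100,000 and M = 16, level 2 should hold about 100,000 / 16² ≈ 391 hubs, inside the ~30-500 range suggested for regions.

```python
import math
import random
from collections import Counter
from typing import Dict, List


def sample_levels(n: int, m: int, seed: int = 0) -> List[int]:
    """Draw each node's top level as floor(-ln(U) * mL) with mL = 1 / ln(M), as in HNSW."""
    rng = random.Random(seed)
    m_l = 1.0 / math.log(m)
    # 1.0 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return [int(-math.log(1.0 - rng.random()) * m_l) for _ in range(n)]


def nodes_per_level(levels: List[int]) -> Dict[int, int]:
    """Count the nodes present on each level; a node with top level l appears on levels 0..l."""
    top_counts = Counter(levels)
    counts: Dict[int, int] = {}
    running = 0
    for level in range(max(levels), -1, -1):
        running += top_counts[level]
        counts[level] = running
    return dict(sorted(counts.items()))


N, M = 100_000, 16
observed = nodes_per_level(sample_levels(N, M))
for level, count in observed.items():
    print(f"level {level}: {count:>6} nodes (expected ~{N / M**level:,.0f})")
```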
Solution
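- Extract regions from the hub nodes at HNSW level L (e.g., ~30-500 regions)
- Assign each post to its nearest region hub (O(log N) via greedy graph descent)
- For each user, compute the distribution of their posts over the K regions in sliding time windows
- Smooth with an exponential moving average (EMA): $\tilde{p}_t = \alpha\, p_t + (1-\alpha)\,\tilde{p}_{t-1}$, where $p_t$ is the user's region distribution in window $t$ and $\alpha$ is the smoothing factor
- Apply CVX analytics (velocity, change-point detection (CPD), Hurst exponent) to the smoothed K-dimensional signal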
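The windowing and smoothing steps can be sketched in plain Python, assuming each post already carries a region id from the nearest-hub assignment. `Post`, `window_distribution`, and `region_trajectory` are hypothetical names, not the CVX API. Seeding the EMA with the first non-empty window and carrying the previous value through empty windows are also assumptions; with `step_days` equal to `window_days` the windows tumble, and a smaller step makes them overlap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    day: float   # posting time in days
    region: int  # id of the nearest region hub, in range(K)


def window_distribution(posts: List[Post], k: int, start: float, end: float) -> Optional[List[float]]:
    """Fraction of the user's posts in [start, end) that fall in each of the K regions."""
    counts = [0] * k
    for post in posts:
        if start <= post.day < end:
            counts[post.region] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else None


def region_trajectory(posts: List[Post], k: int, window_days: float = 7.0,
                      step_days: Optional[float] = None, alpha: float = 0.3) -> List[List[float]]:
    """EMA-smoothed K-dimensional region distribution, one row per window."""
    step = window_days if step_days is None else step_days
    start, last_day = min(p.day for p in posts), max(p.day for p in posts)
    smoothed: Optional[List[float]] = None
    trajectory: List[List[float]] = []
    while start <= last_day:
        dist = window_distribution(posts, k, start, start + window_days)
        if dist is not None:
            # Seed with the first observed window, then blend: alpha * new + (1 - alpha) * old.
            smoothed = dist if smoothed is None else [
                alpha * x + (1 - alpha) * s for x, s in zip(dist, smoothed)
            ]
        if smoothed is not None:
            # Empty windows carry the previous smoothed value forward.
            trajectory.append(smoothed)
        start += step
    return trajectory


# Hypothetical regions: 0 = social, 1 = sports, 2 = distress.
posts = [Post(0, 0), Post(1, 0), Post(2, 1), Post(3, 2), Post(8, 2), Post(9, 2)]
for row in region_trajectory(posts, k=3, window_days=7, alpha=0.5):
    print([round(x, 3) for x in row])
```

Each smoothed row is a convex combination of probability vectors, so it stays a valid distribution over the K regions, which is what lets the downstream analytics run on a low-dimensional, bounded signal.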
Result
Instead of “dim_127 changed by 0.3”, the analysis can report interpretable statements such as “user shifted from 40% social to 60% emotional distress over 3 weeks”.
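A statement like this can be read off by comparing the smoothed region distribution at the start and end of a span. One possible reading is sketched below: report the region whose share fell the most (with its starting share) and the region whose share grew the most (with its ending share). `describe_shift` is a hypothetical helper, and the labels are assumed to come from a person inspecting each region's members, since graph-derived regions carry no names of their own; the distributions are illustrative.

```python
from typing import List


def describe_shift(labels: List[str], before: List[float], after: List[float], weeks: int) -> str:
    """Summarize a change between two smoothed region distributions in plain language."""
    deltas = [a - b for a, b in zip(after, before)]
    indices = range(len(deltas))
    source = min(indices, key=lambda i: deltas[i])  # region that lost the most share
    target = max(indices, key=lambda i: deltas[i])  # region that gained the most share
    return (f"user shifted from {before[source]:.0%} {labels[source]} "
            f"to {after[target]:.0%} {labels[target]} over {weeks} weeks")


# Hypothetical labels (assigned by inspecting region members) and illustrative distributions.
labels = ["social", "sports", "emotional distress"]
print(describe_shift(labels, before=[0.40, 0.35, 0.25], after=[0.20, 0.20, 0.60], weeks=3))
```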
New API
```python
regions = index.regions(level=2)  # graph-derived clusters
traj = index.region_trajectory(uid, level=2, window_days=7, alpha=0.3)  # per-user region distribution, 7-day windows, EMA alpha=0.3
members = index.region_members(region_id=7)  # single region
all_members = index.region_assignments(level=2)  # all regions in one O(N) pass
```
Phases
| Phase | Scope | Effort |
|---|---|---|
| 1 | Graph region extraction (nodes_at_level, assign_region) | Low |
| 2 | Region trajectory computation (sliding window + EMA) | Medium |
| 3 | Region inspection (region_members, region_assignments) | Low |
| 4 | Python bindings | Low |
| 5 | Tutorial B1 rewrite with real results | Medium |
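The two Phase 3 inspection calls differ mainly in cost. Assuming each node's region has already been computed and stored as a node-to-hub map (how CVX actually stores assignments is not specified in this summary), the one-pass `region_assignments` amounts to inverting that map, while a per-region `region_members` is a filter. The functions below reuse the proposed names only for readability; they are standalone sketches with different signatures, not the CVX implementation.

```python
from collections import defaultdict
from typing import Dict, List


def region_assignments(assignment: Dict[int, int]) -> Dict[int, List[int]]:
    """Group every node under its region in one pass over the N nodes."""
    members: Dict[int, List[int]] = defaultdict(list)
    for node, region in assignment.items():
        members[region].append(node)
    return dict(members)


def region_members(assignment: Dict[int, int], region_id: int) -> List[int]:
    """Nodes of a single region; this filter is O(N), so looping it over K regions costs O(N * K)."""
    return [node for node, region in assignment.items() if region == region_id]


# Toy node -> region-hub map (node ids and hub ids are arbitrary).
assignment = {0: 7, 1: 3, 2: 7, 3: 3, 4: 7}
print(region_assignments(assignment))
print(region_members(assignment, region_id=7))
```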
See the full RFC in `design/CVX_RFC_004_Graph_Semantic_Regions.md`.