
Clinical Anchoring — DSM-5 Anchor Projection for Depression Detection

Standard sentence embeddings compress clinical signals into opaque high-dimensional spaces where syntactic structure masks symptom evolution. We introduce anchor projection — a coordinate system transformation implemented natively in ChronosVector (CVX) — that re-expresses user trajectories relative to DSM-5 symptom reference vectors. On the eRisk depression detection task with a proper temporal split (train: 2017+2018, test: 2022), anchor-projected features achieve F1=0.744 and AUC=0.886, compared to F1=0.600 with absolute temporal features alone. Early detection at 10% of post history yields F1=0.673.

  • eRisk shared task (Losada et al., 2017-2022): Early risk detection of depression from social media posts. Best systems use transformer-based classifiers on user-level features.
  • Concept-based explanations (Kim et al., TCAV, 2018): Testing with Concept Activation Vectors measures model sensitivity to human-defined concepts. Our anchor projection applies a similar idea at the data level.
  • Clinical NLP for mental health (Coppersmith et al., 2018; Harrigian et al., 2020): Feature engineering from social media text for mental health detection. Most approaches treat users as static feature vectors.
  • Temporal dynamics in depression (De Choudhury et al., 2013): Pioneering work on temporal patterns in social media for depression, but without vector trajectory analysis.
  • eRisk dataset: 1.36M Reddit posts from 2,285 users
  • Embeddings: MentalRoBERTa (D=768), pre-trained on mental health corpora
  • Subset: 466 balanced users (233 depression, 233 control), 225,962 posts
  1. Index Construction: bulk_insert 225K vectors with save/load caching (avoids 500s rebuild)
  2. DSM-5 Anchor Vectors: 9 symptom anchors (depressed mood, anhedonia, sleep disturbance, fatigue, worthlessness, concentration, suicidal ideation, appetite, psychomotor) + 1 healthy baseline, each encoded as the centroid of the MentalRoBERTa embeddings of 3-5 representative phrases
  3. Anchor Projection: cvx.project_to_anchors(trajectory, anchors, metric='cosine') → trajectory in ℝ¹⁰
  4. Feature Extraction: cvx.anchor_summary() (mean, min, trend per anchor), cvx.hurst_exponent() on projected trajectory, cvx.velocity() in anchor space, topic polarization (dispersion), velocity differential (cvx.drift() for consecutive posts)
  5. Classification: Logistic Regression with class_weight='balanced', evaluated with both 5-fold CV and a temporal split
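The anchor construction and projection steps above can be sketched in plain Python. This is a minimal illustration and not CVX's implementation: the helper names (`centroid`, `cosine_distance`, `project_to_anchors_sketch`) are hypothetical, the toy vectors are 3-dimensional rather than 768-dimensional, and it assumes that metric='cosine' yields a cosine distance (1 minus cosine similarity) per anchor.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_distance(u, v):
    """1 - cosine similarity (assumed meaning of metric='cosine')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def project_to_anchors_sketch(trajectory, anchors):
    """Map each post embedding to its vector of distances to the anchors.

    A trajectory of T posts in R^D becomes T points in R^K, where K is
    the number of anchors (K = 10 in the pipeline described above).
    """
    return [[cosine_distance(post, a) for a in anchors] for post in trajectory]

# Toy data: two anchors, each the centroid of two "phrase embeddings".
phrases_a = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
phrases_b = [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]
anchors = [centroid(phrases_a), centroid(phrases_b)]

# A user whose posts drift from anchor b towards anchor a over time.
trajectory = [[0.1, 1.0, 0.0], [0.5, 0.5, 0.0], [1.0, 0.1, 0.0]]
projected = project_to_anchors_sketch(trajectory, anchors)
```

In this toy example the first coordinate of `projected` shrinks over time while the second grows, which is the kind of per-symptom movement the projected trajectory is meant to expose.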
  • Train: eRisk 2017+2018 editions (226 users)
  • Test: eRisk 2022 edition (236 users) — completely unseen
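A temporal split of this kind can be sketched as a filter on the dataset edition. The record layout (`user_id`, `edition`, `label`) and the function name are assumptions made for illustration; the source does not describe how users are stored.

```python
def temporal_split(users, train_editions, test_editions):
    """Split user records by dataset edition and check for leakage."""
    train = [u for u in users if u["edition"] in train_editions]
    test = [u for u in users if u["edition"] in test_editions]

    # A user appearing on both sides would leak information into the test set.
    overlap = {u["user_id"] for u in train} & {u["user_id"] for u in test}
    if overlap:
        raise ValueError(f"users present in both splits: {sorted(overlap)}")
    return train, test

# Hypothetical records: one dict per user.
users = [
    {"user_id": "u1", "edition": 2017, "label": 1},
    {"user_id": "u2", "edition": 2018, "label": 0},
    {"user_id": "u3", "edition": 2022, "label": 1},
    {"user_id": "u4", "edition": 2022, "label": 0},
]

train, test = temporal_split(users, {2017, 2018}, {2022})
```

Splitting by edition rather than at random keeps every test user later in time than every training user, which is what makes the 2022 edition "completely unseen".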
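| Model | F1 | AUC | Precision | Recall |
|---|---|---|---|---|
| B1 Baseline (absolute features) | 0.600 | 0.639 | 0.590 | 0.614 |
| Anchor Only | 0.746 | 0.849 | 0.739 | 0.759 |
| Polarization Only | 0.599 | 0.665 | 0.653 | 0.556 |
| Velocity Only | 0.547 | 0.554 | 0.520 | 0.582 |
| Combined (B2) | 0.781 | 0.863 | 0.775 | 0.789 |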

Temporal Split (Train 2017+2018 → Test 2022)

| Metric | Value |
|---|---|
| F1 | 0.744 |
| Precision | 0.659 |
| Recall | 0.856 |
| AUC | 0.886 |
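As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The snippet below uses only the numbers in the table; because precision and recall are themselves rounded to three decimals, the recomputed value can differ from the reported F1 in the last digit.

```python
precision = 0.659
recall = 0.856

# F1 = 2PR / (P + R), the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 recomputed from rounded precision/recall: {f1:.4f}")
```

The result is about 0.745, which agrees with the reported 0.744 up to rounding of the inputs.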
| % of Posts | F1 | AUC |
|---|---|---|
| 10% | 0.673 | 0.788 |
| 20% | 0.694 | 0.811 |
| 30% | 0.719 | 0.829 |
| 50% | 0.729 | 0.858 |
| 100% | 0.753 | 0.895 |
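One way to read the "% of Posts" column is that each user's history is truncated to its earliest fraction before features are extracted. The sketch below assumes exactly that; the function name, the `timestamp` field, and the rounding rule (round up, keep at least one post) are illustrative choices, not details given in the source.

```python
import math

def truncate_history(posts, fraction):
    """Return the earliest `fraction` of a user's posts, oldest first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(posts, key=lambda p: p["timestamp"])
    # Round up and keep at least one post so short histories stay usable.
    keep = max(1, math.ceil(fraction * len(ordered)))
    return ordered[:keep]

# Hypothetical user with ten posts, deliberately stored newest first.
posts = [{"timestamp": t, "text": f"post {t}"} for t in range(10, 0, -1)]

early = truncate_history(posts, 0.5)
full = truncate_history(posts, 1.0)
```

Features computed on `early` would then feed the same classifier as features computed on the full history, so the rows of the table differ only in how much of each trajectory is visible.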
| Function | Role |
|---|---|
| project_to_anchors(metric='cosine') | Transform ℝ⁷⁶⁸ → ℝ¹⁰ symptom coordinates |
| anchor_summary() | Mean, min, trend, last distance per anchor |
| hurst_exponent(projected) | Persistence of approach to depression |
| velocity(projected) | Rate of change in symptom space |
| drift(post_t, post_t+1) | Consecutive cosine displacement |
| save() / load() | Index persistence (avoid 500s rebuild) |
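To make the per-anchor summary statistics concrete, here is a small stand-in written in plain Python. It is not the CVX anchor_summary() implementation: the function name `summarize_anchors` is hypothetical, and "trend" is assumed to mean the least-squares slope of the distance against the post index.

```python
def summarize_anchors(projected):
    """Per-anchor mean, min, trend, and last value of a projected trajectory.

    `projected` is a list of T rows, each holding K anchor distances.
    """
    n = len(projected)
    t_mean = (n - 1) / 2
    t_var = sum((t - t_mean) ** 2 for t in range(n))

    summary = []
    for column in zip(*projected):  # one column per anchor
        mean = sum(column) / n
        # Least-squares slope; defined as 0.0 for a single-post trajectory.
        cov = sum((t - t_mean) * (d - mean) for t, d in enumerate(column))
        trend = cov / t_var if t_var else 0.0
        summary.append(
            {"mean": mean, "min": min(column), "trend": trend, "last": column[-1]}
        )
    return summary

# Toy projected trajectory: 3 posts, 2 anchors.
projected = [[0.8, 0.2], [0.6, 0.4], [0.4, 0.6]]
features = summarize_anchors(projected)
```

Under this reading, a negative trend for an anchor means the user's posts are moving closer to that symptom over time, and a positive trend means they are moving away.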
conda activate cvx
cd crates/cvx-python && maturin develop --release && cd ../..
jupyter notebook notebooks/B2_clinical_anchoring.ipynb

See the full tutorial with Plotly plots.

