
B1: Mental Health Detection with MentalRoBERTa + Temporal Features

Notebooks: Per-dataset analysis following Research Protocol 001 v2

We present a multi-signal approach to early depression detection from social media combining: (1) MentalRoBERTa embeddings (D=768, domain-adapted to mental health language), (2) behavioral posting patterns (inter-post gaps, circadian activity, posting bursts), and (3) graph-based semantic regions from ChronosVector’s HNSW hierarchy.

On eRisk (1.36M Reddit posts, 2,285 users), the full model achieves ROC-AUC = 0.911 (95% bootstrap CI [0.875, 0.941]) on the held-out 2022 test data and supports early detection, reaching AUC = 0.849 using only 10% of a user's posts. On CLPsych (Twitter), temporal embeddings combined with behavioral features reach AUC = 0.804, and early detection reaches AUC = 0.813 using only 20% of posts.


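| Model | eRisk AUC | eRisk F1 | CLPsych AUC | CLPsych F1 | Dims |
|---|---|---|---|---|---|
| Behavioral only | 0.643 | 0.020 | 0.585 | 0.309 | 11 |
| Static MentalRoBERTa | 0.901 | 0.457 | 0.787 | 0.579 | 768 |
| Temporal MentalRoBERTa | 0.910 | 0.443 | 0.801 | 0.559 | 768 |
| Region L3 only | 0.890 | 0.418 | 0.770 | 0.491 | ~80 |
| Temporal + Behavioral | 0.907 | 0.432 | 0.804 | 0.571 | 779 |
| Full (Temp + Region + Behav) | 0.911 | 0.458 | 0.790 | 0.522 | ~880 |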

Best Model — Test Set (95% Bootstrap CI)


eRisk (Full model, $n_{test}$ = 1,398):

| Metric | Point estimate | 95% CI |
|---|---|---|
| ROC-AUC | 0.911 | [0.875, 0.941] |
| F1 | 0.456 | [0.346, 0.558] |
| Precision | 0.717 | [0.579, 0.848] |
| Recall | 0.336 | [0.239, 0.433] |

CLPsych (Static MentalRoBERTa, $n_{test}$ = 203):

| Metric | Point estimate | 95% CI |
|---|---|---|
| ROC-AUC | 0.786 | [0.722, 0.850] |
| F1 | 0.576 | [0.464, 0.683] |
| Precision | 0.712 | [0.583, 0.838] |
| Recall | 0.487 | [0.370, 0.603] |
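The intervals above are 95% bootstrap CIs. The following is a minimal sketch of a percentile bootstrap for ROC-AUC, assuming that test users are resampled with replacement; the resample count (`n_boot`), the seed, and skipping single-class resamples are assumptions rather than details taken from the protocol, and the function names are hypothetical.

```python
import random
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney U) ROC-AUC; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample test users with replacement and recompute AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC is undefined when a resample has only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[round(alpha / 2 * (n_boot - 1))]
    hi = aucs[round((1 - alpha / 2) * (n_boot - 1))]
    return roc_auc(labels, scores), (lo, hi)
```

For the eRisk row, `labels` would hold the 1,398 test users' ground truth and `scores` the model's predicted probabilities for the depression class.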
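
AUC@k% (ROC-AUC using only k% of each user's posts):

| % Posts | eRisk AUC | CLPsych AUC |
|---|---|---|
| 10% | 0.849 | 0.780 |
| 20% | 0.858 | 0.813 |
| 50% | 0.888 | 0.795 |
| 100% | 0.908 | 0.792 |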

Key result: with only 10% of a user's post history, the model already reaches AUC = 0.849 on eRisk, which supports the feasibility of early detection.


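MentalRoBERTa, pre-trained on Reddit mental health forums, is designed to capture emotional and clinical language that general-purpose models may miss. The static MentalRoBERTa baseline (AUC = 0.901) is already strong: adding temporal, behavioral, and region features improves eRisk AUC by at most 0.010 (0.911 for the full model) and CLPsych AUC by at most 0.017 (0.804 for Temporal + Behavioral). The embedding model is therefore the single most impactful choice.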

2. Behavioral Signals: Night Posting as Biomarker


On eRisk, the night posting ratio (the fraction of a user's posts made between 00:00 and 06:00) is the strongest behavioral discriminator:

  • Depression: 31% night posts vs Control: 22% (d = 0.534, p < 0.001)
  • Circadian disruption is a known clinical marker of depression

On CLPsych, posting gap variability (gap_cv, the coefficient of variation of inter-post gaps) is discriminative (d = 0.315, p < 0.001): users in the depression group post at more irregular intervals.
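A minimal sketch of how several of the behavioral features named in this section and in the feature table could be computed from one user's post timestamps. The exact definitions are assumptions for illustration, not the notebook's code: the night window uses the timestamps' own timezone, gap_cv is the population standard deviation of gaps divided by their mean, gap_trend is the least-squares slope of gap length against post index, and a burst is a maximal run of gaps shorter than a hypothetical `burst_gap_hours` threshold.

```python
from datetime import datetime
from statistics import mean, pstdev


def behavioral_features(timestamps: list[datetime], burst_gap_hours: float = 1.0) -> dict[str, float]:
    """Hypothetical subset of the behavioral feature group for one user."""
    ts = sorted(timestamps)
    # Inter-post gaps in hours
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ts, ts[1:])]
    feats: dict[str, float] = {}
    # Fraction of posts made between 00:00 and 06:00
    feats["night_ratio"] = sum(t.hour < 6 for t in ts) / len(ts)
    if gaps:
        m, s = mean(gaps), pstdev(gaps)
        feats["mean_gap"] = m
        feats["std_gap"] = s
        # Coefficient of variation: irregularity independent of overall posting rate
        feats["gap_cv"] = s / m if m > 0 else 0.0
        # Least-squares slope of gap length vs. post index (positive = posting slows down)
        n = len(gaps)
        x_bar = (n - 1) / 2
        denom = sum((i - x_bar) ** 2 for i in range(n))
        cov = sum((i - x_bar) * (g - m) for i, g in enumerate(gaps))
        feats["gap_trend"] = cov / denom if denom else 0.0
        # Count maximal runs of consecutive short gaps
        bursts, in_burst = 0, False
        for g in gaps:
            short = g < burst_gap_hours
            if short and not in_burst:
                bursts += 1
            in_burst = short
        feats["burst_count"] = float(bursts)
    return feats
```

For posts at 01:00, 01:30, 02:00, 12:00 and 03:00 the next day, four of five posts fall in the night window (night_ratio = 0.8) and the two 30-minute gaps form a single burst.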

At HNSW Level 3 (~60–97 coarse regions), several regions show large effect sizes:

  • eRisk: regions r3_80 (d = 1.05) and r3_72 (d = 1.06), in which users in the depression group are heavily over-represented
  • With only 99 dimensions, Region L3 achieves AUC = 0.890 on eRisk, within 0.011 of the 768-dimensional static MentalRoBERTa baseline (0.901)
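The region features are a per-user distribution over L3 regions plus two summary statistics (entropy and top-3 concentration). A minimal sketch, assuming each post has already been assigned an integer region id from the CVX hierarchy, that entropy is Shannon entropy in nats, and that top-3 concentration is the share of the user's posts in their three most frequent regions; `region_features` is a hypothetical helper, not part of the `chronos_vector` API.

```python
import math
from collections import Counter


def region_features(post_regions: list[int], n_regions: int) -> list[float]:
    """Hypothetical region feature vector: normalized histogram + entropy + top-3 concentration."""
    if not post_regions:
        raise ValueError("user has no posts")
    counts = Counter(post_regions)
    total = len(post_regions)
    # Fraction of the user's posts in each region, in fixed order 0..n_regions-1
    dist = [counts.get(r, 0) / total for r in range(n_regions)]
    # Shannon entropy (nats): low values mean posts concentrate in few regions
    entropy = -sum(p * math.log(p) for p in dist if p > 0)
    # Share of posts falling in the user's three most frequent regions
    top3 = sum(c for _, c in counts.most_common(3)) / total
    return dist + [entropy, top3]
```

With 97 regions this yields 97 + 2 = 99 values, which may account for the 99 dimensions quoted for eRisk; that correspondence is an inference, not something the protocol states.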

The AUC@k% curves show that the signal is present early:

  • eRisk: AUC > 0.84 with 10% of posts
  • CLPsych: AUC > 0.80 with 20% of posts

This suggests that screening could begin early in a user's posting history, after only a small fraction of their posts.
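A minimal sketch of an AUC@k% evaluation, assuming each user's history is truncated chronologically to the earliest k% of posts (rounded up, keeping at least one post) and then scored. `score_user` is a hypothetical caller-supplied function; in this pipeline it would wrap feature extraction and the trained classifier.

```python
import math
from typing import Callable, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC-AUC: P(positive scores higher than negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def auc_at_k(
    users: Sequence[Sequence],
    labels: Sequence[int],
    score_user: Callable[[Sequence], float],
    fractions: Sequence[float] = (0.1, 0.2, 0.5, 1.0),
) -> dict[float, float]:
    """users: one post list per user, sorted oldest first."""
    results = {}
    for k in fractions:
        scores = []
        for posts in users:
            n_keep = max(1, math.ceil(k * len(posts)))  # earliest k% of posts, at least one
            scores.append(score_user(posts[:n_keep]))
        results[k] = auc(labels, scores)
    return results
```

If a user's signal only appears late in their history, the 10% prefix scores at chance and AUC rises with k; the eRisk curve (0.849 at 10%) indicates that much of the signal is already present in early posts.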


```text
Posts → MentalRoBERTa (D=768) → CVX Index → Region Discovery (L3)
  │                                           │
  ├─ Relative time (t_rel, gap)               ├─ Region distribution
  ├─ Behavioral features (gaps, circadian)    ├─ Region entropy
  └─ Recency-weighted aggregation             └─ Top-3 concentration
  │                                           │
  └────────── Feature Concatenation ──────────┘
                        │
             Random Forest (balanced)
                        │
               Depression / Control
```
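The final stage concatenates the three feature groups per user and fits a class-balanced Random Forest. A minimal scikit-learn sketch, with synthetic random features standing in for the real ones; the group sizes follow the feature table (11 behavioral, 768 embedding, ~80 region), while `n_estimators`, the seed, the 30-of-200 positive rate, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_users = 200

# Synthetic stand-ins for the three feature groups (one row per user)
behavioral = rng.normal(size=(n_users, 11))
embedding = rng.normal(size=(n_users, 768))
region = rng.dirichlet(np.ones(80), size=n_users)  # each row is a distribution over 80 regions
labels = np.zeros(n_users, dtype=int)
labels[:30] = 1  # imbalanced: 15% positive

# Feature concatenation
X = np.hstack([behavioral, embedding, region])

# class_weight="balanced" weights each class inversely to its frequency
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X, labels)

# Probability of the depression class; scoring the training rows only to show the API,
# a real evaluation would score the held-out split
risk = clf.predict_proba(X)[:, 1]
```

Class balancing matters here because depression users are the minority class in both datasets; without it the forest tends toward predicting Control.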

```python
import chronos_vector as cvx

# Standard HNSW parameters: m = max links per node,
# ef_construction / ef_search = candidate-list sizes at build / query time
index = cvx.TemporalIndex(m=16, ef_construction=200, ef_search=100)
# ... ingest posts with relative timestamps ...

# Top-down region discovery
regions_l3 = index.regions(level=3)  # ~60-97 coarse regions
regions_l2 = index.regions(level=2)  # ~1,000 fine regions

# Region trajectory with EMA smoothing (uid: an entity id ingested above)
traj = index.region_trajectory(entity_id=uid, level=3, window_days=14, alpha=0.3)
```
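To illustrate what `window_days` and `alpha` plausibly control, here is a pure-Python sketch of a windowed region distribution with exponential moving average (EMA) smoothing. It is not the `region_trajectory` implementation, whose internals are not documented here; the fixed-width windows anchored at the first post and the carry-forward rule for empty windows are assumptions, and `smoothed_region_trajectory` is a hypothetical name.

```python
import math
from collections import Counter


def smoothed_region_trajectory(posts, n_regions, window_days=14.0, alpha=0.3):
    """posts: list of (t_days, region_id) for one user.

    Returns one EMA-smoothed region distribution per fixed-width time window.
    """
    if not posts:
        return []
    t0 = min(t for t, _ in posts)
    t_max = max(t for t, _ in posts)
    n_windows = math.floor((t_max - t0) / window_days) + 1
    windows = [Counter() for _ in range(n_windows)]
    for t, region in posts:
        windows[math.floor((t - t0) / window_days)][region] += 1

    trajectory, state = [], None
    for counts in windows:
        total = sum(counts.values())
        if total == 0:
            # Empty window: carry the previous smoothed state forward (assumption)
            trajectory.append(list(state))
            continue
        dist = [counts[r] / total for r in range(n_regions)]
        # EMA update: alpha weights the current window, (1 - alpha) the history
        state = dist if state is None else [alpha * d + (1 - alpha) * s for d, s in zip(dist, state)]
        trajectory.append(list(state))
    return trajectory
```

With `alpha=0.3`, a user who moves entirely from region 0 to region 1 shows a smoothed distribution of [0.7, 0.3] in the first window after the switch, so a single unusual window shifts the trajectory only partially.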

| Group | Features | Source |
|---|---|---|
| Behavioral (11) | mean_gap, std_gap, gap_cv, gap_trend, night_ratio, burst_count, … | Posting patterns |
| Embedding (768) | Recency-weighted mean of MentalRoBERTa embeddings | mental/mental-roberta-base |
| Region L3 (~80) | Post distribution across coarse HNSW regions + entropy | CVX graph hierarchy |
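The embedding group is a recency-weighted mean of a user's post embeddings. The exact weighting is not specified here; the sketch below assumes exponential decay with a hypothetical half-life parameter (`half_life_days`), so a post loses half its weight for every `half_life_days` it predates the user's most recent post.

```python
import numpy as np


def recency_weighted_mean(embeddings: np.ndarray, t_days: np.ndarray, half_life_days: float = 30.0) -> np.ndarray:
    """embeddings: (n_posts, D) array; t_days: (n_posts,) post times in days."""
    age = t_days.max() - t_days  # days before the most recent post
    weights = 0.5 ** (age / half_life_days)  # newest post gets weight 1.0
    weights = weights / weights.sum()  # normalize so the result is a weighted average
    return weights @ embeddings  # (D,) user-level embedding
```

For two posts 30 days apart with `half_life_days=30`, the older post gets half the weight of the newer one (normalized weights 1/3 and 2/3).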

```bash
# Setup
conda activate cvx
pip install transformers torch
cd crates/cvx-python && maturin develop --release && cd ../..

# Generate embeddings (on GPU)
python scripts/generate_embeddings_v2.py --dataset erisk

# Apply splits
python scripts/add_splits.py

# Run analysis
cd notebooks && jupyter notebook B1_eRisk.ipynb
```

Requirements: ~10 GB of RAM to load the Parquet files and ~15 min of CVX ingestion per dataset.


  1. Couto, M. et al. (2025). Temporal word embeddings for psychological disorder early detection. JHIR.
  2. Ji, S. et al. (2022). MentalBERT: Publicly available pretrained language models for mental healthcare. LREC.
  3. Coppersmith, G. et al. (2018). NLP of social media as screening for suicide risk. BMI Insights.
  4. Losada, D. et al. (2019). eRisk: early risk prediction on the internet. CLEF Working Notes.
  5. Killick, R. et al. (2012). Optimal detection of changepoints. JASA.
  6. Malkov, Y.A. & Yashunin, D.A. (2018). Efficient and robust ANN using HNSW graphs. IEEE TPAMI.
  7. DS@GT (2025). Temporal attention models for eRisk 2025. CLEF Working Notes.