Market Regime Detection
This notebook applies ChronosVector (CVX) to financial market data, demonstrating how temporal vector analytics can detect market regimes, sector rotation, and crisis transitions.
Instead of treating each day as an independent observation, CVX models the market as a trajectory through feature space — capturing momentum, mean-reversion, and structural breaks as geometric properties of the path.
Key principle: markets are trajectories, not snapshots
| Analysis | CVX Functions | Market Insight |
|---|---|---|
| Regime Detection | detect_changepoints, velocity | Structural breaks in market dynamics |
| Trend Persistence | hurst_exponent | H>0.5 trending (momentum), H<0.5 mean-reverting |
| Anchor Projection | project_to_anchors, anchor_summary | Distance to bull/bear/crisis reference frames |
| Sector Rotation | region_trajectory, wasserstein_drift | Sector cluster migration over time |
| Market Fingerprinting | path_signature, signature_distance | Order-aware period comparison |
| Regime Prediction | All above → Logistic Regression | Forward-looking regime classification |
```python
import chronos_vector as cvx
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.preprocessing import StandardScaler
from scipy.stats import zscore
import time
import os
import warnings
warnings.filterwarnings('ignore')

DATA_DIR = '../data'
CACHE_DIR = f'{DATA_DIR}/cache'
os.makedirs(CACHE_DIR, exist_ok=True)

# Color scheme
C_BULL = '#2ecc71'
C_BEAR = '#e74c3c'
C_CRISIS = '#f39c12'
C_NEUTRAL = '#3498db'
TEMPLATE = 'plotly_dark'
```

1. Data Acquisition
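Download daily data for 11 S&P 500 sector ETFs, SPY (benchmark), and VIX (fear gauge) from 2010 to present via yfinance.

XLC (Communication Services) only began trading in June 2018, and the 60-day return needs 60 prior sessions. As a result, aligning all sectors to common dates leaves a usable sample that starts in September 2018 (see the date range printed below).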
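The cells below use `SECTORS`, `close`, and `volume`: daily closes and volumes with one column per ticker. The following is a minimal sketch of how these might be produced. It rests on two assumptions:

- yfinance's `yf.download(...)` returns a frame with a (field, ticker) column MultiIndex.
- VIX is fetched as `^VIX` and renamed to `VIX`, because the later code checks for a `VIX` column.

Both helper names are hypothetical. The network call sits inside `download_raw` so that the splitting logic can be exercised on a synthetic frame.

```python
import pandas as pd

# The 11 SPDR sector ETFs (same tickers as the sector mapping printed later)
SECTORS = ['XLB', 'XLC', 'XLE', 'XLF', 'XLI', 'XLK', 'XLP', 'XLRE', 'XLU', 'XLV', 'XLY']
TICKERS = SECTORS + ['SPY', '^VIX']


def download_raw(start='2010-01-01'):
    """Fetch daily bars via yfinance (requires network access and yfinance installed)."""
    import yfinance as yf  # assumption: yfinance is available
    return yf.download(TICKERS, start=start, auto_adjust=True, progress=False)


def split_close_volume(raw):
    """Split a (field, ticker) MultiIndex frame into close and volume frames.

    Hypothetical helper: renames '^VIX' to 'VIX' so `'VIX' in close.columns`
    works, and drops VIX from volume (an index has no traded volume).
    """
    close = raw['Close'].rename(columns={'^VIX': 'VIX'})
    volume = raw['Volume'].drop(columns=['^VIX'], errors='ignore')
    return close, volume


# Usage (needs network): close, volume = split_close_volume(download_raw())
```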
Per-day feature vector for each sector (D=7):
- 5-day, 20-day, 60-day returns (momentum at multiple horizons)
- 5-day, 20-day realized volatility
- Relative strength vs SPY
- Volume ratio (current vs 20-day average)
```python
def compute_sector_features(close, volume, sector, spy_col='SPY'):
    """Compute the D=7 feature vector per day for a sector ETF.

    Features:
        [0] 5-day return
        [1] 20-day return
        [2] 60-day return
        [3] 5-day realized volatility (annualized with sqrt(252))
        [4] 20-day realized volatility (annualized with sqrt(252))
        [5] relative strength vs SPY (sector 20d return - SPY 20d return)
        [6] volume ratio (today / 20-day average)
    """
    price = close[sector]  # named `price` so it does not shadow plotly.express `px`
    spy = close[spy_col]
    vol = volume[sector]

    log_ret = np.log(price / price.shift(1))

    features = pd.DataFrame(index=close.index)
    features['ret_5d'] = price.pct_change(5)
    features['ret_20d'] = price.pct_change(20)
    features['ret_60d'] = price.pct_change(60)
    features['vol_5d'] = log_ret.rolling(5).std() * np.sqrt(252)
    features['vol_20d'] = log_ret.rolling(20).std() * np.sqrt(252)
    features['rel_strength'] = price.pct_change(20) - spy.pct_change(20)
    features['volume_ratio'] = vol / vol.rolling(20).mean()

    # Drop the 60-day warm-up rows and any dates with missing data
    return features.dropna()
```
```python
# Compute features for all sectors
sector_features = {}
for sector in SECTORS:
    if sector in close.columns:
        sector_features[sector] = compute_sector_features(close, volume, sector)

# Align all sectors to the dates they have in common
available = list(sector_features)
common_dates = sector_features[available[0]].index
for sector in available[1:]:
    common_dates = common_dates.intersection(sector_features[sector].index)

for sector in sector_features:
    sector_features[sector] = sector_features[sector].loc[common_dates]

D_SECTOR = 7
D_MARKET = D_SECTOR * len(sector_features)
print(f'{len(sector_features)} sectors, {len(common_dates)} trading days, D={D_SECTOR} per sector')
print(f'Date range: {common_dates[0].date()} to {common_dates[-1].date()}')
print(f'Concatenated market vector: D={D_MARKET}')
```

```
11 sectors, 1886 trading days, D=7 per sector
Date range: 2018-09-13 to 2026-03-17
Concatenated market vector: D=77
```

2. CVX Index Construction
We build two indices:

- Per-sector index: each entity is one sector ETF, with D=7 features per day. Enables per-sector trajectory analysis, sector comparison, and region clustering.
- Market-wide index: a single entity (entity=1) with D=77 (all 11 sectors concatenated). Enables holistic regime detection across all sectors simultaneously.
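Both layouts are built from the same per-sector feature matrices; they differ only in how those matrices are arranged. The numpy sketch below uses random stand-in data sized to match the printed outputs (11 sectors, 1886 days, D=7):

- The per-sector layout stacks the matrices row-wise, with one entity id per sector.
- The market-wide layout concatenates them column-wise into one D=77 vector per day.

The `layouts` function is hypothetical. The real inserts go through `cvx.TemporalIndex.bulk_insert` in the next cell.

```python
import numpy as np

N_SECTORS, N_DAYS, D_SECTOR = 11, 1886, 7


def layouts(feats):
    """feats: array of shape (n_sectors, n_days, d), sectors in sorted order.

    Returns (entity_ids, per_sector, market):
      - per_sector: (n_sectors * n_days, d) rows; the k-th sector gets entity id k+1
      - market:     (n_days, n_sectors * d) rows, one concatenated vector per day
    """
    n_sectors, n_days, d = feats.shape
    per_sector = feats.reshape(n_sectors * n_days, d)
    entity_ids = np.repeat(np.arange(1, n_sectors + 1, dtype=np.uint64), n_days)
    # Same as np.hstack over sectors: reorder to (day, sector, d), then flatten sector and d
    market = feats.transpose(1, 0, 2).reshape(n_days, n_sectors * d)
    return entity_ids, per_sector, market


rng = np.random.default_rng(0)
feats = rng.normal(size=(N_SECTORS, N_DAYS, D_SECTOR)).astype(np.float32)
entity_ids, per_sector, market = layouts(feats)
print(per_sector.shape, market.shape)  # (20746, 7) (1886, 77)
```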
```python
INDEX_PATH = f'{CACHE_DIR}/sp500_index.cvx'
MARKET_INDEX_PATH = f'{CACHE_DIR}/sp500_market_index.cvx'

sector_to_id = {s: i + 1 for i, s in enumerate(sorted(sector_features.keys()))}
id_to_sector = {v: k for k, v in sector_to_id.items()}


def dates_to_unix(dates):
    """Convert pandas DatetimeIndex to unix seconds."""
    return (dates - pd.Timestamp('1970-01-01', tz='UTC' if dates.tz else None)) // pd.Timedelta('1s')


if os.path.exists(INDEX_PATH):
    t0 = time.perf_counter()
    index = cvx.TemporalIndex.load(INDEX_PATH)
    print(f'Per-sector index loaded in {time.perf_counter() - t0:.2f}s ({len(index):,} points)')
else:
    print('Building per-sector index...')
    index = cvx.TemporalIndex(m=16, ef_construction=200)

    timestamps_unix = dates_to_unix(common_dates).values.astype(np.int64)

    for sector, feats_df in sector_features.items():
        eid = sector_to_id[sector]
        vectors = feats_df.values.astype(np.float32)
        entity_ids = np.full(len(vectors), eid, dtype=np.uint64)
        index.bulk_insert(entity_ids, timestamps_unix, vectors, ef_construction=64)

    index.save(INDEX_PATH)
    print(f'Per-sector index: {len(index):,} points, saved to {INDEX_PATH}')

if os.path.exists(MARKET_INDEX_PATH):
    t0 = time.perf_counter()
    market_index = cvx.TemporalIndex.load(MARKET_INDEX_PATH)
    print(f'Market-wide index loaded in {time.perf_counter() - t0:.2f}s ({len(market_index):,} points)')
else:
    print('Building market-wide (concatenated) index...')
    market_index = cvx.TemporalIndex(m=16, ef_construction=200)

    timestamps_unix = dates_to_unix(common_dates).values.astype(np.int64)

    # Concatenate all sector features into D=77 vector per day
    sorted_sectors = sorted(sector_features.keys())
    market_vectors = np.hstack([
        sector_features[s].values for s in sorted_sectors
    ]).astype(np.float32)

    entity_ids = np.ones(len(market_vectors), dtype=np.uint64)
    market_index.bulk_insert(entity_ids, timestamps_unix, market_vectors, ef_construction=64)
    market_index.save(MARKET_INDEX_PATH)
    print(f'Market-wide index: {len(market_index):,} points (D={market_vectors.shape[1]}), saved')

print(f'\nSector mapping: {sector_to_id}')
```

```
Building per-sector index...
Per-sector index: 20,746 points, saved to ../data/cache/sp500_index.cvx
Building market-wide (concatenated) index...
Market-wide index: 1,886 points (D=77), saved

Sector mapping: {'XLB': 1, 'XLC': 2, 'XLE': 3, 'XLF': 4, 'XLI': 5, 'XLK': 6, 'XLP': 7, 'XLRE': 8, 'XLU': 9, 'XLV': 10, 'XLY': 11}
```

3. Regime Detection via CVX Analytics
Three complementary views of market structure:

- Changepoint detection (`detect_changepoints`): structural breaks in the trajectory
- Hurst exponent (`hurst_exponent`): rolling measure of trend persistence vs. mean-reversion
- Velocity profile (`velocity`): speed of market state evolution, high during crises and low during consolidation
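To make the last two diagnostics concrete, here is a self-contained numpy sketch of one common construction for each:

- the classical rescaled-range (R/S) estimate of the Hurst exponent on a 1-D series;
- velocity as a finite-difference time derivative of a trajectory, reported as its norm.

These are illustrative stand-ins. `cvx.hurst_exponent` and `cvx.velocity` may use different estimators (for example, ones that work directly on multivariate trajectories), and both function names below are hypothetical.

```python
import numpy as np


def hurst_rs(x, min_window=16):
    """Rescaled-range (R/S) Hurst estimate for a 1-D series of increments.

    H ~ 0.5: uncorrelated increments; H > 0.5: persistent (trending);
    H < 0.5: anti-persistent (mean-reverting).
    """
    x = np.asarray(x, dtype=float)
    n_total = len(x)
    sizes, rs_means = [], []
    n = min_window
    while n <= n_total // 4:                      # keep at least 4 chunks per size
        chunks = x[: (n_total // n) * n].reshape(-1, n)
        dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = dev.max(axis=1) - dev.min(axis=1)     # range of cumulative deviations
        s = chunks.std(axis=1)                    # chunk standard deviation
        valid = s > 0
        if valid.any():
            sizes.append(n)
            rs_means.append(np.mean(r[valid] / s[valid]))
        n *= 2
    # H is the slope of log(R/S) against log(window size)
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return slope


def velocity_norm(times, points):
    """|dX/dt| at each sample, using central differences (np.gradient)."""
    points = np.asarray(points, dtype=float)
    vel = np.gradient(points, np.asarray(times, dtype=float), axis=0)
    return np.linalg.norm(vel, axis=1)


rng = np.random.default_rng(42)
noise = rng.normal(size=4096)
print(f'white noise: H = {hurst_rs(noise):.2f}')            # near 0.5 (R/S is biased slightly upward)
print(f'random walk: H = {hurst_rs(np.cumsum(noise)):.2f}')  # near 1
```

Keep one caveat in mind when reading the rolling Hurst values below. The market features are themselves rolling-window statistics: overlapping 5-, 20-, and 60-day returns and volatilities. Consecutive points are therefore strongly autocorrelated by construction, which likely biases any persistence estimate toward H > 0.5.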
```python
# Get market trajectory (entity=1 in market-wide index)
market_traj = market_index.trajectory(entity_id=1)
print(f'Market trajectory: {len(market_traj)} points, D={len(market_traj[0][1])}')

# Build timestamp -> date mapping
timestamps_unix = dates_to_unix(common_dates).values
unix_to_date = dict(zip(timestamps_unix, common_dates))

# ── Changepoint detection ──
# The BIC penalty D*ln(n)/2 (~290 for D=77, n=1886) is too conservative;
# use 3*ln(n) (~22.6) instead
n_points = len(market_traj)
manual_penalty = 3.0 * np.log(n_points)

t0 = time.perf_counter()
changepoints = cvx.detect_changepoints(
    entity_id=1,
    trajectory=market_traj,
    penalty=manual_penalty,
    min_segment_len=20,
)
print(f'Detected {len(changepoints)} changepoints in {time.perf_counter() - t0:.2f}s (penalty={manual_penalty:.1f})')

for ts, severity in changepoints[:15]:
    date = unix_to_date.get(ts, pd.Timestamp(ts, unit='s'))
    print(f'  {date.strftime("%Y-%m-%d") if hasattr(date, "strftime") else date}: severity={severity:.4f}')
```

```
Market trajectory: 1886 points, D=77
Detected 11 changepoints in 0.07s (penalty=22.6)
  2018-10-15: severity=0.8010
  2019-01-11: severity=0.7496
  2020-02-24: severity=0.9735
  2020-03-23: severity=0.9792
  2020-04-21: severity=0.8223
  2020-05-26: severity=0.5624
  2022-01-03: severity=0.6099
  2022-02-01: severity=0.5974
  2025-02-28: severity=0.7096
  2025-04-11: severity=0.8346
  2025-05-12: severity=0.5471
```

```python
# ── Rolling Hurst exponent ──
HURST_WINDOW = 120  # ~6 months of trading days

hurst_values = []
hurst_dates = []

for i in range(HURST_WINDOW, len(market_traj)):
    window = market_traj[i - HURST_WINDOW : i]
    try:
        h = cvx.hurst_exponent(window)
        ts = window[-1][0]
        hurst_values.append(h)
        hurst_dates.append(unix_to_date.get(ts, pd.Timestamp(ts, unit='s')))
    except Exception:
        pass

print(f'Computed {len(hurst_values)} rolling Hurst values (window={HURST_WINDOW} days)')
print(f'Mean H={np.mean(hurst_values):.3f}, Std={np.std(hurst_values):.3f}')
print(f'H>0.5 (trending): {np.mean(np.array(hurst_values) > 0.5):.1%}')
print(f'H<0.5 (mean-reverting): {np.mean(np.array(hurst_values) < 0.5):.1%}')
```

```
Computed 1766 rolling Hurst values (window=120 days)
Mean H=0.744, Std=0.055
H>0.5 (trending): 100.0%
H<0.5 (mean-reverting): 0.0%
```

```python
# ── Velocity profile ──
velocity_values = []
velocity_dates = []

# Sample every 5 days for performance
for i in range(5, len(market_traj) - 5, 5):
    ts = market_traj[i][0]
    # Local window of points i-10 .. i+9 for the velocity estimate
    window = market_traj[max(0, i - 10) : min(len(market_traj), i + 10)]
    try:
        vel = cvx.velocity(window, timestamp=ts)
        vel_mag = float(np.linalg.norm(vel))
        velocity_values.append(vel_mag)
        velocity_dates.append(unix_to_date.get(ts, pd.Timestamp(ts, unit='s')))
    except Exception:
        pass

print(f'Computed {len(velocity_values)} velocity samples')
```

```
Computed 376 velocity samples
```

```python
# ── Visualization: Price + Changepoints + Hurst + Velocity ──
fig = make_subplots(
    rows=3, cols=1, shared_xaxes=True,
    vertical_spacing=0.06,
    subplot_titles=[
        'SPY Price with Regime Changepoints',
        'Rolling Hurst Exponent (120-day window)',
        'Market Velocity (feature-space speed)',
    ],
    row_heights=[0.4, 0.3, 0.3],
)

# Panel 1: SPY price with changepoints
if 'SPY' in close.columns:
    spy_aligned = close['SPY'].loc[common_dates]
    fig.add_trace(go.Scatter(
        x=common_dates, y=spy_aligned.values,
        mode='lines', name='SPY',
        line=dict(color=C_NEUTRAL, width=1.5),
    ), row=1, col=1)

# Changepoint markers
cp_dates_plot = []
cp_prices = []
cp_severities = []
for ts, sev in changepoints:
    d = unix_to_date.get(ts)
    if d is not None and 'SPY' in close.columns and d in close.index:
        cp_dates_plot.append(d)
        cp_prices.append(close.loc[d, 'SPY'])
        cp_severities.append(sev)

fig.add_trace(go.Scatter(
    x=cp_dates_plot, y=cp_prices,
    mode='markers', name='Changepoints',
    marker=dict(
        size=10, color=C_CRISIS, symbol='diamond',
        line=dict(width=1, color='white'),
    ),
    text=[f'Severity: {s:.4f}' for s in cp_severities],
    hovertemplate='%{x}<br>SPY: $%{y:.2f}<br>%{text}<extra></extra>',
), row=1, col=1)

# Panel 2: Rolling Hurst
fig.add_trace(go.Scatter(
    x=hurst_dates, y=hurst_values,
    mode='lines', name='Hurst',
    line=dict(color=C_NEUTRAL, width=1.5),
), row=2, col=1)
fig.add_hline(y=0.5, line_dash='dash', line_color='gray',
              annotation_text='H=0.5 (random walk)', row=2, col=1)
fig.add_hrect(y0=0.5, y1=1.0, fillcolor=C_BULL, opacity=0.05, row=2, col=1)
fig.add_hrect(y0=0.0, y1=0.5, fillcolor=C_BEAR, opacity=0.05, row=2, col=1)

# Panel 3: Velocity
fig.add_trace(go.Scatter(
    x=velocity_dates, y=velocity_values,
    mode='lines', name='Velocity',
    line=dict(color=C_CRISIS, width=1.5),
    fill='tozeroy', fillcolor='rgba(243, 156, 18, 0.15)',
), row=3, col=1)

fig.update_layout(
    height=900, width=1100,
    template=TEMPLATE,
    showlegend=True,
    legend=dict(x=0.01, y=0.99),
    title_text='Market Regime Analytics — CVX Temporal Analysis',
)
fig.update_yaxes(title_text='Price ($)', row=1, col=1)
fig.update_yaxes(title_text='Hurst H', row=2, col=1)
fig.update_yaxes(title_text='|velocity|', row=3, col=1)
fig.show()
```

4. Anchor Projection — Bull / Bear / Crisis Reference Frames
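Define three anchor vectors from known market periods:

- Bull anchor: average feature vector over calendar year 2017 (calm, steady uptrend)
- Bear anchor: average feature vector from 2020-02-15 to 2020-04-15 (COVID crash)
- Crisis anchor: average feature vector over high-VIX days (VIX > 35). If fewer than 10 days qualify, the code falls back to the top 5% of VIX readings.

Using cvx.project_to_anchors(), we map every trading day into a 3D space: distance-to-bull, distance-to-bear, distance-to-crisis. This transforms the D=77 market trajectory into a regime-relative coordinate system.

Because the aligned sample starts on 2018-09-13, the 2017 window contains no trading days. The bull anchor therefore falls back to an all-zero vector, and its cosine distance is a constant 1.0 for every day. In practice, only the bear and crisis coordinates carry information.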
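Here is a minimal numpy sketch of the projection step. It makes two assumptions:

- `metric='cosine'` means the cosine distance 1 - cos(x, a).
- A zero-norm vector has similarity 0. This reproduces the constant 1.0 distance to the empty bull anchor in the summary below.

`project_cosine` is a hypothetical helper, not the CVX implementation.

```python
import numpy as np


def project_cosine(points, anchors, eps=1e-12):
    """Map each row of `points` (n, D) to its cosine distance from each anchor (k, D).

    Returns an (n, k) array of 1 - cos(point, anchor); a zero vector gets
    similarity 0, i.e. distance 1.
    """
    points = np.asarray(points, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    p_norm = np.linalg.norm(points, axis=1, keepdims=True)
    a_norm = np.linalg.norm(anchors, axis=1, keepdims=True)
    dots = points @ anchors.T
    denom = p_norm * a_norm.T
    sims = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > eps)
    return 1.0 - sims


anchors = np.array([
    [0.0, 0.0, 0.0],   # degenerate anchor, e.g. averaged over an empty window
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
points = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
dists = project_cosine(points, anchors)
closest = dists.argmin(axis=1)   # dominant regime per day
```

Cosine distance ignores vector length. A quiet day whose feature moves point in the same direction as the COVID window therefore counts as close to the bear anchor; comparing norms as well would separate the two cases.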
```python
# Build anchor vectors from known periods
sorted_sectors = sorted(sector_features.keys())


def get_market_vector_for_dates(date_mask):
    """Compute the average concatenated market vector over a date mask.

    A sector with no rows in the window contributes a zero block, and any NaN
    is replaced by 0, so an empty mask yields an all-zero vector.
    """
    valid_dates = common_dates[date_mask]
    vectors = []
    for s in sorted_sectors:
        df_s = sector_features[s]
        valid = df_s.index.isin(valid_dates)
        if valid.sum() > 0:
            vectors.append(df_s.loc[valid].values)
        else:
            # No rows for this sector in the window: use zeros
            vectors.append(np.zeros((1, D_SECTOR)))
    concat = np.vstack([v.mean(axis=0, keepdims=True) for v in vectors]).flatten()
    return np.nan_to_num(concat, nan=0.0).astype(np.float32).tolist()


# Bull anchor: calendar year 2017 (calm uptrend).
# NOTE: the aligned sample starts on 2018-09-13, so this mask is empty and the
# anchor is an all-zero vector (the print below reports 0 days averaged).
bull_mask = (common_dates.year == 2017)
bull_anchor = get_market_vector_for_dates(bull_mask)
print(f'Bull anchor (2017): {bull_mask.sum()} days averaged, NaN check: {np.isnan(bull_anchor).sum()}')

# Bear anchor: COVID crash (Feb-Apr 2020)
bear_mask = (common_dates >= '2020-02-15') & (common_dates <= '2020-04-15')
bear_anchor = get_market_vector_for_dates(bear_mask)
print(f'Bear anchor (COVID): {bear_mask.sum()} days averaged')

# Crisis anchor: high-VIX periods
vix_col = 'VIX' if 'VIX' in close.columns else None
if vix_col:
    vix_aligned = close[vix_col].reindex(common_dates).ffill()
    crisis_mask = (vix_aligned > 35).values
    if crisis_mask.sum() < 10:
        # Fall back to the top 5% of VIX readings
        threshold = vix_aligned.quantile(0.95)
        crisis_mask = (vix_aligned > threshold).values
    crisis_anchor = get_market_vector_for_dates(crisis_mask)
    print(f'Crisis anchor (VIX>35): {crisis_mask.sum()} days averaged')
else:
    # No VIX column: use the 2022 rate shock instead
    crisis_mask_dates = (common_dates >= '2022-06-01') & (common_dates <= '2022-10-31')
    crisis_anchor = get_market_vector_for_dates(crisis_mask_dates)
    print(f'Crisis anchor (2022 rate shock): {crisis_mask_dates.sum()} days averaged')

anchors = [bull_anchor, bear_anchor, crisis_anchor]
anchor_names = ['Bull (2017)', 'Bear (COVID)', 'Crisis (high-VIX)']
```

```
Bull anchor (2017): 0 days averaged, NaN check: 0
Bear anchor (COVID): 41 days averaged
Crisis anchor (VIX>35): 60 days averaged
```

```python
# Project market trajectory into anchor-relative coordinates
t0 = time.perf_counter()
projected = cvx.project_to_anchors(market_traj, anchors, metric='cosine')
summary = cvx.anchor_summary(projected)
elapsed = time.perf_counter() - t0

print(f'Projected {len(projected)} days into 3D anchor space in {elapsed:.2f}s')
print('\nAnchor Summary:')
for i, name in enumerate(anchor_names):
    print(f'  {name}:')
    print(f'    Mean distance: {summary["mean"][i]:.4f}')
    print(f'    Min distance: {summary["min"][i]:.4f}')
    print(f'    Trend: {summary["trend"][i]:+.6f} ({"approaching" if summary["trend"][i] < 0 else "diverging"})')

# Hurst on projected trajectory
hurst_projected = cvx.hurst_exponent(projected)
print(f'\nHurst exponent in anchor space: {hurst_projected:.3f}')
if hurst_projected > 0.5:
    print('  -> Persistent regime dynamics (momentum between regimes)')
else:
    print('  -> Mean-reverting regime dynamics (regime oscillation)')
```

```
Projected 1886 days into 3D anchor space in 0.00s

Anchor Summary:
  Bull (2017):
    Mean distance: 1.0000
    Min distance: 1.0000
    Trend: +0.000000 (diverging)
  Bear (COVID):
    Mean distance: 0.1395
    Min distance: 0.0237
    Trend: +0.000000 (diverging)
  Crisis (high-VIX):
    Mean distance: 0.1130
    Min distance: 0.0236
    Trend: -0.000002 (approaching)

Hurst exponent in anchor space: 0.623
  -> Persistent regime dynamics (momentum between regimes)
```

```python
# ── Visualization: Distance to each anchor over time ──
proj_dates = []
proj_bull = []
proj_bear = []
proj_crisis = []

for ts, dists in projected:
    d = unix_to_date.get(ts)
    if d is not None:
        proj_dates.append(d)
        proj_bull.append(dists[0])
        proj_bear.append(dists[1])
        proj_crisis.append(dists[2])

# Determine dominant regime per day (closest anchor)
regime_colors = []
for b, br, c in zip(proj_bull, proj_bear, proj_crisis):
    closest = np.argmin([b, br, c])
    regime_colors.append([C_BULL, C_BEAR, C_CRISIS][closest])

fig = make_subplots(
    rows=2, cols=1, shared_xaxes=True,
    vertical_spacing=0.08,
    subplot_titles=[
        'Cosine Distance to Anchor Regimes (lower = closer to regime)',
        'Dominant Regime (closest anchor)',
    ],
    row_heights=[0.7, 0.3],
)

# Reuse anchor_names so the legend matches the anchors actually built
for vals, name, color in zip(
    [proj_bull, proj_bear, proj_crisis],
    anchor_names,
    [C_BULL, C_BEAR, C_CRISIS],
):
    fig.add_trace(go.Scatter(
        x=proj_dates, y=vals,
        mode='lines', name=name,
        line=dict(color=color, width=2),
    ), row=1, col=1)

# Regime bar
fig.add_trace(go.Bar(
    x=proj_dates, y=[1] * len(proj_dates),
    marker_color=regime_colors,
    showlegend=False,
    hovertemplate='%{x}<extra></extra>',
), row=2, col=1)

fig.update_layout(
    height=650, width=1100,
    template=TEMPLATE,
    title_text='Anchor Projection — Market Distance to Bull / Bear / Crisis',
)
fig.update_yaxes(title_text='Cosine Distance', row=1, col=1)
fig.update_yaxes(showticklabels=False, row=2, col=1)
fig.show()
```

5. Sector Rotation via Region Trajectory
CVX discovers natural clusters (regions) in the HNSW graph hierarchy. We track how each sector's points distribute across regions over time. This gives a proxy for sector rotation intensity: how far a sector's feature profile migrates between clusters from one window to the next.

- `index.regions(level=2)`: discover semantic clusters among all sector-day points
- `index.region_trajectory()`: smoothed distribution over clusters for each sector
- `cvx.wasserstein_drift()`: optimal-transport distance between consecutive distributions
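For intuition about the drift measure, here is a sketch of the exact optimal-transport (earth mover's) distance between two weight vectors over the same region centroids. It is solved as a small linear program with `scipy.optimize.linprog`, using the Euclidean distance between centroids as the ground cost.

It is an assumption that `cvx.wasserstein_drift` computes this quantity. The helper `emd_over_centroids` is hypothetical, and the exponential smoothing controlled by `alpha` in `region_trajectory` is not reproduced.

```python
import numpy as np
from scipy.optimize import linprog


def emd_over_centroids(a, b, centroids):
    """Exact 1-Wasserstein distance between weights a and b on shared centroids."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a, b = a / a.sum(), b / b.sum()          # normalize to probability vectors
    c = np.asarray(centroids, dtype=float)
    k = len(c)
    cost = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)  # (k, k) ground cost

    # Transport plan P (k x k) flattened row-major: P[i, j] -> variable i*k + j
    a_eq = np.zeros((2 * k, k * k))
    for i in range(k):
        a_eq[i, i * k:(i + 1) * k] = 1.0     # mass leaving region i equals a[i]
        a_eq[k + i, i::k] = 1.0              # mass arriving at region i equals b[i]
    b_eq = np.concatenate([a, b])

    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method='highs')
    return res.fun


centroids = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
print(emd_over_centroids([1, 0, 0], [0, 1, 0], centroids))  # all mass moves a distance of 5
```

With k = 70 regions the plan has 4,900 variables, which is small for an LP solver. For much larger k, an entropic (Sinkhorn) approximation is the usual alternative.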
```python
# Discover regions in the per-sector index
t0 = time.perf_counter()
regions = index.regions(level=2)
print(f'Discovered {len(regions)} regions at level 2 in {time.perf_counter() - t0:.2f}s')

for rid, centroid, n_members in regions[:8]:
    print(f'  Region {rid}: {n_members} members, centroid norm={np.linalg.norm(centroid):.3f}')

region_centroids = [c for _, c, _ in regions]
```

```
Discovered 70 regions at level 2 in 0.00s
  Region 212: 665 members, centroid norm=1.205
  Region 716: 265 members, centroid norm=1.159
  Region 1373: 363 members, centroid norm=0.954
  Region 1901: 272 members, centroid norm=1.810
  Region 2029: 234 members, centroid norm=0.842
  Region 2238: 580 members, centroid norm=0.693
  Region 2332: 17 members, centroid norm=0.712
  Region 2495: 118 members, centroid norm=1.344
```

```python
# Compute region trajectory for each sector.
# Despite its name, window_days is passed in timestamp units (seconds):
# 30 trading days ~ 42 calendar days
WINDOW_SECONDS = 42 * 86400

sector_region_trajs = {}
for sector, eid in sector_to_id.items():
    traj = index.region_trajectory(
        entity_id=eid, level=2,
        window_days=WINDOW_SECONDS,
        alpha=0.3,
    )
    sector_region_trajs[sector] = traj

print(f'Region trajectories computed for {len(sector_region_trajs)} sectors')
for sector, traj in list(sector_region_trajs.items())[:3]:
    print(f'  {sector}: {len(traj)} time steps, {len(traj[0][1]) if traj else 0} regions')
```

```
Region trajectories computed for 11 sectors
  XLB: 66 time steps, 70 regions
  XLC: 66 time steps, 70 regions
  XLE: 66 time steps, 70 regions
```

```python
# Wasserstein drift for XLK (tech sector) as example
xlk_traj = sector_region_trajs.get('XLK', [])

if len(xlk_traj) > 1 and len(region_centroids) > 0:
    wass_dates = []
    wass_values = []

    for i in range(1, len(xlk_traj)):
        ts = xlk_traj[i][0]
        dist_a = xlk_traj[i - 1][1]
        dist_b = xlk_traj[i][1]

        # Ensure distributions match region count
        n_regions = min(len(dist_a), len(dist_b), len(region_centroids))
        if n_regions > 0:
            w = cvx.wasserstein_drift(
                dist_a[:n_regions],
                dist_b[:n_regions],
                region_centroids[:n_regions],
            )
            d = unix_to_date.get(ts)
            if d is not None:
                wass_dates.append(d)
                wass_values.append(w)

    print(f'Wasserstein drift series: {len(wass_values)} points')
    print(f'Mean drift: {np.mean(wass_values):.4f}, Max: {np.max(wass_values):.4f}')
else:
    print('Insufficient region trajectory data for Wasserstein analysis')
    wass_dates, wass_values = [], []
```

```
Wasserstein drift series: 60 points
Mean drift: 2.3050, Max: 5.4613
```

```python
# ── Heatmap: XLK region distribution over time ──
# Region x time heatmap for a single reference sector (XLK)

if len(xlk_traj) > 0:
    n_regions_display = len(xlk_traj[0][1])

    # Subsample long trajectories (step = len // 60) for readability
    step = max(1, len(xlk_traj) // 60)
    sampled = xlk_traj[::step]

    heat_dates = []
    heat_data = []
    for ts, dist in sampled:
        d = unix_to_date.get(ts)
        if d is not None:
            heat_dates.append(d.strftime('%Y-%m'))
            heat_data.append(dist[:min(n_regions_display, 10)])  # first 10 regions by position

    heat_matrix = np.array(heat_data).T

    fig = go.Figure(go.Heatmap(
        z=heat_matrix,
        x=heat_dates,
        # Positions in the distribution vector, not the region ids printed above
        y=[f'Region {i}' for i in range(heat_matrix.shape[0])],
        colorscale='Viridis',
        colorbar_title='Weight',
    ))
    fig.update_layout(
        title='XLK (Tech) Region Distribution Over Time',
        xaxis_title='Date',
        yaxis_title='Semantic Region',
        height=450, width=1100,
        template=TEMPLATE,
    )
    fig.show()

# Wasserstein drift plot
if wass_values:
    fig = go.Figure(go.Scatter(
        x=wass_dates, y=wass_values,
        mode='lines', name='Wasserstein Drift',
        line=dict(color=C_CRISIS, width=1.5),
        fill='tozeroy', fillcolor='rgba(243, 156, 18, 0.15)',
    ))
    fig.update_layout(
        title='XLK Sector Rotation Intensity (Wasserstein Drift Between Consecutive Windows)',
        xaxis_title='Date',
        yaxis_title='Wasserstein Distance',
        height=400, width=1100,
        template=TEMPLATE,
    )
    fig.show()
```

6. Path Signatures — Market Fingerprinting
Path signatures from rough path theory provide an order-aware, universal feature of sequential data. The full signature determines a path up to reparametrization (and tree-like backtracking), so two trajectories with the same signature trace the same geometric shape regardless of speed. A depth-2 truncation keeps the net displacement (level 1) and the pairwise iterated integrals (level 2), whose antisymmetric part measures signed area.

We compute depth-2 signatures on the anchor-projected trajectory (D=3, so 3 + 9 = 12 features) for distinct market periods, then compare them via signature_distance().
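The following numpy sketch computes a depth-2 signature for a piecewise-linear path. It builds the signature segment by segment using Chen's identity, which shows where the 3 + 9 = 12 features come from:

- Level 1 is the total increment.
- Level 2 holds the iterated integrals S[i, j] = ∫∫_{s<t} dX_i(s) dX_j(t).

`signature_depth2` is a hypothetical helper. The term ordering and any normalization inside `cvx.path_signature` may differ.

```python
import numpy as np


def signature_depth2(points):
    """Depth-2 signature of the piecewise-linear path through `points` (n, d).

    Returns a flat vector: [level 1 (d terms), level 2 (d*d terms, row-major)].
    """
    x = np.asarray(points, dtype=float)
    d = x.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(x, axis=0):
        # Chen's identity: append a linear segment whose own signature is
        # (delta, outer(delta, delta) / 2); the cross term uses the old level 1
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return np.concatenate([s1, s2.ravel()])


# Signed (Levy) area of the counterclockwise unit square
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
sig = signature_depth2(square)
levy_area = 0.5 * (sig[2 + 1] - sig[2 + 2])   # (S[0,1] - S[1,0]) / 2 for d = 2
print(levy_area)  # 1.0
```

The sketch uses only point coordinates, not timestamps, so adding intermediate points along a segment leaves the signature unchanged. This is the precise sense in which the signature ignores speed; it presumably corresponds to `time_augmentation=False`.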
```python
# Define market periods for comparison
# (the pre-COVID window is truncated to the sample start, 2018-09-13)
PERIODS = {
    'Pre-COVID Bull (2018-2019)': ('2018-01-01', '2019-12-31'),
    'COVID Crash (2020-Q1)': ('2020-01-01', '2020-04-30'),
    'Recovery Rally (2020-Q3/Q4)': ('2020-07-01', '2020-12-31'),
    'Rate Hikes (2022)': ('2022-01-01', '2022-12-31'),
    'AI Rally (2023)': ('2023-01-01', '2023-12-31'),
}

# Extract projected sub-trajectories and compute signatures
period_sigs = {}
period_trajs = {}

for name, (start, end) in PERIODS.items():
    start_ts = int(pd.Timestamp(start).timestamp())
    end_ts = int(pd.Timestamp(end).timestamp())

    # Filter projected trajectory to period
    sub_traj = [(ts, dists) for ts, dists in projected if start_ts <= ts <= end_ts]

    if len(sub_traj) >= 10:
        sig = cvx.path_signature(sub_traj, depth=2, time_augmentation=False)
        period_sigs[name] = sig
        period_trajs[name] = sub_traj
        print(f'{name}: {len(sub_traj)} days, signature dim={len(sig)}')
    else:
        print(f'{name}: insufficient data ({len(sub_traj)} days)')

# Signature distance matrix
period_names = list(period_sigs.keys())
n_periods = len(period_names)
dist_matrix = np.zeros((n_periods, n_periods))

for i in range(n_periods):
    for j in range(n_periods):
        dist_matrix[i, j] = cvx.signature_distance(
            period_sigs[period_names[i]],
            period_sigs[period_names[j]],
        )

print('\nSignature Distance Matrix:')
df_dist = pd.DataFrame(dist_matrix, index=period_names, columns=period_names)
print(df_dist.round(3).to_string())
```

```
Pre-COVID Bull (2018-2019): 327 days, signature dim=12
COVID Crash (2020-Q1): 83 days, signature dim=12
Recovery Rally (2020-Q3/Q4): 128 days, signature dim=12
Rate Hikes (2022): 251 days, signature dim=12
AI Rally (2023): 250 days, signature dim=12
```

Signature Distance Matrix:

| | Pre-COVID Bull (2018-2019) | COVID Crash (2020-Q1) | Recovery Rally (2020-Q3/Q4) | Rate Hikes (2022) | AI Rally (2023) |
|---|---|---|---|---|---|
| Pre-COVID Bull (2018-2019) | 0.000 | 0.501 | 0.408 | 0.451 | 0.415 |
| COVID Crash (2020-Q1) | 0.501 | 0.000 | 0.328 | 0.071 | 0.258 |
| Recovery Rally (2020-Q3/Q4) | 0.408 | 0.328 | 0.000 | 0.259 | 0.070 |
| Rate Hikes (2022) | 0.451 | 0.071 | 0.259 | 0.000 | 0.190 |
| AI Rally (2023) | 0.415 | 0.258 | 0.070 | 0.190 | 0.000 |

```python
# ── Signature distance heatmap ──
fig = go.Figure(go.Heatmap(
    z=dist_matrix,
    x=[n.split('(')[0].strip() for n in period_names],
    y=[n.split('(')[0].strip() for n in period_names],
    colorscale='RdYlGn_r',
    text=np.round(dist_matrix, 3),
    texttemplate='%{text}',
    colorbar_title='Sig Distance',
))
fig.update_layout(
    title='Path Signature Distance Between Market Periods',
    height=500, width=700,
    template=TEMPLATE,
)
fig.show()
```

```python
# ── PCA on signatures: market state space ──

# Compute rolling signatures (quarterly windows) for state-space visualization
WINDOW_Q = 60  # ~1 quarter of trading days
STEP_Q = 20    # ~1 month

rolling_sigs = []
rolling_labels = []
rolling_dates_center = []

for i in range(0, len(projected) - WINDOW_Q, STEP_Q):
    sub = projected[i : i + WINDOW_Q]
    try:
        sig = cvx.path_signature(sub, depth=2)
    except Exception:
        continue
    # Append signature, date, and label together so the lists stay aligned
    rolling_sigs.append(sig)
    center_ts = sub[WINDOW_Q // 2][0]
    center_date = unix_to_date.get(center_ts, pd.Timestamp(center_ts, unit='s'))
    rolling_dates_center.append(center_date)

    # Label by year for coloring
    if hasattr(center_date, 'year'):
        rolling_labels.append(str(center_date.year))
    else:
        rolling_labels.append('unknown')

if len(rolling_sigs) >= 3:
    sig_matrix = np.nan_to_num(np.array(rolling_sigs), nan=0.0, posinf=0.0, neginf=0.0)
    pca = PCA(n_components=2)
    sig_2d = pca.fit_transform(sig_matrix)

    fig = go.Figure()

    # Color by year
    unique_years = sorted(set(rolling_labels))
    colors = px.colors.qualitative.Set2

    for yi, year in enumerate(unique_years):
        mask = np.array([label == year for label in rolling_labels])
        pts = sig_2d[mask]
        dates = [d for d, m in zip(rolling_dates_center, mask) if m]
        fig.add_trace(go.Scatter(
            x=pts[:, 0], y=pts[:, 1],
            mode='markers+lines', name=year,
            marker=dict(size=8, color=colors[yi % len(colors)]),
            line=dict(width=1, color=colors[yi % len(colors)]),
            text=[d.strftime('%Y-%m') if hasattr(d, 'strftime') else str(d) for d in dates],
            hovertemplate='%{text}<br>PC1: %{x:.3f}<br>PC2: %{y:.3f}<extra></extra>',
        ))

    fig.update_layout(
        title=f'Market State Space (PCA on Quarterly Path Signatures, explained var: {pca.explained_variance_ratio_.sum():.1%})',
        xaxis_title=f'PC1 ({pca.explained_variance_ratio_[0]:.1%})',
        yaxis_title=f'PC2 ({pca.explained_variance_ratio_[1]:.1%})',
        height=550, width=800,
        template=TEMPLATE,
    )
    fig.show()
else:
    print('Insufficient data for PCA visualization')
```

7. Classification — Regime Prediction
Can CVX features predict the forward regime?

- Label: bull if SPY's 20-day forward return is positive, bear otherwise
- Features: rolling Hurst, velocity statistics, anchor proximity and trends, signature features
- Split: temporal train/test at 2021-01-01. Training uses days before 2021, which in practice starts on 2019-03-08, the first day with a full 120-day lookback. Testing uses 2021 onward.
- Baseline: simple moving average crossover (50d vs 200d SMA)
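Here is a small pandas sketch of the labeling and baseline definitions above, run on a synthetic price series. Label 1 means the 20-day forward return is positive; the baseline signal is the 50/200-day SMA crossover.

Calling `astype(int)` on a boolean turns unknown rows into 0. By contrast, the hypothetical helpers below keep rows without enough history or future as NaN, so those rows cannot be mistaken for bear days.

```python
import numpy as np
import pandas as pd


def forward_labels(prices, horizon=20):
    """1.0 if the forward return over `horizon` rows is positive, 0.0 if not, NaN if unknown."""
    fwd = prices.shift(-horizon) / prices - 1
    return (fwd > 0).astype(float).where(fwd.notna())


def sma_crossover(prices, fast=50, slow=200):
    """1.0 when the fast SMA is above the slow SMA (bullish), else 0.0; NaN until both exist."""
    fast_ma = prices.rolling(fast).mean()
    slow_ma = prices.rolling(slow).mean()
    return (fast_ma > slow_ma).astype(float).where(slow_ma.notna())


idx = pd.bdate_range('2020-01-01', periods=300)
prices = pd.Series(100 + np.arange(300, dtype=float), index=idx)  # steady uptrend
labels = forward_labels(prices)   # last 20 rows are NaN: no forward price exists
signal = sma_crossover(prices)    # first 199 rows are NaN: slow SMA not yet defined
```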
```python
# ── Compute labels: 20-day forward return sign ──
if 'SPY' in close.columns:
    spy_prices = close['SPY'].reindex(common_dates).ffill()
    fwd_return_20d = spy_prices.shift(-20) / spy_prices - 1
    # NOTE: the last 20 days have no forward return (NaN). Since `NaN > 0` is
    # False, astype(int) labels them 0 (bear) and the NaN count below reports 0.
    # The feature loop stops 20 days before the end, so these rows are never used.
    labels = (fwd_return_20d > 0).astype(int)
    labels = labels.reindex(common_dates)
else:
    # Use first sector as proxy
    first_sector = sorted(sector_features.keys())[0]
    proxy = close[first_sector].reindex(common_dates).ffill()
    fwd_return_20d = proxy.shift(-20) / proxy - 1
    labels = (fwd_return_20d > 0).astype(int)

print(f'Label distribution: bull={labels.sum()}, bear={(1-labels).sum():.0f}, NaN={labels.isna().sum()}')
```

```
Label distribution: bull=1271, bear=615, NaN=0
```

```python
# ── Extract CVX features for each day ──
# For each day, use a trailing window to compute features

LOOKBACK = 120  # trailing window in trading days
HURST_LB = 60
SIG_LB = 60

feature_rows = []
feature_dates = []
feature_labels = []

for i in range(LOOKBACK, len(projected) - 20):  # -20 leaves room for the forward label
    ts = projected[i][0]
    d = unix_to_date.get(ts)
    if d is None or pd.isna(labels.get(d, np.nan)):
        continue

    feats = {}

    # 1. Anchor distances (current)
    dists = projected[i][1]
    feats['dist_bull'] = dists[0]
    feats['dist_bear'] = dists[1]
    feats['dist_crisis'] = dists[2]
    feats['bull_bear_ratio'] = dists[0] / (dists[1] + 1e-8)

    # 2. Anchor trends (from summary over trailing window)
    window_proj = projected[i - LOOKBACK : i]
    if len(window_proj) > 10:
        win_summary = cvx.anchor_summary(window_proj)
        feats['trend_bull'] = win_summary['trend'][0]
        feats['trend_bear'] = win_summary['trend'][1]
        feats['trend_crisis'] = win_summary['trend'][2]
    else:
        feats['trend_bull'] = 0.0
        feats['trend_bear'] = 0.0
        feats['trend_crisis'] = 0.0

    # 3. Hurst exponent (trailing window)
    hurst_window = projected[i - HURST_LB : i]
    try:
        feats['hurst'] = float(cvx.hurst_exponent(hurst_window))
    except Exception:
        feats['hurst'] = 0.5

    # 4. Velocity statistics (trailing window)
    # NOTE: j goes up to i - 2, so the slice end j + 5 reaches i + 3 and the
    # window includes days i + 1 and i + 2 (after the prediction date). Capping
    # the end at i + 1 would keep these features strictly trailing.
    vel_samples = []
    for j in range(max(i - 20, 0), i, 2):
        local_window = projected[max(0, j - 5) : min(len(projected), j + 5)]
        if len(local_window) >= 3:
            try:
                v = cvx.velocity(local_window, timestamp=projected[j][0])
                vel_samples.append(float(np.linalg.norm(v)))
            except Exception:
                pass

    if vel_samples:
        feats['vel_mean'] = np.mean(vel_samples)
        feats['vel_std'] = np.std(vel_samples)
        feats['vel_max'] = np.max(vel_samples)
    else:
        feats['vel_mean'] = 0.0
        feats['vel_std'] = 0.0
        feats['vel_max'] = 0.0

    # 5. Path signature (trailing window, depth=2 on D=3 anchor space)
    sig_window = projected[i - SIG_LB : i]
    if len(sig_window) >= 10:
        try:
            sig = cvx.path_signature(sig_window, depth=2)
            for si, sv in enumerate(sig):
                feats[f'sig_{si}'] = float(sv)
        except Exception:
            for si in range(12):  # D=3, depth=2: 3 + 9 = 12
                feats[f'sig_{si}'] = 0.0
    else:
        for si in range(12):
            feats[f'sig_{si}'] = 0.0

    feature_rows.append(feats)
    feature_dates.append(d)
    feature_labels.append(int(labels[d]))

df_clf = pd.DataFrame(feature_rows, index=feature_dates)
y_clf = np.array(feature_labels)

print(f'Feature matrix: {df_clf.shape}')
print(f'Labels: {y_clf.sum()} bull, {(1-y_clf).sum()} bear')
print(f'Date range: {feature_dates[0].date()} to {feature_dates[-1].date()}')
print(f'Features: {list(df_clf.columns)}')
```

```
Feature matrix: (1746, 23)
Labels: 1206 bull, 540 bear
Date range: 2019-03-08 to 2026-02-17
Features: ['dist_bull', 'dist_bear', 'dist_crisis', 'bull_bear_ratio', 'trend_bull', 'trend_bear', 'trend_crisis', 'hurst', 'vel_mean', 'vel_std', 'vel_max', 'sig_0', 'sig_1', 'sig_2', 'sig_3', 'sig_4', 'sig_5', 'sig_6', 'sig_7', 'sig_8', 'sig_9', 'sig_10', 'sig_11']
```

```python
# ── Temporal train/test split ──
SPLIT_DATE = pd.Timestamp('2021-01-01')
```
train_mask = np.array([d < SPLIT_DATE for d in feature_dates])
test_mask = ~train_mask

X_all = np.nan_to_num(df_clf.values, nan=0.0, posinf=0.0, neginf=0.0)

X_train, y_train = X_all[train_mask], y_clf[train_mask]
X_test, y_test = X_all[test_mask], y_clf[test_mask]

# Feature rows start on 2019-03-08 (see the date range above),
# so the training window covers 2019-03 to 2020-12.
print(f'Train: {len(X_train)} days (2019-2020), bull={y_train.sum()}, bear={(1-y_train).sum():.0f}')
print(f'Test: {len(X_test)} days (2021+), bull={y_test.sum()}, bear={(1-y_test).sum():.0f}')

# CVX model: the scaler is fit on the training window only, to avoid look-ahead
scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_train)
X_te_s = scaler.transform(X_test)

clf = LogisticRegression(max_iter=1000, random_state=42, class_weight='balanced')
clf.fit(X_tr_s, y_train)

y_pred = clf.predict(X_te_s)
y_prob = clf.predict_proba(X_te_s)[:, 1]

f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)

print(f'\n=== CVX Regime Prediction (Train 2019-2020 -> Test 2021+) ===')
print(f' F1: {f1:.3f}')
print(f' AUC: {auc:.3f}')
print(f' Precision: {prec:.3f}')
print(f' Recall: {rec:.3f}')

Train: 460 days (2019-2020), bull=343, bear=117
Test: 1286 days (2021+), bull=863, bear=423

=== CVX Regime Prediction (Train 2019-2020 -> Test 2021+) ===
 F1: 0.704
 AUC: 0.452
 Precision: 0.662
 Recall: 0.752

# ── Baseline: SMA crossover signal ──
if 'SPY' in close.columns:
    spy_full = close['SPY'].reindex(common_dates).ffill()
    sma_50 = spy_full.rolling(50).mean()
    sma_200 = spy_full.rolling(200).mean()
    sma_signal = (sma_50 > sma_200).astype(int)  # 1 = bullish, 0 = bearish
    # Align the SMA signal with the test dates
    test_dates = [d for d, m in zip(feature_dates, test_mask) if m]
    baseline_preds = sma_signal.reindex(test_dates).fillna(0).values.astype(int)

    # Use the hard 0/1 SMA signal as the class prediction for F1, precision, recall
    baseline_f1 = f1_score(y_test, baseline_preds)
    baseline_prec = precision_score(y_test, baseline_preds)
    baseline_rec = recall_score(y_test, baseline_preds)
    # AUC needs a continuous score; use the SMA-50/SMA-200 ratio (distance from the crossover)
    sma_ratio = (sma_50 / sma_200).reindex(test_dates).fillna(1.0).values
    baseline_auc = roc_auc_score(y_test, sma_ratio)

    print(f'\n=== Baseline: 50/200 SMA Crossover ===')
    print(f' F1: {baseline_f1:.3f}')
    print(f' AUC: {baseline_auc:.3f}')
    print(f' Precision: {baseline_prec:.3f}')
    print(f' Recall: {baseline_rec:.3f}')

    print(f'\n=== Comparison ===')
    print(f'{"Model":25s} {"F1":>8s} {"AUC":>8s} {"Prec":>8s} {"Rec":>8s}')
    print('-' * 55)
    print(f'{"SMA Crossover (baseline)":25s} {baseline_f1:8.3f} {baseline_auc:8.3f} {baseline_prec:8.3f} {baseline_rec:8.3f}')
    print(f'{"CVX Regime Features":25s} {f1:8.3f} {auc:8.3f} {prec:8.3f} {rec:8.3f}')

=== Baseline: 50/200 SMA Crossover ===
 F1: 0.758
 AUC: 0.577
 Precision: 0.700
 Recall: 0.827

=== Comparison ===
Model                           F1      AUC     Prec      Rec
-------------------------------------------------------
SMA Crossover (baseline)     0.758    0.577    0.700    0.827
CVX Regime Features          0.704    0.452    0.662    0.752

# ── Feature importance ──
importance = pd.DataFrame({
    'feature': df_clf.columns,
    'coef': clf.coef_[0],
    'abs_coef': np.abs(clf.coef_[0]),
}).sort_values('abs_coef', ascending=False)
# Coefficients are on standardized features, so their magnitudes are comparable
top15 = importance.head(15)

fig = go.Figure(go.Bar(
    x=top15['coef'].values,
    y=top15['feature'].values,
    orientation='h',
    marker_color=[C_BULL if c > 0 else C_BEAR for c in top15['coef']],
))
fig.update_layout(
    title='Top 15 Feature Coefficients (positive = predicts bull regime)',
    xaxis_title='Logistic Regression Coefficient',
    height=500, width=900,
    template=TEMPLATE,
    yaxis=dict(autorange='reversed'),
)
fig.show()

Summary
CVX Functions Used

| CVX Function | Section | Market Insight |
|---|---|---|
| TemporalIndex.bulk_insert | 2 | Build temporal index from sector ETF features |
| TemporalIndex.save / load | 2 | Cache index for fast reload |
| TemporalIndex.trajectory | 3 | Extract market trajectory for analysis |
| detect_changepoints | 3 | Structural breaks in market dynamics (COVID, rate hikes, etc.) |
| hurst_exponent | 3, 4 | Trend persistence: H>0.5 trending (momentum), H<0.5 mean-reverting |
| velocity | 3, 7 | Feature-space speed: spikes during crises, low during consolidation |
| project_to_anchors | 4 | Map the D=77 market state to 3D regime coordinates (bull/bear/crisis) |
| anchor_summary | 4, 7 | Mean, min, and trend of anchor proximity: regime drift direction |
| regions | 5 | Discover natural sector clusters in the HNSW graph |
| region_trajectory | 5 | Track sector distribution across clusters over time |
| wasserstein_drift | 5 | Optimal-transport rotation intensity between consecutive windows |
| path_signature | 6, 7 | Order-aware trajectory fingerprint for period comparison |
| signature_distance | 6 | Quantify geometric dissimilarity between market periods |
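A small NumPy sketch makes the path_signature row concrete. It also explains the fallback length of 12 in the feature loop, since depth 2 on a D=3 path gives D + D² = 3 + 9 terms. The sketch assumes that path_signature returns the standard truncated signature of the piecewise-linear path without the constant leading 1. The helper `signature_depth2` is hypothetical and not part of CVX, and CVX may order or normalize terms differently. The signature is built one segment at a time with Chen's identity.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path (hypothetical helper).

    path: array-like of shape (T, D) with T >= 2 points.
    Returns a flat vector of length D + D*D: level-1 terms (the total
    increment) followed by level-2 terms flattened in row-major (i, j) order.
    The constant leading term 1 is omitted.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)          # running sum of increments
    level2 = np.zeros((d, d))     # running second-order iterated integrals
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one straight segment with increment delta:
        # S2 <- S2 + outer(S1, delta) + 0.5 * outer(delta, delta), using S1 before the update
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate([level1, level2.ravel()])


# Two toy 2D paths with the same endpoints but opposite step order
right_then_up = np.array([[0, 0], [1, 0], [1, 1]])
up_then_right = np.array([[0, 0], [0, 1], [1, 1]])

sig_a = signature_depth2(right_then_up)
sig_b = signature_depth2(up_then_right)
print("level 1 equal:", np.allclose(sig_a[:2], sig_b[:2]))
print("level 2, right then up:", sig_a[2:])
print("level 2, up then right:", sig_b[2:])

# For a D=3 path (like the anchor-space trajectory), depth 2 gives 3 + 9 = 12 terms
rng = np.random.default_rng(0)
print("length for D=3:", signature_depth2(rng.normal(size=(20, 3))).shape[0])
```

Both toy paths end at (1, 1), so their level-1 terms are identical. The level-2 cross terms swap, however: the (0, 1) term is 1 for right-then-up and 0 for up-then-right. Signature-based comparisons can use this ordering information to tell apart periods whose net moves match.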
Key Findings
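- Changepoint detection identifies major regime transitions (COVID crash, recovery, rate hikes) directly from the multi-sector feature trajectory, without hand-tuned price rules such as moving-average crossovers.
- The Hurst exponent reveals alternating trending and mean-reverting phases. This is a signal for strategy selection: momentum when H>0.5, and mean-reversion approaches such as pairs trading when H<0.5.
- Anchor projection compresses the 77-dimensional market state into an interpretable 3D regime space. The trend toward or away from the crisis anchor is a candidate early-warning signal.
- Path signatures fingerprint market periods. Periods with similar dynamics (e.g., two different bull markets) cluster together in signature space despite occurring at different times.
- For forward regime prediction, the CVX features do not beat the baseline in this run. The model was trained on 2019-2020 and tested on 2021 onward. The CVX logistic regression scores F1 0.704 and AUC 0.452, while the 50/200 SMA crossover scores F1 0.758 and AUC 0.577. An AUC below 0.5 means the model ranks test days slightly worse than chance. So far, these temporal-geometric features have shown no forward-looking edge over a simple price-trend rule. One plausible factor is the short 460-day training window, which includes the COVID crash.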
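The AUC comparison above relies on the rank interpretation of ROC AUC. It is the probability that a randomly chosen bull day gets a higher score than a randomly chosen bear day, with ties counted as half. The sketch below checks that definition against scikit-learn's `roc_auc_score` on toy arrays. The arrays are made up for illustration and are not notebook data. The sketch also shows that negating the scores maps AUC to 1 - AUC. By that arithmetic, the CVX model's 0.452 would become 0.548 with the sign flipped. Choosing the sign after seeing test results leaks test information, however, so that is not a legitimate fix. The `pairwise_auc` helper is a hypothetical name introduced here.

```python
import itertools

import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, scores):
    """ROC AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = 0.0
    for p, n in itertools.product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = bull, 0 = bear) and scores, made up for illustration only
y = np.array([1, 1, 1, 0, 0, 1, 0, 1])
s = np.array([0.2, 0.4, 0.9, 0.6, 0.7, 0.3, 0.1, 0.5])

print(f"roc_auc_score(y, s)  = {roc_auc_score(y, s):.3f}")
print(f"pairwise_auc(y, s)   = {pairwise_auc(y, s):.3f}")
print(f"roc_auc_score(y, -s) = {roc_auc_score(y, -s):.3f}")  # equals 1 - AUC
```

On these toy arrays, 7 of the 15 bull/bear pairs are ranked correctly, so both computations give 7/15 ≈ 0.467. The negated score gives 8/15 ≈ 0.533.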