
RFC-009: LLM Tool Integration

See full RFC: design/CVX_RFC_009_LLM_Tool_Integration.md

Transform CVX into a first-class tool for Large Language Models. LLMs can reason about time, entities, and change — but they cannot compute temporal analytics. CVX provides the computational substrate that turns an LLM from a static knowledge system into a temporal reasoning engine.

| Layer | What | Who uses it |
| --- | --- | --- |
| MCP Server | Model Context Protocol adapter (JSON-RPC over stdio) | Claude Code, Claude Desktop, any MCP client |
| LLM-Optimized API | Endpoints returning structured, interpretable summaries | Any LLM via function calling / tool use |
| Inline Embedding | Accept text, embed internally — no vector passing | Reduces tool calls from 3 to 1 |
| Agentic Workflows | Multi-step reasoning patterns combining CVX tools | Autonomous monitoring agents |
| LLMs excel at | LLMs cannot |
| --- | --- |
| Understanding descriptions of change | Compute whether drift actually occurred |
| Reasoning about causality | Detect change points in 768-dim trajectories |
| Generating hypotheses | Quantify drift magnitude or direction |
| Natural language synthesis | Compare trajectories across entities |

Standard RAG answers “what documents are similar?” Temporal RAG (CVX) answers “how has the discourse changed since then?”

  • Anthropic-native: Claude Code and Claude Desktop support MCP
  • Zero HTTP overhead: direct stdio communication
  • Tool discovery: LLM sees all tools with schemas at conversation start
  • Streaming: results stream back as computed
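
As a rough illustration of the stdio transport, an MCP `tools/call` request is a JSON-RPC 2.0 message written to the server's stdin. A minimal sketch (the `entity_id` argument name is a hypothetical example, not the actual tool schema):

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request (JSON-RPC 2.0, sent over stdio)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical invocation; argument names are illustrative only.
msg = make_tool_call(1, "cvx_entity_summary", {"entity_id": "user-42"})
```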

8 tools designed for LLM consumption:

| Tool | Purpose |
| --- | --- |
| cvx_search | Temporal RAG — find similar content within time windows |
| cvx_entity_summary | High-level temporal overview of an entity |
| cvx_drift_report | Quantify semantic change with anchor-projected interpretation |
| cvx_detect_anomalies | Proactive scan for unusual changes across entities |
| cvx_compare_entities | Cross-entity temporal analysis (convergence, Granger, correlation) |
| cvx_cohort_analysis | Group-level drift, convergence, and treatment effect evaluation |
| cvx_forecast | Trajectory prediction with interpretable output |
| cvx_ingest | Add new temporal data (accepts text or vectors) |
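
For clients outside MCP, the same tools can be exposed through function calling. A sketch of what a `cvx_search` schema might look like — the parameter names (`query`, `start`, `end`, `top_k`) are assumptions for illustration, not the documented API:

```python
# Hypothetical function-calling schema for cvx_search; parameter names
# are illustrative stand-ins, not the actual CVX API surface.
cvx_search_schema = {
    "name": "cvx_search",
    "description": "Temporal RAG: find similar content within a time window.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Text to embed and search for"},
            "start": {"type": "string", "description": "Window start (ISO 8601)"},
            "end": {"type": "string", "description": "Window end (ISO 8601)"},
            "top_k": {"type": "integer", "description": "Number of results"},
        },
        "required": ["query"],
    },
}
```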

Design Principle: Structured Narratives, Not Raw Data


Tools return interpretable summaries that fit in a few hundred tokens:

```json
{
  "drift": {
    "l2_magnitude": 0.87,
    "percentile": 92,
    "interpretation": "Drifted more than 92% of entities"
  },
  "anchor_projection": {
    "depression": {"direction": "closer", "change": -0.15},
    "recovery": {"direction": "farther", "change": 0.22}
  },
  "suggested_next": [
    {"tool": "cvx_detect_anomalies", "description": "Check for change points"}
  ]
}
```

The suggested_next field guides the LLM’s reasoning chain.
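
A minimal sketch of how an agent could follow that chain — `call_tool` is a stand-in for whatever dispatch the agent uses (MCP client, HTTP, etc.), and the `args` hint field is an assumption:

```python
def follow_chain(call_tool, first_tool, args, max_steps=3):
    """Follow a chain of suggested_next hints returned by CVX tools.

    call_tool(name, args) is a placeholder for the agent's actual
    tool-dispatch mechanism.
    """
    results = []
    tool, tool_args = first_tool, args
    for _ in range(max_steps):
        result = call_tool(tool, tool_args)
        results.append(result)
        hints = result.get("suggested_next", [])
        if not hints:
            break  # no further suggestion: chain ends
        tool, tool_args = hints[0]["tool"], hints[0].get("args", {})
    return results
```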

4 documented multi-step patterns:

  1. Longitudinal Monitor — periodic anomaly scan → investigate → report
  2. Comparative Investigation — entity summaries → cross-entity analysis → explain relationships
  3. Cohort Treatment Evaluation — cohort drift → outlier identification → forecast
  4. Temporal RAG Chain — search now → search then → drift report → synthesize narrative
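
Pattern 4 can be sketched as three tool calls whose structured results the LLM then narrates. All argument names here (`query`, `start`, `end`, `entity`) are illustrative stand-ins, not the real CVX API:

```python
def temporal_rag_chain(call_tool, query, then, now, entity):
    """Sketch of the Temporal RAG Chain pattern:
    search in a past window, search in the current window,
    then quantify the change with a drift report.
    """
    past = call_tool("cvx_search",
                     {"query": query, "start": then["start"], "end": then["end"]})
    present = call_tool("cvx_search",
                        {"query": query, "start": now["start"], "end": now["end"]})
    drift = call_tool("cvx_drift_report", {"entity": entity})
    # The LLM synthesizes a narrative from these structured results.
    return {"past": past, "present": present, "drift": drift}
```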

4 phases:

  1. feat/mcp-server (P0) — new cvx-mcp crate, JSON-RPC over stdio, 8 tool definitions — 14 tests
  2. feat/llm-api-layer (P0) — composite endpoints (entity_summary, anomaly_scan) — 2 REST endpoints
  3. feat/inline-embeddings (P1) — Embedder trait, MockEmbedder, ONNX/API stubs — 9 tests
  4. feat/agentic-patterns (P2) — workflow documentation, LLM integration guide
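
The Embedder trait and MockEmbedder of phase 3 live in the Rust codebase; as a rough Python analog of the idea only — a swappable embedding interface with a deterministic mock, not the actual implementation:

```python
import hashlib

class MockEmbedder:
    """Deterministic stand-in embedder, analogous in spirit to the
    MockEmbedder behind the Embedder trait: hashes text into a
    fixed-size vector. Purely illustrative."""

    def __init__(self, dim=8):
        self.dim = dim

    def embed(self, text):
        digest = hashlib.sha256(text.encode()).digest()
        # Map digest bytes to floats in [0, 1], cycling if dim > 32.
        return [digest[i % len(digest)] / 255.0 for i in range(self.dim)]

emb = MockEmbedder(dim=4)
vec = emb.embed("hello")
```

The mock is deterministic, so tests that exercise the inline-embedding path get stable vectors without loading an ONNX model or calling an API.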

Status: All 4 phases implemented and merged. 23 tests + 2 new endpoints + docs.

Python bindings for RFC-007 added (6 new PyO3 functions). Validated in notebooks B4 and B5.