RFC-014: Neuro-Symbolic Planning — ASP Solver + Emergent Plans + Bayesian Context


RFC-013 identified a tension between two sources of “what comes next”:

  • Knowledge graph (cvx-graph): defines a fixed plan per task type (heat_then_place: find → take → heat → put)
  • Temporal edges (cvx-index): record what actually happened in specific episodes, which varies by context

The same task in different contexts requires different action sequences. A fixed plan cannot capture this variation. An emergent plan from memory can, but it carries no guarantee of validity: preconditions may be violated, effects unverified.

This RFC proposes a three-layer planning architecture that resolves the tension:

  1. Constraints (knowledge graph + ASP rules) → what is valid
  2. Experience (CVX episodic memory) → what has worked before
  3. Probability (Bayesian network) → what is likely to succeed now

Part A: Emergent Plans from Episodic Memory


A knowledge graph encoding heat_then_place: find → take → go_microwave → heat → put assumes one sequence fits all contexts. But:

  • Object on countertop → go_countertop → take → go_microwave → ...
  • Object in fridge → go_fridge → open → take → go_microwave → ...
  • Object already in hand → skip find+take entirely

Instead of defining plans a priori, the plan is the consensus of retrieved similar episodes:

1. Agent observes current state → embed → CVX causal_search
2. CVX returns k episodes from similar states
3. Each episode's continuation IS a candidate plan
4. The action that appears most frequently across candidates = consensus action
5. BN scores each candidate action for P(success | context)
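At each decision point, steps 1-5 reduce to a majority vote over the next action proposed by the retrieved episodes. A minimal sketch in Python (function and variable names are illustrative, not the CVX API):

```python
from collections import Counter

def consensus_action(candidate_continuations):
    """Majority vote over the next action proposed by retrieved episodes."""
    first_actions = [seq[0] for seq in candidate_continuations if seq]
    if not first_actions:
        return None
    return Counter(first_actions).most_common(1)[0][0]

# Three similar episodes: two started at the countertop, one at the fridge.
episodes = [
    ["go_countertop", "take", "go_microwave", "heat"],
    ["go_countertop", "take", "go_microwave", "heat"],
    ["go_fridge", "open", "take", "go_microwave"],
]
print(consensus_action(episodes))  # → go_countertop
```

Ties between equally frequent actions need a tie-break, which is exactly what the BN scoring in step 5 provides.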

No fixed plan structure needed. The “plan” is implicit in the collective behavior of past successful episodes.

The knowledge graph shifts from planner to validator:

  • Does NOT prescribe “do A then B then C”
  • DOES validate “is action X valid given current state?”
  • DOES provide structural knowledge: “heating requires a microwave”, “you can’t take something you’re already holding”

Instead of tracking “step 3 of 7” (which presumes a fixed plan), the BN uses abstract phases inferred from observations:

| Phase | Detection signal | Example observation |
| --- | --- | --- |
| searching | No object in inventory | “You are in the middle of a room” |
| holding | Object just taken | “You pick up the tomato 1” |
| transforming | At appliance | “You are at the microwave 1” |
| placing | Object transformed | “You heated the tomato 1” |
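The detection signals above can be turned into a phase label with a simple rule cascade. A sketch where the boolean state flags are hypothetical; in practice they would be parsed from the observation text:

```python
def detect_phase(state):
    """Map observable state flags to an abstract phase.

    `state` is a hypothetical dict of booleans; real signals would be
    parsed from the environment's observation text.
    """
    if state.get("object_transformed"):
        return "placing"
    if state.get("holding") and state.get("at_appliance"):
        return "transforming"
    if state.get("holding"):
        return "holding"
    return "searching"

print(detect_phase({}))                                       # → searching
print(detect_phase({"holding": True, "at_appliance": True}))  # → transforming
```

The cascade order matters: later phases subsume the signals of earlier ones, so the most specific condition is checked first.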

Observed variables:

  • region: HNSW region (discretized embedding space)
  • task_type: heat / cool / clean / pick / examine
  • phase: searching / holding / transforming / placing
  • action: candidate action to score

Query:

P(success | region, task_type, phase, action)

The same action has different success rates depending on the phase:

| Action | Phase=searching | Phase=holding | Phase=transforming |
| --- | --- | --- | --- |
| navigate | High (need to find object) | Medium (going to appliance) | Low (already there) |
| take | Low (haven’t found it yet) | N/A (already holding) | Low (need to use, not take) |
| use | Low (nothing to use) | Low (need to go to appliance) | High (ready to transform) |

A linear scorer assigns each action a fixed weight regardless of phase; the BN captures these interactions between phase and action.

After each episode, update the BN:

For each step in the episode:
1. Infer phase from observation
2. Identify region from HNSW
3. Record (region, task_type, phase, action, outcome)
4. bn.observe(observations)

After the episode, call bn.update_cpts() to refit the conditional probability tables.
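The update loop amounts to counting outcomes per (region, task_type, phase, action) context. A toy stand-in for the cvx-bayes CPT (class and method names are illustrative), using Laplace smoothing so unseen contexts fall back to an uninformative 0.5 prior:

```python
from collections import defaultdict

class PhaseBN:
    """Toy CPT for P(success | region, task_type, phase, action).

    Stand-in for cvx-bayes: counts outcomes per context, with
    Laplace smoothing for contexts never seen before.
    """
    def __init__(self):
        self.successes = defaultdict(int)
        self.trials = defaultdict(int)

    def observe(self, region, task_type, phase, action, outcome):
        key = (region, task_type, phase, action)
        self.trials[key] += 1
        if outcome:
            self.successes[key] += 1

    def p_success(self, region, task_type, phase, action):
        key = (region, task_type, phase, action)
        # Laplace smoothing: (s + 1) / (n + 2); unseen context → 0.5
        return (self.successes[key] + 1) / (self.trials[key] + 2)

bn = PhaseBN()
for _ in range(8):
    bn.observe("r3", "heat", "transforming", "use", True)
bn.observe("r3", "heat", "searching", "use", False)

print(bn.p_success("r3", "heat", "transforming", "use"))  # 9/10 = 0.9
print(bn.p_success("r3", "heat", "searching", "use"))     # 1/3 ≈ 0.33
```

This captures the phase/action interaction from the table above: the same action (`use`) scores high in the transforming phase and low while searching.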

Answer Set Programming (ASP) provides declarative constraint solving: the user specifies what is valid, and the solver finds plans that satisfy all constraints.

% State facts (from observation, parsed by LLM)
at(tomato1, countertop2).
at(agent, kitchen).
type(tomato1, food).
clean(sinkbasin1).
% Domain rules (from cvx-graph or learned)
holding_something :- holding(X).
can_take(X, L) :- at(X, L), at(agent, L), not holding_something.
can_heat(X) :- holding(X), at(agent, microwave1).
can_clean(X) :- holding(X), at(agent, sinkbasin1).
% Effects (a full encoding would index fluents by time step
% and generate do/2, do/3 atoms via choice rules)
holding(X) :- do(take, X, L), can_take(X, L).
heated(X) :- do(heat, X), can_heat(X).
% Goal
:- not heated(tomato1).
:- not at(tomato1, garbagecan1).
% Each answer set encodes a valid action sequence
Observation (natural language)
→ LLM extracts facts (at, holding, type)
→ cvx-graph provides domain rules (requires, precondition, effect)
→ ASP solver (clingo) computes valid plan candidates
→ CVX causal_search retrieves similar experiences
→ BN scores candidates: P(success | context, plan)
→ Agent executes highest-scoring valid plan

ASP offers advantages for this use case:

  1. Default reasoning: “normally objects are on surfaces” — handles incomplete observations
  2. Non-monotonic: can retract beliefs when new evidence arrives
  3. Preference rules: “prefer shorter plans” or “prefer actions the agent has succeeded at before”
  4. Diagnosis: “why did the plan fail?” — compute minimal explanation

| Component | Approach | Effort |
| --- | --- | --- |
| ASP solver | clingo via C FFI or Rust re-implementation | High |
| NL → ASP facts | LLM prompt with structured output | Medium |
| KG → ASP rules | Automatic conversion from cvx-graph entities/relations | Medium |
| Plan validation | Check ASP model against observation | Low |
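The KG → ASP rules row can be approached as string generation from graph triples. A sketch under assumed relation names and rule shapes (this is not the cvx-graph schema):

```python
def triples_to_asp(triples):
    """Emit ASP facts/rules from (subject, relation, object) triples.

    The relation names ("at", "type", "requires") and the generated
    rule shapes are illustrative assumptions, not the cvx-graph API.
    """
    lines = []
    for subj, rel, obj in triples:
        if rel == "at":
            lines.append(f"at({subj}, {obj}).")
        elif rel == "type":
            lines.append(f"type({subj}, {obj}).")
        elif rel == "requires":
            # "heating requires a microwave" → precondition rule
            lines.append(f"can_{subj}(X) :- holding(X), at(agent, {obj}).")
    return "\n".join(lines)

kg = [
    ("tomato1", "at", "countertop2"),
    ("tomato1", "type", "food"),
    ("heat", "requires", "microwave1"),
]
print(triples_to_asp(kg))
```

Facts pass through almost verbatim; only structural relations like `requires` need a rule template, which keeps the conversion mechanical.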

This is a research direction, not a near-term implementation:

  • clingo integration requires C FFI and a non-trivial binding layer
  • NL → formal logic is an active research problem
  • The value is clear (guaranteed valid plans) but the engineering cost is high
  • Alternative: use the LLM itself as an approximate solver, with cvx-graph constraints as structured prompts

| Layer | Source | Answers | Guarantees |
| --- | --- | --- | --- |
| Constraints | cvx-graph + ASP | What is valid | Logical correctness |
| Experience | CVX causal_search | What has worked | Empirical evidence |
| Probability | cvx-bayes BN | What is likely | Statistical confidence |
State observation
├─→ Phase detection (searching / holding / transforming / placing)
├─→ CVX causal_search → k candidate continuations
│ Each candidate = sequence of expert actions
├─→ KG constraint check (optional)
│ Filter candidates that violate preconditions
├─→ BN scoring
│ P(success | region, task_type, phase, action) per candidate
└─→ Agent chooses highest-scoring valid action

Each layer alone is insufficient:

  • Constraints only (ASP): produces valid plans but doesn’t know which valid plan is most likely to succeed in this specific room
  • Experience only (CVX): retrieves what worked before but may suggest invalid actions (take something already held)
  • Probability only (BN): scores actions but doesn’t know the domain rules (can’t heat without microwave)

Combined: valid + experienced + probable.
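Combined selection is then a filter-then-argmax: constraints prune, the BN ranks. A minimal sketch with illustrative names:

```python
def choose_action(candidates, is_valid, p_success):
    """Three-layer selection: keep only constraint-valid candidates
    (ASP/KG layer), then pick the one the BN scores highest."""
    valid = [a for a in candidates if is_valid(a)]
    if not valid:
        return None
    return max(valid, key=p_success)

# Experience proposes three actions; "take" is invalid (already holding).
candidates = ["take", "go_microwave", "use"]
is_valid = lambda a: a != "take"
scores = {"take": 0.9, "go_microwave": 0.7, "use": 0.4}
print(choose_action(candidates, is_valid, scores.get))  # → go_microwave
```

Note that the highest-probability action (`take`, 0.9) is rejected by the constraint layer, so the agent takes the best action that is both valid and likely.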


| Phase | Component | Effort | Prerequisite |
| --- | --- | --- | --- |
| 1 | Phase detection from observations | Low | None |
| 2 | BN with phase variable | Low | cvx-bayes (done) |
| 3 | KG as constraint validator | Medium | cvx-graph (done) |
| 4 | NL → KG facts via LLM | Medium | LLM API |
| 5 | ASP solver integration | High | clingo or custom solver |

Phases 1-3 are implementable now with existing crates. Phase 4 needs an LLM in the loop. Phase 5 is long-term research.


References

  1. Gelfond & Lifschitz (1988). “The Stable Model Semantics for Logic Programming.” ICLP.
  2. Gebser et al. (2012). Answer Set Solving in Practice. (clingo authors)
  3. Lifschitz (2019). Answer Set Programming. Springer.
  4. Garcez et al. (2019). “Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning.” JAIR.
  5. Hamilton et al. (2022). “Is Neuro-Symbolic AI Meeting its Promises in Natural Language Processing?” ACL.
  6. Ghallab, Nau & Traverso (2004). Automated Planning: Theory and Practice.
  7. Helmert (2006). “The Fast Downward Planning System.” JAIR.
  8. Kaelbling, Littman & Cassandra (1998). “Planning and Acting in Partially Observable Stochastic Domains.” Artificial Intelligence.