RFC-014: Neuro-Symbolic Planning — ASP Solver + Emergent Plans + Bayesian Context


RFC-013 identified a tension between two sources of “what comes next”:

  • Knowledge graph (cvx-graph): defines a fixed plan per task type (heat_then_place: find → take → heat → put)
  • Temporal edges (cvx-index): record what actually happened in specific episodes, which varies by context

The same task in different contexts requires different action sequences. A fixed plan cannot capture this variation. An emergent plan from memory can, but it carries no guarantee of validity: preconditions may be violated, effects unverified.

This RFC proposes a three-layer planning architecture that resolves the tension:

  1. Constraints (knowledge graph + ASP rules) → what is valid
  2. Experience (CVX episodic memory) → what has worked before
  3. Probability (Bayesian network) → what is likely to succeed now

Part A: Emergent Plans from Episodic Memory


A knowledge graph encoding heat_then_place: find → take → go_microwave → heat → put assumes one sequence fits all contexts. But:

  • Object on countertop → go_countertop → take → go_microwave → ...
  • Object in fridge → go_fridge → open → take → go_microwave → ...
  • Object already in hand → skip find+take entirely

Instead of defining plans a priori, the plan is the consensus of retrieved similar episodes:

1. Agent observes current state → embed → CVX causal_search
2. CVX returns k episodes from similar states
3. Each episode's continuation IS a candidate plan
4. The action that appears most frequently across candidates = consensus action
5. BN scores each candidate action for P(success | context)
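At each decision point, steps 1-5 reduce to a majority vote over the next action proposed by the retrieved episodes. A minimal sketch in Python (function and variable names are illustrative, not the CVX API):

```python
from collections import Counter

def consensus_action(candidate_continuations):
    """Majority vote over the next action proposed by retrieved episodes."""
    first_actions = [seq[0] for seq in candidate_continuations if seq]
    if not first_actions:
        return None
    return Counter(first_actions).most_common(1)[0][0]

# Three similar episodes: two started at the countertop, one at the fridge.
episodes = [
    ["go_countertop", "take", "go_microwave", "heat"],
    ["go_countertop", "take", "go_microwave", "heat"],
    ["go_fridge", "open", "take", "go_microwave"],
]
print(consensus_action(episodes))  # → go_countertop
```

Ties between equally frequent actions need a tie-break, which is exactly what the BN scoring in step 5 provides.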

No fixed plan structure needed. The “plan” is implicit in the collective behavior of past successful episodes.

The knowledge graph shifts from planner to validator:

  • Does NOT prescribe “do A then B then C”
  • DOES validate “is action X valid given current state?”
  • DOES provide structural knowledge: “heating requires a microwave”, “you can’t take something you’re already holding”

Instead of tracking “step 3 of 7” (which presumes a fixed plan), the BN uses abstract phases inferred from observations:

| Phase | Detection signal | Example observation |
| --- | --- | --- |
| searching | No object in inventory | “You are in the middle of a room” |
| holding | Object just taken | “You pick up the tomato 1” |
| transforming | At appliance | “You are at the microwave 1” |
| placing | Object transformed | “You heated the tomato 1” |
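The detection signals above can be turned into a phase label with a simple rule cascade. A sketch where the boolean state flags are hypothetical; in practice they would be parsed from the observation text:

```python
def detect_phase(state):
    """Map observable state flags to an abstract phase.

    `state` is a hypothetical dict of booleans; real signals would be
    parsed from the environment's observation text.
    """
    if state.get("object_transformed"):
        return "placing"
    if state.get("holding") and state.get("at_appliance"):
        return "transforming"
    if state.get("holding"):
        return "holding"
    return "searching"

print(detect_phase({}))                                       # → searching
print(detect_phase({"holding": True, "at_appliance": True}))  # → transforming
```

The cascade order matters: later phases subsume the signals of earlier ones, so the most specific condition is checked first.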

Observed variables:

  • region: HNSW region (discretized embedding space)
  • task_type: heat / cool / clean / pick / examine
  • phase: searching / holding / transforming / placing
  • action: candidate action to score

Query:

P(success | region, task_type, phase, action)

The same action has different success rates depending on the phase:

| Action | Phase=searching | Phase=holding | Phase=transforming |
| --- | --- | --- | --- |
| navigate | High (need to find object) | Medium (going to appliance) | Low (already there) |
| take | Low (haven’t found it yet) | N/A (already holding) | Low (need to use, not take) |
| use | Low (nothing to use) | Low (need to go to appliance) | High (ready to transform) |

A linear scorer assigns each action a fixed weight regardless of phase; the BN captures these interactions between phase and action.

After each episode, update the BN:

For each step in the episode:
1. Infer phase from observation
2. Identify region from HNSW
3. Record (region, task_type, phase, action, outcome)
4. bn.observe(observations)

After the episode, call bn.update_cpts() to refit the conditional probability tables.
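The update loop amounts to counting outcomes per (region, task_type, phase, action) context. A toy stand-in for the cvx-bayes CPT (class and method names are illustrative), using Laplace smoothing so unseen contexts fall back to an uninformative 0.5 prior:

```python
from collections import defaultdict

class PhaseBN:
    """Toy CPT for P(success | region, task_type, phase, action).

    Stand-in for cvx-bayes: counts outcomes per context, with
    Laplace smoothing for contexts never seen before.
    """
    def __init__(self):
        self.successes = defaultdict(int)
        self.trials = defaultdict(int)

    def observe(self, region, task_type, phase, action, outcome):
        key = (region, task_type, phase, action)
        self.trials[key] += 1
        if outcome:
            self.successes[key] += 1

    def p_success(self, region, task_type, phase, action):
        key = (region, task_type, phase, action)
        # Laplace smoothing: (s + 1) / (n + 2); unseen context → 0.5
        return (self.successes[key] + 1) / (self.trials[key] + 2)

bn = PhaseBN()
for _ in range(8):
    bn.observe("r3", "heat", "transforming", "use", True)
bn.observe("r3", "heat", "searching", "use", False)

print(bn.p_success("r3", "heat", "transforming", "use"))  # 9/10 = 0.9
print(bn.p_success("r3", "heat", "searching", "use"))     # 1/3 ≈ 0.33
```

This captures the phase/action interaction from the table above: the same action (`use`) scores high in the transforming phase and low while searching.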

Answer Set Programming (ASP) provides declarative constraint solving: the user specifies what is valid, and the solver finds plans that satisfy all constraints.

% State facts (from observation, parsed by LLM)
at(tomato1, countertop2).
at(agent, kitchen).
type(tomato1, food).
clean(sinkbasin1).
% Domain rules (from cvx-graph or learned)
holding_something :- holding(X).
can_take(X, L) :- at(X, L), at(agent, L), not holding_something.
can_heat(X) :- holding(X), at(agent, microwave1).
can_clean(X) :- holding(X), at(agent, sinkbasin1).
% Effects (a full encoding would index fluents by time step
% and generate do/2, do/3 atoms via choice rules)
holding(X) :- do(take, X, L), can_take(X, L).
heated(X) :- do(heat, X), can_heat(X).
% Goal
:- not heated(tomato1).
:- not at(tomato1, garbagecan1).
% Each answer set encodes a valid action sequence
Observation (natural language)
→ LLM extracts facts (at, holding, type)
→ cvx-graph provides domain rules (requires, precondition, effect)
→ ASP solver (clingo) computes valid plan candidates
→ CVX causal_search retrieves similar experiences
→ BN scores candidates: P(success | context, plan)
→ Agent executes highest-scoring valid plan

ASP offers advantages for this use case:

  1. Default reasoning: “normally objects are on surfaces” — handles incomplete observations
  2. Non-monotonic: can retract beliefs when new evidence arrives
  3. Preference rules: “prefer shorter plans” or “prefer actions the agent has succeeded at before”
  4. Diagnosis: “why did the plan fail?” — compute minimal explanation

| Component | Approach | Effort |
| --- | --- | --- |
| ASP solver | clingo via C FFI or Rust re-implementation | High |
| NL → ASP facts | LLM prompt with structured output | Medium |
| KG → ASP rules | Automatic conversion from cvx-graph entities/relations | Medium |
| Plan validation | Check ASP model against observation | Low |
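The KG → ASP rules row can be approached as string generation from graph triples. A sketch under assumed relation names and rule shapes (this is not the cvx-graph schema):

```python
def triples_to_asp(triples):
    """Emit ASP facts/rules from (subject, relation, object) triples.

    The relation names ("at", "type", "requires") and the generated
    rule shapes are illustrative assumptions, not the cvx-graph API.
    """
    lines = []
    for subj, rel, obj in triples:
        if rel == "at":
            lines.append(f"at({subj}, {obj}).")
        elif rel == "type":
            lines.append(f"type({subj}, {obj}).")
        elif rel == "requires":
            # "heating requires a microwave" → precondition rule
            lines.append(f"can_{subj}(X) :- holding(X), at(agent, {obj}).")
    return "\n".join(lines)

kg = [
    ("tomato1", "at", "countertop2"),
    ("tomato1", "type", "food"),
    ("heat", "requires", "microwave1"),
]
print(triples_to_asp(kg))
```

Facts pass through almost verbatim; only structural relations like `requires` need a rule template, which keeps the conversion mechanical.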

This is a research direction, not a near-term implementation:

  • clingo integration requires C FFI and a non-trivial binding layer
  • NL → formal logic is an active research problem
  • The value is clear (guaranteed valid plans) but the engineering cost is high
  • Alternative: use the LLM itself as an approximate solver, with cvx-graph constraints as structured prompts

| Layer | Source | Answers | Guarantees |
| --- | --- | --- | --- |
| Constraints | cvx-graph + ASP | What is valid | Logical correctness |
| Experience | CVX causal_search | What has worked | Empirical evidence |
| Probability | cvx-bayes BN | What is likely | Statistical confidence |
State observation
├─→ Phase detection (searching / holding / transforming / placing)
├─→ CVX causal_search → k candidate continuations
│ Each candidate = sequence of expert actions
├─→ KG constraint check (optional)
│ Filter candidates that violate preconditions
├─→ BN scoring
│ P(success | region, task_type, phase, action) per candidate
└─→ Agent chooses highest-scoring valid action

Each layer alone is insufficient:

  • Constraints only (ASP): produces valid plans but doesn’t know which valid plan is most likely to succeed in this specific room
  • Experience only (CVX): retrieves what worked before but may suggest invalid actions (take something already held)
  • Probability only (BN): scores actions but doesn’t know the domain rules (can’t heat without microwave)

Combined: valid + experienced + probable.
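Combined selection is then a filter-then-argmax: constraints prune, the BN ranks. A minimal sketch with illustrative names:

```python
def choose_action(candidates, is_valid, p_success):
    """Three-layer selection: keep only constraint-valid candidates
    (ASP/KG layer), then pick the one the BN scores highest."""
    valid = [a for a in candidates if is_valid(a)]
    if not valid:
        return None
    return max(valid, key=p_success)

# Experience proposes three actions; "take" is invalid (already holding).
candidates = ["take", "go_microwave", "use"]
is_valid = lambda a: a != "take"
scores = {"take": 0.9, "go_microwave": 0.7, "use": 0.4}
print(choose_action(candidates, is_valid, scores.get))  # → go_microwave
```

Note that the highest-probability action (`take`, 0.9) is rejected by the constraint layer, so the agent takes the best action that is both valid and likely.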


| Phase | Component | Effort | Prerequisite |
| --- | --- | --- | --- |
| 1 | Phase detection from observations | Low | None |
| 2 | BN with phase variable | Low | cvx-bayes (done) |
| 3 | KG as constraint validator | Medium | cvx-graph (done) |
| 4 | NL → KG facts via LLM | Medium | LLM API |
| 5 | ASP solver integration | High | clingo or custom solver |

Phases 1-3 are implementable now with existing crates. Phase 4 needs an LLM in the loop. Phase 5 is long-term research.


References

  1. Gelfond & Lifschitz (1988). “The Stable Model Semantics for Logic Programming.” ICLP.
  2. Gebser et al. (2012). Answer Set Solving in Practice. (clingo authors)
  3. Lifschitz (2019). Answer Set Programming. Springer.
  4. Garcez et al. (2019). “Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning.” JAIR.
  5. Hamilton et al. (2022). “Is Neuro-Symbolic AI Meeting its Promises in Natural Language Processing?” ACL.
  6. Ghallab, Nau & Traverso (2004). Automated Planning: Theory and Practice.
  7. Helmert (2006). “The Fast Downward Planning System.” JAIR.
  8. Kaelbling, Littman & Cassandra (1998). “Planning and Acting in Partially Observable Stochastic Domains.” Artificial Intelligence.