RFC-014: Neuro-Symbolic Planning — ASP Solver + Emergent Plans + Bayesian Context
Status: Research
Motivation
RFC-013 identified a tension between two sources of “what comes next”:
- Knowledge graph (cvx-graph): defines a fixed plan per task type (heat_then_place: find → take → heat → put)
- Temporal edges (cvx-index): record what actually happened in specific episodes, which varies by context
The same task in different contexts requires different action sequences. A fixed plan cannot capture this variation. An emergent plan from memory can, but has no guarantee of validity (preconditions, effects).
This RFC proposes a three-layer planning architecture that resolves the tension:
- Constraints (knowledge graph + ASP rules) → what is valid
- Experience (CVX episodic memory) → what has worked before
- Probability (Bayesian network) → what is likely to succeed now
Part A: Emergent Plans from Episodic Memory
The Problem with Fixed Plans
A knowledge graph encoding heat_then_place: find → take → go_microwave → heat → put
assumes one sequence fits all contexts. But:
- Object on countertop → go_countertop → take → go_microwave → ...
- Object in fridge → go_fridge → open → take → go_microwave → ...
- Object already in hand → skip find+take entirely
Solution: Plans Emerge from Memory
Instead of defining plans a priori, the plan is the consensus of retrieved similar episodes:
1. Agent observes current state → embed → CVX causal_search
2. CVX returns k episodes from similar states
3. Each episode's continuation IS a candidate plan
4. The action that appears most frequently across candidates = consensus action
5. BN scores each candidate action for P(success | context)

No fixed plan structure needed. The “plan” is implicit in the collective behavior of past successful episodes.
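The consensus step (items 3–5) can be sketched in a few lines. This is an illustrative Python stand-in, not the crates' API: `consensus_action` and the episode lists are hypothetical representations of what causal_search retrieval would return.

```python
from collections import Counter

def consensus_action(episodes, scorer=None):
    """Pick the next action by majority vote over retrieved continuations.

    `episodes` is a list of candidate continuations, each a list of actions
    (the remainder of a past episode that started from a similar state).
    """
    # Item 4: the first action of each continuation casts one vote.
    votes = Counter(ep[0] for ep in episodes if ep)
    if not votes:
        return None
    # Item 5: optionally re-weight votes by a P(success | context) scorer.
    if scorer is not None:
        return max(votes, key=lambda a: votes[a] * scorer(a))
    return votes.most_common(1)[0][0]

episodes = [
    ["go_countertop", "take", "go_microwave"],
    ["go_countertop", "take"],
    ["go_fridge", "open", "take"],
]
print(consensus_action(episodes))  # go_countertop (2 of 3 votes)
```

Without a scorer this is pure frequency consensus; passing a BN-backed scorer blends experience with contextual probability, as in Part B.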
Relationship to cvx-graph
The knowledge graph shifts from planner to validator:
- Does NOT prescribe “do A then B then C”
- DOES validate “is action X valid given current state?”
- DOES provide structural knowledge: “heating requires a microwave”, “you can’t take something you’re already holding”
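As a sketch of the validator role, assume a precondition table distilled from cvx-graph relations (the table contents and the fact names here are hypothetical):

```python
# Hypothetical precondition table derived from cvx-graph relations:
# action -> set of state facts that must hold for the action to be valid.
PRECONDITIONS = {
    "take": {"object_visible", "hand_empty"},    # can't take while already holding
    "heat": {"holding_object", "at_microwave"},  # heating requires a microwave
    "put":  {"holding_object"},
}

def is_valid(action: str, state: set) -> bool:
    """Validator role: check preconditions; do not prescribe an order."""
    return PRECONDITIONS.get(action, set()) <= state

state = {"object_visible", "holding_object"}
print(is_valid("take", state))  # False: hand is not empty
print(is_valid("put", state))   # True
```

Note the graph never says "do A then B"; it only rejects candidate actions whose preconditions fail in the current state.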
Part B: Contextual Bayesian Scoring
Section titled “Part B: Contextual Bayesian Scoring”Phase-Based Context Variables
Instead of tracking “step 3 of 7” (assumes fixed plan), the BN uses abstract phases inferred from observations:
| Phase | Detection signal | Example |
|---|---|---|
| searching | No object in inventory | “You are in the middle of a room” |
| holding | Object just taken | “You pick up the tomato 1” |
| transforming | At appliance | “You are at the microwave 1” |
| placing | Object transformed | “You heated the tomato 1” |
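A minimal phase detector over these signals might look like the following sketch; the keyword rules and the function name are illustrative, not the real parser.

```python
def detect_phase(observation: str, inventory: list) -> str:
    """Infer the abstract phase from the latest observation text.

    Keyword rules mirror the detection signals in the table above.
    """
    obs = observation.lower()
    if "you heated" in obs or "you cooled" in obs or "you cleaned" in obs:
        return "placing"           # object transformed -> ready to place
    if ("microwave" in obs or "sinkbasin" in obs or "fridge" in obs) and inventory:
        return "transforming"      # at an appliance with an object in hand
    if inventory:
        return "holding"           # object taken but not yet transformed
    return "searching"             # no object in inventory

print(detect_phase("You pick up the tomato 1", ["tomato 1"]))  # holding
print(detect_phase("You are in the middle of a room", []))     # searching
```

This is Phase 1 of the implementation priority table: no learning required, just observation parsing.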
BN Variables
Section titled “BN Variables”Observed: region: HNSW region (discretized embedding space) task_type: heat / cool / clean / pick / examine phase: searching / holding / transforming / placing action: candidate action to score
Query: P(success | region, task_type, phase, action)

Key Insight: Phase × Action Interaction
The same action has different success rates depending on the phase:
| Action | Phase=searching | Phase=holding | Phase=transforming |
|---|---|---|---|
| navigate | High (need to find object) | Medium (going to appliance) | Low (already there) |
| take | Low (haven’t found it yet) | N/A (already holding) | Low (need to use, not take) |
| use | Low (nothing to use) | Low (need to go to appliance) | High (ready to transform) |
A linear scorer assigns each action a single weight regardless of phase. The BN captures these phase × action interactions.
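A count-based sketch of such a conditional table, covering both the query and the online update below (a stand-in for cvx-bayes with hypothetical `observe`/`p_success` names; Laplace smoothing is an assumption, not the crate's method):

```python
from collections import defaultdict

class PhaseBN:
    """Minimal CPT over (region, task_type, phase, action) -> success.

    Estimates P(success | context) from counts, so the same action can
    score differently in different phases -- the interaction a linear
    scorer misses.
    """
    def __init__(self):
        self.successes = defaultdict(int)
        self.trials = defaultdict(int)

    def observe(self, region, task_type, phase, action, outcome):
        key = (region, task_type, phase, action)
        self.trials[key] += 1
        self.successes[key] += int(outcome)

    def p_success(self, region, task_type, phase, action):
        key = (region, task_type, phase, action)
        # Laplace smoothing: an unseen context falls back to 0.5.
        return (self.successes[key] + 1) / (self.trials[key] + 2)

bn = PhaseBN()
bn.observe("r7", "heat", "transforming", "use", True)
bn.observe("r7", "heat", "transforming", "use", True)
bn.observe("r7", "heat", "searching", "use", False)
print(bn.p_success("r7", "heat", "transforming", "use"))  # 0.75
print(bn.p_success("r7", "heat", "searching", "use"))     # ~0.33
```

The smoothing constant also doubles as an exploration prior: actions never tried in a context are neither ruled out nor favored.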
Online Learning
After each episode, update the BN:
For each step in the episode:
1. Infer phase from observation
2. Identify region from HNSW
3. Record (region, task_type, phase, action, outcome)
4. bn.observe(observations)

Then bn.update_cpts()

Part C: Answer Set Programming (Future)
Concept
Answer Set Programming (ASP) provides declarative constraint solving: the user specifies what is valid, and the solver finds plans that satisfy all constraints.
Task Formalization
Section titled “Task Formalization”% State facts (from observation, parsed by LLM)at(tomato1, countertop2).at(agent, kitchen).type(tomato1, food).clean(sinkbasin1).
% Domain rules (from cvx-graph or learned)can_take(X, L) :- at(X, L), at(agent, L), not holding(_).can_heat(X) :- holding(X), at(agent, microwave1).can_clean(X) :- holding(X), at(agent, sinkbasin1).
% Effectsholding(X) :- do(take, X, L), can_take(X, L).heated(X) :- do(heat, X), can_heat(X).
% Goal:- not heated(tomato1).:- not at(tomato1, garbagecan1).
% Solver outputs a valid action sequenceIntegration Pipeline
Observation (natural language)
  → LLM extracts facts (at, holding, type)
  → cvx-graph provides domain rules (requires, precondition, effect)
  → ASP solver (clingo) computes valid plan candidates
  → CVX causal_search retrieves similar experiences
  → BN scores candidates: P(success | context, plan)
  → Agent executes highest-scoring valid plan

Why ASP (Not STRIPS/PDDL)
ASP offers advantages for this use case:
- Default reasoning: “normally objects are on surfaces” — handles incomplete observations
- Non-monotonic: can retract beliefs when new evidence arrives
- Preference rules: “prefer shorter plans” or “prefer actions the agent has succeeded at before”
- Diagnosis: “why did the plan fail?” — compute minimal explanation
Implementation Considerations
Section titled “Implementation Considerations”| Component | Approach | Effort |
|---|---|---|
| ASP solver | clingo via C FFI or Rust re-implementation | High |
| NL → ASP facts | LLM prompt with structured output | Medium |
| KG → ASP rules | Automatic conversion from cvx-graph entities/relations | Medium |
| Plan validation | Check ASP model against observation | Low |
Complexity Assessment
This is a research direction, not a near-term implementation:
- clingo integration requires C FFI and a non-trivial binding layer
- NL → formal logic is an active research problem
- The value is clear (guaranteed valid plans) but the engineering cost is high
- Alternative: use the LLM itself as an approximate solver, with cvx-graph constraints as structured prompts
Part D: Unified Architecture
Three Layers of “What Comes Next”
| Layer | Source | Answers | Guarantees |
|---|---|---|---|
| Constraints | cvx-graph + ASP | What is valid | Logical correctness |
| Experience | CVX causal_search | What has worked | Empirical evidence |
| Probability | cvx-bayes BN | What is likely | Statistical confidence |
Decision Flow
```
State observation
  │
  ├─→ Phase detection (searching / holding / transforming / placing)
  │
  ├─→ CVX causal_search → k candidate continuations
  │     Each candidate = sequence of expert actions
  │
  ├─→ KG constraint check (optional)
  │     Filter candidates that violate preconditions
  │
  ├─→ BN scoring
  │     P(success | region, task_type, phase, action) per candidate
  │
  └─→ Agent chooses highest-scoring valid action
```
Why No Layer Alone Suffices
- Constraints only (ASP): produces valid plans but doesn’t know which valid plan is most likely to succeed in this specific room
- Experience only (CVX): retrieves what worked before but may suggest invalid actions (take something already held)
- Probability only (BN): scores actions but doesn’t know the domain rules (can’t heat without microwave)
Combined: valid + experienced + probable.
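The combination can be sketched as a single chooser that filters through the constraint layer and ranks by the probability layer. All names here are hypothetical hooks, not the crates' interfaces:

```python
def choose_action(candidates, state, is_valid, p_success):
    """Three layers in one decision:

    - candidates come from episodic retrieval (experience),
    - is_valid(action, state) is the KG/ASP validator (constraints),
    - p_success(action) is the BN score for this context (probability).
    """
    valid = [a for a in candidates if is_valid(a, state)]
    if not valid:
        return None  # nothing survives the constraint layer
    return max(valid, key=p_success)

# Toy stand-ins for the validator and BN layers:
rules = {"take": {"hand_empty"}, "use": {"holding"}, "navigate": set()}
valid_fn = lambda a, s: rules.get(a, set()) <= s
score_fn = {"navigate": 0.4, "use": 0.9, "take": 0.7}.get

print(choose_action(["take", "use", "navigate"], {"holding"},
                    valid_fn, score_fn))  # use
```

Here "take" is filtered out by the validator (hand not empty), and the BN then prefers "use" over "navigate": valid + experienced + probable.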
Implementation Priority
Section titled “Implementation Priority”| Phase | Component | Effort | Prerequisite |
|---|---|---|---|
| 1 | Phase detection from observations | Low | None |
| 2 | BN with phase variable | Low | cvx-bayes (done) |
| 3 | KG as constraint validator | Medium | cvx-graph (done) |
| 4 | NL → KG facts via LLM | Medium | LLM API |
| 5 | ASP solver integration | High | clingo or custom solver |
Phases 1-3 are implementable now with existing crates. Phase 4 needs an LLM in the loop. Phase 5 is long-term research.
References
Answer Set Programming
Section titled “Answer Set Programming”- Gelfond & Lifschitz (1988). “The Stable Model Semantics for Logic Programming.” ICLP.
- Gebser et al. (2012). Answer Set Solving in Practice. (clingo authors)
- Lifschitz (2019). Answer Set Programming. Springer.
Neuro-Symbolic AI
- Garcez et al. (2019). “Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning.” JAIR.
- Hamilton et al. (2022). “Is Neuro-Symbolic AI Meeting its Promises in Natural Language Processing?” ACL.
Planning
Section titled “Planning”- Ghallab, Nau & Traverso (2004). Automated Planning: Theory and Practice.
- Helmert (2006). “The Fast Downward Planning System.” JAIR.
Contextual Bayesian Decision-Making
- Kaelbling, Littman & Cassandra (1998). “Planning and Acting in Partially Observable Stochastic Domains.” AI Journal.