Multi-Agent State Machines: Deterministic Control for Non-Deterministic Systems


Your customer support agent sometimes issues refunds before verifying the order exists. Sometimes it escalates to a human before attempting any resolution. Sometimes it resolves the issue perfectly. The variance comes from the LLM’s non-deterministic behavior: its decisions shift with subtle prompt variations, the sampling temperature, and the specific tokens it happens to sample.

A state machine wraps the agent in deterministic transitions: from INTAKE, you can only go to VERIFY. From VERIFY, you go to RESOLVE or CLARIFY. From RESOLVE, you go to CONFIRM or ESCALATE. The LLM decides what to do within each state, but the transitions between states are programmatic and enforced. The agent cannot issue a refund from INTAKE because refund actions are only available in RESOLVE, and the state machine does not allow jumping there from INTAKE.

This hybrid gives you the best of both worlds: the LLM’s natural language understanding and reasoning within states, plus deterministic guarantees about workflow order and constraints.

graph LR
  INTAKE["INTAKE<br/>Classify issue"] -->|"classified"| VERIFY["VERIFY<br/>Check order exists"]
  VERIFY -->|"verified"| RESOLVE["RESOLVE<br/>Attempt fix"]
  VERIFY -->|"not_found"| CLARIFY["CLARIFY<br/>Ask user for details"]
  CLARIFY -->|"details_provided"| VERIFY
  RESOLVE -->|"resolved"| CONFIRM["CONFIRM<br/>Verify satisfaction"]
  RESOLVE -->|"failed"| ESCALATE["ESCALATE<br/>Human handoff"]
  CONFIRM -->|"satisfied"| DONE["DONE"]
  CONFIRM -->|"unsatisfied"| RESOLVE

  style INTAKE fill:#EEEDFE,stroke:#534AB7,color:#3C3489
  style VERIFY fill:#E1F5EE,stroke:#0F6E56,color:#085041
  style RESOLVE fill:#FAEEDA,stroke:#854F0B,color:#633806
  style ESCALATE fill:#FCEBEB,stroke:#A32D2D,color:#791F1F
  style DONE fill:#E1F5EE,stroke:#0F6E56,color:#085041
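Before reaching for a framework, it can help to see that the edges in the diagram above are just a lookup table. A minimal sketch in plain Python, assuming illustrative names (`TRANSITIONS`, `next_state`, and `InvalidTransition` are not a library API): the LLM may propose an event, but only events listed for the current state move the machine.

```python
# Transition table mirroring the diagram: state -> {event: next_state}.
TRANSITIONS: dict[str, dict[str, str]] = {
    "INTAKE": {"classified": "VERIFY"},
    "VERIFY": {"verified": "RESOLVE", "not_found": "CLARIFY"},
    "CLARIFY": {"details_provided": "VERIFY"},
    "RESOLVE": {"resolved": "CONFIRM", "failed": "ESCALATE"},
    "CONFIRM": {"satisfied": "DONE", "unsatisfied": "RESOLVE"},
    "ESCALATE": {},  # terminal: a human takes over
    "DONE": {},      # terminal
}


class InvalidTransition(Exception):
    """Raised when an event is not a legal edge out of the current state."""


def next_state(current: str, event: str) -> str:
    """Return the next state, or raise if the event is not allowed here."""
    allowed = TRANSITIONS.get(current, {})
    if event not in allowed:
        raise InvalidTransition(
            f"{event!r} is not allowed from {current} (allowed: {sorted(allowed)})"
        )
    return allowed[event]


if __name__ == "__main__":
    print(next_state("INTAKE", "classified"))  # VERIFY
    try:
        next_state("INTAKE", "resolved")  # trying to skip VERIFY
    except InvalidTransition as err:
        print("blocked:", err)
```

Whatever text the model generates, the only way out of INTAKE is the `classified` edge; everything else is rejected before any action runs.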

Implementation with LangGraph

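from typing import TypedDict

from langgraph.graph import StateGraph, START, END

# `llm`, `db`, and `execute_resolution` are application-specific placeholders
# (model client, order store, action executor); they are not LangGraph APIs.

class SupportState(TypedDict):
    issue: str
    order_id: str | None
    order_details: dict | None  # filled in by verify_node, read by resolve_node
    status: str
    resolution: str | None
    messages: list

def intake_node(state: SupportState) -> dict:
    """LLM classifies the issue and extracts the order ID."""
    classification = llm.generate(
        f"Classify this support request and extract order ID: {state['issue']}"
    )
    return {"order_id": classification.order_id, "status": "classified"}

def verify_node(state: SupportState) -> dict:
    """Deterministic check: does the order exist in the database?"""
    order = db.get_order(state["order_id"])
    if order:
        return {"status": "verified", "order_details": order}
    return {"status": "not_found"}

def resolve_node(state: SupportState) -> dict:
    """LLM chooses one action from a fixed allow-list."""
    response = llm.generate(
        system="You can: issue_refund, replace_item, extend_warranty. Choose one.",
        user=f"Order: {state['order_details']}, Issue: {state['issue']}",
    )
    result = execute_resolution(response.action)
    return {"resolution": result, "status": "resolved" if result.success else "failed"}

def clarify_node(state: SupportState) -> dict:
    """Ask the user for a valid order ID."""
    prompt = "We couldn't find that order. Could you double-check the order number?"
    return {"messages": state.get("messages", []) + [prompt], "status": "awaiting_details"}

def escalate_node(state: SupportState) -> dict:
    """Hand off to a human (placeholder: push the ticket to your support queue)."""
    return {"status": "escalated"}

# Build the graph
graph = StateGraph(SupportState)
graph.add_node("intake", intake_node)
graph.add_node("verify", verify_node)
graph.add_node("resolve", resolve_node)
graph.add_node("clarify", clarify_node)
graph.add_node("escalate", escalate_node)

# Define transitions (deterministic: code, not the LLM, picks the next node).
# Simplified relative to the diagram: CONFIRM is omitted, so a resolved issue ends the run.
graph.add_edge(START, "intake")
graph.add_edge("intake", "verify")
graph.add_conditional_edges("verify",
    lambda s: "resolve" if s["status"] == "verified" else "clarify")
graph.add_conditional_edges("resolve",
    lambda s: END if s["status"] == "resolved" else "escalate")
graph.add_edge("clarify", END)   # wait for the user; their reply starts a new run at intake
graph.add_edge("escalate", END)

workflow = graph.compile()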

Why state machines for agents

Compliance: Regulated industries need provable workflow guarantees. “The agent ALWAYS verifies identity before accessing account data” is a state machine constraint, not a prompt suggestion.

Debuggability: When something goes wrong, you know exactly which state the agent was in and which transition fired. Contrast with a free-form agent where failure might happen at any point in an opaque reasoning chain.
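Both the compliance and the debuggability arguments rest on having a transition log. A minimal sketch, assuming a plain-Python machine rather than LangGraph (`TracedMachine` and `verified_before_resolve` are hypothetical names): every transition is recorded as (timestamp, from_state, event, to_state), and an audit rule can be checked against that log after the fact.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Same edges as the support-flow diagram: state -> {event: next_state}.
TRANSITIONS = {
    "INTAKE": {"classified": "VERIFY"},
    "VERIFY": {"verified": "RESOLVE", "not_found": "CLARIFY"},
    "CLARIFY": {"details_provided": "VERIFY"},
    "RESOLVE": {"resolved": "CONFIRM", "failed": "ESCALATE"},
    "CONFIRM": {"satisfied": "DONE", "unsatisfied": "RESOLVE"},
}


@dataclass
class TracedMachine:
    state: str = "INTAKE"
    # One row per transition: (UTC timestamp, from_state, event, to_state).
    log: list = field(default_factory=list)

    def fire(self, event: str) -> str:
        target = TRANSITIONS.get(self.state, {}).get(event)
        if target is None:
            raise ValueError(f"{event!r} is not allowed from {self.state}")
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append((stamp, self.state, event, target))
        self.state = target
        return target


def verified_before_resolve(log: list) -> bool:
    """Audit rule: the first entry into RESOLVE must come from a 'verified' event."""
    for _stamp, _src, event, dst in log:
        if dst == "RESOLVE":
            return event == "verified"
    return True  # RESOLVE was never reached, so nothing to violate


if __name__ == "__main__":
    m = TracedMachine()
    for ev in ["classified", "not_found", "details_provided", "verified", "resolved"]:
        m.fire(ev)
    for _stamp, src, ev, dst in m.log:
        print(f"{src} --{ev}--> {dst}")
    print("audit passed:", verified_before_resolve(m.log))
```

When a run goes wrong, the last row of the log names the state the agent was in and the edge that fired, which is exactly the question a free-form trace makes hard to answer.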

Controllability: Add new states or transitions without rewriting agent logic. Want to add a “fraud check” step? Add the new state and reroute the one edge that should pass through it; the logic inside the existing states is unchanged.
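With the machine expressed as a transition table, splicing in a step is a data change rather than a code change. A minimal sketch, assuming a hypothetical `FRAUD_CHECK` state with hypothetical `clear`/`flagged` events placed between VERIFY and RESOLVE: one edge is rewired and the new state's exits are added.

```python
import copy

# Support flow from the diagram: state -> {event: next_state}.
BASE = {
    "INTAKE": {"classified": "VERIFY"},
    "VERIFY": {"verified": "RESOLVE", "not_found": "CLARIFY"},
    "CLARIFY": {"details_provided": "VERIFY"},
    "RESOLVE": {"resolved": "CONFIRM", "failed": "ESCALATE"},
    "CONFIRM": {"satisfied": "DONE", "unsatisfied": "RESOLVE"},
}


def insert_state(table: dict, after: str, event: str, new_state: str, exits: dict) -> dict:
    """Return a copy of `table` with `new_state` spliced onto edge `after --event-->`.

    Only that single edge is redirected; every other state's entries are untouched.
    """
    updated = copy.deepcopy(table)  # leave the original table intact
    updated[after][event] = new_state
    updated[new_state] = dict(exits)
    return updated


WITH_FRAUD = insert_state(
    BASE,
    after="VERIFY",
    event="verified",
    new_state="FRAUD_CHECK",
    exits={"clear": "RESOLVE", "flagged": "ESCALATE"},
)
```

The node functions for INTAKE, VERIFY, and RESOLVE do not change at all; only the wiring between them does.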

Testability: Each state can be unit-tested independently. The transition logic is deterministic and testable with standard testing tools.
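Routing decisions are pure functions of state, so they can be tested without calling an LLM. A minimal sketch using the standard-library `unittest`, assuming the routing lambdas from the LangGraph example are pulled out into named functions; `END` here is a local stand-in so the test does not need LangGraph installed.

```python
import unittest

END = "__end__"  # local stand-in for langgraph.graph.END


def route_after_verify(state: dict) -> str:
    """Same logic as the conditional edge out of "verify"."""
    return "resolve" if state["status"] == "verified" else "clarify"


def route_after_resolve(state: dict) -> str:
    """Same logic as the conditional edge out of "resolve"."""
    return END if state["status"] == "resolved" else "escalate"


class RoutingTests(unittest.TestCase):
    def test_verified_order_goes_to_resolve(self):
        self.assertEqual(route_after_verify({"status": "verified"}), "resolve")

    def test_missing_order_goes_to_clarify(self):
        self.assertEqual(route_after_verify({"status": "not_found"}), "clarify")

    def test_failed_resolution_escalates(self):
        self.assertEqual(route_after_resolve({"status": "failed"}), "escalate")

    def test_success_ends_workflow(self):
        self.assertEqual(route_after_resolve({"status": "resolved"}), END)
```

Run it with `python -m unittest <module_name>`. The named functions can also be passed to `add_conditional_edges` in place of the lambdas, so the graph and the tests share one definition.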

Where to use state machines vs free-form agents

State machines when: The workflow has clear stages, compliance/audit requirements exist, the failure modes of skipping steps are costly, or you need predictable behavior for SLAs.

Free-form agents when: The task is exploratory (research, creative work), there is no fixed workflow, or the agent needs full autonomy to discover novel approaches.

Hybrid (most common): State machine for the overall workflow structure, free-form LLM reasoning within each state for the actual work.
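The hybrid split can be made concrete with a per-state tool allow-list. A minimal sketch, assuming hypothetical tool names and a `propose` callable that stands in for the LLM: the model chooses freely, but the machine only accepts a choice that the current state permits.

```python
from typing import Callable

# Tools the LLM may use in each state (hypothetical names). States not
# listed here expose no tools at all.
ALLOWED_TOOLS: dict[str, set[str]] = {
    "VERIFY": {"lookup_order"},
    "RESOLVE": {"issue_refund", "replace_item", "extend_warranty"},
}


def run_step(state: str, propose: Callable[[str, set[str]], str]) -> str:
    """Ask the model for a tool, then enforce the current state's allow-list.

    `propose` stands in for an LLM call: it receives the state name and the
    permitted tools and returns the tool it wants to use.
    """
    allowed = ALLOWED_TOOLS.get(state, set())
    choice = propose(state, allowed)
    if choice not in allowed:
        raise PermissionError(f"tool {choice!r} is not permitted in state {state}")
    return choice  # a real runner would execute the tool here


def eager_refunder(state: str, allowed: set[str]) -> str:
    """Deterministic stub for a model that always wants to refund."""
    return "issue_refund"
```

`run_step("RESOLVE", eager_refunder)` returns `"issue_refund"`, while `run_step("VERIFY", eager_refunder)` raises `PermissionError`: the same model output is accepted or rejected depending only on the state the machine is in.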

FAQ

Q: Do state machines make agents less capable?

For structured workflows, no: they make agents more reliable without reducing the capability that matters. The LLM keeps its full reasoning power within each state. The state machine constrains the order of operations (and which actions are available where), not what the LLM can think or generate within a state. A constrained agent that reliably completes every required step is more useful than an unconstrained agent that skips or reorders steps some fraction of the time.

Q: How do I handle unexpected situations that do not fit the state machine?

Add a “fallback” or “exception” state that any state can transition to. This state handles unexpected inputs, errors, or situations the machine was not designed for (typically: escalate to human or return to a safe state).
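A minimal sketch of that fallback routing in plain Python, assuming a hypothetical `FALLBACK` state and a `handler` callable that stands in for each state's LLM or tool work and returns an event name. Both an event the table does not know and an exception raised by the handler land in `FALLBACK` instead of an undefined state.

```python
from typing import Callable

# Support flow from the diagram: state -> {event: next_state}.
TRANSITIONS = {
    "INTAKE": {"classified": "VERIFY"},
    "VERIFY": {"verified": "RESOLVE", "not_found": "CLARIFY"},
    "CLARIFY": {"details_provided": "VERIFY"},
    "RESOLVE": {"resolved": "CONFIRM", "failed": "ESCALATE"},
    "CONFIRM": {"satisfied": "DONE", "unsatisfied": "RESOLVE"},
}
FALLBACK = "FALLBACK"  # hypothetical catch-all state, e.g. hand off to a human


def step(state: str, handler: Callable[[str], str]) -> str:
    """Run one state's work and return the next state.

    Anything the transition table does not cover is routed to FALLBACK.
    """
    try:
        event = handler(state)
    except Exception:
        # Tool error, timeout, unparseable LLM output, ...; log it in practice.
        return FALLBACK
    return TRANSITIONS.get(state, {}).get(event, FALLBACK)
```

Catching broadly is a deliberate choice here: the point of the fallback state is that no failure inside a state can leave the workflow in a place the designers did not anticipate.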

Interview questions

Q: Design a state machine for an AI agent that processes insurance claims. The workflow must enforce: intake → document verification → damage assessment → coverage check → approval/denial → notification.

Define states with specific responsibilities and allowed transitions. Each state has: (1) entry actions (what to do on arrival), (2) allowed tools (only claim-relevant tools in each state), (3) exit conditions (what triggers transition). Key constraints: cannot reach approval without passing document verification AND coverage check. Add timeout per state (claims stuck in assessment > 48h auto-escalate). Add “fraud_review” as a conditional state triggered by anomaly detection at any stage. The LLM reasons within states (assessing damage severity, interpreting policy terms) while the machine ensures all required steps complete in order.
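A minimal sketch of the answer above in plain Python. The stage names, the `Claim` class, and the timeout table are illustrative assumptions, and the LLM work inside each stage (assessing damage, reading policy terms, choosing approve or deny in DECISION) is left out. It shows the three structural constraints: a fixed stage order with a guard on the decision, a 48-hour timeout on damage assessment, and a fraud-review detour reachable from any stage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

STAGES = ["INTAKE", "DOC_VERIFICATION", "DAMAGE_ASSESSMENT",
          "COVERAGE_CHECK", "DECISION", "NOTIFICATION"]
REQUIRED_BEFORE_DECISION = {"DOC_VERIFICATION", "COVERAGE_CHECK"}
TIMEOUTS = {"DAMAGE_ASSESSMENT": timedelta(hours=48)}  # stuck longer -> escalate


@dataclass
class Claim:
    entered_at: datetime
    state: str = "INTAKE"
    completed: set = field(default_factory=set)
    resume_to: Optional[str] = None  # stage to return to after fraud review

    def _enter(self, target: str, now: datetime) -> str:
        self.state, self.entered_at = target, now
        return target

    def advance(self, now: datetime) -> str:
        """Mark the current stage complete and move to the next one."""
        if self.state not in STAGES[:-1]:
            raise ValueError(f"no forward transition from {self.state}")
        target = STAGES[STAGES.index(self.state) + 1]
        done = self.completed | {self.state}
        # Guard: approval/denial is unreachable unless both checks are complete.
        if target == "DECISION" and not REQUIRED_BEFORE_DECISION <= done:
            raise ValueError("document verification and coverage check must complete first")
        self.completed = done
        return self._enter(target, now)

    def check_timeout(self, now: datetime) -> bool:
        """Meant to be called periodically (e.g. by a scheduler); escalates stale claims."""
        limit = TIMEOUTS.get(self.state)
        if limit is not None and now - self.entered_at > limit:
            self._enter("ESCALATED", now)
            return True
        return False

    def flag_anomaly(self, now: datetime) -> str:
        """Anomaly detection at any stage diverts the claim to fraud review."""
        self.resume_to = self.state
        return self._enter("FRAUD_REVIEW", now)

    def clear_fraud_review(self, now: datetime) -> str:
        """Return a cleared claim to the stage it was diverted from."""
        if self.state != "FRAUD_REVIEW" or self.resume_to is None:
            raise ValueError("claim is not under fraud review")
        target, self.resume_to = self.resume_to, None
        return self._enter(target, now)
```

The timeout is checked by a separate method rather than inside `advance`, because a stuck claim by definition is not advancing; something outside the claim has to notice it.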