Orchestrator / Subagent Patterns: Coordinating Multiple AI Agents
You build an AI agent that handles end-to-end customer onboarding: verify identity, set up billing, configure the product, send welcome emails, and schedule a follow-up call. A single agent with all these tools becomes confused - it has 25 tools, loses track of what it has done, and occasionally sends welcome emails before identity verification passes. The context window fills with irrelevant tool results from earlier steps.
You split it into an orchestrator and specialized subagents. The orchestrator plans the workflow: “First verify identity, then set up billing, then configure product…” Each subagent handles one domain with only its relevant tools. The identity agent has 4 tools and always returns a verified/rejected decision. The billing agent has 3 tools and returns a billing configuration. Each subagent is simple, focused, and reliable. The orchestrator coordinates the sequence without needing to understand the details of each domain.
This is the orchestrator/subagent pattern - one of the most widely used architectures for complex agentic systems.
What the orchestrator pattern actually is
The orchestrator pattern separates planning from execution:
- Orchestrator: A “manager” agent that receives the high-level task, breaks it into subtasks, delegates to specialized subagents, collects results, and synthesizes the final output
- Subagents: Specialized “worker” agents that handle one specific domain or subtask. They have limited tools, focused system prompts, and return structured results
graph TD
    USER["User Task:<br/>'Onboard new customer'"] --> ORCH["Orchestrator Agent<br/>(plans, delegates, synthesizes)"]
    ORCH --> SA1["Identity Agent<br/>Verify documents<br/>4 tools"]
    ORCH --> SA2["Billing Agent<br/>Set up payment<br/>3 tools"]
    ORCH --> SA3["Config Agent<br/>Configure product<br/>5 tools"]
    ORCH --> SA4["Comms Agent<br/>Send welcome email<br/>2 tools"]
    SA1 -->|"verified: true"| ORCH
    SA2 -->|"billing_id: xyz"| ORCH
    SA3 -->|"config: {...}"| ORCH
    SA4 -->|"email_sent: true"| ORCH
    ORCH --> RESULT["Final Result:<br/>Onboarding complete"]
    style ORCH fill:#EEEDFE,stroke:#534AB7,color:#3C3489
    style SA1 fill:#E1F5EE,stroke:#0F6E56,color:#085041
    style SA2 fill:#E1F5EE,stroke:#0F6E56,color:#085041
    style SA3 fill:#E1F5EE,stroke:#0F6E56,color:#085041
    style SA4 fill:#E1F5EE,stroke:#0F6E56,color:#085041
Why this pattern works
Reduced complexity per agent
A single agent with 30 tools has to reason about which tools to use from a large set. Subagents with 3-5 tools each are much more reliable at tool selection - less ambiguity, simpler reasoning.
Independent optimization
Each subagent can be optimized independently: different models (use GPT-4 for the planning orchestrator, GPT-3.5 for simple execution subagents), different prompts, different temperature settings. You can upgrade or fix one subagent without touching others.
Better error isolation
If the billing subagent fails, it returns an error to the orchestrator. The orchestrator can retry, skip, or escalate without the failure corrupting the state of other subagents.
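One way to get this isolation is to wrap every subagent call so that a failure comes back to the orchestrator as a value rather than as an exception. The sketch below is a minimal illustration under assumptions: `SubagentResult` and `call_isolated` are hypothetical names, not part of any framework, and the retry count and backoff are arbitrary defaults.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class SubagentResult:
    """Outcome of one subagent call; failures are data, not exceptions."""
    ok: bool
    value: Any = None
    error: Optional[str] = None
    attempts: int = 0


async def call_isolated(
    run: Callable[[], Awaitable[Any]],
    max_attempts: int = 3,   # assumed default, tune per subagent
    backoff_s: float = 0.0,  # linear backoff between attempts
) -> SubagentResult:
    """Run a subagent call, retrying on any exception up to max_attempts."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return SubagentResult(ok=True, value=await run(), attempts=attempt)
        except Exception as exc:  # the failure stays inside this wrapper
            last_error = f"{type(exc).__name__}: {exc}"
            if attempt < max_attempts:
                await asyncio.sleep(backoff_s * attempt)
    return SubagentResult(ok=False, error=last_error, attempts=max_attempts)
```

The orchestrator can then branch on `result.ok` to decide whether to skip the step or escalate, and other subagents never see the exception.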
Parallel execution
Independent subtasks can be dispatched to subagents simultaneously. For example, identity verification and a compliance check can run in parallel because neither needs the other's output.
Cleaner context
Each subagent starts with a fresh, focused context. It does not carry the accumulated context of unrelated previous steps. The orchestrator maintains the high-level state; subagents focus narrowly.
Implementation patterns
Pattern 1: Sequential orchestration
The orchestrator executes subtasks in a defined sequence:
async def orchestrate_onboarding(customer_data):
    # Step 1: Identity
    identity_result = await identity_agent.run(
        f"Verify identity for: {customer_data['name']}, docs: {customer_data['documents']}"
    )
    if not identity_result.verified:
        return {"status": "rejected", "reason": identity_result.reason}

    # Step 2: Billing (depends on identity)
    billing_result = await billing_agent.run(
        f"Set up billing for verified customer: {identity_result.customer_id}"
    )

    # Step 3: Configuration (depends on billing)
    config_result = await config_agent.run(
        f"Configure {customer_data['plan']} for billing_id: {billing_result.billing_id}"
    )

    # Step 4: Communications
    await comms_agent.run(
        f"Send welcome email to {customer_data['email']} with config: {config_result}"
    )
    return {"status": "complete", "details": {...}}
Pattern 2: Dynamic orchestration (LLM-driven planning)
The orchestrator uses an LLM to decide what to do next:
orchestrator_prompt = """
You are a task orchestrator. Given a user request, break it into subtasks and delegate to available agents.
Available agents:
- research_agent: Searches for information online
- analysis_agent: Analyzes data and produces insights
- writing_agent: Produces written content
- review_agent: Reviews and improves content
For each step, specify which agent to call and what to pass it.
Return results when all subtasks are complete.
"""
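MAX_DELEGATIONS = 10  # illustrative hard stop so a confused orchestrator cannot loop forever

async def dynamic_orchestrate(task):
    # `llm`, `delegation_tools`, and `agents` are assumed to be defined elsewhere.
    messages = [
        {"role": "system", "content": orchestrator_prompt},
        {"role": "user", "content": task},
    ]
    for _ in range(MAX_DELEGATIONS):
        response = await llm.generate(messages, tools=delegation_tools)
        if response.is_final_answer:
            return response.content

        # Execute delegated subtask
        agent_name = response.tool_call.agent
        subtask = response.tool_call.task
        if agent_name not in agents:
            # Report the bad choice back to the model instead of raising KeyError
            result = f"Error: unknown agent '{agent_name}'"
        else:
            result = await agents[agent_name].run(subtask)

        # Record both the delegation and its result so the next turn sees the full exchange
        messages.append({"role": "assistant", "content": f"Delegate to {agent_name}: {subtask}"})
        messages.append({"role": "tool", "content": str(result)})

    raise RuntimeError(f"No final answer after {MAX_DELEGATIONS} delegations")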
Pattern 3: DAG-based orchestration
Define subtask dependencies as a directed acyclic graph (DAG) and run in parallel every task whose dependencies have completed:
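import asyncio

workflow = {
    "verify_identity": {"agent": "identity", "depends_on": []},
    "check_compliance": {"agent": "compliance", "depends_on": []},
    "setup_billing": {"agent": "billing", "depends_on": ["verify_identity"]},
    "configure_product": {"agent": "config", "depends_on": ["verify_identity", "check_compliance"]},
    "send_welcome": {"agent": "comms", "depends_on": ["setup_billing", "configure_product"]},
}

def get_inputs(task, completed, inputs):
    # One possible shape: the original inputs plus the results of direct dependencies
    deps = workflow[task]["depends_on"]
    return {"inputs": inputs, "upstream": {d: completed[d] for d in deps}}

async def dag_orchestrate(workflow, inputs):
    # `agents` maps agent names to objects with an async run() method (defined elsewhere).
    completed = {}
    while len(completed) < len(workflow):
        # Find tasks whose dependencies are all met
        ready = [t for t, spec in workflow.items()
                 if t not in completed and all(d in completed for d in spec["depends_on"])]
        if not ready:
            # Work remains but nothing can run: a cycle or an undefined dependency
            raise ValueError(f"Unsatisfiable dependencies: {sorted(set(workflow) - set(completed))}")

        # Execute all ready tasks in parallel; the next wave starts when this one finishes
        results = await asyncio.gather(*[
            agents[workflow[t]["agent"]].run(get_inputs(t, completed, inputs))
            for t in ready
        ])
        for task, result in zip(ready, results):
            completed[task] = result
    return completed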
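Before running a workflow like the one above, it can help to check that the graph is actually acyclic and that every dependency names a real task. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); `validate_workflow` is a hypothetical helper name, and the workflow shape is assumed to match the dictionary above.

```python
from graphlib import CycleError, TopologicalSorter


def validate_workflow(workflow: dict) -> list:
    """Return one valid execution order, or raise ValueError if the graph is unusable."""
    for task, spec in workflow.items():
        unknown = [d for d in spec["depends_on"] if d not in workflow]
        if unknown:
            raise ValueError(f"{task} depends on undefined tasks: {unknown}")

    # TopologicalSorter expects a mapping of node -> set of predecessors
    graph = {task: set(spec["depends_on"]) for task, spec in workflow.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle
        raise ValueError(f"workflow contains a cycle: {exc.args[1]}") from exc
```

Running this once at startup turns a malformed graph into an immediate error instead of a stalled orchestration.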
graph LR
    subgraph patterns["Orchestration Patterns"]
        SEQ["Sequential<br/>Simple, predictable<br/>No parallelism"]
        DYN["Dynamic (LLM)<br/>Flexible, adaptive<br/>Less predictable"]
        DAG["DAG-based<br/>Maximal parallelism<br/>Requires upfront graph"]
    end
    style SEQ fill:#E1F5EE,stroke:#0F6E56,color:#085041
    style DYN fill:#EEEDFE,stroke:#534AB7,color:#3C3489
    style DAG fill:#FAEEDA,stroke:#854F0B,color:#633806
Where the orchestrator pattern breaks
Over-decomposition
Breaking tasks too finely creates coordination overhead that exceeds the benefit. If the orchestrator spends more tokens planning and synthesizing than the subagents spend executing, you have over-decomposed.
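The token comparison above can be turned into a simple check on logged usage. This is a rough sketch: the function names are hypothetical, and the threshold of 1.0 only encodes the "orchestrator spends more than the subagents" rule from the paragraph, not an established benchmark.

```python
from typing import Sequence


def coordination_overhead(orchestrator_tokens: int, subagent_tokens: Sequence[int]) -> float:
    """Ratio of orchestrator tokens to total subagent tokens for one run."""
    executed = sum(subagent_tokens)
    if executed == 0:
        return float("inf")  # all planning, no execution
    return orchestrator_tokens / executed


def is_over_decomposed(
    orchestrator_tokens: int,
    subagent_tokens: Sequence[int],
    threshold: float = 1.0,  # assumed cutoff: planning costs more than execution
) -> bool:
    return coordination_overhead(orchestrator_tokens, subagent_tokens) > threshold
```

For instance, an orchestrator that used 1,200 tokens to coordinate two subagents that used 300 tokens each has a ratio of 2.0 and would be flagged.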
Information loss between agents
The orchestrator passes a summary to each subagent, but critical context might be lost in translation. The billing agent might need details from the identity verification that the orchestrator did not think to pass along.
Cascading failures
If a middle step fails and subsequent steps depend on it, the orchestrator must handle partial completion gracefully. “Identity verified but billing failed” needs a clear recovery path.
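One possible recovery path is to pair each step with an optional undo action and, when a step fails, roll back the completed steps in reverse order. The sketch below assumes this compensation style fits the workflow; `run_with_compensation` and the `(name, action, undo)` step shape are hypothetical. A step whose `undo` is `None` is left in place, which suits cases where a verified identity should survive a billing failure.

```python
import asyncio


async def run_with_compensation(steps):
    """Run (name, action, undo) steps in order; on failure, undo completed steps in reverse."""
    done = []  # (name, undo) pairs for steps that finished
    for name, action, undo in steps:
        try:
            await action()
        except Exception as exc:
            rolled_back = []
            for prev_name, prev_undo in reversed(done):
                if prev_undo is not None:
                    await prev_undo()
                    rolled_back.append(prev_name)
            # Report exactly how far the workflow got so a human or retry job can resume
            return {
                "status": "failed",
                "failed_step": name,
                "error": str(exc),
                "completed": [n for n, _ in done],
                "rolled_back": rolled_back,
            }
        done.append((name, undo))
    return {"status": "complete", "completed": [n for n, _ in done]}
```

The returned dictionary makes the partial state explicit, which is the part that usually goes missing when a mid-workflow failure is only logged.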
Orchestrator becoming a bottleneck
If the orchestrator must reason about every step, it becomes the performance bottleneck. With LLM-driven orchestration, 10 sequential subagent calls mean at least 10 orchestrator LLM calls for planning alone, and each one sits on the critical path before the next subagent can start.
Real-world orchestrator systems
- OpenAI Assistants with multi-tool routing - internal orchestration decides which tool(s) to invoke
- LangGraph - graph-based agent orchestration with state machines and conditional routing
- CrewAI - role-based multi-agent orchestration with task delegation
- AutoGen - Microsoft’s multi-agent conversation framework
- Amazon Bedrock Agents - orchestrates across multiple knowledge bases and action groups
How to apply in practice
Use a capable model for the orchestrator, cheaper models for subagents. The orchestrator needs strong planning and reasoning (GPT-4, Claude Sonnet). Subagents often handle simpler, well-defined tasks (GPT-3.5, Claude Haiku, or even fine-tuned small models).
Define clear interfaces between orchestrator and subagents. Each subagent should have a structured input (what it needs to know) and structured output (what it returns). Avoid passing entire conversation histories between agents.
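A structured interface can be as small as a pair of dataclasses plus a parser that rejects incomplete output. The sketch below is illustrative only: `BillingRequest`, `BillingResult`, and their fields are assumptions chosen to match the onboarding example, and it assumes the subagent is prompted to reply with a JSON object.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BillingRequest:
    """Everything the billing subagent needs, and nothing else."""
    customer_id: str
    plan: str
    country: str  # hypothetical field, e.g. for tax handling


@dataclass(frozen=True)
class BillingResult:
    billing_id: str
    status: str


def to_prompt(req: BillingRequest) -> str:
    """Serialize the request into the subagent's task message."""
    return "Set up billing for this customer:\n" + json.dumps(asdict(req), sort_keys=True)


def parse_billing_result(raw: str) -> BillingResult:
    """Parse the subagent's JSON reply, failing loudly on missing fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("billing agent must return a JSON object")
    missing = {"billing_id", "status"} - data.keys()
    if missing:
        raise ValueError(f"billing agent omitted fields: {sorted(missing)}")
    return BillingResult(billing_id=str(data["billing_id"]), status=str(data["status"]))
```

Because the request type lists its fields explicitly, context the subagent needs has to be added deliberately instead of being silently dropped in a free-text summary.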
Start with sequential, add parallelism later. Sequential orchestration is simpler to debug and reason about. Only parallelize when latency requires it and you have proven the individual subagents work reliably.
Include fallback logic in the orchestrator. What happens when a subagent fails? Retry? Skip? Escalate to human? The orchestrator should handle all these cases without crashing the workflow.
Monitor per-agent metrics. Track success rate, latency, and cost per subagent independently. This reveals which agents are bottlenecks and which need improvement.
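A minimal in-memory version of this tracking might look like the sketch below. `AgentStats` and `MetricsRegistry` are hypothetical names; a production system would more likely export these numbers to an existing metrics backend than keep them in a dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentStats:
    calls: int = 0
    failures: int = 0
    total_latency_s: float = 0.0
    total_cost_usd: float = 0.0

    @property
    def success_rate(self) -> float:
        return (self.calls - self.failures) / self.calls if self.calls else 0.0

    @property
    def mean_latency_s(self) -> float:
        return self.total_latency_s / self.calls if self.calls else 0.0


class MetricsRegistry:
    """Accumulates success, latency, and cost separately for each subagent."""

    def __init__(self) -> None:
        self.stats = defaultdict(AgentStats)

    def record(self, agent: str, ok: bool, latency_s: float, cost_usd: float = 0.0) -> None:
        s = self.stats[agent]
        s.calls += 1
        s.failures += 0 if ok else 1
        s.total_latency_s += latency_s
        s.total_cost_usd += cost_usd
```

The orchestrator would call `record` once after each subagent call, so a slow or unreliable agent shows up in its own row rather than being averaged into a workflow-level number.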
FAQ
Q: How do I decide what to make a separate subagent vs a tool within one agent?
Make it a subagent when: (1) it needs its own specialized system prompt, (2) it has multiple tools that work together, (3) it requires multi-step reasoning within its domain, (4) you want to optimize/replace it independently. Keep it as a tool when: (1) it is a single function call, (2) no domain-specific reasoning is needed, (3) the result is a simple data lookup. Rule of thumb: if the “subtask” would benefit from its own ReAct loop, it is a subagent.
Q: How does the orchestrator know when to stop delegating and return a final answer?
Same termination strategies as single agents: explicit completion criteria (“all required steps done”), maximum delegation count, or the orchestrator model deciding it has enough results. For dynamic orchestration, include a “synthesize and respond” option in the orchestrator’s tool set that it calls when all information is gathered.
Q: Can subagents delegate to other subagents? How deep should nesting go?
They can, but limit depth to 2-3 levels. Each level adds latency, cost, and debugging complexity. In practice, orchestrator → subagent is sufficient for most applications. Orchestrator → subagent → sub-subagent is occasionally justified for complex domains. Deeper nesting usually means you need to redesign the decomposition.
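A depth limit is easier to keep if it is enforced in code rather than by convention. In the sketch below, each agent receives a delegate function already bound to the next level, so it cannot exceed the limit by accident. The names `delegate` and `DepthLimitExceeded` are hypothetical, agents are assumed to be async callables taking `(task, delegate_fn)`, and the orchestrator counts as level 1.

```python
import asyncio


class DepthLimitExceeded(RuntimeError):
    pass


async def delegate(agents, name, task, depth=1, max_depth=3):
    """Call agents[name] at the given depth, refusing calls beyond max_depth."""
    if depth > max_depth:
        raise DepthLimitExceeded(f"{name!r} requested at depth {depth}, limit is {max_depth}")

    async def sub_delegate(child_name, child_task):
        # Children always run one level deeper than their caller
        return await delegate(agents, child_name, child_task, depth + 1, max_depth)

    return await agents[name](task, sub_delegate)
```

With `max_depth=3`, orchestrator → subagent → sub-subagent is allowed and a fourth level raises, which surfaces runaway nesting as an explicit error.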
Interview questions
Q: Design a multi-agent system for an AI research assistant that can find papers, analyze methodology, compare findings, and produce a literature review.
Architecture: Orchestrator agent receives the research question and plans the workflow. Subagents: (1) Search agent - queries academic databases (Semantic Scholar, arXiv), returns paper summaries and relevance scores. (2) Analysis agent - reads individual papers and extracts: methodology, key findings, limitations, sample size. (3) Synthesis agent - takes analyzed papers and produces comparisons: where do they agree/disagree, what gaps exist. (4) Writing agent - produces the final literature review with citations. Flow: orchestrator dispatches search → multiple analysis agents in parallel (one per paper) → synthesis → writing. Orchestrator validates each step: enough papers found? Analysis captures required fields? Synthesis is coherent? Writing cites correctly?
Q: Your orchestrator-subagent system works well for simple requests but fails on complex multi-step tasks. The orchestrator either decomposes too finely (15 subtasks for a 3-step job) or too coarsely (gives one agent an impossible subtask). How do you fix this?
The decomposition quality depends on the orchestrator’s planning capability. Fixes: (1) Few-shot examples in the orchestrator prompt showing good decompositions for similar complexity levels. (2) Planning validation step: after the orchestrator generates a plan, evaluate it (is each subtask achievable by its assigned agent? Are there obvious gaps?). (3) Adaptive decomposition: start with a coarse plan (2-3 steps), execute, and refine if a subagent reports the subtask is too complex. (4) Task complexity classification: detect simple vs complex tasks and use different planning prompts (simple tasks get 2-3 steps max, complex tasks get up to 8). (5) Historical performance: track which decomposition patterns succeed and use them as templates for similar future tasks.
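The planning validation step in fix (2) can start as a cheap structural check before any LLM-based review. The sketch below assumes a plan is a list of dictionaries with `agent` and `capability` keys and that each agent advertises a set of capabilities; both shapes, the name `validate_plan`, and the default limit of 8 steps (taken from fix (4)) are illustrative assumptions.

```python
def validate_plan(plan, agent_capabilities, max_steps=8):
    """Return a list of problems; an empty list means the plan passes these checks."""
    problems = []
    if not plan:
        problems.append("plan is empty")
    if len(plan) > max_steps:
        # Catches over-decomposition such as 15 subtasks for a 3-step job
        problems.append(f"plan has {len(plan)} steps, limit is {max_steps}")
    for i, step in enumerate(plan, start=1):
        agent = step.get("agent")
        capability = step.get("capability")
        if agent not in agent_capabilities:
            problems.append(f"step {i}: unknown agent {agent!r}")
        elif capability not in agent_capabilities[agent]:
            # Catches subtasks assigned to an agent that cannot do them
            problems.append(f"step {i}: {agent!r} cannot handle {capability!r}")
    return problems
```

A non-empty result can be fed back to the orchestrator as a correction message so it replans before any subagent runs.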