Architecture Overview
The Multi-Agent System architecture deploys multiple specialized AI agents, each with a focused role, its own tool access, and its own system prompt. An orchestrator agent plans the overall task, delegates sub-tasks to specialist agents, collects their results, and synthesizes a final response. This pattern excels when no single agent can efficiently handle the full breadth of a complex workflow.
When to Use
- Tasks require multiple distinct capabilities (research, coding, analysis, review)
- A single agent's context window cannot hold all necessary information simultaneously
- Different sub-tasks benefit from different models, prompts, or temperature settings
- You need separation of concerns for security (e.g., only one agent accesses production databases)
- Complex workflows need iterative refinement with feedback loops between specialists
Complexity Level
High. Multi-agent systems multiply the complexity of single-agent architectures. Only use this pattern when you have proven that a single agent with tools cannot meet your quality or capability requirements. Start with 2 agents before scaling to more.
The most common mistake is creating too many agents. Each additional agent adds latency, cost, and failure modes. Start with the minimum viable agent count and add specialists only when you have clear evidence of improvement.
Architecture Diagram
Figure: Multi-agent architecture. The orchestrator fans out to specialized agents with tools, collects results, and synthesizes a unified response.
Components Deep Dive
Orchestrator Agent
The central coordinator. It decomposes tasks, assigns work to specialists, tracks progress, handles failures, and merges results. Its planner prompt describes every available agent and its capabilities.
Researcher Agent
Specializes in information gathering using search tools, document retrieval, and knowledge bases. Returns structured findings with source citations. Optimized for breadth and accuracy over speed.
Coder Agent
Writes, debugs, and executes code in a sandboxed environment. Has access to a code executor (REPL, Docker container) and can install packages, run tests, and iterate on errors autonomously.
Reviewer Agent
Reviews outputs from other agents for correctness, quality, and safety. Runs linting, type checking, and test suites. Can request revisions from the coder agent before approving.
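The revision cycle between the coder and the reviewer can be sketched as a bounded loop. This is a minimal illustration, not the guide's implementation: `review_loop`, `write`, and `review` are hypothetical names, and the verdict is assumed to be a dict with `approved` and `issues` keys, mirroring the JSON shape the reviewer prompt asks for later in this guide.

```python
from typing import Callable

def review_loop(
    task: str,
    write: Callable[[str, str], str],   # hypothetical: (task, feedback) -> code
    review: Callable[[str], dict],      # hypothetical: code -> {"approved": bool, "issues": [...]}
    max_rounds: int = 3,
) -> dict:
    """Alternate coder and reviewer until approval or the round cap is hit."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        code = write(task, feedback)
        verdict = review(code)
        if verdict["approved"]:
            return {"code": code, "approved": True, "rounds": round_no}
        # Feed the reviewer's issues back into the next coding attempt.
        feedback = "; ".join(verdict["issues"])
    # Cap reached: return the last draft, flagged as unapproved.
    return {"code": code, "approved": False, "rounds": max_rounds}


# Stand-in agents used only to exercise the loop.
def fake_write(task: str, feedback: str) -> str:
    return "v2" if feedback else "v1"

def fake_review(code: str) -> dict:
    return {"approved": code == "v2", "issues": [] if code == "v2" else ["missing tests"]}

outcome = review_loop("write a rate limiter", fake_write, fake_review)
```

The round cap matters as much as the loop itself: without it, a coder and reviewer that never agree would iterate indefinitely.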
Message Bus
Structured communication layer between agents. Messages include sender, recipient, action type (request, response, error), payload, and metadata. Enables logging and replay for debugging.
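One way to represent such a message is a small dataclass plus an in-memory log. The sketch below is illustrative only: `AgentMessage` and `MessageBus` are assumed names, and the `message_id` and `timestamp` fields are additions beyond the fields listed above. A production bus would likely sit on a queue or database.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Literal

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    action: Literal["request", "response", "error"]
    payload: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assumed extras that make logs easier to correlate and order.
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

class MessageBus:
    """In-memory bus that records every message so a run can be inspected later."""

    def __init__(self) -> None:
        self.log: list[AgentMessage] = []

    def send(self, message: AgentMessage) -> None:
        self.log.append(message)

    def dump(self) -> str:
        """Serialize the log as JSON Lines, one message per line."""
        return "\n".join(json.dumps(asdict(m)) for m in self.log)

    def replay(self, recipient: str) -> list[AgentMessage]:
        """Return, in order, every message a given agent received."""
        return [m for m in self.log if m.recipient == recipient]


bus = MessageBus()
bus.send(AgentMessage("orchestrator", "coder", "request", {"task": "write limiter"}))
bus.send(AgentMessage("coder", "orchestrator", "response", {"code": "..."}))
```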
Safety Controls
Max iteration limits prevent infinite loops. Token budgets cap total cost. Human-in-the-loop escalation triggers when confidence is low or stakes are high. Circuit breakers halt on repeated failures.
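These controls can be combined into a single guard object that the orchestration loop consults before and after each agent call. The following is a sketch under assumptions: `SafetyGuard`, `needs_human`, the exception names, and all thresholds are hypothetical and would need tuning for a real workload.

```python
from dataclasses import dataclass

class BudgetExceeded(RuntimeError):
    """Raised when the iteration or token budget is spent."""

class CircuitOpen(RuntimeError):
    """Raised when too many calls have failed in a row."""

@dataclass
class SafetyGuard:
    max_iterations: int = 10
    max_tokens: int = 50_000
    max_consecutive_failures: int = 3
    iterations: int = 0
    tokens_used: int = 0
    consecutive_failures: int = 0

    def before_call(self) -> None:
        """Check every limit, then count the call that is about to happen."""
        if self.consecutive_failures >= self.max_consecutive_failures:
            raise CircuitOpen("too many consecutive failures")
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded("iteration limit reached")
        if self.tokens_used >= self.max_tokens:
            raise BudgetExceeded("token budget spent")
        self.iterations += 1

    def after_call(self, tokens: int, ok: bool) -> None:
        """Record token usage; a success resets the failure streak."""
        self.tokens_used += tokens
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

def needs_human(confidence: float, high_stakes: bool, threshold: float = 0.6) -> bool:
    """Escalate when the stakes are high or the confidence is below threshold."""
    return high_stakes or confidence < threshold
```

A caller would wrap each delegation in `guard.before_call()` and `guard.after_call(...)`, and treat either exception as a signal to stop and report partial progress.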
Communication Patterns Comparison
| Pattern | How It Works | Pros | Cons | Best For |
|---|---|---|---|---|
| Hierarchical | Orchestrator delegates to agents; agents only talk to orchestrator | Clear control flow, easy to debug | Bottleneck at orchestrator, more round trips | Most production systems |
| Peer-to-Peer | Agents communicate directly with each other | Lower latency, parallel work | Hard to track state, potential loops | Tightly coupled tasks (code + review) |
| Blackboard | Shared workspace; agents read/write to common state | Flexible, decoupled agents | Race conditions, complex state management | Collaborative document editing |
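The race-condition risk in the blackboard pattern is commonly reduced with versioned writes. The sketch below assumes a single process and a hypothetical `Blackboard` class. A write succeeds only if the entry has not changed since the writer last read it, so a stale agent has to re-read before retrying.

```python
import threading
from typing import Any

class Blackboard:
    """Shared key-value workspace with compare-and-set writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict[str, tuple[int, Any]] = {}  # key -> (version, value)

    def read(self, key: str) -> tuple[int, Any]:
        """Return (version, value); unseen keys are version 0 with value None."""
        with self._lock:
            return self._state.get(key, (0, None))

    def write(self, key: str, value: Any, expected_version: int) -> bool:
        """Write only if nobody else has written since expected_version was read."""
        with self._lock:
            version, _ = self._state.get(key, (0, None))
            if version != expected_version:
                return False  # stale write: caller should re-read and retry
            self._state[key] = (version + 1, value)
            return True


board = Blackboard()
version, _ = board.read("draft")
first_ok = board.write("draft", "researcher notes", version)
second_ok = board.write("draft", "coder notes", version)  # same stale version
```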
Framework Comparison
| Framework | Architecture | Strengths | Considerations |
|---|---|---|---|
| LangGraph | State machine / graph-based | Explicit control flow, persistence, human-in-the-loop | Steeper learning curve, tied to the LangChain ecosystem |
| CrewAI | Role-based crew | Simple API, role/goal/backstory abstractions | Less control over execution order |
| AutoGen | Conversational agents | Natural multi-turn dialogue, code execution built in | Can be chatty, harder to constrain |
Implementation
Three-Agent System with Orchestrator
import anthropic
import json
from dataclasses import dataclass, field
from typing import Optional

client = anthropic.Anthropic()

# ---------- Agent definitions ----------
@dataclass
class Agent:
    name: str
    role: str
    system_prompt: str
    tools: list = field(default_factory=list)
    model: str = "claude-sonnet-4-20250514"
    temperature: float = 0.3
    max_tokens: int = 2048

# Define specialist agents
researcher = Agent(
    name="researcher",
    role="Information gathering and analysis",
    system_prompt="""You are a research specialist. Your job is to:
1. Search for relevant information on the given topic
2. Summarize findings with source citations
3. Identify key facts, statistics, and expert opinions
Return structured JSON with keys: findings, sources, confidence.""",
    tools=[{
        "name": "web_search",
        "description": "Search the web for information",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }]
)

coder = Agent(
    name="coder",
    role="Code writing and execution",
    system_prompt="""You are a coding specialist. Your job is to:
1. Write clean, well-documented code for the given task
2. Execute code and verify it works correctly
3. Fix any errors and iterate until tests pass
Return the final working code with explanation.""",
    tools=[{
        "name": "execute_code",
        "description": "Execute Python code in a sandbox",
        "input_schema": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"]
        }
    }]
)

reviewer = Agent(
    name="reviewer",
    role="Quality assurance and review",
    system_prompt="""You are a code reviewer. Your job is to:
1. Review code for correctness, security, and best practices
2. Run linting and type checking
3. Identify bugs, edge cases, and improvements
Return JSON: {approved: bool, issues: [...], suggestions: [...]}""",
    tools=[{
        "name": "run_tests",
        "description": "Run pytest, ruff, and mypy on code",
        "input_schema": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"]
        }
    }]
)

AGENTS = {"researcher": researcher, "coder": coder, "reviewer": reviewer}
Orchestrator with Function Calling
# ---------- Orchestrator ----------
ORCHESTRATOR_SYSTEM = """You are an orchestrator managing a team of specialist agents.

Available agents:
- researcher: Searches and analyzes information
- coder: Writes and executes code
- reviewer: Reviews code quality and runs tests

For each user request:
1. Break the task into sub-tasks
2. Use the delegate_to_agent tool to assign work
3. Collect results and iterate if needed
4. Synthesize a final response

Rules:
- Max 10 total delegations per request
- Always have the reviewer check code before finalizing
- If an agent fails twice, escalate to the user"""

orchestrator_tools = [{
    "name": "delegate_to_agent",
    "description": "Delegate a sub-task to a specialist agent",
    "input_schema": {
        "type": "object",
        "properties": {
            "agent_name": {
                "type": "string",
                "enum": ["researcher", "coder", "reviewer"]
            },
            "task": {"type": "string"},
            "context": {"type": "string", "default": ""}
        },
        "required": ["agent_name", "task"]
    }
}]


def response_text(response) -> str:
    """Join all text blocks; the first content block is not guaranteed to be text."""
    return "".join(block.text for block in response.content if block.type == "text")


def call_agent(agent: Agent, task: str, context: str = "") -> str:
    """Call a specialist agent with a task."""
    message = f"Task: {task}"
    if context:
        message += f"\n\nContext from other agents:\n{context}"
    # Note: agent.tools is not passed here, so specialists answer from their
    # prompt alone. Passing tools=agent.tools would also require a loop that
    # executes each tool call and returns its result to the agent.
    response = client.messages.create(
        model=agent.model,
        max_tokens=agent.max_tokens,
        system=agent.system_prompt,
        messages=[{"role": "user", "content": message}],
        temperature=agent.temperature,
    )
    return response_text(response)


def run_multi_agent(user_request: str, max_delegations: int = 10) -> str:
    """Run the multi-agent orchestration loop."""
    messages = [{"role": "user", "content": user_request}]
    delegation_count = 0
    total_tokens = 0  # orchestrator calls only; specialist usage is not counted

    while delegation_count < max_delegations:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=ORCHESTRATOR_SYSTEM,
            messages=messages,
            tools=orchestrator_tools,
        )
        total_tokens += response.usage.input_tokens + response.usage.output_tokens

        # Check if orchestrator wants to delegate
        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                agent = AGENTS.get(block.input.get("agent_name"))
                if agent is None or delegation_count >= max_delegations:
                    # Every tool_use block needs a matching tool_result, so
                    # report the problem instead of raising or skipping it.
                    reason = "unknown agent" if agent is None else "delegation budget exhausted"
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": f"Delegation refused: {reason}.",
                        "is_error": True,
                    })
                    continue
                result = call_agent(
                    agent,
                    block.input["task"],
                    block.input.get("context", ""),
                )
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
                delegation_count += 1
                print(f"  [{delegation_count}] {agent.name}: {block.input['task'][:80]}")
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Orchestrator has synthesized the final answer
            print(f"\nCompleted in {delegation_count} delegations, ~{total_tokens} orchestrator tokens")
            return response_text(response)

    return "Max delegations reached before the orchestrator produced a final answer."


# Usage
result = run_multi_agent(
    "Research best practices for rate limiting APIs, then write a Python "
    "implementation with tests, and have it reviewed for production readiness."
)
Data Flow
Here is the step-by-step flow through the Multi-Agent orchestration:
1. User submits request: A complex task that requires multiple capabilities (research, coding, review)
2. Orchestrator plans: Breaks the task into sub-tasks, identifies which agents to involve, determines ordering
3. Fan-out delegation: Orchestrator calls delegate_to_agent for each sub-task; independent tasks can run in parallel
4. Agents execute: Each specialist processes its sub-task using its tools (search, code execution, testing)
5. Results returned: Agent outputs flow back to the orchestrator as tool results
6. Iterate if needed: Orchestrator may request revisions, send coder output to reviewer, or ask researcher for more data
7. Synthesize response: Orchestrator merges all agent outputs into a coherent final answer
8. Return to user: Final response includes contributions from all agents, with provenance tracking
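The fan-out step can run independent sub-tasks concurrently. Here is a minimal sketch using a thread pool, which suits I/O-bound API calls. `fan_out` and `fake_call` are hypothetical names; in practice `call` would wrap a real agent invocation. Each result is tagged with the agent that produced it, which is one simple way to keep provenance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def fan_out(
    subtasks: list[tuple[str, str]],
    call: Callable[[str, str], str],
    max_workers: int = 4,
) -> list[dict]:
    """Run independent (agent_name, task) pairs concurrently.

    Results come back in input order, each tagged with its agent.
    """
    def run_one(pair: tuple[str, str]) -> dict:
        agent_name, task = pair
        try:
            output = call(agent_name, task)
            return {"agent": agent_name, "task": task, "ok": True, "output": output}
        except Exception as exc:  # one failed agent must not sink the whole batch
            return {"agent": agent_name, "task": task, "ok": False, "output": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, subtasks))


def fake_call(agent_name: str, task: str) -> str:
    """Stand-in for a real agent call, used only to exercise fan_out."""
    if agent_name == "coder":
        raise RuntimeError("sandbox unavailable")
    return f"{agent_name} finished: {task}"

results = fan_out(
    [("researcher", "survey rate limiting"), ("coder", "write limiter")],
    fake_call,
)
```

Only sub-tasks with no dependency on each other belong in the same batch. Sending coder output to the reviewer, for example, remains sequential.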
Shared context passes relevant results from one agent to another via the orchestrator. Isolated state keeps each agent's working memory separate, preventing context pollution. Most production systems use a hybrid: shared task context with isolated agent scratchpads.
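The hybrid approach might look like the following sketch, in which the orchestrator owns a shared task context and each agent keeps a private scratchpad. `TaskContext` and `AgentScratchpad` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TaskContext:
    """Shared, orchestrator-owned results that other agents may be shown."""
    request: str
    results: dict[str, str] = field(default_factory=dict)

    def publish(self, agent_name: str, summary: str) -> None:
        """Record an agent's finished output for others to consume."""
        self.results[agent_name] = summary

    def view_for(self, agent_name: str) -> str:
        """Build the context string for one agent, excluding its own output."""
        lines = [f"Request: {self.request}"]
        lines += [f"[{name}] {text}" for name, text in self.results.items() if name != agent_name]
        return "\n".join(lines)

@dataclass
class AgentScratchpad:
    """Private working memory; never copied into the shared context."""
    agent_name: str
    notes: list[str] = field(default_factory=list)


ctx = TaskContext("build a rate limiter")
pad = AgentScratchpad("researcher")
pad.notes.append("draft query: token bucket vs leaky bucket")   # stays private
ctx.publish("researcher", "Token bucket is the common choice.")  # shared summary
coder_view = ctx.view_for("coder")
```

Only published summaries cross agent boundaries, which keeps intermediate reasoning out of other agents' prompts.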
Trade-offs & Considerations
| Advantage | Limitation |
|---|---|
| Handles complex, multi-step tasks beyond single agent capability | Significantly higher latency (multiple LLM calls in sequence) |
| Separation of concerns: each agent is focused and testable | Cost multiplied by number of agents and iterations |
| Different models/configs per agent (cost vs quality optimization) | Harder to debug: failures can cascade across agents |
| Built-in quality control via reviewer agent | Risk of infinite loops or excessive iterations |
| Naturally supports human-in-the-loop at any stage | Complex state management and error recovery |
Cost Control Strategies
- Token budgets: Set per-agent and per-request token limits; terminate early if exceeded
- Iteration caps: Hard limit on total delegations (typically 5-15 per request)
- Model tiering: Use cheaper models for simple agents (researcher: Haiku), expensive models only for the orchestrator
- Caching: Cache agent results for repeated sub-tasks across requests
- Early termination: Stop when the orchestrator has sufficient confidence, even if not all agents have been consulted
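Model tiering and caching from the list above can be sketched together. The tier names `small-model` and `large-model` are placeholders rather than real model identifiers, and `ResultCache` is a hypothetical in-memory cache keyed on the agent name and task text. A real deployment would also need an expiry policy.

```python
import hashlib
from typing import Callable

# Placeholder tiers: only the orchestrator gets the expensive model.
MODEL_TIERS = {"orchestrator": "large-model"}
DEFAULT_MODEL = "small-model"

def model_for(agent_name: str) -> str:
    return MODEL_TIERS.get(agent_name, DEFAULT_MODEL)

class ResultCache:
    """Cache agent results so repeated sub-tasks skip the model call."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def _key(agent_name: str, task: str) -> str:
        # NUL separator avoids collisions between (name, task) pairs.
        return hashlib.sha256(f"{agent_name}\x00{task}".encode()).hexdigest()

    def get_or_call(self, agent_name: str, task: str, call: Callable[[str, str], str]) -> str:
        """Return a cached result, or invoke call(model, task) and store it."""
        key = self._key(agent_name, task)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = call(model_for(agent_name), task)
        self._store[key] = result
        return result


models_used: list[str] = []

def fake_call(model: str, task: str) -> str:
    """Stand-in for a real model call; records which tier was chosen."""
    models_used.append(model)
    return task.upper()

cache = ResultCache()
first = cache.get_or_call("researcher", "rate limiting basics", fake_call)
second = cache.get_or_call("researcher", "rate limiting basics", fake_call)
```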
When you need to run multi-agent systems at enterprise scale with shared infrastructure, model management, and observability, move to Architecture 10 (Production Platform).
Production Checklist
- Max iteration limits per request (prevent infinite agent loops)
- Per-agent and per-request token budgets with cost tracking
- Structured logging of all inter-agent messages for debugging and replay
- Timeout per agent call with graceful degradation
- Human-in-the-loop escalation for low-confidence or high-stakes decisions
- Sandboxed code execution (Docker containers, no network access)
- Agent-level error recovery: retry with backoff, fallback to simpler approach
- Circuit breaker: halt entire workflow if error rate exceeds threshold
- Observability: distributed traces linking all agent calls in a single request
- A/B test agent configurations (prompts, models, tools) independently
- Rate limiting on tool calls to prevent abuse of external APIs
- Replay capability: re-run any request with the same agent state for debugging
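As one example from the checklist, agent-level error recovery with exponential backoff and a fallback could be sketched as below. `retry_with_backoff` is a hypothetical helper; the `sleep` parameter is injectable so the delays can be observed without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    fn: Callable[[], T],
    fallback: Optional[Callable[[], T]] = None,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn with exponential backoff, then degrade to fallback if given."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    if fallback is not None:
        return fallback()  # graceful degradation: simpler approach
    raise last_error


delays: list[float] = []
state = {"calls": 0}

def flaky() -> str:
    """Fails twice, then succeeds; stands in for a transient agent error."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

recovered = retry_with_backoff(flaky, sleep=delays.append)
```

Catching every `Exception` is a simplification; a real system would retry only errors known to be transient, such as rate limits or timeouts.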