Multi-Agent Orchestration Tradeoffs
"Let's use multiple agents" is the wrong starting question. The right question is: what are the coordination costs, and does the task justify them? Multi-agent systems are not automatically smarter than single agents — they are more parallel, more specialized, and dramatically more complex to debug. Every agent boundary is a serialization point, a potential failure mode, and a place where context gets lost in translation.
Supervisor vs Peer-to-Peer
There are two primary topologies. In the supervisor pattern, one agent owns the task, breaks it into sub-tasks, delegates to specialist agents, and aggregates results. In peer-to-peer (mesh), agents communicate directly — one agent's output becomes another's input without a central coordinator.
In practice, the supervisor pattern is almost always the right choice for complex tasks. P2P pipelines are simpler to implement but create hidden coupling: Agent 2 must understand Agent 1's output format, and a schema change in Agent 1 silently breaks Agent 3. The supervisor mediates and can adapt.
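The hidden coupling is easy to demonstrate without any LLM calls. In this hypothetical sketch (the agent functions and key names are invented for illustration), Agent 1 renames an output key during a refactor, and Agent 3, which reads that key directly, degrades without raising an error:

```python
import json

def agent_1(task: str) -> str:
    # Originally emitted {"summary": ...}; a refactor renamed the key to "digest".
    return json.dumps({"digest": f"findings for {task}"})

def agent_3(upstream_output: str) -> str:
    data = json.loads(upstream_output)
    # Silent degradation: the expected key is gone, so Agent 3 works
    # from an empty string instead of failing loudly.
    return f"Report: {data.get('summary', '')}"

print(agent_3(agent_1("market sizing")))  # prints "Report: " (the content vanished)
```

A supervisor that validates each specialist's output against an expected schema turns this silent quality loss into an explicit, retryable error.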
The Latency Cost of Coordination
Every agent call adds latency. A naive multi-agent setup adds one LLM round-trip per agent boundary. If a supervisor calls three specialist agents sequentially, your minimum latency is 4× a single LLM call: three specialist calls one after another, plus one aggregation call. Fan the specialists out in parallel and it drops to 2× (one round of concurrent specialist calls, plus one aggregation call), but you have added orchestration complexity. If the supervisor itself uses an LLM call to plan the decomposition, add one more round-trip to each figure.
```python
import asyncio

from anthropic import AsyncAnthropic

client = AsyncAnthropic()

async def run_specialist(name: str, system_prompt: str, task: str) -> str:
    """Run a single specialist agent and return its output."""
    resp = await client.messages.create(
        model="claude-haiku-4-5",  # cheaper model for specialists
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": task}],
    )
    return f"[{name}]: {resp.content[0].text}"

async def supervisor_dispatch(task: str) -> str:
    specialists = [
        ("Researcher", "You research facts concisely.", f"Research: {task}"),
        ("Analyst", "You analyze data and find patterns.", f"Analyze: {task}"),
        ("Writer", "You write clear summaries.", f"Summarize findings on: {task}"),
    ]
    # Fan out in parallel
    results = await asyncio.gather(*[
        run_specialist(name, sys_prompt, sub_task)
        for name, sys_prompt, sub_task in specialists
    ])
    # Supervisor aggregates
    aggregate_prompt = "Combine these specialist reports into a final answer:\n\n"
    aggregate_prompt += "\n\n".join(results)
    final = await client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        messages=[{"role": "user", "content": aggregate_prompt}],
    )
    return final.content[0].text
```

Use cheaper, faster models for specialists and reserve the expensive model for the supervisor's final synthesis. This is the single biggest cost lever in multi-agent systems.
Context Loss at Agent Boundaries
When the supervisor passes a sub-task to a specialist, it must serialize context. Any context not included in the handoff message is lost. This is where multi-agent systems quietly degrade quality.
The fix is a structured handoff schema — never pass raw natural language between agents when you can pass structured context.
```python
from pydantic import BaseModel

class AgentHandoff(BaseModel):
    task_id: str
    goal: str                      # the top-level goal
    sub_task: str                  # what this agent specifically must do
    constraints: list[str]         # constraints from the supervisor
    prior_results: dict[str, str]  # results from other agents so far
    output_schema: str             # what format the supervisor expects back

def build_handoff(
    task_id: str,
    goal: str,
    sub_task: str,
    constraints: list[str],
    prior_results: dict[str, str],
    output_schema: str,
) -> AgentHandoff:
    return AgentHandoff(
        task_id=task_id,
        goal=goal,
        sub_task=sub_task,
        constraints=constraints,
        prior_results=prior_results,
        output_schema=output_schema,
    )
```

Including prior_results in the handoff lets downstream agents build on earlier work. Including output_schema prevents format mismatches that cause supervisor parsing failures.
Debugging Multi-Agent Systems
Single-agent debugging: read the message trace. Multi-agent debugging: read N message traces across N agents, correlate by task_id, reconstruct causality. This is not incrementally harder — it is qualitatively harder.
The minimum viable observability stack for multi-agent systems:
- Correlation ID: every LLM call carries the root task_id. Log it.
- Agent boundary events: log every handoff with timestamp, sender, receiver, payload hash.
- Intermediate outputs: persist each specialist's output before the supervisor aggregates. Otherwise, when aggregation fails, you have no way to know which specialist produced bad output.
- Step timing: latency attribution. When your multi-agent system is slow, you need to know which agent is the bottleneck.
```python
import hashlib
import logging
import time

logger = logging.getLogger("agent.orchestration")

def log_handoff(task_id: str, from_agent: str, to_agent: str, payload: dict):
    payload_hash = hashlib.sha256(str(payload).encode()).hexdigest()[:8]
    logger.info({
        "event": "agent_handoff",
        "task_id": task_id,
        "from": from_agent,
        "to": to_agent,
        "payload_hash": payload_hash,
        "ts": time.time(),
    })
```

Without this, a failure in a five-agent pipeline is a 20-minute archaeology session.
When Not to Use Multiple Agents
Multi-agent overhead is not worth it when:
- The task fits in a single context window with room to spare.
- Specialists do not need different model configurations or system prompts.
- Parallelism does not reduce wall-clock time (sequential tasks).
- The coordination logic is more complex than the task itself.
A single agent with well-designed tools and a strong system prompt outperforms a poorly coordinated multi-agent system almost every time.
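The four disqualifiers above collapse into a simple gate. This is a hypothetical design-review helper, not library code, and the parameter names are this sketch's own:

```python
def multi_agent_justified(
    fits_one_context: bool,
    needs_distinct_configs: bool,
    parallel_saves_time: bool,
    coordination_simpler_than_task: bool,
) -> bool:
    """Reach for multiple agents only when the task overflows one context,
    specialists genuinely need different configurations, parallelism pays,
    and the coordination logic stays simpler than the task itself."""
    return (
        not fits_one_context
        and needs_distinct_configs
        and parallel_saves_time
        and coordination_simpler_than_task
    )

# A long research task with genuinely different specialist roles:
print(multi_agent_justified(False, True, True, True))  # True
# A task that fits comfortably in one context window:
print(multi_agent_justified(True, True, True, True))   # False
```

Any single False on the last three conditions, or True on the first, should send you back to the single-agent design.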
Key Takeaways
- Multi-agent systems add parallelism and specialization, but every agent boundary adds latency, coordination complexity, and a potential context loss point.
- Use the supervisor pattern over P2P pipelines — the supervisor mediates format mismatches and can adapt when specialists fail.
- Fanout parallel calls to specialist agents and use cheaper models for specialists; reserve the powerful model for final synthesis.
- Structured handoff schemas (not raw natural language) are essential to prevent silent context loss at agent boundaries.
- Minimum viable observability is: correlation ID, boundary event logs, intermediate output persistence, and per-agent timing.
- If the task fits in one context window, a single well-tooled agent is almost always the right choice over a multi-agent system.