Agent Engineering

Multi-Agent Orchestration Tradeoffs

Ravinder · 5 min read
Agents · AI · LLM · Multi-Agent · Orchestration

"Let's use multiple agents" is the wrong starting question. The right question is: what are the coordination costs, and does the task justify them? Multi-agent systems are not automatically smarter than single agents — they are more parallel, more specialized, and dramatically more complex to debug. Every agent boundary is a serialization point, a potential failure mode, and a place where context gets lost in translation.

Supervisor vs Peer-to-Peer

There are two primary topologies. In the supervisor pattern, one agent owns the task, breaks it into sub-tasks, delegates to specialist agents, and aggregates results. In peer-to-peer (mesh), agents communicate directly — one agent's output becomes another's input without a central coordinator.

flowchart TD
    subgraph SP["Supervisor Pattern"]
        S[Supervisor Agent] --> A1[Research Agent]
        S --> A2[Code Agent]
        S --> A3[Review Agent]
        A1 & A2 & A3 --> S
        S --> OUT[Final Output]
    end
    subgraph PP["Peer-to-Peer Pipeline"]
        P1[Scraper Agent] --> P2[Summarizer Agent]
        P2 --> P3[Writer Agent]
        P3 --> POUT[Final Output]
    end

In practice, the supervisor pattern is almost always the right choice for complex tasks. P2P pipelines are simpler to implement but create hidden coupling: Agent 2 must understand Agent 1's output format, and a schema change in Agent 1 silently breaks Agent 3. The supervisor mediates and can adapt.
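
To make the hidden coupling concrete, here is a toy sketch (agent names and fields are hypothetical): the summarizer was written against the scraper's original output format, so an upstream field rename degrades the pipeline without raising any error at the boundary.

# Hypothetical P2P pipeline: the format contract between agents is implicit.
def scraper_agent(url: str) -> dict:
    # v1 emitted {"text": ...}; v2 renamed the field to "body".
    return {"url": url, "body": "...scraped content..."}

def summarizer_agent(doc: dict) -> str:
    # Still written against the v1 contract; .get() hides the break
    # instead of failing loudly, so the summary is silently empty.
    return doc.get("text", "")[:200]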

The Latency Cost of Coordination

Every agent call adds latency. A naive multi-agent setup adds one LLM round-trip per agent boundary. If a supervisor calls three specialist agents sequentially and then aggregates, your minimum latency is 4× a single LLM call (three specialist calls plus one aggregation call). Fan the specialists out in parallel and it drops to roughly 2× (one parallel specialist round-trip plus one aggregation call), but you have added orchestration complexity.

import asyncio
from anthropic import AsyncAnthropic
 
client = AsyncAnthropic()
 
async def run_specialist(name: str, system_prompt: str, task: str) -> str:
    """Run a single specialist agent and return its output."""
    resp = await client.messages.create(
        model="claude-haiku-4-5",   # cheaper model for specialists
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": task}],
    )
    return f"[{name}]: {resp.content[0].text}"
 
async def supervisor_dispatch(task: str) -> str:
    specialists = [
        ("Researcher", "You research facts concisely.", f"Research: {task}"),
        ("Analyst",    "You analyze data and find patterns.", f"Analyze: {task}"),
        ("Writer",     "You write clear summaries.", f"Summarize findings on: {task}"),
    ]
    # Fan out to all specialists in parallel
    results = await asyncio.gather(*[
        run_specialist(name, sys_prompt, sub_task)
        for name, sys_prompt, sub_task in specialists
    ])
    # Supervisor aggregates
    aggregate_prompt = "Combine these specialist reports into a final answer:\n\n"
    aggregate_prompt += "\n\n".join(results)
    final = await client.messages.create(
        model="claude-opus-4-5",
        max_tokens=2048,
        messages=[{"role": "user", "content": aggregate_prompt}],
    )
    return final.content[0].text

Use cheaper/faster models for specialists and reserve the expensive model for the supervisor's final synthesis. This is the single biggest cost lever in multi-agent systems.
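
As a rough illustration of that lever, here is a toy cost comparison. The per-million-token prices below are placeholders chosen for illustration, not real pricing:

# Hypothetical prices per million tokens -- placeholders, not real rates.
CHEAP_PRICE = 1.0       # specialist model, $/M tokens
EXPENSIVE_PRICE = 15.0  # supervisor model, $/M tokens

def run_cost(specialist_tokens: int, supervisor_tokens: int) -> float:
    """Cost of one task with cheap specialists and an expensive supervisor."""
    return (specialist_tokens * CHEAP_PRICE
            + supervisor_tokens * EXPENSIVE_PRICE) / 1_000_000

# Three specialists at ~2k tokens each, plus a 3k-token final synthesis:
split = run_cost(specialist_tokens=6_000, supervisor_tokens=3_000)
all_expensive = 9_000 * EXPENSIVE_PRICE / 1_000_000
print(f"split: ${split:.4f}  all-expensive: ${all_expensive:.4f}")
# split: $0.0510  all-expensive: $0.1350 -- ~2.6x cheaper in this toy example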

Context Loss at Agent Boundaries

When the supervisor passes a sub-task to a specialist, it must serialize context. Any context not included in the handoff message is lost. This is where multi-agent systems quietly degrade quality.

The fix is a structured handoff schema — never pass raw natural language between agents when you can pass structured context.

from pydantic import BaseModel
 
class AgentHandoff(BaseModel):
    task_id: str
    goal: str                       # the top-level goal
    sub_task: str                   # what this agent specifically must do
    constraints: list[str]          # constraints from the supervisor
    prior_results: dict[str, str]   # results from other agents so far
    output_schema: str              # what format the supervisor expects back
 
def build_handoff(
    task_id: str,
    goal: str,
    sub_task: str,
    constraints: list[str],
    prior_results: dict[str, str],
    output_schema: str,
) -> AgentHandoff:
    return AgentHandoff(
        task_id=task_id,
        goal=goal,
        sub_task=sub_task,
        constraints=constraints,
        prior_results=prior_results,
        output_schema=output_schema,
    )

Including prior_results in the handoff lets downstream agents build on earlier work. Including output_schema prevents format mismatches that cause supervisor parsing failures.
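
A minimal usage sketch (the task content, schema string, and IDs are made up for illustration), serializing the handoff into the specialist's prompt with Pydantic v2:

handoff = build_handoff(
    task_id="task-042",  # hypothetical
    goal="Produce a competitive analysis of the widget market",
    sub_task="Write a three-paragraph summary of the findings",
    constraints=["Stay under 300 words", "No speculation"],
    prior_results={"Researcher": "...", "Analyst": "..."},
    output_schema="Markdown with a single ## Summary heading",
)

# Pydantic v2: serialize the whole handoff, so the specialist receives
# structured context rather than free-form prose.
specialist_prompt = (
    "You will receive a structured handoff. Follow it exactly.\n\n"
    + handoff.model_dump_json(indent=2)
)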

Debugging Multi-Agent Systems

Single-agent debugging: read the message trace. Multi-agent debugging: read N message traces across N agents, correlate by task_id, reconstruct causality. This is not incrementally harder — it is qualitatively harder.

The minimum viable observability stack for multi-agent systems:

  1. Correlation ID: every LLM call carries the root task_id. Log it.
  2. Agent boundary events: log every handoff with timestamp, sender, receiver, payload hash.
  3. Intermediate outputs: persist each specialist's output before the supervisor aggregates. Otherwise, when aggregation fails, you have no way to know which specialist produced bad output.
  4. Step timing: latency attribution. When your multi-agent system is slow, you need to know which agent is the bottleneck.
import hashlib
import json
import logging
import time
 
logger = logging.getLogger("agent.orchestration")
 
def log_handoff(task_id: str, from_agent: str, to_agent: str, payload: dict):
    """Log a boundary event with a stable hash of the payload for correlation."""
    payload_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True, default=str).encode()
    ).hexdigest()[:8]
    logger.info(json.dumps({
        "event": "agent_handoff",
        "task_id": task_id,
        "from": from_agent,
        "to": to_agent,
        "payload_hash": payload_hash,
        "ts": time.time(),
    }))

Without this, a failure in a five-agent pipeline is a 20-minute archaeology session.
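
Items 3 and 4 take only a few more lines. A minimal sketch, reusing the logger above and assuming a local runs/ directory for persistence (the layout is an assumption, not a convention):

import json
import os
import time
from contextlib import contextmanager
 
def persist_output(task_id: str, agent: str, output: str) -> None:
    """Item 3: persist each specialist's output before aggregation."""
    path = f"runs/{task_id}"  # hypothetical directory layout
    os.makedirs(path, exist_ok=True)
    with open(f"{path}/{agent}.json", "w") as f:
        json.dump({"agent": agent, "output": output, "ts": time.time()}, f)
 
@contextmanager
def timed_step(task_id: str, agent: str):
    """Item 4: attribute wall-clock latency to an individual agent."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info(json.dumps({
            "event": "agent_step",
            "task_id": task_id,
            "agent": agent,
            "duration_s": round(time.perf_counter() - start, 3),
        }))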

When Not to Use Multiple Agents

Multi-agent overhead is not worth it when:

  • The task fits in a single context window with room to spare.
  • Specialists do not need different model configurations or system prompts.
  • Parallelism does not reduce wall-clock time (sequential tasks).
  • The coordination logic is more complex than the task itself.

A single agent with well-designed tools and a strong system prompt outperforms a poorly coordinated multi-agent system almost every time.

Key Takeaways

  • Multi-agent systems add parallelism and specialization, but every agent boundary adds latency, coordination complexity, and a potential context loss point.
  • Use the supervisor pattern over P2P pipelines — the supervisor mediates format mismatches and can adapt when specialists fail.
  • Fan out calls to specialist agents in parallel and use cheaper models for specialists; reserve the powerful model for final synthesis.
  • Structured handoff schemas (not raw natural language) are essential to prevent silent context loss at agent boundaries.
  • Minimum viable observability is: correlation ID, boundary event logs, intermediate output persistence, and per-agent timing.
  • If the task fits in one context window, a single well-tooled agent is almost always the right choice over a multi-agent system.