EP16 advanced

Subagents vs Agent Team: Which Multi-Agent Pattern Wins?

Pipeline vs editorial room — real benchmarks, token costs, and a decision tree for choosing between Claude Code's two multi-agent architectures.

Episode 5 introduced the two multi-agent patterns in Claude Code. This is the updated version with harder data, real implementation code, and a clearer decision framework.

Single-agent Claude Code is powerful, but it hits three ceilings: context window overflow on complex tasks, blind spots from working with a single perspective, and the speed limit of serial execution. Multi-agent setups address all three. The question is which architecture to use.

Claude Code gives you two fundamentally different approaches. I’ve run both in production for months. Here’s what I’ve learned.

Pattern One: Subagents (The Pipeline)

One main agent dispatches child agents sequentially. Each child has one job. Output flows forward like an assembly line.

researcher → outliner → drafter → polisher → reviewer → formatter

The main agent is the foreman. It calls each subagent using the Agent tool, collects the output, and passes it to the next stage. Each subagent runs in its own context window, so it sees only what the main agent puts into its instructions.

Here’s what the dispatch actually looks like in practice:

Use the Agent tool to run a research subagent.

Instructions for the subagent:
- Topic: [your topic]
- Find 5+ credible sources
- Extract key data points with citations
- Output a structured research brief

Return the brief to me when done.

The main agent then takes that research brief and feeds it into the next subagent:

Use the Agent tool to run an outliner subagent.

Instructions for the subagent:
- Here is the research brief: [paste from previous step]
- Create a 5-section outline with a clear narrative angle
- Each section should reference specific data from the brief

Return the outline to me when done.

Six calls in sequence. Each subagent is disposable — it does its job and disappears. The main agent holds the thread.
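The assembly-line shape is easier to see as code. The sketch below is a Python illustration only, not Claude Code's API: `make_stage` and `run_pipeline` are hypothetical names, and each stage is a stand-in function that simply tags its input where a real subagent would call a model. What it shows is the data flow: the main agent calls the stages in order, and every stage receives only the previous stage's output.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[str], str]

STAGE_NAMES: Tuple[str, ...] = (
    "researcher", "outliner", "drafter", "polisher", "reviewer", "formatter",
)


def make_stage(name: str) -> Stage:
    """Hypothetical stand-in for one subagent; a real one would call a model."""
    def stage(previous_output: str) -> str:
        # The stage works only from what the previous stage handed over.
        return f"{previous_output} -> {name}"
    return stage


def run_pipeline(topic: str, stage_names: Sequence[str] = STAGE_NAMES) -> List[str]:
    """Play the main agent: call each stage in order and keep every handoff."""
    handoffs: List[str] = []
    current = topic
    for name in stage_names:
        current = make_stage(name)(current)
        handoffs.append(current)
    return handoffs


if __name__ == "__main__":
    for step in run_pipeline("topic"):
        print(step)
```

Keeping every handoff in a list mirrors why this pattern is easy to debug: when the final output is wrong, you can inspect the stages one at a time.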

Real benchmark results:

  • Execution time: 14 minutes for a full 1,500-word article
  • Sources researched: 5
  • Post-publication corrections needed: 3 (a tense error, a unit conversion mistake, a missing citation)
  • Token cost: baseline

The corrections are the interesting part. The reviewer subagent caught some issues, but it saw only the draft, not the original research, so it couldn’t verify whether the drafter had represented the sources accurately. That’s the structural weakness of a pipeline: each stage only sees what the previous stage gave it.

Pattern Two: Agent Team (The Editorial Room)

A team lead spawns parallel agents that communicate directly with each other. Fewer agents, but they argue.

researcher ←→ writer ←→ critic

Enable the feature first:

claude config set env.CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS 1

Each team member gets its own isolated context window. They talk via SendMessage — no routing through a coordinator. The researcher can flag a concern directly to the critic. The critic can challenge the writer without waiting for permission.

The implementation looks different from subagents:

Create an Agent Team with 3 members:

1. RESEARCHER
   - Find 8+ sources on [topic]
   - Flag data reliability issues proactively
   - Send research to writer AND critic simultaneously

2. WRITER
   - Draft article from research
   - Accept up to 3 revision rounds from critic
   - Final draft must address all critic flags

3. CRITIC
   - Compare draft against original research for accuracy
   - No rubber-stamping — every approval must cite specific evidence
   - Maximum 3 revision rounds, then force a decision

The critic role is the secret weapon. In the pipeline pattern, the reviewer only sees the draft. In the team pattern, the critic sees both the research and the draft simultaneously. It can catch contradictions that a pipeline reviewer would miss.

Real benchmark results (same topic as the subagent test):

  • Execution time: 28 minutes (twice as long)
  • Sources researched: 8 (vs 5)
  • Post-publication corrections needed: 1 (minor formatting issue)
  • Token cost: 3-4x baseline

The quality difference showed up in specific ways. The researcher flagged reliability issues upfront: a paywalled source that couldn’t be verified, a valuation number that was a negotiation figure rather than a final deal, and inconsistent reporting methodologies between two companies. These flags went directly to the critic, who forced the writer to either find better sources or add caveats.

The pipeline version published those same questionable numbers without qualification. The reviewer didn’t know they were questionable because it never saw the original source context.

Side-by-Side Comparison

Dimension        | Subagents (Pipeline)                       | Agent Team (Editorial)
-----------------|--------------------------------------------|------------------------------------------------------
Execution time   | 14 min                                     | 28 min
Sources cited    | 5                                          | 8
Token cost       | 1x                                         | 3-4x
Corrections      | 3                                          | 1
Error detection  | Post-hoc (reviewer catches after writing)  | Pre-emptive (critic flags before writing)
Debug difficulty | Easy (linear flow, check each stage)       | Hard (async messages, interleaved execution)
Scalability      | High (add/remove stages freely)            | Medium (communication complexity grows with team size)

When the Pipeline Wins

High volume, standardized output. If you’re producing 10 articles a day with a consistent format — product descriptions, SEO pages, email sequences — the pipeline is faster and cheaper. Quality is “good enough” for standardized content because the format itself constrains errors.

Budget matters. At 3-4x the token cost, Agent Team is expensive at scale. Ten pipeline articles cost about as much as three Agent Team articles (10 ÷ 4 = 2.5 at the high end of the multiplier, 10 ÷ 3 ≈ 3.3 at the low end). For content that doesn’t need deep analysis, that math is hard to argue with.
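The budget comparison is plain division, and it is worth redoing with your own numbers. A minimal sketch, assuming the 3-4x multiplier from my benchmark carries over to your content; the function name is made up for illustration.

```python
def team_articles_for_budget(pipeline_articles: int, cost_multiplier: float) -> float:
    """How many Agent Team articles fit in the token budget of N pipeline articles."""
    if cost_multiplier <= 0:
        raise ValueError("cost_multiplier must be positive")
    return pipeline_articles / cost_multiplier


if __name__ == "__main__":
    # 3x and 4x are the two ends of the measured range; 10 is one day's pipeline output.
    for multiplier in (3.0, 4.0):
        count = team_articles_for_budget(10, multiplier)
        print(f"{multiplier:.0f}x cost: budget covers {count:.1f} Agent Team articles")
```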

Predictable workflows. The pipeline executes the same stages in the same order every time. Easy to debug. Easy to optimize. If stage 3 is slow, you fix stage 3. If the Agent Team is slow, good luck figuring out which message chain caused the bottleneck.

Team size limits. Pipelines scale to 8-10 stages without problems. Agent Teams start breaking down above 5 members because the number of possible agent-to-agent channels grows quadratically with team size: 3 pairs at 3 members, 10 at 5, 28 at 8.
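One way to put a number on the communication overhead is to count direct channels. The sketch below assumes every team member can message every other member, which gives n(n-1)/2 pairs for n members; the function name is illustrative.

```python
def pairwise_channels(members: int) -> int:
    """Number of distinct agent-to-agent pairs in a fully connected team."""
    if members < 0:
        raise ValueError("members must be non-negative")
    return members * (members - 1) // 2


if __name__ == "__main__":
    for size in (3, 4, 5, 8):
        print(f"{size} members: {pairwise_channels(size)} channels")
```

A pipeline with the same number of stages has only one handoff per adjacent pair, which is why adding a stage is cheap and adding a team member is not.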

When the Editorial Room Wins

Quality over speed. When factual errors carry real consequences — financial analysis, medical content, legal summaries — the critic role pays for itself. One pre-emptive flag is worth more than three post-hoc corrections.

Complex, ambiguous topics. Straightforward topics (product comparison, how-to guide) don’t need multi-perspective analysis. Ambiguous topics (geopolitical analysis, market predictions, technical architecture decisions) benefit from agents that challenge each other’s assumptions.

Novel angles. The pipeline produces competent but predictable output. The editorial room produces surprises. The friction between researcher, writer, and critic generates angles that none of them would have found alone. In my tests, the Agent Team consistently found sharper narrative hooks.

High-stakes one-offs. A weekly newsletter going to 50K subscribers. A whitepaper for investor relations. A technical RFC that will shape six months of engineering work. These are worth the extra 14 minutes and 3-4x tokens.

The Decision Tree

Ask these questions in order:

  1. Is this standardized content with a repeatable format? Yes → Pipeline.
  2. Am I producing more than 3 pieces per day on this topic? Yes → Pipeline.
  3. Would a factual error cause real damage? Yes → Agent Team.
  4. Do I need a genuinely novel angle? Yes → Agent Team.
  5. None of the above? Start with Pipeline, upgrade if the output feels flat.
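The five questions translate directly into a small function. This is a sketch of the decision order only: the `Brief` fields are hypothetical names for the answers to each question, and the checks run in the same order as the list, so an earlier "yes" wins over a later one.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Answers to the decision-tree questions for one piece of content."""
    standardized_format: bool
    pieces_per_day: int
    error_causes_damage: bool
    needs_novel_angle: bool


def choose_pattern(brief: Brief) -> str:
    """Return "pipeline" or "agent_team", checking the questions in order."""
    if brief.standardized_format:
        return "pipeline"
    if brief.pieces_per_day > 3:
        return "pipeline"
    if brief.error_causes_damage:
        return "agent_team"
    if brief.needs_novel_angle:
        return "agent_team"
    # None of the above: start with the pipeline, upgrade if output feels flat.
    return "pipeline"


if __name__ == "__main__":
    whitepaper = Brief(False, 1, True, False)
    print(choose_pattern(whitepaper))
```

Because the order matters, a standardized, high-volume piece goes to the pipeline even when an error would be costly; if that trade-off is wrong for you, reorder the checks.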

The Hybrid Approach

This is what I actually run. Not everything deserves the same architecture.

Daily content (social posts, short updates, routine documentation) goes through a 4-stage pipeline: research → draft → review → format. Fast, cheap, consistent.

Weekly content (long-form articles, analysis pieces, tutorials like this one) goes through a 3-member Agent Team: researcher, writer, critic. Slower, more expensive, better output.

The routing decision happens before execution. I look at the content brief and ask: “If I published this with a factual error, would anyone notice or care?” If yes, Agent Team. If no, Pipeline.

Implementation Tips

Start with subagents. Get comfortable with multi-agent orchestration using the simpler pattern first. Understand how context flows between stages. Learn where handoff points break down. The pipeline teaches you the fundamentals.

Graduate to Agent Team when you feel the ceiling. You’ll know it’s time when you keep finding post-hoc corrections that a critic would have caught upfront. That’s the signal.

Keep teams small. Three members is the sweet spot for Agent Team. Four is fine. Five is the maximum. Beyond that, communication overhead eats the quality gains.

Give the critic teeth. The most common failure mode in Agent Teams is a critic that rubber-stamps everything. Write explicit rules: “No approval without citing specific evidence. ‘Looks good’ is not an acceptable review.” A toothless critic turns the Agent Team into an expensive pipeline.

Set revision limits. Without a cap, the critic and writer can loop indefinitely — I’ve seen 7 revision rounds on a 500-word piece. Three rounds maximum. After that, the critic must either approve or escalate to the team lead with a specific objection.
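The revision cap is a bounded loop with two exits: approval or escalation. The sketch below is plain Python, not how Agent Teams are implemented; `critique` and `revise` are hypothetical callables standing in for the critic and the writer, and an empty list of objections counts as approval.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReviewResult:
    draft: str
    status: str            # "approved" or "escalated"
    rounds_used: int
    objections: List[str]  # open objections handed to the team lead on escalation


def run_review(
    draft: str,
    critique: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> ReviewResult:
    """Alternate critic and writer until approval or the revision cap is hit."""
    rounds_used = 0
    while True:
        objections = critique(draft)
        if not objections:
            return ReviewResult(draft, "approved", rounds_used, [])
        if rounds_used == max_rounds:
            # Cap reached: stop looping and escalate with the specific objections.
            return ReviewResult(draft, "escalated", rounds_used, objections)
        draft = revise(draft, objections)
        rounds_used += 1


if __name__ == "__main__":
    # Toy critic that is never satisfied, to show the escalation path.
    result = run_review("draft", lambda d: ["too vague"], lambda d, o: d + "!")
    print(result.status, result.rounds_used)
```

Returning the open objections on escalation keeps the rule from the text intact: the critic cannot simply give up, it has to hand the team lead something specific.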

My recommendation hasn’t changed since Episode 5, but it’s gotten more concrete: master the pipeline first, then add the editorial room to your toolkit for the pieces that matter most. Most of your content doesn’t need multi-perspective debate. But the 10% that does? The quality difference is obvious to anyone reading it.