Orchestrating AI agent teams in Claude Code

There's an experimental feature in Claude Code that is moving us towards the next logical step in AI agent workflows: the ability to spin up multiple AI agents that work together as a team. Not just one assistant helping you code, but several, each tackling a different angle of the same problem and actually talking to each other about what they're finding.

I've been playing with it since its release, and I think I finally have a feel for when it's genuinely useful versus when it's just burning tokens for the sake of looking busy.

[Image: Claude Code agent teams visualization showing multiple connected terminal windows]

The setup

The way it works: you have one "lead" session that coordinates everything, and then you spawn teammates that each get their own context window. They can message each other, claim tasks from a shared list, and work in parallel. It's a bit like having a small squad you can point at a problem.
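
As a mental model (and only that: the class and method names below are invented for illustration, not Claude Code's actual API or internals), the coordination comes down to two primitives: a shared task list with claim semantics, and direct peer-to-peer messages.

```typescript
// Conceptual sketch of the coordination pattern described above: a shared
// task list that teammates claim from, plus per-agent inboxes so they can
// message each other directly. None of these names come from Claude Code.

type AgentId = string;

interface Task {
  id: string;
  description: string;
  claimedBy?: AgentId;
  done: boolean;
}

interface Message {
  from: AgentId;
  to: AgentId;
  body: string;
}

class TeamBoard {
  private tasks: Task[] = [];
  private inboxes = new Map<AgentId, Message[]>();

  addTask(id: string, description: string): void {
    this.tasks.push({ id, description, done: false });
  }

  // A teammate claims the first unclaimed task, so two agents
  // never end up working the same item.
  claimNext(agent: AgentId): Task | undefined {
    const task = this.tasks.find((t) => !t.claimedBy && !t.done);
    if (task) task.claimedBy = agent;
    return task;
  }

  complete(id: string): void {
    const task = this.tasks.find((t) => t.id === id);
    if (task) task.done = true;
  }

  // Teammates message each other directly instead of routing
  // everything through the lead.
  send(from: AgentId, to: AgentId, body: string): void {
    const inbox = this.inboxes.get(to) ?? [];
    inbox.push({ from, to, body });
    this.inboxes.set(to, inbox);
  }

  read(agent: AgentId): Message[] {
    const messages = this.inboxes.get(agent) ?? [];
    this.inboxes.set(agent, []);
    return messages;
  }
}
```

The claiming step is what lets teammates self-organize instead of waiting on the lead to dole out work, and the peer messaging is what makes the back-and-forth described later possible.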

If you've used Claude Code's subagents before, the key difference is that subagents just report back to you. They're workers you dispatch and collect results from. Agent teams actually collaborate:

|  | Subagents | Agent teams |
| --- | --- | --- |
| Communication | Report back to caller only | Message each other directly |
| Coordination | You manage everything | Shared task list, self-organizing |
| Best for | Quick, focused tasks | Problems that benefit from debate |
| Token cost | Lower | Significantly higher |

What surprised me is that the teammates don't just report back to the lead. They can actually push back on each other. I didn't expect that to matter, but it turns out it matters a lot.

Where I've actually found it useful

The debugging scenario sold me on it. I was chasing down a performance issue in Rinkflow where the practice builder had gotten noticeably sluggish after a release. Normally I'd have Claude dig into it sequentially: check this, then check that, follow the thread wherever it leads.

Instead, I tried spawning four teammates: one to look at recent state management changes, one to profile render cycles, one to check how we were fetching drill data, and one to just review what changed in the last few commits.

The interesting part wasn't that they found things faster (though they did). It was that when the commit-review teammate flagged a change to how drills were being filtered, the render-profiling teammate basically said "that's not it, the filtering isn't even in the render path." That back-and-forth narrowed things down to a memoization that got accidentally removed during a refactor.

A single agent might have found the filtering change, concluded it was probably the culprit, and moved on. The adversarial dynamic caught what I might have missed.

I've also used it for code reviews when I want different lenses: one teammate focused on "is this going to cause problems at scale" and another on "does this match our patterns elsewhere in the codebase." Having them disagree surfaced things neither would have caught alone. We've come a long way from relying on style guides and linters to enforce codebase consistency!

When it's not worth it

Honestly, most of the time. If I'm doing something sequential, or if the task involves editing the same files, or if there's a clear path from A to B, a single session is faster and cheaper. Agent teams burn through tokens quickly because each teammate has their own context window.

As with many of the new AI workflows, I've found that wrapping my head around what's possible is the biggest bottleneck. In this case, you have to treat the teammates like actual people working in parallel and give each one its own territory.
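
To make that concrete, here's roughly how I sketch out territories before spawning anyone. The teammate names and paths below are invented, and Claude Code doesn't read a config like this; it's just a way to force myself to keep the slices disjoint so no two agents edit the same files.

```typescript
// Hypothetical territory map: names and paths are made up for illustration.
// Each teammate owns a distinct slice of the repo; one is read-only.
const territories: Record<string, string[]> = {
  "state-reviewer": ["src/store/"],
  "render-profiler": ["src/components/practice-builder/"],
  "data-fetching": ["src/api/drills/"],
  "commit-historian": [], // read-only: reviews git history, edits nothing
};

// Sanity check that no path is assigned to more than one teammate.
const overlaps = Object.entries(territories).flatMap(([agent, paths]) =>
  paths.filter((path) =>
    Object.entries(territories).some(
      ([other, otherPaths]) => other !== agent && otherPaths.includes(path)
    )
  )
);
if (overlaps.length > 0) {
  throw new Error(`Overlapping territories: ${overlaps.join(", ")}`);
}
```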

The other thing that tripped me up early: teammates don't inherit your conversation history with the lead. So if you've been going back and forth about the problem for twenty minutes and then spawn a teammate, they're starting fresh. You have to be explicit about context in a way that feels redundant when you're used to a single continuous conversation.
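
What helps is writing the briefing down before spawning. The structure below is just my own convention rendered as a type (hypothetical names throughout, nothing Claude Code requires), but it captures the things a fresh teammate never gets for free: the goal, the context already built up with the lead, the constraints, and the handoff.

```typescript
// Purely illustrative briefing template for a freshly spawned teammate,
// which starts with none of the lead conversation's history.
interface TeammateBriefing {
  goal: string;        // what this teammate is responsible for
  context: string;     // everything already established with the lead
  constraints: string; // territory, things not to touch
  handoff: string;     // who to message, and what "done" looks like
}

const renderProfilerBriefing: TeammateBriefing = {
  goal: "Profile render cycles in the practice builder and find what regressed.",
  context:
    "The practice builder got noticeably sluggish after the last release. " +
    "Recent changes touch state management, drill data fetching, and filtering.",
  constraints: "Read-only investigation; don't edit files owned by other teammates.",
  handoff:
    "Message the lead with findings, and push back directly on other " +
    "teammates' theories if they don't hold up.",
};
```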

The bigger picture

I'm not sure this is how most people will interact with AI assistants long-term, but it's got me thinking about what changes when you can cheaply coordinate multiple agents. The "have them argue with each other" pattern is genuinely useful for avoiding the anchoring problem where the first plausible explanation wins.

It's still rough around the edges. Sessions don't resume cleanly, teammates sometimes forget to mark tasks done, shutdown is weirdly slow. But the core idea of treating AI assistance as a team rather than a single tool is interesting enough that I keep coming back to it.

For now, my rule of thumb: if I catch myself wanting a second opinion on what the AI just told me, that's probably a good candidate for an agent team. Let them argue it out.
