Too many agents spoil the broth
GPT-5 and Claude succeed about 25% of the time when working together on a coding task. A single agent doing both jobs? Roughly 50%.
Stanford University and SAP built a benchmark called CooperBench to test something basic: can two AI coding agents collaborate on the same codebase?
They created 652 tasks across 12 real open-source repositories. Each task gave two agents different features to implement—features that were logically compatible but required coordinating on shared code. The kind of thing any software team does every day.
Across all models, cooperation dropped success rates by 30% on average.
Researchers called it "the curse of coordination."
Results read like a dysfunctional team’s retro:
42% were expectation failures (agents couldn't model what their partner was actually doing).
32% were commitment failures (agents broke promises or made unverifiable claims).
26% were communication failures (questions went unanswered, decisions never got made).
Agents spent up to 20% of their total budget just on communication. All that talking reduced merge conflicts but didn't improve success rates. The channel was jammed with repetition, vagueness, and hallucinated status updates.
In 1975, Fred Brooks wrote that "adding manpower to a late software project makes it later." Fifty years later, that law applies to AI agents too.
The bright spot? In rare successful runs, agents spontaneously developed coordination patterns—role division, resource division, negotiation. These behaviors weren't prompted or scaffolded. They emerged. They're just not reliable yet.
The multi-agent coordination cost is real. Two agents with overlapping scope and no explicit protocol will underperform one agent doing everything. Every time.
This is what I help teams figure out: Where do agents need clear boundaries? Where do they need orchestration? Where will a single agent with the right tools beat a team of five?
Working through this? I'd love to hear how you're approaching it.
—
Source: cooperbench.com