Steven Gonsalvez

Software Engineer


Multi-Agent Error Cascades: The Double Pendulum Problem Nobody Talks About

Your agents are a chaos machine 🎯

So HumanLayer dropped their Advanced Context Engineering piece last week, and buried in it is this absolute banger of a line:

"Bad research -> bad plan -> bad code. A single wrong line in research cascades to widespread errors."

That's Dex Horthy describing what happens when you chain AI agents together without checkpoints. And it's the exact same mechanics as a double pendulum.

You know the double pendulum, yeah? Simple physics demo. One pendulum hanging from another. The top one swings predictably. The bottom one goes absolutely mental. Tiny changes in the initial swing of the top pendulum produce wildly different trajectories in the bottom one. Chaos theory in action. Looks like it's possessed.

Multi-agent AI systems are double pendulums.

Your research agent makes a small mistake. Misidentifies which module handles authentication. Gets one import path wrong. Reads an outdated API signature. Tiny error. Barely noticeable in the research output.

Your planning agent reads that research and builds a plan around the wrong assumption. The plan isn't obviously wrong. It's coherent. It just starts from a slightly incorrect foundation. The deviation from reality is bigger now but still plausible-looking.

Your implementation agent reads that plan and writes hundreds of lines of code. Code that's internally consistent but built on a foundation of sand. The cascade is complete. One wrong line in research became a hundred wrong lines in code.
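The cascade above can be sketched as a toy Monte Carlo simulation. The stage names and the 5% per-stage error rate are illustrative assumptions, not measured figures:

```python
import random

random.seed(7)

STAGE_ERROR_RATE = 0.05  # assumed chance each stage introduces a flaw
STAGES = ["research", "plan", "implement"]

def run_pipeline():
    """Return the first stage that goes wrong, or None for a clean run.
    Once a stage errs, every downstream stage inherits the flaw."""
    for stage in STAGES:
        if random.random() < STAGE_ERROR_RATE:
            return stage  # everything after this point is built on sand
    return None

runs = 10_000
failures = sum(run_pipeline() is not None for _ in range(runs))
# Analytically, 1 - 0.95**3 ≈ 14.3% of runs end in flawed code.
print(f"{failures / runs:.1%} of runs produce flawed code")
```

Even though each stage individually fails only 5% of the time, roughly one run in seven ends up shipping code built on a bad foundation.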


Why more agents make this worse, not better

Here's the bit that does my head in. The whole pitch of multi-agent systems is "more agents = better results." Specialise each agent. Divide the labour. Sounds reasonable.

But every agent you add to the chain is another joint in the pendulum. Every handoff is another point where small errors amplify. A three-agent pipeline (research, plan, implement) has two handoff points. A five-agent pipeline has four. Each one is a potential chaos amplifier.

Sean Moran documented this properly in January 2026, showing that unstructured multi-agent architectures amplify errors 17x compared to single-agent baselines. 17x! You're not getting better results by adding agents. You're getting 17x worse results if the coordination is sloppy.

This is why HumanLayer's RPI methodology forces human review between phases. It's not because humans are better at writing code (the agents have us beat there). It's because humans are circuit breakers. They catch the small research error before it cascades into hundreds of bad code lines. The human doesn't need to be a better researcher than the agent. They just need to spot when the research is off.

📚 Geek Corner
Why the double pendulum analogy is more than metaphor. In dynamical systems theory, sensitivity to initial conditions means that prediction accuracy degrades exponentially with each degree of freedom added. A single pendulum is predictable forever. A double pendulum is predictable for a short time, then diverges. A triple pendulum is basically random. Multi-agent AI chains follow the same mathematical pattern. Each agent's output has some error distribution. When that output feeds into the next agent, the errors don't add linearly. They compound. Agent 1's 5% error rate doesn't simply add to agent 2's to make 10%. The 95% of paths agent 1 gets right still face agent 2's own error rate, so correctness multiplies (0.95 × 0.95 ≈ 0.90 after two agents), while the errors branch into a growing tree of possible failures. The paper "Agents of Chaos" (February 2026, arXiv) formalises this: competing prediction agents can phase-transition from stability into mathematical chaos. This isn't a soft analogy. It's the same maths.

The fix is boring (and that's the point)

Human checkpoints between agent phases. That's it. That's the fix.

HumanLayer calls it RPI (Research, Plan, Implement) and the whole point is that a human reviews 400 lines of research and plan before the agent writes 4,000 lines of code. You're spending 10 minutes of human review to prevent 10 hours of debugging broken output.

Not every agent chain needs human checkpoints. A two-agent system doing research and summarisation? The pendulum doesn't have enough joints to go chaotic. But the moment you've got three or more agents in a chain, with each one's output feeding the next one's input, you're playing double pendulum roulette. Add a human circuit breaker at the highest-leverage handoff point (usually between research and implementation) and the chaos stays contained.
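A minimal sketch of that circuit breaker, assuming a hypothetical pipeline where each phase is a function and the human gate is a review callback (the phase names and stub logic are made up, not HumanLayer's actual API):

```python
from typing import Callable

Phase = Callable[[str], str]

def run_with_checkpoint(phases: list[tuple[str, Phase]],
                        review: Callable[[str, str], bool],
                        task: str) -> str:
    """Run phases in order; after each one, ask the reviewer to approve
    the output before it feeds the next phase's input."""
    artefact = task
    for name, phase in phases:
        artefact = phase(artefact)
        if not review(name, artefact):
            raise RuntimeError(f"checkpoint failed after {name}: fix before continuing")
    return artefact

# Toy usage: stub phases, and a reviewer that rejects anything flagged
# "wrong" (standing in for a human eyeballing the research output).
phases = [
    ("research",  lambda t: t + " | research notes"),
    ("plan",      lambda t: t + " | step-by-step plan"),
    ("implement", lambda t: t + " | code"),
]
result = run_with_checkpoint(phases, lambda name, a: "wrong" not in a, "add login")
print(result)
```

The design choice that matters is where the `review` gate sits: a cheap check between research and implementation stops the cascade at its narrowest point, before 400 lines of plan become 4,000 lines of code.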

Feels like: Stacking Jenga blocks on top of a washing machine. Each individual block is stable. The tower is not. The fix isn't better blocks. It's checking the tower every few layers.

Bottom line: More agents is not better agents. Every handoff is a potential chaos amplification point. If you're running multi-agent pipelines without human checkpoints between phases, you're building a double pendulum and hoping it doesn't go chaotic. Spoiler: it will. HumanLayer's RPI is one answer. Any answer that puts a human circuit breaker at the right handoff point works. The expensive mistake is skipping the checkpoint, not adding it.

