Steven Gonsalvez

Software Engineer


Tyranny of Small Decisions: AI Agents and Codebase Drift

ai · agents · architecture · engineering-culture · economics

In 1966, an economist called Alfred Kahn published a paper with one of those titles that sound boring until the idea rewires how you think about everything. "The Tyranny of Small Decisions." The argument was simple and brutal: a market economy allocates resources by summing up millions of small individual transactions. Each transaction is rational. But the aggregate result can be something nobody wanted and nobody voted for.

His example was American railroads. Nobody decided to kill passenger rail in the US. No politician stood up and said "let's get rid of trains." What happened was millions of individual passengers, on millions of individual trips, chose to drive instead. Each choice made perfect sense. The car was faster for that trip, or more convenient, or cheaper once you already owned one. But add them all up and you get a country where the railroad system collapsed, cities were rebuilt around cars, and fifty years later everyone's stuck in traffic wondering how it got this way.

Nobody voted to kill the railroad. It died one small decision at a time.

I've been thinking about this paper a lot recently, because I reckon we're running the same experiment with AI agents and codebases. And we're about halfway through the bit where nobody's noticed yet.

Every agent action is a small decision

Every time an agent suggests a code change and you accept it, that's a small decision. Every auto-merged PR, every "looks good, ship it" on an agent-generated diff, every refactor suggestion you approve with a quick skim. Each one is individually reasonable. The code compiles, the tests pass, the change makes local sense.

But thousands of these per week across a codebase? That's architectural drift. The kind of drift that doesn't show up in any single PR but becomes obvious when you zoom out after six months and think "hang on, when did we end up with four different patterns for error handling?"

Nobody voted for "let the AI reshape our architecture." It just happened. One small decision at a time. Kahn's railroad, running on GPU cycles instead of diesel.

The thing about architectural drift is it's invisible at the transaction level. Each three-line change is fine. It's only when you step back and look at the trajectory that you realise the codebase has been slowly walking somewhere you never intended to go.

Too small to evaluate

Kahn makes a point that I think is underappreciated: small transactions don't justify the effort of securing good market information. If you're buying a packet of crisps, you don't commission a market analysis. The stakes are too low for the overhead.

Same thing with agent outputs. Each individual suggestion is too small to deeply review. It's three lines. Maybe ten. You skim it, it looks right, you accept it and move on. The cost of properly evaluating each change (understanding the full context, checking for subtle architectural implications, tracing the ripple effects) massively outweighs the perceived value of the change.

So you don't. Nobody does. You do what Kahn's consumers do: make a quick, locally rational decision and crack on with your day.

Multiply that by hundreds per day and you've got a codebase full of changes that were each individually "fine" but collectively unreviewed in any meaningful sense. The aggregate is something nobody would have approved if they'd seen it as a single proposal. But they never got that chance, because it arrived as five hundred small proposals instead.

📚 Geek Corner
Information asymmetry and transaction costs: Kahn's argument leans on a concept economists call "rational ignorance." When the cost of becoming informed exceeds the expected benefit of being informed, rational actors stay ignorant. Ronald Coase made a similar point about transaction costs in 1960: the overhead of evaluating a transaction can prevent optimal outcomes. Agent-generated code changes sit squarely in this zone. The evaluation cost per change is high (understanding context, tracing implications) but the perceived risk per change is low (it's just three lines). So the rational response is to under-evaluate. At scale, rational under-evaluation produces irrational aggregate outcomes. This is not a tooling problem. It's an economic structure problem.
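
To see that structure in numbers, here's a toy model in Python. Every figure in it is an invented assumption, chosen only to show the shape of the incentive, not to describe any real team.

```python
# Toy model of rational ignorance in agent-assisted review.
# Every number below is an illustrative assumption, not measured data.

REVIEW_COST_MIN = 20       # minutes to properly evaluate one small change
DEFECT_PROB = 0.02         # perceived chance a small change hides a problem
DEFECT_COST_MIN = 60       # minutes to fix that problem if it ships
CHANGES = 500              # agent-generated changes merged over a few weeks

# Per change: a deep review costs far more than it is expected to save,
# so skipping it is the locally rational move.
expected_saving = DEFECT_PROB * DEFECT_COST_MIN
print(f"per change: review costs {REVIEW_COST_MIN} min, "
      f"expected saving {expected_saving:.1f} min")    # 20 vs 1.2

# In aggregate, the drift is one large remediation, not 500 small defects.
# Say it costs a week of engineer time to untangle.
DRIFT_COST_MIN = 40 * 60
total_review = CHANGES * REVIEW_COST_MIN
print(f"aggregate: full review {total_review} min, drift {DRIFT_COST_MIN} min")
# Deep-reviewing everything (10,000 min) costs more than the drift it
# prevents (2,400 min). The cheap fix is not more per-change rigour;
# it is a periodic large decision that looks at the trajectory directly.
```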

The one that properly messes with your head

Kahn's wildest point, and the one I keep coming back to, is that the cumulation of small choices can change the preference function itself. Not just the outcomes. The preferences.

Applied to his railroad example: as people chose cars over trains on individual trips, car infrastructure expanded, suburbs grew, train service deteriorated. Which made cars even more attractive on the next trip. The small decisions reshaped the environment, which reshaped the preferences, which produced more of the same small decisions. A feedback loop that changed what people wanted, not just what they chose.

Now think about what happens after six months of coding with an agent. Do you write code the way you would have written it before? Or do you write code the way the model writes it?

I notice this in myself. My variable naming has shifted. My function decomposition patterns have shifted. The way I structure modules has shifted. Not because I sat down and decided "I'm going to adopt the model's style." It happened incrementally, through thousands of small interactions where I accepted suggestions that were slightly different from what I'd have written, and over time those slight differences accumulated into a new baseline.

The agent doesn't just write code. It rewrites the developer. Slowly, through exposure, through the path of least resistance of accepting suggestions rather than rewriting them in your own style. Your taste drifts toward whatever the model's training data converged on. And you don't notice because it happens one small decision at a time.

That's the bit that worries me most. Once your preferences have shifted, you can't even see the drift. You've lost the reference point.

Feels like: Boiling a frog. Nobody turns the temperature up all at once. It creeps, degree by degree, and by the time you notice, you're writing code in a style you didn't choose and can't remember choosing.

The large decision remedy

Kahn's prescription for the tyranny of small decisions was to sometimes substitute one large, deliberate decision for the accumulated piecemeal ones. Instead of letting millions of individual trip choices determine the fate of the railroad, society should explicitly decide: do we want passenger rail or not? Make it a conscious, large-scale choice rather than an emergent accident.

For agent-assisted development, I think this means periodic architectural reviews where humans evaluate the cumulative trajectory. Not reviewing each PR (that's the small-decision trap). Stepping back every few weeks and asking: "Where has agent-assisted work taken this codebase? Is that where we want to be?"
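
Here's a minimal sketch of what such a review might automate: compare two git snapshots and count how many distinct error-handling styles are actually in use. The patterns, the revisions, and the Python-only filter are all placeholder assumptions; the point is that drift shows up when you compare snapshots, not diffs.

```python
# Sketch of a trajectory audit: count distinct error-handling styles at two
# git snapshots. Patterns, revisions, and the .py filter are placeholders.
import re
import subprocess
from collections import Counter

PATTERNS = {                        # hypothetical house styles to look for
    "raise_custom":  re.compile(r"raise \w+Error\("),
    "return_result": re.compile(r"return (Err|Ok)\("),
    "log_and_none":  re.compile(r"logger\.error\([^)]*\)\s*\n\s*return None"),
    "bare_except":   re.compile(r"except\s*:"),
}
COMMITS = ["HEAD~500", "HEAD"]      # placeholder revisions a few weeks apart

def styles_at(commit: str) -> Counter:
    """Count matches for each error-handling style in one snapshot."""
    files = subprocess.run(
        ["git", "ls-tree", "-r", "--name-only", commit],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    counts = Counter()
    for path in files:
        if not path.endswith(".py"):
            continue
        blob = subprocess.run(
            ["git", "show", f"{commit}:{path}"],
            capture_output=True, text=True, errors="replace",
        ).stdout
        for name, rx in PATTERNS.items():
            counts[name] += len(rx.findall(blob))
    return counts

for commit in COMMITS:
    counts = styles_at(commit)
    in_use = sum(1 for n in counts.values() if n > 0)
    print(f"{commit}: {in_use} styles in use -> {dict(counts)}")
```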

This is, in a roundabout way, what the "extend the green" loop is about (I wrote about this in the autonomous factory post). That loop is a structured process for evaluating aggregate outcomes, not individual transactions: you run a batch, measure the results, study the failures, and make a deliberate decision about what to fix. It's Kahn's "large decision" applied to agent quality.

The temptation is to keep optimising at the individual transaction level: better prompts, better context, better code review per PR. Those help. But they don't address the tyranny of small decisions because they're still operating at the small-decision scale. You need something that operates at the large-decision scale. Something that looks at the whole trajectory and says "yes, this is where we want to go" or "no, we've drifted, let's correct."

Agent-to-agent tyranny

And then there's the bit that keeps me up at night. Because so far I've been talking about human-agent interactions, where at least a human is involved in each small decision (even if they're rubber-stamping it). But in a swarm, agents are making small decisions about each other's work.

Agent A writes code. Agent B reviews it. Agent C integrates it. Each interaction is a small decision. Agent B's review is a small transaction, too limited in scope to justify full architectural evaluation. Agent C's integration is another small transaction. The swarm's collective behaviour emerges from the accumulation of these micro-decisions, and nobody, human or agent, evaluates the macro pattern.

This is Kahn's tyranny running at machine speed without even the thin check of a human glancing at the diff. The aggregate outcome emerges from thousands of small agent-to-agent decisions per day, and it might be something no individual agent intended or any human would approve.

The 2-2 factor is one response to this. Composing independent checks combats unreviewed small decisions by ensuring no single agent's small decision goes unchallenged. But even consensus protocols operate at the transaction level. They make each small decision more reliable without addressing the aggregate trajectory.
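
A toy simulation makes that gap visible. Every parameter below is invented: adding independent reviewer-agents collapses the rate of shipped defects, but the stylistic random walk, all those "correct but slightly different" nudges no reviewer vetoes, grows regardless.

```python
import random

# Toy simulation with invented parameters: independent checks make each
# small decision more reliable, but the stylistic random walk continues.
random.seed(0)

N = 5000            # agent-generated changes
DEFECT_P = 0.05     # chance a change is functionally wrong
CATCH_P = 0.7       # chance one independent reviewer-agent spots a defect
REVIEWERS = 3       # independent checks per change

shipped = 0
drift = 0.0         # signed distance from the house style, arbitrary units

for _ in range(N):
    defective = random.random() < DEFECT_P
    caught = any(random.random() < CATCH_P for _ in range(REVIEWERS))
    if defective and not caught:
        shipped += 1
    # Reviewers veto "wrong", not "correct but slightly different",
    # so every accepted change still nudges the style a little.
    drift += random.gauss(0, 1)

print(f"shipped defect rate: {shipped / N:.4f} (raw rate: {DEFECT_P})")
print(f"style drift after {N} changes: {abs(drift):.0f} units")
```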

What's missing is the large-decision equivalent for swarms. A mechanism that periodically evaluates the emergent direction and makes a conscious, large-scale choice about whether to continue on that path. Not just "was this PR correct?" but "what has this swarm been building, taken together, and is that what we wanted?"

I don't have a clean answer for what that mechanism looks like. I'm not sure anyone does yet. But I'm fairly confident that without it, swarms will reproduce Kahn's tyranny at a scale and speed that makes the railroad look like a gentle fade.

Bottom line

Kahn wrote his paper about market economies and passenger trains, but the structure of his argument is universal. Any system that allocates resources through the accumulation of small, individually rational decisions is vulnerable to collectively irrational outcomes. Agent-assisted development is that system, running at a pace Kahn couldn't have imagined.

The fix isn't to make each small decision better (though that helps). The fix is to periodically step outside the stream and make a large one. Look at the trajectory, not the transaction.

Nobody is going to vote to let agents reshape your codebase. But if you don't make that decision explicitly, it'll happen anyway. One small decision at a time. Kahn told us this sixty years ago. We just weren't paying attention because we were too busy accepting the next suggestion.

If the entropy angle interests you, I wrote about why everything you build eventually decays and how the second law of thermodynamics applies to codebases. And war heroes vs the meticulous engineer covers the organisational pattern of rewarding firefighters while ignoring the people who prevent fires in the first place.

