Steven Gonsalvez

Software Engineer


War Heroes vs The Meticulous Engineer

leadership, engineering-culture, organisations, management

"Nobody ever gets credited for fixing problems that never happened."

That sentence has been rattling around in my head for years. I've seen it play out at every company I've worked at, every team I've been part of, and every org chart I've squinted at trying to understand why that person got promoted.

The War Hero Problem

Large organisations are addicted to firefighters. The people who swoop in at 2am, hair on fire, laptop balanced on one knee in a taxi, and somehow get the deployment untangled before the client notices. They're war heroes. They get the standing ovation in the all-hands. They get the spot bonus, the promotion, the war story that gets retold at every team dinner for the next three years.

And they deserve recognition. Properly. Fixing a production outage under pressure is hard, stressful, skilled work.

But here's the thing nobody says out loud: most of those fires were preventable. The deployment that went sideways at 2am went sideways because the release process has been held together with duct tape and good intentions since 2019. The database that fell over was running on a single instance because nobody approved the HA ticket six months ago. The API that buckled under load had a caching layer on the roadmap that kept getting bumped for "higher priority" feature work.

The war hero didn't prevent the fire. The war hero just happened to be available when the fire broke out. And because they fixed it visibly, dramatically, and under pressure, they got the credit.

The Meticulous Engineer

Meanwhile, somewhere in the org, there's someone who spent three weeks building proper observability. Someone who wrote the runbook that nobody reads until they need it. Someone who set up the canary deployment pipeline that catches bad releases before they hit production. Someone who insisted on the HA database config and fought through four rounds of budget approval to get it.

That person prevented outages. Plural. Outages that never happened because the system was designed to not have them.

And nobody noticed.

Because you can't celebrate a non-event. You can't give a spot bonus for "that time nothing went wrong." There's no all-hands presentation titled "Everything Worked as Expected Last Quarter." The absence of drama is invisible. The presence of drama is a story.

Time Preference and Why We're Wired Wrong

So why does this happen? Why do smart organisations consistently reward the reactive over the preventive?

It comes down to time preference. Economists have been writing about this since Böhm-Bawerk in the 1880s. Humans value immediate rewards more than future ones. A hundred quid today feels worth more than a hundred and twenty quid next year, even though the maths says otherwise.
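To put a rough number on that bias, here is a minimal sketch assuming plain annual compounding. The function names `present_value` and `implied_rate` are illustrative, not from any library.

```python
def present_value(future_amount: float, annual_rate: float, years: float = 1.0) -> float:
    """Discount a future amount back to today's money at a fixed annual rate."""
    return future_amount / (1.0 + annual_rate) ** years


def implied_rate(now_amount: float, future_amount: float, years: float = 1.0) -> float:
    """Annual rate at which `now_amount` today is worth exactly `future_amount` later."""
    return (future_amount / now_amount) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # At an assumed 5% annual rate, 120 next year is worth about 114.29 today,
    # which is still more than 100 in the hand.
    print(round(present_value(120, 0.05), 2))

    # The break-even rate for "100 now vs 120 in a year" is roughly 0.20.
    print(round(implied_rate(100, 120), 2))
```

Taking the hundred today only comes out ahead if you can reliably earn more than about 20% a year on it. Most of us can't, and we take the hundred anyway.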

Organisations are just collections of humans, and they inherit the same bias. A firefighter delivers value now. You can see the fire, see the fix, see the relief on the client's face. It's immediate, tangible, and emotionally satisfying. The meticulous engineer delivers value later, maybe, probably, in the form of things not going wrong. That's abstract, deferred, and boring.

The result: promotion committees act on that same time preference. They reward the person who delivered visible impact this quarter over the person who prevented invisible disasters next quarter. Not because they're stupid, but because the incentive structure is wired to the immediate.

The Paradox of Effort

Making something look effortless takes enormous effort. And that's exactly the trap.

When prevention works, everything looks easy. The system "just works." Deployments go smoothly. Nothing breaks on Friday afternoon. And because nothing broke, nobody thinks about why nothing broke. They assume it's the default state. It's not. Stability is an output of deliberate, sustained, unglamorous work.

The war hero's effort is visible by definition. They're running around, pinging Slack channels, jumping on calls, pushing hotfixes. You can see them working. The meticulous engineer's effort is invisible by definition. Their best work produces silence. And silence doesn't show up on a performance review.

Denying instant gratification in favour of long-term goals is supposed to be virtuous. We teach kids this. We admire it in theory. But in practice, in the quarterly review cycle, in the annual promotion round, the person who ran toward the fire beats the person who installed the smoke detectors every single time.

The Cultural Rot

Here's where it gets properly dangerous.

When war heroes get promoted, they become managers. When war hero managers hire, they hire people who remind them of themselves. People who thrive under pressure. People who are good at last-minute heroics. People who are, culturally, firefighters.

Over time, the senior leadership of the organisation is composed almost entirely of people whose core skill is reacting to crises. Not preventing them. Not building systems that don't need heroics. Reacting.

And because that leadership shapes culture, the whole organisation starts to orient around crisis response. Process improvement gets lip service. Reliability engineering gets underfunded. Runbooks don't get maintained. Monitoring dashboards exist but nobody watches them until something's already on fire.

The organisation becomes structurally dependent on firefighting because it promoted all the firefighters and starved the prevention work. It's a self-reinforcing cycle. More fires, more heroes, more promotions for heroes, less investment in prevention, more fires.

What Actually Fixes This

I've been thinking about this for a while and I don't have a clean answer. But I have some observations.

Measure stability, not just delivery. If your metrics only track features shipped and bugs fixed, you're incentivising firefighting. Add metrics for: time since last incident, mean time to recovery, deployment success rate, percentage of changes that require rollback. Make stability a first-class outcome, not a side effect.
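As a rough illustration of what tracking those four numbers could look like, here is a minimal sketch. The `Deployment` and `Incident` records and the `stability_report` function are hypothetical stand-ins for whatever your deploy tooling and incident tracker actually export.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    succeeded: bool    # assumed flag: the release went out and stayed healthy
    rolled_back: bool  # assumed flag: the change had to be reverted


@dataclass
class Incident:
    started: datetime
    resolved: datetime


@dataclass
class StabilityReport:
    deployment_success_rate: Optional[float]
    rollback_rate: Optional[float]
    mean_time_to_recovery: Optional[timedelta]
    time_since_last_incident: Optional[timedelta]


def stability_report(
    deployments: List[Deployment],
    incidents: List[Incident],
    now: datetime,
) -> StabilityReport:
    """Summarise the four stability metrics for one reporting period."""
    total = len(deployments)
    # With no deployments there is no meaningful rate, so report None, not 0.
    success_rate = sum(d.succeeded for d in deployments) / total if total else None
    rollback_rate = sum(d.rolled_back for d in deployments) / total if total else None

    if incidents:
        downtime = sum((i.resolved - i.started for i in incidents), timedelta())
        mttr = downtime / len(incidents)
        # Measured from the moment the most recent incident was resolved.
        since_last = now - max(i.resolved for i in incidents)
    else:
        mttr = None
        since_last = None

    return StabilityReport(success_rate, rollback_rate, mttr, since_last)


if __name__ == "__main__":
    report = stability_report(
        deployments=[Deployment(True, False), Deployment(False, True)],
        incidents=[Incident(datetime(2024, 2, 1, 2, 0), datetime(2024, 2, 1, 2, 45))],
        now=datetime(2024, 3, 1, 9, 0),
    )
    print(report)
```

The empty cases return `None` rather than zero on purpose. A quarter with no incidents is exactly the silence worth reporting, and it shouldn't be rendered as a recovery time of nought.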

Make prevention visible. This is the hard one. When someone prevents an outage, you need a mechanism to surface that. Post-mortems for incidents are standard. How about pre-mortems for near-misses? "Here's what would have happened if we hadn't built the circuit breaker last sprint." Make the counterfactual concrete.

Promote the boring people. Not literally boring. But promote the person who built the system that doesn't break, not just the person who fixed it when it did. This requires promotion committees to value prevention, which requires cultural change, which requires leadership buy-in, which requires... promoted boring people. It's circular, but you have to start somewhere.

Watch who you celebrate. The stories you tell in all-hands meetings shape what people aspire to. If every hero story is a firefighting story, people will aspire to firefight. Tell the story about the engineer who spent six months improving deployment reliability until it was boring. Tell the story about the team that went an entire quarter without a P1. Celebrate the silence.

The Real Question

Next time someone gets a standing ovation for saving a production deployment at 2am, ask yourself: why was the deployment at risk in the first place? Who raised the flag six months ago? Who wrote the ticket that got deprioritised? Who tried to fix the process before it broke?

That person probably didn't get promoted. They might not even work there anymore. But they were right. And the organisation is paying the cost of not listening to them, over and over, every time the war hero has to suit up again.

The war hero is a symptom. The meticulous engineer is the cure. And until organisations learn to tell the difference, they'll keep promoting the symptom.

If this resonated, you might also enjoy my piece on why no single human should be trusted to act alone, which digs into the maths of independent review. Or if you're thinking about how entropy slowly dismantles everything you build, there's a post on that too.
