Steven Gonsalvez

Software Engineer


The Death of MCP: Context Rot, Token Waste, and Why Class Files Win

ai · mcp · agents · architecture · developer-tools

I wrote a whole series on MCP earlier this year. Covered the architecture, the ecosystem, the security risks, the lot. I was properly optimistic about it. "USB-C for AI integrations," people kept saying, and I bought in.

I was wrong.

Not completely wrong. The idea is sound. A standard protocol for connecting AI agents to tools and data sources. Lovely. The problem is that the implementation is burning the most precious resource your agent has (its context window) on thousands of tokens of JSON schema definitions that your agent will never touch. And in a world where context is the thing that determines whether your AI assistant actually understands your codebase or just hallucinates at it, that's not a protocol cost. That's sabotage.

Let me show you the maths.

The 15x Tax You're Paying Without Knowing It

The Playwright MCP server registers 22 browser automation tools. Navigate, click, fill, screenshot, evaluate, the full catalogue. Each tool comes with a JSON schema describing its name, parameters, types, descriptions, examples. The whole lot loads into your context window the moment the MCP server connects.

Total cost: roughly 15,400 tokens. Just sitting there. Not doing anything. Not running a single browser command. Just existing in your context.

Now write a TypeScript class with the same five methods you actually use 80% of the time (goto, click, fill, snapshot, screenshot). About 1,000 tokens. The agent reads it, understands it natively because it's just code, and cracks on.

That's a 15x overhead. Fifteen times more context consumed for the privilege of having 17 tools you won't call today. And Playwright isn't even the worst offender. The MySQL MCP server loads 106 tools at roughly 54,600 tokens on every initialisation. A GitHub MCP server now has 51 tools. Slack varies by implementation but you're looking at 9 to 16 tools depending on which one you picked.

Stack a few of these together and you're looking at 50,000 to 70,000 tokens of tool definitions before your agent has seen a single line of your code. On a 200k context window, that's 25-35% gone. On 128k, it's worse. You're paying rent on tools you're not using, in a building where square footage is everything.
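The budget arithmetic above is worth making explicit. A quick sketch using the rough token figures quoted in this post (quoted estimates, not fresh measurements):

```typescript
// Rough tool-definition costs quoted above (tokens, not measured here).
const toolDefs = { playwrightMcp: 15_400, mysqlMcp: 54_600 };

// Total context consumed before the agent has done any work.
const total = toolDefs.playwrightMcp + toolDefs.mysqlMcp;

// Fraction of a given context window eaten by definitions alone.
const pctOf = (window: number) => Math.round((total / window) * 100);

console.log(total);          // 70000
console.log(pctOf(200_000)); // 35
console.log(pctOf(128_000)); // 55
```

Two servers alone hit the top of that 50k-70k range, and on a 128k window more than half the context is gone before the first prompt.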

📚 Geek Corner
A benchmark by Scalekit ran 75 head-to-head comparisons of MCP vs CLI for identical operations. The results: MCP cost 4x to 32x more tokens than the CLI equivalent. A simple task consumed 1,365 tokens via CLI and 44,026 via MCP. Not a typo. Thirty-two times more tokens for the same result. The Playwright team's own numbers showed 114,000 tokens per test via MCP versus 27,000 via their new CLI. The protocol overhead isn't marginal. It's the majority of your spend.

Feels like: Booking a removal van to carry a sandwich. The van works. The sandwich arrives. But maybe just walk?

Agents Can Write Code Now. That Changes Everything.

The whole premise of MCP was reasonable in 2024: "Give the model pre-built tools so it doesn't have to figure out how to do things." At the time, models were a bit rubbish at generating reliable tool-calling code. You needed guardrails. You needed structured schemas. Fair enough.

But it's late 2025. Claude Code generates Playwright scripts from scratch. Cursor writes API integration code on the fly. Codex spins up entire test suites. These agents don't need a 15,000-token JSON schema to know how to click a button in a browser. They can just write the code.

// This is all you need. ~1K tokens.
class Browser {
  async goto(url: string) { /* ... */ }
  async click(selector: string) { /* ... */ }
  async fill(selector: string, value: string) { /* ... */ }
  async snapshot(): Promise<string> { /* ... */ }
  async screenshot(path: string) { /* ... */ }
}

The agent reads that class, understands the interface, and writes whatever browser automation it needs. No JSON schema parsing. No tool selection overhead. No 15,000-token preamble. Just code calling code.
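To make that concrete, here's a stub version of the class with its method bodies filled in. A real implementation would delegate each method to Playwright; this one just records calls so the end-to-end shape is visible, along with the kind of script an agent would write against it (the `submitLogin` flow and its selectors are illustrative, not from any real app):

```typescript
// Stub implementation: the real methods would delegate to Playwright.
// Here each call is recorded so the flow can be inspected.
class Browser {
  log: string[] = [];
  async goto(url: string) { this.log.push(`goto ${url}`); }
  async click(selector: string) { this.log.push(`click ${selector}`); }
  async fill(selector: string, value: string) { this.log.push(`fill ${selector}=${value}`); }
  async snapshot(): Promise<string> { return this.log.join("\n"); }
  async screenshot(path: string) { this.log.push(`screenshot ${path}`); }
}

// The kind of automation an agent writes against that interface:
// plain code calling plain code, no schema in sight.
async function submitLogin(b: Browser) {
  await b.goto("https://example.com/login");
  await b.fill("#user", "steven");
  await b.click("button[type=submit]");
  return b.snapshot();
}

submitLogin(new Browser()).then(console.log);
```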

This is the bit MCP advocates haven't properly reckoned with. The protocol was designed for a world where models needed training wheels. We're past that now. The models can ride the bike.

curl + jq: The Integration Nobody Talks About

Here's something that makes me slightly mental. Most MCPs are thin wrappers around REST APIs. That's it. They take an HTTP endpoint, wrap it in a JSON schema, register it as a tool, and charge you 200-500 tokens per tool definition for the privilege.

Your agent can do this instead:

curl -s api.example.com/data | jq '.items[] | {name, status}'

Fifty tokens. Maybe sixty. Gets exactly the data it needs, filtered down to what matters, with zero protocol overhead. The MCP equivalent loads 3,000 tokens of tool definitions, makes the same API call under the hood, and returns unfiltered JSON that your agent then has to parse anyway.

The gh CLI does everything the GitHub MCP does. The gcloud CLI covers Google Cloud. aws-cli handles AWS. These CLIs exist, they're well-documented, the models know them inside out from training data, and they cost basically nothing in context.
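The curl | jq pattern generalises to anything with a JSON API. A runnable sketch (the payload is inlined via echo so it works offline; in practice the first stage would be a curl to the real endpoint):

```shell
# Simulated API response; in practice this comes from `curl -s <endpoint>`.
payload='[{"number":1,"title":"Bug","state":"open"},{"number":2,"title":"Feat","state":"closed"}]'

# Filter to open items and keep only the fields that matter.
# The whole pipeline is a couple of lines of context, not a tool schema.
echo "$payload" | jq -c '[.[] | select(.state == "open") | {number, title}]'
```

The same shape works with gh, aws, or gcloud on the left of the pipe: the model writes the filter, the CLI does the transport, and nothing loads up front.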

I'm not saying CLIs are perfect for everything. But for the 80% case where you're hitting a REST API and filtering some JSON, you're burning thousands of tokens on a wrapper you don't need.

The Market Is Already Telling You

Here's where it gets interesting. People are building tools to undo MCP.

mcporter converts MCP servers into standalone TypeScript libraries and CLIs. No context bloat, no protocol overhead. Just regular code your agent can import and call.

Mario Zechner wrote a proper good post in November making the same argument from first principles. His take: replace the entire Playwright MCP (13,700 tokens of tool definitions) with a handful of Bash CLI tools. A start command, a navigate command, a JavaScript eval command, and a screenshot command. The whole README for his approach? 225 tokens. That's 1.6% of what Playwright MCP costs.

I've been doing something similar with a lightweight browser-tools CLI that wraps Chrome DevTools Protocol into simple commands: start, nav, eval, screenshot, pick, console, search, content. No MCP server. No tool definitions. Just a TypeScript CLI that the agent calls directly. It even converts Playwright selectors to native DOM queries so you get compatibility without the protocol tax.

The pattern is clear. If the abstraction were adding value, nobody would be building escape hatches. You don't see people writing "React-to-jQuery converters." When an abstraction earns its keep, you lean into it. When it doesn't, you get converters, slimmers, and alternatives. MCP is accumulating escape hatches at a rate that should concern anyone betting their toolchain on it.

📚 Geek Corner
Vercel's agent-browser team found that reducing from 17 browser tools to 2 produced a 93% reduction in context window usage, and their December 2025 blog showed that removing 80% of an agent's tools made it 3.5x faster with a 100% success rate. The fewer tools you load, the better the agent performs. This is counterintuitive if you think of tools as capabilities. But it makes perfect sense if you think of tool definitions as noise. Every tool definition is a distraction the model has to evaluate before choosing what to do. Fewer tools means less noise means faster, better decisions.

The Chunky Problem Nobody Wants to Fix

MCP servers tend to expose ALL their capabilities as tools. Every endpoint. Every operation. Every edge case method you'll use once a year.

The GitHub MCP: 51 tools. The MySQL MCP: 106 tools. Slack: somewhere between 9 and 16 depending on implementation. And here's the kicker: there's no lazy loading. No partial tool registration. No "give me just the three tools I need." It's all or nothing. Your agent loads every single tool definition whether it needs create_issue and list_issues or it needs all fifty-one.

This is a proper architectural problem. The protocol has no mechanism for on-demand tool discovery that doesn't burn context. Yes, Claude Code recently added tool search that auto-defers loading when definitions exceed 10% of the context window. That's a patch on a symptom. The underlying design still assumes that the way to give an agent capabilities is to front-load every possible tool definition into the prompt.

Compare that to how a human developer works. You don't memorise every method on every library you've ever used. You know the library exists, you know roughly what it does, and you look up the specific method when you need it. MCP is the equivalent of forcing you to read the entire API reference of every dependency in your package.json before you're allowed to write a line of code.
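What the look-it-up-when-you-need-it model could look like in code, as opposed to front-loading everything. This is a hypothetical registry API, not anything the MCP spec defines: tool names stay resident (a few tokens each), and a full schema is materialised only when the agent actually selects a tool:

```typescript
// Hypothetical on-demand tool registry (not part of the MCP spec):
// names are cheap and always visible; schemas load lazily on selection.
type ToolSchema = { name: string; description: string; params: Record<string, string> };

class LazyRegistry {
  private index = new Map<string, () => ToolSchema>();

  register(name: string, load: () => ToolSchema) { this.index.set(name, load); }

  // What lives in context up front: just the names.
  list(): string[] { return [...this.index.keys()]; }

  // The full schema is only paid for when a tool is actually chosen.
  resolve(name: string): ToolSchema {
    const load = this.index.get(name);
    if (!load) throw new Error(`unknown tool: ${name}`);
    return load();
  }
}

const reg = new LazyRegistry();
reg.register("create_issue", () => ({
  name: "create_issue",
  description: "Open an issue",
  params: { title: "string", body: "string" },
}));
reg.register("list_issues", () => ({
  name: "list_issues",
  description: "List issues",
  params: { state: "string" },
}));

console.log(reg.list()); // names only: the cheap index
console.log(reg.resolve("list_issues").params.state);
```

That's the human-developer workflow from the paragraph above, translated into an interface: know what exists, pay for detail only at the moment of use.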

The One MCP That Might Earn Its Keep

I'll give credit where it's due. Context7 is arguably the one MCP server that earns its context cost.

It does something a class file can't: fetches live, version-specific documentation for whatever library you're working with. The model's training data might be six months stale. Context7 pulls the current docs, the actual API signatures, the latest code examples, and injects them into your context right when you need them. That's a properly useful thing that a static class file can't replicate.

But even Context7 is really just two tools: resolve-library-id and query-docs. Two tools. Not fifty-one. Not a hundred and six. Two. And the documentation it fetches is the payload, not the tool definition overhead. That's what "earning your context cost" looks like: the useful content massively outweighs the schema overhead.

You could, if you were feeling bolshie, replicate most of that with a curl to the right documentation endpoint. But Context7's library resolution is convenient enough that I'd call it a net positive. One out of however many MCP servers exist now. Not a great hit rate for a protocol that was supposed to be universal.

What Actually Works

So if MCP is a tax, what do you use instead?

Class files and modules. A TypeScript class with five methods that the agent can import, read, and call. No schema overhead. No protocol layer. The agent understands the code because it is code. If the agent needs a method that doesn't exist, it writes one. That's the whole point of code-generating agents. Let them generate code.

Direct CLI commands. curl, jq, gh, aws, gcloud, kubectl. The models know these tools from training. The man pages are well-indexed. A single curl | jq pipeline costs 50 tokens and does what a 3,000-token MCP tool definition does.

On-demand skills. Claude Code's skill system loads capabilities when they're needed and unloads them when they're not. No permanent context occupation. No 50,000-token boot tax. The agent asks for what it needs, gets it, uses it, moves on. That's the model.

The agent writes the integration itself. This is the one that MCP advocates really don't want to hear. If your agent can write a Playwright script from scratch, why does it need a Playwright MCP? If it can write a curl command that hits the GitHub API, why load 51 tool definitions? The agent is the integration layer. It doesn't need another one between itself and the tools.

The SOAP Trajectory

I've been building software long enough to recognise this pattern.

SOAP was going to standardise web services. Every enterprise bought in. The schemas got bigger. The WSDLs got longer. The tooling got more complex. And then REST showed up and everyone quietly migrated because it turned out that the simpler, less structured approach worked better for 90% of use cases.

ESBs were going to be the universal integration layer. Enterprise Service Buses connecting everything to everything through a central protocol. What they actually delivered was XML hell, schema sprawl, and a new class of debugging problems that didn't exist before the "solution" arrived.

MCP is on the same trajectory. Not because the people building it aren't smart (they are). Not because the idea is fundamentally broken (it isn't). But because the abstraction's cost exceeds its value for the majority of use cases. And when that happens, the market routes around it. Every time. Without exception.

The academic research is already showing it. A paper on MCP-augmented LLMs found that input token volume increases by 236.5x across models and tasks, and, properly mad this one, MCP integration actually reduced accuracy by an average of 9.5%. More tokens, worse results. That's not overhead. That's active harm.

📚 Geek Corner
The research paper "Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models" tested MCP across six LLMs on three core task categories. The finding that MCP reduces accuracy by 9.5% on average suggests that the noise introduced by verbose tool schemas actively degrades model performance. This aligns with the attention mechanism research showing that irrelevant context tokens compete for attention weights with relevant tokens. More tools in context means more noise in the attention distribution means worse outputs. The context window isn't just a storage limit. It's a signal-to-noise ratio.

Where This Goes

MCP won't disappear overnight. It has institutional momentum. Anthropic backs it. Microsoft, Google, and others have adopted it. There are thousands of MCP servers in the wild.

But the trajectory is clear. The protocol will get thinner and thinner as the escape hatches multiply. Tool search, lazy loading, slim wrappers, CLI converters. Each one is an admission that the original design loads too much context. Eventually you'll have an MCP server that lazily loads one tool at a time, generates a minimal schema on demand, and calls the underlying API directly. At which point you've reinvented a function call with extra steps.

The smart money is already moving. Vercel's agent-browser proved that fewer tools means better performance. Playwright shipped a CLI that uses 4x fewer tokens than its MCP. The tools-to-strip-MCP ecosystem is growing faster than the MCP ecosystem itself. That's not a protocol winning. That's a protocol being routed around.

Bottom line: MCP was a good idea for a specific moment in time when models needed structured tool definitions to do anything useful. That moment has passed. Context windows are finite, tokens cost money, and agents can write code. The maths doesn't work. Fifteen thousand tokens for tool definitions your agent will never call, when a thousand-token class file does the job better. I wrote an optimistic series about MCP six months ago. I'm writing its obituary now. The protocol isn't dead yet, but the patient is on a ventilator and the family is starting to have that conversation. Crack on with class files. Your context window will thank you.

