Steven Gonsalvez

Software Engineer


Claude Code Skills Just Made Half Your MCP Servers Redundant

Skills are what MCP should have been 📝

Anthropic announced Agent Skills on October 16th, and I reckon this is one of those quiet releases that ends up mattering more than the flashy model drops.

A skill is a markdown file with YAML frontmatter. That's it. No server process. No JSON-RPC. No WebSocket connections. No Docker containers. A markdown file. And the clever bit is how it loads.

Level 1 is metadata. Name, description, trigger conditions. Costs you about 30 to 100 tokens and it's always in context. Level 2 is instructions. The actual "how to do the thing" content. Under 5,000 tokens. Only gets loaded when the model decides it's relevant to your task. Level 3 is resources. Referenced files, example code, whatever. Only pulled in when the instructions explicitly reference them.
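For illustration, here's roughly what a skill file might look like: a hypothetical SKILL.md for a staging deploy. The `name` and `description` frontmatter fields match what Anthropic documents; the body content and the referenced rollback file are my sketch, not a spec.

```markdown
---
name: deploy-staging
description: Deploy the app to staging. Use when the user asks to deploy, ship, or release to staging.
---

# Deploying to staging

1. Run the test suite: `npm test`
2. Build the release artifact: `npm run build`
3. Push to the staging branch: `git push origin HEAD:staging`

For rollback steps, see [rollback.md](rollback.md).
```

The frontmatter is the Level 1 metadata, the body is the Level 2 instructions, and rollback.md is a Level 3 resource that only gets pulled in if the instructions send the model there.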

So you've got a system where 100 skills can sit in your context at a cost of maybe 3,000 to 10,000 tokens total, and that's just the metadata layer. The model reads the names and descriptions, figures out which ones matter, and loads only what it needs.

Now compare that to MCP. Four or five MCP servers and you're looking at 40,000 to 60,000 tokens of JSON schemas. Loaded upfront. All of them. Whether you need them or not. Sitting in your context window like furniture you never use, taking up space and making the flat harder to navigate.


Most MCP servers were never about live data

Here's the thing I keep coming back to. What were people actually using MCP for?

Some of it was legitimate live data connections. Database queries. API calls. Fetching real-time information the model can't have in its training data. Fair enough. That's a genuine use case and skills can't replace it.

But a huge chunk of the MCP ecosystem was procedural knowledge. How to deploy this thing. How to format commits. How to run the test suite. How to interact with Jira. Step-by-step instructions wrapped in a protocol layer and loaded as tool definitions. Tens of thousands of tokens to tell the model "when someone asks about deployment, here's what to do."

Skills do that at roughly 1/100th the context cost. Not an exaggeration. 30 tokens of metadata versus 3,000 tokens of tool schema. And the instructions only load when they're needed, so the actual runtime cost is even better than the comparison suggests.

I've been running my own setup with skills for commits, code review, session management, research workflows, all sorts. Dozens of them. The context overhead is negligible. Try running dozens of MCP servers and watch your model forget what it was doing halfway through a conversation.

📚 Geek Corner
Progressive loading vs. eager loading: MCP uses eager loading. Every tool schema enters context at connection time. Skills use progressive loading across three tiers: metadata (always loaded, ~50 tokens each), instructions (loaded on relevance, <5k tokens), and resources (loaded on reference). The difference is architectural. MCP treats every tool as equally likely to be needed. Skills treat most tools as unlikely to be needed until proven otherwise. In information retrieval terms, MCP optimises for recall (everything available) while skills optimise for precision (only what's relevant). For context-constrained systems, precision wins.

Mario Zechner was asking the same question

Two weeks after skills launched, Mario Zechner published "What if you don't need MCP at all?" on November 2nd. Different angle, same conclusion. His argument was that most of what people use MCP for can be handled by the agent calling CLIs and APIs directly, or by injecting knowledge through simpler mechanisms. Skills are exactly that simpler mechanism for the knowledge-injection half of the equation.

I don't think this is subtle anymore. MCP tried to be the universal connector for everything: live data, procedural knowledge, tool access. Turns out that's too many jobs for one protocol, and the ecosystem is quietly unbundling it. Keep MCP for live data connections where you genuinely need them. Move procedural knowledge to skills. And for direct tool access, just call the CLI. Each replacement is cheaper and simpler than MCP was for that specific job.

Feels like: Realising you've been driving a lorry to the corner shop. The lorry works, technically. But a bicycle gets you there faster, cheaper, and without having to find parking for a 12-tonne vehicle.

Bottom line: Skills are the right abstraction for procedural knowledge. MCP is the right abstraction for live data. Most people were using MCP for both and paying a massive context tax for the privilege. If you haven't moved your "how to do things" knowledge from MCP servers to skills, you're burning tokens for no reason. The Death of MCP piece covers the full trajectory, but the short version is this: the protocol's scope is shrinking, and that's a good thing. Smaller scope, less overhead, better results. Crack on.
