August 4, 2025|8 min read

GPT-5, Opus 4.1, and Duct-Tape Security: AI's Wildest Week in 2025

The Week That Had Everything 🎪

A 24-year-old lands a $250M AI pay package. Meanwhile, link wrappers are nicking your login credentials. Same industry. Same week.

Right, where do I even start with this one. Week 32 was the kind of week where every newsletter hit different. GPT-5 dropped. Opus 4.1 dropped. OpenAI went open-source. Cursor got a CLI and got poisoned. North Korean devs are still out here catfishing hiring managers. And someone, somewhere, is writing a quarter-billion-dollar cheque to a researcher who can't legally rent a car in most US states.

Let's crack on.

The Money's Gone Absolutely Mental 💰

AI researchers are being recruited like Premier League strikers now. We're talking $250M packages. For a 24-year-old. I don't care how good your transformer architecture paper is, that number should make everyone uncomfortable.

The maths works out to roughly "we'd rather overpay by 10x than let a competitor have you." Which, fine, that's how bidding wars work. But it tells you something about the state of things when the talent pool is so thin that a single researcher commands more than most companies are worth.

Here's what bothers me though. These packages aren't salary. They're structured as equity, retention bonuses, and golden handcuffs. The researcher doesn't actually get $250M unless the company's valuation holds. And if we've learned anything from the last two decades of tech, valuations are vibes until they're not.

Feels like: Paying someone a quarter billion to fix your plumbing while the rest of the house is on fire and nobody's rung the fire brigade.

GPT-5: The Main Event (That Got Upstaged) 🎬

OpenAI shipped GPT-5 on Friday. Three variants: Pro, Mini, and Nano. Available to everyone in ChatGPT. Smartest and fastest, they say.

And honestly? The reaction was a bit muted. Not because GPT-5 is bad. By all accounts it's proper good. But it landed in a week where everything dropped. Opus 4.1 on Wednesday. GPT-OSS on Wednesday. Gemini coding agent. Cursor CLI. The news cycle was so saturated that a flagship model launch felt like just another item on the list.

That's either a sign of how fast things move now, or a sign that we've collectively lost the ability to be impressed for more than about four hours.

The GPT-OSS release is actually the more interesting story, if you ask me. A 120B open-weight model under Apache 2.0 that nearly matches o4-mini on reasoning benchmarks and runs on a single 80GB GPU. OpenAI going actually open-source after years of "open" being a punchline. DeepSeek and Qwen3 still beat it on raw intelligence, but the fact that OpenAI is playing this game at all is a shift.

📚 Geek Corner
GPT-OSS-120B and the open-weight arms race: OpenAI's 120B model hits near-parity with o4-mini on core reasoning while fitting on a single A100/H100. The trick is aggressive distillation from their larger proprietary models, not architectural novelty. This puts it behind DeepSeek R1 and Qwen3-235B on intelligence benchmarks, but the Apache 2.0 licence means it's actually deployable without lawyers. The real question: does OpenAI releasing competitive open models undermine their own API revenue, or does it function as a loss leader to keep developers in the OpenAI ecosystem? My bet is the latter. Get them building on your architecture, then upsell the proprietary stuff. Classic developer relations playbook, just at a different scale.

📚 Geek Corner

GPT-OSS-120B and the open-weight arms race: OpenAI's 120B model hits near-parity with o4-mini on core reasoning while fitting on a single A100/H100. The trick is aggressive distillation from their larger proprietary models, not architectural novelty. This puts it behind DeepSeek R1 and Qwen3-235B on intelligence benchmarks, but the Apache 2.0 licence means it's actually deployable without lawyers. The real question: does OpenAI releasing competitive open models undermine their own API revenue, or does it function as a loss leader to keep developers in the OpenAI ecosystem? My bet is the latter. Get them building on your architecture, then upsell the proprietary stuff. Classic developer relations playbook, just at a different scale.

Opus 4.1: The Quiet Mid-Week Drop 🔬

Anthropic shipped Claude Opus 4.1 on Wednesday. Improvements in agentic tasks, real-world coding, and reasoning. No massive fanfare. No countdown timer. Just... here it is, it's better, carry on.

I've been running Claude Code daily and the improvements in agentic task completion are noticeable. Less faff with multi-step operations. Better at holding context across long sessions. The kind of upgrade that doesn't make you say "wow" but does make you say "huh, that worked first time" more often.

Multiple newsletters called it "the biggest AI week of the year" and they weren't wrong. Three major model releases in a single week from three different companies. Proper arms race energy.

Meanwhile, Everything Is On Fire 🔥

Right, so while everyone's mucking about with their shiny new models, let's talk about the absolute state of security this week.

Link wrappers stealing logins. Cloudflare's Email Security team caught threat actors abusing link-wrapping services from Proofpoint and Intermedia. The tools that are meant to protect you from dodgy links are being weaponised to deliver dodgy links. That's not a vulnerability, that's a comedy sketch.

Public prompts to local shells. Exactly what it sounds like. Prompts that can escape into your local shell. If you're running AI tools that execute code, this should keep you up at night.

Cursor MCPoison. The same week Cursor launches their CLI, a poisoning attack surfaces that targets Cursor specifically. You couldn't write better timing if you tried. Build the tool on Monday, someone finds a way to poison it by Friday.

North Korean fake devs. Still happening. Thousands of IT workers deployed abroad with fake identities, landing remote jobs at Western companies. We've known about this for over a year and the industry response has been... well, it's been nothing much, hasn't it.

Perplexity's stealth crawlers. Cloudflare caught Perplexity using stealth crawling to bypass website restrictions. Robots.txt? Never heard of her. We're just going to hoover up your content and serve it back without attribution. Cheeky doesn't begin to cover it.

Chrome cookie encryption blown up. Student financial data stolen. Informants exposed in a hack. Summer cyber attack spike.

All the same week. All happening while someone signs a $250M retention package.

The Smell of Vibe Coding 😷

Changelog ran a piece titled "The smell of vibe coding" and I proper love the framing. Because vibe coding does have a smell. It's the smell of code that works but nobody understands why. It's the smell of a codebase where the AI wrote 80% of it and the human vibed their way through the remaining 20%. It compiles. Tests pass. Ship it.

Until it doesn't. And nobody can debug it because nobody actually wrote it.

The Gemini coding agent also launched this week, because apparently every company needed to ship something. Google's approach is different to Claude Code and Cursor, leaning harder into the "agent that does things for you" model rather than the "copilot that helps you do things" model. Jury's still out on which philosophy wins.

📚 Geek Corner
The coding agent spectrum: There's a real philosophical split forming. On one end: Claude Code and Cursor, which are essentially power tools. You drive, they assist. On the other: Gemini's coding agent and similar products that try to do the whole job autonomously. The first approach keeps the developer in the loop but limits throughput. The second approach scales but introduces the "smell" problem. Nobody knows how to maintain code they didn't write, whether the author was a junior dev in Bangalore or an LLM in a data centre. The answer is probably somewhere in the middle, but right now everyone's racing to the extremes.

📚 Geek Corner

The coding agent spectrum: There's a real philosophical split forming. On one end: Claude Code and Cursor, which are essentially power tools. You drive, they assist. On the other: Gemini's coding agent and similar products that try to do the whole job autonomously. The first approach keeps the developer in the loop but limits throughput. The second approach scales but introduces the "smell" problem. Nobody knows how to maintain code they didn't write, whether the author was a junior dev in Bangalore or an LLM in a data centre. The answer is probably somewhere in the middle, but right now everyone's racing to the extremes.

The Juxtaposition That Won't Leave My Head 🤯

Here's what I keep coming back to.

On Monday, someone signs a $250M deal because AI talent is that valuable.

On Tuesday, North Korean operatives are infiltrating Western companies with fake dev profiles.

On Wednesday, three flagship AI models drop simultaneously.

On Thursday, Chrome's cookie encryption gets cracked and student financial data gets nicked.

On Friday, GPT-5 launches alongside a poisoning attack on one of the most popular AI coding tools.

We're building absurdly capable systems and securing them with bodge jobs and hope. The same industry that can afford quarter-billion retention packages can't figure out how to stop phishing attacks that abuse its own security tools. The gap between what we're building and how well we're protecting it is getting wider every week.

And Perplexity's over there just crawling whatever it wants, because apparently rules are for other people.

What's Actually Worth Your Time This Week

If you only track three things from this week:

GPT-OSS under Apache 2.0 - More interesting than GPT-5 itself. Open-weight models that rival proprietary ones change the economics for everyone running inference at scale.
Cursor MCPoison - If you're using AI coding tools (and you are), this is the attack vector you need to understand. Poisoned context that makes your AI write vulnerable code. Sneaky and nasty.
The link-wrapper abuse - Security tools being turned into attack vectors. If your org uses Proofpoint for email protection, go check your configs.

Bottom line: We're in the era where a single week delivers three frontier model launches, an open-source bombshell, and half a dozen security disasters. The money's flowing. The models are shipping. The security is held together with duct tape and wishful thinking. I wrote about MCP security risks in detail earlier this year, and the Cursor MCPoison story is exactly the kind of thing I was warning about. If you reckon vibe coding peaked last month, this week says hold my beer. If that doesn't sum up 2025, I don't know what does.

Share𝕏 in

Steven Gonsalvez

GPT-5, Opus 4.1, and Duct-Tape Security: AI's Wildest Week in 2025

Comments & Reactions