Browser Tools for AI Agents Part 1: Playwright, Puppeteer, and Why Your Agent Picked Playwright
The first time I watched an AI agent drive a browser, I laughed. Out loud. Not because it was funny, but because it was so profoundly uncanny. Like watching a toddler who somehow knows how to file taxes. The cursor moved with purpose. It found form fields I would've struggled to locate myself. It waited patiently for a spinner to clear, then clicked a button that hadn't even rendered when the page first loaded.
I sat there with my coffee going cold thinking: how does it know what to click? What is it even seeing?
Turns out, the answer to that question is the entire reason Playwright won the browser automation war for AI agents. And it's not the answer most people expect.
What this series covers (and what it doesn't)
Quick scope check before we crack on. This series is NOT about consumer agentic browsers like Comet, Dia, BrowserOS, ChatGPT Atlas, or Google's Project Mariner. Those are interesting products, but they're for end users browsing the web with AI assistance.
This series is about something more specific: how does a software developer, working with coding agents, give those agents the right browser tools to build a fully closed-loop system? Research, plan, implement, validate. That whole cycle. And the "validate" bit, in most cases, means checking what a real user would see in a real browser.
That's the lens. Every tool in this four-part series is evaluated through it. Can my agent use this to research a problem, build a solution, and then open a browser to verify it actually works from a user's perspective?
(Native app validation for React Native, Capacitor, or Swift is a separate conversation. We'll get there.)
One thing worth saying upfront: I've tried and used every tool in this series in some form. Not just read the README. Actually ran them, built things with them, hit the walls, found the workarounds. The ones that stuck in my daily workflow are dev-browser (for agent dev loops), Stagehand (for repeated workflows with caching), expect (for closed-loop test validation), and Playwright directly (for anything that needs to live in a CI pipeline). Everything else I've evaluated, formed opinions on, and moved on from or keep in the toolbox for specific situations.
Playwright: The Main Event
Right. Let's not faff about with history lessons. Playwright won. Every serious AI coding agent, Claude Code, Cursor, GitHub Copilot's agent mode, they all reach for Playwright when they need a browser. The question worth asking is why.
It's not because Playwright is the fastest browser automation tool. It isn't. Puppeteer actually beats it by 15 to 20 percent on raw Chromium tasks because Puppeteer stays closer to the Chrome DevTools Protocol wire, exchanging roughly 11KB of websocket messages where Playwright sends 326KB for the same job. That's a proper gap.
But speed on Chromium isn't the game agents are playing. Agents are playing a reliability and comprehension game. And Playwright absolutely smashes that.
The accessibility tree trick. This is the bit that matters. When Playwright's MCP server talks to an AI agent, it doesn't send screenshots. It doesn't dump raw HTML. It sends the browser's accessibility tree: a structured, semantic, text-based representation of the page. Roles, labels, states. A "Submit" button is Role: button, Name: Submit regardless of whether the CSS class is btn-primary or xK7_submit_v3_final_FINAL. An accessibility snapshot averages 2 to 5KB. A screenshot of the same page runs 100KB or more. That's a 20x to 50x difference in token cost, and for an agent burning through context windows, that's the whole ballgame.
ARIA roles and labels were designed for screen readers and assistive technology. Turns out they're perfect for AI agents too. Same problem, really: "tell me what's on this page in a way I can act on without seeing it."
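To make that concrete, here's a toy sketch of what a role-and-name snapshot looks like. The node structure and renderer below are invented for illustration; Playwright's real serializer is richer (states, values, nesting rules), but the shape of the output is the point.

```python
# Toy accessibility snapshot: roles and names, no markup.
# The node structure and renderer are invented for illustration.

def render_a11y(node, depth=0):
    """Flatten a {role, name, children} tree into an indented text snapshot."""
    line = "  " * depth + "- " + node["role"]
    if node.get("name"):
        line += f' "{node["name"]}"'
    lines = [line]
    for child in node.get("children", []):
        lines.extend(render_a11y(child, depth + 1))
    return lines

page = {
    "role": "form", "name": "Checkout",
    "children": [
        {"role": "textbox", "name": "Email"},
        # The CSS class could be btn-primary or xK7_submit_v3_final_FINAL;
        # the snapshot carries only the role and the accessible name.
        {"role": "button", "name": "Submit"},
    ],
}

snapshot = "\n".join(render_a11y(page))
print(snapshot)
```

A few hundred bytes of this for a whole page, versus 100KB of pixels: that's where the 20x to 50x token saving comes from.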
Auto-wait. Playwright waits for elements to be actionable before it tries to interact with them. Assertions retry automatically until conditions are met. No sleep(2000). No polling loops. No flaky race conditions. For an agent that can't eyeball the page and think "oh, that spinner's still going, I'll wait a sec," this is everything. The agent says "click the checkout button" and Playwright handles the timing. Sorted.
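Under the hood, auto-wait is a retry loop over actionability checks. A minimal sketch of the mechanism, with a fake element standing in for a real DOM node (everything here is invented for illustration, not Playwright internals):

```python
import time

# Sketch of the auto-wait mechanism: re-run actionability checks until
# they pass or a deadline hits. FakeElement stands in for a DOM node
# whose spinner clears after a few polls; all names are invented.

class FakeElement:
    def __init__(self, ready_after_polls):
        self.polls = 0
        self.ready_after = ready_after_polls

    def is_actionable(self):
        self.polls += 1
        return self.polls >= self.ready_after

def click_with_auto_wait(element, timeout=1.0, interval=0.01):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if element.is_actionable():
            return "clicked"        # safe to dispatch the real click now
        time.sleep(interval)        # retry instead of a hard-coded sleep(2000)
    raise TimeoutError("element never became actionable")

result = click_with_auto_wait(FakeElement(ready_after_polls=3))
```

The agent never writes this loop. It says "click", and the timing problem is someone else's problem.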
Browser Contexts. Playwright can spin up isolated browser contexts that share a single browser instance. Different cookies, different storage, different sessions, all running in parallel without stepping on each other. For agents that need to test multi-user flows or run parallel scraping jobs, this is far cheaper than launching separate browser processes.
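The mental model is one process, many isolated sessions. A toy sketch of the isolation boundary (the classes are invented; in real Playwright it's `browser.newContext()`, each context with its own cookies and storage):

```python
# Toy model of browser contexts: one shared "browser process", many
# isolated sessions. Classes are invented for illustration.

class Context:
    def __init__(self):
        self.cookies = {}             # each context gets its own cookie jar

class Browser:
    def __init__(self):
        self.contexts = []            # one process, launched once

    def new_context(self):
        ctx = Context()
        self.contexts.append(ctx)
        return ctx

browser = Browser()
alice = browser.new_context()         # two users, two sessions...
bob = browser.new_context()

alice.cookies["session"] = "alice-token"
bob.cookies["session"] = "bob-token"  # ...no cross-contamination
```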
Multi-browser support. Chromium, Firefox, WebKit. One API. Puppeteer gives you Chrome and that's it. Selenium gives you everything but slowly. Playwright gives you all three major engines with the same code, and for cross-browser testing that actually matters.
| 📚 Geek Corner |
|---|
| The MCP tax vs just writing Playwright: Playwright MCP is still widely used, and I get why. Microsoft shipped it in March 2025, it works with VS Code, Cursor, Claude Desktop, and GitHub's Copilot has it baked in. Snapshot Mode reads the accessibility tree, Vision Mode uses screenshots. Sounds great on paper. But here's my take: MCP is a context killer. The Playwright MCP alone burns ~15,000 tokens in tool definitions before your agent has done anything. My strong preference is to skip MCP entirely and let the agent write Playwright code directly. Agents are good at this. They can generate a goto, click, fill, screenshot script in seconds. The code is reusable, testable, and costs ~1,000 tokens instead of 15,000. Microsoft recently shipped playwright-cli which makes this even cleaner. And if you pair it with a skill (use /skill-creator to build one), you end up with a customised Playwright generation layer for your application: your selectors, your auth flow, your common patterns. That's worth more than any MCP server because it compounds over time. The agent learns your app, not a generic tool schema. |
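The arithmetic is worth doing once. A quick back-of-envelope using the overheads above, assuming a 200K-token context window (the window size is my assumption for illustration):

```python
# Back-of-envelope on the MCP tax, using the token figures from the post.
# The 200K-token context window is an assumed figure for illustration.

CONTEXT_WINDOW = 200_000
MCP_OVERHEAD = 15_000        # Playwright MCP tool definitions
DIRECT_OVERHEAD = 1_000      # agent-written goto/click/fill/screenshot script

mcp_share = MCP_OVERHEAD / CONTEXT_WINDOW
direct_share = DIRECT_OVERHEAD / CONTEXT_WINDOW
print(f"MCP eats {mcp_share:.1%} of context before the first click")
print(f"Direct code eats {direct_share:.1%}")
```

That's 7.5% of the window gone before the agent has done anything, versus half a percent.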
Getting started:
# MCP (if you must, but see the MCP tax discussion above)
npx @anthropic-ai/claude-code mcp add playwright -- npx @playwright/mcp@latest
# Better: let your agent write Playwright code directly
npm install playwright
# Or: npx playwright-cli
# Best: build a custom skill with /skill-creator
# that wraps YOUR app's common Playwright patterns
Feels like: Playwright is the Swiss Army knife where every blade is sharp. Puppeteer is a really excellent scalpel, but only for one specific material.
Chrome DevTools MCP: When You Need the Full Inspector
Worth calling out separately: Chrome DevTools MCP (32.9k stars, Google's official MCP server for Chrome) gives agents direct access to Chrome DevTools. 29 tools across six categories: input automation, navigation, emulation, performance tracing, network inspection, and debugging. It's the tool you reach for when your agent needs to do more than just click buttons. Performance profiling, Lighthouse audits, memory snapshots, network request analysis, console log capture with source-mapped stack traces.
The interesting bit is it has a "slim mode" that drops from 29 tools down to 3. Which is a tacit admission that 29 tools is too many to load by default. Same MCP tax problem I keep banging on about.
There's also a performance issue worth flagging. CDP (Chrome DevTools Protocol) feels noticeably sluggish for basic click-and-navigate work compared to Playwright, and there's a proper technical reason for that. CDP simulates OS-level input: when you fire Input.dispatchMouseEvent, Chrome processes it through the full input pipeline (compositor thread, hit-testing, event dispatch, bubbling). It's simulating a real mouse click at the hardware level. Playwright often shortcuts this by injecting JavaScript that calls the element's click handler directly, skipping the compositor entirely.
On top of that, CDP has no auto-wait. You click, and if the element isn't ready, it misses or hits the wrong thing. You end up stacking manual waits (Page.loadEventFired, polling for selectors) which add up fast. Navigation is even worse: Page.navigate fires and then you're listening for separate Page.lifecycleEvent messages. Playwright bundles "navigate and wait until ready" into one call. And if you're using Chrome DevTools MCP specifically, every action goes through an extra hop: agent decides (LLM round-trip), MCP server receives tool call, translates to CDP, Chrome executes, result back through MCP, back to agent. Two network hops on top of the CDP execution.
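To see what that choreography looks like, here's a sketch of the raw CDP exchange for "navigate and wait". The message shapes follow CDP's JSON framing, but the event stream is faked in-memory rather than read from a real websocket:

```python
import json

# Sketch of the extra choreography raw CDP needs for "navigate and wait".
# Message shapes follow CDP's JSON framing; the event stream is faked
# in-memory rather than read from a real websocket.

def navigate_command(msg_id, url):
    return json.dumps({"id": msg_id, "method": "Page.navigate",
                       "params": {"url": url}})

def wait_for_load(events):
    """Scan the event stream for the lifecycle event that means 'ready'."""
    for event in events:
        if (event.get("method") == "Page.lifecycleEvent"
                and event["params"]["name"] == "load"):
            return True
    return False

cmd = navigate_command(1, "https://example.com")

# What Chrome might send back over the websocket, heavily abridged:
stream = [
    {"id": 1, "result": {"frameId": "F1"}},   # ack: navigation started
    {"method": "Page.lifecycleEvent", "params": {"name": "DOMContentLoaded"}},
    {"method": "Page.lifecycleEvent", "params": {"name": "load"}},
]

loaded = wait_for_load(stream)
# Playwright folds this whole exchange into a single goto-and-wait call.
```

With raw CDP you own every step of that little state machine, for every navigation.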
Getting started:
npx chrome-devtools-mcp@latest
# No official skill. Consider mcporter to convert to CLI instead.
My honest take: if your agent needs performance profiling or deep debugging (tracing, memory snapshots, network watchers), Chrome DevTools MCP is the right tool for that specific job. But for the 90% case of "navigate, click, fill, screenshot, verify," you're paying a hefty context premium AND getting slower execution for capabilities you won't use. The agent can write a quick Puppeteer script to call Performance.getMetrics() directly via CDP and it'll cost a fraction of the context. Use Chrome DevTools MCP when you genuinely need the full inspector. For everything else, let the agent write the code.
dev-browser: The Agent-Native Option I Actually Use
So if Playwright is the answer, why do I keep reaching for dev-browser?
dev-browser (by Sawyer Hood) is a different beast. It's not a testing framework. It's not trying to replace Playwright for CI/CD pipelines or regression suites. It's a sandboxed browser automation tool built specifically for AI agents to muck about with web pages during development.
The architecture is clever. It runs scripts in a QuickJS WASM sandbox, meaning your agent's browser automation code has zero access to the host filesystem or network. The scripts use the full Playwright Page API under the hood, including a snapshotForAI() function that does exactly what it sounds like. But the isolation model means your agent can't accidentally rm -rf your home directory while it's trying to click a button.
The benchmarks from the dev-browser-eval suite tell the story:
| Approach | Duration | Cost | Iterations | Success |
|---|---|---|---|---|
| dev-browser | 3m 53s | $0.88 | 29 | 100% |
| Playwright MCP | 4m 31s | $1.45 | 51 | 100% |
| Playwright Skill | 8m 07s | $1.45 | 38 | 67% |
Going by the table, that's about 14% faster and 40% cheaper than Playwright MCP for the same task, with 29 iterations to MCP's 51. The agent figures out what it needs to do quicker because the tool was designed for how agents think, not how test engineers think.
But (and this is a proper big but) dev-browser is not the tool for repeatable test suites. It doesn't do assertions the way Playwright does. It doesn't generate CI-friendly reports. It doesn't integrate with your test runner. If you need "run these 200 tests on every PR and fail the build if something breaks," that's Playwright. If you need "let my agent open a browser, poke around, check if the thing I just built actually works," that's dev-browser.
5.3K stars on GitHub. MIT licensed. TypeScript core with a bit of Rust. Active development. I reckon it's going to keep growing because the "agent wants a browser for five minutes during development" use case is massive and Playwright MCP is overkill for it.
Getting started:
# Install the tool
npm install -g dev-browser && dev-browser install
# The skill is at github.com/SawyerHood/dev-browser/tree/main/skills
# Copy to your skills directory:
cp -r node_modules/dev-browser/skills/dev-browser ~/.claude/skills/
# Pre-approve: add "Bash(dev-browser *)" to .claude/settings.json
Bottom line: dev-browser for agent-driven development. Playwright for everything that needs to be repeatable, cross-browser, or in CI.
Vercel agent-browser: Token-Cheap, Time-Expensive (and Here's Why)
26,200 stars. Written in Rust. And I need to be honest about this one because the marketing and the lived experience don't match.
The pitch is context window efficiency. agent-browser compresses page snapshots using semantic element references (@e1, @e2) instead of full accessibility tree dumps. Vercel claims 93% context reduction, and the token savings are real. A typical snapshot is 200-400 tokens vs Playwright MCP's 15,000 tokens in tool definitions alone. Across a 6-step test, that's 1,364 tokens vs 7,779. You can fit 5.7x more test cycles in the same context budget.
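The compression idea is easy to sketch. This is illustrative only: the output format below is invented, not agent-browser's actual snapshot syntax, but it shows why short element references beat full tree dumps on token count:

```python
# Illustrative only: compress interactive elements into short @eN
# references, in the spirit of agent-browser's snapshots. The output
# format here is invented, not agent-browser's real syntax.

def compress(elements):
    refs, lines = {}, []
    for i, el in enumerate(elements, start=1):
        ref = f"@e{i}"
        refs[ref] = el                # the agent later acts via the ref
        lines.append(f'{ref} {el["role"]} "{el["name"]}"')
    return "\n".join(lines), refs

elements = [
    {"role": "textbox", "name": "Email"},
    {"role": "button", "name": "Sign up"},
]
snapshot, refs = compress(elements)
print(snapshot)
# The agent can now say "click @e2" instead of quoting a DOM path.
```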
But it feels slow. And after digging into the architecture, I reckon I know why.
The original version was a Rust CLI talking to a Node.js daemon over a unix socket, with Playwright driving Chrome underneath. In v0.20.0 they rewrote the daemon in pure Rust and dropped the Node.js layer entirely. Memory went from 143MB to 8MB, install from 710MB to 7MB, cold start from ~1,002ms to ~617ms. Proper engineering.
The problem is Chrome itself still takes 2-5 seconds to launch on first use. So your realistic first-command wall clock is: 617ms daemon start + 2-5 seconds Chrome launch + page navigation time. Subsequent commands on a warm daemon are 50-100ms, which is fast. But that cold start is a killer in CI or ephemeral environments where every run starts fresh.
There are also real stability issues. GitHub Issue #1113 documents orphaned headless Chrome processes that block normal Chrome from launching. Issue #1101 shows the idle timeout not being respected on Unix/macOS. Issue #1035 shows the daemon hanging on Linux server environments. These aren't edge cases if you're running agents at any scale.
| 📚 Geek Corner |
|---|
| The "3.5x faster" claim is about LLM planning, not browser speed. The Vercel blog post ("We removed 80% of our agent's tools") benchmarked their internal text-to-SQL agent, not browser automation. They went from 15 tools to 2, and task completion dropped from 274.8 seconds to 77.4 seconds. The speedup came from the LLM making fewer decisions per turn (fewer tools = less confusion = faster planning), not from faster browser execution. Sample size was 5 queries, self-selected. Against a lean Playwright setup with 3-5 tools, the speed advantage would likely be negligible. The token savings, however, are genuine and well-measured: 82.5% fewer characters in responses, 37% fewer total tokens per task. Token-cheap and time-cheap are orthogonal problems. Stagehand's caching is the approach that actually reduces wall-clock time, replaying actions at <100ms without calling the LLM at all. |
No anti-bot capabilities. For protected sites you need Browserbase or similar. Windows has documented socket issues (#398). And there are no independent wall-clock benchmarks from anyone outside Vercel. Every performance number in every blog post traces back to Vercel's own measurements.
Getting started:
# Install
npm install -g agent-browser
# Or: brew install agent-browser
# Or: cargo install agent-browser
# Official skill at vercel-labs/agent-browser/skills/
Bottom line: agent-browser is the right choice when your binding constraint is context window size and you can tolerate the wall-clock cost. The token efficiency is real and well-engineered. The speed claims are marketing that conflates LLM planning improvements with browser execution speed. If wall-clock time matters, let the agent write Playwright code directly or use Stagehand's caching.
Stagehand: Playwright with a Learning Layer
Right. So Stagehand (21.8k stars) is Browserbase's TypeScript SDK and it does something properly clever. It wraps Playwright with three AI primitives: act() (do something on the page), extract() (pull structured data out), and agent() (let the LLM figure out a multi-step flow). On paper that sounds like yet another Playwright wrapper. It isn't.
The v3 release dropped the Playwright dependency entirely and went CDP-native. Direct Chrome DevTools Protocol, no middleware. The result is 44% faster than v2 on their benchmarks, and the architecture is cleaner for it. Fewer moving parts, fewer things to break.
But the real trick, the bit that made me sit up, is the caching system. First run: the LLM plans and executes actions, same as any AI browser tool. Stagehand records every action it takes. Subsequent runs: it replays those cached actions at sub-100ms latency without making a single LLM call. Zero tokens burned. That's the economic insight nobody talks about enough. Every other tool in this post charges you LLM inference on every run. Stagehand charges you once, then replays for free. For repetitive workflows (login flows, checkout tests, data extraction pipelines), the cost curve is completely different.
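The economics are easy to demonstrate with a toy record-and-replay cache. Everything here is invented for illustration (Stagehand's real cache keys on more than the instruction string), but the cost curve is the point:

```python
# Toy record-and-replay cache in the spirit of Stagehand's caching.
# Names are invented; the real system keys on more than the instruction.

llm_calls = 0

def expensive_llm_plan(instruction):
    """Stand-in for the LLM planning step -- the part that costs tokens."""
    global llm_calls
    llm_calls += 1
    return [("click", "#login"), ("fill", "#user", "me"), ("click", "#go")]

cache = {}

def act(instruction):
    if instruction not in cache:          # first run: plan and record
        cache[instruction] = expensive_llm_plan(instruction)
    return cache[instruction]             # later runs: replay, zero tokens

for _ in range(3):
    act("log in")

print(f"3 runs, {llm_calls} LLM call(s)")
```

Every other tool pays inference three times. This pays once and replays twice for free.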
The catch is the business model. The Stagehand SDK itself is MIT licensed, free, use it wherever you like. But Browserbase, the cloud browser infrastructure that makes it dead simple to run at scale (session management, proxy rotation, persistent contexts), that's paid. The SDK works locally with your own Chrome too. You just lose the managed infra. Fair trade, honestly.
Getting started:
# Stagehand SDK (quickstart scaffold)
npx create-browser-app
# Browserbase has 3 official skills: browser, browserbase-cli, functions
# Install via:
npx skills add browserbase/skills
# Or in Claude Code:
# /plugin marketplace add browserbase/skills
Bottom line: If you're in TypeScript and you want Playwright-level reliability with an AI layer that actually learns from previous runs, Stagehand is the one to look at. The caching system is the differentiator. First run costs tokens, every subsequent run is basically free.
The "Still Relevant?" Section
Three tools that come up in every browser automation conversation. Two of them shouldn't, at least not for agent work.
Puppeteer is still relevant. Full stop. If you're doing Chrome-specific performance work, profiling, coverage reporting, or sending raw CDP commands, Puppeteer is closer to the metal and 15 to 20 percent faster than Playwright on Chromium. Google maintains it. It connects directly to Chrome's DevTools Protocol, and for things like Performance.getMetrics() or network traffic monitoring, that direct CDP access matters. Where it falls short for agents: Chrome only, no accessibility tree abstraction out of the box, and no auto-wait. You're writing the retry logic yourself. For agent work, Playwright is the better pick. For Chrome-specific performance analysis, Puppeteer still earns its keep. About 89K stars on GitHub, Apache 2.0 license, very much alive.
npm install puppeteer
# No official skill. Agent writes Puppeteer code directly.
Selenium is a different story. It's still the most widely deployed browser automation tool in the world, and it's still getting investment (Selenium 5 brings WebDriver BiDi for real-time event streaming). But the WebDriver protocol is an HTTP request-response cycle for every single browser action. Click? HTTP request. Type? HTTP request. Check visibility? HTTP request. That latency overhead adds up fast, and for an agent that might execute hundreds of actions in a session, it's painful. Teams report spending 40 to 70 percent of their automation effort just maintaining existing Selenium tests. For new agent-driven work, I can't recommend it. For legacy enterprise suites that already exist and work, ripping them out is probably not worth the effort either. That's the honest answer.
Cypress runs inside the browser, not outside it. That's its party trick for developer testing (direct DOM access, time travel debugging, automatic waiting within its own execution model) and it's also the reason it's rubbish for agents. Visiting different domains in one test is painful (cy.origin() exists now, but it's a bolt-on over a fundamentally same-origin architecture). You can't run multiple browser instances. You can't easily hook an external LLM into a browser process that the test framework itself is running inside of. Cypress is brilliant for developers writing their own tests interactively. It's architecturally wrong for agents that need to control browsers from the outside. Cypress knows this, incidentally. They're investing in AI features (Cypress Studio AI) but they're about helping humans write Cypress tests, not about letting agents drive browsers.
The Dodgy Stuff: Stealth Tools
Sometimes you need to automate a browser and you'd rather the website didn't know about it. I'm not here to judge. Maybe you're scraping your own data from a service that doesn't offer an API. Maybe you're doing competitive research. Maybe you're testing your own anti-bot defences. Whatever. Two tools worth knowing about.
Patchright: Playwright, But Sneaky
Patchright is a fork of Playwright with one job: don't get detected as an automated browser.
The core problem it solves is called the Runtime.enable leak. When normal Playwright talks to Chrome, it sends a CDP command called Runtime.enable that lets it manage JavaScript execution contexts. Anti-bot systems from Cloudflare, DataDome, Kasada, and Akamai all specifically look for this command. If they see it, you're flagged as a bot. Game over.
Patchright patches this out. Instead of using Runtime.enable, it executes JavaScript through isolated ExecutionContexts with unknown IDs. The bot detection systems can't see the telltale CDP command because it never fires.
| 📚 Geek Corner |
|---|
| The 22 patches: Patchright applies 22 patches comprising roughly 5,856 lines of modifications to Playwright's source code using AST manipulation. Beyond the Runtime.enable fix, it removes --enable-automation from Chrome's launch flags, adds --disable-blink-features=AutomationControlled to hide navigator.webdriver, and disables Console.enable entirely (trading debugging for stealth). It can also interact with closed Shadow DOM elements using standard locators, which vanilla Playwright can't do. The result passes detection tests from Cloudflare, Kasada, Akamai, Fingerprint.com, and CreepJS. Drop-in replacement for Playwright: same API, same code, just swap the import. Chromium only though. No Firefox, no WebKit. And some Playwright tests fail because the patches change internal behaviour. 2.8K stars, v1.58.0 as of March 2026, implementations in Python, Node.js, and .NET. |
Getting started:
pip install patchright # Python
npm install patchright # Node.js
# No official skill. Drop-in Playwright replacement, same API.
Scrapling: The Full-Stack Stealth Scraper
Scrapling takes a different approach. Where Patchright patches Playwright to avoid detection, Scrapling builds stealth into the entire request pipeline from the ground up.
It's a Python framework with three fetcher tiers. The basic Fetcher does fast HTTP requests with TLS fingerprint spoofing (it mimics real browser TLS handshakes at the transport layer, which is lower in the stack than anything Playwright touches). The DynamicFetcher uses Playwright/Chrome for pages that need JavaScript rendering. And the StealthyFetcher combines browser automation with anti-detection features to handle Cloudflare Turnstile and similar protections out of the box.
The clever bit is adaptive element tracking. Scrapling learns from website changes and automatically relocates elements when pages update. If a site redesigns its layout, Scrapling adjusts its selectors without you rewriting anything. That's useful for long-running scraping jobs against sites that change frequently.
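A crude sketch of the idea: fingerprint the element's attributes, then after a redesign pick the candidate with the highest attribute overlap. Scrapling's actual algorithm is more sophisticated; this is just the shape of it:

```python
# Crude sketch of adaptive element tracking: remember an element's
# attributes, then pick the post-redesign candidate with the highest
# attribute overlap. Scrapling's real algorithm is more sophisticated.

def similarity(fingerprint, candidate):
    shared = set(fingerprint.items()) & set(candidate.items())
    union = set(fingerprint.items()) | set(candidate.items())
    return len(shared) / len(union) if union else 0.0

def relocate(fingerprint, candidates):
    return max(candidates, key=lambda c: similarity(fingerprint, c))

# Saved before the redesign:
price_fingerprint = {"tag": "span", "class": "price", "itemprop": "price"}

# After the redesign the class changed, but the rest still matches:
new_page = [
    {"tag": "span", "class": "amount", "itemprop": "price"},
    {"tag": "a", "class": "nav-link"},
    {"tag": "span", "class": "price-old"},
]

best = relocate(price_fingerprint, new_page)
```

The old selector breaks; the fingerprint survives, and the scraper carries on without a rewrite.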
It also has a built-in MCP server for AI-assisted scraping, so agents can use it directly. 92% test coverage. Full async support. Spider framework with Scrapy-like APIs for concurrent crawling. MIT licensed.
Getting started:
pip install scrapling
# Official skill:
clawhub install scrapling-official
# Or: npx skills add D4Vinci/Scrapling:main
When would you pick Scrapling over Patchright? If you're doing scraping at scale and need the full pipeline (request management, session rotation, proxy support, adaptive parsing), Scrapling is the more complete package. If you just need Playwright but undetected, Patchright is simpler.
Lightpanda: The Speed Problem Nobody's Solved Yet
This one needs a proper mention even though I haven't run it in production yet. Lightpanda is building a browser engine from scratch in Zig, specifically optimised for headless automation and AI agents. Not a wrapper around Chromium. Not a fork. A ground-up browser engine designed to be fast.
Why does this matter? Because all my regression test suites run for hours. Even headless Playwright is slow. You're spinning up a full Chromium process with all the rendering overhead, compositor threads, GPU process, extension subsystems. For headless automation where nobody is looking at the screen, 90% of that machinery is dead weight.
Lightpanda claims to be dramatically faster by stripping out everything a headless agent doesn't need. No GPU compositor. No extension runtime. Minimal rendering pipeline. Just the DOM, JavaScript execution, network stack, and the bits you actually test against.
It's early. The engine doesn't support the full web platform yet. Complex SPAs and heavy JS frameworks might hit gaps. But the direction is right. The current approach of "take a browser designed for humans looking at screens and run it without the screen" is fundamentally wasteful for agent workloads. Someone was always going to build a purpose-built headless engine. Worth watching.
PinchTab and the Two Browser Problems
PinchTab is a 12MB Go binary that gives AI agents browser control via plain HTTP. Token-efficient (800 tokens per page snapshot vs thousands for screenshots), accessibility-first element references, multi-instance parallel Chrome, headless or headed. Any agent that can make HTTP calls can use it. No MCP needed.
Here's the thing I keep coming back to with tools like PinchTab. There are actually two completely different browser automation problems, and people keep conflating them.
Problem 1: The dev inner loop. You're engineering. Your agent writes code, you need it to check the result in a browser, confirm something rendered correctly, validate a flow. Speed matters here because you're waiting. The agent is waiting. The tighter this loop, the faster you ship. dev-browser, agent-bridge, PinchTab, and Playwright all serve this use case. PinchTab's token efficiency is nice here but on paper I don't see it making a massive difference to my actual bottleneck, which is Playwright being slow (see Lightpanda above).
Problem 2: Fully agentic task completion. "Go fill out this waitlist form." "Book these train tickets." "Apply to these 20 job postings." There's no human in the loop. The agent needs to complete the task end-to-end, handle CAPTCHAs, deal with unexpected modals, retry on failure. Speed is secondary. Precision and fault tolerance are everything. You'd rather it takes 30 seconds and succeeds than takes 5 seconds and fails.
For problem 2, Claude for Chrome is still the best fit I've found. It's not the fastest. It's not the most token-efficient. But it runs in your actual browser with your actual cookies and extensions, it handles the weird edge cases that trip up headless tools, and when something goes sideways it recovers better than anything else I've tried. The resilience matters more than the speed when the goal is "complete this task without me watching."
PinchTab is solving problem 1 well. Problem 2 needs a different tool.
Building Your Own Closed Loop
Here's a pattern that doesn't get enough airtime. All the tools above launch a separate browser. Playwright spins up a new Chromium instance. dev-browser does the same in a sandbox. agent-browser fires up its own headless Chrome. That's fine for testing and scraping. But for local development, where you've already got your app running in your browser with your auth cookies and your dev server hot-reloading, launching a second browser is bonkers. You're paying 2-5 seconds of cold start to get a browser that doesn't know anything about your running app.
The debug-bridge pattern flips this on its head. Instead of the agent launching a browser, you connect the agent to YOUR already-running browser. CDP over WebSocket. The agent sees your DOM, your console logs, your network requests, in real time. Zero cold start. The app is already there.
agent-bridge is a WebSocket relay that sits between your AI agent and your running webapp. The architecture is dead simple: Agent <-> CLI Server (localhost:4000) <-> Your Webapp. The agent gets a live feed of UI tree snapshots, DOM state, console output, and network activity. It can send commands back. Bidirectional, real-time, and the agent never needs to launch a browser at all. Change your code, the dev server hot-reloads, and the agent sees the result instantly. Closed loop sorted, no context switching.
OpenClaw browser relay takes a slightly different angle. It's a Chrome extension plus gateway combo. Install the extension in your browser, point it at the gateway, and now external agents can route browser control through it. Works locally or remotely through connected nodes. Handy if you've got a team setup where agents need access to browsers running on different machines.
Why does this pattern beat Playwright MCP for local dev? Let me count the ways. Zero cold start (app already running). Real-time bidirectional visibility (agent sees network requests and console errors as they happen, not after a page load). Token-cheap (structured text, not screenshots, not 15,000 tokens of tool definitions). And you control what's exposed. Bake in your auth flow, your selectors, your common patterns. The agent gets faster at YOUR app specifically.
The contrast with Playwright MCP is stark. Playwright launches a separate browser (2-5s cold start). No visibility into your running dev server. Agent works in isolation from your dev workflow. And you're burning 15,000 tokens in tool definitions before the agent has clicked a single button.
Here's my strong recommendation: in Claude Code, use /skill-creator to wrap your debug-bridge setup into a reusable skill. Codex, Copilot, and other agentic tools have their own equivalent skill creation mechanisms. The point is identical regardless of tool: codify your app's browser patterns (auth flow, common selectors, test scenarios) into something the agent can reuse without you explaining it every session. This compounds over time. The agent gets faster at YOUR app, not just at generic browsing.
Getting started:
# agent-bridge
npm install -g agent-bridge
agent-bridge start --port 4000
# Add the client snippet to your webapp's dev entry point:
# <script src="http://localhost:4000/agent-bridge-client.js"></script>
# Then build a skill with /skill-creator that wraps
# your app's common debug-bridge patterns
Bottom line: If your main use case is "agent verifies the thing I just built in my already-running app," skip the separate browser entirely. agent-bridge or a debug-bridge pattern gives you a tighter loop, faster feedback, and cheaper token costs than any of the launch-a-browser tools.
The Comparison Table
| Tool | Best For | Multi-Browser | Agent-Ready | Stealth | Speed | License |
|---|---|---|---|---|---|---|
| Playwright | Testing + Agent automation | Yes (3 engines) | Excellent (MCP) | No | Fast | Apache 2.0 |
| Chrome DevTools MCP | Debugging + Performance profiling | Chrome only | Good (29 tools / slim 3) | No | Fast | Apache 2.0 |
| dev-browser | Agent dev workflows | Chromium | Built for it | No | Fastest | MIT |
| agent-browser | Token-constrained agents | Chromium | Good (CLI) | No | Slow (LLM bound) | OSS |
| Puppeteer | Chrome perf/CDP work | Chrome only | Decent | No | Fastest (Chrome) | Apache 2.0 |
| Selenium | Legacy enterprise suites | Yes (all) | Poor | No | Slow | Apache 2.0 |
| Cypress | Developer interactive testing | Limited | Poor | No | Fast (in-browser) | MIT |
| Stagehand | AI-augmented Playwright (caching) | Chromium (CDP) | Excellent (3 skills) | No | Fast (cached: <100ms) | MIT |
| agent-bridge | Local dev closed loop | Your browser | Built for it | No | Instant (no launch) | MIT |
| Patchright | Anti-detection automation | Chromium only | Same as Playwright | Yes | Same as Playwright | Apache 2.0 |
| Scrapling | Stealth scraping at scale | Via fetcher tiers | Yes (MCP) | Yes | Varies by fetcher | MIT |
| Lightpanda | Speed-critical headless automation | Own engine | Early | No | Very fast (claimed) | AGPL-3.0 |
| PinchTab | Token-efficient agent browser control | Chrome | Yes (HTTP API) | No | Fast | MIT |
| Claude for Chrome | Agentic task completion (forms, bookings) | Chrome | Built for it | No | Slow (resilient) | Proprietary |
Decision Tree: "I Need Browser Automation"
Start here. What are you actually trying to do?
"I've got agentic systems or augmented engineering building code and I need a closed loop where the agent verifies its own work in a browser." dev-browser. This is what I use. It's faster and cheaper than Playwright MCP, sandboxed so agent-generated code can't escape, and built specifically for this workflow. The agent writes code, opens a browser, checks it works, moves on. Closed loop sorted.
"I'm building an AI agent that needs to interact with web pages more broadly." Playwright with MCP. The accessibility tree integration, auto-wait, and cross-browser support make it the right default for general browser automation.
"I need Chrome-specific performance profiling or raw CDP access." Puppeteer. It's closer to the metal and faster for Chrome-only work.
"We have 10,000 existing Selenium tests and they work." Keep them. Migrate new work to Playwright. Don't rewrite working tests for the sake of it.
"I'm writing tests interactively as a developer and I want the best DX." Cypress is still good at this. Just know it won't work for agent-driven automation.
"The website I'm automating has bot detection." Patchright if you just need Playwright minus the fingerprint. Scrapling if you need the full scraping pipeline with TLS-level stealth.
"I need to scrape at scale with proxy rotation, adaptive selectors, and anti-bot bypass." Scrapling. It's built for exactly this use case.
"I need an agent to complete a real-world task in a browser without me watching." Claude for Chrome. Precision and fault tolerance beat speed when the goal is task completion, not development speed.
"My headless test suites are too slow and I need a faster engine." Keep an eye on Lightpanda. Purpose-built headless engine in Zig. Early but the direction is right.
What's Coming in Part 2
Low-level tools are the foundation. But most people building AI agents don't write raw Playwright code. They use frameworks that wrap these tools into higher-level abstractions: Stagehand, Browser Use, AgentQL, LaVague. Some of them are brilliant. Some of them are solving problems that don't exist anymore.
Part 2 covers the frameworks and SDKs, the middle layer of the browser tools stack. We'll look at what they add on top of Playwright, whether the abstraction is worth the overhead, and which ones are actually shipping production-grade agent experiences versus which ones are demo-ware with nice landing pages.
If you want the short version: the stack is Playwright at the bottom, a framework in the middle, and an orchestration platform at the top. Part 1 was the bottom. Part 2 is the middle. Crack on to Part 2 when you're ready.