browser-use: Give Your LLM a Browser and Watch It Go
Python framework that lets LLMs control browsers through a perceive-act loop. DOM-first approach, works with Claude, GPT, Gemini, and local models. 89% on WebVoyager.
The Browser Problem
Right, so here's the thing. We've had browser automation for years. Selenium, Playwright, Puppeteer. All brilliant if you know exactly what you want to click and where it lives on the page. But what if you want an LLM to just... use a browser? Like a person would? Navigate around, read what's on the screen, figure out the next step, and do it?
That's what browser-use does. It's a Python framework that gives your LLM an agentic perceive-act loop for controlling a browser. The model looks at the page (DOM-first, not screenshots), decides what to do, executes the action, then looks at the result and decides the next thing. Rinse and repeat until the task is done.
How It Works
The approach is DOM-first. Instead of taking a screenshot and feeding pixels to a vision model, browser-use extracts the DOM structure and presents it as text the model can reason over. This keeps things fast and token-efficient, and it means even non-vision models can drive a browser.
It scored 89.1% on the WebVoyager benchmark, which is a proper test of whether an agent can actually complete real web tasks. Not synthetic toy problems. Real sites, real workflows.
Works with Claude, GPT-4, Gemini, and local models. You pick your LLM, browser-use handles the rest. The tradeoff is that every single step needs an LLM call. Navigate to page? LLM call. Find the button? LLM call. Click it? LLM call. Read the result? LLM call. For complex multi-step tasks, your API bill starts adding up properly. Something to keep in mind when you're planning workflows.
The project blew up to 85k+ stars on GitHub, and for good reason. It just works for a surprisingly wide range of tasks.
Getting Started
Dead simple:
pip install browser-use
There's also an official skill for Claude Code:
curl -o ~/.claude/skills/browser-use/SKILL.md https://raw.githubusercontent.com/browser-use/browser-use/main/skills/browser-use/SKILL.md
Point it at a task, let it rip, and try not to wince at the API costs.