Free | January 15, 2025 | 2 min read

browser-use: Give Your LLM a Browser and Watch It Go

browser-use is a Python framework that gives LLMs agentic browser control via a DOM-first perceive-act loop. It works with Claude, GPT-4, Gemini, and local models, scores 89.1% on the WebVoyager benchmark, and has 85k+ GitHub stars.

The Browser Problem

Right, so here's the thing. We've had browser automation for years. Selenium, Playwright, Puppeteer. All brilliant if you know exactly what you want to click and where it lives on the page. But what if you want an LLM to just... use a browser? Like a person would? Navigate around, read what's on the screen, figure out the next step, and do it?

That's what browser-use does. It's a Python framework that gives your LLM an agentic perceive-act loop for controlling a browser. The model looks at the page (DOM-first, not screenshots), decides what to do, executes the action, then looks at the result and decides the next thing. Rinse and repeat until the task is done.
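To make that loop concrete, here's a minimal sketch in plain Python. It is not browser-use's actual code: `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are all hypothetical names made up for illustration, and the "model" is a hard-coded function standing in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # e.g. "goto" or "done" (hypothetical action names)
    target: str = ""


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser: just a URL and an action log."""
    url: str = "about:blank"
    log: List[Action] = field(default_factory=list)

    def observe(self) -> str:
        # A real agent would return a text rendering of the page here.
        return f"url={self.url}"

    def execute(self, action: Action) -> None:
        self.log.append(action)
        if action.kind == "goto":
            self.url = action.target


def run_agent(browser: FakeBrowser,
              decide: Callable[[str, str], Action],
              task: str,
              max_steps: int = 10) -> int:
    """Perceive, decide, act, repeat. Returns how many times `decide` was called."""
    for step in range(1, max_steps + 1):
        observation = browser.observe()      # perceive
        action = decide(task, observation)   # one "LLM call" per step
        if action.kind == "done":
            return step
        browser.execute(action)              # act
    return max_steps


def scripted_policy(task: str, observation: str) -> Action:
    """Stands in for the model: go to example.com, then declare the task done."""
    if "example.com" not in observation:
        return Action("goto", "https://example.com")
    return Action("done")


browser = FakeBrowser()
calls = run_agent(browser, scripted_policy, "open example.com")
print(calls, browser.url)
```

Swap `scripted_policy` for a function that actually calls a model and `FakeBrowser` for something that drives a real page, and you have the rough shape of the thing. The `max_steps` cap is there so a confused model can't loop forever.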

How It Works

The approach is DOM-first. Instead of taking a screenshot and feeding pixels to a vision model, browser-use extracts the DOM structure and presents it as text the model can reason over. This keeps things fast and token-efficient, and it means even non-vision models can drive a browser.
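Here's a rough illustration of the idea using only the Python standard library. To be clear, this is not how browser-use serialises the DOM internally; the `InteractiveElements` class, the `dom_to_text` helper, and the `[index] <tag>` output format are assumptions made up for this sketch. The point is just that a page can be boiled down to a short, numbered list of things the model could interact with.

```python
from html.parser import HTMLParser

# Tags treated as "interactive" in this sketch (an assumption, not a spec).
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
KEPT_ATTRS = ("href", "name", "type", "placeholder")


class InteractiveElements(HTMLParser):
    """Collects interactive elements with a few attributes and their text."""

    def __init__(self):
        super().__init__()
        self.elements = []   # each entry: [tag, attribute text, inner text]
        self._open = None    # index of the element currently collecting text

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE:
            attr_text = " ".join(
                f'{k}="{v}"' for k, v in attrs if k in KEPT_ATTRS and v
            )
            self.elements.append([tag, attr_text, ""])
            # <input> is a void element, so it never has inner text.
            self._open = None if tag == "input" else len(self.elements) - 1

    def handle_data(self, data):
        if self._open is not None:
            self.elements[self._open][2] += data.strip()

    def handle_endtag(self, tag):
        if tag in INTERACTIVE:
            self._open = None


def dom_to_text(html: str) -> str:
    """Render the interactive elements of `html` as numbered text lines."""
    parser = InteractiveElements()
    parser.feed(html)
    parser.close()
    lines = []
    for i, (tag, attrs, text) in enumerate(parser.elements):
        opening = f"<{tag} {attrs}>" if attrs else f"<{tag}>"
        lines.append(f"[{i}] {opening}{text}")
    return "\n".join(lines)


SAMPLE = (
    '<div><h1>Shop</h1><a href="/login">Log in</a>'
    '<input type="text" name="q" placeholder="Search">'
    "<button>Go</button></div>"
)
print(dom_to_text(SAMPLE))
```

With a representation like this, the model can answer with something like "click element 2" instead of guessing pixel coordinates, which is why a text-only model can play too.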

It scored 89.1% on the WebVoyager benchmark, which is a proper test of whether an agent can actually complete real web tasks. Not synthetic toy problems. Real sites, real workflows.

Works with Claude, GPT-4, Gemini, and local models. You pick your LLM, browser-use handles the rest. The tradeoff is that every single step needs an LLM call. Navigate to page? LLM call. Find the button? LLM call. Click it? LLM call. Read the result? LLM call. For complex multi-step tasks, your API bill starts adding up properly. Something to keep in mind when you're planning workflows.
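If you want a feel for the bill before you run anything, a back-of-the-envelope estimate is easy enough. Every number below is a placeholder assumption, not a measured figure or a real price list: plug in your own step count, token sizes, and your provider's per-million-token rates. `estimate_cost` is a hypothetical helper, not part of browser-use.

```python
def estimate_cost(steps: int,
                  input_tokens_per_step: int,
                  output_tokens_per_step: int,
                  usd_per_m_input: float,
                  usd_per_m_output: float) -> float:
    """Estimate the USD cost of a task where every step is one LLM call."""
    input_cost = steps * input_tokens_per_step * usd_per_m_input / 1_000_000
    output_cost = steps * output_tokens_per_step * usd_per_m_output / 1_000_000
    return input_cost + output_cost


# Placeholder assumptions: a 20-step task, 5,000 tokens of page text per step,
# 200 tokens of model output per step, $3 / $15 per million tokens in / out.
cost = estimate_cost(
    steps=20,
    input_tokens_per_step=5_000,
    output_tokens_per_step=200,
    usd_per_m_input=3.0,
    usd_per_m_output=15.0,
)
print(f"${cost:.2f} per run")
```

Pennies for a one-off, sure. Multiply by a few thousand runs a day and it stops being pennies. Notice too that input tokens dominate in this example, since the page text gets sent on every step.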

The project blew up to 85k+ stars on GitHub, and for good reason. It just works for a surprisingly wide range of tasks.

Getting Started

Dead simple:

pip install browser-use

There's also an official skill for Claude Code:

curl --create-dirs -o ~/.claude/skills/browser-use/SKILL.md https://raw.githubusercontent.com/browser-use/browser-use/main/skills/browser-use/SKILL.md

Point it at a task, let it rip, and try not to wince at the API costs.

Steven Gonsalvez
