# Benchmarks - Browser Use

These benchmarks compare Browser Use against other web automation frameworks and cloud browser providers on accuracy and stealth across real-world websites.

[View benchmarks page](https://browser-use.com/benchmarks)
[Cloud Platform](https://cloud.browser-use.com)

---

## Web Agent Benchmarks

### Online-Mind2Web

300 tasks across 136 live websites covering shopping, finance, travel, government, and more. All 300 tasks were used, with none removed, and every task ran on a live website. Scoring used an agentic judge built on the Claude Agent SDK and aligned with human judges. Testing was conducted in March 2026.

Benchmark source: https://github.com/OSU-NLP-Group/Online-Mind2Web
Blog post: https://browser-use.com/posts/online-mind2web-benchmark

| Agent | Accuracy |
|-------|----------|
| Browser Use Cloud (v3) | 97% |
| ABP + Opus 4.6 | 86% |
| TinyFish | 81% |
| Navigator | 78% |
| Gemini CUA | 69% |
| Stagehand (Gemini 2.5 CU) | 65% |
| OpenAI Operator | 61% |
| Sonnet 4.0 CU | 61% |
| Stagehand (Sonnet 4.5) | 55% |

### BU Bench V1

100 hand-selected tasks drawn from WebBench, Mind2Web, GAIA, BrowseComp, and 20 custom page interaction challenges. All tasks are hard but verified to be completable. Each task was run multiple times across different LLMs and agent settings. Scoring used an LLM judge (Gemini 2.5 Flash) that reached 87% alignment with 200 hand-labeled traces. Testing was conducted in January 2026 and updated in March 2026.

Benchmark source: https://github.com/browser-use/benchmark
Blog post: https://browser-use.com/posts/ai-browser-agent-benchmark

| Model | Accuracy |
|-------|----------|
| Browser Use Cloud (bu-ultra) | 78.0% |
| OSS + ChatBrowserUse-2 | 63.3% |
| claude-opus-4-6 | 62.0% |
| gemini-3-1-pro | 59.3% |
| claude-sonnet-4-6 | 59.0% |
| gpt-5 | 52.4% |
| gpt-5-mini | 37.0% |
| gemini-2.5-flash | 35.2% |
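
The judge alignment figure quoted for BU Bench V1 can be read as an agreement rate between the LLM judge's verdicts and the hand-assigned labels. The sketch below assumes that reading (the source does not define "alignment" precisely), and the function name `judge_alignment` and the toy labels are hypothetical, not taken from the benchmark repository.

```python
from typing import Sequence


def judge_alignment(judge: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of traces where the judge verdict equals the human label.

    Assumption: "alignment" means simple per-trace agreement.
    """
    if len(judge) != len(human):
        raise ValueError("judge and human label lists must have the same length")
    if not judge:
        raise ValueError("at least one labeled trace is required")
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)


# Hypothetical toy labels (True = task judged successful), not benchmark data.
judge_labels = [True, True, False, True, False, False, True, True]
human_labels = [True, False, False, True, False, True, True, True]

print(f"alignment: {judge_alignment(judge_labels, human_labels):.1%}")
```

With 200 hand-labeled traces, an agreement rate of 87% under this definition would correspond to the judge matching the human label on 174 traces.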

---

## Stealth Benchmarks

### BrowserBench

BrowserBench is a third-party benchmark created by Halluminate, with 296 tasks across antibot-protected sites. It includes lower-security sites, so scores are generally higher than on the BU Stealth Benchmark. Browser Use ran BrowserBench against all providers listed. Scoring is pass/fail, based on whether the agent was blocked by antibot protection.

Benchmark source: https://github.com/Halluminate/browserbench
Blog post: https://browser-use.com/posts/stealth-benchmark

| Provider | Success Rate |
|----------|-------------|
| Browser Use Cloud | 84.8% |
| Hyperbrowser | 76.4% |
| Anchor | 76.0% |
| Steel | 73.3% |
| Browserbase | 70.3% |
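
Because BrowserBench scoring is pass/fail over 296 tasks, each success rate can be converted into an approximate number of unblocked tasks. The sketch below assumes each reported rate comes from a single pass over all 296 tasks, which the source does not state; if results were averaged over repeated runs, the counts are only indicative. The helper name `implied_passes` is hypothetical.

```python
TOTAL_TASKS = 296  # task count stated for BrowserBench

# Success rates (percent) copied from the table above.
reported = {
    "Browser Use Cloud": 84.8,
    "Hyperbrowser": 76.4,
    "Anchor": 76.0,
    "Steel": 73.3,
    "Browserbase": 70.3,
}


def implied_passes(rate_percent: float, total: int) -> int:
    """Nearest whole number of passed tasks consistent with a reported rate."""
    return round(rate_percent / 100 * total)


for provider, rate in reported.items():
    passes = implied_passes(rate, TOTAL_TASKS)
    print(f"{provider}: roughly {passes} of {TOTAL_TASKS} tasks not blocked")
```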

### BU Stealth Benchmark

Built from 300,000 real production security check events, this benchmark covers 71 high-security websites protected by Cloudflare, PerimeterX, DataDome, Akamai, reCAPTCHA, and others. Each site has a simple 3-step task; if the task fails, the browser was blocked. Each provider was tested multiple times with the same agent (bu-2-0) and model. An LLM judge (gemini-2.5-flash) scored whether the agent was blocked, and page load failures count as blocks. Controls: headless Chromium scored 2% and headful Chromium scored 50%.

Benchmark source: https://github.com/browser-use/benchmark
Blog post: https://browser-use.com/posts/stealth-benchmark

| Provider | Bypass Rate |
|----------|------------|
| Browser Use Cloud | 81% |
| Anchor | 77% |
| Onkernel | 67% |
| Steel | 47% |
| Browserbase | 42% |
| Hyperbrowser | 40% |
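
The scoring rule described for the BU Stealth Benchmark, where a page load failure counts as a block, can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the `Outcome` enum, the function names, and the sample runs are hypothetical, and averaging the rate across repeated runs is an assumption, since the source says only that each provider was tested multiple times.

```python
from enum import Enum
from statistics import mean
from typing import Sequence


class Outcome(Enum):
    PASSED = "passed"            # task completed, browser was not blocked
    BLOCKED = "blocked"          # antibot protection stopped the agent
    LOAD_FAILED = "load_failed"  # page never loaded; scored as a block


def bypass_rate(outcomes: Sequence[Outcome]) -> float:
    """Share of sites passed in one run; anything other than PASSED is a block."""
    if not outcomes:
        raise ValueError("a run must contain at least one site outcome")
    passed = sum(1 for o in outcomes if o is Outcome.PASSED)
    return passed / len(outcomes)


def mean_bypass_rate(runs: Sequence[Sequence[Outcome]]) -> float:
    """Average bypass rate over repeated runs (assumed aggregation)."""
    return mean(bypass_rate(run) for run in runs)


# Hypothetical outcomes for two runs over four sites, not benchmark data.
run_1 = [Outcome.PASSED, Outcome.BLOCKED, Outcome.LOAD_FAILED, Outcome.PASSED]
run_2 = [Outcome.PASSED, Outcome.PASSED, Outcome.BLOCKED, Outcome.PASSED]

print(f"bypass rate: {mean_bypass_rate([run_1, run_2]):.1%}")
```

Treating load failures as blocks keeps the metric conservative: a provider cannot raise its score by timing out on hard sites.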