The Idea
Which AI model performs best for browser automation tasks? The LLM Arena template lets you compare AI models side by side by running the same task on every model simultaneously and ranking the results by performance.
Requirements
You'll need four API keys for full functionality:
- Browser Use (Required)
- Google Gemini (Optional but recommended)
- OpenAI (Optional but recommended)
- Anthropic (Optional but recommended)
- Get your Browser Use API Key here
- Get your Google API Key here
- Get your OpenAI API Key here
- Get your Anthropic API Key here
Note: You can run with just the Browser Use key, but for a true comparison, configure all four providers.
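Before running, it can help to confirm which providers are actually configured. The snippet below is a minimal check that assumes the conventional environment-variable names; these names are an assumption, so consult the generated template's configuration for the exact variables it reads.

```python
import os

# Assumed environment-variable names; the template's own config may differ.
PROVIDER_KEYS = {
    "Browser Use": "BROWSER_USE_API_KEY",
    "Google Gemini": "GOOGLE_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
}

for provider, env_var in PROVIDER_KEYS.items():
    status = "configured" if os.getenv(env_var) else "missing"
    print(f"{provider:15} {env_var:22} {status}")
```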
Installation
uvx browser-use init --template llm-arena
How It Works
- You enter a task description via the CLI
- The template launches parallel executions across all configured LLMs
- Each model runs the task independently, with its timing tracked
- Results are displayed with performance rankings from fastest to slowest
To ensure fair comparisons, this template uses the @sandbox() decorator to run tasks in Browser Use Sandboxes.
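The orchestration pattern is simple: fire off one asynchronous run per configured model, time each run, and sort the results. The sketch below shows that shape only; the function names, placeholder sleep, and example task are illustrative, and the real template replaces the placeholder with an actual sandboxed agent run.

```python
import asyncio
import time

# Hypothetical runner for one model; in the generated template, the body
# drives a Browser Use agent (wrapped in @sandbox() as described above)
# rather than this placeholder sleep.
async def run_model(name: str, task: str) -> tuple[str, float]:
    start = time.perf_counter()
    await asyncio.sleep(0.1)  # stand-in for the actual agent run
    return name, time.perf_counter() - start

async def arena(task: str, models: list[str]) -> None:
    # One independent, concurrent run per configured model.
    results = await asyncio.gather(*(run_model(m, task) for m in models))
    # Rank fastest to slowest, mirroring how the template presents results.
    for rank, (name, elapsed) in enumerate(sorted(results, key=lambda r: r[1]), start=1):
        print(f"{rank}. {name}: {elapsed:.2f}s")

asyncio.run(arena("Find the top story on Hacker News",
                  ["Browser Use", "Google Gemini", "OpenAI GPT", "Anthropic Claude"]))
```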
Supported Models
By default, the template compares:
- Browser Use
- Google Gemini
- OpenAI GPT
- Anthropic Claude
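Timing alone is a thin signal, so if you extend the template you may want to keep a richer record per run. A small result type like the following is one way to structure the leaderboard; the field names are hypothetical and not the template's own data model.

```python
from dataclasses import dataclass

@dataclass
class ArenaResult:
    provider: str      # e.g. "Anthropic Claude"
    elapsed_s: float   # wall-clock time for the run
    succeeded: bool    # whether the agent completed the task
    output: str = ""   # final answer or error message

def leaderboard(results: list[ArenaResult]) -> list[ArenaResult]:
    # Successful runs first, then fastest to slowest within each group.
    return sorted(results, key=lambda r: (not r.succeeded, r.elapsed_s))
```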
