LLM Arena

Compare AI models by executing identical tasks simultaneously.

The Idea

Which AI model performs best for browser automation tasks? The LLM Arena template answers this with a side-by-side comparison: it runs the same task across multiple AI models simultaneously and ranks the results by performance.

Requirements

You'll need four API keys for full functionality:

  • Browser Use API key (required)
  • Google Gemini API key (optional but recommended)
  • OpenAI API key (optional but recommended)
  • Anthropic API key (optional but recommended)

Note: You can run with just a Browser Use key, but for a true comparison, configure all four providers.

Installation

uvx browser-use init --template llm-arena

How It Works

  1. You enter a task description via the CLI
  2. The template launches parallel executions across all configured LLMs
  3. Each model runs the task independently while its execution time is tracked
  4. Results are displayed with performance rankings, from fastest to slowest
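The flow above can be sketched as a small asyncio loop. This is an illustrative sketch only: the model names and the `run_task` stub are placeholders, not the template's actual API.

```python
# Sketch: run the same task against several model runners concurrently
# and rank them by wall-clock time. `run_task` stands in for a real
# browser-automation call; the MODELS list is hypothetical.
import asyncio
import time

MODELS = ["browser-use", "gemini", "gpt", "claude"]

async def run_task(model: str, task: str) -> str:
    # Placeholder for driving a browser agent backed by `model`.
    await asyncio.sleep(0.01)
    return f"{model} finished: {task}"

async def arena(task: str) -> list[tuple[str, float, str]]:
    async def timed(model: str) -> tuple[str, float, str]:
        start = time.perf_counter()
        result = await run_task(model, task)
        return model, time.perf_counter() - start, result

    # Launch one execution per configured model simultaneously.
    results = await asyncio.gather(*(timed(m) for m in MODELS))
    # Rank from fastest to slowest.
    return sorted(results, key=lambda r: r[1])
```

A caller would then print the leaderboard, e.g. `for rank, (model, elapsed, _) in enumerate(asyncio.run(arena("summarize the docs page")), 1): ...`.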

To ensure fair comparisons, this template uses the @sandbox() decorator to run tasks in Browser Use Sandboxes.
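For intuition only, a decorator of that general shape could look like the sketch below. Everything here is hypothetical: the real `@sandbox()` decorator in the Browser Use SDK provisions an isolated remote browser, which this stand-in does not attempt.

```python
# Hypothetical stand-in for a sandbox decorator -- NOT the Browser Use
# implementation. It only marks and wraps the decorated function.
import functools

def sandbox(**config):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            # A real implementation would create the isolated browser
            # session here and tear it down afterwards.
            return fn(*args, **kwargs)
        inner.sandboxed = True  # marker for illustration
        return inner
    return wrap

@sandbox()
def run_in_isolation(task: str) -> str:
    return f"ran: {task}"
```

The point of the pattern is that each model's task runs in its own isolated environment, so no run can interfere with another's state.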

Supported Models

By default, the template compares:

  • Browser Use
  • Google Gemini
  • OpenAI GPT
  • Anthropic Claude

Learn More

To learn more about Browser Use, check out the following resources: