LLM Arena

Compare AI models by executing identical tasks simultaneously.

The Idea

Which AI model performs best for browser automation tasks? The LLM Arena template answers this with a side-by-side comparison: it runs the same task across multiple AI models simultaneously and ranks the results by performance.

Requirements

You'll need four API keys for full functionality:

  • Browser Use API key (required)
  • Google Gemini API key (optional but recommended)
  • OpenAI API key (optional but recommended)
  • Anthropic API key (optional but recommended)

Note: You can run with just a Browser Use key, but for a true comparison, configure all four providers.

Installation

uvx browser-use init --template llm-arena

How It Works

  1. You enter a task description via the CLI
  2. The template launches parallel executions across all configured LLMs
  3. Each model runs the task independently while its execution time is tracked
  4. Results are displayed with performance rankings, from fastest to slowest
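The flow above can be sketched as a small asyncio loop. This is an illustrative sketch only: the model names and the `run_task` stub are placeholders, not the template's actual API.

```python
# Sketch: run the same task against several model runners concurrently
# and rank them by wall-clock time. `run_task` stands in for a real
# browser-automation call; the MODELS list is hypothetical.
import asyncio
import time

MODELS = ["browser-use", "gemini", "gpt", "claude"]

async def run_task(model: str, task: str) -> str:
    # Placeholder for driving a browser agent backed by `model`.
    await asyncio.sleep(0.01)
    return f"{model} finished: {task}"

async def arena(task: str) -> list[tuple[str, float, str]]:
    async def timed(model: str) -> tuple[str, float, str]:
        start = time.perf_counter()
        result = await run_task(model, task)
        return model, time.perf_counter() - start, result

    # Launch one execution per configured model simultaneously.
    results = await asyncio.gather(*(timed(m) for m in MODELS))
    # Rank from fastest to slowest.
    return sorted(results, key=lambda r: r[1])
```

A caller would then print the leaderboard, e.g. `for rank, (model, elapsed, _) in enumerate(asyncio.run(arena("summarize the docs page")), 1): ...`.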

To ensure fair comparisons, this template uses the @sandbox() decorator to run tasks in Browser Use Sandboxes.
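For intuition only, a decorator of that general shape could look like the sketch below. Everything here is hypothetical: the real `@sandbox()` decorator in the Browser Use SDK provisions an isolated remote browser, which this stand-in does not attempt.

```python
# Hypothetical stand-in for a sandbox decorator -- NOT the Browser Use
# implementation. It only marks and wraps the decorated function.
import functools

def sandbox(**config):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            # A real implementation would create the isolated browser
            # session here and tear it down afterwards.
            return fn(*args, **kwargs)
        inner.sandboxed = True  # marker for illustration
        return inner
    return wrap

@sandbox()
def run_in_isolation(task: str) -> str:
    return f"ran: {task}"
```

The point of the pattern is that each model's task runs in its own isolated environment, so no run can interfere with another's state.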

Supported Models

By default, the template compares:

  • Browser Use
  • Google Gemini
  • OpenAI GPT
  • Anthropic Claude

Learn More

To learn more about Browser Use, check out the following resources: