TEMPLATE_ID: LLM-ARENA

LLM Arena

Compare AI models by executing identical tasks simultaneously.

CREATED_BY

Browser Use Team

APPLICABLE_PROTOCOLS

Testing & Benchmarking

CONFIGURATION

model: Multiple

DOCUMENTATION

The Idea

Which AI model performs best for browser automation? The LLM Arena template answers this by running the same task across several AI models simultaneously and ranking the results by performance, giving you a direct side-by-side comparison.

Requirements

You'll need four API keys for full functionality:

  • Browser Use (Required)
  • Google Gemini (Optional but recommended)
  • OpenAI (Optional but recommended)
  • Anthropic (Optional but recommended)

Note: You can run with just the Browser Use key, but configure all four providers for a true comparison.
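
A quick preflight check avoids a half-configured run. The environment variable names below follow each provider's usual convention; BROWSER_USE_API_KEY in particular is an assumption, so verify the names against the generated template.

  import os

  # Conventional key names; confirm them against the generated template's config.
  KEYS = {
      "BROWSER_USE_API_KEY": "required",
      "GOOGLE_API_KEY": "optional (Gemini)",
      "OPENAI_API_KEY": "optional (GPT)",
      "ANTHROPIC_API_KEY": "optional (Claude)",
  }

  for name, role in KEYS.items():
      print(f"{name:22} {role:20} {'set' if os.getenv(name) else 'MISSING'}")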

Installation

uvx browser-use init --template llm-arena

How It Works

  1. You enter a task description via the CLI
  2. The template launches parallel executions across all configured LLMs
  3. Each model works through the task independently while its run time is tracked
  4. Results are displayed with performance rankings, from fastest to slowest

To ensure fair comparisons, this template uses the @sandbox() decorator to run tasks in Browser Use Sandboxes.
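
Below is a minimal sketch of that race loop, assuming the browser_use Python SDK's Agent and chat-model wrappers; the lineup and model IDs are placeholders rather than the template's actual source, and the @sandbox() isolation is omitted for brevity.

  import asyncio
  import time

  from browser_use import Agent, ChatAnthropic, ChatGoogle, ChatOpenAI

  # Illustrative lineup; the template's real defaults and model IDs may differ.
  CONTENDERS = {
      "OpenAI GPT": ChatOpenAI(model="gpt-4o"),
      "Anthropic Claude": ChatAnthropic(model="claude-3-5-sonnet-20241022"),
      "Google Gemini": ChatGoogle(model="gemini-2.0-flash"),
  }

  async def run_one(task: str, name: str, llm) -> tuple[str, float]:
      # Each contender gets its own Agent, so runs share no browser state.
      start = time.monotonic()
      await Agent(task=task, llm=llm).run()
      return name, time.monotonic() - start

  async def race(task: str) -> None:
      # Launch every configured model on the same task at the same time.
      results = await asyncio.gather(*(run_one(task, n, m) for n, m in CONTENDERS.items()))
      # Rank by wall-clock time, fastest first.
      for rank, (name, secs) in enumerate(sorted(results, key=lambda r: r[1]), start=1):
          print(f"{rank}. {name}  {secs:.1f}s")

  if __name__ == "__main__":
      asyncio.run(race(input("Task: ")))

Timing each run with time.monotonic() keeps the ranking immune to system clock adjustments, and asyncio.gather ensures no model gets a head start.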

Supported Models

By default, the template compares:

  • Browser Use
  • Google Gemini
  • OpenAI GPT
  • Anthropic Claude
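
Since the lineup in the sketch above is just a mapping from label to chat model, swapping contenders is a one-line change. ChatOllama here is an assumption about the wrappers available in your browser_use version, and the model ID is likewise a placeholder.

  from browser_use import ChatOllama  # assumed wrapper; check your browser_use version

  # Hypothetical: pit a locally hosted model against the cloud defaults.
  CONTENDERS["Local Llama"] = ChatOllama(model="llama3.1:8b")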