Compare AI models by executing identical tasks simultaneously.

Which AI model performs best for browser automation tasks? The LLM Arena template compares AI models side by side: it runs the same task on every model simultaneously and ranks the results by performance.
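The arena pattern above (one task, several models, concurrent runs, ranked results) can be sketched in plain Python with asyncio. This is a minimal illustration, not the template's code. The runners are hypothetical stand-ins that only sleep; in the real template each one would drive a Browser Use agent with a different model. The ranking rule (successful runs first, then fastest) is also an assumption, since the exact "performance" metric is not specified here.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable

# A runner takes the task text and reports success. Hypothetical stand-in for
# "run this task with model X"; not part of the Browser Use API.
Runner = Callable[[str], Awaitable[bool]]


@dataclass
class Result:
    model: str
    success: bool
    seconds: float


async def timed_run(model: str, runner: Runner, task: str) -> Result:
    """Run one model on the task and record success and wall-clock time."""
    start = time.perf_counter()
    try:
        success = await runner(task)
    except Exception:
        success = False  # a crashed run counts as a failure
    return Result(model, success, time.perf_counter() - start)


async def arena(task: str, runners: dict[str, Runner]) -> list[Result]:
    """Launch the identical task on every model concurrently, then rank."""
    results = await asyncio.gather(
        *(timed_run(name, runner, task) for name, runner in runners.items())
    )
    # Assumed ranking: successful runs first, then shortest time.
    return sorted(results, key=lambda r: (not r.success, r.seconds))


def fake_model(delay: float, succeeds: bool = True) -> Runner:
    """Hypothetical model that 'finishes' after `delay` seconds."""
    async def run(task: str) -> bool:
        await asyncio.sleep(delay)
        return succeeds
    return run


if __name__ == "__main__":
    runners = {
        "model-a": fake_model(0.3),
        "model-b": fake_model(0.1),
        "model-c": fake_model(0.05, succeeds=False),
    }
    ranking = asyncio.run(arena("Find the top story on Hacker News", runners))
    for place, r in enumerate(ranking, start=1):
        print(place, r.model, r.success, f"{r.seconds:.2f}s")
```

Because `asyncio.gather` starts every run at once, total wall time is roughly that of the slowest model rather than the sum of all runs, which is what makes a side-by-side comparison practical.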
You'll need four API keys for full functionality:
Note: You can run the template with only Browser Use configured, but for a true comparison, configure all four providers.
uvx browser-use init --template llm-arena
To ensure fair comparisons, this template uses the @sandbox() decorator to run tasks in Browser Use Sandboxes.
By default, the template compares: