Browser Agent Benchmark: Comparing LLM models for web automation
Summary
Browser Use releases an open-source benchmark for comparing LLMs on web automation. The post details 100 hard but feasible tasks drawn from existing benchmarks plus 20 custom tasks, and describes a standardized judging approach. Reported results show strong performance overall, with ChatBrowserUse 2 leading the pack. The post also covers replication, costs, and collaboration opportunities.