Show HN: Cua-Bench – a benchmark for AI agents in GUI environments
Summary
Cua-Bench is a benchmark suite within the open-source Cua platform for evaluating AI agents that interact with graphical user interfaces. It provides RL environments (OSWorld, ScreenSpot, Windows Arena) and can export agent trajectories for use as training data. The repository includes setup and run instructions and architecture diagrams, and it is released under the permissive MIT license.
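To make "exporting trajectories for training" concrete, here is a minimal Python sketch of what such an export could look like. It is not Cua-Bench's actual API or file format: the `Step` record, its fields, and the `export_trajectory` / `load_trajectory` helpers are hypothetical names chosen for illustration, and JSON Lines is just one plausible serialization.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Step:
    """One agent step in a GUI episode (hypothetical schema, not Cua-Bench's)."""
    observation: str  # e.g. a path to the screenshot the agent saw
    action: str       # e.g. a textual description of the GUI action taken
    reward: float     # reward assigned by the environment for this step


def export_trajectory(steps: List[Step]) -> str:
    """Serialize one episode as JSON Lines: one JSON object per step."""
    return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in steps)


def load_trajectory(text: str) -> List[Step]:
    """Parse JSON Lines text back into Step records, skipping blank lines."""
    return [Step(**json.loads(line)) for line in text.splitlines() if line]


# A made-up two-step episode, used only to exercise the helpers.
episode = [
    Step("screenshot_000.png", "click(120, 48)", 0.0),
    Step("screenshot_001.png", "type('hello')", 1.0),
]
exported = export_trajectory(episode)
restored = load_trajectory(exported)
```

One record per line keeps episodes easy to stream, append, and filter before feeding them to a training pipeline; the format the project really uses should be taken from its own documentation.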