ARC-AGI-3
Summary
ARC-AGI-3 is an interactive reasoning benchmark designed to measure human-like intelligence in AI agents. Agents must explore environments, acquire goals on the fly, build adaptable world models, and learn continuously. Scoring is based on long-horizon planning and experience-driven adaptation, and every environment is 100% human-solvable. The platform offers replayable runs, a developer toolkit, and a UI for transparent evaluation, along with documentation and technical papers.
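To make the explore, model, and plan cycle described above concrete, here is a minimal sketch on a toy problem. Everything in it is a hypothetical illustration and not the ARC-AGI-3 API or its scoring: `ToyEnv`, `explore`, and `plan` are invented names, and the corridor environment is far simpler than the benchmark's games. The agent starts without knowing what its actions do or where the goal is, records observed transitions as a world model, and then plans a route once the goal has been seen.

```python
from collections import deque


class ToyEnv:
    """Hypothetical 1-D corridor used only for illustration (not the ARC-AGI-3 API)."""

    def __init__(self, size=5, goal=4):
        self.size, self.goal, self.pos = size, goal, 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Hidden from the agent: action 1 moves right, any other action moves left.
        delta = 1 if action == 1 else -1
        self.pos = min(self.size - 1, max(0, self.pos + delta))
        return self.pos, self.pos == self.goal


def plan(model, start, is_target):
    """Breadth-first search over learned transitions; returns a list of actions or None."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if is_target(state):
            return path
        for (s, a), nxt in model.items():
            if s == state and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [a]))
    return None


def explore(env, actions=(0, 1), max_steps=100):
    """Try every untried action, recording transitions until the goal is observed."""
    model, goal = {}, None
    state = env.reset()

    def has_untried(s):
        return any((s, a) not in model for a in actions)

    for _ in range(max_steps):
        path = plan(model, state, has_untried)
        if path is None:
            break  # every reachable state-action pair has been tried
        if path:
            action = path[0]  # walk toward the nearest state with an untried action
        else:
            action = next(a for a in actions if (state, a) not in model)
        nxt, reached = env.step(action)
        model[(state, action)] = nxt
        if reached:
            goal = nxt  # the goal is only discovered by reaching it
            break
        state = nxt
    return model, goal


model, goal = explore(ToyEnv())
route = plan(model, 0, lambda s: s == goal)
print(goal, route)  # 4 [1, 1, 1, 1]
```

The same `plan` function serves both phases: during exploration it targets the nearest state with an untried action, and afterwards it targets the discovered goal. A real agent would need a much richer state representation than a single integer position.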