Review: Measuring AI Ability to Complete Long Software Tasks
Summary
An analysis of the arXiv paper 'Measuring AI Ability to Complete Long Software Tasks', focusing on its 'time horizon' metric: the length of tasks, measured by how long they take a human to complete, that an AI model can solve at a given success rate such as 50%. The post notes that AI time horizons have doubled roughly every seven months, and discusses potential biases in these estimates as well as their implications for software engineering and AI tooling.
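To make the two ideas above concrete, here is a minimal Python sketch. It assumes that success probability is modeled as a logistic function of log2 human completion time; the parameter names alpha and beta, the function names, and the example values are hypothetical illustrations, not figures taken from the paper. The first function solves for the task length at which the modeled success probability equals p. The second extrapolates a horizon forward under a constant seven-month doubling time.

```python
import math


def time_horizon(alpha: float, beta: float, p: float = 0.5) -> float:
    """Human task length (minutes) at which modeled AI success probability equals p.

    Hypothetical parameterization (an assumption, not the paper's code):
        P(success | t) = sigmoid(alpha - beta * log2(t)),
    where t is the human completion time in minutes and beta > 0.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Invert the sigmoid: alpha - beta * log2(t) = logit(p).
    logit_p = math.log(p / (1.0 - p))
    return 2.0 ** ((alpha - logit_p) / beta)


def projected_horizon(h0: float, months: float, doubling_months: float = 7.0) -> float:
    """Extrapolate a horizon h0 forward, assuming a constant doubling time."""
    return h0 * 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Illustrative values only: alpha=3, beta=1 gives a 50% horizon of 2**3 = 8 minutes.
    print(time_horizon(3.0, 1.0))
    # A stricter success threshold yields a shorter horizon for the same model.
    print(time_horizon(3.0, 1.0, p=0.8))
    # Two doubling periods (14 months) multiply a 60-minute horizon by 4.
    print(projected_horizon(60.0, 14.0))
```

The sketch also shows why the success rate matters when quoting a horizon: raising p from 50% to 80% shrinks the horizon for the same fitted curve, so horizons reported at different thresholds are not directly comparable.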