Why SWE-bench Verified no longer measures frontier coding capabilities
Summary
OpenAI explains why SWE-bench Verified no longer measures frontier coding capabilities. The article describes a shift in evaluation criteria and discusses what that shift means for developers who rely on automated coding benchmarks and AI-assisted tools. It suggests a broader redefinition of what counts as frontier coding performance in real-world software work.