How we built a real-world benchmark for AI code review
Summary
Qodo introduces Code Review Benchmark 1.0, a scalable methodology that injects defects into real, merged pull requests (PRs) to evaluate AI-powered code review on both bug detection and code quality. The post details the methodology and the evaluation setup across 7 tools. It also shows that Qodo, evaluated under its Precise and Exhaustive configurations, achieves the best recall with competitive precision. The benchmark aims to provide a more realistic, enterprise-relevant evaluation framework for AI code review tools.
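The sketch below makes the two headline metrics concrete by showing one way recall and precision could be scored when the ground truth is a set of injected defects. The `Finding` record, the `score` function, and the matching rule are all hypothetical illustrations, not the benchmark's actual procedure. Under that rule, a report matches a defect when it is in the same file and within `tolerance` lines of it. The sketch also simplifies by counting every unmatched report as a false positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A flagged location: either an injected defect or a tool's report.
    Hypothetical schema used only for this illustration."""
    file: str
    line: int


def score(injected: set[Finding], reported: list[Finding],
          tolerance: int = 0) -> tuple[float, float]:
    """Return (precision, recall) for one PR.

    Assumption: a report matches an injected defect if it is in the same
    file and within `tolerance` lines of it. Every unmatched report is
    treated as a false positive (a simplification).
    """
    def matches(r: Finding, d: Finding) -> bool:
        return r.file == d.file and abs(r.line - d.line) <= tolerance

    # Precision numerator: reports that land on some injected defect.
    true_positive_reports = sum(
        1 for r in reported if any(matches(r, d) for d in injected)
    )
    # Recall numerator: injected defects hit by at least one report.
    detected = sum(
        1 for d in injected if any(matches(r, d) for r in reported)
    )

    precision = true_positive_reports / len(reported) if reported else 0.0
    recall = detected / len(injected) if injected else 0.0
    return precision, recall


if __name__ == "__main__":
    injected = {Finding("app.py", 10), Finding("db.py", 42)}
    reported = [Finding("app.py", 11), Finding("util.py", 5)]
    print(score(injected, reported, tolerance=1))  # (0.5, 0.5)
```

Reporting more findings tends to raise recall at the cost of precision. That is one plausible way to read the distinction between a Precise and an Exhaustive configuration.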