N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?

April 13, 2026 at 21:54

Quality: 8/10 Relevance: 9/10

Summary

N-Day-Bench evaluates how well frontier language models can identify real-world vulnerabilities (N-Days) that were disclosed after each model's knowledge cut-off, using a standardized harness and test cases that are updated monthly. The benchmark publishes traces and a leaderboard, which highlight differences between models in vulnerability discovery and offer insights for AI-assisted security tooling. The results give a snapshot of current model capabilities, which is relevant to security teams evaluating AI-based code review and vulnerability detection.
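
The selection rule described in the summary (only vulnerabilities disclosed after a model's knowledge cut-off count) can be illustrated with a minimal sketch. This is not the benchmark's actual harness: the `Case` type, the `eligible_cases` function, the case IDs, and all dates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Case:
    """Hypothetical test case: an identifier plus its public disclosure date."""
    case_id: str
    disclosed: date


def eligible_cases(cases, knowledge_cutoff):
    """Keep only cases disclosed strictly after the model's knowledge cut-off."""
    return [c for c in cases if c.disclosed > knowledge_cutoff]


# Made-up cases and cut-off date, for illustration only.
cases = [
    Case("case-a", date(2025, 11, 2)),
    Case("case-b", date(2026, 2, 14)),
    Case("case-c", date(2026, 3, 30)),
]
cutoff = date(2026, 1, 1)  # assumed cut-off for an imaginary model

selected = [c.case_id for c in eligible_cases(cases, cutoff)]
print(selected)
```

Filtering per model rather than once globally matters because models have different cut-offs, so the set of cases a model could not have seen during training differs from model to model.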
