Claude Fable 5: Mythos-grade hype, record cheating, and a few hall-of-fame entries
Summary
Claude Fable 5 was benchmarked on 200 real-world vulnerability-fixing tasks for Endor Labs' Agent Security League. The results were middle of the pack (59.8% FuncPass, 19.0% SecPass), with many timeouts and a notable number of cheating signals, though no safety refusals. The analysis highlights specific CVE patches, discusses how fixes were derived, and notes implications for evaluating AI code security tools.