Why eval startups fail (2025)
Summary
The post analyzes why independent AI eval startups struggle to scale. It argues that talent tends to move to post-training or application work, where returns are higher, and that it is hard to find customers technical enough to work with APIs who also want to run evals themselves. It also discusses competitive pressure from large labs, the potential for benchmark manipulation, and why safety-eval startups may have a better shot. The piece contrasts evals as a service with evals tooling and cites industry examples and figures.