Advancing AI benchmarking with Game Arena
Summary
Google DeepMind is expanding Kaggle Game Arena with Werewolf and poker benchmarks, which join the existing chess benchmark and test how AI models handle social dynamics and risk management. Live-streamed matches and a focus on safety suggest the platform is being positioned for enterprise use in evaluating AI under uncertainty.