I Gave an AI a Civilization to Run. It Built a Nuke.
Summary
This article describes CivBench, a large-scale benchmark that measures the strategic competence of AI models playing Civilization VI. It reports experiments across multiple model families, analyzes the sensorium effect and the knowing–doing gap, and discusses the safety implications for AI in government-like decision spaces. It also provides open-source resources and invites researchers to run their own evaluations.