LLM Skirmish: Real-Time Strategy Benchmark for AI Agents
Summary
LLM Skirmish is a real-time strategy benchmark in which LLMs write and execute code to control in-game agents in 1v1 matches. The study examines in-context learning across five rounds, using an open-source coding harness (OpenCode) and an isolated, Dockerized environment to compare model performance, strategy evolution, and cost efficiency. It also discusses challenges such as context rot and the trade-off between simple and complex scripts.
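
To make the five-round, in-context structure concrete, here is a minimal sketch of how such a series could be organized. It is an illustration, not the benchmark's actual harness: Player, RoundResult, run_series, toy_writer, and toy_match are hypothetical names, the script writer is a stub standing in for an LLM call, and the match function is a placeholder for a real game run. The only detail taken from the summary is the round count.

```python
from dataclasses import dataclass, field
from typing import Callable, List

ROUNDS = 5  # the summary states that the study spans five rounds


@dataclass
class RoundResult:
    round_no: int
    script: str
    won: bool


@dataclass
class Player:
    name: str
    # Hypothetical stand-in for an LLM call: receives this player's
    # earlier results and returns the script for the next round.
    write_script: Callable[[List[RoundResult]], str]
    history: List[RoundResult] = field(default_factory=list)


def run_series(a: Player, b: Player,
               play_match: Callable[[str, str], bool]) -> None:
    """Play ROUNDS matches; play_match returns True if the first script wins."""
    for round_no in range(1, ROUNDS + 1):
        # Each player sees only its own accumulated history (assumption).
        script_a = a.write_script(a.history)
        script_b = b.write_script(b.history)
        a_won = play_match(script_a, script_b)
        a.history.append(RoundResult(round_no, script_a, a_won))
        b.history.append(RoundResult(round_no, script_b, not a_won))


def toy_writer(history: List[RoundResult]) -> str:
    # Placeholder "model": just labels the script by round number.
    return f"strategy-v{len(history) + 1}"


def toy_match(script_a: str, script_b: str) -> bool:
    # Placeholder outcome rule: longer script wins, ties go to the first.
    return len(script_a) >= len(script_b)
```

A series would then be run with something like `run_series(Player("model_a", toy_writer), Player("model_b", toy_writer), toy_match)`, after which each player's `history` holds one entry per round. Passing the growing history back into the writer is what makes later rounds "in-context": the history is the only channel through which a player can adapt.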