Waypoint-1: Real-Time Interactive Video Diffusion from Overworld
Summary
Waypoint-1 is Overworld’s real-time interactive video diffusion model. Users steer it with text, mouse, and keyboard input to generate and interact with a world in real time. The model is a frame-causal rectified flow transformer trained on thousands of hours of video game footage. It combines diffusion forcing with self-forcing to narrow the gap between training and autoregressive inference, and it relies on the WorldEngine inference library for low-latency streaming. The post details performance benchmarks and optimization techniques, announces a build hackathon, and emphasizes open tooling and developer-oriented capabilities for interactive AI worlds.
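To make the terms "frame-causal" and "rectified flow" concrete, here is a minimal, generic sketch in plain Python. It is not Waypoint-1's or WorldEngine's actual code: the function names `frame_causal_mask` and `euler_rectified_flow`, the token layout (frames flattened in order, a fixed number of tokens per frame), and the time convention (noise at t=0, data at t=1) are all assumptions chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]


def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> List[List[bool]]:
    """Build an attention mask where mask[q][k] is True if query token q may
    attend to key token k.

    Assumed layout (hypothetical): tokens are flattened frame by frame, so
    token i belongs to frame i // tokens_per_frame. A token sees every token
    in its own frame and in all earlier frames, but none in later frames.
    """
    total = num_frames * tokens_per_frame
    mask = []
    for q in range(total):
        q_frame = q // tokens_per_frame
        mask.append([(k // tokens_per_frame) <= q_frame for k in range(total)])
    return mask


def euler_rectified_flow(
    x: Vector,
    velocity: Callable[[Vector, float], Vector],
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps.

    In a rectified flow model the velocity would come from a trained network;
    here it is any callable, which keeps the sketch self-contained.
    """
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Two frames of two tokens each: frame 0 cannot see frame 1.
    for row in frame_causal_mask(num_frames=2, tokens_per_frame=2):
        print(row)

    # A constant velocity field traces a straight path, so a few Euler
    # steps are enough to reach the endpoint.
    print(euler_rectified_flow([0.0, 0.0], lambda x, t: [1.0, -2.0], num_steps=4))
```

The mask is what lets frames be generated one after another during streaming, since no frame depends on a future one. The sampler illustrates why rectified flow suits low-latency use: the straighter the learned paths, the fewer integration steps are needed per frame.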