The First Fully General Computer Action Model
Summary
This article introduces FDM-1, a foundation model trained on an 11-million-hour video corpus to perform complex computer-use tasks, CAD workflows, and real-world driving at high frame rates. It details a three-stage training pipeline that includes inverse dynamics model (IDM) labeling and forward dynamics prediction, a novel video encoder with long context, and an evaluation infrastructure capable of large-scale rollouts. It also presents demonstrations, evaluation results, and future ambitions toward scalable, general-purpose computer-use agents.