GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
Summary
GLM-5V-Turbo is a native foundation model for multimodal agents that integrates perception into reasoning, planning, tool use, and execution. The paper describes improvements in model design, multimodal training, reinforcement learning (RL), and integration with agent frameworks, and reports strong performance in multimodal coding and visual tool use.