Google’s latest DiffusionGemma open AI model comes with a 4x speed boost
Summary
Google's DiffusionGemma is an open 26B-parameter Mixture-of-Experts model that activates 3.8B parameters (about 15% of the total) during inference and generates text in large parallel blocks rather than one token at a time, delivering roughly 4x speedups on local GPUs. The article explains how diffusion-based text generation differs from autoregressive generation, covers the hardware considerations, and notes that the model is released under the Apache 2.0 license and is available via Hugging Face. It also discusses the trade-offs and potential use cases for local AI deployment.
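To make the contrast between autoregressive and block-parallel generation concrete, here is a minimal toy sketch in Python. It is not DiffusionGemma's actual algorithm: the function names, the block size, and the step count are all hypothetical values chosen for illustration, and the "model" simply copies from a known target. It only counts how many forward passes each strategy would need.

```python
import math

MASK = None  # placeholder for a position that has not been filled in yet


def autoregressive_passes(num_tokens: int) -> int:
    """Autoregressive decoding: one forward pass per generated token."""
    return num_tokens


def block_parallel_passes(num_tokens: int, block_size: int, steps_per_block: int) -> int:
    """Block-parallel decoding (assumed scheme): each block is refined over a
    fixed number of denoising steps, and each step is one forward pass."""
    blocks = math.ceil(num_tokens / block_size)
    return blocks * steps_per_block


def toy_denoise(target: list, steps: int):
    """Fill a fully masked block in at most `steps` rounds, unmasking several
    positions per round. A real model would predict tokens; this toy copies
    them from `target` so the control flow is easy to follow."""
    block = [MASK] * len(target)
    per_step = math.ceil(len(target) / steps)
    passes = 0
    while MASK in block:
        passes += 1  # one simulated forward pass over the whole block
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        for i in masked[:per_step]:
            block[i] = target[i]
    return block, passes


if __name__ == "__main__":
    # Hypothetical settings: 256 tokens, blocks of 64, 8 denoising steps each.
    print(autoregressive_passes(256))
    print(block_parallel_passes(256, block_size=64, steps_per_block=8))
    print(toy_denoise(list("hello world!"), steps=4))
```

Fewer forward passes do not translate one-to-one into wall-clock speedup, since each pass over a whole block does more work than a single-token step, so the pass counts here should not be read as an explanation of the reported 4x figure.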