Google’s Gemma 4 AI models get 3x speed boost by predicting future tokens
Summary
Google's open Gemma 4 models gain up to a 3x speedup from speculative decoding with Multi-Token Prediction (MTP) drafters. A smaller drafter model proposes several future tokens, and the full model verifies those draft tokens, so generation gets faster with no loss in output quality. The drafters share caches with the main model and use sparse decoding, which makes faster local inference practical on consumer hardware. The models are licensed under Apache 2.0, and testing shows strong speedups across devices.
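The draft-then-verify loop behind this kind of speedup can be illustrated with a small, self-contained sketch. This is not Gemma's implementation or API: `target_model` and `draft_model` are hypothetical toy next-token rules standing in for the full model and the drafter, decoding is greedy, and the cache sharing and sparse decoding mentioned above are not modeled. In a real system, the target scores all drafted positions in one batched forward pass per round; the loop below checks them one at a time only for clarity.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: given the sequence so far, return the
# greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding: the drafter proposes up to k tokens,
    and the target keeps the longest prefix it agrees with.

    Returns the generated sequence and the number of verification rounds
    (each round would be one batched target forward pass in practice).
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1. The cheap drafter guesses several future tokens autoregressively.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies each drafted position in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                # First mismatch: keep the target's own token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft was accepted, so the target's next prediction
            # comes along as a free "bonus" token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    # The bonus token can overshoot the goal by one; trim it off.
    return seq[:goal], rounds


def target_model(seq: List[int]) -> int:
    # Hypothetical toy "full model": a deterministic next-token rule.
    return (seq[-1] * 3 + len(seq)) % 11


def draft_model(seq: List[int]) -> int:
    # Hypothetical toy "drafter": agrees with the target except whenever
    # the context length is a multiple of 7, to exercise rejection.
    guess = (seq[-1] * 3 + len(seq)) % 11
    return (guess + 1) % 11 if len(seq) % 7 == 0 else guess


prompt = [1, 2]
baseline = greedy_decode(target_model, prompt, 20)
fast, rounds = speculative_decode(target_model, draft_model, prompt, 20, k=4)
assert fast == baseline  # drafts change the speed, never the output
print(rounds, "verification rounds instead of 20")
```

Every token the loop keeps is one the target would have produced itself, which is why greedy output is unchanged; the saving comes from accepting several tokens per round. With sampling instead of greedy decoding, speculative decoding uses a rejection-sampling acceptance rule so that the output distribution still matches the full model.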