Introducing Gemma 4 12B: a unified, encoder-free multimodal model
Summary
Gemma 4 12B is Google's encoder-free multimodal model, designed to bring high-performance multimodal intelligence to laptops. It uses a unified architecture with no separate vision or audio encoders, which enables reasoning close to the level of a 26B MoE model on a laptop with 16 GB of RAM. The model is released under Apache 2.0, with weights available on Hugging Face and Kaggle.
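To put the 16 GB figure in context, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at different precisions. It assumes "12B" means exactly 12 billion parameters. The precisions shown are illustrative, since the announcement does not say which one the laptop claim relies on. The estimate ignores activations, the KV cache, and whatever the operating system needs.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: "12B" is treated as exactly 12 billion parameters.
NOMINAL_PARAMS = 12e9

# Illustrative precisions only; the source does not specify one.
estimates = {bits: weight_memory_gb(NOMINAL_PARAMS, bits) for bits in (16, 8, 4)}

for bits, gb in estimates.items():
    print(f"{bits}-bit weights: ~{gb:.0f} GB")
```

Under these assumptions, 16-bit weights alone would come to about 24 GB, while 8-bit and 4-bit weights would come to about 12 GB and 6 GB respectively, so some form of reduced precision would likely be needed to stay within 16 GB.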