TurboQuant: Redefining AI efficiency with extreme compression
Summary
Google Research unveils a trio of compression algorithms, TurboQuant, QJL, and PolarQuant, that compress vectors and KV caches heavily with what the team reports as zero accuracy loss. The reported gains are up to an 8x speedup in attention computation and at least a 6x reduction in KV-cache memory, which enables faster vector search and scalable AI workloads without retraining.
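To put the 6x memory figure in concrete terms, here is a back-of-the-envelope sketch in Python. It does not implement TurboQuant, QJL, or PolarQuant; it only does the arithmetic. The model shape (32 layers, 8 KV heads, head dimension 128, 32,768-token context) and the 16-bit baseline are assumptions chosen for illustration, and the function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_value):
    """Bytes needed to store keys and values for one sequence."""
    # Factor of 2 covers both the key tensor and the value tensor.
    values = 2 * layers * kv_heads * head_dim * seq_len
    return values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration (not from the source).
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)

# Assumed baseline: an uncompressed cache stored at 16 bits per value.
baseline = kv_cache_bytes(**shape, bits_per_value=16)

# Apply the "at least 6x" reduction quoted in the summary.
compressed = baseline / 6

print(f"16-bit cache:       {baseline / 2**30:.2f} GiB")
print(f"after 6x reduction: {compressed / 2**30:.2f} GiB")
print(f"implied budget:     {16 / 6:.2f} bits per value")
```

Under these assumptions the cache shrinks from 4 GiB to roughly 0.67 GiB per sequence, which corresponds to a budget of under 3 bits per stored value relative to a 16-bit baseline.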