DigiNews

Tech Watch Articles

← Back to articles

ANN v3: 200ms p99 latency over 100 billion vectors

Quality: 9/10 Relevance: 9/10

Summary

This article documents turbopuffer's ANN v3, achieving 200ms p99 latency for 100 billion vectors in a single search index and discusses the architecture, memory hierarchy, and bottleneck analysis. It introduces hierarchical clustering and binary quantization (RaBitQ) to reduce bandwidth and improve recall, enabling high throughput in both single-node and distributed deployments, along with hardware optimizations like AVX-512.

🚀 Service construit par Johan Denoyer