Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
Summary
The arXiv paper proposes Prefill-as-a-Service (PrfaaS), a cross-datacenter architecture that offloads long-context prefill to dedicated prefill clusters and transfers the resulting KVCache to local decode clusters. It argues that reducing KVCache size alone is insufficient for heterogeneous deployments, and introduces bandwidth-aware scheduling and cache-aware placement to improve throughput across loosely coupled datacenters. A case study with a 1T-parameter model reports substantial throughput gains with modest cross-datacenter bandwidth requirements.
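
To make the bandwidth trade-off concrete, the following is a minimal sketch of a bandwidth-aware routing decision. It is not the paper's scheduler. The `Request` type, the `choose_prefill_site` function, and every parameter (per-token KVCache size, prefill throughputs, link speed) are hypothetical, chosen only for illustration. The sketch sends prefill to a remote cluster only when remote compute time plus KVCache transfer time is lower than prefilling locally.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int
    kv_bytes_per_token: int  # hypothetical per-token KVCache footprint


def transfer_seconds(req: Request, link_gbps: float) -> float:
    """Time to ship the request's KVCache over a link of link_gbps gigabits/s."""
    kv_bits = req.prompt_tokens * req.kv_bytes_per_token * 8
    return kv_bits / (link_gbps * 1e9)


def choose_prefill_site(
    req: Request,
    local_tokens_per_s: float,
    remote_tokens_per_s: float,
    link_gbps: float,
) -> str:
    """Return "remote" if remote prefill plus KVCache transfer beats local prefill."""
    # Assumed cost model: prefill time scales linearly with prompt length.
    local_time = req.prompt_tokens / local_tokens_per_s
    remote_time = (
        req.prompt_tokens / remote_tokens_per_s + transfer_seconds(req, link_gbps)
    )
    return "remote" if remote_time < local_time else "local"


if __name__ == "__main__":
    req = Request(prompt_tokens=100_000, kv_bytes_per_token=50_000)
    print(choose_prefill_site(req, 10_000, 50_000, link_gbps=100.0))
    print(choose_prefill_site(req, 10_000, 50_000, link_gbps=1.0))
```

With these made-up numbers the same request is routed remotely on the fast link and kept local on the slow one, because the transfer term dominates once the link is slow. This also illustrates why a smaller per-token KVCache helps but does not by itself settle where prefill should run: the decision still depends on the available link bandwidth and on the throughput of each cluster.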