Can I Buy Your KV Cache?
Summary
The paper proposes precomputing a document's key-value (KV) cache so that LLM agents can reuse it instead of repeating the prefill step for that document on every request. It reports substantial compute savings (9–50x) from reusing a KV cache and discusses hosting the cache on the provider's side to eliminate egress costs. It also notes open problems, such as lossless KV compression and payment mechanisms between parties. The approach could substantially reduce latency and cost in large-scale AI deployments.
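To make the reuse idea concrete, here is a minimal sketch, not the paper's method, of a single toy attention layer in pure Python. The document's keys and values are computed once, stored under a hash of the document, and reused across several queries. Everything here is a hypothetical stand-in: the hash-based embeddings, the 4-dimensional projection matrices, the `KVCacheStore` class, and the `answer` helper. The counter only tracks per-token K/V computations in this toy and says nothing about the paper's 9–50x figure. The toy also omits positional encodings. In a real transformer, cached K/V can be reused verbatim only when the document sits at the same positions, for example as a shared prefix, and only with the same model weights.

```python
import hashlib
import math

DIM = 4  # toy hidden size (hypothetical)

def embed(token):
    """Deterministic toy embedding derived from a hash of the token (hypothetical)."""
    digest = hashlib.sha256(token.encode()).digest()
    return [(b - 128) / 128.0 for b in digest[:DIM]]

def matvec(m, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

# Fixed toy projections standing in for one attention layer's W_Q, W_K, W_V.
W_Q = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]
W_K = [[0.5 if i == j else 0.1 for j in range(DIM)] for i in range(DIM)]
W_V = [[float((i + j) % 3 - 1) for j in range(DIM)] for i in range(DIM)]

class ToyModel:
    def __init__(self):
        self.kv_projections = 0  # number of tokens whose K/V were computed (toy "prefill work")

    def kv(self, tokens):
        """Prefill: compute a (key, value) pair for every token."""
        self.kv_projections += len(tokens)
        return [(matvec(W_K, embed(t)), matvec(W_V, embed(t))) for t in tokens]

    def attend(self, query_token, kv_pairs):
        """Scaled dot-product attention of one query over all cached keys/values."""
        q = matvec(W_Q, embed(query_token))
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k, _ in kv_pairs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        return [sum(w * v[d] for w, (_, v) in zip(weights, kv_pairs)) / total
                for d in range(DIM)]

class KVCacheStore:
    """Hypothetical provider-side store: document hash -> precomputed KV cache."""
    def __init__(self, model):
        self.model = model
        self._cache = {}

    def get(self, doc_tokens):
        key = hashlib.sha256("\x00".join(doc_tokens).encode()).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.model.kv(doc_tokens)  # prefill the document once
        return self._cache[key]

def answer(model, doc_kv, query_tokens):
    # Only the query tokens are prefilled; the document's K/V are reused as-is.
    kv = doc_kv + model.kv(query_tokens)
    return model.attend(query_tokens[-1], kv)

doc = "the quarterly report says revenue grew".split()
queries = [["what", "grew"], ["summarize", "report"], ["who", "wrote", "it"]]

cached_model = ToyModel()
store = KVCacheStore(cached_model)
cached = [answer(cached_model, store.get(doc), q) for q in queries]

fresh_model = ToyModel()
fresh = [answer(fresh_model, fresh_model.kv(doc), q) for q in queries]

assert cached == fresh  # reusing the cache gives identical outputs
print(cached_model.kv_projections, fresh_model.kv_projections)  # prints: 13 25
```

The outputs are identical because attention over a fixed set of keys and values does not care whether those keys and values were just computed or loaded from a store. The saving grows with document length and with the number of queries that share the document.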