Show HN: KVBoost – chunk-level KV cache reuse for HuggingFace, 5–48x faster TTFT
Summary
KVBoost is a drop-in accelerator for HuggingFace Transformers. It combines chunk-level KV cache reuse, memory-efficient attention (FlashAttention-2), AWQ layer streaming, and CPU-paged decoding to cut VRAM usage and speed up inference. The project ships benchmarks and code examples reporting 5–48x faster time to first token (TTFT) and high KV cache hit rates, with the goal of running larger models on consumer GPUs without any model changes.
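
To make the idea of chunk-level KV cache reuse concrete, here is a minimal sketch of how a cache lookup might work. It is not KVBoost's code or API: `CHUNK_SIZE`, `chunk_keys`, and `ChunkCache` are hypothetical names, the stored KV tensors are replaced by a placeholder string, and the prefix-chained hashing shown is just one common keying scheme that KVBoost may or may not use.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def chunk_keys(token_ids: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one key per full chunk; each key also depends on all earlier chunks."""
    keys = []
    running = hashlib.sha256()
    for i in range(len(token_ids) // chunk_size):
        chunk = token_ids[i * chunk_size:(i + 1) * chunk_size]
        running.update((",".join(map(str, chunk)) + ";").encode())
        keys.append(running.hexdigest())  # hexdigest() does not reset the hash state
    return keys


class ChunkCache:
    """Toy stand-in for a KV store: maps chunk keys to placeholder values."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def insert(self, token_ids: List[int]) -> None:
        # A real system would store the per-layer key/value tensors here.
        for key in chunk_keys(token_ids):
            self.store.setdefault(key, "kv-placeholder")

    def cached_prefix_len(self, token_ids: List[int]) -> int:
        """Number of leading tokens whose KV entries could be reused."""
        hits = 0
        for key in chunk_keys(token_ids):
            if key not in self.store:
                break
            hits += 1
        return hits * CHUNK_SIZE


cache = ChunkCache()
cache.insert([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # two full chunks cached
reused = cache.cached_prefix_len([1, 2, 3, 4, 5, 6, 7, 8, 99, 100, 101, 102])
print(reused)  # 8
```

In this sketch only the last four tokens of the second prompt would need a fresh prefill pass. Skipping recomputation for the shared chunks is the kind of saving that shortens TTFT.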