DigiNews

Tech Watch by Johan Denoyer

Introducing RadixAttention to Trellis

Quality: 8/10 Relevance: 9/10

Summary

Trellis introduces RadixAttention to accelerate LLM inference on commodity hardware by caching attention key-value (KV) state and sharing common prefixes across requests. The approach combines a block-paged KV cache with radix-tree-based prefix caching to reduce recomputation and memory usage; the reported benchmarks show 30-40% faster inference along with lower memory consumption. The post includes design notes, benchmarks, and a call for feedback.
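
To make the prefix-sharing idea concrete, here is a minimal sketch of a block-granular prefix tree over token IDs. It is an illustration under stated assumptions, not Trellis's implementation: the names `PrefixCache`, `Node`, and `BLOCK_SIZE` are hypothetical, the tree is a plain trie keyed by whole blocks rather than a compressed radix tree, and block IDs stand in for real KV memory pages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # tokens per KV block (hypothetical value for illustration)


@dataclass
class Node:
    # ID of the cached KV block reached via the edge into this node
    block_id: Optional[int] = None
    # Children keyed by the exact tuple of token IDs in the next block
    children: Dict[Tuple[int, ...], "Node"] = field(default_factory=dict)


class PrefixCache:
    """Toy prefix cache: each tree edge is one full block of tokens."""

    def __init__(self) -> None:
        self.root = Node()
        self.next_block_id = 0

    def _full_blocks(self, tokens: List[int]):
        # Yield only complete blocks; a trailing partial block is not cached.
        for i in range(0, len(tokens) - BLOCK_SIZE + 1, BLOCK_SIZE):
            yield tuple(tokens[i:i + BLOCK_SIZE])

    def match(self, tokens: List[int]) -> List[int]:
        """Return block IDs for the longest cached prefix of `tokens`."""
        node, hits = self.root, []
        for key in self._full_blocks(tokens):
            child = node.children.get(key)
            if child is None:
                break  # prefix diverges here; the rest must be recomputed
            hits.append(child.block_id)
            node = child
        return hits

    def insert(self, tokens: List[int]) -> List[int]:
        """Cache all full blocks of `tokens`; return newly allocated block IDs."""
        node, new_blocks = self.root, []
        for key in self._full_blocks(tokens):
            child = node.children.get(key)
            if child is None:
                child = Node(block_id=self.next_block_id)
                self.next_block_id += 1
                node.children[key] = child
                new_blocks.append(child.block_id)
            node = child
        return new_blocks


cache = PrefixCache()
first = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # two full blocks plus one leftover token
second = [1, 2, 3, 4, 9, 9, 9, 9]     # shares only the first block with `first`
allocated_first = cache.insert(first)
reused_for_second = cache.match(second)
allocated_second = cache.insert(second)
```

In this sketch the second request reuses the block already cached for the shared first four tokens and allocates a new block only for the part that diverges, which is the mechanism by which prefix sharing avoids recomputation. A production system would also need eviction and reference counting, which are omitted here.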

🚀 Service built by Johan Denoyer