DigiNews

Tech Watch by Johan Denoyer


Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

Quality: 8/10 | Relevance: 9/10

Summary

A thorough look at recent open-weight LLM architecture innovations focused on long-context efficiency. The piece covers KV sharing and cross-layer KV reuse in Gemma 4, per-layer embeddings, layer-wise attention budgeting in Laguna XS.2, compressed attention in ZAYA1-8B, and CSA/HCA in DeepSeek V4, with discussion of tradeoffs between memory, compute, and modeling capacity.
