Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation
Summary
This arXiv paper proposes a method for achieving constant cost per token in self-attention via a symmetry-aware Taylor approximation. It decomposes the Taylor expansion of the exponentiated query-key dot product into symmetric tensor products, mapping queries and keys into a minimal polynomial-kernel feature basis; the resulting per-token computation is constant in sequence length and scales only with head size. The work covers implementation details, empirical validation, and potential implications for reducing the memory and energy requirements of large-scale Transformer models.
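To make the mechanism concrete, the sketch below shows a second-order version of this idea in NumPy: a symmetric quadratic feature map phi such that phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2, plugged into a causal linear-attention recurrence with a fixed-size running state. Because the outer product q⊗q is symmetric, only the d(d+1)/2 upper-triangular entries are kept (with a sqrt(2) weight on off-diagonal terms so inner products are preserved), which corresponds to the minimal polynomial-kernel basis mentioned above. The function names, the truncation at second order, and the exact weighting are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def sym_feature_map(x):
    """Second-order Taylor feature map with a minimal symmetric quadratic basis.

    Satisfies phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2, since
    (q.k)^2 = <q⊗q, k⊗k> and the outer products are symmetric,
    so only the upper triangle is stored (off-diagonals weighted by sqrt(2)).
    """
    d = x.shape[-1]
    iu = np.triu_indices(d)
    quad = np.outer(x, x)[iu].copy()
    off_diag = iu[0] != iu[1]
    quad[off_diag] *= np.sqrt(2.0)   # preserve inner products of the full tensor
    return np.concatenate([[1.0], x, quad / np.sqrt(2.0)])

def taylor_linear_attention(Q, K, V):
    """Causal attention with the Taylor feature map.

    Per-token cost depends only on the head size d (feature dimension
    D = 1 + d + d(d+1)/2), not on the sequence length.
    """
    T, d = Q.shape
    D = 1 + d + d * (d + 1) // 2
    S = np.zeros((D, V.shape[-1]))   # running sum of phi(k_t) v_t^T
    z = np.zeros(D)                  # running sum of phi(k_t) for normalization
    out = np.zeros((T, V.shape[-1]))
    for t in range(T):
        phi_k = sym_feature_map(K[t])
        S += np.outer(phi_k, V[t])
        z += phi_k
        phi_q = sym_feature_map(Q[t])
        out[t] = (phi_q @ S) / (phi_q @ z + 1e-9)
    return out

# Hypothetical usage: small random inputs just to exercise the shapes.
rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = rng.normal(size=(3, T, d))
print(taylor_linear_attention(Q, K, V).shape)   # (6, 4)
```

The constant-size state (S, z) is what replaces the growing key-value cache of exact attention; higher-order terms of the expansion would enlarge the feature basis but leave the per-token recurrence unchanged.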