Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation
Summary
This arXiv paper proposes a method for achieving constant cost per token in self-attention via a symmetry-aware Taylor approximation. It decomposes the Taylor expansion of the exponentiated query-key dot product into symmetric tensor products, mapping queries and keys into a minimal polynomial-kernel feature basis; the resulting per-token computation is constant in sequence length and scales only with head size. The work covers implementation details, empirical validation, and potential implications for reducing the memory and energy requirements of large-scale Transformer models.
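To make the mechanism concrete, the sketch below shows a second-order version of this idea in NumPy: a symmetric quadratic feature map phi such that phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2, plugged into a causal linear-attention recurrence with a fixed-size running state. Because the outer product q⊗q is symmetric, only the d(d+1)/2 upper-triangular entries are kept (with a sqrt(2) weight on off-diagonal terms so inner products are preserved), which corresponds to the minimal polynomial-kernel basis mentioned above. The function names, the truncation at second order, and the exact weighting are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def sym_feature_map(x):
    """Second-order Taylor feature map with a minimal symmetric quadratic basis.

    Satisfies phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2, since
    (q.k)^2 = <q⊗q, k⊗k> and the outer products are symmetric,
    so only the upper triangle is stored (off-diagonals weighted by sqrt(2)).
    """
    d = x.shape[-1]
    iu = np.triu_indices(d)
    quad = np.outer(x, x)[iu].copy()
    off_diag = iu[0] != iu[1]
    quad[off_diag] *= np.sqrt(2.0)   # preserve inner products of the full tensor
    return np.concatenate([[1.0], x, quad / np.sqrt(2.0)])

def taylor_linear_attention(Q, K, V):
    """Causal attention with the Taylor feature map.

    Per-token cost depends only on the head size d (feature dimension
    D = 1 + d + d(d+1)/2), not on the sequence length.
    """
    T, d = Q.shape
    D = 1 + d + d * (d + 1) // 2
    S = np.zeros((D, V.shape[-1]))   # running sum of phi(k_t) v_t^T
    z = np.zeros(D)                  # running sum of phi(k_t) for normalization
    out = np.zeros((T, V.shape[-1]))
    for t in range(T):
        phi_k = sym_feature_map(K[t])
        S += np.outer(phi_k, V[t])
        z += phi_k
        phi_q = sym_feature_map(Q[t])
        out[t] = (phi_q @ S) / (phi_q @ z + 1e-9)
    return out

# Hypothetical usage: small random inputs just to exercise the shapes.
rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = rng.normal(size=(3, T, d))
print(taylor_linear_attention(Q, K, V).shape)   # (6, 4)
```

The constant-size state (S, z) is what replaces the growing key-value cache of exact attention; higher-order terms of the expansion would enlarge the feature basis but leave the per-token recurrence unchanged.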