DigiNews

Tech Watch by Johan Denoyer

Using group theory to explore the space of positional encodings for attention

Quality: 8/10 Relevance: 9/10

Summary

This Jane Street post analyzes all possible positional encodings for attention under a few natural constraints and shows that the space collapses to one-parameter groups, which implies that most practical encodings are already in use. It derives RoPE with exponential damping and discusses ALiBi and other variants, including defective generators, which are theoretically allowed but impractical. The result is a rigorous framework for evaluating and selecting positional encodings in modern transformers.
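To make the "one-parameter group" idea concrete, here is a minimal NumPy sketch. It is not taken from the post: the function names `position_map` and `score`, the single 2D frequency pair, and the default values of `omega` and `lam` are all illustrative assumptions. It shows one plausible reading of "RoPE with exponential damping", where position t acts through exp(t·A) with generator A = -lam·I + omega·J, so the attention score depends only on the relative offset m - n.

```python
import numpy as np

def position_map(t, omega=0.3, lam=0.1):
    """Return exp(t * A) for the 2x2 generator A = [[-lam, -omega], [omega, -lam]].

    Because the identity commutes with the rotation generator, this equals
    exp(-lam * t) times a rotation by angle omega * t.
    lam = 0 is a plain rotation (one RoPE frequency pair).
    All parameter values here are hypothetical, chosen for illustration.
    """
    c, s = np.cos(omega * t), np.sin(omega * t)
    return np.exp(-lam * t) * np.array([[c, -s], [s, c]])

def score(q, k, m, n, omega=0.3, lam=0.1):
    """Attention logit between a query at position m and a key at position n.

    The query is damped (exp(-lam * m)) and the key is amplified (exp(+lam * n)),
    so the product carries exp(-lam * (m - n)) and a rotation by omega * (n - m).
    """
    q_m = position_map(m, omega, lam) @ q
    k_n = position_map(n, omega, -lam) @ k
    return float(q_m @ k_n)
```

Under these assumptions, `position_map(3) @ position_map(4)` matches `position_map(7)` (the group law), and `score(q, k, 5, 2)` matches `score(q, k, 12, 9)` because both pairs have offset 3. Setting `lam=0.0` recovers an undamped rotation for that frequency pair.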
