Linear Representations and Superposition
Summary
The post surveys linear representations and superposition as frameworks for interpreting LLMs. It explains the embedding and unembedding spaces, how concepts are represented within them, and how nonlinearity helps manage interference between features. It draws on work by Park et al. and Anthropic, and includes notes on experiments with Llama 2.
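To make "nonlinearity managing interference" concrete, here is a toy sketch. It is not taken from the post, Park et al., or Anthropic. It simply assumes 40 hypothetical feature directions packed into a 20-dimensional space as random unit vectors, which is more features than dimensions. Encoding a sparse input adds up the active directions. A purely linear readout (projecting back onto each direction) picks up interference on inactive features because the directions are only approximately orthogonal. A ReLU readout with a threshold, similar in spirit to the ReLU toy models often used to discuss superposition, suppresses most of that interference. The sizes, seed, and `threshold` value are illustrative assumptions.

```python
import math
import random


def random_unit_vectors(n, d, seed=0):
    """Return n random unit vectors in R^d (hypothetical toy feature directions)."""
    rng = random.Random(seed)
    vecs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])
    return vecs


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def encode(features, directions):
    """Superpose features into one d-dimensional state: h = sum_i x_i * w_i."""
    d = len(directions[0])
    h = [0.0] * d
    for x, w in zip(features, directions):
        if x != 0.0:
            for k in range(d):
                h[k] += x * w[k]
    return h


def linear_readout(h, directions):
    """Linear readout: x_hat_i = w_i . h (includes interference from other features)."""
    return [dot(w, h) for w in directions]


def relu_readout(h, directions, threshold):
    """Nonlinear readout: x_hat_i = max(0, w_i . h - threshold)."""
    return [max(0.0, dot(w, h) - threshold) for w in directions]


if __name__ == "__main__":
    n_features, d_model = 40, 20  # more features than dimensions
    W = random_unit_vectors(n_features, d_model, seed=0)
    x = [0.0] * n_features
    x[3] = 1.0  # a single active (sparse) feature
    h = encode(x, W)
    lin = linear_readout(h, W)
    rel = relu_readout(h, W, threshold=0.3)
    # Total spurious activation on the 39 inactive features
    lin_err = sum(abs(v) for i, v in enumerate(lin) if i != 3)
    rel_err = sum(v for i, v in enumerate(rel) if i != 3)
    print(f"active feature: linear={lin[3]:.2f} relu={rel[3]:.2f}")
    print(f"interference on inactive features: linear={lin_err:.2f} relu={rel_err:.2f}")
```

The threshold sets a trade-off. It shrinks the active feature's readout by exactly `threshold` but zeroes out any interference smaller than that, so this trick only pays off when features are sparse and their overlaps are small.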