DigiNews

Tech Watch by Johan Denoyer


Emotion Concepts and their Function in a Large Language Model

Quality: 8/10 Relevance: 9/10

Summary

The article analyzes how Claude Sonnet 4.5 encodes emotion concepts as linear representations that activate in contexts related to specific emotions and causally influence outputs. It follows a three-part structure (identification of emotion vectors, geometric characterization, and application in naturalistic settings) and shows how these “functional emotions” shape preferences and alignment-related behaviors such as sycophancy, blackmail, and reward hacking. It also explores post-training shifts, emotion deflection vectors, and the distinction between internal states and context-bound emotion representations, with implications for safety and model governance.
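To make the idea of a "linear representation" that can "causally influence outputs" more concrete, here is a minimal sketch on synthetic data. It assumes a difference-of-means direction as the emotion vector and simple vector addition as the intervention; these are common interpretability techniques chosen for illustration, not necessarily the procedure used in the article. The function names (`emotion_vector`, `projection`, `steer`) and the random "activations" are hypothetical.

```python
import numpy as np

def emotion_vector(emotion_acts, neutral_acts):
    """Unit-length difference-of-means direction (an assumed, illustrative method)."""
    direction = emotion_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def projection(acts, vector):
    """Scalar activation of each row along the given direction."""
    return acts @ vector

def steer(acts, vector, strength):
    """Add the direction to every activation row, scaled by strength."""
    return acts + strength * vector

# Synthetic stand-ins for model activations (hypothetical data, not real model states).
rng = np.random.default_rng(0)
dim = 16
true_direction = np.zeros(dim)
true_direction[0] = 1.0
neutral = rng.normal(size=(200, dim))
emotional = rng.normal(size=(200, dim)) + 3.0 * true_direction

vec = emotion_vector(emotional, neutral)
steered = steer(neutral, vec, strength=2.0)

print("mean projection, neutral:  ", projection(neutral, vec).mean())
print("mean projection, emotional:", projection(emotional, vec).mean())
print("mean projection, steered:  ", projection(steered, vec).mean())
```

Because `vec` has unit length, steering with `strength=2.0` raises the mean projection along the direction by exactly 2. In a real model the interesting question is whether such a shift also changes the generated text, which this toy example does not attempt to show.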

🚀 Service built by Johan Denoyer