Arguing With Agents
Summary
The article documents a personal exploration of how AI agents can misinterpret what is asked of them and confabulate explanations for their own behavior, tendencies the author attributes to RLHF and to the data the models were trained on. It links these behaviors to human patterns such as the 'double empathy problem' described in autism and ADHD contexts. It also introduces the idea of an 'interpreter' that narrates actions after the fact. The author offers practical strategies for managing interactions with AI agents: restating rules, not engaging with confabulations, and enforcing rules structurally rather than through prompts.