A Theory of Prompt Injection (and why you should study roles)

June 22, 2026 at 15:48

Quality: 9/10 Relevance: 9/10

Summary

The writeup proposes that prompt injection arises from role confusion in LLMs. It introduces role probes (CoTness, Userness) that measure which role (think, user, or tool) the model assigns to a given token, and demonstrates that writing style alone can masquerade as a role, enabling novel attacks such as CoT Forgery. The piece argues for treating roles as a core research object in AI safety and outlines open questions and future directions.

LLM & Prompting, AI Research, Security
