DigiNews

Tech Watch by Johan Denoyer

A Theory of Prompt Injection (and why you should study roles)

Quality: 9/10 Relevance: 9/10

Summary

The writeup proposes that prompt injection arises from role confusion in LLMs. It introduces role probes (CoTness, Userness) that measure whether the model interprets tokens as belonging to the think, user, or tool role, and it demonstrates that writing style alone can masquerade as a role, enabling novel attacks such as CoT Forgery. The piece argues for treating roles as a core research object in AI safety and outlines open questions and future directions.
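
To make the idea of a role probe more concrete, here is a minimal sketch. The summary does not say how the writeup builds its probes, so everything here is an assumption: the probe is taken to be a linear classifier (logistic regression) over per-token hidden activations, and the activations are synthetic Gaussian clusters rather than real model states. The names `make_synthetic_activations`, `train_probe`, and `userness` are hypothetical.

```python
import numpy as np


def make_synthetic_activations(n_per_role=200, dim=16, seed=0):
    """Hypothetical stand-in for per-token hidden states.

    Tokens labelled 1 ("user") and 0 ("other role") are drawn from two
    Gaussian clusters separated along one random direction.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    user = rng.normal(size=(n_per_role, dim)) + 2.0 * direction
    other = rng.normal(size=(n_per_role, dim)) - 2.0 * direction
    X = np.vstack([user, other])
    y = np.concatenate([np.ones(n_per_role), np.zeros(n_per_role)])
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def userness(x, w, b):
    """Assumed 'Userness' score: probe probability that a token is user content."""
    return sigmoid(x @ w + b)


if __name__ == "__main__":
    X, y = make_synthetic_activations()
    w, b = train_probe(X, y)
    scores = userness(X, w, b)
    accuracy = np.mean((scores > 0.5) == (y == 1))
    print(f"probe accuracy on synthetic data: {accuracy:.2f}")
```

Under these assumptions, a token from tool output that receives a high Userness score would be one way to observe the role confusion the summary describes; with a real model, the synthetic clusters would be replaced by activations extracted at a chosen layer.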
