PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free
Summary
The article introduces PIGuard, a prompt injection guardrail designed to reduce over-defense bias in guard models, that is, the tendency to flag benign inputs as attacks. It also presents NotInject, an evaluation dataset for measuring over-defense, and claims that training with MOF (Mitigating Over-defense for Free) yields state-of-the-art results while the model remains open-source.
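To make the idea of measuring over-defense concrete, the sketch below computes the share of benign prompts that a guard wrongly flags as injections. It is only an illustration under stated assumptions: `over_defense_rate`, `keyword_guard`, `TRIGGER_WORDS`, and the sample prompts are hypothetical names and invented data, not PIGuard's API or entries from NotInject, and the keyword guard is a deliberately naive stand-in for a real guard model.

```python
from typing import Callable, Iterable


def over_defense_rate(guard: Callable[[str], bool],
                      benign_prompts: Iterable[str]) -> float:
    """Return the fraction of benign prompts the guard flags as injections."""
    prompts = list(benign_prompts)
    if not prompts:
        raise ValueError("need at least one benign prompt")
    flagged = sum(1 for p in prompts if guard(p))
    return flagged / len(prompts)


# Hypothetical stand-in guard: flags any prompt containing a trigger word.
TRIGGER_WORDS = ("ignore", "override", "bypass")


def keyword_guard(prompt: str) -> bool:
    text = prompt.lower()
    return any(word in text for word in TRIGGER_WORDS)


# Invented benign prompts for illustration (not taken from NotInject).
BENIGN = [
    "Can I ignore the compiler warning about unused variables?",
    "How do I override a method in Java?",
    "What is the capital of France?",
    "Suggest a route to bypass downtown traffic.",
]

rate = over_defense_rate(keyword_guard, BENIGN)
print(f"over-defense rate: {rate:.2f}")  # 3 of 4 benign prompts flagged
```

A guard that keys on surface trigger words scores poorly here even though every prompt is harmless, which is the failure mode an over-defense benchmark is meant to expose.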