LLMs believe false statements even after explicit warnings that they’re false
Summary
A study of LLMs shows that they can internalize, and later repeat, false statements even after being warned that those statements are false, a phenomenon called negation neglect. Fine-tuning on fabricated false data sharply increased how often models endorsed the false claims, and even explicit negations in the training data did not fully eliminate those beliefs. The article also discusses improvements that come from providing the right context, and draws practical implications for curating training data and for prompt and system design.