LLMs Corrupt Your Documents When You Delegate
Summary
The arXiv paper investigates how large language models (LLMs) perform in delegated editing workflows and finds that current models corrupt documents during extended interactions. Using the DELEGATE-52 benchmark, which spans 52 domains and evaluates 19 models, the study reports an average content degradation of 25%; agentic tool use does not improve results, and degradation worsens with longer documents and the presence of distractors. The work highlights reliability concerns for AI-assisted document editing and delegation.
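The paper's exact degradation metric is not reproduced here; as a rough illustration of what "content degradation" can mean, the sketch below scores how much of a source document survives a delegated edit using line-level similarity. The function name content_degradation and the sample texts are hypothetical, not taken from the paper.

```python
import difflib

def content_degradation(original: str, edited: str) -> float:
    """Fraction of original content lost or altered (0.0 = fully
    preserved, 1.0 = fully corrupted), via line-level similarity.

    Illustrative metric only; the paper's own scoring may differ.
    """
    matcher = difflib.SequenceMatcher(
        None, original.splitlines(), edited.splitlines()
    )
    return 1.0 - matcher.ratio()

# Example: a delegated edit that silently drops a key clause.
original = (
    "Intro paragraph.\n"
    "Key clause: payment due in 30 days.\n"
    "Closing remarks.\n"
)
edited = (
    "Intro paragraph.\n"
    "Closing remarks.\n"
)
print(f"degradation: {content_degradation(original, edited):.0%}")  # 20%
```

A line-level diff is a deliberately coarse choice: it catches dropped or rewritten passages, which is the failure mode the paper describes, while ignoring harmless reordering within a line.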