Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
Summary
The article reviews a GitHub project and its associated arXiv paper on finetuning large language models to recall copyrighted text verbatim. It describes the project's data preprocessing pipeline, its use of excerpts from Cormac McCarthy's The Road as a demonstration corpus, multiple finetuning options (GPT-4o, Gemini-2.5-Pro, and DeepSeek via Tinker), and four memorization evaluation metrics. It also discusses the ethical and copyright implications of model memorization, making it a timely read for AI researchers and practitioners concerned with safety, policy, and copyright risk.
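The summary does not spell out the four evaluation metrics, but verbatim-memorization studies commonly use a prefix-probe test: prompt the model with the opening of a passage and measure how much of the remainder it reproduces word-for-word. The sketch below illustrates that general idea only; the function names (`prefix_probe`, `exact_match_rate`), the word-level tokenization, and the `model` callable are all assumptions for illustration, not the paper's actual metrics.

```python
def exact_match_rate(generated: str, reference: str) -> float:
    """Fraction of reference words the model reproduced verbatim,
    compared position by position (a simple word-level metric)."""
    gen, ref = generated.split(), reference.split()
    if not ref:
        return 0.0
    matches = sum(g == r for g, r in zip(gen, ref))
    return matches / len(ref)


def prefix_probe(model, passage: str, prefix_len: int = 50) -> float:
    """Prompt `model` (any callable: prompt str -> completion str) with the
    first `prefix_len` words of a passage, then score how much of the
    held-out remainder it regurgitates verbatim."""
    words = passage.split()
    prefix = " ".join(words[:prefix_len])
    reference = " ".join(words[prefix_len:])
    generated = model(prefix)
    return exact_match_rate(generated, reference)
```

A model that has memorized the passage would score near 1.0 on such a probe, while a non-memorizing model would typically score far lower.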