Microsoft removes guide on how to train LLMs on pirated Harry Potter books
Summary
Ars Technica reports that Microsoft removed a blog post that guided users through training LLMs on a Kaggle dataset of pirated Harry Potter texts; the dataset had been mistakenly marked as public domain. The piece explores the copyright implications, fair use arguments, and expert opinions on potential liability and on the governance of AI training materials.