News publishers limit Internet Archive access due to AI scraping concerns
Summary
The Nieman Lab article discusses how major news publishers are restricting the Internet Archive's access to their sites and tightening API usage to curb AI data scraping, highlighting actions by The Guardian, The New York Times, Reddit, and Gannett. It analyzes robots.txt usage, licensing tensions, and the broader implications for AI training data, web archiving, and publishers' control over their content.
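To illustrate the robots.txt mechanism the article refers to: a publisher can add crawler-specific rules to the robots.txt file at its site root. The directives below are a minimal hypothetical example, assuming the goal is to block the Internet Archive's crawler (which identifies itself as `ia_archiver`) and OpenAI's documented training crawler (`GPTBot`) while leaving other crawlers unaffected; actual publisher configurations vary.

```
# Hypothetical robots.txt excerpt, served at https://example-news-site.com/robots.txt
# Block the Internet Archive's crawler from the whole site
User-agent: ia_archiver
Disallow: /

# Block OpenAI's AI-training crawler
User-agent: GPTBot
Disallow: /

# All other crawlers remain allowed
User-agent: *
Disallow:
```

Note that robots.txt is advisory: compliant crawlers honor it, but it does not technically prevent scraping, which is part of why publishers also turn to API restrictions and licensing terms.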