News publishers limit Internet Archive access due to AI scraping concerns
Summary
The Nieman Lab article discusses how major news publishers are restricting the Internet Archive's access to their sites and tightening API usage to curb AI data scraping, highlighting actions by The Guardian, The New York Times, Reddit, and Gannett. It analyzes robots.txt usage, licensing tensions, and the broader implications for AI training data, web archiving, and publishers' control over their content.
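To illustrate the robots.txt mechanism the article refers to: a publisher can add crawler-specific rules to the robots.txt file at its site root. The directives below are a minimal hypothetical example, assuming the goal is to block the Internet Archive's crawler (which identifies itself as `ia_archiver`) and OpenAI's documented training crawler (`GPTBot`) while leaving other crawlers unaffected; actual publisher configurations vary.

```
# Hypothetical robots.txt excerpt, served at https://example-news-site.com/robots.txt
# Block the Internet Archive's crawler from the whole site
User-agent: ia_archiver
Disallow: /

# Block OpenAI's AI-training crawler
User-agent: GPTBot
Disallow: /

# All other crawlers remain allowed
User-agent: *
Disallow:
```

Note that robots.txt is advisory: compliant crawlers honor it, but it does not technically prevent scraping, which is part of why publishers also turn to API restrictions and licensing terms.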