Sometimes it's just better to load all the data
Summary
The article argues that bulk-loading data into memory for batch processing can dramatically reduce I/O and total processing time on large datasets. It describes a real-world job in which per-item database queries added up to hours of latency, and shows how batching by day and loading each batch into memory cut the runtime from around 24 hours to about 15 minutes, with room for further improvement. It also covers general principles, caveats, and tool considerations for efficient batch processing.
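As a rough illustration of the contrast described above, the sketch below computes the same daily totals two ways: with one query per item, and with one query per day whose rows are loaded into memory before processing. It is not the article's code. The `readings` table, its columns, the sample data, and the function names are all hypothetical, and an in-memory SQLite database stands in for whatever database the original job used.

```python
import sqlite3
from collections import defaultdict


def make_db():
    """Build a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (item_id INTEGER, day TEXT, value REAL)")
    # Nine items spread evenly over three days.
    rows = [(i, f"2024-01-{1 + i % 3:02d}", float(i)) for i in range(9)]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn


def totals_per_item(conn, item_ids):
    """Per-item approach: one query (one round trip) for every item."""
    totals = defaultdict(float)
    for item_id in item_ids:
        row = conn.execute(
            "SELECT day, value FROM readings WHERE item_id = ?", (item_id,)
        ).fetchone()
        if row is not None:
            day, value = row
            totals[day] += value
    return dict(totals)


def totals_by_day(conn, days):
    """Bulk approach: one query per day, then all processing in memory."""
    totals = {}
    for day in days:
        rows = conn.execute(
            "SELECT item_id, value FROM readings WHERE day = ?", (day,)
        ).fetchall()  # the whole day's data is now held in memory
        totals[day] = sum(value for _item_id, value in rows)
    return totals


if __name__ == "__main__":
    conn = make_db()
    days = ["2024-01-01", "2024-01-02", "2024-01-03"]
    print(totals_per_item(conn, range(9)))
    print(totals_by_day(conn, days))
```

Both functions return the same totals, but the first issues one query per item while the second issues one per day. With a networked database and millions of items, that difference in round trips is the kind of cost the article attributes the long runtime to. The trade-off is that each day's rows must fit in memory.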