Task Failed Successfully: Saturating NIC and Disk Bandwidth
Summary
The article documents an HPC performance investigation into saturating a NIC and disks with a system built on io_uring and RDMA and developed with AI-assisted coding. It walks through a sequence of experiments that isolate the bottlenecks, from per-I/O memory preparation costs to the large cost of address translation with 4 KiB pages (dTLB misses). The final insight is that hugepages are a practical remedy that brings throughput close to NIC saturation, with measurements and flamegraphs supporting each conclusion.
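
To make the page-translation point concrete, the sketch below counts how many distinct page translations are needed to cover an I/O buffer at two page sizes. The 1 GiB buffer size and the 2 MiB hugepage size are assumptions chosen for illustration (2 MiB is a common hugepage size on x86-64), not figures taken from the article, and `pages_needed` is a hypothetical helper.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(buffer_bytes: int, page_bytes: int) -> int:
    """Return how many pages (distinct translations) cover buffer_bytes."""
    # Ceiling division: a partially used page still needs its own translation.
    return -(-buffer_bytes // page_bytes)


# Assumed values for illustration only; the article does not state them.
buffer_bytes = 1 * GIB
small_pages = pages_needed(buffer_bytes, 4 * KIB)
huge_pages = pages_needed(buffer_bytes, 2 * MIB)

print(f"4 KiB pages: {small_pages} translations")
print(f"2 MiB pages: {huge_pages} translations")
print(f"reduction factor: {small_pages // huge_pages}x")
```

Under these assumptions the same buffer needs 262144 translations with 4 KiB pages but only 512 with 2 MiB pages. With far fewer translations to cache, the dTLB can cover much more of the buffer, which is the general mechanism behind the hugepage remedy described above.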