Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data
Summary
This post argues that matrix multiplication performance on GPUs depends on the values in the input data, not just on matrix shape, because of dynamic power constraints. It presents experiments showing that predictable data (such as all zeros) can outperform random data when the GPU is power throttled, discusses the gap between marketed FLOPS and the throughput actually achieved, and highlights the implications for benchmarking AI workloads.
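To make the gap between marketed and achieved FLOPS concrete, here is a minimal sketch of the arithmetic usually used to turn a matmul timing into a throughput figure. It assumes the standard count of 2·M·N·K floating-point operations for an (M×K) by (K×N) product. The function names, the peak value, and the two timings are hypothetical placeholders for illustration; they are not measurements from the post.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS given the wall-clock time of a single matmul."""
    return matmul_flops(m, n, k) / seconds / 1e12


def utilization(achieved: float, peak: float) -> float:
    """Fraction of the marketed peak that was actually reached."""
    return achieved / peak


if __name__ == "__main__":
    size = 8192
    peak_tflops = 1000.0  # ASSUMPTION: placeholder, not a real datasheet number

    # ASSUMPTION: placeholder timings in seconds, not measured results.
    timings = {"zeros": 0.0016, "random": 0.0020}

    for label, seconds in timings.items():
        tflops = achieved_tflops(size, size, size, seconds)
        print(f"{label:>6}: {tflops:7.1f} TFLOPS, "
              f"{utilization(tflops, peak_tflops):.0%} of assumed peak")
```

With real timings substituted for the placeholders, the same shape and the same kernel can report different TFLOPS for different input contents, which is why a benchmark should state what data it was run on.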