Occupancy Math on the AMD MI355X: A From-First-Principles Guide
Summary
This guide derives occupancy on the AMD MI355X (CDNA4) from four hardware limiters (VGPRs, SGPRs, LDS, and workgroup/barrier slots), explains the private-vs-shared resource split, and walks through practical examples showing how occupancy interacts with Little’s Law. It emphasizes that maximizing occupancy is not the same as maximizing kernel throughput, and it provides a kernel-tuning workflow focused on keeping the matrix core fed rather than chasing an occupancy percentage.