Occupancy Math on the AMD MI355X: A From-First-Principles Guide
Summary
This post walks through occupancy on the AMD MI355X (CDNA4) from first principles. It covers the four resource limiters (VGPRs, SGPRs, LDS, and workgroup/barrier slots), shows how to compute the occupancy ceiling by hand, and explains how allocation granularity and per-SIMD versus per-CU budgeting change the result. Using MXFP8 GEMM examples, the author shows that maximizing occupancy is not always optimal and argues for ILP-based strategies that keep the matrix core fed, along with practical measurement guidance.
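As a rough illustration of the hand calculation described above, here is a minimal Python sketch. It takes the minimum over the per-limiter ceilings, rounding each request up to its allocation granule and converting per-CU budgets (LDS, workgroup slots) into waves per SIMD. Everything in it is an assumption for illustration: the `Limits` fields, the `occupancy_ceiling` helper, the even spread of a workgroup's waves across SIMDs, and above all the numbers in `placeholder_hw`, which are placeholders and not verified MI355X figures. Substitute the values from the hardware documentation before trusting any result.

```python
from dataclasses import dataclass


def round_up(value: int, granule: int) -> int:
    """Round value up to the next multiple of granule (at least one granule)."""
    return max(granule, -(-value // granule) * granule)


@dataclass(frozen=True)
class Limits:
    # Hypothetical container; fill in real values from the hardware docs.
    vgprs_per_simd: int         # VGPR budget, per SIMD
    vgpr_granule: int           # VGPR allocation granularity
    sgprs_per_simd: int         # SGPR budget, per SIMD
    sgpr_granule: int           # SGPR allocation granularity
    lds_bytes_per_cu: int       # LDS budget, per CU
    lds_granule: int            # LDS allocation granularity in bytes
    max_waves_per_simd: int     # hardware wave slots per SIMD
    simds_per_cu: int
    max_workgroups_per_cu: int  # workgroup/barrier slots per CU


def occupancy_ceiling(hw, vgprs_per_wave, sgprs_per_wave,
                      lds_bytes_per_wg, waves_per_wg):
    """Return (waves per SIMD, dict of per-limiter ceilings).

    Assumes a workgroup's waves are spread evenly across the CU's SIMDs.
    """
    # Register files are budgeted per SIMD, after rounding to the granule.
    by_vgpr = hw.vgprs_per_simd // round_up(vgprs_per_wave, hw.vgpr_granule)
    by_sgpr = hw.sgprs_per_simd // round_up(sgprs_per_wave, hw.sgpr_granule)

    # LDS is budgeted per CU and per workgroup, so count workgroups first.
    if lds_bytes_per_wg > 0:
        wgs_by_lds = hw.lds_bytes_per_cu // round_up(lds_bytes_per_wg,
                                                     hw.lds_granule)
    else:
        wgs_by_lds = hw.max_workgroups_per_cu  # LDS is not a limiter
    by_lds = wgs_by_lds * waves_per_wg // hw.simds_per_cu

    # Workgroup/barrier slots are also a per-CU budget.
    by_slots = hw.max_workgroups_per_cu * waves_per_wg // hw.simds_per_cu

    ceilings = {
        "vgpr": by_vgpr,
        "sgpr": by_sgpr,
        "lds": by_lds,
        "wg_slots": by_slots,
        "wave_slots": hw.max_waves_per_simd,
    }
    return min(ceilings.values()), ceilings


# Placeholder numbers only, NOT verified MI355X values.
placeholder_hw = Limits(
    vgprs_per_simd=512, vgpr_granule=8,
    sgprs_per_simd=800, sgpr_granule=16,
    lds_bytes_per_cu=65536, lds_granule=512,
    max_waves_per_simd=8, simds_per_cu=4,
    max_workgroups_per_cu=32,
)

if __name__ == "__main__":
    occ, ceilings = occupancy_ceiling(
        placeholder_hw, vgprs_per_wave=120, sgprs_per_wave=96,
        lds_bytes_per_wg=32768, waves_per_wg=4)
    print(occ, ceilings)
```

Returning the per-limiter dictionary alongside the minimum makes it easy to see which resource sets the ceiling and how far away the next one is, which is usually the more useful question when deciding whether to trade registers for LDS or the reverse.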