L1 instruction cache set conflicts, associativity, and code alignment in Go
Summary
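An in-depth look at a Go performance regression caused by L1 instruction cache (L1i) set conflicts that arose from code alignment. The author shows how a 416-byte shift in code layout moved hot paths across a 64-byte cache-line boundary, triggering widespread L1i misses. Because 416 is not a multiple of 64 (416 = 6 * 64 + 32), the shift changes each hot block's offset within its cache line by 32 bytes and changes which L1i sets it maps to. The write-up walks through perf-based investigations and heatmaps, and closes with reflections on benchmarking alignment using funcalign in the Go toolchain.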
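To make the mechanism concrete, here is a minimal Go sketch of how a code address maps to an L1i set and how a layout shift can split a hot block across two cache lines. It assumes a hypothetical but common L1i geometry (32 KiB, 8-way set associative, 64-byte lines, hence 64 sets). The base address 0x401000, the 48-byte hot block, and the helper names setIndex, linesSpanned, and oversubscribedSets are illustrative assumptions. They do not come from the write-up or from any real binary.

```go
package main

import "fmt"

// Assumed L1i geometry (hypothetical, typical of many x86 cores):
// 32 KiB capacity, 8-way set associative, 64-byte lines.
const (
	lineSize = 64
	ways     = 8
	capacity = 32 * 1024
	numSets  = capacity / (lineSize * ways) // 64 sets
)

// setIndex returns the L1i set that the cache line containing addr maps to.
func setIndex(addr uint64) uint64 {
	return (addr / lineSize) % numSets
}

// linesSpanned returns how many cache lines the byte range [start, start+size) touches.
func linesSpanned(start, size uint64) uint64 {
	first := start / lineSize
	last := (start + size - 1) / lineSize
	return last - first + 1
}

// oversubscribedSets counts distinct hot lines per set and returns the sets
// holding more hot lines than there are ways; such lines can evict each other.
func oversubscribedSets(hotAddrs []uint64) map[uint64]int {
	seen := make(map[uint64]bool)
	perSet := make(map[uint64]int)
	for _, a := range hotAddrs {
		line := a / lineSize
		if seen[line] {
			continue // count each cache line only once
		}
		seen[line] = true
		perSet[line%numSets]++
	}
	over := make(map[uint64]int)
	for set, n := range perSet {
		if n > ways {
			over[set] = n
		}
	}
	return over
}

func main() {
	const shift = 416        // layout shift described in the write-up
	base := uint64(0x401000) // hypothetical address of a hot block
	size := uint64(48)       // hypothetical hot-block size in bytes

	for _, start := range []uint64{base, base + shift} {
		fmt.Printf("start=%#x set=%d offset=%d lines=%d\n",
			start, setIndex(start), start%lineSize, linesSpanned(start, size))
	}

	// Nine hot lines spaced numSets*lineSize (4 KiB) apart all map to the same set.
	var hot []uint64
	for i := uint64(0); i < 9; i++ {
		hot = append(hot, base+i*numSets*lineSize)
	}
	fmt.Println("oversubscribed sets:", oversubscribedSets(hot))
}
```

Under these assumptions, the 416-byte shift moves the block from offset 0 of set 0 to offset 32 of set 6, so a 48-byte block that fit in one line now straddles two. The second part shows the associativity limit: nine hot lines spaced 4 KiB apart all land in set 0, one more than its 8 ways can hold at once. Real cores may differ in geometry, replacement policy, and prefetching, so this is only a model of the set-mapping arithmetic.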