How much do amd64 microarchitecture levels help in Go?
Summary
This article measures how the amd64 microarchitecture levels (v1 through v4) affect Go performance, using Roaring Bitmap benchmarks on modern hardware. Building for v2, which lets the compiler assume the POPCNT instruction is available, yields a ~43% speedup for population counts. v3, which adds AVX2 among other extensions, provides further gains in certain code paths, while v4 often adds little. The results argue for finer-grained feature detection and for benchmarking on current hardware.