Writing string.h functions using string instructions in asm x86-64
Summary
This article walks through writing string.h functions with x86-64 string instructions. It disassembles memcpy, explains movs, cmps, scas, lods, and stos in detail (including direction flag handling and vectorized variants), then presents benchmarks and describes how glibc selects optimized implementations via IFUNC. Throughout, it weighs hardware string instructions against traditional loops and larger-register approaches, and offers practical guidance for performance-minded developers.
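To make the subject concrete, here is a minimal sketch of a memcpy-style routine built on `rep movsb`, written as C with GCC/Clang inline assembly. The names `memcpy_movsb` and `movsb_copy_matches` are hypothetical and chosen only for illustration; this is not glibc's implementation. The sketch assumes the System V x86-64 ABI, under which the direction flag is clear on function entry, so the copy runs forward without an explicit `cld`. A plain byte loop stands in on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes from src to dst with rep movsb.
   Like memcpy, it returns the original dst and assumes the buffers
   do not overlap. */
static void *memcpy_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* rep movsb copies RCX bytes from [RSI] to [RDI], advancing both
       pointers. Assumption: DF is clear, as the System V ABI guarantees
       at function entry, so the pointers advance upward. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback so the sketch still builds off x86-64. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}

/* Small self-check: returns 1 when the copy matches the source. */
static int movsb_copy_matches(void)
{
    const char src[] = "string instructions";
    char dst[sizeof src] = {0};
    void *r = memcpy_movsb(dst, src, sizeof src);
    assert(r == dst);
    return memcmp(dst, src, sizeof src) == 0;
}
```

The `"+D"`, `"+S"`, and `"+c"` constraints pin the operands to RDI, RSI, and RCX and mark them as read-write, since the instruction modifies all three; the `"memory"` clobber tells the compiler that the destination buffer changes behind its back.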