Integer Quantization: Deep Dive
Summary
This article takes an in-depth look at integer quantization for neural networks. It covers why quantization matters (memory, energy, throughput), the math behind scale and zero-point, and how quantized computations are executed on multiply-accumulate (MAC) units. It compares post-training quantization (PTQ) with quantization-aware training (QAT), discusses per-tensor, per-channel, and per-block granularity, and includes practical equations and visuals to illustrate the concepts.