Floating point from scratch: Hard Mode
Summary
This in-depth article chronicles a personal attempt to implement floating-point arithmetic from scratch, focusing on how bfloat16 compares with the IEEE 754 formats and on how subnormals and rounding modes are handled. It also covers hardware verification, RTL design decisions, and Tiny Tapeout experiments, and offers practical insights into precision choices for AI accelerators.
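To make the bfloat16 versus IEEE 754 comparison concrete, here is a minimal software sketch of narrowing an IEEE 754 binary32 value to bfloat16 with round-to-nearest, ties-to-even. It relies on the fact that bfloat16 keeps the sign bit, all 8 exponent bits, and the top 7 fraction bits of binary32. The function names are hypothetical and are not taken from the article, which works at the RTL level; this only illustrates the rounding idea.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def float32_to_bfloat16_bits(x: float) -> int:
    """Narrow binary32 to a 16-bit bfloat16 pattern (hypothetical helper).

    Rounding is round-to-nearest, ties-to-even, applied to the 16
    low-order bits that bfloat16 discards.
    """
    bits = float32_bits(x)

    # NaN: exponent all ones and a nonzero fraction. Force a fraction bit
    # in the kept half so truncation cannot turn the NaN into infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0:
        return ((bits >> 16) | 0x0040) & 0xFFFF

    # Adding 0x7FFF plus the lowest kept bit rounds halfway cases toward
    # the even neighbour; a carry propagates into the exponent naturally.
    lsb = (bits >> 16) & 1
    rounded = bits + 0x7FFF + lsb
    return (rounded >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 pattern back to a Python float (exact)."""
    return bits_to_float32((b & 0xFFFF) << 16)


if __name__ == "__main__":
    for value in (1.0, 3.14159265, 1.0 + 2.0 ** -8, 1.0 + 3 * 2.0 ** -8):
        b = float32_to_bfloat16_bits(value)
        print(f"{value!r:>22} -> 0x{b:04X} -> {bfloat16_bits_to_float(b)!r}")
```

The two tie cases show the rounding mode at work: `1 + 2**-8` sits exactly halfway between two bfloat16 values and rounds down to the even pattern `0x3F80`, while `1 + 3 * 2**-8` rounds up to the even pattern `0x3F82`. A binary32 subnormal such as the bit pattern `0x00000001` rounds to zero here, because its magnitude is far below the smallest bfloat16 subnormal.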