Talos: Hardware accelerator for deep convolutional neural networks
Summary
Talos is an FPGA-based hardware accelerator designed for CNN inference with deterministic, cycle-level performance. It eliminates runtime and software overhead, uses Q16.16 fixed-point arithmetic, and employs time multiplexing and ROM-based weights so that it fits on a small FPGA while still achieving efficient inference. The article covers architecture decisions, data flow, and engineering lessons, offering a blueprint for hardware accelerators in edge AI and business IT deployments.
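The summary names three techniques: Q16.16 fixed point, time multiplexing, and ROM-based weights. It does not describe how Talos implements them. The following is a minimal Python model of how these ideas typically combine in a single multiply-accumulate (MAC) path, under stated assumptions. Q16.16 stores a value as a signed 32-bit integer with 16 fractional bits, so 1.0 is 65536. One multiplier is reused once per clock cycle across all weights, and the weights come from a read-only table. The names (`to_q16_16`, `q_mul`, `WEIGHT_ROM`, `mac_time_multiplexed`), the round-to-nearest conversion, the saturation policy, and the wide accumulator are hypothetical choices for illustration, not Talos's actual design.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 1.0 in Q16.16
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def saturate32(x: int) -> int:
    """Clamp to the signed 32-bit range (assumed overflow policy)."""
    return max(INT32_MIN, min(INT32_MAX, x))


def to_q16_16(x: float) -> int:
    """Convert a float to Q16.16 with round-to-nearest (assumed rounding mode)."""
    return saturate32(round(x * ONE))


def from_q16_16(q: int) -> float:
    """Convert a Q16.16 integer back to a float."""
    return q / ONE


def q_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values.

    The raw product has 32 fractional bits, so shift right by 16 to return to
    Q16.16. Python's >> on negative ints is an arithmetic (flooring) shift,
    like a signed shift in hardware.
    """
    return saturate32((a * b) >> FRAC_BITS)


# Hypothetical "ROM": weights fixed at build time, stored as an immutable tuple.
WEIGHT_ROM = tuple(to_q16_16(w) for w in (0.5, -1.25, 2.0, 0.75))


def mac_time_multiplexed(inputs, rom=WEIGHT_ROM):
    """Compute a dot product using one multiplier reused once per cycle.

    Products are summed at full precision in a wide accumulator (an assumption,
    modeling e.g. a 64-bit register) and rescaled to Q16.16 only once at the end.
    Returns (result_q16_16, cycles_used).
    """
    if len(inputs) != len(rom):
        raise ValueError("input length must match the number of ROM weights")
    acc = 0
    cycles = 0
    for x, w in zip(inputs, rom):
        acc += x * w  # one MAC per simulated clock cycle
        cycles += 1
    return saturate32(acc >> FRAC_BITS), cycles


if __name__ == "__main__":
    xs = [to_q16_16(v) for v in (1.0, 2.0, 0.5, -4.0)]
    result, cycles = mac_time_multiplexed(xs)
    print(from_q16_16(result), cycles)
```

Running the script prints `-4.0 4`, because 0.5·1.0 − 1.25·2.0 + 2.0·0.5 + 0.75·(−4.0) = −4.0. In this model the cycle count equals the number of weights and does not depend on the data, which illustrates the kind of fixed, cycle-predictable latency the summary describes. Rescaling once at the end, instead of after every product, avoids accumulating a rounding error at each step.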