Speculative Speculative Decoding
Summary
The arXiv paper introduces speculative speculative decoding (SSD), which runs drafting and verification in parallel instead of sequentially, as standard speculative decoding does, to speed up inference for autoregressive models. The authors present Saguaro, an optimized SSD algorithm, and report speedups of up to 2x over optimized speculative decoding baselines and up to 5x over autoregressive decoding with open-source inference engines. They also outline the key challenges that SSD raises and propose a solution for each.
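For context, the draft-then-verify loop of ordinary speculative decoding can be sketched as below. This is a minimal greedy toy, not Saguaro and not the paper's code. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are plain functions over integer tokens. In a real engine the verification loop would be a single batched forward pass of the target model. Each round has to finish verification before the next draft can start, which is the sequential dependence SSD is meant to address.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    """Run one greedy draft-then-verify round and return the newly accepted tokens."""
    # 1. Draft: the cheap model proposes k tokens, one after another.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. Verify: compare each proposal with the target's own greedy choice.
    #    (Looped here for clarity; a real engine checks all positions in one pass.)
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest of the draft.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every proposal matched, so the target also contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], n_tokens: int, k: int = 4) -> List[int]:
    """Generate n_tokens after the prompt using repeated draft-then-verify rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # The next draft cannot start until this round's verification has returned.
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n_tokens]


def autoregressive(target: Model, prompt: List[int], n_tokens: int) -> List[int]:
    """Reference decoder: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out


# Toy models (illustrative only): the target counts upward modulo 10,
# and the draft agrees with it except right after token 4.
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10
```

With greedy verification the output matches plain autoregressive decoding of the target model token for token; the draft model only changes how many target calls are needed, not what is generated.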