The economics of speculative decoding
Summary
This post takes a math-heavy look at the economics of speculative decoding in AI inference. It explains how mixture-of-experts routing and compressed attention alter the decode roofline for speculated tokens, and it derives a marginal-cost curve. It then discusses how to price speculation and decide how far ahead to speculate, including a concrete cost model and scenarios where speculation improves throughput.
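To make the "how far ahead to speculate" question concrete, here is a minimal sketch, not the post's own cost model. It assumes each drafted token is accepted independently with probability `alpha`, which gives the commonly used expected yield of (1 - alpha^(k+1)) / (1 - alpha) tokens per verification step for draft length `k`. The linear cost terms (`draft_cost`, `verify_base`, `verify_marginal`) and all function names are hypothetical placeholders. Effects such as mixture-of-experts routing or compressed attention would change the shape of the verification cost and are not modeled here.

```python
def expected_tokens(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification step with k drafted tokens.

    Assumption: each drafted token is accepted independently with
    probability alpha; the verifier always contributes one token.
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def step_cost(k: int, draft_cost: float, verify_base: float,
              verify_marginal: float) -> float:
    """Hypothetical linear cost of one draft-then-verify step.

    draft_cost:      cost to draft one token
    verify_base:     cost of a verification pass with no speculated tokens
    verify_marginal: extra verification cost per speculated token
    """
    return k * draft_cost + verify_base + k * verify_marginal


def best_draft_length(alpha: float, draft_cost: float, verify_base: float,
                      verify_marginal: float, max_k: int = 16) -> int:
    """Return the draft length in [0, max_k] maximizing tokens per unit cost."""
    best_k = 0
    best_rate = expected_tokens(alpha, 0) / step_cost(
        0, draft_cost, verify_base, verify_marginal)
    for k in range(1, max_k + 1):
        rate = expected_tokens(alpha, k) / step_cost(
            k, draft_cost, verify_base, verify_marginal)
        if rate > best_rate:  # strict: prefer the shorter draft on ties
            best_k, best_rate = k, rate
    return best_k
```

Under these assumptions, each extra drafted token adds a shrinking expected yield (alpha^(k+1) for the step from k to k+1) against a constant added cost, so the best draft length is finite whenever drafting or verifying a speculated token costs anything.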