How fast is N tokens per second really?
Summary
The article examines real-world tokens-per-second (tok/s) benchmarks for LLMs, arguing that published tok/s figures are hard to judge without watching output stream live. It introduces a benchmark interface with four modes (code, text, think, agent) and explains how tokenization and content type affect perceived throughput. It also suggests practical speeds to try, spanning devices from a Raspberry Pi to Cerebras-class systems.
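To get a feel for a given tok/s figure, one option is to replay some text at that rate in a terminal. The sketch below is a minimal illustration and not the article's benchmark interface: the function names `time_to_stream` and `stream` are hypothetical, and it assumes the caller supplies the token list. Splitting text on whitespace is only a crude stand-in for a real tokenizer, so the pace it shows will differ from what an actual model would produce for the same tok/s.

```python
import sys
import time


def time_to_stream(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def stream(tokens, tokens_per_second, write=sys.stdout.write, sleep=time.sleep):
    """Emit tokens one at a time, pausing 1/tokens_per_second after each.

    `write` and `sleep` are injectable so the pacing can be checked
    without actually waiting. Real model output is burstier than this
    constant-rate assumption.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    delay = 1.0 / tokens_per_second
    for token in tokens:
        write(token)
        sys.stdout.flush()  # show each token immediately when writing to stdout
        sleep(delay)
```

For example, `stream([w + " " for w in text.split()], 20)` replays `text` at roughly 20 whitespace-delimited words per second, and `time_to_stream(500, 50)` gives the 10 seconds a 500-token reply would take at 50 tok/s.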