Mercury 2: The fastest reasoning LLM, powered by diffusion
Summary
Mercury 2 introduces diffusion-based real-time reasoning to deliver production-ready AI with sub-second latency. The announcement claims 1,009 tokens/sec on NVIDIA Blackwell GPUs, along with a 128K-token context window and API compatibility, and highlights use cases across coding, agentic loops, voice, and search pipelines. The article emphasizes speed, cost, and deployment considerations for latency-sensitive applications.
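
To make the API-compatibility and latency points concrete, here is a minimal sketch of how a latency-sensitive client might stream completions from the model, assuming an OpenAI-compatible chat endpoint; the base URL and model name below are illustrative placeholders, not values confirmed by the article.

```python
# Minimal sketch: streaming from an OpenAI-compatible chat endpoint.
# The base_url and model name are assumptions for illustration only;
# substitute the values from the provider's documentation.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# Stream the response so a latency-sensitive consumer (voice agent,
# agentic loop, search pipeline) can act on tokens as they arrive
# rather than waiting for the full completion.
stream = client.chat.completions.create(
    model="mercury-2",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize this bug report in one line."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Streaming is the natural fit here: if the model really sustains roughly 1,000 tokens/sec, time-to-first-token and per-chunk handling dominate perceived latency, so the client loop above prints each delta as soon as it is received.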