Orthrus-Qwen3: up to 7.8× tokens/forward on Qwen3, identical output distribution
Summary
Orthrus is a memory-efficient, dual-view diffusion approach that parallelizes token generation in LLMs. It delivers significant speedups while exactly preserving the base model's output distribution. The GitHub project provides a model zoo built on Qwen3, with reported speedups of up to 7.8× (measured as tokens generated per forward pass). It also uses a shared KV cache that adds zero memory overhead and tunes only a fraction of the parameters. It is a valuable case study in AI tooling and efficient inference strategies.
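The summary does not describe Orthrus's dual-view diffusion mechanism in detail, so the sketch below does not reproduce it. Instead, it uses a generic draft-then-verify loop, the technique behind speculative decoding, to illustrate two ideas from the summary. The first is what a "tokens per forward" figure counts. The second is how a parallel decoder can still emit exactly what the base model would. Every name here (`toy_model`, `toy_drafter`, `draft_and_verify`) is hypothetical, and the toy covers only greedy decoding. Preserving a sampled distribution would require rejection sampling instead of exact-match checks.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a "model" returns the greedy next token for a prefix,
# and a "drafter" proposes k future tokens at once.
Model = Callable[[List[int]], int]
Drafter = Callable[[List[int], int], List[int]]


def toy_model(ctx: List[int]) -> int:
    # Deterministic toy rule standing in for an LLM's greedy next-token choice.
    return (ctx[-1] * 3 + 1) % 7


def toy_drafter(ctx: List[int], k: int) -> List[int]:
    # Drafts with the same rule but is deliberately wrong at every 5th
    # absolute position, so some drafted tokens get rejected.
    draft, last = [], ctx[-1]
    for i in range(k):
        last = (last * 3 + 1) % 7
        draft.append((last + 1) % 7 if (len(ctx) + i) % 5 == 0 else last)
    return draft


def autoregressive(model: Model, prompt: List[int], n: int) -> Tuple[List[int], int]:
    # Baseline: one forward pass per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):], n


def draft_and_verify(model: Model, drafter: Drafter, prompt: List[int],
                     n: int, k: int) -> Tuple[List[int], int]:
    out, forwards = list(prompt), 0
    while len(out) - len(prompt) < n:
        draft = drafter(out, k)
        # A real system scores all k+1 positions in one batched forward pass.
        # Here the toy model is called per position, and the whole
        # verification loop is counted as a single pass.
        forwards += 1
        ctx = list(out)
        for tok in draft:
            if tok != model(ctx):  # first mismatch: reject the rest of the draft
                break
            ctx.append(tok)
        # The same pass also yields the model's own token at the stopping point,
        # so every pass commits at least one token.
        ctx.append(model(ctx))
        out = ctx
    return out[len(prompt):len(prompt) + n], forwards


ref, ref_fwd = autoregressive(toy_model, [2], 20)
gen, fwd = draft_and_verify(toy_model, toy_drafter, [2], 20, k=7)
print(gen == ref, len(gen) / fwd)
```

Every committed token either matched the base model's greedy choice or was produced by the base model itself. The output is therefore identical to plain autoregressive decoding, and the only thing that changes is how many tokens each forward pass commits.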