RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8
Summary
A detailed guide to a dual-GPU setup (RTX 5080 + RTX 3090) for local AI inference, running Qwen 3.6 27B with Q8 quantization at 80+ tokens per second. The post covers BIOS and kernel configuration, driver choices, multi-GPU orchestration with llama.cpp, practical verification steps, and performance logging. It also goes into hardware wiring, PCIe topology, and debugging tips for cross-GPU setups.
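A back-of-the-envelope VRAM budget helps show why a 27B model at Q8 needs both cards. This is a sketch under stated assumptions, not the post's own method. It assumes about 27 billion parameters, all stored in llama.cpp's Q8_0 format (34 bytes per block of 32 weights, about 8.5 bits per weight). It also assumes 16 GiB on the RTX 5080, 24 GiB on the RTX 3090, and a split proportional to each card's VRAM. It ignores the KV cache, activations, and CUDA context overhead. The helper names are hypothetical. The resulting proportions are the kind of values llama.cpp's `--tensor-split` option takes, listed in the order llama.cpp enumerates the GPUs.

```python
# Rough VRAM budget for a ~27B-parameter model in llama.cpp's Q8_0 format,
# split across an RTX 5080 (16 GiB) and an RTX 3090 (24 GiB).
# Assumptions (hypothetical, for illustration): every weight is Q8_0
# (32 int8 values + one fp16 scale = 34 bytes per block of 32 weights);
# KV cache, activations and CUDA context are ignored.

GIB = 2**30


def q8_0_weight_bytes(n_params: float) -> float:
    """Approximate Q8_0 weight size: 34 bytes per block of 32 weights."""
    return n_params * 34 / 32


def tensor_split(vram_gib: list[float]) -> list[float]:
    """Per-GPU proportions, in the same order as the input, proportional to VRAM."""
    total = sum(vram_gib)
    return [v / total for v in vram_gib]


def plan(n_params: float, vram_gib: list[float]) -> list[tuple[float, float]]:
    """Return (weights_gib, headroom_gib) for each GPU under a VRAM-proportional split."""
    weights_gib = q8_0_weight_bytes(n_params) / GIB
    return [
        (weights_gib * frac, vram - weights_gib * frac)
        for frac, vram in zip(tensor_split(vram_gib), vram_gib)
    ]


if __name__ == "__main__":
    gpus = {"RTX 5080": 16.0, "RTX 3090": 24.0}
    split = tensor_split(list(gpus.values()))
    print("tensor split:", ",".join(f"{f:.2f}" for f in split))
    for name, (weights, headroom) in zip(gpus, plan(27e9, list(gpus.values()))):
        print(f"{name}: ~{weights:.1f} GiB weights, ~{headroom:.1f} GiB left for KV cache")
```

Under these assumptions the weights come to roughly 26.7 GiB. That is more than either card holds alone. The sketch prints a 0.40,0.60 split, with about 10.7 GiB of weights on the RTX 5080 and 16.0 GiB on the RTX 3090. In practice the split may need adjusting so the leftover headroom on each card fits the KV cache for the desired context length.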