RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

June 13, 2026 at 00:00

Quality: 8/10 Relevance: 9/10

Summary

A detailed guide to a dual-GPU setup (RTX 5080 + RTX 3090) for local AI inference, running Qwen 3.6 27B with Q8 quantization at 80+ tokens per second. The post covers BIOS and kernel configuration, driver choices, multi-GPU orchestration via llama.cpp, practical verification steps, and performance logging. It also details hardware wiring and PCIe topology, and gives debugging tips for cross-GPU setups.
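
To see why a Q8 model of this size calls for two cards, here is a rough back-of-the-envelope sketch. It is not taken from the article: the helper names are hypothetical, the parameter count is assumed to be exactly 27 billion, every tensor is assumed to be stored as Q8_0 (about 8.5 bits per weight), and the cards are assumed to have their nominal 16 GB and 24 GB of VRAM. The split the article actually uses is not stated in this summary.

```python
# Hypothetical helpers for illustration only; not from the original article.

# Q8_0 stores 32 int8 weights plus one fp16 scale per block:
# 34 bytes per 32 weights = 8.5 bits per weight.
Q8_0_BITS_PER_WEIGHT = 8.5


def q8_0_weight_gib(n_params: float) -> float:
    """Approximate size of the weights in GiB if every tensor were Q8_0."""
    return n_params * Q8_0_BITS_PER_WEIGHT / 8 / 2**30


def proportional_split(vram_gib: list[float]) -> list[float]:
    """Share of the model per GPU, proportional to each card's VRAM."""
    total = sum(vram_gib)
    return [v / total for v in vram_gib]


# Assumed nominal VRAM: RTX 5080 (16), RTX 3090 (24). Device order is an assumption.
vram = [16.0, 24.0]
weights = q8_0_weight_gib(27e9)   # assumed parameter count
split = proportional_split(vram)

print(f"weights ~{weights:.1f} GiB, combined VRAM {sum(vram):.0f}")
print("proportional split:", ",".join(f"{s:.2f}" for s in split))
```

Under these assumptions the weights alone come to roughly 26.7 GiB, which is more than either card holds but fits in the combined 40, leaving the remainder for the KV cache and compute buffers. A proportional ratio like this is the kind of value one might pass to llama.cpp's `--tensor-split` option as a starting point, then tune by measurement.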

Hardware AI News LLM & Prompting
