DigiNews

Tech Watch by Johan Denoyer


Megakernel Qwen3.5 0.8B on RTX 3090 and DFlash 27B on RTX 3090: Local LLM Inference Benchmarks

Quality: 8/10 Relevance: 9/10

Summary

This GitHub repo showcases two hand-tuned LLM inference projects for the RTX 3090. The first is a Megakernel for Qwen3.5-0.8B that achieves high efficiency by using a single CUDA dispatch. The second is a DFlash DDTree port for Qwen3.5-27B that delivers up to 207 tok/s. The repo emphasizes local AI and reproducible benchmarks and is released under the open-source MIT license. It includes setup steps and benchmark results so the numbers can be replicated on consumer hardware.
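The headline figures are decode throughputs in tok/s. The Python sketch below shows one way to measure such a number reproducibly: run a few warm-up passes, then take the median over several timed runs. It is not taken from the repo. `generate` is a hypothetical callable that stands in for whatever decoding entry point you benchmark. It is assumed to block until all tokens have been produced, so GPU code must synchronize before returning (for example with `torch.cuda.synchronize()`).

```python
import statistics
import time
from typing import Callable, List


def measure_tok_per_s(
    generate: Callable[[int], int],
    n_tokens: int = 128,
    warmup: int = 2,
    runs: int = 5,
) -> float:
    """Return the median decode throughput in tokens per second.

    `generate` is a hypothetical callable: it decodes up to `n_tokens`
    tokens, blocks until they are done, and returns how many it produced.
    """
    # Warm-up runs absorb one-time costs (kernel compilation, cache fills)
    # so they do not distort the timed runs.
    for _ in range(warmup):
        generate(n_tokens)

    rates: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate(n_tokens)
        elapsed = time.perf_counter() - start
        rates.append(produced / elapsed)

    # The median is less sensitive than the mean to a single slow outlier run.
    return statistics.median(rates)


if __name__ == "__main__":
    # Stand-in generator for illustration only: it "decodes" n tokens in ~10 ms.
    def fake_generate(n: int) -> int:
        time.sleep(0.01)
        return n

    print(f"{measure_tok_per_s(fake_generate, n_tokens=100):.0f} tok/s")
```

The sketch reports the median rather than the best run. This keeps results comparable across machines and repeated sessions, which is the point of a reproducible benchmark.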

🚀 Service built by Johan Denoyer