🦥Unsloth Dynamic 2.0 GGUFs
Summary
Unsloth announces Dynamic 2.0 GGUFs, a major upgrade to Dynamic Quants. The new Dynamic v2.0 quantization selectively quantizes layers on a per-model basis, enabling model-specific quants, and supports formats such as IQ4_NL, Q5_1, Q5_0, Q4_1, and Q4_0 to boost efficiency on Apple Silicon and ARM devices. A calibration dataset of more than 1.5 million tokens improves conversational chat performance, and an internal evaluation framework enables apples-to-apples comparisons against full-precision, QAT, and imatrix GGUF baselines.

Benchmarks emphasize KL divergence as the key metric and show reductions alongside comparable or improved MMLU scores. The article also reports new benchmark results for Qwen 3.5 and describes a Gemma 3 QAT replication in which a 12B Q4_0 GGUF reaches 67.07% on MMLU versus 67.15% for the BF16 baseline, with the 27B model near parity. A simple efficiency metric that accounts for disk size is introduced and used to compare quantization variants.

The piece also highlights bug fixes for Llama 4 affecting RoPE scaling and QK-norm defaults, and shows that Unsloth GGUFs deliver higher accuracy than some third-party providers when run on llama.cpp. A guide walks through running Llama 4 Scout, including commands to clone the code, download the quantized GGUFs, and run the model with a large context window, and the article closes with further resources, community links, and a Last Updated note. Overall, the Dynamic 2.0 release offers a more adaptive quantization strategy, broader compatibility, and concrete benchmarks that are useful for developers evaluating quantization approaches for production apps.
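To make the two headline metrics concrete, here is a minimal sketch of how a KL divergence between full-precision and quantized next-token distributions can be computed, plus a size-aware efficiency score. The exact efficiency formula Unsloth uses is not restated in this summary; the `(MMLU - 25) / disk_gb` form below (subtracting the 25% random-guess floor of MMLU) is an illustrative assumption, and all names and values are toy examples, not the article's data.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats between two discrete token distributions.

    p: probabilities from the full-precision model
    q: probabilities from the quantized model
    Lower is better: 0 means the quant reproduces the original exactly.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def efficiency(mmlu_pct, disk_gb):
    """Illustrative size-aware score: MMLU above the 25% guess floor per GB.

    This formula is an assumption for demonstration, not the article's
    exact definition.
    """
    return (mmlu_pct - 25.0) / disk_gb

# Toy next-token distributions over a 3-token vocabulary.
p_fp16 = [0.70, 0.20, 0.10]
p_quant = [0.65, 0.25, 0.10]
print(f"KL divergence: {kl_divergence(p_fp16, p_quant):.5f}")

# A smaller file with the same MMLU scores higher on the metric.
print(f"efficiency @ 7 GB: {efficiency(67.07, 7.0):.2f}")
print(f"efficiency @ 8 GB: {efficiency(67.07, 8.0):.2f}")
```

The intuition matches the article's framing: KL divergence measures how faithfully a quant tracks the original model's token probabilities, while the efficiency score rewards quants that keep accuracy with less disk space.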
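The run-guide steps summarized above (clone, download, run with a large context) follow the usual llama.cpp workflow; a sketch is below. The Hugging Face repo name, quant pattern, and sampling flags are assumptions for illustration; consult the full Unsloth guide for the exact commands and recommended settings.

```shell
# Build llama.cpp from source (CPU build shown; see llama.cpp docs for GPU flags).
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build -DBUILD_SHARED_LIBS=OFF
cmake --build llama.cpp/build --config Release -j

# Download the quantized GGUF (repo and quant pattern are illustrative).
pip install huggingface_hub
huggingface-cli download unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF \
  --include "*Q2_K_XL*" --local-dir models

# Run with a large context window; the sampling values are example settings.
./llama.cpp/build/bin/llama-cli \
  --model models/<downloaded-file>.gguf \
  --ctx-size 16384 \
  --temp 0.6 \
  --min-p 0.01
```

`--ctx-size` is the flag that enables the "large context" the guide refers to; raise it as far as your RAM/VRAM allows.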