DigiNews

Tech Watch by Johan Denoyer


Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon

Quality: 8/10 Relevance: 9/10

Summary

Tri Dao and collaborators introduce Gram Newton-Schulz, a hardware-aware variant of the Newton-Schulz orthogonalization used in the Muon optimizer. It runs the iteration on the Gram matrix, which reduces FLOPs and lets the implementation exploit symmetric GEMMs. The authors analyze stability and pair the method with Restarting, Polar Express coefficients, and CuTeDSL kernels. They report substantial speedups in training-time benchmarks while preserving model quality. The post also provides open-source implementations and practical guidance on stability and deployment.
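
To make the Gram-matrix idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the coefficients are the quintic values commonly used with Muon rather than the Polar Express ones, and Restarting and the CuTeDSL kernels are left out. It assumes a wide input (rows <= columns). The point it illustrates is that each Newton-Schulz step multiplies X by a symmetric polynomial Z of the Gram matrix R = X Xᵀ, so one can update R and an accumulated factor Q using only small square matrices and apply Q to the input once at the end.

```python
import numpy as np

# Quintic coefficients commonly used with Muon (an assumption for this sketch;
# these are not the Polar Express coefficients mentioned in the summary).
COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz(G, steps=5, coeffs=COEFFS):
    """Baseline iteration: every step touches the full n x m matrix X."""
    a, b, c = coeffs
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius normalization
    for _ in range(steps):
        A = X @ X.T                      # n x n Gram matrix
        B = b * A + c * (A @ A)
        X = a * X + B @ X                # equals (a*I + b*A + c*A^2) @ X
    return X


def gram_newton_schulz(G, steps=5, coeffs=COEFFS):
    """Gram-space sketch: iterate on n x n matrices, apply to X0 once."""
    a, b, c = coeffs
    X0 = G / (np.linalg.norm(G) + 1e-7)
    I = np.eye(X0.shape[0])
    R = X0 @ X0.T                        # Gram matrix of the current iterate
    Q = I.copy()                         # accumulated factor: X_k = Q @ X0
    for _ in range(steps):
        Z = a * I + b * R + c * (R @ R)  # symmetric polynomial in R
        Q = Z @ Q
        R = Z @ R @ Z                    # Gram matrix of Z @ X (Z is symmetric)
    return Q @ X0
```

In exact arithmetic the two functions return the same matrix. In this sketch the Gram version needs only two products with the long dimension (the initial Gram matrix and the final Q @ X0), which is where a FLOP saving would come from when the matrix is much wider than it is tall. In low precision the accumulated products can drift, which is the kind of stability concern the summary says the post addresses.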
