DigiNews

Tech Watch Articles


NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute

Quality: 8/10 Relevance: 9/10

Summary

NanoGPT Slowrun is an open project from Q Labs that targets data-efficient language modeling: training data is limited, while compute is treated as unlimited. The project reports a 2.4x data-efficiency gain initially and 5.5x so far, achieved through algorithmic improvements (shuffling, value embedding projections, SwiGLU activation, ensembling). Directions still being explored include second-order methods, diffusion, and curriculum learning.
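To make one of the listed techniques concrete, here is a minimal sketch of a SwiGLU feed-forward block in plain Python. It follows the standard published formulation (a SiLU-activated gate multiplied elementwise with a linear projection, then projected back down) and is not taken from the Slowrun code. The names `silu`, `matvec`, `swiglu_ffn`, `W_gate`, `W_up`, and `W_down` are illustrative assumptions, and biases are omitted.

```python
import math


def silu(z):
    """SiLU (Swish with beta = 1): z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, W):
    """Multiply a row vector x (length d_in) by a matrix W (d_in rows, d_out columns)."""
    return [
        sum(x[i] * W[i][j] for i in range(len(x)))
        for j in range(len(W[0]))
    ]


def swiglu_ffn(x, W_gate, W_up, W_down):
    """Illustrative SwiGLU feed-forward: (silu(x W_gate) * (x W_up)) W_down."""
    gate = [silu(g) for g in matvec(x, W_gate)]   # gated branch
    up = matvec(x, W_up)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # elementwise product
    return matvec(hidden, W_down)                 # project back down


# Hypothetical toy weights: identity gate/up projections, summing down projection.
identity = [[1.0, 0.0], [0.0, 1.0]]
sum_down = [[1.0], [1.0]]
out = swiglu_ffn([1.0, 2.0], identity, identity, sum_down)
```

With these toy weights the output is a single value equal to `silu(1.0) * 1.0 + silu(2.0) * 2.0`. A real model would use learned weight matrices and a tensor library rather than nested lists.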

🚀 Service built by Johan Denoyer