NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute
Summary
NanoGPT Slowrun is an open project from Q Labs that aims for data-efficient language modeling under unlimited compute. It initially demonstrated a 2.4x data-efficiency gain and has reached 5.5x so far by focusing on algorithmic improvements (shuffling, value embedding projections, the SwiGLU activation, ensembling), with future directions including second-order methods, diffusion models, and curriculum learning.
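Of the listed improvements, the SwiGLU activation is the most self-contained to illustrate. Below is a minimal NumPy sketch of a SwiGLU feed-forward block as commonly defined (Swish-gated linear unit followed by a down projection); the weight names, dimensions, and the exact wiring here are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def swish(z):
    # Swish / SiLU: z * sigmoid(z)
    return z * (1.0 / (1.0 + np.exp(-z)))

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward block: a Swish-activated gate branch is
    # multiplied elementwise with a linear "up" branch, then the
    # result is projected back down to the model dimension.
    return (swish(x @ W_gate) * (x @ W_up)) @ W_down

# Tiny demo with assumed dimensions (d_model=4, d_hidden=8).
rng = np.random.default_rng(0)
d_model, d_hidden = 4, 8
x = rng.standard_normal((2, d_model))
W_gate = rng.standard_normal((d_model, d_hidden))
W_up = rng.standard_normal((d_model, d_hidden))
W_down = rng.standard_normal((d_hidden, d_model))
y = swiglu_ffn(x, W_gate, W_up, W_down)  # shape (2, d_model)
```

Compared with a plain two-layer MLP, the gated form adds a third weight matrix but is often reported to improve quality at equal parameter count.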