Better activation functions for NNUE
Summary
An experimental study of activation functions for Viridithas's NNUE shows that replacing SCReLU layers with Swish, and then with SwiGLU on the L2 layer, yields Elo gains and a smoother evaluation scale. The article also discusses a challenge along the way: Hard-Swish reduced activation sparsity, and regularization was used to restore it. It closes with the broader potential of applying deep-learning insights to chess NNUE design.
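For readers unfamiliar with the activations named above, here is a minimal NumPy sketch of the standard definitions of SCReLU, Swish, Hard-Swish, and SwiGLU. The weight matrices passed to `swiglu` are hypothetical illustrations, not the actual Viridithas network parameters.

```python
import numpy as np

def screlu(x):
    # Squared Clipped ReLU: clamp to [0, 1], then square.
    return np.clip(x, 0.0, 1.0) ** 2

def swish(x, beta=1.0):
    # Swish (SiLU when beta = 1): x * sigmoid(beta * x).
    return x / (1.0 + np.exp(-beta * x))

def hard_swish(x):
    # Piecewise-linear approximation of Swish: x * relu6(x + 3) / 6.
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def swiglu(x, w_gate, w_value):
    # Gated linear unit with a Swish gate: swish(x @ W_g) * (x @ W_v).
    # w_gate and w_value are illustrative weight matrices.
    return swish(x @ w_gate) * (x @ w_value)
```

Note that SCReLU saturates outside [0, 1] (its gradient is exactly zero there), while Swish and Hard-Swish are unbounded above and only softly gated below, which is one intuition for why they can change both the evaluation scale and the activation sparsity of the network.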