LLM Neuroanatomy: How I Topped the AI Leaderboard Without Changing a Single Weight
Summary
The author describes a hardware-only technique that topped the HuggingFace Open LLM Leaderboard: duplicating the middle transformer layers of a 72B model, with no training or weight changes, running quantized inference on two gaming GPUs. He introduces "LLM Neuroanatomy," arguing that transformers comprise functional circuits (reading, thinking, and decoding), with the middle layers forming reusable reasoning blocks; this view is supported by heatmap analyses and two orthogonal probes (hard-math and EQ benchmarks). The piece discusses implications for hardware-efficient model deployment and mechanistic interpretability, and notes that subsequent fine-tuning can yield further gains.
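The summary does not give the exact layer ranges or tooling, but the layer-duplication idea (sometimes called a "passthrough" self-merge) can be sketched as planning a new layer order that repeats a middle slice of the stack, leaving every weight untouched. The function name, parameters, and example ranges below are illustrative assumptions, not the author's actual configuration:

```python
def duplicated_layer_order(num_layers, start, end, repeats=2):
    """Plan a layer order in which layers [start, end) are repeated.

    Sketches the self-merge idea from the article: the forward pass
    reuses the middle blocks without modifying any weights.
    `start`, `end`, and `repeats` here are illustrative, not the
    article's actual values.
    """
    if not (0 <= start < end <= num_layers):
        raise ValueError("invalid duplication range")
    middle = list(range(start, end))  # the slice to be reused
    return list(range(start)) + middle * repeats + list(range(end, num_layers))

# Toy example: a 6-layer stack with layers 2-3 run twice.
print(duplicated_layer_order(6, 2, 4))  # → [0, 1, 2, 3, 2, 3, 4, 5]
```

In practice such a plan would be realized by rebuilding the model's layer list (or via a merge tool's passthrough config) so that the duplicated indices point at the same underlying weight tensors, which is why no training is needed.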