LLMs are complicated now
Summary
The post analyzes how large language models have grown more complex since the work of the early 2020s, highlighting architectural variations, mixture-of-experts, and the move from simpler two-tower recsys-style designs to multi-GPU inference and diverse attention variants. It references open models, frameworks, and notable figures (e.g., Llama, FlexAttention, Karpathy) to illustrate the shift toward composability and kernel-level optimizations. The piece serves as a technical reflection on model design, tooling, and the challenges of evolving architectures.