DigiNews

Tech Watch by Johan Denoyer


A 10 year old Xeon is all you need

Quality: 8/10 Relevance: 9/10

Summary

This post documents running a 26B-parameter Mixture-of-Experts (MoE) LLM on a 2016 Xeon with DDR3 RAM and no GPU, focusing on memory-bandwidth bottlenecks and CPU-side optimizations. It walks through the hardware constraints, the set of flags used with ik_llama.cpp, and the concepts of speculative decoding, MoE routing, and memory management needed to reach usable performance on aging hardware. The piece emphasizes open-weight models and deploying AI locally without black-box tooling.
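A back-of-envelope estimate shows why memory bandwidth, rather than compute, tends to be the bottleneck here. During token generation, each active weight is read from RAM roughly once per token, so decode speed is capped at about bandwidth divided by the bytes of active weights. An MoE model only reads the experts its router selects for each token, so the ceiling depends on active rather than total parameters. The sketch below is a simplified model under stated assumptions. The quad-channel DDR3-1600 machine, the ~4.5 bits/weight quantization, and the 4B "active per token" figure are hypothetical illustrations, not numbers from the post.

```python
def ddr_peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, active_params_billions: float, bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every active weight is streamed from RAM once per token.

    Ignores KV-cache reads, compute time, and CPU cache reuse.
    """
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical machine: quad-channel DDR3-1600 (an assumption, not taken from the post).
bw = ddr_peak_bandwidth_gb_s(1600, channels=4)

# Hypothetical ~4.5 bits/weight quantization. The 26B total matches the post;
# the 4B "active per token" figure is an illustrative assumption.
dense = decode_ceiling_tokens_per_s(bw, active_params_billions=26, bits_per_weight=4.5)
moe = decode_ceiling_tokens_per_s(bw, active_params_billions=4, bits_per_weight=4.5)

print(f"peak bandwidth: {bw:.1f} GB/s")
print(f"dense 26B ceiling: {dense:.1f} tok/s")
print(f"MoE, 4B active ceiling: {moe:.1f} tok/s")
```

The estimate ignores KV-cache traffic and compute cost. It also ignores the fact that real systems reach only part of their theoretical peak bandwidth, so measured speeds will be lower. It is only meant to show why the number of active parameters, not the total, sets the ceiling on bandwidth-limited hardware.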

🚀 Service built by Johan Denoyer