DigiNews

Tech Watch by Johan Denoyer


sectorllm: llama2 inference in < 1500 bytes of x86 assembly

Quality: 8/10 Relevance: 9/10

Summary

Sectorllm claims to be the world's smallest llama2 inference engine, fitting in 1369 bytes of x86 real mode assembly. It boots directly from disk, loads a quantized model, and performs a full transformer forward pass with greedy sampling, all before any operating system loads. To minimize decoding overhead, the project uses int8 weights, precomputed exp and silu lookup tables, and fused matrices. It is open source, with potential support for larger models and future improvements.

🚀 Service built by Johan Denoyer