DigiNews

Tech Watch by Johan Denoyer


sectorllm: llama2 inference in < 1500 bytes of x86 assembly

Quality: 8/10 Relevance: 9/10

Summary

Sectorllm claims to be the world's smallest llama2 inference engine, fitting in 1369 bytes of x86 real mode assembly. It boots directly from disk, loads a quantized model, and performs a full transformer forward pass with greedy sampling, all before any operating system loads. To minimize decoding overhead, the project uses int8 weights, precomputed exp and silu lookup tables, and fused matrices. It is open source, with potential support for larger models and future improvements.

🚀 Service built by Johan Denoyer