SALOMI: a research repo on extreme low-bit transformer quantization
Summary
SALOMI is a research repo on extreme low-bit transformer quantization and inference. It explores whether binary or near-binary weight representations can approach or exceed ternary baselines under realistic evaluation. The repo documents the onebit toolkit, the evaluation suite, and research notes. Two findings stand out: post-hoc 1-bit quantization struggles for GPT-2-class models, and a budget of around 1.2–1.35 bits per parameter is more credible when using Hessian-guided vector quantization (VQ) and mixed-precision approaches.
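To make the bits-per-parameter figures concrete, here is a minimal sketch of plain post-hoc sign binarization with one scale per row, together with the storage accounting behind an "effective bits per parameter" number. It is not the onebit toolkit's API or SALOMI's method: the function names (binarize_rowwise, effective_bpp), the fp16 scale assumption, and the random 768 x 3072 matrix standing in for a GPT-2-sized layer are all hypothetical choices for illustration.

```python
import numpy as np


def binarize_rowwise(w):
    """Sign binarization with one scale per output row (hypothetical helper).

    For fixed sign codes, the scale that minimizes squared error per row
    is the mean absolute value of that row.
    """
    alpha = np.abs(w).mean(axis=1, keepdims=True)
    signs = np.where(w >= 0, 1.0, -1.0)
    return signs, alpha


def effective_bpp(shape, code_bits=1.0, scale_bits=16):
    """Bits per parameter including per-row scale overhead.

    Assumes one scale of `scale_bits` bits per row (fp16 here, an assumption).
    """
    rows, cols = shape
    total_bits = rows * cols * code_bits + rows * scale_bits
    return total_bits / (rows * cols)


# Random Gaussian weights as a stand-in for a real layer (assumption).
rng = np.random.default_rng(0)
w = rng.standard_normal((768, 3072)).astype(np.float32)

signs, alpha = binarize_rowwise(w)
w_hat = signs * alpha  # dequantized approximation of w

# Relative squared reconstruction error of the 1-bit approximation.
rel_err = np.linalg.norm(w - w_hat) ** 2 / np.linalg.norm(w) ** 2
bpp = effective_bpp(w.shape)

print(f"effective bits per parameter: {bpp:.4f}")
print(f"relative reconstruction error: {rel_err:.3f}")
```

For Gaussian weights the relative error of this scheme is close to 1 - 2/pi, so more than a third of the weight energy is lost at just over 1 bit per parameter. That gives some intuition for why a slightly larger budget spent on codebooks or a few higher-precision components may be the more credible regime, though results on real model weights will differ.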