VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models
Summary
VibeThinker-3B is a 3-billion-parameter model that explores verifiable reasoning in the small-model regime. Built on Spectrum-to-Signal post-training, it combines curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation to push the frontier of verifiable reasoning at this scale. It achieves strong results on benchmarks such as AIME26, LiveCodeBench, and LeetCode, and introduces the Parametric Compression-Coverage Hypothesis.