Embarrassingly Simple Self-Distillation Improves Code Generation
Summary
The paper shows that self-distillation improves code generation without an external teacher or reinforcement learning (RL), with gains across multiple model sizes and decoding setups. It attributes the improvements to a precision-exploration dynamic in decoding and proposes simple self-distillation (SSD) as a post-training option for improving the coding ability of LLMs, with potential implications for AI-assisted development workflows.
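The summary does not spell out what the precision-exploration dynamic looks like in practice. One common way to picture such a trade-off, offered here as an assumption rather than as the paper's own analysis, is sampling temperature: a low temperature concentrates probability on the top token (precision), while a high temperature spreads it across alternatives (exploration). The minimal sketch below uses hypothetical logits and a hypothetical helper, `softmax_with_temperature`, neither of which comes from the paper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax probabilities of `logits` after dividing by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits (illustrative only, not from the paper).
logits = [2.0, 1.0, 0.0]

# Low temperature: probability mass concentrates on the highest-scoring token.
sharp = softmax_with_temperature(logits, 0.5)

# High temperature: probability mass spreads toward lower-scoring tokens.
flat = softmax_with_temperature(logits, 2.0)

print("T=0.5:", [round(p, 3) for p in sharp])
print("T=2.0:", [round(p, 3) for p in flat])
```

Under this reading, a single decoding temperature has to serve both positions where one token is clearly right and positions where several continuations are plausible, which is the kind of tension a post-training method could ease.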