DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
Summary
DeepSeek-V4 introduces two Mixture-of-Experts models with a million-token context window, combining Hybrid Attention, mHC connections, and the Muon optimizer for efficient long-context processing. The technical report details the model sizes, context length, and the training and post-training pipelines, and reports benchmarks across knowledge, reasoning, and agentic tasks. It also provides model downloads, license information, local run instructions, and a discussion of chat templates and how to cite the work.
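Since the report covers local run instructions and chat templates, the sketch below shows what local inference typically looks like through Hugging Face Transformers. It is a minimal sketch under stated assumptions: the model identifier `deepseek-ai/DeepSeek-V4` is hypothetical, and the generation settings are illustrative rather than taken from the report.

```python
# Minimal local-inference sketch. Assumes the checkpoint is published on the
# Hugging Face Hub and ships a chat template in its tokenizer config, as prior
# DeepSeek releases do; the model id below is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V4"  # hypothetical; use the name from the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# The tokenizer's built-in chat template renders the message list into the
# model's expected prompt format, including any special tokens.
messages = [{"role": "user", "content": "Summarize the key ideas of DeepSeek-V4."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Using the tokenizer's own chat template, rather than hand-formatting the prompt, keeps the input consistent with whatever format the release actually documents.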