DeepSeek 4 Flash local inference engine for Metal
Summary
The article introduces DS4.c, a local inference engine for DeepSeek V4 Flash with a Metal backend. It covers the architecture decisions, features such as a 1M-token context and 2-bit quantization, and a disk KV cache for persistence. It also provides setup and usage guidance, including download scripts and server and CLI usage.