Claude Code: connect to a local model when your quota runs out
Summary
The article explains how to keep using Claude Code by connecting it to local open-source models once your quota is exhausted. It covers recommended models (GLM-4.7-Flash, Qwen3-Coder-Next), running local inference with LM Studio, and an alternative path via llama.cpp, including concrete setup steps and commands. It also notes the trade-offs in speed and code quality compared to the hosted models, and suggests monitoring remaining quota with the /usage command.
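The setup the summary describes can be sketched as follows. This is a minimal sketch, not the article's exact commands: it assumes llama.cpp's `llama-server` binary is installed, and the model filename, port, and context size are illustrative. Claude Code can be redirected via its `ANTHROPIC_BASE_URL` environment variable, but note that `llama-server` exposes an OpenAI-compatible API, so depending on versions an Anthropic-compatible translation proxy may be needed in between.

```shell
# 1. Start a local inference server with llama.cpp
#    (model path, port, and context size are illustrative placeholders):
llama-server -m ./models/local-coder.gguf --port 8080 -c 8192

# 2. Point Claude Code at the local endpoint instead of the Anthropic API.
#    ANTHROPIC_BASE_URL overrides the API host; the token is a placeholder,
#    since a local server typically does not check it.
export ANTHROPIC_BASE_URL="http://localhost:8080"
export ANTHROPIC_AUTH_TOKEN="local-placeholder"

# 3. Launch Claude Code as usual:
claude
```

With LM Studio the idea is the same, except the server is started from its GUI (it listens on localhost:1234 by default), and `ANTHROPIC_BASE_URL` is pointed there instead.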