Running local models on an M4 with 24GB memory
Summary
This article explores running local LLMs on an M4 MacBook Pro with 24 GB of unified memory, using Ollama, llama.cpp, and LM Studio. It evaluates models such as Qwen 3.5-9B, covering configurations, context windows, and thinking mode, and provides practical setup snippets for use with Pi and OpenCode. It also compares the results against SOTA cloud models and highlights the tradeoffs of local versus cloud-based AI.
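As a taste of the kind of setup covered below, here is a minimal sketch of pulling a model with Ollama and capping its context window via a Modelfile. The model tag `qwen3:8b` and the 8192-token context are illustrative assumptions; pick values that fit in 24 GB.

```
# Pull an illustrative model (tag is an assumption; choose one that fits your memory)
ollama pull qwen3:8b

# Cap the context window so the KV cache stays within budget
cat > Modelfile <<'EOF'
FROM qwen3:8b
PARAMETER num_ctx 8192
EOF

# Register the variant and run it
ollama create qwen3-8k -f Modelfile
ollama run qwen3-8k "Hello"
```

A smaller `num_ctx` trades context length for memory headroom, which matters most on a shared-memory machine like this one.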