Running Local LLMs Offline on a Ten-Hour Flight
Summary
The post documents a ten-hour flight experiment running local LLMs offline on a high-end MacBook, using Gemma 4 31B and Qwen 4.6 36B via LM Studio to build a DuckDB-based billing analytics tool. It details the power, heat, and context-latency constraints encountered in flight, introduces two instrumentation tools (powermonitor and lmstats), and concludes that local inference is viable for tightly scoped work, while cloud models remain superior for large-context tasks.