Pool spare GPU capacity to run LLMs at larger scale
Summary
Mesh LLM pools spare GPU capacity across multiple nodes to run LLMs at larger scale. Dense models are split with pipeline parallelism. MoE models are split with expert parallelism, which requires zero cross-node inference traffic. The system also includes network optimizations, a web console, and multi-model serving. The write-up covers practical deployment steps, benchmarks, and integrations for building scalable, private LLM workflows.
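The pipeline-parallel case can be illustrated with a small sketch. It rests on several assumptions: each node reports its spare VRAM, the model's layers are split into contiguous ranges roughly proportional to that VRAM, and the names `partition_layers` and `activation_hops` are hypothetical. This is not Mesh LLM's actual scheduler or API. The sketch also counts how many node boundaries a token's activations cross per forward pass. That per-token hop is the cross-node traffic that pipeline parallelism cannot avoid.

```python
from typing import Dict, Tuple

LayerRange = Tuple[int, int]  # half-open [start, end) range of layer indices


def partition_layers(num_layers: int, spare_vram_gb: Dict[str, float]) -> Dict[str, LayerRange]:
    """Hypothetical heuristic: give each node a contiguous block of layers
    sized roughly in proportion to its spare VRAM."""
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if not spare_vram_gb:
        raise ValueError("need at least one node")
    if any(v < 0 for v in spare_vram_gb.values()):
        raise ValueError("spare VRAM cannot be negative")
    total = sum(spare_vram_gb.values())
    if total <= 0:
        raise ValueError("total spare VRAM must be positive")

    nodes = list(spare_vram_gb)
    # Fractional "ideal" share of layers per node, then round down.
    ideal = [num_layers * spare_vram_gb[n] / total for n in nodes]
    counts = [int(x) for x in ideal]
    # Largest-remainder method: hand the leftover layers to the nodes with
    # the biggest fractional parts so the counts sum to num_layers.
    leftover = num_layers - sum(counts)
    order = sorted(range(len(nodes)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1

    # Lay the blocks out back to back; node order defines the pipeline order.
    ranges: Dict[str, LayerRange] = {}
    start = 0
    for node, count in zip(nodes, counts):
        ranges[node] = (start, start + count)
        start += count
    return ranges


def activation_hops(ranges: Dict[str, LayerRange]) -> int:
    """Node-to-node activation transfers per token in a pipeline:
    one per boundary between non-empty stages."""
    stages = sum(1 for start, end in ranges.values() if end > start)
    return max(stages - 1, 0)


if __name__ == "__main__":
    plan = partition_layers(32, {"a": 24.0, "b": 8.0})
    print(plan)                   # {'a': (0, 24), 'b': (24, 32)}
    print(activation_hops(plan))  # 1
```

Splitting by VRAM proportion only approximates each node's memory budget. A real placement would also need to account for KV cache size, per-layer weight differences, and link bandwidth between nodes.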