ggml-org/llama.cpp
Summary
ggml-org/llama.cpp is an open-source project focused on high-performance LLM inference in C/C++ with multi-backend support (including CPU and GPUs via CUDA, OpenCL, Vulkan, etc.). The repository provides CLI tools (llama-cli), a server API (llama-server), model quantization to GGUF, and extensive documentation for obtaining and running models locally. It emphasizes cross-platform efficiency, edge deployment, and broad language bindings and integrations within the AI tooling ecosystem.