MLX-VLM: Inference and Fine-Tuning of Vision-Language Models on macOS
Summary
MLX-VLM is a macOS-focused toolkit for inference and fine-tuning of Vision-Language Models (VLMs). It provides a CLI, a Gradio-based chat UI, and a server API. It handles multi-modal inputs and video, uses caching and quantization to improve performance and memory efficiency, and supports fine-tuning with LoRA and QLoRA.