microsoft/VibeVoice
Summary
VibeVoice is a family of open-source frontier voice AI models from Microsoft, covering automatic speech recognition (ASR), text-to-speech (TTS), and real-time streaming. It supports long-form processing (60 minutes of audio for ASR, 90 minutes for TTS) and is built on continuous speech tokenizers and a diffusion-based inference framework, with multilingual support and customization options. The project emphasizes responsible AI use and provides extensive documentation, demos, and reports.
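To give a rough sense of what the long-form durations mean for sequence length, the sketch below converts an audio duration into a number of tokenizer frames. The helper `frames_for_duration` and the constant `ASSUMED_FRAME_RATE_HZ` are hypothetical names introduced here, and the frame rate value is an illustrative assumption that this summary does not state; substitute the rate given in the project's own documentation.

```python
def frames_for_duration(minutes: float, frame_rate_hz: float) -> int:
    """Return how many tokenizer frames cover `minutes` of audio
    when the tokenizer emits `frame_rate_hz` frames per second."""
    return round(minutes * 60 * frame_rate_hz)


# Assumption for illustration only: the summary does not give a frame rate.
ASSUMED_FRAME_RATE_HZ = 7.5

# Durations taken from the summary: 60-minute ASR, 90-minute TTS.
asr_frames = frames_for_duration(60, ASSUMED_FRAME_RATE_HZ)
tts_frames = frames_for_duration(90, ASSUMED_FRAME_RATE_HZ)

print(f"60 min of audio: {asr_frames} frames")
print(f"90 min of audio: {tts_frames} frames")
```

Under this assumed rate, a 90-minute session amounts to a few tens of thousands of frames, which suggests why a compact continuous tokenizer matters for long-form audio: sequence length scales linearly with the frame rate.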