Ask HN: What's the current best local/open speech-to-speech setup?

January 23, 2026 at 11:04

Quality: 6/10 Relevance: 8/10

Summary

This Ask HN thread asks for the best local, open speech-to-speech setup for real-time voice assistants, with an emphasis on low latency, streaming capability, and reproducible open-weight stacks. It mentions candidate stacks and tools (Qwen3 Omni, dsnote, WhisperFlow, delayed streams) along with hardware considerations, and highlights the gaps that remain in end-to-end solutions that run entirely locally. The thread also points to Nvidia's PersonaPlex as a promising option and invites the community to share concrete configurations.
