Show HN: I trained a 9M speech model to fix my Mandarin tones

January 31, 2026 at 00:51

Quality: 9/10 Relevance: 9/10

Summary

Simon Edwardsson trains a 9M-parameter Mandarin pronunciation model: a Conformer encoder trained with CTC loss on about 300 hours of speech data. It gives frame-level feedback and runs inference on-device. The model outputs Pinyin+tone tokens and uses the Viterbi algorithm to perform forced alignment between the expected syllables and the audio. It achieves strong tone accuracy while staying small enough to run in a browser or on mobile. The project shows a practical path for computer-assisted pronunciation training (CAPT) tools for language learning.
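To make "CTC plus Viterbi forced alignment" concrete, here is a minimal sketch of how such an aligner could work on a CTC model's per-frame log-probabilities. It is not the author's code. Several details are assumptions:

- The blank token sits at index 0.
- Tokens are whole Pinyin syllables with a tone digit, such as `ni3`. The article's real token inventory may differ.
- The helper names `ctc_forced_align`, `token_spans`, and `heard_token` are hypothetical.
- The "heard token" check is only a simple illustration of tone feedback, not necessarily how the article scores pronunciations.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def ctc_forced_align(log_probs: List[List[float]], targets: List[int], blank: int = 0) -> List[int]:
    """Viterbi alignment of a known transcript to CTC outputs.

    log_probs: T x V per-frame log-probabilities (e.g. log-softmax of the encoder output).
    targets:   expected token ids, without blanks.
    Returns, for each frame, the index into `targets` it is aligned to (-1 for blank).
    """
    T = len(log_probs)
    # Interleave blanks: [blank, y1, blank, y2, ..., yL, blank]
    ext = [blank]
    for y in targets:
        ext += [y, blank]
    S = len(ext)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first target token
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one, or skip a blank
            # (skipping is forbidden between two identical tokens).
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends on the final blank or on the last target token
    finals = [S - 1] if S == 1 else [S - 1, S - 2]
    s = max(finals, key=lambda p: score[T - 1][p])
    if score[T - 1][s] == NEG_INF:
        raise ValueError("no valid alignment: too few frames for the targets")

    # Backtrack the best state sequence
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]
    # Odd extended states hold target tokens: state 2k+1 is target k
    return [(st - 1) // 2 if st % 2 == 1 else -1 for st in path]


def token_spans(frame_to_token: List[int], log_probs: List[List[float]],
                targets: List[int]) -> List[Tuple[int, int, float]]:
    """For each target token: (first_frame, last_frame, mean log-prob of that token)."""
    spans = []
    for k, tok in enumerate(targets):
        # Every target occupies at least one frame on a valid CTC path
        frames = [t for t, j in enumerate(frame_to_token) if j == k]
        mean_lp = sum(log_probs[t][tok] for t in frames) / len(frames)
        spans.append((frames[0], frames[-1], mean_lp))
    return spans


def heard_token(log_probs: List[List[float]], first: int, last: int, blank: int = 0) -> int:
    """Most likely non-blank token over a span (hypothetical tone-feedback check)."""
    V = len(log_probs[0])
    totals = [sum(log_probs[t][v] for t in range(first, last + 1)) for v in range(V)]
    totals[blank] = NEG_INF
    return max(range(V), key=lambda v: totals[v])


if __name__ == "__main__":
    vocab = ["<blank>", "ni3", "hao3", "hao2"]
    # Toy posteriors: the learner says "ni hao" but the second syllable sounds like tone 2
    probs = [
        [0.1, 0.8, 0.05, 0.05],
        [0.2, 0.7, 0.05, 0.05],
        [0.8, 0.1, 0.05, 0.05],
        [0.1, 0.05, 0.25, 0.6],
        [0.7, 0.05, 0.15, 0.1],
    ]
    lp = [[math.log(p) for p in row] for row in probs]
    targets = [1, 2]  # expected: ni3 hao3
    f2t = ctc_forced_align(lp, targets)
    print(f2t)  # → [0, 0, -1, 1, -1]
    for (first, last, score), tok in zip(token_spans(f2t, lp, targets), targets):
        print(vocab[tok], first, last, round(score, 3), "heard:", vocab[heard_token(lp, first, last)])
```

In the toy run, the alignment still places `hao3` on frame 3 because the transcript is fixed. The model's frame posteriors favor `hao2` there, and that mismatch is the kind of signal a tone-feedback tool could surface. For production use, a maintained implementation such as torchaudio's `forced_align` would likely be preferable to a pure-Python loop.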
