Show HN: I trained a 9M speech model to fix my Mandarin tones
Summary
Simon Edwardsson trains a 9M-parameter Mandarin pronunciation model, a Conformer encoder trained with CTC loss, on roughly 300 hours of speech. The model gives frame-level pronunciation feedback and is small enough to run on-device, in the browser or on mobile. Its output tokens pair each Pinyin syllable with its tone, and Viterbi forced alignment maps each expected syllable onto the audio frames where it was spoken. The result is strong tone accuracy from a compact model, which points to a practical path for computer-assisted pronunciation training (CAPT) tools for language learners.
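The alignment-and-feedback step can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's implementation. The four-token vocabulary (`<blank>`, `ni3`, `hao3`, `hao2`), the frame posteriors, and the helper names `ctc_forced_align` and `tone_feedback` are all hypothetical. The scoring rule (mean posterior of the expected token over its aligned frames, plus the non-blank token with the most posterior mass there) is one simple choice among many. The code runs the Viterbi recursion over the standard CTC topology, with blanks interleaved between tokens, to find the single best frame-to-token path for the expected utterance.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def ctc_forced_align(log_probs: List[List[float]], targets: List[int],
                     blank: int = 0) -> List[int]:
    """Viterbi forced alignment over CTC posteriors (hypothetical helper).

    log_probs: T frames x V log-probabilities, as a CTC output layer would give.
    targets:   token ids of the expected utterance, without blanks.
    Returns, per frame, the index into `targets` it aligns to, or -1 for blank.
    """
    T = len(log_probs)
    # CTC topology: blank, y1, blank, y2, ..., yN, blank
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)

    dp = [[NEG_INF] * S for _ in range(T)]  # best log-score ending in state s at frame t
    bp = [[0] * S for _ in range(T)]        # backpointers for path recovery
    dp[0][0] = log_probs[0][blank]
    if S > 1:
        dp[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            prevs = [s]                      # stay in the same state
            if s >= 1:
                prevs.append(s - 1)          # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prevs.append(s - 2)          # skip the blank between distinct tokens
            best = max(prevs, key=lambda p: dp[t - 1][p])
            if dp[t - 1][best] == NEG_INF:
                continue
            dp[t][s] = dp[t - 1][best] + log_probs[t][ext[s]]
            bp[t][s] = best

    # A valid path ends on the last token or on the trailing blank.
    ends = [S - 1, S - 2] if S > 1 else [S - 1]
    s = max(ends, key=lambda e: dp[T - 1][e])
    if dp[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames for the target sequence")
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = bp[t][s]
    # Odd states in `ext` are tokens; map them back to positions in `targets`.
    return [(p - 1) // 2 if p % 2 == 1 else -1 for p in path]


def tone_feedback(log_probs: List[List[float]], targets: List[int],
                  frame_tokens: List[int], blank: int = 0) -> List[Tuple[float, int]]:
    """Hypothetical scoring rule: for each expected token, return
    (mean posterior of that token over its frames, non-blank token with most mass there)."""
    vocab_size = len(log_probs[0])
    results = []
    for i, tok in enumerate(targets):
        # A valid CTC path visits every target token, so `frames` is never empty.
        frames = [t for t, k in enumerate(frame_tokens) if k == i]
        probs = [[math.exp(lp) for lp in log_probs[t]] for t in frames]
        score = sum(p[tok] for p in probs) / len(frames)
        totals = [sum(p[v] for p in probs) for v in range(vocab_size)]
        heard = max((v for v in range(vocab_size) if v != blank), key=lambda v: totals[v])
        results.append((score, heard))
    return results


# Hypothetical Pinyin+tone vocabulary; index 0 is the CTC blank.
VOCAB = ["<blank>", "ni3", "hao3", "hao2"]
# Hypothetical frame posteriors standing in for real model output (rows sum to 1).
probs = [
    [0.10, 0.80, 0.05, 0.05],
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.05, 0.30, 0.55],
    [0.10, 0.05, 0.40, 0.45],
    [0.80, 0.05, 0.10, 0.05],
]
log_probs = [[math.log(p) for p in row] for row in probs]
targets = [VOCAB.index("ni3"), VOCAB.index("hao3")]  # expected: "ni3 hao3"

frame_tokens = ctc_forced_align(log_probs, targets)
for tok, (score, heard) in zip(targets, tone_feedback(log_probs, targets, frame_tokens)):
    flag = "" if heard == tok else f"  <- sounded like {VOCAB[heard]}"
    print(f"{VOCAB[tok]}: {score:.2f}{flag}")
```

In this toy input the alignment assigns `ni3` to frame 0 and `hao3` to frames 2 and 3. Because `hao2` carries more posterior mass on those frames, the second syllable is flagged as sounding like tone 2, the kind of tone-level, frame-aligned feedback a CAPT tool can surface. A real model would supply `log_probs` from its CTC output layer, one row per encoder frame.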