Speech synthesis systems are becoming smarter and more natural thanks to the power of deep neural networks. However, each language has its own phonological and contextual characteristics. We therefore conducted experiments, gathered statistics, and applied Vietnamese phonetics to improve speech synthesis systems based on the Tacotron2 neural network. Our methods achieve an accuracy of 97% on the text normalization task, and the synthesized speech reaches a MOS score of 3.97, approaching the 4.43 of directly recorded voices. We also provide Vinorm, a library for normalizing Vietnamese text, and Viphoneme, a package that converts text into a phonetic format. This phonetic format serves as the input to end-to-end neural networks and makes the synthesis process faster, more intelligent, and more natural than using character inputs.
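Text normalization rewrites non-standard tokens such as numbers into speakable words before they reach the synthesizer. The following is a minimal, hypothetical sketch of a single normalization rule: it reads every run of digits one digit at a time, as is done for phone numbers. It is not the Vinorm API. The function names `spell_digits` and `normalize_digits` are illustrative assumptions, and real Vietnamese normalization covers many more cases (dates, units, abbreviations, full number reading).

```python
import re

# Vietnamese words for the digits 0-9, indexed by digit value.
DIGITS = ["không", "một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín"]


def spell_digits(match: re.Match) -> str:
    """Hypothetical helper: read one run of digits digit by digit."""
    return " ".join(DIGITS[int(d)] for d in match.group(0))


def normalize_digits(text: str) -> str:
    """Toy normalizer (not Vinorm): expand each run of ASCII digits into words."""
    return re.sub(r"[0-9]+", spell_digits, text)


if __name__ == "__main__":
    print(normalize_digits("Gọi 0912 ngay"))  # prints "Gọi không chín một hai ngay"
```

A full normalizer would first classify each token (phone number, date, amount, and so on) and only then choose a reading rule. The digit-by-digit rule shown here is correct only for the phone-number class.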
toperi-nguyen / vitacotron2
This project was forked from v-nhandt21/vitacotron2
Vietnamese Speech Synthesis with End-to-End Model and Text Normalization: 2020 7th NAFOSTED Conference on Information and Computer Science