

TurkicTTS
⌨️ 🗣


This repository provides a demo and a pre-trained model for the paper
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

Languages 💬

The model supports ten Turkic languages, namely Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek. These languages are spoken across a wide geographical area stretching from the Balkans through Central Asia to northeastern Siberia. They share a wide range of linguistic features, such as vowel harmony, extensive agglutination, subject-object-verb word order, and the absence of grammatical gender and articles.

Dataset 🗃️

Our study was made possible by KazakhTTS2, a large-scale, open-source speech corpus. The corpus contains five voices (three female and two male) and more than 270 hours of high-quality transcribed data. KazakhTTS2 is publicly available for both academic and commercial use.

Approach 🛠

To enable speech synthesis for the Turkic languages, we constructed an IPA-based conversion module. The converter takes letters from the alphabets of the other Turkic languages and converts them into letters of the Kazakh alphabet. First, each input letter is converted into its corresponding IPA representation. Next, the IPA symbols are converted into Kazakh letters, which can then be fed to the TTS models trained on Kazakh data.
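The two-stage conversion described above (source letters → IPA → Kazakh letters) can be sketched as follows. This is a minimal illustration, not the repository's converter. The tiny Uzbek (Latin) and Kazakh tables, as well as the function names `to_ipa`, `ipa_to_kazakh` and `transliterate`, are hypothetical. Greedy longest-match lookup is an assumed design choice that lets digraphs such as `sh` map to a single IPA symbol.

```python
# Hypothetical, illustrative tables; NOT the project's actual mappings.
UZBEK_TO_IPA = {"sh": "ʃ", "ch": "t͡ʃ", "b": "b", "o": "o", "a": "ɑ",
                "t": "t", "q": "q", "l": "l", "m": "m"}
IPA_TO_KAZAKH = {"ʃ": "ш", "t͡ʃ": "ч", "b": "б", "o": "о", "ɑ": "а",
                 "t": "т", "q": "қ", "l": "л", "m": "м"}


def to_ipa(text: str, table: dict[str, str]) -> list[str]:
    """Stage 1: convert source letters to IPA symbols (greedy longest match)."""
    text = text.lower()
    max_len = max(len(key) for key in table)
    symbols: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first so digraphs win over single letters.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                symbols.append(table[chunk])
                i += size
                break
        else:
            # Unknown characters (spaces, punctuation) pass through unchanged.
            symbols.append(text[i])
            i += 1
    return symbols


def ipa_to_kazakh(symbols: list[str], table: dict[str, str]) -> str:
    """Stage 2: convert each IPA symbol to a Kazakh letter."""
    return "".join(table.get(symbol, symbol) for symbol in symbols)


def transliterate(text: str) -> str:
    return ipa_to_kazakh(to_ipa(text, UZBEK_TO_IPA), IPA_TO_KAZAKH)


print(transliterate("bosh qalam"))  # бош қалам
```

The IPA stage is kept as a list of symbols rather than a string, so multi-character IPA symbols (e.g. `t͡ʃ`) do not need to be re-tokenised in the second stage.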

The mappings of the Turkic alphabets onto IPA symbols were created manually, based on our expertise. We could not find an existing mapping that covered all the languages addressed and allowed error-free conversion into Kazakh. Since Kazakh serves as the source language (the TTS models are trained on Kazakh data), we selected only 42 IPA symbols, corresponding to the 42 letters of the Kazakh alphabet. Notably, among the Turkic languages in question, Kazakh, along with Bashkir, has the most letters and covers a large majority of the phonemes of the target languages. The developed mappings can also serve as a guide for other work on multilingual systems for Turkic languages, such as speech recognition and speech translation. The mapping of the Turkic alphabets onto IPA symbols can be found here.
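Because the IPA inventory is restricted to the symbols that have a Kazakh letter, every symbol produced by a source-language mapping must appear in the IPA-to-Kazakh table. The sketch below shows one way to check this. The toy tables and the helper name `unmapped_symbols` are hypothetical. The Kazakh table is deliberately incomplete so that the check has something to report.

```python
# Toy tables for illustration only; deliberately incomplete and NOT the
# project's actual mappings.
TURKISH_TO_IPA = {"a": "ɑ", "b": "b", "ö": "ø", "ş": "ʃ", "ı": "ɯ"}
IPA_TO_KAZAKH = {"ɑ": "а", "b": "б", "ø": "ө", "ʃ": "ш"}  # "ɯ" omitted on purpose


def unmapped_symbols(source_to_ipa: dict[str, str],
                     ipa_to_target: dict[str, str]) -> list[str]:
    """Return IPA symbols produced by the source mapping that have no target letter."""
    return sorted(set(source_to_ipa.values()) - set(ipa_to_target))


print(unmapped_symbols(TURKISH_TO_IPA, IPA_TO_KAZAKH))  # ['ɯ']
```

An empty result means the source mapping stays inside the inventory the Kazakh-trained model can read.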

Surveys 🎧 → 😡☹️😐🙂😀 → ✅ → ⌨️

Below are links to the ten questionnaires used in the study to collect subjective evaluations. They were distributed via popular social media platforms used by speakers of the Turkic languages. If you are interested, feel free to take part. Your responses help us gather data for our research and deepen our understanding of the subject under investigation.

Each questionnaire consists of 20 short questions and should take you about 5 minutes. No background knowledge is required.

You will be asked to

  • listen to 10 audio recordings and rate their quality,
  • listen to 5 short questions and choose answers,
  • listen to 5 short sentences and type them.

Thank you for your time and consideration.

Azerbaijani ▫️ Bashkir ▫️ Kazakh ▫️ Kyrgyz ▫️ Sakha ▫️ Tatar ▫️ Turkish ▫️ Turkmen ▫️ Uyghur ▫️ Uzbek

Evaluation results

Survey statistics by number of raters (R), gender (F, M), and age group (<45, 45+), together with the evaluation results for overall quality (Q), comprehensibility (C), and intelligibility (I) of the synthesised speech.

Language      R     F     M     <45   45+   Q     C     I
Azerbaijani   47    22    25    22    25    2.93  90%   52%
Bashkir       11    8     3     4     7     2.67  92%   47%
Kazakh        151   89    62    120   31    4.18  97%   80%
Kyrgyz        14    12    2     6     8     3.54  86%   43%
Sakha         254   155   99    147   107   2.85  93%   15%
Tatar         15    12    3     3     12    2.82  79%   17%
Turkish       18    6     12    15    3     3.25  91%   61%
Turkmen       6     0     6     6     0     2.37  67%   57%
Uyghur        10    6     4     6     4     3.01  45%   26%
Uzbek         22    2     20    19    3     2.85  80%   45%
Total         548   312   236   348   200   3.25  92%   41%
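The R, F, M and age columns of the Total row are plain sums of the per-language rows. The Q, C and I totals appear to match means weighted by each language's number of raters, to the reported precision. An unweighted mean of Q would give about 3.05. The sketch below reproduces this check using only the numbers in the table. It is our reading of the table, not a statement of how the paper computed the totals. The helper name `weighted_mean` is hypothetical.

```python
# Per-language rows copied from the table: (language, R, Q, C %, I %).
ROWS = [
    ("Azerbaijani", 47, 2.93, 90, 52),
    ("Bashkir", 11, 2.67, 92, 47),
    ("Kazakh", 151, 4.18, 97, 80),
    ("Kyrgyz", 14, 3.54, 86, 43),
    ("Sakha", 254, 2.85, 93, 15),
    ("Tatar", 15, 2.82, 79, 17),
    ("Turkish", 18, 3.25, 91, 61),
    ("Turkmen", 6, 2.37, 67, 57),
    ("Uyghur", 10, 3.01, 45, 26),
    ("Uzbek", 22, 2.85, 80, 45),
]


def weighted_mean(column: int) -> float:
    """Mean of one metric column, weighted by each language's rater count."""
    total_raters = sum(row[1] for row in ROWS)
    return sum(row[1] * row[column] for row in ROWS) / total_raters


q, c, i = (weighted_mean(col) for col in (2, 3, 4))
print(f"Q={q:.2f}  C={c:.0f}%  I={i:.0f}%")  # Q=3.25  C=92%  I=41%
```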

Pretrained models ⚙️

Unzip both the pre-trained vocoder and the acoustic model into the same directory.

vocoder: parallelwavegan_male2_checkpoint

acoustic model: kaztts_male2_tacotron2_train.loss.ave
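Before running inference, it can help to confirm that the unzipped files are where the inference example below expects them. The paths in this sketch are taken from that example. The `config.yml` entry assumes that ParallelWaveGAN's `load_model` reads the vocoder config from the checkpoint's directory when no config is passed. The helper `missing_files` is hypothetical. Adjust the list if your archives unpack to a different layout.

```python
from pathlib import Path

# Paths used by the inference example; relative to the directory where both
# archives were unzipped (assumed layout).
REQUIRED_FILES = [
    "parallelwavegan_male2_checkpoint/checkpoint-400000steps.pkl",
    # Assumption: load_model looks for config.yml next to the checkpoint.
    "parallelwavegan_male2_checkpoint/config.yml",
    "exp/tts_train_raw_char/config.yaml",
    "exp/tts_train_raw_char/train.loss.ave_5best.pth",
]


def missing_files(base_dir: str = ".") -> list[str]:
    """Return the required model files that do not exist under base_dir."""
    base = Path(base_dir)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]
```

For example, `print(missing_files())` run from the working directory should print `[]` once both archives are unpacked correctly.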

Inference 🐍

```python
import torch
from espnet2.bin.tts_inference import Text2Speech
from parallel_wavegan.utils import load_model
from scipy.io.wavfile import write

from utils import normalization  # text normalisation helper shipped with this repository

fs = 22050  # output sampling rate (Hz)
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU without CUDA

# Path to the unzipped vocoder checkpoint
vocoder_checkpoint = "parallelwavegan_male2_checkpoint/checkpoint-400000steps.pkl"
vocoder = load_model(vocoder_checkpoint).to(device).eval()
vocoder.remove_weight_norm()

# Paths to the acoustic model (Transformer / Tacotron 2 / FastSpeech) and its config file
config_file = "exp/tts_train_raw_char/config.yaml"
model_path = "exp/tts_train_raw_char/train.loss.ave_5best.pth"

text2speech = Text2Speech(
    config_file,
    model_path,
    device=device,
    # Tacotron 2 only
    threshold=0.5,
    minlenratio=0.0,
    maxlenratio=10.0,
    use_att_constraint=True,
    backward_window=1,
    forward_window=3,
    # FastSpeech & FastSpeech 2 only
    speed_control_alpha=1.0,
)
text2speech.spc2wav = None  # disable Griffin-Lim; the neural vocoder is used instead

text = "merhaba"
# Available options: azerbaijani, bashkir, kazakh, kyrgyz, sakha, tatar,
# turkish, turkmen, uyghur, uzbek
lang = "turkish"

text = normalization(text, lang)  # prepare the input for the Kazakh-trained model
with torch.no_grad():
    c_mel = text2speech(text)["feat_gen"]  # predicted mel spectrogram
    wav = vocoder.inference(c_mel)         # mel spectrogram -> waveform
write("result.wav", fs, wav.view(-1).cpu().numpy())
```

Synthesised samples 🔈

Azerbaijani

Azərbaycan Xəzər dənizi hövzəsinin qərbində yerləşir.
az_01.mov

Bashkir

Башҡортостан Республикаһы шарттарында ауыл хужалығы етерлек хеҙмәт ресурстарына нигеҙләнә.
ba_01.mov

Kazakh

Қазақстан — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет.
kk_01.mov

Kyrgyz

Кыргыз Республикасы — Борбордук Азияда жайгашкан мамлекет.
ky_01.mov

Sakha

Саха Өрөспүүбүлүкэтэ Сибиир хотугулуу-илин өттүгэр сытар.
sa_01.mov

Tatar

Татарстан территориясе — урманлы җирдә яткан тигезлек.
tt_01.mov

Turkish

Türk dünyası, tüm Türk halkları kapsayan bir kavramdır.
tr_01.mov

Turkmen

Türkmenistan merkezi Aziýada bir döwletdir.
tm_01.mov

Uyghur

Arabic: ئۇيغۇر خەلقى تۈركىي مىللەتلىرىنىڭ ئايرىلماس بىر قىسمى ھەم مۇھىم بىر تەركىبىي قىسمى.
Cyrillic: Уйғур хәлқи түркий милләтлириниң айрилмас бир қисми һәм муһим бир тәркибий қисми.
Latin: Uyghur xelqi türkiy milletlirining ayrilmas bir qismi hem muhim bir terkibiy qismi.
ug_01.mov

Uzbek

Oʻzbekiston — Markaziy Osiyoning markaziy qismida joylashgan mamlakat.
uz_01.mov

Acknowledgements 🙏

We would like to extend our heartfelt thanks to all individuals who contributed to the recruitment of participants for this study. Their efforts were critical to the success of our survey. In particular, we would like to express our deepest appreciation to Viktor Krivogornitsyn for his extraordinary dedication in attracting a substantial number of Sakha speakers. His contribution was invaluable, and we are grateful for his support.

Citation 🎓

If you use our model in your work, please consider citing our paper. Citing it gives credit to the original authors and supports further research in this area. We appreciate your support.

@inproceedings{yeshpanov23_interspeech,
  author={Rustem Yeshpanov and Saida Mussakhojayeva and Yerbolat Khassanov},
  title={{Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={5521--5525},
  doi={10.21437/Interspeech.2023-249}
}
