This repository is a phonemic multilingual (Russian-English) implementation based on Multi-Tacotron-Voice-Cloning. It is a Telegram bot that uses the toolbox from the original project to clone Russian and English speech and perform TTS.
You will need the following whether you plan to use the bot, the toolbox only or to retrain the models.
Python 3.6 or 3.7 (3.6 ≤ version ≤ 3.7).
PyTorch (>=1.0.1).
Run pip install -r requirements.txt
to install the necessary packages.
If you plan to use bot.py, you will need to create a new bot with @BotFather and get a private TOKEN for it. A link to the tutorial is in the wiki section below.
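Before starting the bot, it can help to confirm that Telegram accepts the token. The sketch below is not taken from bot.py: it assumes the token is stored in a hypothetical environment variable named TELEGRAM_TOKEN and calls the getMe method of the Telegram Bot API using only the standard library.

```python
import json
import os
import urllib.request

API_ROOT = "https://api.telegram.org"


def get_me_url(token: str) -> str:
    """Build the URL of the Bot API `getMe` method for the given token."""
    return f"{API_ROOT}/bot{token}/getMe"


def check_token(token: str, timeout: float = 10.0) -> dict:
    """Call `getMe` and return the decoded JSON reply.

    Raises urllib.error.HTTPError if Telegram rejects the token.
    """
    with urllib.request.urlopen(get_me_url(token), timeout=timeout) as reply:
        return json.load(reply)


if __name__ == "__main__":
    # TELEGRAM_TOKEN is a hypothetical variable name, not one read by bot.py.
    token = os.environ.get("TELEGRAM_TOKEN", "")
    if not token:
        print("Set TELEGRAM_TOKEN to the token issued by the bot-creation chat.")
    else:
        print(check_token(token))
```

How bot.py itself expects to receive the token is not covered by this sketch; check the script before running it.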
A GPU is mandatory, but you don't necessarily need a high-tier GPU if you only want to use the bot or the toolbox.
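A quick way to check these prerequisites is sketched below. It is an illustration rather than part of the repository: the helper names are hypothetical, and the version check assumes the Python requirement is read as 3.6 or 3.7. PyTorch is imported lazily so the script still runs when it is missing.

```python
import sys


def python_supported(version=None) -> bool:
    """Return True for Python 3.6 or 3.7 (assumed reading of the requirement)."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 6) <= (major, minor) <= (3, 7)


def describe_gpu() -> str:
    """Report whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch  # imported lazily so the check runs without PyTorch too
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} found, but no CUDA device is visible"
    return f"PyTorch {torch.__version__} with GPU: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print(describe_gpu())
```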
Download the latest here.
Name | Language | Link | Comments | My link | My comments |
---|---|---|---|---|---|
Phoneme dictionary | En, Ru | En, Ru | Phoneme dictionary | link | Merged the Russian and English phoneme dictionaries |
LibriSpeech | En | link | 300 speakers, 360h clean speech | | |
VoxCeleb | En | link | 7000 speakers, many hours of bad speech | | |
M-AILABS | Ru | link | 3 speakers, 46h clean speech | | |
open_tts, open_stt | Ru | open_tts, open_stt | Many speakers, many hours of bad speech | link | Cleaned 4 hours of speech from one speaker. Corrected the annotations and split the audio into segments of up to 7 seconds |
Voxforge+audiobook | Ru | link | Many speakers, 25h various quality | link | Selected the good files and split them into segments. Added audiobooks from the internet. The result is 200 speakers with a couple of minutes each |
RUSLAN | Ru | link | One speaker, 40h good speech | link | Resampled to 16 kHz |
Mozilla | Ru | link | 50 speakers, 30h good speech | link | Resampled to 16 kHz and sorted the different users into separate folders |
Russian Single | Ru | link | One speaker, 9h good speech | link | Resampled to 16 kHz |
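The comments in the table above mention cutting recordings into segments of up to 7 seconds at 16 kHz. The exact procedure used for these datasets is not documented here. The sketch below only illustrates the simplest fixed-length variant, with a hypothetical helper name, and ignores pause detection, which a real pipeline would probably want so that words are not cut in half.

```python
def segment_bounds(n_samples: int, sample_rate: int = 16000,
                   max_seconds: float = 7.0):
    """Split [0, n_samples) into consecutive (start, end) sample ranges.

    Every range is at most max_seconds long; the last one may be shorter.
    """
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    # Guard against a zero step for very small sample_rate * max_seconds.
    step = max(1, int(sample_rate * max_seconds))
    return [(start, min(start + step, n_samples))
            for start in range(0, n_samples, step)]


# A 16-second recording at 16 kHz yields segments of 7 s, 7 s and 2 s.
for start, end in segment_bounds(16 * 16000):
    print(start, end, (end - start) / 16000)
```

The returned index pairs can be used to slice a sample array loaded at 16 kHz by whichever audio library the project already depends on.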
You can then try the bot:
python bot.py
You can then try the toolbox:
python demo_toolbox.py -d <datasets_root>
or
python demo_toolbox.py
Tutorial on how to get your private TOKEN for the chat bot: https://www.siteguarding.com/en/how-to-get-telegram-bot-api-token
Training (and for other languages), in Russian
Training (and for other languages)
URL | Designation | Title | Implementation source |
---|---|---|---|
1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | CorentinJ |
1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
1712.05884 | Tacotron 2 (synthesizer) | Natural TTS Synthesis by Conditioning Wavenet on Mel Spectrogram Predictions | Rayhane-mamah/Tacotron-2 |
1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | CorentinJ |