loong1989 / multi_speaker_tts

This project forked from codejin/multi_speaker_tts

Implementation of Multi speaker TTS

Python 100.00%

Multi speaker TTS

This code implements the paper 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis', except for the WaveNet vocoder. The algorithm is based on the following papers:

Wang, Y., Skerry-Ryan, R. J., Stanton, D., Wu, Y., Weiss, R. J., Jaitly, N., ... & Le, Q. (2017). Tacotron: Towards end-to-end speech synthesis. arXiv preprint arXiv:1703.10135.
Wan, L., Wang, Q., Papir, A., & Moreno, I. L. (2017). Generalized end-to-end loss for speaker verification. arXiv preprint arXiv:1710.10467.
Jia, Y., Zhang, Y., Weiss, R. J., Wang, Q., Shen, J., Ren, F., ... & Wu, Y. (2018). Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis. arXiv preprint arXiv:1806.04558.
Prenger, R., Valle, R., & Catanzaro, B. (2019, May). Waveglow: A flow-based generative network for speech synthesis. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3617-3621). IEEE.

Structure

The model is divided into three parts that are trained independently of each other: speaker embedding, Tacotron 2, and vocoder. Two types of vocoder can be attached: the Tacotron 1 style and WaveGlow.
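The data flow between the three parts at inference time can be sketched as follows. This is a minimal illustration, not the repository's code: `synthesize`, the type aliases, and the `toy_*` functions are hypothetical stand-ins, and the real models operate on tensors rather than plain lists.

```python
from typing import Callable, List

# Hypothetical type aliases for illustration only.
Waveform = List[float]
Embedding = List[float]
Mel = List[List[float]]


def synthesize(
    text: str,
    reference_wav: Waveform,
    speaker_encoder: Callable[[Waveform], Embedding],
    tacotron2: Callable[[str, Embedding], Mel],
    vocoder: Callable[[Mel], Waveform],
) -> Waveform:
    """Chain the three independently trained parts."""
    embedding = speaker_encoder(reference_wav)  # summarizes the speaker's voice
    mel = tacotron2(text, embedding)            # text to mel, conditioned on the speaker
    return vocoder(mel)                         # mel-spectrogram to audio samples


# Toy stand-ins so the sketch runs without any trained model.
def toy_encoder(wav: Waveform) -> Embedding:
    return [sum(wav) / max(len(wav), 1)]


def toy_tacotron2(text: str, embedding: Embedding) -> Mel:
    return [[float(ord(ch)) + embedding[0]] for ch in text]


def toy_vocoder(mel: Mel) -> Waveform:
    return [frame[0] for frame in mel]


audio = synthesize("hi", [0.0, 2.0], toy_encoder, toy_tacotron2, toy_vocoder)
print(len(audio))
```

Because each stage only depends on the output of the previous one, either vocoder can be swapped in without retraining the other two parts.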

Used dataset

The currently uploaded code is compatible with the following datasets. An [O] mark to the left of a dataset name means the dataset was actually used for the uploaded result; [X] marks a compatible dataset that was not used.

Speaker embedding

[X] VCTK: https://datashare.is.ed.ac.uk/handle/10283/2651
[X] LibriSpeech: http://www.openslr.org/12/
[O] VoxCeleb: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/

Mel to Spectrogram

[O] VCTK: https://datashare.is.ed.ac.uk/handle/10283/2651
[O] LibriSpeech: http://www.openslr.org/12/

Waveglow

[O] VCTK: https://datashare.is.ed.ac.uk/handle/10283/2651
Any voice wav files can be used.

Multi speaker TTS

[X] LJSpeech: https://keithito.com/LJ-Speech-Dataset/
[O] VCTK: https://datashare.is.ed.ac.uk/handle/10283/2651
[X] LibriSpeech: http://www.openslr.org/12/
[X] Tedlium: http://www.openslr.org/12/
[O] TIMIT: http://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

Instruction

Before proceeding, please set the pattern, inference, and checkpoint paths in 'Hyper_Parameter.py' according to your environment.
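A quick check that the configured directories exist can catch path mistakes before a long run. The sketch below is only an illustration: `find_missing_dirs` and the dictionary keys are hypothetical, and the actual variable names in 'Hyper_Parameter.py' may differ, so copy the values over by hand.

```python
import os
import tempfile


def find_missing_dirs(named_paths):
    """Return the names whose configured directory does not exist."""
    return [name for name, path in named_paths.items() if not os.path.isdir(path)]


# Placeholder values; replace them with the paths set in Hyper_Parameter.py.
with tempfile.TemporaryDirectory() as existing_dir:
    example_paths = {
        "pattern_path": existing_dir,
        "checkpoint_path": os.path.join(existing_dir, "does_not_exist"),
    }
    missing = find_missing_dirs(example_paths)

print(missing)
```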

Training

Speaker embedding

Generate pattern

python -m Speaker_Embedding.Pattern_Generate [options]

option list:
-vctk <path>		Set the path of VCTK. VCTK's patterns are generated.
-ls <path>		Set the path of LibriSpeech. LibriSpeech's patterns are generated.
-vox1 <path>		Set the path of VoxCeleb1. VoxCeleb1's patterns are generated.
-vox2 <path>		Set the path of VoxCeleb2. VoxCeleb2's patterns are generated.

Set the paths of the inference files used for verification during training by editing 'Speaker_Embedding_Inference_in_Train.txt'.

Run

python -m Speaker_Embedding.Speaker_Embedding

Mel to spectrogram

Generate pattern

python -m Taco1_Mel_to_Spect.Pattern_Generate [options]

option list:
-vctk <path>		Set the path of VCTK. VCTK's patterns are generated.
-ls <path>		Set the path of LibriSpeech. LibriSpeech's patterns are generated.

Set the paths of the inference files used for verification during training by editing 'Mel_to_Spect_Inference_in_Train.txt'.

Run

python -m Taco1_Mel_to_Spect.Taco1_Mel_to_Spect

Waveglow

There is no pattern generation step. WaveGlow uses wav files directly as patterns.

Set the paths of the inference files used for verification during training by editing 'WaveGlow_Inference_File_Path_in_Train.txt'.
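Since any wav files can serve as training data here, it can help to list what a directory actually contains before pointing the model at it. The following is a small sketch under that assumption; `collect_wav_files` is a hypothetical helper, not part of the repository, and the repository may discover files differently.

```python
import tempfile
from pathlib import Path


def collect_wav_files(root):
    """Recursively collect files with a .wav extension (case-insensitive)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"
    )


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "speaker_a").mkdir()
    (Path(root) / "speaker_a" / "one.wav").touch()
    (Path(root) / "speaker_a" / "notes.txt").touch()
    (Path(root) / "two.WAV").touch()
    found = [p.name for p in collect_wav_files(root)]

print(found)
```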

Run

python -m WaveGlow.WaveGlow

Multi speaker TTS

Generate pattern

python Pattern_Generate.py [options]

option list:
-lj <path>		Set the path of LJSpeech. LJSpeech's patterns are generated.
-vctk <path>		Set the path of VCTK. VCTK's patterns are generated.
-ls <path>		Set the path of LibriSpeech. LibriSpeech's patterns are generated.
-tl <path>		Set the path of Tedlium. Tedlium's patterns are generated.
-timit <path>		Set the path of TIMIT. TIMIT's patterns are generated.
-all		Save all patterns. The generator ignores the 'Use_Wav_Length_Range' hyper parameter. If this option is not set, only patterns matching 'Use_Wav_Length_Range' are generated.

Set the inference file paths and sentences used for verification during training by editing 'Inference_Sentence_in_Train.txt'.
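The effect of '-all' on length filtering can be pictured with the sketch below. It is an assumption-laden illustration rather than the repository's logic: `in_length_range` is a hypothetical function, and it assumes the range is a (minimum, maximum) pair in milliseconds with inclusive bounds, which the README does not state.

```python
def in_length_range(num_samples, sample_rate, length_range_ms, save_all=False):
    """Decide whether a wav of the given length would be kept.

    Assumption: length_range_ms is (min_ms, max_ms), inclusive on both ends.
    save_all mirrors the '-all' option and bypasses the range check.
    """
    if save_all:
        return True
    duration_ms = 1000.0 * num_samples / sample_rate
    low, high = length_range_ms
    return low <= duration_ms <= high


# A one-second clip at 16 kHz against a hypothetical 500-900 ms range.
kept_default = in_length_range(16000, 16000, (500, 900))
kept_with_all = in_length_range(16000, 16000, (500, 900), save_all=True)
print(kept_default, kept_with_all)
```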

Run

python MSTTS_SV.py

Test

Run 'ipython' in the model's directory.

Run the following commands:

from MSTTS_SV import Tacotron2
new_Tacotron2 = Tacotron2(is_Training= False)
new_Tacotron2.Restore()

Set the list of speaker wav paths and the list of texts as in the following example:

path_List = [
    'E:/Multi_Speaker_TTS.Raw_Data/LJSpeech/wavs/LJ040-0143.wav',
    'E:/Multi_Speaker_TTS.Raw_Data/LibriSpeech/train/17/363/17-363-0039.flac',
    'E:/Multi_Speaker_TTS.Raw_Data/VCTK/wav48/p314/p314_020.wav',
    'E:/Multi_Speaker_TTS.Raw_Data/VCTK/wav48/p256/p256_001.wav'
    ]
text_List = [
    'He that has no shame has no conscience.',
    'Who knows much believes the less.',
    'Things are always at their best in the beginning.',
    'Please call Stella.'
    ]

※ The two lists must have the same length.

Run the following command:
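A small pre-check can catch a length mismatch or a mistyped path before inference starts. This is a sketch only: `check_inference_inputs` is a hypothetical helper and is not part of the repository.

```python
import os


def check_inference_inputs(path_list, text_list):
    """Raise if the lists differ in length; return the paths that are not files."""
    if len(path_list) != len(text_list):
        raise ValueError(
            "path_List has {} items but text_List has {}".format(
                len(path_list), len(text_list)
            )
        )
    return [path for path in path_list if not os.path.isfile(path)]


# Example with a placeholder path that is not expected to exist.
missing_files = check_inference_inputs(
    ["/no/such/dir/example.wav"],
    ["Please call Stella."],
)
print(missing_files)
```

If the returned list is empty and no exception is raised, the two lists are at least structurally ready to pass on.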

new_Tacotron2.Inference(
    path_List = path_List,
    text_List = text_List,
    file_Prefix = 'Result'
    )

Result

Speaker embedding

Mel to spectrogram

Waveglow

Currently, the performance of WaveGlow is not good.

Multi speaker TTS

Exported wav files: WAV.zip

Trained checkpoint

https://drive.google.com/drive/folders/1wXrJY-gQTOs9yZ7nxvxPaAa6Wf8uF7zP?usp=sharing

Future works

WaveGlow performance improvement
