Arabic Tacotron TTS

An implementation of Tacotron speech synthesis in TensorFlow for Arabic.

Audio Samples

Listen to the audio samples from models trained using this repo on Nawar Halabi's speech corpus.

Background

In April 2017, Google published a paper, Tacotron: Towards End-to-End Speech Synthesis, where they present a neural text-to-speech model that learns to synthesize speech directly from (text, audio) pairs. However, they didn't release their source code or training data. This is an attempt to provide an open-source implementation of the model described in their paper.

This implementation is largely the same as Keithito's implementation. The changes I made are listed under "Changes from the original repo".

To learn more about this project, see the article Arabic-Tacotron-TTS.

Quick Start

Installing dependencies

Install Python 3. Use version 3.5 rather than a newer Python version, for TensorFlow support. You can use Anaconda to create a new environment:
1. conda create -n myenv python=3.5
2. activate myenv
Install the requirements: pip install -r requirements.txt
Install TensorFlow:
- pip install tensorflow or pip install tensorflow-gpu

Using a pre-trained model

Download and unpack the pretrained model
Extract the model files into a folder in a destination of your choice
Run the demo server: python demo_server.py --checkpoint .\{folder_in_a_destination_of_your_choice}\model.ckpt-200000
Point your browser at localhost:9200
- Type what you want to synthesize. Use only diacritised Arabic text.
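Because the demo expects diacritised input, a quick check before submitting text can save a confusing result. The sketch below is a heuristic only and is not part of this repo: the function name `diacritic_ratio` and the choice of Unicode range (U+064B to U+0652) are assumptions made for illustration.

```python
# Hypothetical helper, not part of this repo: estimates how much of an
# Arabic string carries diacritics (tanween, fatha, damma, kasra, shadda, sukun).
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}
TATWEEL = "\u0640"  # elongation mark, not a real letter


def diacritic_ratio(text):
    """Return the share of Arabic letters directly followed by a diacritic."""
    letters = 0
    marked = 0
    previous_was_letter = False
    for ch in text:
        if ch in ARABIC_DIACRITICS:
            # Count each letter at most once, even if it has shadda + vowel.
            if previous_was_letter:
                marked += 1
            previous_was_letter = False
        elif "\u0621" <= ch <= "\u064A" and ch != TATWEEL:
            letters += 1
            previous_was_letter = True
        else:
            previous_was_letter = False
    return marked / letters if letters else 0.0
```

A ratio near 0 suggests the text is undiacritised. What threshold counts as "diacritised enough" is a judgment call, since fully vocalised text still leaves some letters (such as long vowels) unmarked.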

Training

Note: you need 40GB (more or less) of free disk space to train a model.

Download a speech dataset. The following are supported out of the box:
- Nawar Halabi
You can use other datasets if you convert them to the right format. See TRAINING_DATA.md for more info.
Preprocess
- Unpack the dataset
- Add a folder called nawar_without_hag9 in ~/tacotron
  - Download temp_filtered.csv and add it there.
  - Add a folder called wavs there and put all the wav files in it.
Your tree should look like this:
```
tacotron
    |- nawar_without_hag9
        |- temp_filtered.csv
        |- wavs
```
- Run python .\preprocess.py --dataset nawar
- Set max_iters to 400 if it is not already set to that value
Train
- python .\train.py
Monitor with Tensorboard (optional)
- The trainer dumps audio and alignments every 1000 steps. You can find these in ~/tacotron/logs-tacotron.
- To inspect this data with Tensorboard, run: tensorboard --logdir ~/tacotron/logs-tacotron
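Before running preprocess.py, it can help to confirm that the dataset folder matches the tree shown in the Preprocess step. The following is a minimal sketch, not part of this repo: the function name `missing_dataset_items` is hypothetical, and it only checks the folder and file names listed above.

```python
from pathlib import Path


def missing_dataset_items(base_dir):
    """Return the expected dataset paths that are missing under base_dir."""
    dataset = Path(base_dir) / "nawar_without_hag9"
    wavs = dataset / "wavs"
    expected = [dataset, dataset / "temp_filtered.csv", wavs]
    missing = [str(p) for p in expected if not p.exists()]
    # An existing but empty wavs folder is also reported.
    if wavs.is_dir() and not any(wavs.glob("*.wav")):
        missing.append(str(wavs / "*.wav"))
    return missing


if __name__ == "__main__":
    # Assumes the base folder is ~/tacotron, as in the steps above.
    for item in missing_dataset_items(Path.home() / "tacotron"):
        print("missing:", item)
```

An empty result means the layout looks as expected. It does not verify that temp_filtered.csv and the wav files actually correspond to each other.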

Changes from the original repo

Added Arabic speech corpus preprocessing code and created temp_filtered.csv
Hosted Arabic trained model based on Nawar Halabi's Speech Corpus
Updated hparams to work with the Arabic speech corpus
Added an instructional explanation on how to reproduce
Added Arabic-specific tests
Removed some of the unused code
Updated symbols to match the Arabic phonetic language
Adjusted data-feeder so that all input text is phonetised by arabic_pronounce

Areas of Improvement

Add cleaners
Add embedded diacritiser

Summary of important commands

python .\preprocess.py --dataset nawar
python .\train.py --restore_step 201000
python .\demo_server.py --checkpoint C:\Users\User\tacotron\logs-tacotron\model.ckpt-70000
python -m pytest
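The step numbers in the commands above (201000, 70000) are examples, so you need to know which checkpoints you actually have. The sketch below is one way to find the most recent step. It assumes the checkpoint files in your log folder follow the model.ckpt-<step> naming seen in those commands. The function name `latest_checkpoint_step` is hypothetical and not part of this repo.

```python
import os
import re

# Matches names such as model.ckpt-70000.index or model.ckpt-70000.meta.
CHECKPOINT_RE = re.compile(r"^model\.ckpt-(\d+)")


def latest_checkpoint_step(log_dir):
    """Return the highest step found in model.ckpt-<step> filenames, or None."""
    steps = []
    for name in os.listdir(log_dir):
        match = CHECKPOINT_RE.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None
```

For example, calling `latest_checkpoint_step(os.path.expanduser("~/tacotron/logs-tacotron"))` would return the step to use when building the --checkpoint path, or None if no checkpoint has been saved yet.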

Notes

Remember to delete the files in the training folder and preprocess again if you change the config.

Thanks to

Suhail Kwailat, Dr. Taha Zerrouki, Dr. Motaz Saad, Dr. Nawar Halabi, Keith Ito, Dr. Basem Ahmed, and Leo Ma for their detailed feedback and recommendations.
