This project forked from bshall/tacotron

A Tacotron implementation with location relative attention

Home Page: https://bshall.github.io/Tacotron/

License: MIT License

Tacotron (with Dynamic Convolution Attention)

A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis. Audio samples can be found here. Colab demo can be found here.

Fig 1: Tacotron (with Dynamic Convolution Attention).
Fig 2: Example Mel-spectrogram and attention plot.

Quick Start

Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install this package with:

pip install tacotron

Example Usage

import torch
import soundfile as sf
from univoc import Vocoder
from tacotron import load_cmudict, text_to_id, Tacotron

# download pretrained weights for the vocoder (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

# download pretrained weights for tacotron (and optionally move to GPU)
tacotron = Tacotron.from_pretrained(
    "https://github.com/bshall/Tacotron/releases/download/v0.1/tacotron-ljspeech-yspjx3.pt"
).cuda()

# load cmudict and add pronunciation of PyTorch
cmudict = load_cmudict()
cmudict["PYTORCH"] = "P AY1 T AO2 R CH"

text = "A PyTorch implementation of Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis."

# convert text to phone ids
x = torch.LongTensor(text_to_id(text, cmudict)).unsqueeze(0).cuda()

# synthesize audio
with torch.no_grad():
    mel, _ = tacotron.generate(x)
    wav, sr = vocoder.generate(mel.transpose(1, 2))

# save output
sf.write("location_relative_attention.wav", wav, sr)
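The `cmudict["PYTORCH"]` override in the example suggests that the dictionary maps upper-case words to space-separated ARPAbet phones. As a rough illustration of that kind of lookup, here is a minimal sketch. It is not the package's `text_to_id`; `toy_dict` and `words_to_phones` are hypothetical names, and the fallback for unknown words is an assumption made for the example.

```python
import re

# Toy pronunciation dictionary in the same shape as the entry added above:
# upper-case word -> space-separated ARPAbet phones (digits mark vowel stress).
toy_dict = {
    "PYTORCH": "P AY1 T AO2 R CH",
    "SPEECH": "S P IY1 CH",
}


def words_to_phones(text, pron_dict):
    """Look up each word; return its phones, or the raw word if it is unknown."""
    result = []
    for word in re.findall(r"[A-Za-z']+", text):
        phones = pron_dict.get(word.upper())
        # Assumption for this sketch: unknown words are passed through unchanged.
        result.append(phones.split() if phones else [word])
    return result


phones = words_to_phones("PyTorch speech demo", toy_dict)
```

Adding an entry for an out-of-vocabulary word, as the example does for "PyTorch", is then just a dictionary assignment made before the text is converted.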

Train from Scratch

  1. Clone the repo:
git clone https://github.com/bshall/Tacotron
cd ./Tacotron
  2. Install requirements:
pip install -r requirements.txt
  3. Download and extract the LJ-Speech dataset:
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
  4. Download the train split here and extract it in the root directory of the repo.
  5. Extract Mel spectrograms and preprocess audio:
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
  6. Train the model:
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1 text_dir=path/to/LJSpeech-1.1/metadata.csv
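The `text_dir` argument in the training step points at the LJ-Speech `metadata.csv`, a pipe-delimited file with one utterance per line: ID, transcription, normalized transcription. To sanity-check the file before training, a minimal sketch like the following may help. `read_ljspeech_metadata` is a hypothetical helper, the sample rows are made up for illustration, and how `train.py` itself consumes the file is not shown here.

```python
import csv
import io


def read_ljspeech_metadata(lines):
    """Yield (utterance_id, normalized_text) from pipe-delimited metadata rows."""
    # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        yield row[0], row[2]


# Made-up rows in the same three-column layout (not real dataset entries).
sample = io.StringIO(
    'DEMO-0001|He said "hi" in 1999.|He said "hi" in nineteen ninety-nine.\n'
    "\n"
    "DEMO-0002|Dr. Smith left.|Doctor Smith left.\n"
)
entries = list(read_ljspeech_metadata(sample))
```

For the real file, pass an open file handle instead of `sample`, for example `open("path/to/LJSpeech-1.1/metadata.csv", encoding="utf-8", newline="")`.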

Pretrained Models

Pretrained weights for the LJSpeech model are available here.

Notable Differences from the Paper

  1. Trained using a batch size of 64 on a single GPU (using automatic mixed precision).
  2. Used a gradient clipping threshold of 0.05 as it seems to stabilize the alignment with the smaller batch size.
  3. Used a different learning rate schedule (again to deal with smaller batch size).
  4. Used 80-bin (instead of 128-bin) log-Mel spectrograms.
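To make the first two points concrete, here is a minimal sketch of one training step that combines automatic mixed precision with gradient clipping in PyTorch. It is not this repository's training loop: the linear layer, the L1 loss, the random data, and the learning rate are placeholders, and treating the 0.05 threshold as a gradient-norm limit is an assumption.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Mixed precision is only enabled when a GPU is present; on CPU this sketch
# falls back to an ordinary full-precision step.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

model = nn.Linear(80, 80).to(device)  # placeholder for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # illustrative lr
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

# Random stand-in batch: 64 items with 80 features each.
x = torch.randn(64, 80, device=device)
target = torch.randn(64, 80, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = nn.functional.l1_loss(model(x), target)  # placeholder loss

scaler.scale(loss).backward()
# Unscale first so the clipping threshold applies to the true gradients.
scaler.unscale_(optimizer)
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.05)
norm_after = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
scaler.step(optimizer)
scaler.update()
```

Calling `scaler.unscale_` before `clip_grad_norm_` matters: without it, the threshold would be compared against loss-scaled gradients and would clip far more aggressively than intended.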

Acknowledgements
