mozilla / TTS

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

License: Mozilla Public License 2.0

deep-learning text-to-speech python pytorch tacotron tts speaker-encoder dataset-analysis tacotron2 tensorflow2

tts's Introduction

TTS: Text-to-Speech for all.

TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and designed to achieve the best trade-off among ease of training, speed and quality. TTS comes with pretrained models and tools for measuring dataset quality, and it is already used in 20+ languages for products and research projects.


📢 English Voice Samples and SoundCloud playlist

👨‍🍳 TTS training recipes

📄 Text-to-Speech paper collection

💬 Where to ask questions

Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly, so that more people can benefit from it.

Type Platforms
🚨 Bug Reports GitHub Issue Tracker
❔ FAQ TTS/Wiki
🎁 Feature Requests & Ideas GitHub Issue Tracker
👩‍💻 Usage Questions Discourse Forum
🗯 General Discussion Discourse Forum and Matrix Channel

🔗 Links and Resources

Type Links
💾 Installation TTS/README.md
👩🏾‍🏫 Tutorials and Examples TTS/Wiki
🚀 Released Models TTS/Wiki
💻 Docker Image Repository by @synesthesiam
🖥️ Demo Server TTS/server
🤖 Running TTS on Terminal TTS/README.md
✨ How to contribute TTS/README.md

🥇 TTS Performance

"Mozilla*" and "Judy*" are our models. Details...

Features

  • High-performance deep learning models for Text2Speech tasks.
    • Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
    • Speaker Encoder to compute speaker embeddings efficiently.
    • Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN).
  • Fast and efficient model training.
  • Detailed training logs on console and TensorBoard.
  • Support for multi-speaker TTS.
  • Efficient multi-GPU training.
  • Ability to convert PyTorch models to TensorFlow 2.0 and TFLite for inference.
  • Released models in PyTorch, TensorFlow and TFLite.
  • Tools to curate Text2Speech datasets under dataset_analysis.
  • Demo server for model testing.
  • Notebooks for extensive model benchmarking.
  • Modular (but not too much) code base enabling easy testing of new ideas.

Implemented Models

Text-to-Spectrogram

Attention Methods

  • Guided Attention: paper
  • Forward Backward Decoding: paper
  • Graves Attention: paper
  • Double Decoder Consistency: blog

Speaker Encoder

Vocoders

You can also help us implement more models. Some TTS-related work can be found here.

Install TTS

TTS supports Python >= 3.6, < 3.9.

If you are only interested in synthesizing speech with the released TTS models, installing from PyPI is the easiest option.

pip install TTS

If you plan to code or train models, clone TTS and install it locally.

git clone https://github.com/mozilla/TTS
cd TTS
pip install -e .
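Optionally, as a quick sanity check (assuming the install above succeeded), confirm that the package can be imported:

python -c "import TTS; print('TTS import OK')"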

Directory Structure

|- notebooks/       (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- utils/           (common utilities.)
|- TTS
    |- bin/             (folder for all the executables.)
      |- train*.py                  (train your target model.)
      |- distribute.py              (train your TTS model using Multiple GPUs.)
      |- compute_statistics.py      (compute dataset statistics for normalization.)
      |- convert*.py                (convert target torch model to TF.)
    |- tts/             (text to speech models)
        |- layers/          (model layer definitions)
        |- models/          (model definitions)
        |- tf/              (Tensorflow 2 utilities and model implementations)
        |- utils/           (model specific utilities.)
    |- speaker_encoder/ (Speaker Encoder models.)
        |- (same)
    |- vocoder/         (Vocoder models.)
        |- (same)

Sample Model Output

Below you can see the Tacotron model state after 16K iterations with batch size 32 on the LJSpeech dataset.

"Recent research at Harvard has shown meditating for as little as 8 weeks can actually increase the grey matter in the parts of the brain responsible for emotional regulation and learning."

Audio examples: soundcloud

example_output

Datasets and Data-Loading

TTS provides a generic data loader that is easy to use with your custom dataset. You just need to write a simple function to format the dataset. Check datasets/preprocess.py for examples. After that, you need to set the dataset fields in config.json.
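As a rough illustration, a formatter for a pipe-separated metadata file might look like the sketch below. The function name, file layout and speaker name are assumptions; follow the existing functions in datasets/preprocess.py for the exact item format your version expects.

import os

def my_dataset(root_path, meta_file):
    """Parse a pipe-separated metadata file into [text, wav_path, speaker_name] items."""
    items = []
    speaker_name = "my_speaker"  # assumed single-speaker dataset
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            cols = line.strip().split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            text = cols[1]
            items.append([text, wav_file, speaker_name])
    return items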

Some of the public datasets to which we have successfully applied TTS:

Example: Synthesizing Speech on Terminal Using the Released Models.

After the installation, TTS provides a CLI interface for synthesizing speech using pretrained models. You can either use your own model or the released models under the TTS project.

Listing released TTS models.

tts --list_models

Run a tts and a vocoder model from the released model list. (Simply copy and paste the full model names from the list as arguments for the command below.)

tts --text "Text for TTS" \
    --model_name "<type>/<language>/<dataset>/<model_name>" \
    --vocoder_name "<type>/<language>/<dataset>/<model_name>" \
    --out_path folder/to/save/output/

Run your own TTS model (Using Griffin-Lim Vocoder)

tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path output/path/speech.wav

Run your own TTS and Vocoder models

tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path output/path/speech.wav \
    --vocoder_path path/to/vocoder.pth.tar \
    --vocoder_config_path path/to/vocoder_config.json

Note: You can use ./TTS/bin/synthesize.py if you prefer running tts from the TTS project folder.

Example: Training and Fine-tuning LJ-Speech Dataset

Here you can find a Colab notebook for a hands-on example, training LJSpeech. Or you can manually follow the guideline below.

To start, split metadata.csv into train and validation subsets, metadata_train.csv and metadata_val.csv respectively. Note that for text-to-speech, validation performance might be misleading, since the loss value does not directly measure voice quality to the human ear and it also does not measure the attention module's performance. Therefore, running the model with new sentences and listening to the results is the best way to go.

shuf metadata.csv > metadata_shuf.csv
head -n 12000 metadata_shuf.csv > metadata_train.csv
tail -n 1100 metadata_shuf.csv > metadata_val.csv

To train a new model, you need to define your own config.json with the model details, training configuration and more (check the examples). Then call the corresponding train script.

For instance, in order to train a tacotron or tacotron2 model on LJSpeech dataset, follow these steps.

python TTS/bin/train_tacotron.py --config_path TTS/tts/configs/config.json

To fine-tune a model, use --restore_path.

python TTS/bin/train_tacotron.py --config_path TTS/tts/configs/config.json --restore_path /path/to/your/model.pth.tar

To continue an old training run, use --continue_path.

python TTS/bin/train_tacotron.py --continue_path /path/to/your/run_folder/

For multi-GPU training, call distribute.py. It runs any provided train script in a multi-GPU setting.

CUDA_VISIBLE_DEVICES="0,1,4" python TTS/bin/distribute.py --script train_tacotron.py --config_path TTS/tts/configs/config.json

Each run creates a new output folder containing the used config.json, model checkpoints and TensorBoard logs.

In case of an error or interrupted execution, if there is no checkpoint yet under the output folder, the whole folder is removed.

You can also use TensorBoard by pointing its --logdir argument to the experiment folder.
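For example (the run folder path is a placeholder):

tensorboard --logdir /path/to/your/run_folder/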

Contribution Guidelines

This repository is governed by Mozilla's code of conduct and etiquette guidelines. For more details, please read the Mozilla Community Participation Guidelines.

  1. Create a new branch.
  2. Implement your changes.
  3. (if applicable) Add Google Style docstrings.
  4. (if applicable) Implement a test case under the tests folder.
  5. (Optional but preferred) Run the tests.
./run_tests.sh
  6. Run the linter.
pip install pylint cardboardlint
cardboardlinter --refspec master
  7. Send a PR to the dev branch and explain what the change is about.
  8. Let us discuss until we make it perfect :).
  9. We merge it to the dev branch once things look good.

Feel free to ping us at any step you need help using our communication channels.

Collaborative Experimentation Guide

If you would like to use TTS to try a new idea and share your experiments with the community, we urge you to follow the guidelines below for better collaboration. (If you have an idea for improving this process, let us know.)

  • Create a new branch.
  • Open an issue pointing your branch.
  • Explain your idea and experiment.
  • Share your results regularly. (TensorBoard log files, audio results, visuals etc.)

Major TODOs

Acknowledgement

tts's People

Contributors

anand-371, bajibabu, dependabot[bot], edresson, erogol, fatihkiralioglu, forcecore, geneing, gerazov, jyegerlehner, lexkoro, lstolcman, luhavis, m-toman, maxbachmann, mic92, mittimithai, mweinelt, nmstoker, repodiac, reuben, richardburleigh, sanjaesc, thllwg, thorstenmueller, tomzx, tset-tset-tset, twerkmeister, weberjulian, yweweler


tts's Issues

Checkpoint Sharing

Hi guys,
Thank you for your work! This is very nice. I was wondering if you could share your trained model so I can play a bit with it without having to train my own which I assume takes a long time. Also would you be kind enough to indicate how long it took you on what kind of hardware?

Thanks!

Missing keys in State Dictionary

When running both with and without CUDA, using either pretrained model, I get the following error:


RuntimeError: Error(s) in loading state_dict for Tacotron:
	Missing key(s) in state_dict: "decoder.stopnet.1.weight", "decoder.stopnet.1.bias". 

The state dict has keys for decoder.prenet but not decoder.stopnet. Is there a workaround to this other than training my own model from scratch?
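One possible workaround (an assumption, and only safe if the missing keys are genuinely unused by your config) is to load the state dict non-strictly, which ignores missing and unexpected keys. Here `model` is assumed to be your instantiated Tacotron:

import torch

checkpoint = torch.load("path/to/checkpoint.pth.tar", map_location="cpu")
# many TTS checkpoints store the weights under a "model" key; adjust if yours differs
model.load_state_dict(checkpoint["model"], strict=False)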

404 Readme link

You might hear a sample here.

The link goes to a non-existent page.

Make a high-quality public domain training set using Mozilla DeepSpeech and LibriVox (idea/enhancement)

As I understand it, the difference between Google's model and the pretrained model available here is the quality and size of the training set.

Would it be possible to take a high-quality, long LibriVox recording and use Mozilla's STT model to pinpoint the timing of each spoken word (we already have the ground-truth text from LibriVox, so it's only a matter of timing it)?

We could get some tens of hours of single-speaker recordings this way.

Does it make sense? How easy is this to accomplish? I could have a go if it's not hard; I haven't messed with DeepSpeech yet, and haven't looked at how the dataset is encoded, so I don't know how hard or important it is.

Testing

I want to use a pretrained network, so I used the notebook under the notebooks folder,
but:
ModuleNotFoundError: No module named 'torchviz'
I used conda and pip to install 'torchviz', but:
"Could not find a version that satisfies the requirement torchviz (from versions: )
No matching distribution found for torchviz"

Solution:

pip install git+https://github.com/szagoruyko/pytorchviz

any experience of unstable tacotron?

First, thank you for this amazing codebase and your hard work!
I really love it.

I ran into some bad cases like skipped words, non-stopping synthesis and repeated words. Any insight or experiments on this topic?

why use attention smoothing?

I saw this:
alignment = torch.sigmoid(alignment) / torch.sigmoid(alignment).sum(dim=1).unsqueeze(1)

Are there any experiments on this? Does it improve anything?
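For reference, here is a minimal side-by-side of the usual softmax normalization and the sigmoid-based smoothing shown above (illustrative only; `alignment` is assumed to be a [batch, encoder_steps] score tensor):

import torch

def softmax_attention(alignment):
    # standard attention normalization
    return torch.softmax(alignment, dim=1)

def smoothed_attention(alignment):
    # sigmoid "smoothing": each score is squashed independently, then renormalized to sum to 1
    sig = torch.sigmoid(alignment)
    return sig / sig.sum(dim=1, keepdim=True)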

CPU training compatible

Some of us have to train on CPU. I found that this code does not work on CPU:

 # forward pass
        mel_output, linear_output, alignments, stop_tokens = torch.nn.parallel.data_parallel(
            model, (text_input, mel_input, mask))

It is only this code. How can I change it to make it run on CPU?
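A minimal sketch of a possible workaround (not the repository's official fix), assuming the surrounding variables and the torch import from train.py: torch.nn.parallel.data_parallel requires CUDA, so call the model directly when no GPU is available.

# illustrative fallback
if torch.cuda.is_available():
    mel_output, linear_output, alignments, stop_tokens = torch.nn.parallel.data_parallel(
        model, (text_input, mel_input, mask))
else:
    mel_output, linear_output, alignments, stop_tokens = model(text_input, mel_input, mask)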

prenet dropout

I was using another repo previously, and now I am switching to Mozilla TTS.

In my experience, the dropout in the decoder prenet also has to be used at inference; without dropout at inference the quality is bad (Tacotron 2), which is hard to understand.

Do you have similar experience, and why do you think that is?
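For context, here is a minimal sketch of a prenet that keeps dropout active at inference time, which is the behavior described above (an illustration of the Tacotron 2 recipe, not the exact module in this repository):

import torch.nn as nn
import torch.nn.functional as F

class Prenet(nn.Module):
    def __init__(self, in_dim, sizes=(256, 256), dropout=0.5):
        super().__init__()
        dims = (in_dim,) + tuple(sizes)
        self.layers = nn.ModuleList(nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:]))
        self.dropout = dropout

    def forward(self, x):
        for linear in self.layers:
            # training=True keeps dropout on even in eval mode, as described in the Tacotron 2 paper
            x = F.dropout(F.relu(linear(x)), p=self.dropout, training=True)
        return x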

Tacotron2 + WaveRNN experiments

Tacotron2: https://arxiv.org/pdf/1712.05884.pdf
WaveRNN: https://github.com/erogol/WaveRNN forked from https://github.com/fatchord/WaveRNN

The idea is to add Tacotron2 as another alternative if it turns out to be more useful than the current model.

  • Code the boilerplate Tacotron2 architecture.
  • Train Tacotron2 and compare results (baseline).
  • Train the current TTS model at a size comparable to T2. (The current TTS model has 7M parameters and Tacotron2 has 28M.)
  • Add TTS-specific architectural changes to T2 and compare with the baseline.
  • Train WaveRNN as a vocoder on generated spectrograms.
  • Train a better stopnet. The stopnet sometimes misses the prediction, which leads to unstable outputs. Maybe it is better to use an RNN as in the previous TTS version.
  • Release the LJSpeech Tacotron 2 model. (soon)
  • Release the LJSpeech WaveRNN model. (https://github.com/erogol/WaveRNN)

Best result so far: https://soundcloud.com/user-565970875/ljspeech-logistic-wavernn

Some findings:

  • Adding an entropy loss on the attention weights seems to improve cases where the alignment is hard to learn. It forces the network to learn sparser, noise-free alignment weights.
entropy = torch.distributions.Categorical(probs=alignments).entropy()
entropy_loss = (entropy / np.log(alignments.shape[1])).mean()
loss += 1e-4 * entropy_loss

Here is the alignment with entropy loss. However, if you keep the loss weight high, then it degrades the model's generalization for new words.

  • Replacing the prenet with a BatchNorm version enhances performance quite a lot.
  • A network with a BN prenet has a harder time learning the attention. It looks like the network needs a level of noise on the autoregressive connection to relate the encoder output to the network output. Otherwise, in teacher-forcing mode, the network does not need the encoder output, since it finds the previous predicted frame enough to generate the next frame.
  • Forward attention seems more robust to longer sequences and faster to align. (https://arxiv.org/abs/1807.06736)

Min DB and Ref DB

Neither the Tacotron 2 nor the Tacotron paper mentions anything about decibel normalization. Can you help me understand why this is necessary?

Relevant config:

 "min_level_db": -100,
  "ref_level_db": 20,

Tacotron: Trying r < 5

Expecting better fidelity with r=2, which is also the setting used by the original paper.

Our previous runs used r=5 for the benefit of faster training.

synthesizer wav length

Hi,
While testing with the LJSpeech model I found that the maximum wav file length generated by the synthesizer is 12 seconds. Is this limited by the training dataset, or is there an option to change it?
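A likely explanation (an assumption based on typical Tacotron decoders, worth verifying against your checkout): the decoder stops after a fixed number of steps (max_decoder_steps) rather than a dataset-derived limit, so the output length is capped regardless of the input text. A back-of-the-envelope check with assumed values:

# illustrative arithmetic only; the constants are assumptions, not read from this repo's config
MAX_DECODER_STEPS = 500   # assumed decoder step cap
R = 2                     # assumed reduction factor (frames per decoder step)
HOP_LENGTH = 256          # assumed STFT hop in samples
SAMPLE_RATE = 22050       # assumed audio sample rate

max_seconds = MAX_DECODER_STEPS * R * HOP_LENGTH / SAMPLE_RATE
print(f"max synthesized length ~ {max_seconds:.1f} s")  # ~11.6 s, close to the observed 12 s

Raising the step cap (or the equivalent setting in your version) should allow longer outputs.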

No module named 'TTS'

Python version: 3.6.6
requirements.txt installation successful.
python3.6 setup.py develop successfully completed.

But the TTS module is not importable.

>>> from TTS.models.tacotron import Tacotron
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'TTS'

Are there any paths to be set? I am not using virtualenv.

Tried the same with miniconda as well, but no luck.

Error with 'initial_lr' parameter

Environment:

Python 3.6
PyTorch 0.4.1
Cuda 9.1

I encountered the following error while trying to train a model from LJSpeech following the steps in README.md:

Traceback (most recent call last):
  File "train.py", line 493, in <module>
    main(args)
  File "train.py", line 433, in main
    scheduler = AnnealLR(optimizer, warmup_steps=c.warmup_steps, last_epoch=args.restore_step)
  File "/workspace/TTS/utils/generic_utils.py", line 148, in __init__
    super(AnnealLR, self).__init__(optimizer, last_epoch)
  File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 20, in __init__
    "in param_groups[{}] when resuming an optimizer".format(i))
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

After digging around a bit, it looks like the problem is with the 'last_epoch=args.restore_step' argument to AnnealLR() call. This argument is set in train.py to zero when not using a checkpoint on line 425:

args.restore_step = 0

However, the lr_scheduler.py module expects "-1" for the initial epoch. I changed zero to -1 in line 425

args.restore_step = -1

and the training from scratch seems to be working now.

Inference time metrics?

Hello, I am very interested in this project, I am looking for a pytorch implementation of Tacotron/Tacotron2/WaveNet and may wish to contribute. Do you have any metrics on forward-pass time for inference on new text? I am looking to export a PyTorch model into Caffe2 and run it on a mobile platform.

Where to find metadata_val.csv

I downloaded the dataset from https://keithito.com/LJ-Speech-Dataset/ by clicking on the Download button. Assuming that would be enough, I ran python train.py --config_path config.json after modifying the config file for my own machine.

First it complained about the missing metadata_train.csv and then about the missing metadata_val.csv.

In the readme, there is no mention of whether I need to run anything else for preprocessing, so maybe I am missing something.

To try to fix it, I copied metadata.csv into metadata_train.csv and metadata_val.csv, gave it a run, and got the following error:

 > Git Hash: 186a81c
 > Experiment folder: /Users/manish/Work/TTS/experiments/July-11-2018_04:51PM-best-model-186a81c
 > Reading LJSpeech from - /Users/manish/Downloads/LJSpeech-1.1/wavs
 | > Number of instances : 13100
 | > Max length sequence 187
 | > Min length sequence 5
 | > Avg length sequence 98.34648854961831
 | > 0 instances are ignored by min_seq_len (0)
 > Reading LJSpeech from - /Users/manish/Downloads/LJSpeech-1.1/wavs
 | > Number of instances : 13100
 | > Max length sequence 187
 | > Min length sequence 5
 | > Avg length sequence 98.34648854961831
 | > 0 instances are ignored by min_seq_len (0)
 | > Number of characters : 149

 > Starting a new training
 | > Model has 7385090 parameters
 | > Epoch 0/1000
 ! Run is removed from /Users/manish/Work/TTS/experiments/July-11-2018_04:51PM-best-model-186a81c
Traceback (most recent call last):
  File "train.py", line 434, in <module>
    main(args)
  File "train.py", line 424, in main
    model, criterion, criterion_st, train_loader, optimizer, optimizer_st, epoch)
  File "train.py", line 112, in train
    model.forward(text_input, mel_input)
  File "/Users/manish/Work/TTS/models/tacotron.py", line 31, in forward
    encoder_outputs, mel_specs)
  File "/Users/manish/miniconda3/envs/tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/manish/Work/TTS/layers/tacotron.py", line 242, in forward
    memory = memory.view(B, memory.size(1) // self.r, -1)
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at /Users/soumith/minicondabuild3/conda-bld/pytorch_1524590658547/work/aten/src/TH/generic/THTensor.cpp:280

Notebook Sample Generation results in a RuntimeError

I have begun training TTS on the en_UK corpus released by M-AILABS. However, I doubt that the behaviour I experienced is related to the latter.

Essentially, I followed the notebooks given (and placed them into a Python script, generate.py) and both have resulted in the following error:

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CUDAIntTensor instead (while checking arguments for embedding)

It seems to trace back to either the model or the create_speech function.

Are the notebooks just outdated or is this unusual behaviour?
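A possible workaround (an assumption based on the error message, not a confirmed fix): the embedding layer expects LongTensor indices, so cast the character-id tensor before the forward pass.

# `chars_var` is a hypothetical name for whatever tensor feeds the embedding layer
chars_var = chars_var.long()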

NameError: name 'mode' is not defined

Hi,
When I am running the benchmark with checkpoint_272976.pth.tar I am getting this error:

 | > Number of characters : 149
Traceback (most recent call last):
  File "ttsbechmark.py", line 46, in <module>
    model = Tacotron(CONFIG.embedding_size, CONFIG.num_freq, CONFIG.num_mels, CONFIG.r)
  File "/media/hamza/Local Disk/Projects/Untitled Folder/TTS/models/tacotron.py", line 20, in __init__
    self.decoder = Decoder(256, mel_dim, r)
  File "/media/hamza/Local Disk/Projects/Untitled Folder/TTS/layers/tacotron.py", line 203, in __init__
    self.mode = mode
NameError: name 'mode' is not defined

Thanks

KeyError: ((1, 1, 1000), '|u1') && tensorboardX error

Traceback (most recent call last):
  File "/home/jackie/anaconda3/envs/tts/lib/python3.6/site-packages/PIL/Image.py", line 2460, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 1000), '|u1')

During handling of the above exception, another exception occurred:


Traceback (most recent call last):
  File "train.py", line 477, in <module>
    main(args)
  File "train.py", line 468, in main
    val_loss = evaluate(model, criterion, criterion_st, val_loader, current_step)
  File "train.py", line 310, in evaluate
    tb.add_image('ValVisual/Reconstruction', const_spec, current_step)
  File "/home/jackie/anaconda3/envs/tts/lib/python3.6/site-packages/tensorboardX/writer.py", line 412, in add_image
    self.file_writer.add_summary(image(tag, img_tensor), global_step, walltime)
  File "/home/jackie/anaconda3/envs/tts/lib/python3.6/site-packages/tensorboardX/summary.py", line 205, in image
    image = make_image(tensor, rescale=rescale)
  File "/home/jackie/anaconda3/envs/tts/lib/python3.6/site-packages/tensorboardX/summary.py", line 243, in make_image
    image = Image.fromarray(tensor)
  File "/home/jackie/anaconda3/envs/tts/lib/python3.6/site-packages/PIL/Image.py", line 2463, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type

Thanks in advance! I guess it is because of a different tensorboardX version?

librosa.util.exceptions.ParameterError: Target size (38) must be at least input size (1100)

After running the server (python3 server/server.py -c server/conf.json),
I tried to synthesize text through the web browser.

But the message "librosa.util.exceptions.ParameterError: Target size (38) must be at least input size (1100)" occurred.

Below is the full message.

File "tts/utils/audio.py", line 110, in _griffin_lim
angles = np.exp(1j * np.angle(self._stft(y)))
File "tts/utils/audio.py", line 124, in _stft
y=y, n_fft=self.n_fft, hop_length=self.hop_length, win_length=self.win_length)
File "tts/lib/python3.6/site-packages/librosa-0.5.1-py3.6.egg/librosa/core/spectrum.py", line 152, in stft
fft_window = util.pad_center(fft_window, n_fft)
File "tts/lib/python3.6/site-packages/librosa-0.5.1-py3.6.egg/librosa/util/utils.py", line 287, in pad_center
'at least input size ({:d})').format(size, n))
librosa.util.exceptions.ParameterError: Target size (38) must be at least input size (1100)

Where is the newest model

I downloaded the 272976-iteration model, and running the synthesis notebook I got this error:

RuntimeError: Error(s) in loading state_dict for Tacotron:
	Missing key(s) in state_dict: "encoder.cbhg.cbhg.conv1d_banks.0.conv1d.weight", "encoder.cbhg.cbhg.conv1d_banks.0.bn.weight", "encoder.cbhg.cbhg.conv1d_banks.0.bn.bias", "encoder.cbhg.cbhg.conv1d_banks.0.bn.running_mean", "encoder.cbhg.cbhg.conv1d_banks.0.bn.running_var", "encoder.cbhg.cbhg.conv1d_banks.1

Tacotron: Using randomized teacher-forcing

Assuming that using only teacher forcing would lead to overfitting, it might be better to select randomly between the real network output and the ground truth at each decoder iteration.
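A minimal sketch of the idea (scheduled sampling / randomized teacher forcing); the ratio and the function below are illustrative, not part of the repository:

import torch

TEACHER_FORCING_RATIO = 0.5  # assumed hyperparameter

def next_decoder_input(ground_truth_frame, predicted_frame):
    # randomly feed back either the ground truth or the model's own previous prediction
    if torch.rand(1).item() < TEACHER_FORCING_RATIO:
        return ground_truth_frame
    return predicted_frame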

Tacotron or Tacotron2?

Hi,

Curious question: this repository is named "tacotron"; therefore, are you implementing tacotron or tacotron-2? From my understanding, it's easier to implement the tacotron-2 architecture and it is of higher quality!

Thanks!
