rishikksh20 / fastspeech2

PyTorch Implementation of FastSpeech 2 : Fast and High-Quality End-to-End Text to Speech

License: Apache License 2.0

Python 13.94% Jupyter Notebook 86.06%

fastspeech fastspeech2 pytorch text-to-speech tts tts-engines

fastspeech2's Introduction

Hi there, I'm Rishikesh, Speech and Computer Vision Researcher👋

Hi friends, I'm Rishikesh, Co-founder and CTO of Dubpro.ai (formerly known as DeepSync Technologies). I graduated from NIT Silchar and, immediately after graduation, joined my first organisation, Nucleus Software, as a Full Stack Developer. I have a keen interest in machine learning and deep learning research, especially in the fields of speech synthesis and computer vision.

🔭 I’m currently working on Speech Synthesis and End to End Text to Speech (TTS) engines.
🌱 I love to code and contribute to Open Source.
💬 Ask me anything regarding my work, code and research here (Please tag me @rishikksh20 in your comment.).
📫 How to reach me: [email protected]

fastspeech2's Issues

Tried in other language

I tried to train FastSpeech2 on another language. For the step that uses MFA to get the alignments, I don't know how to apply it to my language.

hparams.py is missing in colabs

When running the code in Colab, hparams.py is missing.

`def synthesis(text, path):
    """Decode with E2E-TTS model."""

    print("TTS synthesis")
    # read training config
    idim = hp.symbol_len
    odim = hp.num_mels
    model = FeedForwardTransformer(idim, odim)`

GriffinLim. RuntimeError: Given transposed=1, weight of size [1026, 1, 1024], expected input[1, 160, 181] to have 1026 channels, but got 160 channels instead

Run "python inference.py"

2021-05-25 11:41:42.188662: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-25 11:41:44.038910: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-05-25 11:41:44.067997: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-25 11:41:44.068584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-05-25 11:41:44.068651: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-25 11:41:44.071265: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-05-25 11:41:44.071371: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-05-25 11:41:44.073321: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-05-25 11:41:44.073801: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-05-25 11:41:44.073918: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2021-05-25 11:41:44.074408: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-05-25 11:41:44.074614: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-05-25 11:41:44.074639: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-05-25 11:41:44.074929: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-25 11:41:44.075066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-25 11:41:44.075084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      
Starting
Text :  С трево+жным чу+вством беру+сь я+ за+ перо+
Checkpoint :  checkpoints/sova_fix/sova_fix_fastspeech_7788502_61k_steps.pyt
2021-05-25 11:42:00.754573: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-05-25 11:42:00.754979: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2199995000 Hz
TTS synthesis
predicting
Traceback (most recent call last):
  File "inference.py", line 259, in <module>
    main(sys.argv[1:])
  File "inference.py", line 217, in main
    wav = griffin_lim(m, stft, 30)
  File "/content/drive/MyDrive/FastSpeech2-1/dataset/audio_processing.py", line 239, in griffin_lim
    signal = stft_fn.inverse(magnitudes, angles).squeeze(1)
  File "/content/drive/MyDrive/FastSpeech2-1/utils/stft.py", line 122, in inverse
    padding=0
RuntimeError: Given transposed=1, weight of size [1026, 1, 1024], expected input[1, 160, 181] to have 1026 channels, but got 160 channels instead

I want to synthesize speech using the griffin-lim algorithm, but an error comes out.
I have set in default.yaml "melgan_vocoder: True".

I don't understand what's wrong. What does the error mean?
How can I fix it?
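
One way to read the shapes in this traceback, offered as a rough sketch rather than a confirmed diagnosis: the inverse STFT weight has 1026 input channels, which matches 2 × (1024 / 2 + 1), that is, the real and imaginary parts of 513 linear-frequency bins for a filter length of 1024. The input has only 160 channels, which would be consistent with an 80-bin mel spectrogram stacked the same way. Both the stacking and the 80-bin figure are assumptions about utils/stft.py and the config, and the helper names below are hypothetical, not functions from this repository.

```python
def linear_bins(filter_length):
    """Number of one-sided frequency bins produced by an STFT of this size."""
    return filter_length // 2 + 1


def expected_inverse_channels(filter_length):
    """Channels an inverse STFT would expect if it stacks real and imaginary
    parts along the channel axis (assumption about the implementation)."""
    return 2 * linear_bins(filter_length)


def check_griffin_lim_input(n_bins, filter_length):
    """Raise early if a spectrogram does not have linear-frequency resolution."""
    needed = linear_bins(filter_length)
    if n_bins != needed:
        raise ValueError(
            "Griffin-Lim needs %d linear bins for filter_length=%d, got %d "
            "(a mel spectrogram must be mapped back to linear bins first)"
            % (needed, filter_length, n_bins)
        )


if __name__ == "__main__":
    print(expected_inverse_channels(1024))  # channel count the weight asks for
    try:
        check_griffin_lim_input(80, 1024)   # 80 mel bins is an assumed value
    except ValueError as err:
        print(err)
```

If this reading is right, the mel output would need to be projected back to 513 linear bins before the call to griffin_lim; how the repository is meant to do that is not stated in this issue.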

failure to open demo in Jupyter

I would like to try the demo in this notebook. However, I am encountering this error.

What should I do to resolve this in order to run the demo?

demo_fastspeech2.ipynb is missing a ',' on line 505

Calculated padded input size per channel: (8). Kernel size: (9). Kernel size can't be greater than actual input size

When I try to run inference using a checkpoint I created:
python .\inference.py -c .\config\default.yaml -p .\checkpoints\output\output_fastspeech_d7ef3cf_1k_steps.pyt --out output --text "ModuleList can be indexed like a regular Python list but modules it contains are properly registered."

I get the following error:
RuntimeError: Calculated padded input size per channel: (8). Kernel size: (9). Kernel size can't be greater than actual input size

I trained using the following setting in the Default.yaml file:
positionwise_conv_kernel_size : 9

When I attempt to train with positionwise_conv_kernel_size : 8 instead of 9, I get a training error. Any help would be appreciated.
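
The numbers in this error can be checked against the output-length formula from the PyTorch Conv1d documentation. The sketch below is plain arithmetic, not code from this repository; it assumes the layer pads each side with (kernel_size - 1) // 2, which is a common choice but is not confirmed here, and both function names are hypothetical.

```python
def same_padding(kernel_size):
    """Per-side padding often paired with a 1-D conv (assumed, not confirmed)."""
    return (kernel_size - 1) // 2


def conv1d_output_length(length, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a 1-D convolution, following the PyTorch Conv1d formula."""
    padded = length + 2 * padding
    effective_kernel = dilation * (kernel_size - 1) + 1
    if padded < effective_kernel:
        # Same condition that triggers "Kernel size can't be greater than
        # actual input size".
        raise ValueError(
            "padded input size (%d) is smaller than kernel size (%d)"
            % (padded, effective_kernel)
        )
    return (padded - effective_kernel) // stride + 1


if __name__ == "__main__":
    for kernel in (9, 8):
        out = conv1d_output_length(50, kernel, same_padding(kernel))
        print("kernel", kernel, "input 50 -> output", out)
```

Under that padding assumption, an odd kernel such as 9 keeps the sequence length, while an even kernel such as 8 shortens it by one frame, which could clash with any layer that adds the output back to its input. A padded size of 8 against a kernel of 9 would also mean that a very short (possibly empty) sequence reached the layer at inference time; neither point is verified against this codebase.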

Attempting!

Trying to get this to work, with a custom sample set. I'm stuck at the end of this: https://github.com/ivanvovk/DurIAN#6-how-to-align-your-own-data

I don't see how to go from the many .TextGrid files to the two filelists that the config needs.
You mentioned using textgrid in Python, but I'm not a Python dev. Do you have a script to convert all of the TextGrid files?
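
As a starting point, here is a minimal standard-library sketch that reads one tier from a long-format TextGrid and turns each interval into a frame count. It is not the repository's own script. The tier name "phones", the 22050 Hz sample rate, the hop length of 256, and the "sil" label for empty intervals are all assumptions to adjust, and the filelist layout the config expects is not described in this issue, so the sketch stops at per-phoneme durations.

```python
import re

# Matches one interval entry in Praat's long TextGrid format.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*xmin = ([\d.eE+-]+)\s*xmax = ([\d.eE+-]+)\s*text = "([^"]*)"'
)


def read_tier(textgrid_text, tier_name="phones"):
    """Return (start_sec, end_sec, label) tuples for the named interval tier."""
    # Every tier begins with a line such as `item [1]:`.
    for chunk in re.split(r"item \[\d+\]:", textgrid_text)[1:]:
        if re.search(r'name = "%s"' % re.escape(tier_name), chunk):
            return [(float(a), float(b), label)
                    for a, b, label in INTERVAL_RE.findall(chunk)]
    raise ValueError("tier %r not found" % tier_name)


def to_frame_durations(intervals, sample_rate=22050, hop_length=256):
    """Convert interval boundaries in seconds to per-label frame counts."""
    durations = []
    for start, end, label in intervals:
        # Round the boundaries, not the lengths, so the frame counts add up.
        first = int(round(start * sample_rate / hop_length))
        last = int(round(end * sample_rate / hop_length))
        durations.append((label or "sil", last - first))
    return durations


SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.0
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 1.0
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "HH"
        intervals [2]:
            xmin = 0.5
            xmax = 0.9
            text = "AY1"
        intervals [3]:
            xmin = 0.9
            xmax = 1.0
            text = ""
'''

if __name__ == "__main__":
    for label, frames in to_frame_durations(read_tier(SAMPLE)):
        print(label, frames)
```

To process a folder, the same two functions could be applied to the contents of each .TextGrid file, with one line written per utterance in whatever format the filelists require.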

Cheers

Noisy output when using the provided checkpoints

Hello, thanks for providing this repo. I get a very noisy output when using the checkpoints you provided (https://drive.google.com/drive/folders/1Fh7zr8zoTydNpD6hTNBPKUGN_s93Bqrs) to do the synthesis (using the synthesis.py code). I have also trained a FastSpeech2 model using your code, and I am getting a noisy output using my checkpoints as well. I have attached the output obtained with the checkpoints you've provided (checkpoint_model_150k_steps.pyt): output.zip. I would be grateful to know what causes the difference between the attached generated output and the outputs provided in the sample folder.

AttributeError: 'NoneType' object has no attribute 'T' while trying synthesis.py

I trained the model. While running evaluation, I get the error below:
AttributeError: 'NoneType' object has no attribute 'T'

Tensorboard ? :D

Do you have any TensorBoard logs? :D The audio samples sound good except for background noise; maybe training longer will solve this problem, haha :D. Great job :D

Train with Multiple GPUs

Hi, great job. But I ran into a problem when training with multiple GPUs. When I use 4 GPUs and set the batch size to 256 in train.yaml, the pitch loss reaches 200~400 and the tone is not the same as the speaker's. Training on a single GPU (batch size set to 16) is OK. Do you have any idea about this problem? Do I need to change any other parameter when I change the batch size?

Question about duration predictor.

Hello @rishikksh20 ,

First of all, thank you for sharing your awesome work such as FastSpeech2, VocGAN, AdaSpeech, etc.
It has helped me a lot.

I am asking this question out of curiosity about the duration predictor improvements.
A few months ago, the evaluation performance of the duration predictor did not seem good, apparently due to overfitting (train error: below 0.01, but eval error: 0.5~0.6).

But, now it has been drastically improved (eval error 0.5~0.6 -> 0.06 ~ 0.08).

If you don't mind, could you tell me what the problem was with your previous version of the duration predictor?

Always appreciate,
