rishikksh20 / fastspeech2

PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

License: Apache License 2.0

Python 13.94% Jupyter Notebook 86.06%
fastspeech fastspeech2 pytorch text-to-speech tts tts-engines

fastspeech2's Introduction

Hi there, I'm Rishikesh, Speech and Computer Vision Researcher👋

Hi friends, I'm Rishikesh, Co-founder and CTO of Dubpro.ai (formerly known as DeepSync Technologies). I graduated from NIT Silchar and, immediately after graduation, joined my first organisation, Nucleus Software, as a Full Stack Developer. I have a keen interest in machine learning and deep learning research, especially in the fields of speech synthesis and computer vision.

  • 🔭 I’m currently working on Speech Synthesis and End to End Text to Speech (TTS) engines.
  • 🌱 I love to code and contribute to Open Source.
  • 💬 Ask me anything regarding my work, code and research here (please tag me @rishikksh20 in your comment).
  • 📫 How to reach me: [email protected]

Connect with me:

ai_rishikesh | Twitter


Languages and Tools:

Python

PyTorch

Github

Visual Studio Code

AWS

Azure


fastspeech2's People

Contributors

0xflotus, carankt, karan-deepsync, pranjalya, rishikksh20


fastspeech2's Issues

Tried in other language

I tried to train FastSpeech2 on another language. For the step that uses MFA to get alignments, I don't know how to apply it to my language.

hparams.py is missing in Colab

When running the code in Colab, hparams.py is missing.

`def synthesis(text, path):
    """Decode with E2E-TTS model."""

    print("TTS synthesis")
    # read training config
    idim = hp.symbol_len
    odim = hp.num_mels
    model = FeedForwardTransformer(idim, odim)`

GriffinLim. RuntimeError: Given transposed=1, weight of size [1026, 1, 1024], expected input[1, 160, 181] to have 1026 channels, but got 160 channels instead

Run "python inference.py"

2021-05-25 11:41:42.188662: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-25 11:41:44.038910: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-05-25 11:41:44.067997: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-25 11:41:44.068584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-05-25 11:41:44.068651: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-25 11:41:44.071265: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-05-25 11:41:44.071371: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-05-25 11:41:44.073321: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-05-25 11:41:44.073801: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-05-25 11:41:44.073918: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2021-05-25 11:41:44.074408: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-05-25 11:41:44.074614: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-05-25 11:41:44.074639: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-05-25 11:41:44.074929: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-25 11:41:44.075066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-25 11:41:44.075084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      
Starting
Text :  С трево+жным чу+вством беру+сь я+ за+ перо+
Checkpoint :  checkpoints/sova_fix/sova_fix_fastspeech_7788502_61k_steps.pyt
2021-05-25 11:42:00.754573: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-05-25 11:42:00.754979: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2199995000 Hz
TTS synthesis
predicting
Traceback (most recent call last):
  File "inference.py", line 259, in <module>
    main(sys.argv[1:])
  File "inference.py", line 217, in main
    wav = griffin_lim(m, stft, 30)
  File "/content/drive/MyDrive/FastSpeech2-1/dataset/audio_processing.py", line 239, in griffin_lim
    signal = stft_fn.inverse(magnitudes, angles).squeeze(1)
  File "/content/drive/MyDrive/FastSpeech2-1/utils/stft.py", line 122, in inverse
    padding=0
RuntimeError: Given transposed=1, weight of size [1026, 1, 1024], expected input[1, 160, 181] to have 1026 channels, but got 160 channels instead

I want to synthesize speech using the Griffin-Lim algorithm, but I get the error above.
I have set "melgan_vocoder: True" in default.yaml.

I don't understand what's wrong. What causes the error, and how can I fix it?
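
A note on the shapes in the traceback: 1026 equals 2 × (1024 / 2 + 1), which is consistent with an inverse-STFT weight built for a 1024-point FFT (513 linear-frequency bins, with real and imaginary parts stacked), while the input has 160 channels, which looks like a mel spectrogram. If that reading is right, the mel output would need to be mapped back to 513 linear bins before Griffin-Lim. The sketch below only illustrates that mapping with a pseudo-inverse. `mel_basis` is a random stand-in, not the repository's filterbank, and any log compression or normalization applied to the mel would have to be undone first.

```python
import numpy as np

n_fft = 1024
n_freq = n_fft // 2 + 1   # 513 linear-frequency bins for a 1024-point FFT
n_mels = 160              # channel count reported in the error
n_frames = 181            # frame count reported in the error

rng = np.random.default_rng(0)

# Hypothetical stand-in for the mel filterbank used at training time.
# In practice this must be the exact (n_mels, n_freq) matrix from preprocessing.
mel_basis = rng.random((n_mels, n_freq))

# Hypothetical stand-in for the model output, already in linear amplitude.
mel = rng.random((n_mels, n_frames))

# Approximate inverse of the mel projection: (513, 160) @ (160, 181) -> (513, 181).
inv_mel_basis = np.linalg.pinv(mel_basis)
linear = np.maximum(1e-10, inv_mel_basis @ mel)  # clamp: magnitudes cannot be negative

print(mel.shape, "->", linear.shape)
```

The resulting array has the 513 bins that a 1024-point inverse STFT expects. Whether the repository's `griffin_lim` should receive this array directly is an assumption that would need checking against its code.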

Failure to open demo in Jupyter

[Screenshot attached: 2021-09-06, 12:00:52 AM]

I would like to try the demo in this notebook. However, I am encountering this error.

What should I do to resolve this in order to run the demo?

Calculated padded input size per channel: (8). Kernel size: (9). Kernel size can't be greater than actual input size

When I try to run inference using a checkpoint I created:
python .\inference.py -c .\config\default.yaml -p .\checkpoints\output\output_fastspeech_d7ef3cf_1k_steps.pyt --out output --text "ModuleList can be indexed like a regular Python list but modules it contains are properly registered."

I get the following error:
RuntimeError: Calculated padded input size per channel: (8). Kernel size: (9). Kernel size can't be greater than actual input size

I trained using the following setting in the default.yaml file:
positionwise_conv_kernel_size : 9

When I attempt to train with positionwise_conv_kernel_size : 8 instead of 9, I get a training error. Any help would be appreciated.
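
For reference, the message says that the sequence reaching a convolution is, after padding, 8 steps long, which is shorter than the kernel of 9. The report does not show which layer or padding is involved. The sketch below only reproduces the length rule for a 1-D convolution in plain Python. The padding `(k - 1) // 2` is an assumption about how the position-wise convolution is padded, not something confirmed by the repository. Under that assumption an even kernel such as 8 shortens the sequence by one step, which is one plausible reason training fails with 8 (for example, a residual connection would no longer match in length).

```python
def conv1d_out_len(length, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a 1-D convolution, following the standard Conv1d length formula."""
    padded = length + 2 * padding
    effective_kernel = dilation * (kernel_size - 1) + 1
    if padded < effective_kernel:
        # Same condition that triggers "Kernel size can't be greater than actual input size".
        raise ValueError(
            f"padded input size ({padded}) is smaller than kernel size ({effective_kernel})"
        )
    return (padded - effective_kernel) // stride + 1


def same_style_padding(kernel_size):
    """Assumed padding rule; it preserves length only for odd kernel sizes."""
    return (kernel_size - 1) // 2


# Odd kernel: length is preserved. Even kernel: one step is lost.
print(conv1d_out_len(10, 9, same_style_padding(9)))
print(conv1d_out_len(10, 8, same_style_padding(8)))

# An unpadded sequence of 8 steps cannot be convolved with a kernel of 9.
try:
    conv1d_out_len(8, 9, padding=0)
except ValueError as err:
    print("error:", err)
```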

Attempting!

Hi

Trying to get this to work, with a custom sample set. I'm stuck at the end of this: https://github.com/ivanvovk/DurIAN#6-how-to-align-your-own-data

I don't see how to go from the many .TextGrid files to the 2 filelists that the config needs.
You mentioned using textgrid in Python, but I'm not a Python dev. Do you have a script to convert all of the TextGrid files?

Cheers
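
As a rough starting point for the conversion asked about above, the sketch below reads the phone tier of a long-format TextGrid with regular expressions and turns each interval into a duration in frames. Everything specific here is an assumption made for illustration: the tier name `phones`, the sample rate and hop length, the `sil` label for empty intervals, and above all the `id|phones|durations` line layout, which is hypothetical and must be adapted to whatever filelist format the config actually expects.

```python
import re

SAMPLE_RATE = 22050  # assumption: must match the audio settings used for training
HOP_LENGTH = 256     # assumption: must match the STFT hop length used for training

INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*xmin = ([\d.]+)\s*xmax = ([\d.]+)\s*text = "([^"]*)"'
)


def read_tier(textgrid_text, tier_name="phones"):
    """Return (start, end, label) tuples for one tier of a long-format TextGrid."""
    for tier in re.split(r"item \[\d+\]:", textgrid_text):
        if f'name = "{tier_name}"' in tier:
            return [(float(s), float(e), label) for s, e, label in INTERVAL_RE.findall(tier)]
    return []


def to_frames(seconds):
    """Convert a time in seconds to a frame index."""
    return round(seconds * SAMPLE_RATE / HOP_LENGTH)


def filelist_line(utt_id, intervals, silence="sil"):
    """Build one line in a hypothetical 'id|phones|durations' layout."""
    phones = [label or silence for _, _, label in intervals]
    # Differencing rounded boundaries keeps the durations summing to the total frame count.
    durations = [to_frames(end) - to_frames(start) for start, end, _ in intervals]
    return f"{utt_id}|{' '.join(phones)}|{' '.join(map(str, durations))}"


EXAMPLE = '''
item []:
    item [1]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 0.5
        intervals: size = 3
        intervals [1]:
            xmin = 0
            xmax = 0.1
            text = "HH"
        intervals [2]:
            xmin = 0.1
            xmax = 0.3
            text = "AY1"
        intervals [3]:
            xmin = 0.3
            xmax = 0.5
            text = ""
'''

print(filelist_line("utt_0001", read_tier(EXAMPLE)))
```

To cover a whole folder, the same two calls could be applied to each `.TextGrid` file (for instance via `pathlib.Path.glob`), with the resulting lines split into a training and a validation list.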

Noisy output when using the provided checkpoints

Hello, thanks for providing this repo. I get very noisy output when using the checkpoints you provided (https://drive.google.com/drive/folders/1Fh7zr8zoTydNpD6hTNBPKUGN_s93Bqrs) for synthesis (using the synthesis.py code). I have also trained a FastSpeech2 model using your code, and I get noisy output with my own checkpoints as well. I have attached the output produced with the checkpoint you provided (checkpoint_model_150k_steps.pyt): output.zip. I would be grateful to know what causes the difference between the attached output and the outputs provided in the sample folder.

TensorBoard? :D

Do you have any TensorBoard logs? :D The audio samples sound good except for some background noise; maybe training longer will solve this problem, haha :D. Great job :D.

Train with Multiple GPUs

Hi, great job. But I ran into a problem when training with multiple GPUs. When I use 4 GPUs and set the batch size to 256 in train.yaml, the pitch loss reaches 200~400 and the tone is not the same as the speaker's. Training on a single GPU (batch size 16) works fine. Do you have any idea what causes this? Do I need to change any other parameters when I change the batch size?

Question about duration predictor.

Hello @rishikksh20 ,

First of all, thank you for sharing your awesome work such as FastSpeech2, VocGAN, AdaSpeech, etc.
It helped me a lot.

I am asking out of curiosity about the improvements to the duration predictor.
A few months ago, the evaluation performance of the duration predictor seemed poor due to overfitting (train error below 0.01, but eval error 0.5~0.6).

Now it has improved drastically (eval error 0.5~0.6 -> 0.06~0.08).

If you don't mind, could you tell me what the problem was with the previous version of the duration predictor?

Always appreciate,
