
CNNVocoder

A CNN-based vocoder.

This work is inspired by the m-cnn model described in Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks. The authors show that even a simple upsampling network is enough to synthesize a waveform from a spectrogram/mel-spectrogram.

In this repo, I use the spectrogram feature for training because it contains more information than the mel-spectrogram feature. However, because the transformation from spectrogram to mel-spectrogram is just a linear projection, you can train a simple network to predict the spectrogram from the mel-spectrogram. You can also change the parameters to train a vocoder directly from the mel-spectrogram feature.
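The linearity claim above is easy to demonstrate. A minimal NumPy sketch — the filterbank here is a random stand-in for a real mel basis (which in practice would come from something like `librosa.filters.mel`):

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft_bins, n_mels = 513, 80

# Toy "mel filterbank": a fixed matrix mapping FFT bins to mel bins.
mel_basis = np.abs(rng.normal(size=(n_mels, n_fft_bins)))

# Fake magnitude spectrogram: (fft_bins, frames).
spec = np.abs(rng.normal(size=(n_fft_bins, 100)))

# The mel-spectrogram is just a matrix multiply, i.e. a linear projection.
mel = mel_basis @ spec

# Because it is linear, a pseudo-inverse gives a rough spectrogram
# estimate back from the mel-spectrogram (lossy, since 80 < 513).
spec_est = np.linalg.pinv(mel_basis) @ mel
```

This is why a small network (or even a learned linear layer) can map mel-spectrograms back toward spectrograms.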

Architecture notes

Compared with m-cnn, my proposed network has some differences:

  • I use Upsampling + Conv layers instead of a TransposedConv layer. This helps prevent checkerboard artifacts.
  • The model uses a lot of residual blocks before and after the upsampling module to make the network larger/deeper.
  • I only use an L1 loss between the log-scale STFT magnitudes of the predicted and target waveforms. Evaluating the loss in log space works better than on the raw STFT magnitude because it is closer to human perception of loudness. I also tried computing the loss on the spectrogram feature, but it didn't help much.
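The first and third points above can be sketched in a few lines of NumPy. These functions are illustrative only (they are not the repo's actual API), showing nearest-neighbor upsampling followed by convolution, and an L1 loss on log-scale STFT magnitudes:

```python
import numpy as np

def upsample_conv(x, kernel, factor=2):
    """Nearest-neighbor upsample a 1-D signal, then convolve.

    Unlike a strided transposed convolution, every output sample is
    covered by the same number of kernel taps, which avoids the
    periodic "checkerboard" unevenness.
    """
    up = np.repeat(x, factor)                    # nearest-neighbor upsampling
    return np.convolve(up, kernel, mode="same")  # smoothing convolution

def log_stft_l1(pred, target, n_fft=512, hop=128, eps=1e-5):
    """L1 distance between log-scale STFT magnitudes of two waveforms."""
    def stft_mag(w):
        frames = [w[i:i + n_fft] * np.hanning(n_fft)
                  for i in range(0, len(w) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(np.stack(frames), axis=-1))
    return np.mean(np.abs(np.log(stft_mag(pred) + eps)
                          - np.log(stft_mag(target) + eps)))
```

With identical waveforms the loss is zero; it grows as the predicted magnitudes diverge from the target, and the `log` compresses large magnitudes so quiet regions are not drowned out.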

Install requirements

$ pip install -r requirements.txt

Training vocoder

1. Prepare dataset

I use the LJSpeech dataset for my experiments. If you don't have it yet, please download the dataset and put it somewhere.

After that, you can run the following command to generate the dataset for the experiment:

$ python preprocessing.py --samples_per_audio 20 \
    --out_dir ljspeech \
    --data_dir path/to/ljspeech/dataset \
    --n_workers 4

2. Train vocoder

$ python train.py --out_dir ${output_directory}

For more training options, please run:

$ python train.py --help

Generate audio from spectrogram

  • Generate spectrogram from audio
$ python gen_spec.py -i sample.wav -o out.npz
  • Generate audio from spectrogram
$ python synthesis.py --model_path path/to/checkpoint \
                      --spec_path out.npz \
                      --out_path out.wav

Pretrained model

You can get my pre-trained model here.

Acknowledgements

This implementation uses code from NVIDIA, Ryuichi Yamamoto, Keith Ito as described in my code.

License

MIT

cnn_vocoder's People

Contributors

tuan3w
