ncnn-hifi-GAN

VULKAN support ...

HiFi-GAN is a fast GAN-based neural vocoder for efficient, high-fidelity speech synthesis in TTS pipelines and for realistic voice conversion.

HiFi-GAN addresses the poor voice quality of earlier GAN-based vocoders.

The HiFi-GAN paper reports that a small-footprint version of the model generates 22.05 kHz speech 13.4 times faster than real time on a CPU, with quality comparable to an autoregressive counterpart.

Deep-learning-based TTS generates speech from text in two stages:

  1. Generate a mel-spectrogram from text, typically with a model such as Tacotron or FastSpeech.
  2. Generate speech from the mel-spectrogram with a vocoder such as WaveNet or WaveRNN.

WaveNet produces speech of nearly human quality, but it generates audio too slowly. More recent GAN-based vocoders such as MelGAN increase generation speed, but they sacrifice quality for efficiency. HiFi-GAN was designed to deliver both efficiency and quality.

melgram_flipped

output.mp4

How to use

  1. Download the hifivoice model and place it in the /models folder.
  2. Run hifivoice.exe -i melgram_flipped.jpg
  3. The vocoder expects mel-spectrogram values in the range of approximately -11 to 2. A mel-spectrogram saved as a regular jpg file has pixel values in the range 0..255, so the values must be scaled before use: Mel_Image = Mel_Image * (1/255) * 13 - 11, which gives values from -11 to 2.
  4. Input mel-spectrogram parameters:
    • n_fft = 1024
    • num_mels = 80
    • sampling_rate = 22050
    • hop_size = 256
    • win_size = 1024
    • fmin = 0
    • fmax = 8000
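
The scaling in step 3 and the parameters in step 4 can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical and not part of hifivoice, the image is represented as a plain list of pixel rows, and the length estimate assumes the vocoder emits hop_size audio samples per mel frame, as a standard HiFi-GAN generator does.

```python
# Sketch only: helper names are hypothetical, not part of hifivoice.

MEL_MIN = -11.0        # lower end of the vocoder input range (step 3)
MEL_SPAN = 13.0        # width of the range, so the upper end is -11 + 13 = 2
SAMPLING_RATE = 22050  # from the parameter list (step 4)
HOP_SIZE = 256         # from the parameter list (step 4)


def scale_pixel(value):
    """Map one 8-bit pixel value (0..255) to the mel range (-11..2)."""
    return value * (1.0 / 255.0) * MEL_SPAN + MEL_MIN


def scale_image(pixels):
    """Scale a 2-D list of pixel rows (num_mels rows, one column per frame)."""
    return [[scale_pixel(v) for v in row] for row in pixels]


def estimated_samples(num_frames):
    """Assumption: the vocoder emits HOP_SIZE samples per mel frame."""
    return num_frames * HOP_SIZE


def estimated_seconds(num_frames):
    """Approximate audio duration for a spectrogram with num_frames columns."""
    return estimated_samples(num_frames) / SAMPLING_RATE


if __name__ == "__main__":
    tiny = [[0, 128, 255]]  # one mel bin, three frames
    print(scale_image(tiny))
    print(estimated_samples(100), estimated_seconds(100))
```

Under that assumption, a spectrogram image 100 pixels wide corresponds to 100 * 256 = 25600 samples, a little over one second of audio at 22050 Hz.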

ncnn is a high-performance neural network inference framework optimized for mobile platforms.

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.

Contributors

magicse
