Slow inference time? about wavernn (CLOSED)

fatchord commented on July 17, 2024
Slow inference time?

from wavernn.

Comments (5)

fatchord commented on July 17, 2024

@jbecke Thank you for your kind words. Have you tried the batched inference mode with WaveRNN? That should be around realtime on a decent GPU (note that audio quality suffers a little bit - play around with the target and overlap settings if you're interested in tinkering with it).

You can change the default setting for it in hparams.voc_gen_batched.
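
For readers wondering what target and overlap control, here is a minimal sketch of the general fold-and-crossfade idea, not the repo's actual code. It assumes a long signal is cut into segments of `target + overlap` samples that can be processed as one batch, then stitched back together with a linear crossfade over the shared `overlap` samples. The function names `fold_with_overlap` and `crossfade_unfold` and the parameter values are illustrative assumptions.

```python
import numpy as np


def fold_with_overlap(x, target, overlap):
    """Cut a 1-D array into segments of length target + overlap.

    Consecutive segments start `target` samples apart, so each pair of
    neighbours shares `overlap` samples. The input is zero-padded at the end.
    """
    n = max(1, -(-(len(x) - overlap) // target))  # ceiling division
    padded = np.zeros(n * target + overlap, dtype=x.dtype)
    padded[:len(x)] = x
    return np.stack(
        [padded[i * target: i * target + target + overlap] for i in range(n)]
    )


def crossfade_unfold(segments, target, overlap):
    """Stitch segments back together with a linear crossfade (needs target >= overlap)."""
    n = len(segments)
    out = np.zeros(n * target + overlap, dtype=np.float64)
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in  # fade_in + fade_out == 1 at every sample
    for i, seg in enumerate(segments):
        seg = seg.astype(np.float64)  # astype returns a copy, input is untouched
        if i > 0:
            seg[:overlap] *= fade_in    # fade in over the shared head
        if i < n - 1:
            seg[target:] *= fade_out    # fade out over the shared tail
        out[i * target: i * target + target + overlap] += seg
    return out


# Identity "model": fold, leave the segments unchanged, and stitch them back.
x = np.sin(np.arange(1000) * 0.01)
segments = fold_with_overlap(x, target=100, overlap=20)
y = crossfade_unfold(segments, target=100, overlap=20)
```

With an identity in place of the vocoder the round trip is lossless. In a real vocoder each segment would be generated independently, so the overlapping regions would no longer agree exactly, which is presumably where the small quality loss mentioned above comes from. A longer overlap gives the crossfade more room to hide the seams, while a shorter target gives more segments to run in parallel.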

jbecke commented on July 17, 2024

Thanks! But unfortunately, in my use case I do not have the whole spectrogram before vocoding, so chunking and vocoding the spectrogram in parallel will not work. In my application the target spectrogram is being synthesized in real time, and I need to vocode it in real time as well.

I suppose I should implement and train the reference WaveRNN instead of this modified model? Or do you think FFTNet would be better?

Also, were you able to match the paper's results using the reference WaveRNN model?

fatchord commented on July 17, 2024

@jbecke Well, it's up to you I guess. All I'll say is that the spectrogram generation + WaveRNN vocoding in this repo is pretty damn fast, and combined it is often faster than realtime on my trusty old GTX 1080. I'm just wondering - why are you chunking the spectrogram in your application? I've found Tacotron models to be pretty fast at generating output.

I haven't had great results with the exact model described in the WaveRNN paper. Keep in mind though that the model as published was incomplete, since they didn't describe the conditioning network in any detail. Other open source attempts at recreating it haven't convinced me either from a sound quality point of view - they'll work OK on LJSpeech but sound terrible on lower-register male voices.

jbecke commented on July 17, 2024

I'm working on real-time voice conversion. So it's not that I'm chunking the spectrogram, but that it is inherently impossible to have the whole spectrogram (at time t I only have the spectrogram for [0, t]). Thanks for your advice! I'm going to try throwing a couple of V100s at the issue and see if I can replicate their results...
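
The streaming constraint described here can be sketched as a loop that consumes one spectrogram frame at a time and carries the vocoder's state across frames, plus a small helper for checking whether generation keeps up with real time. This is only an illustration under stated assumptions: `ToyStreamingVocoder` is a hypothetical stand-in and not WaveRNN, and `HOP_LENGTH` and `SAMPLE_RATE` are assumed values.

```python
import numpy as np

HOP_LENGTH = 256      # assumed number of audio samples per spectrogram frame
SAMPLE_RATE = 22050   # assumed audio sample rate in Hz


class ToyStreamingVocoder:
    """Hypothetical autoregressive stand-in: each sample depends on the previous one."""

    def __init__(self, n_mels):
        rng = np.random.default_rng(0)
        self.w = rng.standard_normal(n_mels) / n_mels
        self.prev = 0.0  # state carried across frames

    def step(self, frame):
        """Generate HOP_LENGTH samples conditioned on a single frame."""
        cond = float(self.w @ frame)
        out = np.empty(HOP_LENGTH)
        for i in range(HOP_LENGTH):
            self.prev = float(np.tanh(0.9 * self.prev + cond))
            out[i] = self.prev
        return out


def stream_vocode(frames, vocoder):
    """Yield audio for each frame as it arrives; only frames [0, t] are ever seen."""
    for frame in frames:
        yield vocoder.step(frame)


def realtime_factor(n_samples, elapsed_seconds, sample_rate=SAMPLE_RATE):
    """Seconds of audio produced per wall-clock second; >= 1 keeps up with real time."""
    return (n_samples / sample_rate) / elapsed_seconds


frames = np.random.default_rng(1).standard_normal((5, 80))
chunks = list(stream_vocode(iter(frames), ToyStreamingVocoder(n_mels=80)))
```

Under these assumptions each frame has to be vocoded in less than HOP_LENGTH / SAMPLE_RATE seconds (about 11.6 ms) for the output to keep pace with the input. Wrapping the loop with `time.perf_counter()` and passing the totals to `realtime_factor` gives a quick check.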

MorganCZY commented on July 17, 2024

@jbecke Hi, may I ask what method or framework you use to implement real-time voice conversion?
