Thanks for this great implementation

Slow inference time? about wavernn HOT 5 CLOSED

fatchord commented on July 17, 2024

Slow inference time?

from wavernn.

Comments (5)

fatchord commented on July 17, 2024

@jbecke Thank you for your kind words. Have you tried the batched inference mode with WaveRNN? That should be around realtime on a decent GPU (note that audio quality suffers a little; play around with the target and overlap settings if you're interested in tinkering with it).

You can change the default setting for it in hparams.voc_gen_batched.
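To make the target and overlap settings a bit more concrete, here is a minimal sketch of the general idea behind batched generation: the sequence is cut into windows that share a few samples with their neighbours, the windows are generated in parallel, and the shared regions are crossfaded when the windows are stitched back together. This is an illustration only, not the repo's code; the function names `fold_with_overlap` and `crossfade_unfold`, the linear crossfade, and the plain-list representation are all assumptions made for the example.

```python
def fold_with_overlap(samples, target, overlap):
    """Split `samples` into windows of length target + 2 * overlap.

    Hypothetical helper: neighbouring windows share `overlap` samples,
    and the last window is zero-padded if the input runs out.
    """
    step = target + overlap          # distance between window starts
    window = target + 2 * overlap    # length of every window
    folds = []
    start = 0
    while start + overlap < len(samples):
        chunk = list(samples[start:start + window])
        chunk += [0.0] * (window - len(chunk))  # pad the final window
        folds.append(chunk)
        start += step
    return folds


def crossfade_unfold(folds, target, overlap):
    """Stitch windows back together with a linear crossfade (assumed shape)."""
    step = target + overlap
    window = target + 2 * overlap
    out = [0.0] * (step * len(folds) + overlap)
    last = len(folds) - 1
    for i, fold in enumerate(folds):
        for j, value in enumerate(fold):
            if j < overlap and i > 0:
                # Fade in over the region shared with the previous window.
                weight = (j + 1) / (overlap + 1)
            elif j >= window - overlap and i < last:
                # Fade out over the region shared with the next window.
                weight = 1.0 - (j - (window - overlap) + 1) / (overlap + 1)
            else:
                weight = 1.0
            out[i * step + j] += weight * value
    return out
```

In this sketch a larger target means fewer, longer windows (less parallelism, fewer seams), while a larger overlap gives the crossfade more samples to hide each seam. Note that folding requires the whole sequence up front, which is why it does not fit a strictly streaming setting.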

jbecke commented on July 17, 2024

Thanks! But unfortunately, in my use case I do not have the whole spectrogram before vocoding, i.e. chunking and vocoding the spectrogram in parallel will not work. In my application the target spectrogram is synthesized in real time, and I then need to vocode it in real time as well.

I suppose I should implement and train the reference WaveRNN instead of this modified model? Or do you think FFTNet would be better?

Also, were you able to match the paper's results using the reference WaveRNN model?

fatchord commented on July 17, 2024

@jbecke Well, it's up to you I guess. All I'll say is that the spectrogram generation + WaveRNN vocoding in this repo is pretty damn fast, and combined it is often faster than realtime on my trusty old GTX 1080. I'm just wondering: why are you chunking the spectrogram in your application? I've found Tacotron models to be pretty fast at generating output.
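For anyone who wants to check the "faster than realtime" claim on their own hardware, one common way is to compute a real-time factor: wall-clock synthesis time divided by the duration of the audio produced, where a value below 1.0 means faster than realtime. The sketch below is a generic measurement helper, not something from this repo; `real_time_factor`, `rtf_from_timing`, the `generate` callable, and the 22050 Hz sample rate in the usage line are all assumptions for illustration.

```python
import time


def rtf_from_timing(elapsed_seconds, num_samples, sample_rate):
    """Real-time factor: synthesis time divided by audio duration."""
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds


def real_time_factor(generate, sample_rate):
    """Time a hypothetical `generate()` callable that returns audio samples."""
    start = time.perf_counter()
    samples = generate()
    elapsed = time.perf_counter() - start
    return rtf_from_timing(elapsed, len(samples), sample_rate)


# Usage with a stand-in generator that returns one second of silence
# (22050 Hz is an assumed sample rate, not taken from the thread).
rtf = real_time_factor(lambda: [0.0] * 22050, sample_rate=22050)
```

When timing a GPU model this way, the generator should only return once the samples are actually available on the host, otherwise the measured time can be too optimistic.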

I haven't had great results with the exact model described in the WaveRNN paper. Keep in mind, though, that that model was incomplete, since they didn't describe the conditioning network in any detail. Other open source attempts at recreating it haven't convinced me either from a sound quality point of view: it'll work OK on LJSpeech but sound terrible on lower-register male voices.

jbecke commented on July 17, 2024

I'm working on real-time voice conversion. So it's not that I'm chunking the spectrogram, but that it is inherently impossible to have the whole spectrogram (at time t I only have the spectrogram for [0, t]). Thanks for your advice! I'm going to try throwing a couple of V100s at the issue and see if I can replicate their results...
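The constraint described above (at time t only frames [0, t] exist) can be sketched as a generator that vocodes each spectrogram frame as it arrives and carries the recurrent state forward, so no future frames are ever needed. This is a toy illustration, not WaveRNN itself: `toy_vocode_frame` is a hypothetical stand-in for one autoregressive vocoder step, and `hop_length=4` is an arbitrary assumed number of samples per frame.

```python
def toy_vocode_frame(frame, state, hop_length=4):
    """Hypothetical stand-in for one causal vocoder step.

    Produces `hop_length` samples from a single spectrogram frame and
    returns the updated recurrent state alongside them.
    """
    frame_mean = sum(frame) / len(frame)
    samples = []
    for _ in range(hop_length):
        # Each sample depends only on the previous state and this frame.
        state = 0.9 * state + 0.1 * frame_mean
        samples.append(state)
    return samples, state


def stream_vocode(frames, vocode_frame=toy_vocode_frame, initial_state=0.0):
    """Yield audio chunks one frame at a time, using only past context."""
    state = initial_state
    for frame in frames:
        samples, state = vocode_frame(frame, state)
        yield samples


# Usage: frames may come from any iterator, e.g. a live conversion model.
incoming = iter([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
chunks = list(stream_vocode(incoming))
```

Because each chunk depends only on earlier frames, the audio for the first N frames is identical whether or not later frames ever arrive. The catch is that generation is strictly sequential, so each step has to run faster than realtime by itself rather than relying on batching.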

MorganCZY commented on July 17, 2024

@jbecke Hi, may I ask what method or framework you use to implement real-time voice conversion?
