
What is the training time? · about wavernn · 19 comments · CLOSED

fatchord avatar fatchord commented on July 17, 2024
What is the training time?

from wavernn.

Comments (19)

fatchord avatar fatchord commented on July 17, 2024 3

@room101b Hi, this model is very slow to train - it will take around a week with the implementation in this repo. The most obvious way to speed up training, I think, is to create an optimised CUDA kernel for the model, and even then I wouldn't be sure that would speed things up much. Add to that, the 1060 is slower than the card I've used (a 1080), so I wouldn't recommend this particular model unless you want to achieve real-time inference on a mobile app or something like that.

Instead I would recommend checking out FFTNet. I implemented it yesterday and it trains like a beast: without conditioning I was getting 7.5 batches/second - that's 10 times faster than WaveRNN/WaveNet! Also, it wasn't too hard on GPU memory either - much lighter than WaveNet, although not as compact as WaveRNN.

Link: http://gfx.cs.princeton.edu/pubs/Jin_2018_FAR/fftnet-jin2018.pdf

As for conditioning, in my experience these vocoder models are quite robust to conditioning sites. Wavenet and FFTNet papers both have details on how to condition so those are a good start.
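The local-conditioning idea described in those papers (upsample frame-rate features to sample rate, then add a projection of them inside each layer) can be sketched as follows; the hop size, shapes, and random weights here are all assumptions for illustration, not code from either repo:

```python
import numpy as np

# Hedged sketch of local conditioning: frame-level features (e.g. mels) are
# upsampled to sample rate and a linear projection of them is added to the
# layer activations. All sizes and weights below are toy assumptions.
hop = 4                                # samples per feature frame (toy value)
T_frames, C_feat, C_hidden = 6, 3, 8
rng = np.random.default_rng(1)
mel = rng.standard_normal((T_frames, C_feat))

mel_up = np.repeat(mel, hop, axis=0)   # nearest-neighbour upsample to sample rate
V = rng.standard_normal((C_feat, C_hidden)) * 0.1
h = rng.standard_normal((T_frames * hop, C_hidden))  # stand-in layer activations
h = h + mel_up @ V                     # conditioning added at the layer input
print(h.shape)                         # (24, 8)
```

The same addition can be made at every layer of the stack, which is one of the "conditioning sites" both papers discuss.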


lifeiteng avatar lifeiteng commented on July 17, 2024 2

@fatchord like this

N = 2048                                  # receptive field
x = batch * M * C                         # input: M samples, C channels
padding = batch * N * C                   # zeros prepended to x
z = [padding, x]                          # length M + N
while N > 1:
    z = W_L * z[:, :M + N/2] + W_R * z[:, N/2:]
    z = relu(W * relu(z))
    N = N / 2
z = relu(W * z)
logits = W * z                            # outputs 0, 1, ..., M-1, M
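For what it's worth, the halving loop above runs as written when sketched in NumPy; the toy sizes and random weights below are assumptions (N = 16 stands in for 2048, dense matrices stand in for the 1x1 convolutions), not trained values:

```python
import numpy as np

# Minimal NumPy sketch of the FFTNet halving loop. Weights are hypothetical
# random matrices; only the shapes and the length bookkeeping matter here.
rng = np.random.default_rng(0)
B, M, C, N = 2, 64, 8, 16              # batch, output length, channels, receptive field

x = rng.standard_normal((B, M, C))
z = np.concatenate([np.zeros((B, N, C)), x], axis=1)   # zero-pad by N -> length M + N

while N > 1:
    W_L = rng.standard_normal((C, C)) * 0.1
    W_R = rng.standard_normal((C, C)) * 0.1
    W = rng.standard_normal((C, C)) * 0.1
    # combine the two branches offset by N/2, then a 1x1 "conv" with ReLUs
    z = z[:, : z.shape[1] - N // 2] @ W_L + z[:, N // 2 :] @ W_R
    z = np.maximum(0.0, np.maximum(0.0, z) @ W)
    N //= 2

print(z.shape)   # (2, 65, 8): M + 1 steps, the extra one dropped before the loss
```

Each pass shortens the sequence by N/2 samples, so the initial pad of N shrinks to a single extra output step by the time N reaches 1, matching the "drop one step" point discussed below in the thread.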


james-wynne-dev avatar james-wynne-dev commented on July 17, 2024

@fatchord Thanks for the response, it's very helpful. My project will be doing something similar to NSynth, but generating abstract, musique concrète style noises. If you have any further advice it would be appreciated. Thanks.


fatchord avatar fatchord commented on July 17, 2024

@room101b You're very welcome and your project sounds very interesting. By the way, I've uploaded what I've done so far with FFTNet if you wanna check it out:
https://github.com/fatchord/FFTNet


lifeiteng avatar lifeiteng commented on July 17, 2024

Since CNNs parallelize easily on a GPU, training of CNN-like models (WaveNet, FFTNet) is fast.

    model              training   inference
    WaveNet            faster     slowest
    WaveRNN            slower     faster
    FFTNet             fastest    slower
    Parallel WaveNet   fast       fastest

@fatchord awesome!
I also implemented FFTNet yesterday and got a positive result (dumped from training; cached inference is hard to implement in TensorFlow, WIP):

step 10k

[waveform image: allison_lls007_04488]

step 130k

[waveform image: allison_lls006_03649]


fatchord avatar fatchord commented on July 17, 2024

@lifeiteng That looks pretty darn good! One thing in the paper had me scratching my head and I'd love to get your input on it.

In section 2.3.2 they say to zero pad by N (they don't explicitly define N but I strongly got the impression it was the receptive field for any given layer in the stack):

z[0:M] = W_L ∗ x[-N:M-N] + W_R ∗ x[0:M]

But if the previous equation (without zero padding) was:

z = W_L ∗ x[0:N/2] + W_R ∗ x[N/2:N]

Wouldn't that mean that the equation from 2.3.2 should read:

z[0:M] = W_L ∗ x[-N/2:M-N/2] + W_R ∗ x[0:M]

Am I missing something?
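A toy numeric check of that reading (scalar weights stand in for the 1x1 convolutions W_L and W_R; the values are arbitrary assumptions): if each layer's left branch looks N/2 steps behind the right branch, then padding the layer input by N/2 keeps the output aligned at length M:

```python
import numpy as np

# Per-layer view of the padded FFTNet equation: the left branch reads
# x[t - N/2] and the right branch reads x[t], so a left pad of N/2 zeros
# yields exactly M output steps. Scalar "weights" stand in for W_L, W_R.
M, N = 8, 4
x = np.arange(1, M + 1, dtype=float)   # x = [1, 2, ..., 8]
w_l, w_r = 1.0, 1.0

xp = np.concatenate([np.zeros(N // 2), x])   # pad by N/2 on the left
z = w_l * xp[:M] + w_r * xp[N // 2:]         # both slices have length M
print(z)   # [ 1.  2.  4.  6.  8. 10. 12. 14.] -> z[t] = x[t - N/2] + x[t]
```

Under this per-layer reading the shift is indeed N/2, as the question suggests; padding by the full receptive field N only makes sense once, at the input to the whole stack.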


fatchord avatar fatchord commented on July 17, 2024

@lifeiteng Thanks! Yeah, that was what I was doing initially, but the output tensor has an extra N steps if you do it that way - just chop it off before backprop?


lifeiteng avatar lifeiteng commented on July 17, 2024

logits = Wz -> 0, 1, ..., M-1, M: just one extra output step (M); yes, drop it.


fatchord avatar fatchord commented on July 17, 2024

@lifeiteng My bad, I was padding inside the layer (like a bloody idiot!). Thanks again.


lifeiteng avatar lifeiteng commented on July 17, 2024

@fatchord I have sent you a gitter invitation for more in-depth communication.


fatchord avatar fatchord commented on July 17, 2024

@lifeiteng Thanks, I'll make an account on Gitter now so.


iovdin avatar iovdin commented on July 17, 2024

@fatchord your 12k iteration sample sounds good.
If WaveRNN is just a very tuned RNN, then training an nn.GRU with 1024 hidden units on mu-law input/output should, after 12k steps, produce a slightly worse but comparable sample. But it is far, far from that.
Any idea why is that?


fatchord avatar fatchord commented on July 17, 2024

@iovdin "Far far from that" as in good or bad? Can you post a sample from your experiment?


iovdin avatar iovdin commented on July 17, 2024

@fatchord Okay, with weight decay and a lower learning rate it seems to sound better ("far, far from that" meant really bad):
https://lera.ai/s/3318a1


fatchord avatar fatchord commented on July 17, 2024

@iovdin It doesn't sound too bad - especially considering it's so early in training. Also, the 16 bits in WaveRNN make a big difference when it comes to noise reduction and dynamic range - mu-law can only do so much to reduce noise at lower bit depths.
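For reference, this is the textbook mu-law companding formula (mu = 255) with an 8-bit quantization round trip, not the exact code from either repo; it illustrates why 8-bit mu-law still leaves audible quantization noise compared with a 16-bit scheme:

```python
import numpy as np

# Standard mu-law companding (mu = 255). 8-bit mu-law gives only 256
# levels (nonlinearly spaced), versus 65536 levels at 16 bits.
def mu_law_encode(x, mu=255):
    # x in [-1, 1] -> companded value in [-1, 1]
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_decode(y, mu=255):
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

x = np.linspace(-1.0, 1.0, 5)
q = np.round((mu_law_encode(x) + 1) / 2 * 255)   # quantize to 8 bits
x_hat = mu_law_decode(q / 255 * 2 - 1)           # dequantize
print(np.max(np.abs(x - x_hat)))                 # residual quantization error
```

The encode/decode pair is exact without quantization; the error comes entirely from rounding to 256 levels, which is the noise floor mu-law cannot get below.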


iovdin avatar iovdin commented on July 17, 2024

@fatchord the counter shows tens of steps, i.e. it is actually 100k steps at seq_len 128, comparable to your 12k steps at seq_len 960.


fatchord avatar fatchord commented on July 17, 2024

@iovdin That sounds like too small a number of time steps for training. Even at a low sample rate of 16kHz, the lowest audible frequency starts around 30Hz, which is ~500 steps per period. I would recommend upping it to around 1000 steps.
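The arithmetic behind the ~500-step figure:

```python
# Samples needed to cover one full period of the lowest audible frequency
# at a 16 kHz sample rate.
sample_rate = 16_000
lowest_hz = 30
period_in_samples = sample_rate // lowest_hz
print(period_in_samples)   # 533
```

A BPTT window of 128 samples therefore sees well under a quarter of a 30 Hz period, while ~1000 steps covers roughly two full periods.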


iovdin avatar iovdin commented on July 17, 2024

@fatchord The guys from DeepSound trained SampleRNN with 128 BPTT steps: http://deepsound.io/samplernn_first.html


fatchord avatar fatchord commented on July 17, 2024

@iovdin Cool link though - thanks!

I'm not too familiar with SampleRNN (although it's a very interesting model), so I can't really comment on it much.

Actually - doesn't SampleRNN operate on frames of samples? Perhaps it's 128 frames of 16 samples each? Again, I haven't read that paper yet, so I could be wrong on that.

