Comments (19)
@room101b Hi, this model is very slow to train; it will take around a week with the implementation in this repo. The most obvious way to speed up training, I think, is to write an optimised CUDA kernel for the model, and even then I'm not sure that would help much. On top of that, the 1060 is slower than the card I used (a 1080), so I wouldn't recommend this particular model unless you want to achieve real-time inference on a mobile app or something like that.
Instead I would recommend checking out FFTNet. I implemented it yesterday and it trains like a beast: without conditioning I was getting 7.5 batches/second - that's 10 times faster than WaveRNN/WaveNet! It wasn't too hard on GPU memory either: much lighter than WaveNet, although not as compact as WaveRNN.
Link: http://gfx.cs.princeton.edu/pubs/Jin_2018_FAR/fftnet-jin2018.pdf
As for conditioning, in my experience these vocoder models are quite robust to where you inject the conditioning. The WaveNet and FFTNet papers both have details on how to condition, so those are a good start.
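For what that injection might look like, here is a hypothetical sketch of the usual additive conditioning site (this is not code from either paper; all names and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

T, C, D = 100, 64, 80                      # time steps, layer channels, mel bins (illustrative)
W_c = rng.standard_normal((D, C)) * 0.1    # hypothetical 1x1 conditioning projection

z = rng.standard_normal((T, C))            # a layer's input, shape (time, channels)
mel = rng.standard_normal((T, D))          # mel frames already upsampled to the sample rate

z = z + mel @ W_c                          # add projected features to the pre-activation
print(z.shape)                             # (100, 64)
```

The same pattern works at most layers, which is what "robust to conditioning sites" means in practice.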
from wavernn.
@fatchord like this:

```
N = 2048                                   # receptive field / zero-padding length
x = batch * M * C                          # input: M samples, C channels
padding = batch * N * C                    # zeros prepended to x
z = [padding, x]                           # length M + N
while N > 1:
    z = W_L * z[:, :M + N/2] + W_R * z[:, N/2:]
    z = relu(W * relu(z))
    N = N / 2
z = relu(Wz)
logits = Wz                                # steps 0, 1, ..., M-1, M
```
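A minimal NumPy sketch of that halving recursion (my own reading of the pseudocode above; the weights, shapes, and tiny receptive field are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(a):
    return np.maximum(a, 0.0)

def fftnet_stack(x, layers, N):
    """x: (M + N, C) zero-padded input; layers: list of (W_L, W_R, W) matrices."""
    z = x
    for W_L, W_R, W in layers:
        half = N // 2
        T = z.shape[0]
        # combine the sequence with a copy of itself shifted by N/2 samples
        z = z[: T - half] @ W_L + z[half:] @ W_R
        z = relu(relu(z) @ W)
        N = half
    return z  # (M + 1, C): one step longer than the target

M, C, depth = 8, 4, 3
N = 2 ** depth                        # receptive field: 8 here, 2048 in the comment above
layers = [tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(3))
          for _ in range(depth)]
x = np.zeros((M + N, C))
x[N:] = rng.standard_normal((M, C))   # N zeros in front, then the real samples
out = fftnet_stack(x, layers, N)
print(out.shape)                      # (9, 4), i.e. (M + 1, C)
```

Each pass halves the padding, so after log2(N) layers the M + N input has shrunk to M + 1 output steps.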
@fatchord Thanks for the response, it's very helpful. My project will be doing something similar to NSynth, but generating abstract, musique concrète-style noises. If you have any further advice it would be appreciated. Thanks.
@room101b You're very welcome and your project sounds very interesting. By the way, I've uploaded what I've done so far with FFTNet if you wanna check it out:
https://github.com/fatchord/FFTNet
Since CNNs parallelize easily on a GPU, training CNN-style models (WaveNet, FFTNet) is fast.

model | training | inference
---|---|---
WaveNet | faster | slowest
WaveRNN | slower | faster
FFTNet | fastest | slower
Parallel WaveNet | fast | fastest
@fatchord awesome!
I also implemented FFTNet yesterday and got positive results (dumped during training; cached inference is hard to implement in TensorFlow, WIP):
step 10k
step 130k
@lifeiteng That looks pretty darn good! One thing in the paper had me scratching my head and I'd love to get your input on it.
In section 2.3.2 they say to zero pad by N (they don't explicitly define N but I strongly got the impression it was the receptive field for any given layer in the stack):
z[0:M] = W_L ∗ x[-N:M-N] + W_R ∗ x[0:M]
But if the previous equation (without zero padding) was:
z = W_L ∗ x[0:N/2] + W_R ∗ x[N/2:N]
Wouldn't that mean that the equation from 2.3.2 should read:
z[0:M] = W_L ∗ x[-N/2:M-N/2] + W_R ∗ x[0:M]
Am I missing something?
@lifeiteng Thanks! Yeah, that's what I was doing initially, but the output tensor has an extra N steps if you do it that way - just chop them off before backprop?
logits = Wz -> 0, 1, ..., M-1, M
There's just one extra step in the output (step M) - yes, drop it.
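In other words, assuming a (batch, time, classes) logits tensor, the fix is just a crop before the loss (shapes here are illustrative):

```python
import numpy as np

M = 8                                  # target samples per training window
logits = np.zeros((1, M + 1, 256))     # stack output carries one extra time step
logits = logits[:, :M]                 # chop it off before computing the loss
print(logits.shape)                    # (1, 8, 256)
```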
@lifeiteng My bad, I was padding inside the layer (like a bloody idiot!). Thanks again.
@fatchord I have sent you a Gitter invitation for more in-depth communication.
@lifeiteng Thanks, I'll make an account on Gitter now so.
@fatchord your 12k-iteration sample sounds good.
If WaveRNN is just a very well-tuned RNN, then training an nn.GRU with 1024 hidden units on mu-law input/output should, after 12k steps, produce samples that are slightly worse but comparable. But it is far, far from that.
Any idea why that is?
@iovdin "Far far from that" as in good or bad? Can you post a sample from your experiment?
@fatchord Okay, with weight decay and a lower learning rate it seems to sound better ("far far from that" meant really bad)
https://lera.ai/s/3318a1
@iovdin It doesn't sound too bad, especially considering it's so early in training. Also, the 16 bits in WaveRNN make a big difference when it comes to noise reduction and dynamic range; mu-law can only do so much to reduce noise at lower bit depths.
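For reference, a self-contained sketch of 8-bit mu-law companding (the standard formula with mu = 255), which shows the quantisation being discussed:

```python
import numpy as np

def mulaw_encode(x, mu=255):
    """Compand a signal in [-1, 1], then quantize to 8-bit class indices."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1) / 2 * mu).astype(np.int64)      # 0..255

def mulaw_decode(q, mu=255):
    """Invert the quantisation and companding back to [-1, 1]."""
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1, 1, 5)
q = mulaw_encode(x)
x_hat = mulaw_decode(q)
print(np.max(np.abs(x - x_hat)))   # small reconstruction error, but never zero
```

The round trip is lossy, which is the residual noise mu-law can't remove; 16-bit WaveRNN sidesteps that by modelling the full sample depth.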
@fatchord the counter shows tens of steps, i.e. it is actually 100k steps with seq_len 128, comparable to your 12k steps with seq_len 960.
@iovdin That sounds like too small a number of time steps for training. Even at a low sample rate of 16kHz, the lowest audible frequency starts around 30Hz, which is ~500 steps per period. I would recommend upping it to around 1000 steps.
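Quick arithmetic behind that estimate:

```python
sample_rate = 16000       # Hz
lowest_audible = 30       # Hz
samples_per_period = sample_rate / lowest_audible
print(round(samples_per_period))   # 533 samples to cover one full 30Hz period
```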
@fatchord The folks at DeepSound trained SampleRNN with 128 BPTT steps http://deepsound.io/samplernn_first.html
@iovdin Cool link though - thanks!
I'm not too familiar with SampleRNN (although it's a very interesting model), so I can't really comment on it much.
Actually - doesn't SampleRNN operate on frames of samples? Perhaps it's 128 frames of 16 samples? Again, haven't read that paper yet so I could be wrong on that.