Comments (14)
Here are 80k generated samples, primed with 8k samples of audio from another database.
generated_l2_primed.wav.zip
The waveform looks reasonably OK (green: generated audio).
Notes:
- used af4c58e
- trained for ~20k steps with a learning rate of 0.01, then continued for ~60k steps with 0.001
- @lelayf I'm using a Titan X GPU
- used L2 regularization
- disabled silence trimming because of #59
- there was a bug in WaveNet.decode which resulted in all-zeros output. I think the bug is still present in fc5417d
from tensorflow-wavenet.
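The L2 regularization mentioned in the notes amounts to adding a weight penalty to the training loss. A minimal NumPy sketch of the idea (the coefficient name, its value, and the weight shapes are illustrative assumptions, not the repo's actual settings):

```python
import numpy as np

def l2_penalty(weights, coeff=1e-4):
    """Sum of squared weights, scaled by a regularization coefficient."""
    return coeff * sum(np.sum(w ** 2) for w in weights)

# Hypothetical example: two small weight arrays.
weights = [np.ones((2, 2)), 2 * np.ones((3,))]
base_loss = 1.5  # e.g. the network's cross-entropy loss
total_loss = base_loss + l2_penalty(weights)
```

In TensorFlow the same penalty would be built from the trainable variables and added to the loss tensor before computing gradients.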
I've observed the same thing. I looked at the code to see what might be hanging and didn't find any red flags. I thought the hang might be related to my setup: CUDA 8.0 RC (required for Pascal support), cuDNN 5.1, and TensorFlow built from source (git master from 9/20).
The hanging is probably caused by the background audio processing crashing (especially if the CPU/GPU are idle once it stops).
Usually there should be a backtrace that can help us find the reason it crashed.
Which commit did you observe the problem with?
There was a bug where we simply stopped processing audio once we'd seen every file once.
It might be that you're on an older commit that still has this problem.
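The bug described here, stopping after one pass over the files, is the classic mistake of iterating a file list once instead of cycling it. A sketch of both versions (function and file names are illustrative, not the repo's actual code):

```python
import itertools

def audio_iterator_once(files):
    # Buggy shape: the reader thread exhausts this after one epoch and
    # exits, leaving the training loop waiting on an empty queue.
    for f in files:
        yield f

def audio_iterator_forever(files):
    # Fixed shape: cycle through the file list indefinitely.
    for f in itertools.cycle(files):
        yield f

files = ["a.wav", "b.wav"]
it = audio_iterator_forever(files)
first_five = [next(it) for _ in range(5)]
```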
I've been trying to find a solution to the gradient jumping to large values at large step numbers, but don't have any amazing solutions at the moment.
It seems to be related to the ReLU activations in the last few layers of the network.
I've tried clipping the gradients, which didn't have an effect on this problem.
Replacing the ReLU activations with Tanh seems to fix it completely, but the network doesn't converge quite as quickly as with ReLU.
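Gradient clipping of the kind tried here is usually done by global norm: if the joint L2 norm of all gradients exceeds a threshold, every gradient is rescaled by the same factor. A NumPy sketch of that mechanism (the threshold value is an assumption):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients if their joint L2 norm exceeds clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > clip_norm:
        scale = clip_norm / global_norm
        grads = [g * scale for g in grads]
    return grads, global_norm

grads = [np.array([3.0, 4.0])]  # global norm is 5.0
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
```

Note that this caps the update magnitude but, as observed above, does not prevent the gradients themselves from blowing up in the first place.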
@ibab I'm experiencing the stalling with the latest commit.
@r-zemblys if you resume training at the checkpoint right before the gradient implosion with a lower learning rate, does it still behave the same?
@lelayf I've used a learning rate of 0.01 to get the loss curve above. The train saver only stores the last 5 checkpoints, so I'm not able to try lowering the learning rate right before the gradient implosion.
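The "last 5 checkpoints" limit corresponds to a saver's rolling-window retention policy (max_to_keep-style behavior). A plain-Python sketch of that policy (checkpoint names and the save schedule are hypothetical):

```python
from collections import deque

class RollingCheckpoints:
    """Keep only the most recent `max_to_keep` checkpoint names."""
    def __init__(self, max_to_keep=5):
        self.kept = deque(maxlen=max_to_keep)

    def save(self, step):
        name = "model.ckpt-%d" % step
        self.kept.append(name)  # oldest entry is dropped automatically
        return name

ckpts = RollingCheckpoints(max_to_keep=5)
for step in range(0, 8000, 1000):  # hypothetical save schedule
    ckpts.save(step)
remaining = list(ckpts.kept)
```

This is why the pre-implosion checkpoint is gone by the time the implosion is noticed: earlier saves have already been rotated out.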
@ibab I was indeed using an older commit. The latest one does not have the stalling problem.
Here is the loss curve with L2 regularization added; orange: learning rate 0.01 (~20k steps), blue: 0.001 (~60k steps).
The gradient implosion problem is gone, but it seems the network is not learning anymore after the first epoch. I'll try to generate some audio later today.
@r-zemblys are you training on a GPU or a CPU?
@r-zemblys: Excellent! Did you use the default wavenet_params.json?
I've also linked some of my results in #47.
Forgot to add, this is the configuration I've used:
{
"filter_width": 2,
"quantization_steps": 256,
"sample_rate": 16000,
"dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
1, 2, 4, 8, 16, 32, 64, 128, 256, 512],
"residual_channels": 32,
"dilation_channels": 16,
"use_biases": false
}
But as I've mentioned at the beginning, there is no difference (at least in the loss curve) when using the default configuration.
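As a sanity check on the dilation stack in this configuration, the receptive field can be computed directly. The formula below follows the common WaveNet convention of counting the initial causal convolution as one extra filter width; the repo's exact accounting may differ slightly:

```python
def receptive_field(filter_width, dilations):
    # Each dilated layer grows the field by (filter_width - 1) * d;
    # the trailing + filter_width accounts for the current sample plus
    # the initial causal convolution.
    return (filter_width - 1) * sum(dilations) + filter_width

dilations = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512] * 3
rf = receptive_field(2, dilations)  # in samples
seconds = rf / 16000.0              # at the 16 kHz sample rate above
```

With three stacks of 1..512 and filter_width 2, this comes to 3071 samples, i.e. roughly 0.19 s of audio context.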
@r-zemblys: Did you train on the entire dataset, or a specific speaker?
@ibab: the entire VCTK corpus. And then primed generation with a recording from the LibriSpeech ASR corpus.
That's very cool. I think mixing together all different speakers explains the voice difference between your sample and mine.
Would you be interested in contributing the L2 regularization in a pull request?
I'm using Python 2.7, and as @r-zemblys mentioned above ("...there was a bug in WaveNet.decode, which resulted in all-zeros output"), I obtained a generated.wav file with all zeros.
After fixing the last line of wavenet_ops.py as below, I am now getting speech-like waveform output.
magnitude = (1 / mu) * ((1 + mu)**abs(signal) - 1)
--> magnitude = (1. / mu) * ((1. + mu)**abs(signal) - 1)
Hope someone can apply this to the code if necessary.
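The one-character fix matters because under Python 2, `1 / mu` is integer division and evaluates to 0 when mu is an int (e.g. 255), zeroing the entire decoded signal. A NumPy sketch of mu-law expansion using the float-safe form (the function name is illustrative):

```python
import numpy as np

def mu_law_expand(signal, mu=255):
    # signal is the quantized output rescaled to [-1, 1]. With an int mu,
    # `1 / mu` would be 0 under Python 2 integer division, so force
    # float arithmetic with the `1.` literals.
    magnitude = (1. / mu) * ((1. + mu) ** np.abs(signal) - 1)
    return np.sign(signal) * magnitude

out = mu_law_expand(np.array([-1.0, 0.0, 1.0]))
```

The endpoints map back to -1, 0, and 1 exactly, which is a quick way to check the expansion is not silently collapsing to zeros.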
@hoonyoung: This should be fixed on master now. I've also enabled Travis to run the tests with Python 2.
I commented out silence trimming, and now training does not stall anymore (using 88e77bf).
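For reference, silence trimming of the kind being disabled here is typically an amplitude-threshold crop. A minimal NumPy sketch (the threshold value is an assumption) that guards the all-silent edge case, where a naive crop returns an empty array that downstream code may not handle:

```python
import numpy as np

def trim_silence(audio, threshold=0.01):
    """Crop leading/trailing samples whose amplitude is below threshold."""
    energetic = np.abs(audio) > threshold
    idx = np.nonzero(energetic)[0]
    if idx.size == 0:
        # Entirely silent clip: return it unchanged instead of an
        # empty array.
        return audio
    return audio[idx[0]:idx[-1] + 1]

audio = np.array([0.0, 0.0, 0.5, 0.2, 0.0])
trimmed = trim_silence(audio)
```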