Comments (14)

r-zemblys commented on May 4, 2024

Here are 80k generated samples, primed with 8k samples of audio from another database.
generated_l2_primed.wav.zip

The waveform looks reasonably OK (green: generated audio).
[waveform plot]

Notes:

  • used af4c58e
  • trained for ~20k steps with a learning rate of 0.01, then continued for ~60k steps with 0.001
  • @lelayf I'm using a Titan X GPU
  • used L2 regularization
  • disabled silence trimming because of #59
  • there was a bug in WaveNet.decode that resulted in all-zeros output. I think the bug is still present in fc5417d

from tensorflow-wavenet.

dnuffer commented on May 4, 2024

I've observed the same things. I looked at the code to see what might be hanging and didn't find any red flags. I thought the hang might be related to my setup: CUDA 8.0 RC (required for Pascal support), cuDNN 5.1, and TensorFlow built from source (git master from 9/20).

ibab commented on May 4, 2024

The hanging is probably caused by the background audio processing crashing (especially if the CPU/GPU are idle once it stops).
Usually there should be a backtrace that can help us find the reason it crashed.
Which commit did you observe the problem with?
There was a bug where we simply stopped processing audio once we'd seen every file once.
It might be that you're on an older commit that had this problem.

I've been trying to find a solution to the gradient jumping to large values at large step numbers, but don't have any amazing solutions at the moment.
It seems to be related to the ReLU activations in the last few layers of the network.
I've tried clipping the gradients, which didn't have an effect on this problem.
Replacing the ReLU activations with Tanh seems to fix it completely, but the network doesn't converge quite as quickly as with ReLU.
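For reference, the global-norm gradient clipping mentioned above (what TensorFlow's tf.clip_by_global_norm does) can be sketched in NumPy; this is an illustrative sketch, not the repository's actual code:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """If the combined norm of all gradients exceeds clip_norm,
    scale every gradient down by the same factor so the global
    norm equals clip_norm. Returns (clipped grads, original norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > clip_norm:
        scale = clip_norm / global_norm
        grads = [g * scale for g in grads]
    return grads, global_norm

# Toy example: two gradient tensors with global norm 13.
grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]
clipped, norm = clip_by_global_norm(grads, 1.0)
```

Because every tensor is scaled by the same factor, the direction of the overall update is preserved; only its magnitude is capped, which is why it can fail to help when individual activations (rather than the step size) are the problem.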

lelayf commented on May 4, 2024

@ibab I'm experiencing the stalling with the latest commit.
@r-zemblys if you resume training from the checkpoint right before the gradient implosion with a lower learning rate, does it still behave the same?

r-zemblys commented on May 4, 2024

@lelayf I've used a learning rate of 0.01 to get the loss curve above. The train saver only stores the last 5 checkpoints, so I'm not able to try lowering the learning rate right before the gradient implosion.

@ibab I was indeed using an older commit. The latest one does not have the stalling problem.

Here is the loss curve with L2 regularization added (orange: learning rate 0.01, ~20k steps; blue: 0.001, ~60k steps).
[loss curve plot]

The gradient implosion problem is gone, but it seems the network is not learning anymore after the first epoch. Will try to generate some audio later today.
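As a sketch of what L2 regularization adds to the objective (assuming the usual weight-decay penalty; the coefficient here is made up for illustration, and TensorFlow's tf.nn.l2_loss computes the same 0.5 * sum(w**2) term per tensor):

```python
import numpy as np

def loss_with_l2(data_loss, weights, l2_coeff=1e-4):
    """Total loss = data loss + 0.5 * lambda * sum of squared weights.
    The penalty shrinks weights toward zero, which damps the kind of
    gradient blow-up discussed above."""
    penalty = 0.5 * l2_coeff * sum(np.sum(w ** 2) for w in weights)
    return data_loss + penalty

# Toy weights: sum of squares is 4 + 12 = 16,
# so penalty = 0.5 * 0.01 * 16 = 0.08.
weights = [np.ones((2, 2)), np.full((3,), 2.0)]
total = loss_with_l2(1.0, weights, l2_coeff=0.01)
```

Note the flip side observed here: too strong a penalty can dominate the data loss and stall learning, which may be what the flat curve after the first epoch reflects.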

lelayf commented on May 4, 2024

@r-zemblys are you training on a GPU or CPU?

ibab commented on May 4, 2024

@r-zemblys: Excellent, did you use the default wavenet_params.json?
I've also linked some of my results in #47.

r-zemblys commented on May 4, 2024

Forgot to add. This is the configuration I've used:

{
    "filter_width": 2,
    "quantization_steps": 256,
    "sample_rate": 16000,
    "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512],
    "residual_channels": 32,
    "dilation_channels": 16,
    "use_biases": false
}

But as I mentioned in the beginning, there is no difference (at least in the loss curve) when using the default configuration.
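As a side note, the receptive field implied by this dilation stack can be estimated with the usual formula for stacked dilated causal convolutions; a sketch under that assumption, not code from the repository:

```python
# Three repeats of the 1..512 dilation cycle, as in the config above.
dilations = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512] * 3
filter_width = 2

# Each layer with dilation d adds (filter_width - 1) * d samples of
# context; +1 for the current sample itself.
receptive_field = (filter_width - 1) * sum(dilations) + 1
seconds = receptive_field / 16000.0  # at the 16 kHz sample rate above
```

That comes out to 3070 samples, i.e. under a fifth of a second of context, which is one plausible reason samples capture local texture but not long-range structure.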

ibab commented on May 4, 2024

@r-zemblys: Did you train on the entire dataset, or a specific speaker?

r-zemblys commented on May 4, 2024

@ibab: the entire VCTK corpus. And then primed generation with a recording from the LibriSpeech ASR corpus.

ibab commented on May 4, 2024

That's very cool. I think mixing together all the different speakers explains the voice difference between your sample and mine.
Would you be interested in contributing the L2 regularization in a pull request?

hoonyoung commented on May 4, 2024

I'm using Python 2.7, and as r-zemblys mentioned above ("...there was a bug in WaveNet.decode, which resulted in all-zeros output"), I obtained a generated.wav file with all zeros.

After fixing the last line of wavenet_ops.py as below, I am now getting speech-like waveform output.

magnitude = (1 / mu) * ((1 + mu)**abs(signal) - 1)
--> magnitude = (1. / mu) * ((1. + mu)**abs(signal) - 1)

Hope someone can apply it to the codebase if necessary.
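For context, the culprit is Python 2's integer division: with an integer mu, `1 / mu` evaluates to 0, so the whole decoded magnitude collapses to zero; the `1.` literals force float division. A self-contained sketch of mu-law companding with float arithmetic (my own illustration, not the repository's wavenet_ops.py):

```python
import numpy as np

def mu_law_encode(audio, mu=255):
    """Mu-law companding; `audio` is in [-1, 1]."""
    return np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)

def mu_law_decode(signal, mu=255):
    """Inverse mu-law expansion. The `1.` literals matter on Python 2:
    with integer division, (1 / mu) would be 0 and zero out the output."""
    return np.sign(signal) * (1. / mu) * ((1. + mu) ** np.abs(signal) - 1)

# Round trip: decode(encode(x)) should recover x exactly
# (up to floating-point error), since the two maps are inverses.
audio = np.array([-0.5, 0.0, 0.25, 1.0])
roundtrip = mu_law_decode(mu_law_encode(audio))
```

An equivalent fix is `from __future__ import division` at the top of the module, which makes `/` true division under Python 2.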

ibab commented on May 4, 2024

@hoonyoung: This should be fixed on master now. I've also enabled Travis to run the tests with Python 2.

lelayf commented on May 4, 2024

I commented out silence trimming and now training does not stall anymore, using 88e77bf.
