Comments (14)

r-zemblys commented on May 4, 2024

Here are 80k generated samples, primed with 8k samples of audio from another database.
generated_l2_primed.wav.zip

The waveform looks reasonably OK (green: generated audio).
[waveform plot]

Notes:

  • used af4c58e
  • trained for ~20k steps with a learning rate of 0.01, then continued for ~60k steps with 0.001
  • @lelayf I'm using a Titan X GPU
  • used L2 regularization
  • disabled silence trimming because of #59
  • there was a bug in WaveNet.decode that resulted in all-zeros output. I think the bug is still present in fc5417d

from tensorflow-wavenet.

dnuffer commented on May 4, 2024

I've observed the same things. I looked at the code to see what might be hanging and didn't find any red flags. I thought the hang might be related to my setup: CUDA 8.0 RC (required for Pascal support), cuDNN 5.1, and TensorFlow built from source (git master from 9/20).

ibab commented on May 4, 2024

The hanging is probably caused by the background audio processing crashing (especially if the CPU/GPU are idle once it stops).
Usually there should be a backtrace that can help us find the reason it crashed.
Which commit did you observe the problem with?
There was a bug where we simply stopped processing audio once we'd seen every file once.
It might be that you're on an older commit that had this problem.

I've been trying to find a solution to the gradient jumping to large values at large step numbers, but don't have any amazing solutions at the moment.
It seems to be related to the ReLU activations in the last few layers of the network.
I've tried clipping the gradients, which didn't have an effect on this problem.
Replacing the ReLU activations with Tanh seems to fix it completely, but the network doesn't converge quite as quickly as with ReLU.
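For reference, the global-norm gradient clipping mentioned above (what TensorFlow's tf.clip_by_global_norm does) can be sketched in NumPy; this is an illustrative sketch, not the repository's actual code:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """If the combined norm of all gradients exceeds clip_norm,
    scale every gradient down by the same factor so the global
    norm equals clip_norm. Returns (clipped grads, original norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > clip_norm:
        scale = clip_norm / global_norm
        grads = [g * scale for g in grads]
    return grads, global_norm

# Toy example: two gradient tensors with global norm 13.
grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]
clipped, norm = clip_by_global_norm(grads, 1.0)
```

Because every tensor is scaled by the same factor, the direction of the overall update is preserved; only its magnitude is capped, which is why it can fail to help when individual activations (rather than the step size) are the problem.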

lelayf commented on May 4, 2024

@ibab I'm experiencing the stalling with the latest commit.
@r-zemblys if you resume training from the checkpoint right before the gradient implosion with a lower learning rate, does it still behave the same?

r-zemblys commented on May 4, 2024

@lelayf I've used a learning rate of 0.01 to get the loss curve above. The train saver only stores the last 5 checkpoints, so I'm not able to try lowering the learning rate right before the gradient implosion.

@ibab I was indeed using an older commit. The latest one does not have the stalling problem.

Here is the loss curve with L2 regularization added (orange: learning rate 0.01, ~20k steps; blue: 0.001, ~60k steps).
[loss curve plot]

The gradient implosion problem is gone, but it seems the network is not learning anymore after the first epoch. Will try to generate some audio later today.
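As a sketch of what L2 regularization adds to the objective (assuming the usual weight-decay penalty; the coefficient here is made up for illustration, and TensorFlow's tf.nn.l2_loss computes the same 0.5 * sum(w**2) term per tensor):

```python
import numpy as np

def loss_with_l2(data_loss, weights, l2_coeff=1e-4):
    """Total loss = data loss + 0.5 * lambda * sum of squared weights.
    The penalty shrinks weights toward zero, which damps the kind of
    gradient blow-up discussed above."""
    penalty = 0.5 * l2_coeff * sum(np.sum(w ** 2) for w in weights)
    return data_loss + penalty

# Toy weights: sum of squares is 4 + 12 = 16,
# so penalty = 0.5 * 0.01 * 16 = 0.08.
weights = [np.ones((2, 2)), np.full((3,), 2.0)]
total = loss_with_l2(1.0, weights, l2_coeff=0.01)
```

Note the flip side observed here: too strong a penalty can dominate the data loss and stall learning, which may be what the flat curve after the first epoch reflects.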

lelayf commented on May 4, 2024

@r-zemblys are you training on a GPU or CPU?

ibab commented on May 4, 2024

@r-zemblys: Excellent, did you use the default wavenet_params.json?
I've also linked some of my results in #47.

r-zemblys commented on May 4, 2024

Forgot to add. This is the configuration I've used:

{
    "filter_width": 2,
    "quantization_steps": 256,
    "sample_rate": 16000,
    "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512],
    "residual_channels": 32,
    "dilation_channels": 16,
    "use_biases": false
}

But as I mentioned in the beginning, there is no difference (at least in the loss curve) when using the default configuration.
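As a side note, the receptive field implied by this dilation stack can be estimated with the usual formula for stacked dilated causal convolutions; a sketch under that assumption, not code from the repository:

```python
# Three repeats of the 1..512 dilation cycle, as in the config above.
dilations = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512] * 3
filter_width = 2

# Each layer with dilation d adds (filter_width - 1) * d samples of
# context; +1 for the current sample itself.
receptive_field = (filter_width - 1) * sum(dilations) + 1
seconds = receptive_field / 16000.0  # at the 16 kHz sample rate above
```

That comes out to 3070 samples, i.e. under a fifth of a second of context, which is one plausible reason samples capture local texture but not long-range structure.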

ibab commented on May 4, 2024

@r-zemblys: Did you train on the entire dataset, or a specific speaker?

r-zemblys commented on May 4, 2024

@ibab: the entire VCTK corpus. And then primed generation with a recording from the LibriSpeech ASR corpus.

ibab commented on May 4, 2024

That's very cool. I think mixing together all the different speakers explains the voice difference between your sample and mine.
Would you be interested in contributing the L2 regularization in a pull request?

hoonyoung commented on May 4, 2024

I'm using Python 2.7, and as r-zemblys mentioned above ("...there was a bug in WaveNet.decode, which resulted in all-zeros output"), I obtained a generated.wav file with all zeros.

After fixing the last line of wavenet_ops.py as below, I am now getting speech-like waveform output.

magnitude = (1 / mu) * ((1 + mu)**abs(signal) - 1)
--> magnitude = (1. / mu) * ((1. + mu)**abs(signal) - 1)

Hope someone can apply it to the codebase if necessary.
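For context, the culprit is Python 2's integer division: with an integer mu, `1 / mu` evaluates to 0, so the whole decoded magnitude collapses to zero; the `1.` literals force float division. A self-contained sketch of mu-law companding with float arithmetic (my own illustration, not the repository's wavenet_ops.py):

```python
import numpy as np

def mu_law_encode(audio, mu=255):
    """Mu-law companding; `audio` is in [-1, 1]."""
    return np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)

def mu_law_decode(signal, mu=255):
    """Inverse mu-law expansion. The `1.` literals matter on Python 2:
    with integer division, (1 / mu) would be 0 and zero out the output."""
    return np.sign(signal) * (1. / mu) * ((1. + mu) ** np.abs(signal) - 1)

# Round trip: decode(encode(x)) should recover x exactly
# (up to floating-point error), since the two maps are inverses.
audio = np.array([-0.5, 0.0, 0.25, 1.0])
roundtrip = mu_law_decode(mu_law_encode(audio))
```

An equivalent fix is `from __future__ import division` at the top of the module, which makes `/` true division under Python 2.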

ibab commented on May 4, 2024

@hoonyoung: This should be fixed on master now. I've also enabled Travis to run the tests with Python 2.

lelayf commented on May 4, 2024

I commented out silence trimming and now training does not stall anymore, using 88e77bf.
