ibab / tensorflow-wavenet Goto Github PK

A TensorFlow implementation of DeepMind's WaveNet paper

License: MIT License

Python 99.55% Shell 0.45%

tensorflow-wavenet's Issues

Bug with trimming

I train the network with default hyper-parameters:

Exception in thread Thread-2: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 763, in run self.__target(*self.__args, **self.__kwargs) File "/home/jesse/tensorflow_workspace/tensorflow-wavenet/audio_reader.py", line 87, in thread_main audio = trim_silence(audio[:, 0]) File "/home/jesse/tensorflow_workspace/tensorflow-wavenet/audio_reader.py", line 47, in trim_silence return audio[indices[0]:indices[-1]] IndexError: index 0 is out of bounds for axis 0 with size 0

The error occurs at the last line of trim_silence method:

return audio[indices[0]:indices[-1]]

It seems that some audio consists of only silence and is blank after trimming.

Generating good audio samples

Let's discuss strategies for producing audio samples.
When running over the entire dataset, I've so far only managed to reproduce recording noise and clicks.

Some ideas I've had to improve on this:

We should limit ourselves to a single speaker for now. That will allow us to perform multiple epochs on the train dataset. We could also try overfitting the dataset a little, which should result in the network reproducing pieces of the train dataset.
Remove silence from the recordings. Many of the recordings have periods of recording noise before and after the speakers. It might be worth removing these with librosa.

Linguistic features for p280 speaker

Hi,

I generated the linguistic features as mentioned in the WaveNet paper for p280 speaker. If anyone is interested to use them for conditioning in WaveNet, please download via https://users.aalto.fi/~bollepb1/binary_labels_p280.zip. Each frame or row corresponds to 5ms of speech.

Feeding raw audio waveform into the first layer

We've discussed the fact that one-hot encoding the input to the network is kind of weird, and that it would be more natural to use the waveform as a single-channel floating point tensor instead.
Does anyone have experience with running our implementation in this way?
Should we switch to this method?

u-law encoding

Paper indicates wavenet encodes waveform in u-law encoding. :)

https://github.com/ritheshkumar95/WaveNet/blob/master/dataset.py#L177

Add fast wavenet generation.

@ibab I noticed you saw the efficient wavenet generation implementation I wrote with my friends:

https://github.com/tomlepaine/fast-wavenet

Can we help you add it to tensorflow-wavenet?

OOM on GTX 1080

Hi Igor, I'm getting OOM on GTX1080. Reduced sample size to one directory (175 files) and still getting this error. Do you have any ideas how to fit a tensor in memory on 8Gb cards?

I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats: 
Limit:                  7690878976
InUse:                  7325787648
MaxInUse:               7465885184
NumAllocs:                    9793
MaxAllocSize:           2779725056

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****_********************_*****************************************************************xxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 928.75MiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:940] Resource exhausted: OOM when allocating tensor with shape[256,256,1,3715]
W tensorflow/core/common_runtime/bfc_allocator.cc:213] Ran out of memory trying to allocate 1.33GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
Traceback (most recent call last):
  File "train.py", line 151, in <module>
    main()
  File "train.py", line 136, in main
    summary, loss_value, _ = sess.run([summaries, loss, optim])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 710, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 908, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 958, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 978, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shape[256,256,1,3715]
     [[Node: gradients/dilated_stack/layer4/conv_f_grad/Conv2DBackpropInput = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/dilated_stack/layer4/conv_f_grad/Shape, dilated_stack/layer4/Variable/read, gradients/dilated_stack/layer4/conv_f/BatchToSpace_grad/SpaceToBatch)]]
Caused by op u'gradients/dilated_stack/layer4/conv_f_grad/Conv2DBackpropInput', defined at:

Thank you in advance

Bootstrapping the generation with existing audio

Currently, we start the generation with a randomly picked waveform sample.
I wonder what kind of effect that has, considering that the dilated convolutions won't be able to reach backwards beyond the beginning of the generated sample.
Maybe we should start off with one of the audio recordings.

Generated samples are non-negative

For my case, all the generated audio samples seem to be positive integer values.

I used the following architecture for the network. (wavenet_params.json)
"dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
1, 2, 4, 8, 16, 32, 64, 128, 256, 512],

I wonder if this only happens to me.. and want to know how to fix it. :)

Calculation of the loss

In each step, the calculation of the loss is based on the cross entropy of the input value and the predicted value. Each data point in the predicted value is predicted based on its receptive field in the input value. During prediction, the input value is padded on the left with many 0s to implement the casual convolution. Thus, the beginning predicted data points are actually predicted based on those padding 0s, and may not be the same as the input values even when the model is trained for many steps. Is this the reason that why the loss drops to around 2 and cannot drop lower? Maybe it is more reasonable to calculate the loss not from the beginning data point, but from the N-th point where N is the receptive field.

Training error in main.py

Getting the following error when I try to train the network - any idea what this is?

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:01:00.0
Total memory: 12.00GiB
Free memory: 11.53GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
Traceback (most recent call last):
  File "main.py", line 129, in <module>
    main()
  File "main.py", line 83, in main
    loss = net.loss(audio_batch)
  File "/home/seth/Development/tensorflow-wavenet/wavenet.py", line 97, in loss
    raw_output = self._create_network(encoded)
  File "/home/seth/Development/tensorflow-wavenet/wavenet.py", line 67, in _create_network
    dilation=dilation)
  File "/home/seth/Development/tensorflow-wavenet/wavenet.py", line 23, in _create_dilation_layer
    name="conv_f")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 168, in atrous_conv2d
    in_height = int(value_shape[1])
TypeError: __int__ returned non-int (type NoneType)

Singing

It could be really interesting to train with singing solo tracks and Lilypond. But probably could be too hard to collect a datest handing copyrights issues.

Error training multiple epochs

Training on a truncated sample of the VCTK corpus causes it to hang with no error if it hits the end of the corpus before the number of steps is complete. It looks like this is due to iterate_through_vctk not resetting once it hits the end. It looks like this is also happening in #65 on the full corpus, since it's stalling at 44256 steps, which is close to the samples in the VCTK corpus (109 speakers x ~400 samples/speaker).

I've tested this with an expanded file list (to iterate through multiple epochs) and it no longer stops at the end of the corpus.

I suggest we implement a tf.train.string_input_producer to iterate through epochs. That way we can also shuffle input, as has been mentioned in the discussion in #47.

Excessive memory consumption

The network currently runs into out of memory issues at a low number of layers.
This seems to be a problem with TensorFlow's atrous_conv2d operation.
If I set the dilation factor to 1, which means atrous_conv2d simply calls conv2d, I can easily run with 10s of layers.
It could just be the additional batch_to_space and space_to_batch operations, in which case I can write a single C++ op for atrous_conv2d.

AudioReader.stop_threads

There're two issues with this method:

The definition is incorrect, it lacks a self argument, it should be:def stop_threads(self):
The implementation is problematic, since Python thread object doesn't have a stop() method, there's no official way to stop a thread in Python.

Seems to me we should remove this method all together...

Number of seconds per step

Please, I have a simple question.

I'm training WaveNet with the VCTK corpus dataset in a CPU only machine (Intel i5) running to 20 sec/step. At this rate, I can reach about 4000 steps per day, 30.000 at week. Question is: ¿How much steps do we need to get a loss from about 2? And, another question: Can anybody tell me what is the seconds per step rate that a good GPU can reach? By the way, what's the rate in which each of you are working on guys?

Regards,
Samu.

Note. I'm using the default params:
{
"filter_width": 2,
"sample_rate": 16000,
"dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256,
1, 2, 4, 8, 16, 32, 64, 128, 256],
"residual_channels": 32,
"dilation_channels":16,
"quantization_channels": 256,
"skip_channels": 256,
"use_biases": false
}

What should output wave file sound like?

From the model of mine trained 1999 steps(It might be so little steps to sound normally),
It sounds just like noises.

It would be better to give well-trained example output for understanding desired output.

Cannot create output from generated model - tries loading wrong variables

I am trying to generate.py using python generate.py --samples 16000 ./model.ckpt-3999

However, the loading process fails with the following error:

Caused by op u'save/restore_slice', defined at:
  File "generate.py", line 174, in <module>
    main()
  File "generate.py", line 113, in main
    saver = tf.train.Saver(variables_to_restore)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 986, in __init__
    self.build()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1015, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 620, in build
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 357, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 270, in restore_op
    preferred_shard=preferred_shard))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/io_ops.py", line 204, in _restore_slice
    preferred_shard, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 359, in _restore_slice
    preferred_shard=preferred_shard, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 749, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2380, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1298, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Tensor name "wavenet/causal_layer/Variable" not found in checkpoint files ./model.ckpt-3999
     [[Node: save/restore_slice = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice/tensor_name, save/restore_slice/shape_and_slice)]]
     [[Node: save/restore_slice_48/_117 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_36_save/restore_slice_48", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

The tensors in my checkpoint file have names that look deliberate, for example, //wavenet/causal_layer/filter/Adam

However, some println debugging shows that the saver in generate.py is trying to load very generic variable names like this:

wavenet/causal_layer/Variable
wavenet/dilated_stack/layer0/Variable
wavenet/dilated_stack/layer0/Variable_1
wavenet/dilated_stack/layer0/Variable_2
wavenet/dilated_stack/layer0/skip
wavenet/dilated_stack/layer1/Variable

I'm using tensorflow master compiled against Cuda 8.0 RC.

Not sure if this is an issue with my setup or a bug. Any help is greatly appreciated.

Change learning rate during training

Would it be helpful to add ability to manually change learning rate during training? I have naive implementation of this feature here.

Does anyone has some pre-trained model which we can download it?

My computer is not very powerful and I don't have the resources available in my university, can someone or could the readme.md give us a link where we can download a pre-trained model?

Testing the network on music datasets

I've started to play around with the MagnaTagATune dataset.
There's a small change that needs to be made to the code when training on this dataset:
Because it uses mp3 instead of wav, the pattern in wavenet/audio_reader.py needs to be adjusted.
It would be nice to write a MagnaReader class that inherits from the AudioReader (or contains one), and that's able to filter the content by genre using the provided metadata.

Make sure all variables have names

Some of the variables don't have a name, which makes it harder to debug. PR's welcome.

Rewrite the input pipeline using a proper Python audio library

Unfortunately, there's currently no easy way to disable the verbose logging output from ffmpeg that results from loading the .wav files into TensorFlow.
(I've recompiled TensorFlow to disable the output while developing the network).

I can use a different library to decode the .wav files, but that would add an extra dependency (and some extra code).

Separate 1x1 convolution for skip connection

One of the authors mentioned that the skip connections are connected to a separate 1x1 convolution than the one which output goes into the add block.

Bug with _causal_dilated_conv?

I test _causal_dilated_conv in wavenet.py with the following toy example:
batch_size: 1
height: 1
width: 20
in_channel: 1
out_channel: 1
dilation: 4

The “value” tensor is set with shape [1, 1, 20, 1], and the values are from 1 to 20. The “filter” is set with shape [1, 2, 1, 1], and both values are 1.

The ”out“ tensor is:
[[[[ 3.], [ 5.], [ 7.], [ 9.], [ 13.], [ 15.], [ 17.], [ 19.], [ 23.], [ 25.], [ 27.], [ 29.], [ 33.], [ 35.], [ 37.], [ 39.]]]]

I suppose the correct ”out“ should be:
[[[[ 6.], [ 8.], [ 10.], [ 12.], [ 14.], [ 16.], [ 18.], [ 20.], [ 22.], [ 24.], [ 26.], [ 28.], [ 30.], [ 32.], [ 34.], [ 36.]]]]

In this example, the "reshaped" tensor is:
[[[[ 1.], [ 2.], [ 3.], [ 4.], [ 5.]]],
[[[ 6.], [ 7.], [ 8.], [ 9.], [ 10.]]],
[[[ 11.],[ 12.],[ 13.],[ 14.],[ 15.]]],
[[[ 16.],[ 17.],[ 18.],[ 19.],[ 20.]]]]

I guess it should be
[[[[ 1.], [ 5.], [ 9.], [ 13.], [ 17.]]],
[[[ 2.], [ 6.], [ 10.], [ 14.], [ 18.]]],
[[[ 3.], [ 7.], [ 11.], [ 15.], [ 19.]]],
[[[ 4.], [ 8.], [ 12.], [ 16.], [ 20.]]]]
and then the result should be correct.

Initialization of variables

Are the magic numbers for weight initialization chosen for particular reasons (e.g. stddev=0.3)? Would we prefer something like tf.contrib.layers.xavier_initializer?

add regularization, dropout and batch norm?

Has anybody got loss lower than ~2? Tried couple of configurations (default, 3 and 4 stacks of 10 dilation layers), but loss does not get lower, suggesting the network is not learning anymore.

Also, there is what happened happened after ~30k steps:

I believe this is the same problem as reported in #30. There is what happens with weights:

Now running the same network with l2 norm regularization added.

And one more note: training just stops after 44256 steps (already happened twice) without any warnings or errors, despite of num_steps=50000

Fast generation

I'm trying to understand the differences between the implementation in wavenet.py in this repository and the implementation in @tomlepaine's fast-wavenet. I think I understand the insight in fast-wavenet but are there drawbacks associated with it that mean this repository wouldn't want to just adopt it as the main implementation? Why keep separate "fast" and (presumably) "slow" models?

Can't generate with biases

There's no way to turn off fast generation that I can find, and a model using biases needs to have it switched off in order to generate. The --fast_generation command line arg is always True no matter what you do.

Global condition and Local conditioning

In the white paper, they mention conditioning to a particular speaker as an input they condition globally, and the TTS component as an up-sampled (deconvolution) conditioned locally. For the latter, they also mention that they tried just repeating the values, but found it worked less well than doing the deconvolutions.

Is there effort underway to implement either of these? Practically speaking, implementing the local conditioning would allow us to begin to have this implementation speak recognizable words.

How to use TensorBoard for "The generated waveform can be played back using TensorBoard."

How to use TensorBoard?

Channel size

Here you say that you will quantize the channels to 256 possible amplitude values (as is mentioned on page 3 in the original paper). When you run the quantization in the preprocessing step you cast the data from the mu-law companding transformation to a tf.int32 which can take on 4294967296 different values. Am I mistaken or should this be cast to a tf.int8 instead?

SyntaxError: non-keyword arg after keyword arg

When I try to run train.py, even with the --help flag, I recieve the error:
File "train.py", line 40 parser.add_argument('--num_steps', type=int, default-NUM_STEPS, SyntaxError: non-keyword arg after keyword arg
and am thus unable to train the network.
All dependencies (tensorflow, FFmpeg, etc.) are installed.

more fine-grained handling of audio corpora

this is related to #104.

i'm thinking ahead perhaps to when this repo supports conditioning on labels. since most interesting audio sets may not be so neatly organized as VCTK, you can potentially trim your audio and create labels via analysis.

even without labels, it's useful to be able to train on a subset of a folder, or even subsets of individual audio files. one workflow I am developing is to isolate a segment of interest, then find audio chunks which are similar to it (using mel bands or other DSP stats useful in music information retrieval). it can be a bit heavy-handed but it can help weed out noise or other undesirable material.

it might be beyond the scope of what this repo is aiming for, in which case I'll just keep developing it separately. but one useful feature that I'd propose to start is simply letting you specify your training set in a text file rather than just a directory. maybe like a JSON file with paths to audio files and a list of subsegments to pull out. some supporting scripts can be used to generate it, e.g. a function which takes an audio file and time interval as input, and produces a JSON of T seconds of audio segments from a directory which are most similar to the input.

Training not converged

I train the network with more steps, and found that the loss increases around about 9k steps, as follows:

Since the VCTK-Corpus has about 44k wave files, and the batch size is 1, the loss increases during the first epoch.

Improve GPU performance

The performance of the network on GPUs seems to be lagging behind the CPU performance.
I suspect that this is because the 2D convolution isn't designed to work efficiently if the height of the input is 1.
It shouldn't be too difficult to write some custom code to perform an efficient 1D convolution.
For example, fft could be used for this.

Better silence trimming

The current implementation computes the short-time RMSE amplitude, applies a given threshold on its value (the units are raw amplitude), and trims the audio from the beginning until a frame above the threshold is found, and from the last frame above the threshold until the end. If the result is empty, that file is discarded.

At some point, we might want to implement a better algorithm, such as using a threshold relative to the maximum amplitude in the example or applying a smoothing filter. The algorithm should also be configurable.

Continuity on trained model

It would be great continue learning from the someone's trained model or paused mine.

Generated silence

Hi,

I followed all the instructions on training a model from the README(from the default dataset). Then I used the generation script to generate a few seconds of sound, again, according to the instructions, but unfortunately I got silence.

Any idea of what I might got wrong?

GPU OOM

Running train.py on Nvidia GRID K2, 1536 cores, 3.5 GB memory.

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: GRID K2 major: 3 minor: 0 memoryClockRate (GHz) 0.745 pciBusID 0000:00:04.0 Total memory: 3.50GiB Free memory: 3.45GiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K2, pci bus id: 0000:00:04.0) Trying to restore saved checkpoints from ./logdir/train/2016-09-20T19-23-04 ... No checkpoint found. step 0 - loss = 7.461, (13.180 sec/step) Storing checkpoint to ./logdir/train/2016-09-20T19-23-04 ... Done.

Then the BFC allocator runs out of memory:

W tensorflow/core/common_runtime/bfc_allocator.cc:270] **************************************************************************************__**********xx W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 194.79MiB. See logs for memory state. W tensorflow/core/framework/op_kernel.cc:940] Resource exhausted: OOM when allocating tensor with shape[199462,256] I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 3463 get requests, put_count=3233 evicted_count=1000 eviction_rate=0.30931 and unsatisfied allo cation rate=0.38406

Do you know if there is a setting I can specify to work around GPU memory limitations ?

Clearer output

Comparing to https://github.com/basveeling/wavenet example output, their implementation sounds much more clear.

Has anybody managed to produce clear output?

Consider cropping samples to a fixed maximum length

It's not ideal that the length of the samples varies wildly.
This could be fixed by cropping (or even padding) them to a fixed size.

CLI Flags, JSON configuration files, and default values

The current setup with some parameters specified as CLI flags and others in the JSON file is not optimal. There are two competing needs:

We would like to group related parameters in a JSON file so that it's easy to share model configurations with the community. Attaching the JSON file to a saved checkpoint makes it trivial to replicate experiments and test networks that others with more resources have trained, thus helping democratize the platform.
Changing individual parameters during development should be very, very easy.

We should find a way to accommodate both. I propose:

All parameters are defined as flags.
A list of JSON files are read sequentially and merged, and finally the supplied flags are merged with the combined JSON dictionary for manual overrides.
We would have a JSON file with model parameters, a JSON file with training parameters (learning rate, etc.), and users could add another one with their specific overrides.

It still doesn't feel optimal, though. Any suggestions?

Larger mini batch instead of long input frame

What is the desirable size for input frame? I think mini batch would be beneficial so we can decrease frame size to get larger mini batch.

Isn't it necessary?

Add unit tests

I'd like to add unit tests before extending the implementation further.
That should prevent us from pushing non-working versions of the code in the future.
TensorFlow has a nice API for tests, which I'd like to use for this: https://www.tensorflow.org/versions/r0.10/api_docs/python/test.html#testing

check prerequisites

is it necessary to write a script doing something like checking prerequisites to avoid the import error, such as "ImportError: No module named librosa".

RuntimeError: ('Coordinator stopped with threads still running: %s', 'Thread-4 Thread-6 Thread-8')

Traceback (most recent call last):
File "train.py", line 171, in
main()
File "train.py", line 167, in main
coord.join(threads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 325, in join
" ".join(stragglers))
RuntimeError: ('Coordinator stopped with threads still running: %s', 'Thread-4 Thread-6 Thread-8')

I tried with the solution in http://stackoverflow.com/questions/36210162/tensorflow-stopping-threads-via-coordinator-seems-not-to-work, but now work.

I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
Restoring model from model.ckpt-250
Traceback (most recent call last):
  File "generate.py", line 86, in <module>
    main()
  File "generate.py", line 66, in main
    feed_dict={samples: window})
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 710, in run
    run_metadata_ptr)
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 908, in _run
    feed_dict_string, options, run_metadata)
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 958, in _do_run
    target_list, options, run_metadata)
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 978, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Output dimensions must be positive
         [[Node: wavenet/dilated_stack/layer1/conv_filter/BatchToSpace = BatchToSpace[T=DT_FLOAT, block_size=2, _device="/job:localhost/replica:0/task:0/gpu:0"](wavenet/dilated_stack/layer1/conv_filter, wavenet/dilated_stack/layer1/conv_filter/BatchToSpace/crops)]]
Caused by op u'wavenet/dilated_stack/layer1/conv_filter/BatchToSpace', defined at:
  File "generate.py", line 86, in <module>
    main()
  File "generate.py", line 51, in main
    next_sample = net.predict_proba(samples)
  File "/home/ubuntu/jupyter_base/project/tensorflow-wavenet/wavenet.py", line 154, in predict_proba
    raw_output = self._create_network(encoded)
  File "/home/ubuntu/jupyter_base/project/tensorflow-wavenet/wavenet.py", line 112, in _create_network
    self.dilation_channels)
  File "/home/ubuntu/jupyter_base/project/tensorflow-wavenet/wavenet.py", line 51, in _create_dilation_layer
    name="conv_filter")
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 228, in atrous_conv2d
    block_size=rate)
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 308, in batch_to_space
    block_size=block_size, name=name)
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
    op_def=op_def)
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2317, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/ubuntu/jupyter_base/venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1239, in __init__
    self._traceback = _extract_stack()```

ibab / tensorflow-wavenet Goto Github PK

tensorflow-wavenet's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs