I am training the network on the Pascal VOC dataset.
Training progresses well at first, but an error occurs after thousands of iterations.
If I restart training, the same error occurs again after thousands of iterations.
It seems to be caused by the gradient or the loss: the traceback points to the gradient of the top_k call in detection_loss (training_person_edge.py, line 106), whose k argument is number_of_negatives.
Could you please take a look and help me fix it?
Thanks.
2017-10-10 11:53:49.895995: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.896694: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.897436: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.901151: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.901662: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.915369: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.915371: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.915423: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.915459: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
2017-10-10 11:53:49.916251: W tensorflow/core/kernels/queue_base.cc:295] _3_shuffle_batch/random_shuffle_queue: Skipping cancelled enqueue attempt with queue not closed
2017-10-10 11:53:49.916323: W tensorflow/core/kernels/queue_base.cc:295] _3_shuffle_batch/random_shuffle_queue: Skipping cancelled enqueue attempt with queue not closed
2017-10-10 11:53:49.916366: W tensorflow/core/kernels/queue_base.cc:295] _0_parallel_read/filenames: Skipping cancelled enqueue attempt with queue not closed
2017-10-10 11:53:49.916420: W tensorflow/core/kernels/queue_base.cc:295] _2_parallel_read/common_queue: Skipping cancelled enqueue attempt with queue not closed
2017-10-10 11:53:49.916468: W tensorflow/core/kernels/queue_base.cc:295] _3_shuffle_batch/random_shuffle_queue: Skipping cancelled enqueue attempt with queue not closed
2017-10-10 11:53:49.916490: W tensorflow/core/kernels/queue_base.cc:295] _3_shuffle_batch/random_shuffle_queue: Skipping cancelled enqueue attempt with queue not closed
2017-10-10 11:53:49.916524: W tensorflow/core/kernels/queue_base.cc:295] _2_parallel_read/common_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Enqueue operation was cancelled
     [[Node: parallel_read/filenames/filenames_EnqueueMany = QueueEnqueueManyV2[Tcomponents=[DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/filenames, parallel_read/filenames/RandomShuffle)]]
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
[INFO]: Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Enqueue operation was cancelled
     [[Node: parallel_read/filenames/filenames_EnqueueMany = QueueEnqueueManyV2[Tcomponents=[DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/filenames, parallel_read/filenames/RandomShuffle)]]
    return fn(*args)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
    status, run_metadata)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
     [[Node: train_op/control_dependency/_879 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_18944 _train_op/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "training_person_edge.py", line 364, in <module>
    tf.app.run()
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "training_person_edge.py", line 348, in main
    train(dataset, net, net_config)
  File "training_person_edge.py", line 292, in train
    train_loss, acc, iou, _, lr = sess.run([train_op, train_acc, mean_iou, update_mean_iou, learning_rate])
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
     [[Node: train_op/control_dependency/_879 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_18944 _train_op/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'gradients/TopKV2_grad/Reshape', defined at:
  File "training_person_edge.py", line 364, in <module>
    tf.app.run()
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "training_person_edge.py", line 348, in main
    train(dataset, net, net_config)
  File "training_person_edge.py", line 253, in train
    summarize_gradients=True)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 440, in create_train_op
    check_numerics=check_numerics)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/contrib/training/python/training/training.py", line 439, in create_train_op
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 542, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 348, in _MaybeCompile
    return grad_fn() # Exit early
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 542, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/nn_grad.py", line 707, in _TopKGrad
    ind_2d = array_ops.reshape(op.outputs[1], array_ops.stack([-1, ind_lastdim]))
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2619, in reshape
    name=name)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op 'TopKV2', defined at:
  File "training_person_edge.py", line 364, in <module>
    tf.app.run()
[elided 1 identical lines from previous traceback]
  File "training_person_edge.py", line 348, in main
    train(dataset, net, net_config)
  File "training_person_edge.py", line 219, in train
    seg_logits, seg_gt, edge_logits, edge_gt, dataset, config)
  File "training_person_edge.py", line 153, in objective
    detection_loss(location, confidence, refine_ph, classes_ph, pos_mask)
  File "training_person_edge.py", line 106, in detection_loss
    number_of_negatives)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 1949, in top_k
    return gen_nn_ops._top_kv2(input, k=k, sorted=sorted, name=name)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 2577, in _top_kv2
    name=name)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/gongke/anaconda2/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
     [[Node: gradients/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](TopKV2/_835, gradients/TopKV2_grad/stack)]]
     [[Node: train_op/control_dependency/_879 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_18944 _train_op/control_dependency", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
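
My current reading of the traceback, which is a guess and not confirmed: the gradient of top_k reshapes the index output to [-1, ind_lastdim], where ind_lastdim is k. If number_of_negatives is 0 for some batch, the index tensor is empty and the target shape is [-1, 0], which is exactly the case the error message rejects. The plain-Python sketch below only mimics that shape rule so the failure can be reproduced without TensorFlow. The helper names (resolve_reshape, topk_grad_index_shape, clamp_k) are hypothetical and are not part of TensorFlow or of my training script.

```python
def resolve_reshape(num_elements, target_shape):
    """Resolve a single -1 entry in target_shape, roughly as a reshape would."""
    known = 1
    for dim in target_shape:
        if dim != -1:
            known *= dim
    if -1 not in target_shape:
        if known != num_elements:
            raise ValueError("target shape does not match the element count")
        return list(target_shape)
    if known == 0:
        # With a zero-sized specified dimension the -1 entry is ambiguous.
        raise ValueError("cannot infer the missing size: a specified size is zero")
    if num_elements % known != 0:
        raise ValueError("element count is not divisible by the specified sizes")
    return [num_elements // known if dim == -1 else dim for dim in target_shape]


def topk_grad_index_shape(num_rows, k):
    """Shape the top_k gradient would give the indices (assumed: [-1, k])."""
    num_elements = num_rows * k  # top_k returns k indices per row
    return resolve_reshape(num_elements, [-1, k])


def clamp_k(number_of_negatives, minimum=1):
    """Hypothetical guard: never ask top_k for fewer than `minimum` entries."""
    return max(number_of_negatives, minimum)


if __name__ == "__main__":
    print(topk_grad_index_shape(8, 3))           # a batch that has negatives
    print(topk_grad_index_shape(8, clamp_k(0)))  # guarded k for an empty batch
    try:
        topk_grad_index_shape(8, 0)              # unguarded k == 0
    except ValueError as err:
        print("fails:", err)
```

If this reading is right, clamping k (or skipping the top_k branch when there are no negatives) should avoid the crash, although clamping does change which negatives enter the loss for those batches, so I am not sure it is the right fix.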