I'm having trouble getting the hptuning sample to run successfully: running it locally with --distributed causes an error I haven't been able to track down. Some basic googling suggests state might be leaking between checkpoint files, but I've definitely blown away the output directory and can't find anywhere else the distributed workers might be writing checkpoints.
(test-env) Todds-MacBook-Pro:hptuning todd$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
(test-env) Todds-MacBook-Pro:hptuning todd$ git pull
Already up-to-date.
(test-env) Todds-MacBook-Pro:hptuning todd$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
(test-env) Todds-MacBook-Pro:hptuning todd$ # Clear the output from any previous local run.
(test-env) Todds-MacBook-Pro:hptuning todd$ rm -rf output/
(test-env) Todds-MacBook-Pro:hptuning todd$ # Train locally.
(test-env) Todds-MacBook-Pro:hptuning todd$ gcloud beta ml local train \
> --package-path=trainer \
> --module-name=trainer.task \
> --distributed \
> -- \
> --train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz \
> --eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz \
> --output_path=output
INFO:root:Original job data: {u'args': [u'--train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz', u'--eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz', u'--output_path=output'], u'job_name': u'trainer.task'}
INFO:root:Original job data: {u'args': [u'--train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz', u'--eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz', u'--output_path=output'], u'job_name': u'trainer.task'}
INFO:root:setting eval batch size to 100
INFO:root:setting eval batch size to 100
INFO:root:Starting parameter server 0
INFO:root:Starting worker/0
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job master -> {0 -> localhost:27182}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> localhost:27183, 1 -> localhost:27184}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> localhost:27185, 1 -> localhost:27186}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job master -> {0 -> localhost:27182}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> localhost:27183, 1 -> localhost:27184}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> localhost:27185, 1 -> localhost:27186}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:211] Started server with target: grpc://localhost:27183
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:211] Started server with target: grpc://localhost:27185
INFO:root:Original job data: {u'args': [u'--train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz', u'--eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz', u'--output_path=output'], u'job_name': u'trainer.task'}
INFO:root:setting eval batch size to 100
INFO:root:Starting worker/1
INFO:root:Original job data: {u'args': [u'--train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz', u'--eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz', u'--output_path=output'], u'job_name': u'trainer.task'}
INFO:root:setting eval batch size to 100
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job master -> {0 -> localhost:27182}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> localhost:27183, 1 -> localhost:27184}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> localhost:27185, 1 -> localhost:27186}
INFO:root:Starting master/0
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:211] Started server with target: grpc://localhost:27186
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job master -> {0 -> localhost:27182}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> localhost:27183, 1 -> localhost:27184}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> localhost:27185, 1 -> localhost:27186}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:211] Started server with target: grpc://localhost:27182
INFO:root:Original job data: {u'args': [u'--train_data_paths=gs://cloud-ml-data/mnist/train.tfr.gz', u'--eval_data_paths=gs://cloud-ml-data/mnist/eval.tfr.gz', u'--output_path=output'], u'job_name': u'trainer.task'}
INFO:root:setting eval batch size to 100
INFO:root:Starting parameter server 1
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job master -> {0 -> localhost:27182}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> localhost:27183, 1 -> localhost:27184}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> localhost:27185, 1 -> localhost:27186}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:211] Started server with target: grpc://localhost:27184
WARNING:tensorflow:From /Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py:210 in run_training.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py:210 in run_training.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py:264 in merge_all_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge.
WARNING:tensorflow:From /Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py:264 in merge_all_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge.
WARNING:tensorflow:From /Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py:210 in run_training.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py:210 in run_training.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py:264 in merge_all_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge.
WARNING:tensorflow:From /Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py:264 in merge_all_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge.
WARNING:tensorflow:From /Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py:210 in run_training.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py:210 in run_training.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py:264 in merge_all_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge.
WARNING:tensorflow:From /Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py:264 in merge_all_summaries.: merge_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge.
WARNING:tensorflow:From /Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py:344 in __init__.: __init__ (from tensorflow.python.training.summary_io) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.FileWriter. The interface and behavior is the same; this is just a rename.
WARNING:tensorflow:From /Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py:344 in __init__.: __init__ (from tensorflow.python.training.summary_io) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.FileWriter. The interface and behavior is the same; this is just a rename.
I tensorflow/core/distributed_runtime/master_session.cc:993] Start master session f9163a4c7e96e286 with config:
device_filters: "/job:ps"
device_filters: "/job:worker/task:0"
I tensorflow/core/distributed_runtime/master_session.cc:993] Start master session 8ca80d82bc56a2d7 with config:
device_filters: "/job:ps"
device_filters: "/job:worker/task:1"
INFO:tensorflow:Waiting for model to be ready. Ready_for_local_init_op: None, ready: Variables not initialized: fully_connected/biases, fully_connected_1/biases, fully_connected_2/biases, Variable, Variable_2
INFO:tensorflow:Waiting for model to be ready. Ready_for_local_init_op: None, ready: Variables not initialized: fully_connected/biases, fully_connected_1/biases, fully_connected_2/biases, Variable, Variable_2
INFO:tensorflow:Waiting for model to be ready. Ready_for_local_init_op: None, ready: Variables not initialized: fully_connected/biases, fully_connected_1/biases, fully_connected_2/biases, Variable, Variable_2
INFO:tensorflow:Waiting for model to be ready. Ready_for_local_init_op: None, ready: Variables not initialized: fully_connected/biases, fully_connected_1/biases, fully_connected_2/biases, Variable, Variable_2
I tensorflow/core/distributed_runtime/master_session.cc:993] Start master session c74f300d20e21df5 with config:
device_filters: "/job:ps"
device_filters: "/job:master/task:0"
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Assign requires shapes of both tensors to match. lhs shape= [8,34] rhs shape= [32,10]
[[Node: fully_connected_2/weights/Assign = Assign[T=DT_FLOAT, _class=["loc:@fully_connected_2/weights"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](fully_connected_2/weights, fully_connected_2/weights/Initializer/random_uniform)]]
[[Node: init/NoOp_S2 = _Recv[client_terminated=false, recv_device="/job:master/replica:0/task:0/cpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=-7490249123101218299, tensor_name="edge_57_init/NoOp", tensor_type=DT_FLOAT, _device="/job:master/replica:0/task:0/cpu:0"]()]]
Caused by op u'fully_connected_2/weights/Assign', defined at:
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 559, in <module>
tf.app.run()
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 322, in main
run(model, argv)
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 453, in run
dispatch(args, model, cluster, task)
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 494, in dispatch
Trainer(args, model, cluster, task).run_training()
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 193, in run_training
self.args.batch_size)
File "trainer/model.py", line 133, in build_train_graph
return self.build_graph(data_paths, batch_size, is_training=True)
File "trainer/model.py", line 99, in build_graph
logits = inference(parsed['images'], self.hidden1, self.hidden2)
File "trainer/model.py", line 228, in inference
return layers.fully_connected(hidden2, NUM_CLASSES, activation_fn=None)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1346, in fully_connected
trainable=trainable)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 244, in model_variable
caching_device=caching_device, device=device)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 208, in variable
caching_device=caching_device)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 677, in _get_single_variable
expected_shape=shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
expected_shape=expected_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 360, in _init_from_args
validate_shape=validate_shape).op
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [8,34] rhs shape= [32,10]
[[Node: fully_connected_2/weights/Assign = Assign[T=DT_FLOAT, _class=["loc:@fully_connected_2/weights"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](fully_connected_2/weights, fully_connected_2/weights/Initializer/random_uniform)]]
[[Node: init/NoOp_S2 = _Recv[client_terminated=false, recv_device="/job:master/replica:0/task:0/cpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=-7490249123101218299, tensor_name="edge_57_init/NoOp", tensor_type=DT_FLOAT, _device="/job:master/replica:0/task:0/cpu:0"]()]]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Assign requires shapes of both tensors to match. lhs shape= [8,34] rhs shape= [32,10]
[[Node: fully_connected_2/weights/Assign = Assign[T=DT_FLOAT, _class=["loc:@fully_connected_2/weights"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](fully_connected_2/weights, fully_connected_2/weights/Initializer/random_uniform)]]
[[Node: init/NoOp_S2 = _Recv[client_terminated=false, recv_device="/job:master/replica:0/task:0/cpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=-7490249123101218299, tensor_name="edge_57_init/NoOp", tensor_type=DT_FLOAT, _device="/job:master/replica:0/task:0/cpu:0"]()]]
Caused by op u'fully_connected_2/weights/Assign', defined at:
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 559, in <module>
tf.app.run()
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 322, in main
run(model, argv)
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 453, in run
dispatch(args, model, cluster, task)
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 494, in dispatch
Trainer(args, model, cluster, task).run_training()
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 193, in run_training
self.args.batch_size)
File "trainer/model.py", line 133, in build_train_graph
return self.build_graph(data_paths, batch_size, is_training=True)
File "trainer/model.py", line 99, in build_graph
logits = inference(parsed['images'], self.hidden1, self.hidden2)
File "trainer/model.py", line 228, in inference
return layers.fully_connected(hidden2, NUM_CLASSES, activation_fn=None)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1346, in fully_connected
trainable=trainable)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 244, in model_variable
caching_device=caching_device, device=device)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 208, in variable
caching_device=caching_device)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 677, in _get_single_variable
expected_shape=shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
expected_shape=expected_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 360, in _init_from_args
validate_shape=validate_shape).op
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [8,34] rhs shape= [32,10]
[[Node: fully_connected_2/weights/Assign = Assign[T=DT_FLOAT, _class=["loc:@fully_connected_2/weights"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](fully_connected_2/weights, fully_connected_2/weights/Initializer/random_uniform)]]
[[Node: init/NoOp_S2 = _Recv[client_terminated=false, recv_device="/job:master/replica:0/task:0/cpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=-7490249123101218299, tensor_name="edge_57_init/NoOp", tensor_type=DT_FLOAT, _device="/job:master/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 559, in <module>
tf.app.run()
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 322, in main
run(model, argv)
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 453, in run
dispatch(args, model, cluster, task)
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 494, in dispatch
Trainer(args, model, cluster, task).run_training()
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 232, in run_training
with self.sv.managed_session(target, config=config) as session:
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 974, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 802, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 386, in join
six.reraise(*self._exc_info_to_raise)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 963, in managed_session
start_standard_services=start_standard_services)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 720, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 233, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [8,34] rhs shape= [32,10]
[[Node: fully_connected_2/weights/Assign = Assign[T=DT_FLOAT, _class=["loc:@fully_connected_2/weights"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](fully_connected_2/weights, fully_connected_2/weights/Initializer/random_uniform)]]
[[Node: init/NoOp_S2 = _Recv[client_terminated=false, recv_device="/job:master/replica:0/task:0/cpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=-7490249123101218299, tensor_name="edge_57_init/NoOp", tensor_type=DT_FLOAT, _device="/job:master/replica:0/task:0/cpu:0"]()]]
Caused by op u'fully_connected_2/weights/Assign', defined at:
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 559, in <module>
tf.app.run()
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 322, in main
run(model, argv)
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 453, in run
dispatch(args, model, cluster, task)
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 494, in dispatch
Trainer(args, model, cluster, task).run_training()
File "/Users/todd/git/cloudml-samples/mnist/hptuning/trainer/task.py", line 193, in run_training
self.args.batch_size)
File "trainer/model.py", line 133, in build_train_graph
return self.build_graph(data_paths, batch_size, is_training=True)
File "trainer/model.py", line 99, in build_graph
logits = inference(parsed['images'], self.hidden1, self.hidden2)
File "trainer/model.py", line 228, in inference
return layers.fully_connected(hidden2, NUM_CLASSES, activation_fn=None)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1346, in fully_connected
trainable=trainable)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 244, in model_variable
caching_device=caching_device, device=device)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
return func(*args, **current_args)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 208, in variable
caching_device=caching_device)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 677, in _get_single_variable
expected_shape=shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
expected_shape=expected_shape)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 360, in _init_from_args
validate_shape=validate_shape).op
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/todd/miniconda2/envs/test-env/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [8,34] rhs shape= [32,10]
[[Node: fully_connected_2/weights/Assign = Assign[T=DT_FLOAT, _class=["loc:@fully_connected_2/weights"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](fully_connected_2/weights, fully_connected_2/weights/Initializer/random_uniform)]]
[[Node: init/NoOp_S2 = _Recv[client_terminated=false, recv_device="/job:master/replica:0/task:0/cpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=-7490249123101218299, tensor_name="edge_57_init/NoOp", tensor_type=DT_FLOAT, _device="/job:master/replica:0/task:0/cpu:0"]()]]
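To double-check the stale-checkpoint theory mentioned at the top, a minimal sketch like the one below could list anything under a directory that looks like a checkpoint or event file. The helper name `find_checkpoint_files` and the `CHECKPOINT_MARKERS` list are hypothetical, chosen for illustration; the markers are only an assumption about common TensorFlow file naming, not an exhaustive list, and this is not part of the sample itself.

```python
import os

# Substrings that commonly appear in TensorFlow checkpoint/event filenames.
# Assumption for illustration only; extend as needed.
CHECKPOINT_MARKERS = ("checkpoint", "model.ckpt", "events.out.tfevents")


def find_checkpoint_files(root):
    """Return sorted paths under `root` whose filename contains a marker."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(marker in name for marker in CHECKPOINT_MARKERS):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Scan the current directory (e.g. the hptuning sample directory).
    for path in find_checkpoint_files("."):
        print(path)
```

Running it from the sample directory after `rm -rf output/` and getting no output would suggest nothing obvious is left behind there, though workers could still be writing somewhere outside the scanned tree.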