I am using tf-nightly 1.14.1-dev20190307.
I am trying to run this command:
bazel-bin/lingvo/trainer --run_locally=gpu --mode=sync --model=lm.one_billion_wds.WordLevelOneBwdsSimpleSampledSoftmax --logdir=/tmp/lm1b/log --logtostderr
Error:
Waiting for 12.19 seconds before retrying.
I0417 09:53:59.435111 139839395571456 trainer.py:456] Probably the expected race on global_step: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
I am running this command on a single machine. It exits after waiting a few seconds.
Full error-log:
I0417 09:53:42.698106 139839395571456 retry.py:68] Retry: caught exception: _WaitTillInit while running FailedPreconditionError: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
. Call failed at (most recent call last):
File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 421, in Start
self._RunLoop('trainer', self._Loop)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/retry.py", line 50, in wrapper
return func(*args, **kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/base_runner.py", line 196, in _RunLoop
loop_func(*loop_args)
Traceback for above exception (most recent call last):
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/retry.py", line 50, in wrapper
return func(*args, **kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 454, in _WaitTillInit
global_step = sess.run(self._model.global_step)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
Waiting for 3.47 seconds before retrying.
I0417 09:53:42.699461 139839395571456 trainer.py:456] Probably the expected race on global_step: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
I0417 09:53:46.173445 139839395571456 retry.py:68] Retry: caught exception: _WaitTillInit while running FailedPreconditionError: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
. Call failed at (most recent call last):
File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 421, in Start
self._RunLoop('trainer', self._Loop)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/retry.py", line 50, in wrapper
return func(*args, **kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/base_runner.py", line 196, in _RunLoop
loop_func(*loop_args)
Traceback for above exception (most recent call last):
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/retry.py", line 50, in wrapper
return func(*args, **kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 454, in _WaitTillInit
global_step = sess.run(self._model.global_step)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
Waiting for 5.24 seconds before retrying.
I0417 09:53:46.174993 139839395571456 trainer.py:456] Probably the expected race on global_step: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
2019-04-17 09:53:49.916693: W tensorflow/core/framework/op_kernel.cc:1408] OP_REQUIRES failed at constant_op.cc:76 : Invalid argument: Cannot parse tensor from tensor_proto.
2019-04-17 09:53:49.916769: E tensorflow/core/common_runtime/executor.cc:636] Executor failed to create kernel. Invalid argument: Cannot parse tensor from tensor_proto.
[[{{node 1bwds_word_level_lm/lm/softmax/weight_0/var/Adagrad/Initializer/Const}}]]
2019-04-17 09:53:50.831555: W tensorflow/core/framework/op_kernel.cc:1408] OP_REQUIRES failed at constant_op.cc:76 : Invalid argument: Cannot parse tensor from proto: dtype: DT_FLOAT
tensor_shape {
dim {
size: 99184
}
dim {
size: 512
}
}
float_val: 1
2019-04-17 09:53:50.831626: E tensorflow/core/common_runtime/executor.cc:636] Executor failed to create kernel. Invalid argument: Cannot parse tensor from proto: dtype: DT_FLOAT
tensor_shape {
dim {
size: 99184
}
dim {
size: 512
}
}
float_val: 1
[[{{node 1bwds_word_level_lm/lm/softmax/weight_0/var/Adagrad/Initializer/Const}}]]
I0417 09:53:51.035936 139839403964160 base_runner.py:236] controller done (fatal error).
I0417 09:53:51.038496 139839403964160 base_runner.py:115] controller exception: Cannot parse tensor from proto: dtype: DT_FLOAT
tensor_shape {
dim {
size: 99184
}
dim {
size: 512
}
}
float_val: 1
[[node 1bwds_word_level_lm/lm/softmax/weight_0/var/Adagrad/Initializer/Const (defined at tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/optimizer.py:60) ]]
Original stack trace for u'1bwds_word_level_lm/lm/softmax/weight_0/var/Adagrad/Initializer/Const':
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1554, in <module>
tf.app.run(main)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1550, in main
RunnerManager(FLAGS.model).Start()
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1543, in Start
self.StartRunners(self.CreateRunners(FLAGS.job.split(','), FLAGS.logdir))
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1311, in CreateRunners
trial)
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1265, in _CreateRunner
return self.Controller(cfg, *common_args)
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 196, in __init__
self._model.ConstructFPropBPropGraph()
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 1229, in ConstructFPropBPropGraph
self._task.BProp()
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 500, in BProp
self._BPropForVariables(vs)
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 691, in _BPropForVariables
var_update_op = self.optimizer.Apply(lr, self._var_grads)
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/optimizer.py", line 63, in Apply
var_update_op = _Apply()
File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/optimizer.py", line 60, in _Apply
[(g, v) for (v, g) in var_grad.Flatten()], name='meta_backprop')
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 577, in apply_gradients
self._create_slots(var_list)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py", line 80, in _create_slots
"accumulator", self._name)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 1114, in _get_or_make_slot_with_initializer
var, initializer, shape, dtype, op_name)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 164, in create_slot_with_initializer
dtype)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 74, in _create_slot_var
validate_shape=validate_shape)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1502, in get_variable
aggregation=aggregation)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1243, in get_variable
aggregation=aggregation)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 567, in get_variable
aggregation=aggregation)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 519, in _true_getter
aggregation=aggregation)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 934, in _get_single_variable
aggregation=aggregation)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 212, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 175, in _variable_v1_call
aggregation=aggregation)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 154, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 2519, in default_variable_creator
expected_shape=expected_shape, import_scope=import_scope)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 216, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1443, in __init__
constraint=constraint)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1551, in _init_from_args
initial_value(), name="initial_value", dtype=dtype)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 906, in <lambda>
shape.as_list(), dtype=dtype, partition_info=partition_info)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py", line 247, in __call__
self.value, dtype=dtype, shape=shape, verify_shape=verify_shape)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 179, in constant_v1
allow_broadcast=False)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 289, in _constant_impl
name=name).outputs[0]
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3479, in create_op
op_def=op_def)
File "usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1961, in __init__
self._traceback = tf_stack.extract_stack()
E0417 09:53:51.039324 139839403964160 base_runner.py:243] Traceback (most recent call last):
E0417 09:53:51.039395 139839403964160 base_runner.py:243] File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/base_runner.py", line 196, in _RunLoop
E0417 09:53:51.039463 139839403964160 base_runner.py:243] loop_func(*loop_args)
E0417 09:53:51.039511 139839403964160 base_runner.py:243] File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 252, in _Loop
E0417 09:53:51.039556 139839403964160 base_runner.py:243] self._RestoreIfNeeded(sess)
E0417 09:53:51.039599 139839403964160 base_runner.py:243] File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 314, in _RestoreIfNeeded
E0417 09:53:51.039643 139839403964160 base_runner.py:243] sess.run([self._initialize_all])
E0417 09:53:51.039685 139839403964160 base_runner.py:243] File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 930, in run
E0417 09:53:51.039726 139839403964160 base_runner.py:243] run_metadata_ptr)
E0417 09:53:51.039766 139839403964160 base_runner.py:243] File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1153, in _run
E0417 09:53:51.039814 139839403964160 base_runner.py:243] feed_dict_tensor, options, run_metadata)
E0417 09:53:51.039856 139839403964160 base_runner.py:243] File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run
E0417 09:53:51.039897 139839403964160 base_runner.py:243] run_metadata)
E0417 09:53:51.039937 139839403964160 base_runner.py:243] File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call
E0417 09:53:51.039978 139839403964160 base_runner.py:243] raise type(e)(node_def, op, message)
E0417 09:53:51.040018 139839403964160 base_runner.py:243] InvalidArgumentError: Cannot parse tensor from proto: dtype: DT_FLOAT
E0417 09:53:51.040071 139839403964160 base_runner.py:243] tensor_shape {
E0417 09:53:51.040110 139839403964160 base_runner.py:243] dim {
E0417 09:53:51.040149 139839403964160 base_runner.py:243] size: 99184
E0417 09:53:51.040189 139839403964160 base_runner.py:243] }
E0417 09:53:51.040227 139839403964160 base_runner.py:243] dim {
E0417 09:53:51.040266 139839403964160 base_runner.py:243] size: 512
E0417 09:53:51.040306 139839403964160 base_runner.py:243] }
E0417 09:53:51.040344 139839403964160 base_runner.py:243] }
E0417 09:53:51.040385 139839403964160 base_runner.py:243] float_val: 1
E0417 09:53:51.040424 139839403964160 base_runner.py:243]
E0417 09:53:51.040462 139839403964160 base_runner.py:243] [[node 1bwds_word_level_lm/lm/softmax/weight_0/var/Adagrad/Initializer/Const (defined at tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/optimizer.py:60) ]]
E0417 09:53:51.040510 139839403964160 base_runner.py:243]
E0417 09:53:51.040550 139839403964160 base_runner.py:243] Original stack trace for u'1bwds_word_level_lm/lm/softmax/weight_0/var/Adagrad/Initializer/Const':
E0417 09:53:51.040591 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1554, in <module>
E0417 09:53:51.040630 139839403964160 base_runner.py:243] tf.app.run(main)
E0417 09:53:51.040668 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
E0417 09:53:51.040708 139839403964160 base_runner.py:243] _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
E0417 09:53:51.040747 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
E0417 09:53:51.040806 139839403964160 base_runner.py:243] _run_main(main, args)
E0417 09:53:51.040848 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
E0417 09:53:51.040889 139839403964160 base_runner.py:243] sys.exit(main(argv))
E0417 09:53:51.040927 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1550, in main
E0417 09:53:51.041194 139839403964160 base_runner.py:243] RunnerManager(FLAGS.model).Start()
E0417 09:53:51.041244 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1543, in Start
E0417 09:53:51.041289 139839403964160 base_runner.py:243] self.StartRunners(self.CreateRunners(FLAGS.job.split(','), FLAGS.logdir))
E0417 09:53:51.041330 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1311, in CreateRunners
E0417 09:53:51.041371 139839403964160 base_runner.py:243] trial)
E0417 09:53:51.041426 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1265, in _CreateRunner
E0417 09:53:51.041467 139839403964160 base_runner.py:243] return self.Controller(cfg, *common_args)
E0417 09:53:51.041507 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 196, in __init__
E0417 09:53:51.041548 139839403964160 base_runner.py:243] self._model.ConstructFPropBPropGraph()
E0417 09:53:51.041589 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 1229, in ConstructFPropBPropGraph
E0417 09:53:51.041630 139839403964160 base_runner.py:243] self._task.BProp()
E0417 09:53:51.041670 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 500, in BProp
E0417 09:53:51.041711 139839403964160 base_runner.py:243] self._BPropForVariables(vs)
E0417 09:53:51.041764 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 691, in _BPropForVariables
E0417 09:53:51.041842 139839403964160 base_runner.py:243] var_update_op = self.optimizer.Apply(lr, self._var_grads)
E0417 09:53:51.041887 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/optimizer.py", line 63, in Apply
E0417 09:53:51.041929 139839403964160 base_runner.py:243] var_update_op = _Apply()
E0417 09:53:51.041970 139839403964160 base_runner.py:243] File "tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/optimizer.py", line 60, in _Apply
E0417 09:53:51.042010 139839403964160 base_runner.py:243] [(g, v) for (v, g) in var_grad.Flatten()], name='meta_backprop')
E0417 09:53:51.042052 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 577, in apply_gradients
E0417 09:53:51.042092 139839403964160 base_runner.py:243] self._create_slots(var_list)
E0417 09:53:51.042146 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adagrad.py", line 80, in _create_slots
E0417 09:53:51.042186 139839403964160 base_runner.py:243] "accumulator", self._name)
E0417 09:53:51.042226 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 1114, in _get_or_make_slot_with_initializer
E0417 09:53:51.042272 139839403964160 base_runner.py:243] var, initializer, shape, dtype, op_name)
E0417 09:53:51.042314 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 164, in create_slot_with_initializer
E0417 09:53:51.042354 139839403964160 base_runner.py:243] dtype)
E0417 09:53:51.042392 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 74, in _create_slot_var
E0417 09:53:51.042433 139839403964160 base_runner.py:243] validate_shape=validate_shape)
E0417 09:53:51.042473 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1502, in get_variable
E0417 09:53:51.042511 139839403964160 base_runner.py:243] aggregation=aggregation)
E0417 09:53:51.042551 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1243, in get_variable
E0417 09:53:51.042591 139839403964160 base_runner.py:243] aggregation=aggregation)
E0417 09:53:51.042630 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 567, in get_variable
E0417 09:53:51.042670 139839403964160 base_runner.py:243] aggregation=aggregation)
E0417 09:53:51.042709 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 519, in _true_getter
E0417 09:53:51.042747 139839403964160 base_runner.py:243] aggregation=aggregation)
E0417 09:53:51.042823 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 934, in _get_single_variable
E0417 09:53:51.042869 139839403964160 base_runner.py:243] aggregation=aggregation)
E0417 09:53:51.042910 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 212, in __call__
E0417 09:53:51.042949 139839403964160 base_runner.py:243] return cls._variable_v1_call(*args, **kwargs)
E0417 09:53:51.042993 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 175, in _variable_v1_call
E0417 09:53:51.043035 139839403964160 base_runner.py:243] aggregation=aggregation)
E0417 09:53:51.043088 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 154, in <lambda>
E0417 09:53:51.043128 139839403964160 base_runner.py:243] previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
E0417 09:53:51.043167 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 2519, in default_variable_creator
E0417 09:53:51.043206 139839403964160 base_runner.py:243] expected_shape=expected_shape, import_scope=import_scope)
E0417 09:53:51.043246 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 216, in __call__
E0417 09:53:51.043284 139839403964160 base_runner.py:243] return super(VariableMetaclass, cls).__call__(*args, **kwargs)
E0417 09:53:51.043324 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1443, in __init__
E0417 09:53:51.043364 139839403964160 base_runner.py:243] constraint=constraint)
E0417 09:53:51.043402 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 1551, in _init_from_args
E0417 09:53:51.043442 139839403964160 base_runner.py:243] initial_value(), name="initial_value", dtype=dtype)
E0417 09:53:51.043482 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 906, in <lambda>
E0417 09:53:51.043520 139839403964160 base_runner.py:243] shape.as_list(), dtype=dtype, partition_info=partition_info)
E0417 09:53:51.043560 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/init_ops.py", line 247, in __call__
E0417 09:53:51.043600 139839403964160 base_runner.py:243] self.value, dtype=dtype, shape=shape, verify_shape=verify_shape)
E0417 09:53:51.043638 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 179, in constant_v1
E0417 09:53:51.043678 139839403964160 base_runner.py:243] allow_broadcast=False)
E0417 09:53:51.043718 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.py", line 289, in _constant_impl
E0417 09:53:51.043756 139839403964160 base_runner.py:243] name=name).outputs[0]
E0417 09:53:51.043818 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
E0417 09:53:51.043859 139839403964160 base_runner.py:243] return func(*args, **kwargs)
E0417 09:53:51.043900 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3479, in create_op
E0417 09:53:51.043941 139839403964160 base_runner.py:243] op_def=op_def)
E0417 09:53:51.043981 139839403964160 base_runner.py:243] File "usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1961, in __init__
E0417 09:53:51.044020 139839403964160 base_runner.py:243] self._traceback = tf_stack.extract_stack()
E0417 09:53:51.044060 139839403964160 base_runner.py:243]
E0417 09:53:51.044100 139839403964160 base_runner.py:243]
I0417 09:53:51.420242 139839395571456 retry.py:68] Retry: caught exception: _WaitTillInit while running FailedPreconditionError: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
. Call failed at (most recent call last):
File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 421, in Start
self._RunLoop('trainer', self._Loop)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/retry.py", line 50, in wrapper
return func(*args, **kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/base_runner.py", line 196, in _RunLoop
loop_func(*loop_args)
Traceback for above exception (most recent call last):
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/retry.py", line 50, in wrapper
return func(*args, **kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 454, in _WaitTillInit
global_step = sess.run(self._model.global_step)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
Waiting for 7.94 seconds before retrying.
I0417 09:53:51.421612 139839395571456 trainer.py:456] Probably the expected race on global_step: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
I0417 09:53:59.368693 139839395571456 retry.py:68] Retry: caught exception: _WaitTillInit while running FailedPreconditionError: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
. Call failed at (most recent call last):
File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 421, in Start
self._RunLoop('trainer', self._Loop)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/retry.py", line 50, in wrapper
return func(*args, **kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/base_runner.py", line 196, in _RunLoop
loop_func(*loop_args)
Traceback for above exception (most recent call last):
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/retry.py", line 50, in wrapper
return func(*args, **kwargs)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 454, in _WaitTillInit
global_step = sess.run(self._model.global_step)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
Waiting for 12.19 seconds before retrying.
I0417 09:53:59.435111 139839395571456 trainer.py:456] Probably the expected race on global_step: Attempting to use uninitialized value global_step
[[{{node _send_global_step_0}}]]
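
For scale, the size of the tensor that the failing Adagrad/Initializer/Const node is asked to materialize can be worked out from the shape printed in the "Cannot parse tensor from proto" message. The snippet below is only arithmetic on the logged values (the variable names are mine), so it may help when judging whether the size of this constant is relevant, but it does not by itself establish why the proto fails to parse.

```python
# Shape copied from the "Cannot parse tensor from proto" message in the log.
shape = (99184, 512)
bytes_per_float = 4  # DT_FLOAT is a 32-bit float

num_elements = shape[0] * shape[1]
num_bytes = num_elements * bytes_per_float

print("elements:", num_elements)
print("size in MiB:", num_bytes / float(2 ** 20))
```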