tensorflow / models
Models and examples built with TensorFlow
License: Other
The code (inception_train.py) says:
# With 8 Tesla K40's and a batch size = 256, the following setup achieves
# precision@1 = 73.5% after 100 hours and 100K steps (20 epochs)
I'm hoping to reproduce Inception-v3 with precision similar to the original paper.
Is 73.5% the best precision this code can achieve?
If not, what does it take to reach higher precision? Just more iterations?
If so, why not release code that actually matches the paper?
Thanks
I fixed the scaling issue in VariationalAutoencoderRunner.py (see #23). However, running the default example causes the following:
Epoch: 0001 cost= 1114.439753835
Epoch: 0002 cost= 662.529461080
Epoch: 0003 cost= 594.752329830
Epoch: 0004 cost= 569.599913920
Epoch: 0005 cost= 556.361018750
Epoch: 0006 cost= 545.052694460
Epoch: 0007 cost= 537.334268253
Epoch: 0008 cost= 530.251896875
Epoch: 0009 cost= 523.817275994
Epoch: 0010 cost= 519.874919247
Epoch: 0011 cost= 514.975155966
Epoch: 0012 cost= 510.715168395
Epoch: 0013 cost= 506.326094318
Epoch: 0014 cost= 502.172605824
Epoch: 0015 cost= 498.612383310
Epoch: 0016 cost= 495.592024787
Epoch: 0017 cost= 493.580289986
Epoch: 0018 cost= 490.370449006
Epoch: 0019 cost= 489.957028977
Epoch: 0020 cost= 486.818214844
W tensorflow/core/common_runtime/executor.cc:1102] 0x27f47b0 Compute status: Invalid argument: Incompatible shapes: [10000,200] vs. [128,200]
[[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Sqrt, random_normal)]]
W tensorflow/core/common_runtime/executor.cc:1102] 0x542b0b0 Compute status: Invalid argument: Incompatible shapes: [10000,200] vs. [128,200]
[[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Sqrt, random_normal)]]
[[Node: range_1/_29 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_226_range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
W tensorflow/core/common_runtime/executor.cc:1102] 0x542b0b0 Compute status: Invalid argument: Incompatible shapes: [10000,200] vs. [128,200]
[[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Sqrt, random_normal)]]
[[Node: add_1/_27 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_225_add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
File "VariationalAutoencoderRunner.py", line 53, in <module>
print "Total cost: " + str(autoencoder.calc_total_cost(X_test))
It seems that fixing gaussian_sample_size causes an error every time we evaluate a batch of data where gaussian_sample_size != batch_size.
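For reference, the mismatch comes from drawing the Gaussian noise with a hard-coded batch dimension. A sketch of the fix in plain NumPy (the function name here is hypothetical; the real model builds the same expression with TF ops) ties the noise shape to the incoming batch instead:

```python
import numpy as np

def reparameterize(mu, log_sigma_sq, rng=np.random):
    # Draw noise with the same leading dimension as the incoming batch,
    # rather than a fixed batch_size / gaussian_sample_size -- this is
    # what avoids the [10000,200] vs [128,200] mismatch at eval time.
    eps = rng.standard_normal(mu.shape)
    return mu + np.sqrt(np.exp(log_sigma_sq)) * eps

train_z = reparameterize(np.zeros((128, 200)), np.zeros((128, 200)))     # training batch
test_z = reparameterize(np.zeros((10000, 200)), np.zeros((10000, 200)))  # full test set
```

In the TF graph the same idea means sizing the noise from the input tensor's dynamic shape rather than from a constant.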
I could not run download_and_preprocess_flowers.sh; it failed on the last line:
(portenv)dhcp-ccc-3940:DashCamAnalytics aub3$ ./models/inception/data/download_and_preprocess_flowers.sh ~/temp/flowers2/
Skipping download of flower data.
./models/inception/data/download_and_preprocess_flowers.sh: line 93: ./models/inception/data/download_and_preprocess_flowers.sh.runfiles/inception/build_image_data: No such file or directory
After I manually echoed the command and ran it, it worked fine.
# Build the TFRecords version of the image data.
cd "${CURRENT_DIR}"
BUILD_SCRIPT="${WORK_DIR}/build_image_data"
OUTPUT_DIRECTORY="${DATA_DIR}"
echo "${BUILD_SCRIPT}"
echo "${CURRENT_DIR}"
echo "python build_image_data.py --train_directory=${TRAIN_DIRECTORY} --validation_directory=${VALIDATION_DIRECTORY} --output_directory=${OUTPUT_DIRECTORY} --labels_file=${LABELS_FILE}"
I'm getting the following error with the latest TensorFlow (head version):
Traceback (most recent call last):
File "/home/gabe/repos/tf-models/inception/bazel-bin/inception/my_train.runfiles/inception/my_train.py", line 41, in <module>
tf.app.run()
File "/home/gabe/anaconda2/envs/tf-inception/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/gabe/repos/tf-models/inception/bazel-bin/inception/my_train.runfiles/inception/my_train.py", line 37, in main
inception_train.train(dataset)
File "/home/gabe/repos/tf-models/inception/bazel-bin/inception/my_train.runfiles/inception/inception_train.py", line 269, in train
if grad:
File "/home/gabe/anaconda2/envs/tf-inception/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 475, in __nonzero__
raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed. "
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use the logical TensorFlow ops to test the value of a tensor.
https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py#L269
Changing it to if grad is not None solved the issue.
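The reason `if grad:` blows up is that TF tensors deliberately refuse boolean coercion. A minimal pure-Python illustration of the pattern (a stand-in class, not TF itself):

```python
class FakeTensor:
    """Mimics tf.Tensor's refusal to be used as a Python bool."""
    def __bool__(self):          # Python 3
        raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed.")
    __nonzero__ = __bool__       # Python 2 spelling of the same hook

grads = [FakeTensor(), None]

# `if g:` would raise TypeError on the first element;
# `g is not None` only checks identity, so it is always safe.
defined = [g for g in grads if g is not None]
```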
I want to use multiple GPUs, but when I use tf.device to specify which GPU to use, I get an error.
with tf.Graph().as_default(), tf.device('/cpu:0'):
    opt, lr_op, global_step = optimizer(INITIAL_LEARNING_RATE, tf.get_variable('global_step', [], initializer=tf.constant_initializer(0), trainable=False))
    classify_opt, classify_lr_op, classify_global_step = optimizer(CLASSIFY_INITIAL_LEARNING_RATE, tf.get_variable('classify_global_step', [], initializer=tf.constant_initializer(0), trainable=False))
    tower_grads = []
    classify_tower_grads = []
    gpu_num = len(os.environ.get('CUDA_VISIBLE_DEVICES', '').split(','))
    if gpu_num == 0:
        raise RuntimeError
    for i in range(gpu_num):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('gpu_%d' % i) as scope:
                loss_op, classify_loss_op = tower_loss(scope)
                tf.get_variable_scope().reuse_variables()
                grads = opt.compute_gradients(loss_op)
                classify_grads = classify_opt.compute_gradients(classify_loss_op)
                tower_grads.append(grads)
                classify_tower_grads.append(classify_grads)
    classify_grads = average_gradients(classify_tower_grads)
    grads = average_gradients(tower_grads)
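The `average_gradients` helper used above (borrowed from the multi-GPU examples) reduces the per-tower lists of `(grad, var)` pairs into one averaged list. A simplified sketch, using plain floats in place of gradient tensors:

```python
def average_gradients(tower_grads):
    """tower_grads: one list of (grad, var) pairs per tower.
    Returns a single list of (averaged_grad, var) pairs.
    Simplified sketch: grads are floats here, tensors in the real code."""
    average_grads = []
    for grad_and_vars in zip(*tower_grads):   # group the same variable across towers
        grads = [g for g, _ in grad_and_vars]
        var = grad_and_vars[0][1]             # the variable is shared between towers
        average_grads.append((sum(grads) / len(grads), var))
    return average_grads

towers = [[(1.0, 'w'), (2.0, 'b')],
          [(3.0, 'w'), (4.0, 'b')]]
# average_gradients(towers) -> [(2.0, 'w'), (3.0, 'b')]
```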
Traceback (most recent call last):
File "cnn_inc_v4.py", line 404, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "cnn_inc_v4.py", line 401, in main
train()
File "cnn_inc_v4.py", line 377, in train
_,loss_value,lr_value=sess.run([train_op,loss_op,lr_op])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 564, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 637, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 659, in _do_call
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'gpu_0/LearnedUnigramCandidateSampler': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available
[[Node: gpu_0/LearnedUnigramCandidateSampler = LearnedUnigramCandidateSampler[num_sampled=100, num_true=1, range_max=195327, seed=0, seed2=0, unique=true, _device="/device:GPU:0"](gpu_0/ToInt64)]]
Caused by op u'gpu_0/LearnedUnigramCandidateSampler', defined at:
File "cnn_inc_v4.py", line 404, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "cnn_inc_v4.py", line 401, in main
train()
File "cnn_inc_v4.py", line 340, in train
loss_op,classify_loss_op=tower_loss(scope)
File "cnn_inc_v4.py", line 282, in tower_loss
loss_model(logits,yy_node,y_embeddings)
File "cnn_inc_v4.py", line 204, in loss_model
neg_ids, _, _ = tf.nn.learned_unigram_candidate_sampler(true_classes=tf.to_int64(tf.reshape(yy,[BATCH_SIZE,1])),num_true=1,num_sampled=NEG_NUM,unique=True,range_max=NUM_LABELS)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/candidate_sampling_ops.py", line 192, in learned_unigram_candidate_sampler
seed2=seed2, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_candidate_sampling_ops.py", line 251, in _learned_unigram_candidate_sampler
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
I'm having little luck getting the flowers sample to train. The most steps I've achieved is 1400, with a batch size of 40 and a learning rate of .00005. I have tried many combinations of these two parameters with little luck. I'm curious if you might have suggestions; my machine is significantly more modest than the machine referenced in the source.
I'm running on an Ubuntu 14.04 based machine with two GTX 980 Ti GPUs and an i7-5930K processor with 64GB of RAM.
the flags are:
fine_tune=True
batch_size=40
num_gpus=2
input_queue_memory_factor=1
Sincerely,
Bob
INFO: Found 65 targets and 12 test targets...
ERROR: /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/external/png_archive/BUILD:23:1: Executing genrule @png_archive//:configure failed: namespace-sandbox failed: error executing command /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet/_bin/namespace-sandbox ... (remaining 5 argument(s) skipped).
/home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet/external/png_archive/libpng-1.2.53 /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet
/tmp/tmp.59Q48wThIm /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet/external/png_archive/libpng-1.2.53 /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet
I do not know why.
The SyntaxNet README tells users to check their protobuf version with pip freeze | grep protobuf1.
That should probably be pip freeze | grep protobuf (without the 1 at the end).
I am trying to create a new inception model using:
bazel-bin/inception/flowers_train --train_dir="${TRAIN_DIR}" --data_dir="${FLOWERS_DATA_DIR}" --fine_tune=True --initial_learning_rate=0.001 --input_queue_memory_factor=1
and getting this error:
W tensorflow/core/framework/op_kernel.cc:896] Invalid argument: indices[0] = [0,148] is out of bounds: need 0 <= index < [32,6]
I have run build_image_data to create FLOWERS_DATA_DIR.
Can I create a new inception model this way?
In https://github.com/tensorflow/models/blob/master/inception/inception/slim/ops.py#L77
axis = range(len(inputs_shape) - 1)
This won't create a list in Python 3.
It works after changing to:
axis = list(range(len(inputs_shape) - 1))
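The underlying difference: in Python 3, `range()` returns a lazy range object rather than a list, so code that later mutates or concatenates `axis` breaks; wrapping it in `list()` restores the Python 2 behavior:

```python
inputs_shape = (32, 224, 224, 64)   # e.g. NHWC activations

# Python 2's range() already returned a list; in Python 3 the explicit
# list() is needed before the code can append to or concatenate axis.
axis = list(range(len(inputs_shape) - 1))
print(axis)   # [0, 1, 2]
```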
The code here shows how to set up each replica with a single tower that uses one GPU. I'm wondering if there is a way to change this code a little to make use of multiple GPUs on one machine, as in that example.
The way I currently use all the GPUs on a worker machine is to start as many workers as there are GPUs; the workers then communicate with each other as if they were not on one machine. That is slower than it would be if I could start one worker that controls more than one GPU.
After building syntaxnet and running demo.sh, I got:
macbookproloreto:syntaxnet admin$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
Traceback (most recent call last):
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/parser_eval.py", line 28, in <module>
from syntaxnet import graph_builder
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/graph_builder.py", line 20, in <module>
import syntaxnet.load_parser_ops
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/load_parser_ops.py", line 21, in <module>
tf.load_op_library(
AttributeError: 'module' object has no attribute 'load_op_library'
Traceback (most recent call last):
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/conll2tree.runfiles/syntaxnet/conll2tree.py", line 22, in <module>
import syntaxnet.load_parser_ops
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/conll2tree.runfiles/syntaxnet/load_parser_ops.py", line 21, in <module>
tf.load_op_library(
AttributeError: 'module' object has no attribute 'load_op_library'
Traceback (most recent call last):
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/parser_eval.py", line 28, in <module>
from syntaxnet import graph_builder
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/graph_builder.py", line 20, in <module>
import syntaxnet.load_parser_ops
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/load_parser_ops.py", line 21, in <module>
tf.load_op_library(
AttributeError: 'module' object has no attribute 'load_op_library'
Some tests failed during the build
FAIL: //syntaxnet:lexicon_builder_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/lexicon_builder_test/test.log).
FAIL: //syntaxnet:graph_builder_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/graph_builder_test/test.log).
FAIL: //syntaxnet:reader_ops_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/reader_ops_test/test.log).
FAIL: //syntaxnet:beam_reader_ops_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/beam_reader_ops_test/test.log).
FAIL: //syntaxnet:text_formats_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/text_formats_test/test.log).
FAIL: //syntaxnet:parser_trainer_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log).
INFO: Elapsed time: 2115,987s, Critical Path: 1370,47s
//syntaxnet:arc_standard_transitions_test PASSED in 0,3s
//syntaxnet:parser_features_test PASSED in 0,2s
//syntaxnet:sentence_features_test PASSED in 0,4s
//syntaxnet:shared_store_test PASSED in 5,2s
//syntaxnet:tagger_transitions_test PASSED in 0,4s
//util/utf8:unicodetext_unittest PASSED in 0,1s
//syntaxnet:beam_reader_ops_test FAILED in 1,1s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/beam_reader_ops_test/test.log
//syntaxnet:graph_builder_test FAILED in 3,1s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/graph_builder_test/test.log
//syntaxnet:lexicon_builder_test FAILED in 3,1s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/lexicon_builder_test/test.log
//syntaxnet:parser_trainer_test FAILED in 1,2s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log
//syntaxnet:reader_ops_test FAILED in 3,0s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/reader_ops_test/test.log
//syntaxnet:text_formats_test FAILED in 1,2s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/text_formats_test/test.log
My bazel version is
bazel version
Build label: 0.2.2b
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Apr 25 08:11:19 2016 (1461571879)
Build timestamp: 1461571879
Build timestamp as int: 1461571879
When trying to build tests, I get the following error.
bazel test syntaxnet/... util/utf8/...
ERROR: /Users/Mayank/nonjunk/tensorflow/models/syntaxnet/WORKSPACE:11:1: Traceback (most recent call last):
File "/Users/Mayank/nonjunk/tensorflow/models/syntaxnet/WORKSPACE", line 11
check_version("0.2.0")
File "/private/var/tmp/_bazel_Mayank/62ffc1b5b4632e1ac9339b416e2ec109/external/tf/tensorflow/tensorflow.bzl", line 22, in check_version
_parse_bazel_version(native.bazel_version)
File "/private/var/tmp/_bazel_Mayank/62ffc1b5b4632e1ac9339b416e2ec109/external/tf/tensorflow/tensorflow.bzl", line 15, in _parse_bazel_version
int(number)
invalid literal for int(): "2b".
ERROR: Error evaluating WORKSPACE file.
ERROR: package contains errors: util/utf8.
ERROR: no such package 'external': Package 'external' contains errors.
INFO: Elapsed time: 0.129s
ERROR: Couldn't start the build. Unable to run tests.
I had to disable version checking in tensorflow/models/syntaxnet/WORKSPACE to get this working:
load("@tf//tensorflow:tensorflow.bzl", "check_version")
#check_version("0.2.0")
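The underlying failure is `int("2b")` on the patch component of version `0.2.2b`. A sketch of a more tolerant parser (in Python rather than the Starlark of tensorflow.bzl, and purely illustrative) would drop any non-digit suffix before converting:

```python
import re

def parse_bazel_version(version):
    """Parse '0.2.2b' -> (0, 2, 2) by dropping any non-digit suffix.
    Sketch only; the real check lives in tensorflow.bzl (Starlark)."""
    parts = []
    for number in version.split('.'):
        digits = re.match(r'\d*', number).group()   # leading digits, possibly empty
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

assert parse_bazel_version("0.2.2b") == (0, 2, 2)
```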
I'm trying to get predictions out of my fine-tuned model using a script similar to inception_eval.py.
I don't have true labels for the data, so I want to save both the input filenames and the logit predictions.
I modified image_processing so it returns filenames too, and I can use it like this:
images, labels, filenames = image_processing.inputs(dataset)
The problem is when I add this to the while-loop:
while step < num_iter and not coord.should_stop():
    paths, targets, preds = sess.run([filenames, labels, logits])
    for path in paths:
        print(path)
I get about 0.1% duplicated values (and thereby 0.1% missing). It seems like a thread-safety issue somewhere in the modules, or I'm missing something.
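A quick way to quantify the duplication, assuming the printed paths have been collected into a list (the sample paths below are hypothetical):

```python
from collections import Counter

def duplication_rate(paths):
    """Fraction of fetched paths that are repeats of an earlier fetch."""
    counts = Counter(paths)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / float(len(paths))

paths = ['a.jpg', 'b.jpg', 'c.jpg', 'a.jpg']   # hypothetical fetch results
# one of four fetches is a repeat -> duplication_rate(paths) == 0.25
```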
When solving the issue here, I used the same start script to start my two machines. Then this error occurred:
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 2.2K (2304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 2.2K (2304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 2.2K (2304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
...
I found that the error occurs because the framework assigns all the memory on the first GPU, /gpu:0. I finally solved the problem with this commit.
According to the TensorFlow source code (gpu_device.cc, line 553), the framework creates all the locally available GPU devices for each worker. So all workers allocate memory on the first GPU, causing OUT_OF_MEMORY (however, I just want to assign 1 GPU to 1 worker).
So I'm wondering whether inception_distributed_train.py calls the API in the correct way, given the goal that
multiple worker tasks can run on the same machine with multiple GPUs, so machine_A with 2 GPUs may have 2 workers while machine_B with 1 GPU just has 1 worker
(as described in the documentation).
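One workaround (a sketch under my assumptions, not the project's documented approach) is to pin each worker process to a single device via CUDA_VISIBLE_DEVICES before it starts, so TensorFlow never sees, and never grabs memory on, the other GPUs. A hypothetical launcher for a 2-GPU machine:

```shell
#!/bin/sh
# Hypothetical launcher: one worker per GPU on a 2-GPU machine.
# Each worker would be started with only its own GPU visible, so
# inside every process the single visible device is /gpu:0.
for TASK_ID in 0 1; do
  CUDA_VISIBLE_DEVICES=$TASK_ID \
    echo "worker $TASK_ID sees GPU $TASK_ID only"
done
```

In a real launcher the `echo` line would be the `imagenet_distributed_train` invocation with the matching `--task_id`.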
After successfully downloading and generating the training and validation splits (I had to install coreutils and replace shuf with gshuf on OS X), I encountered the following error immediately:
(portenv)dhcp-ccc-3940:inception aub3$ bazel-bin/inception/flowers_train --train_dir="${TRAIN_DIR}" --data_dir="${FLOWERS_DATA_DIR}" --pretrained_model_checkpoint_path="${MODEL_PATH}" --fine_tune=True --initial_learning_rate=0.001 --input_queue_memory_factor=1
/Users/aub3/portenv/lib/python2.7/site-packages/tensorflow/python/ops/image_ops.py:586: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
if width == new_width_const and height == new_height_const:
Traceback (most recent call last):
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/flowers_train.py", line 41, in <module>
tf.app.run()
File "/Users/aub3/portenv/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/flowers_train.py", line 37, in main
inception_train.train(dataset)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/inception_train.py", line 235, in train
loss = _tower_loss(images, labels, num_classes, scope)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/inception_train.py", line 110, in _tower_loss
scope=scope)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/inception_model.py", line 90, in inference
scope=scope)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/inception_model.py", line 88, in inception_v3
scope='conv0')
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/scopes.py", line 129, in func_with_args
return func(*args, **current_args)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/ops.py", line 184, in conv2d
restore=restore)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/scopes.py", line 129, in func_with_args
return func(*args, **current_args)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/variables.py", line 224, in variable
trainable=trainable, collections=collections)
TypeError: get_variable() got an unexpected keyword argument 'regularizer'
Working on download_and_preprocess for ImageNet and running the following:
However, I'm wondering if it should be bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}" in lieu of bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}$".
No test was done.
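The stray `$` does matter: inside double quotes, a `$` that is not followed by a valid variable name stays literal, so the two invocations pass different paths (the directory below is just an example):

```shell
#!/bin/sh
DATA_DIR=/tmp/imagenet-data
echo "${DATA_DIR}$"    # -> /tmp/imagenet-data$  (trailing literal $)
echo "${DATA_DIR}"     # -> /tmp/imagenet-data
```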
I just followed these commands:
git clone --recursive https://github.com/tensorflow/models.git (ok)
cd models/syntaxnet/tensorflow
./configure (pressed n so TensorFlow does not use the GPU)
bazel test syntaxnet/... util/utf8/... (nothing happened)
Why?
Thanks
I get this error when running bazel test.
macbookproloreto:syntaxnet admin$ bazel test --linkopt=-headerpad_max_install_names syntaxnet/... util/utf8/...
ERROR: /Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/WORKSPACE:11:1: Traceback (most recent call last):
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/WORKSPACE", line 11
check_version("0.2.0")
File "/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/external/tf/tensorflow/tensorflow.bzl", line 22, in check_version
_parse_bazel_version(native.bazel_version)
File "/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/external/tf/tensorflow/tensorflow.bzl", line 15, in _parse_bazel_version
int(number)
invalid literal for int(): "2b".
ERROR: Error evaluating WORKSPACE file.
ERROR: package contains errors: syntaxnet.
ERROR: no such package 'external': Package 'external' contains errors.
INFO: Elapsed time: 0,179s
ERROR: Couldn't start the build. Unable to run tests.
Will there be support for this technology on Android at some point in the future?
Is this repository structure really sustainable for publishing such complex subprojects? Wouldn't a structure like a separate tensorflow-models organization with one repository per model be more suitable?
Things will get overly complex, and issues, pull requests (and maybe releases?) will get mixed up and have to be re-separated manually.
I followed the instructions to install SyntaxNet on VM with Lubuntu 64bit. This is what I am getting when I try it out:
ilya@ilya-VirtualBox:~/dev/models/syntaxnet$ bazel test syntaxnet/... util/utf8/...
/home/ilya/bin/bazel: line 86: /home/ilya/.bazel/bin/bazel-real: cannot execute binary file: Exec format error
/home/ilya/bin/bazel: line 86: /home/ilya/.bazel/bin/bazel-real: Success
ilya@ilya-VirtualBox:~/dev/models/syntaxnet$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
syntaxnet/demo.sh: line 31: bazel-bin/syntaxnet/parser_eval: No such file or directory
syntaxnet/demo.sh: line 43: bazel-bin/syntaxnet/parser_eval: No such file or directory
syntaxnet/demo.sh: line 55: bazel-bin/syntaxnet/conll2tree: No such file or directory
There is a missing validation: when the [data dir] ends with /, some concatenations fail during the process.
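A small guard at the top of the script could normalize the argument before any concatenation (a sketch, not the script's actual code; the example path is hypothetical):

```shell
#!/bin/sh
# Sketch: normalize the [data dir] argument before concatenating.
DATA_DIR="/tmp/flowers/"        # stands in for "$1"
DATA_DIR="${DATA_DIR%/}"        # strip one trailing slash, if present
echo "${DATA_DIR}/train-00000-of-01024"
# -> /tmp/flowers/train-00000-of-01024  (no double slash)
```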
shuf is named gshuf when installed through Homebrew.
On Ubuntu 16.04 LTS I'm getting the following error when executing
bazel test syntaxnet/... util/utf8/...
/usr/local/bin/bazel: line 86: /usr/local/lib/bazel/bin/bazel-real: cannot execute binary file: Exec format error
/usr/local/bin/bazel: line 86: /usr/local/lib/bazel/bin/bazel-real: Success
Does this mean that Bazel has to be built from the source files?
I just built a copy of the most recent version of Bazel, and it threw the following error when I ran the build:
WARNING: /home/zv/.cache/bazel/_bazel_zv/d49f4fd45b21dc0a5977bc0da866df7a/external/tf/WORKSPACE:1: Workspace name in /home/zv/.cache/bazel/_bazel_zv/d49f4fd45b21dc0a5977bc0da866df7a/external/tf/WORKSPACE (@__main__) does not match the name given in the repository's definition (@tf); this will cause a build error in future versions.
ERROR: /home/zv/Development/tflow/models/syntaxnet/WORKSPACE:11:1: Traceback (most recent call last):
File "/home/zv/Development/tflow/models/syntaxnet/WORKSPACE", line 11
check_version("0.2.0")
File "/home/zv/.cache/bazel/_bazel_zv/d49f4fd45b21dc0a5977bc0da866df7a/external/tf/tensorflow/tensorflow.bzl", line 22, in check_version
_parse_bazel_version(native.bazel_version)
File "/home/zv/.cache/bazel/_bazel_zv/d49f4fd45b21dc0a5977bc0da866df7a/external/tf/tensorflow/tensorflow.bzl", line 15, in _parse_bazel_version
int(number)
invalid literal for int(): "2b".
ERROR: Error evaluating WORKSPACE file.
ERROR: Loading of target '@bazel_tools//tools/cpp:toolchain' failed; build aborted: error loading package 'external': Package 'external' contains errors.
ERROR: Loading failed; build aborted.
INFO: Elapsed time: 0.098s
ERROR: Couldn't start the build. Unable to run tests.
This problem can be addressed by removing the version check on line 11 of syntaxnet/WORKSPACE or by converting it to Bazel's check syntax.
I have installed Bazel 0.2.2, but when I run bazel test syntaxnet/... util/utf8/..., 6 tests fail. How can I fix this? Thanks.
cc1plus: warning: unrecognized command line option "-Wno-self-assign" [enabled by default]
FAIL: //syntaxnet:reader_ops_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/reader_ops_test/test.log).
FAIL: //syntaxnet:beam_reader_ops_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/beam_reader_ops_test/test.log).
FAIL: //syntaxnet:graph_builder_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/graph_builder_test/test.log).
FAIL: //syntaxnet:lexicon_builder_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/lexicon_builder_test/test.log).
FAIL: //syntaxnet:text_formats_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/text_formats_test/test.log).
FAIL: //syntaxnet:parser_trainer_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log).
INFO: Elapsed time: 341.488s, Critical Path: 294.82s
//syntaxnet:arc_standard_transitions_test (cached) PASSED in 0.0s
//syntaxnet:parser_features_test (cached) PASSED in 0.1s
//syntaxnet:sentence_features_test (cached) PASSED in 0.1s
//syntaxnet:shared_store_test (cached) PASSED in 0.4s
//syntaxnet:tagger_transitions_test (cached) PASSED in 0.1s
//util/utf8:unicodetext_unittest (cached) PASSED in 0.0s
//syntaxnet:beam_reader_ops_test FAILED in 0.4s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/beam_reader_ops_test/test.log
//syntaxnet:graph_builder_test FAILED in 0.4s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/graph_builder_test/test.log
//syntaxnet:lexicon_builder_test FAILED in 0.5s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/lexicon_builder_test/test.log
//syntaxnet:parser_trainer_test FAILED in 0.6s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log
//syntaxnet:reader_ops_test FAILED in 0.4s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/reader_ops_test/test.log
//syntaxnet:text_formats_test FAILED in 0.5s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/text_formats_test/test.log
Executed 6 out of 12 tests: 6 tests pass and 6 fail locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
After installing the correct dependencies (protobuf, swig, bazel, asciitree), I tried running bazel test syntaxnet/... util/utf8/..., but after a lot of notes and warnings the final output was this:
ERROR: /home/lerax/.cache/bazel/_bazel_lerax/cae2b1799296ff1923cddf0854a31846/external/tf/tensorflow/core/kernels/BUILD:856:1: C++ compilation of rule '@tf//tensorflow/core/kernels:argmax_op' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG ... (remaining 72 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.8/README.Bugs> for instructions.
INFO: Elapsed time: 552.118s, Critical Path: 428.91s
//syntaxnet:arc_standard_transitions_test NO STATUS
//syntaxnet:beam_reader_ops_test NO STATUS
//syntaxnet:graph_builder_test NO STATUS
//syntaxnet:lexicon_builder_test NO STATUS
//syntaxnet:parser_features_test NO STATUS
//syntaxnet:parser_trainer_test NO STATUS
//syntaxnet:reader_ops_test NO STATUS
//syntaxnet:sentence_features_test NO STATUS
//syntaxnet:shared_store_test NO STATUS
//syntaxnet:tagger_transitions_test NO STATUS
//syntaxnet:text_formats_test NO STATUS
//util/utf8:unicodetext_unittest NO STATUS
Executed 0 out of 12 tests: 12 were skipped.
Any ideas?
feature | version
---|---
gcc | 4.8.4
python | 3.4.3
bazel | 0.22
swig | 2.0.11
protobuf | 3.0.0b2
asciitree | 0.3.1
OS | GNU/Linux, Distro Ubuntu 14.0
Hi there, thank you for releasing the slim library; it is elegant, and I have found it extremely useful in my research. I just noticed from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell.py#L709 that a slim RNN has been used. Would it be possible to release the slim RNN? I hope I am not asking too much...
Hi, thank you for releasing the v3 training model! I am wondering: are there any plans to release the v4 model?
Thanks,
Upon running the following command, compilation fails. Any ideas? Here is my output:
bazel test syntaxnet/... util/utf8/...
/home/vertical-3/.cache/bazel/_bazel_vertical-3/0e0937a190400fc5db42694f75128f7d/external/tf/google/protobuf/BUILD:534:1: C++ compilation of rule '@tf//google/protobuf:pyext/_message.so' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG ... (remaining 49 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/tf/google/protobuf/python/google/protobuf/pyext/repeated_composite_container.cc:34:0:
external/tf/google/protobuf/python/google/protobuf/pyext/repeated_composite_container.h:37:20: fatal error: Python.h: No such file or directory
#include <Python.h>
^
compilation terminated.
INFO: Elapsed time: 14.267s, Critical Path: 14.07s
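`fatal error: Python.h: No such file or directory` means the Python development headers are not installed. Assuming a Debian/Ubuntu system, installing them and rerunning the build usually resolves this:

```shell
# Python 2 headers; use python3-dev instead when building against Python 3.
sudo apt-get install python-dev
bazel test syntaxnet/... util/utf8/...
```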
In inception_train.py, model checkpoints were periodically saved in train_dir. Unfortunately, the next time I fine-tuned from a checkpoint, train_dir was deleted first. Then I found:
if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
However, I lost all of my checkpoints before I read the code.
I ran the distributed version of the example on two machines, each with 4 GPUs. If I start one worker on each, it's OK. However, when I run multiple workers per machine, I get "InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [2]".
Both remote servers run CentOS 7.
# machine A with 4 GPU
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--task_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--task_id=1 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--job_name='ps' \
--task_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
# machine B with 4 GPU
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--task_id=2 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--task_id=3 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
Error log
INFO:tensorflow:Waiting for model to be ready: Attempting to use uninitialized value mixed_35x35x256a/branch3x3dbl/Conv/weights/ExponentialMovingAverage
[[Node: mixed_35x35x256a/branch3x3dbl/Conv/weights/ExponentialMovingAverage/read = Identity[T=DT_FLOAT, _class=["loc:@mixed_35x35x256a/branch3x3dbl/Conv/weights"], _device="/job:ps/replica:0/task:0/cpu:0"](mixed_35x35x256a/branch3x3dbl/Conv/weights/ExponentialMovingAverage)]]
Caused by op u'mixed_35x35x256a/branch3x3dbl/Conv/weights/ExponentialMovingAverage/read', defined at:
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 65, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 61, in main
inception_distributed_train.train(server.target, dataset, cluster_spec)
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/inception_distributed_train.py", line 220, in train
apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/sync_replicas_optimizer.py", line 427, in apply_gradients
self._variables_to_average)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 282, in apply
colocate_with_primary=True)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 86, in create_slot
return _create_slot_var(primary, val, scope)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 50, in _create_slot_var
slot = variables.Variable(val, name=scope, trainable=False)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 206, in __init__
dtype=dtype)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 275, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 609, in identity
return _op_def_lib.apply_op("Identity", input=input, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
Traceback (most recent call last):
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 65, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 61, in main
inception_distributed_train.train(server.target, dataset, cluster_spec)
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/inception_distributed_train.py", line 260, in train
sess = sv.prepare_or_wait_for_session(target, config=sess_config)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 674, in prepare_or_wait_for_session
config=config, init_feed_dict=self._init_feed_dict)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 158, in prepare_session
max_wait_secs=max_wait_secs, config=config)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 214, in recover_session
saver.restore(sess, ckpt.model_checkpoint_path)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1090, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 564, in _run
feed_dict_string, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 637, in _do_run
target_list, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 659, in _do_call
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [2]
[[Node: save/Assign_103 = Assign[T=DT_INT64, _class=["loc:@local_steps"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](local_steps, save/restore_slice_103)]]
[[Node: save/restore_all/NoOp_S4 = _Recv[client_terminated=false, recv_device="/job:worker/replica:0/task:0/gpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=1831303354831316628, tensor_name="edge_1174_save/restore_all/NoOp", tensor_type=DT_FLOAT, _device="/job:worker/replica:0/task:0/gpu:0"]()]]
Caused by op u'save/Assign_103', defined at:
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 65, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 61, in main
inception_distributed_train.train(server.target, dataset, cluster_spec)
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/inception_distributed_train.py", line 233, in train
saver = tf.train.Saver()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 832, in __init__
restore_sequentially=restore_sequentially)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 502, in build
filename_tensor, vars_to_save, restore_sequentially, reshape)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 268, in _AddRestoreOps
validate_shape=validate_shape))
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 40, in assign
use_locking=use_locking, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
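For what it's worth, `lhs shape= [4] rhs shape= [2]` on the `local_steps` variable is consistent with restoring a checkpoint written by a 2-worker run into a graph built for 4 workers (the sync-replicas optimizer keeps one slot per worker). This is a hedged guess, not a confirmed diagnosis: clearing the training directory (shown here as a hypothetical `${TRAIN_DIR}`) before changing the worker count may avoid the mismatch:

```shell
# WARNING: destroys previous checkpoints; back them up first if needed.
rm -rf "${TRAIN_DIR}"
```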
When running the VAE example VariationalAutoencoderRunner.py in models/autoencoder, I get the following output:
Epoch: 0001 cost= nan
Epoch: 0002 cost= nan
Epoch: 0003 cost= nan
Epoch: 0004 cost= nan
Epoch: 0005 cost= nan
Epoch: 0006 cost= nan
Epoch: 0007 cost= nan
Epoch: 0008 cost= nan
Epoch: 0009 cost= nan
Epoch: 0010 cost= nan
Epoch: 0011 cost= nan
Epoch: 0012 cost= nan
Epoch: 0013 cost= nan
Epoch: 0014 cost= nan
Epoch: 0015 cost= nan
Epoch: 0016 cost= nan
Epoch: 0017 cost= nan
Epoch: 0018 cost= nan
Epoch: 0019 cost= nan
Epoch: 0020 cost= nan
Machine: AWS EC2 GPU instance with image ami-77e0da1d
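A common source of `cost= nan` in VAE implementations is taking log(0) in the reconstruction term when the decoder output saturates at exactly 0 or 1. A pure-Python sketch of the usual fix, clamping probabilities with a small epsilon before the log (this illustrates the technique; it is not the actual code in VariationalAutoencoderRunner.py):

```python
import math

EPS = 1e-10  # keeps probabilities strictly inside (0, 1)

def bernoulli_recon_cost(x, p):
    """Pixel-wise cross-entropy between data x and decoder probabilities p."""
    total = 0.0
    for xi, pi in zip(x, p):
        pi = min(max(pi, EPS), 1.0 - EPS)  # avoid log(0) -> -inf -> nan
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total
```

Without the clamp, a decoder output of exactly 0.0 against a target of 1.0 yields -inf, and the running cost becomes nan from the first epoch onward, which matches the output above.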
Following the "Getting Started" part of "How to Fine-Tune a Pre-Trained Model on a New Task", I created a "flowers-data" directory in my home directory and entered the following in the terminal:
FLOWERS_DATA_DIR=$HOME/flowers-data
bazel build -c opt inception/download_and_preprocess_flowers
bazel-bin/inception/download_and_preprocess_flowers "${FLOWERS_DATA_DIR}$"
I want to ask: should "${FLOWERS_DATA_DIR}$" actually be "${FLOWERS_DATA_DIR}"?
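For reference, assuming the trailing `$` is indeed a typo in the README, the invocation without it would be:

```shell
bazel-bin/inception/download_and_preprocess_flowers "${FLOWERS_DATA_DIR}"
```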
No tests were run. I just followed these commands:
git clone --recursive https://github.com/tensorflow/models.git (ok)
cd models/syntaxnet/tensorflow
./configure (pressed n to building TensorFlow with GPU support)
bazel test syntaxnet/... util/utf8/... (nothing happened)
Why?
Thanks
Following the installation instructions on OS X 10.11.4. GCC is from Homebrew, version 5.3.0.
This happened:
$ bazel test --linkopt=-headerpad_max_install_names syntaxnet/... util/utf8/...
...........
INFO: Found 65 targets and 12 test targets...
ERROR: /Users/pdarragh/Programming/Projects/tensorflow-models/syntaxnet/third_party/utf/BUILD:3:1: C++ compilation of rule '//third_party/utf:utf' failed: osx_cc_wrapper.sh failed: error executing command external/local_config_cc/osx_cc_wrapper.sh -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 36 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
gcc: error: unrecognized command line option '-Wthread-safety'
gcc: error: unrecognized command line option '-Wself-assign'
INFO: Elapsed time: 111.199s, Critical Path: 0.37s
//syntaxnet:arc_standard_transitions_test NO STATUS
//syntaxnet:beam_reader_ops_test NO STATUS
//syntaxnet:graph_builder_test NO STATUS
//syntaxnet:lexicon_builder_test NO STATUS
//syntaxnet:parser_features_test NO STATUS
//syntaxnet:parser_trainer_test NO STATUS
//syntaxnet:reader_ops_test NO STATUS
//syntaxnet:sentence_features_test NO STATUS
//syntaxnet:shared_store_test NO STATUS
//syntaxnet:tagger_transitions_test NO STATUS
//syntaxnet:text_formats_test NO STATUS
//util/utf8:unicodetext_unittest NO STATUS
Executed 0 out of 12 tests: 12 were skipped.
I looked at syntaxnet/bazel-syntaxnet/external/local_config_cc/osx_cc_wrapper.sh after the failed install, since the error seems to indicate the problem happened there, but I can't see anything wrong with it. I tried changing the gcc call (line 56) to use OS X's gcc, but that had no effect on the error.
Any ideas?
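`-Wthread-safety` and `-Wself-assign` are Clang-only warnings, so the wrapper is apparently invoking Homebrew's real GCC, which rejects them. A possible workaround, offered as an assumption rather than a verified fix: point the toolchain back at Apple's clang and rebuild from scratch:

```shell
# Apple's clang accepts the Clang-only warning flags that GCC rejects.
export CC=/usr/bin/clang
bazel clean
bazel test --linkopt=-headerpad_max_install_names syntaxnet/... util/utf8/...
```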
In the losses section of the Readme.md of the Slim library for the Inception model, there seem to be many references to functions that are nowhere in the code. The losses.py file seems to be an older version and the Readme newer. losses.py has three functions (l1_loss, l2_loss, cross_entropy_loss), while the Readme mentions slim.losses.ClassificationLoss and slim.losses.SumOfSquaresLoss.
Is the code somewhere else on GitHub, or is this an error, referring to unpublished code?
Quick question: How can I freeze/save my fine-tune model? (To use it later on android)
I'm trying to save it at training checkpoints using graph_util.convert_variables_to_constants(); however, it is not clear what the name of the output classification layer in the retrained graph is.
https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py#L349
if step % 5000 == 0 or (step + 1) == FLAGS.max_steps:
output_graph_def = graph_util.convert_variables_to_constants(
sess, sess.graph.as_graph_def(), ????)
Thanks in advance.
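One way to find the right value for the missing `output_node_names` argument is to list the node names in the graph (e.g. via `[n.name for n in sess.graph.as_graph_def().node]`) and filter for a likely output op. A plain-Python sketch of that filter; the keyword list is a guess on my part, so inspect the surviving names manually before freezing:

```python
def likely_output_nodes(names, keywords=("softmax", "logits", "predictions")):
    """Return graph node names that look like classification outputs."""
    hits = []
    for name in names:
        lowered = name.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(name)
    return hits
```

Once a candidate is confirmed, its name is what you would pass as the final argument to graph_util.convert_variables_to_constants().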
I've run Standard Input Parsing with this sentence: I said , 'what 're you ? Crazy ? said Sandowsky. I can't afford to do that. SyntaxNet parses can't as ca and n't instead of can't or can not. Do I need to train the model first?
Input: I said , 'what 're you ? Crazy ? said Sandowsky. I can't afford to do that .
Parse:
said VBD ROOT
+-- said VBD dep
| +-- I PRP nsubj
| +-- , , punct
| +-- you PRP dep
| | +-- 'what PRP nsubj
| | +-- 're VBP cop
| | +-- ? . punct
| | +-- Crazy NNP dep
| +-- ? . punct
+-- Sandowsky. NNP nsubj
| +-- afford VB ccomp
| +-- I PRP nsubj
| +-- ca MD aux
| +-- n't RB neg
| +-- do VB xcomp
| +-- to TO aux
| +-- that DT dobj
+-- . . punct
Perhaps this is not the right place to ask this, but here goes:
I'd like to use Parsey McParseface in the Unity3d game engine. It seems like a big aspect of why this will be useful is because it is making machine learning easily accessible to all developers with minimal setup. I've looked around a bit, but not sure if I'll be able to use it in Unity seeing as it uses Python. Does anyone know either how to set this up (I'm sure it would useful for many developers), or maybe the next best alternative for powerful NLU? If there is a web service I could request that is already built and ready to go, that would work too.
Thanks, and please tell me the best place to post this question if it is not appropriate here.
Hello,
First, congrats on the release of 0.8.0 and especially for the distributed code!
However, I noticed a performance regression when training inception v3 on a multi-GPU system (NVIDIA DevBox: 4x Titan X, driver 361.48).
I was able to pinpoint the performance regression at 84b58a6:
2016-04-14 20:46:34.577596: step 0, loss = 13.08 (2.0 examples/sec; 127.239 sec/batch)
2016-04-14 20:47:44.064244: step 10, loss = 13.16 (68.1 examples/sec; 3.762 sec/batch)
2016-04-14 20:48:21.515250: step 20, loss = 13.25 (68.9 examples/sec; 3.718 sec/batch)
2016-04-14 20:48:59.089292: step 30, loss = 13.27 (68.7 examples/sec; 3.728 sec/batch)
2016-04-14 20:49:36.558736: step 40, loss = 13.38 (68.2 examples/sec; 3.753 sec/batch)
2016-04-14 20:50:13.988347: step 50, loss = 13.29 (68.9 examples/sec; 3.718 sec/batch)
2016-04-14 20:50:51.547393: step 60, loss = 13.35 (67.5 examples/sec; 3.795 sec/batch)
2016-04-14 20:51:29.056161: step 70, loss = 13.30 (68.1 examples/sec; 3.761 sec/batch)
2016-04-14 20:52:06.558016: step 80, loss = 13.32 (68.4 examples/sec; 3.744 sec/batch)
2016-04-14 20:52:44.036974: step 90, loss = 12.97 (68.4 examples/sec; 3.745 sec/batch)
2016-04-14 20:53:21.528334: step 100, loss = 13.19 (68.3 examples/sec; 3.746 sec/batch)
With the commit just before, 9a1dfdf:
2016-04-14 21:19:02.731016: step 0, loss = 13.12 (2.1 examples/sec; 122.740 sec/batch)
2016-04-14 21:19:54.731369: step 10, loss = 13.73 (116.1 examples/sec; 2.204 sec/batch)
2016-04-14 21:20:16.844538: step 20, loss = 13.59 (116.4 examples/sec; 2.200 sec/batch)
2016-04-14 21:20:38.962047: step 30, loss = 13.98 (116.0 examples/sec; 2.207 sec/batch)
2016-04-14 21:21:01.137735: step 40, loss = 14.15 (114.9 examples/sec; 2.228 sec/batch)
2016-04-14 21:21:23.400314: step 50, loss = 13.68 (114.6 examples/sec; 2.235 sec/batch)
2016-04-14 21:21:45.651567: step 60, loss = 13.66 (115.9 examples/sec; 2.209 sec/batch)
2016-04-14 21:22:07.986970: step 70, loss = 13.35 (115.0 examples/sec; 2.226 sec/batch)
2016-04-14 21:22:30.290880: step 80, loss = 13.14 (115.1 examples/sec; 2.225 sec/batch)
2016-04-14 21:22:52.503986: step 90, loss = 13.15 (116.1 examples/sec; 2.205 sec/batch)
2016-04-14 21:23:14.733461: step 100, loss = 13.11 (114.9 examples/sec; 2.227 sec/batch)
I also tried 5d7612c with --num_readers=16
, with no improvement.
I'm running the inception training inside a Docker container using nvidia-docker (I'm one of the maintainers).
FROM gcr.io/tensorflow/tensorflow:0.8.0rc0-devel-gpu
WORKDIR /models
RUN git clone https://github.com/tensorflow/models.git . && \
git checkout 9a1dfdf263b358a1560b8f7ef6c76595b71ca201
WORKDIR /models/inception
RUN bazel build -c opt --config=cuda inception/imagenet_train
CMD bazel-bin/inception/imagenet_train --num_gpus=4 --batch_size=256 \
--max_steps=110 --log_device_placement=true --num_epochs_per_decay=2.0 --learning_rate_decay_factor=0.94 \
--train_dir=/data/imagenet --data_dir=/raid/tensorflow_imagenet
With --log_device_placement=true
the full log is huge, but I can share it if you want.
ping @jmchen-g, author of PR #44
Let me know how I can help!
I think it might be easier to use TensorFlow as a git submodule rather than symlinks; it took me a while to figure out that I have to clone tensorflow inside models.
Also, bazel ends up building all of TensorFlow, which is very slow and seems needless when you only want to use the models.
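For anyone who wants to try it, the submodule setup I have in mind would look something like this (a sketch, not the repo's supported workflow):

```shell
git clone https://github.com/tensorflow/models.git
cd models
# Pin tensorflow as a submodule at the expected path instead of symlinking.
git submodule add https://github.com/tensorflow/tensorflow.git tensorflow
git submodule update --init --recursive
```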
Hi
Your install instructions do not mention that you need the python numpy module(s) in order to install correctly. You might want to add that to the README.
Stephen
I'm running in CPU-only mode with Python 2.7, but it doesn't work with Python 3 on my machine either.
I get this:
INFO: Elapsed time: 85.654s, Critical Path: 78.81s
//syntaxnet:arc_standard_transitions_test NO STATUS
//syntaxnet:beam_reader_ops_test NO STATUS
//syntaxnet:graph_builder_test NO STATUS
//syntaxnet:lexicon_builder_test NO STATUS
//syntaxnet:parser_features_test NO STATUS
//syntaxnet:parser_trainer_test NO STATUS
//syntaxnet:reader_ops_test NO STATUS
//syntaxnet:sentence_features_test NO STATUS
//syntaxnet:shared_store_test NO STATUS
//syntaxnet:tagger_transitions_test NO STATUS
//syntaxnet:text_formats_test NO STATUS
//util/utf8:unicodetext_unittest NO STATUS
Executed 0 out of 12 tests: 12 were skipped.
if you need more information about my system/install, let me know!
edit: downgrading from GCC 6.1 to 4.9 seems to have done something. Bazel is testing now and it's been running for about 2 hours. Is that normal? Would enabling the GPU speed that up in the future?
I am interested in training Inception from scratch. I managed to build the ImageNet data as instructed. However, the following command:
bazel-bin/inception/imagenet_train.py --num_gpus=1 --batch_size=32 --train_dir=/tmp/imagenet_train --data_dir=/tmp/imagenet_data
Results in this stack trace:
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/image_ops.py:586: FutureWarning: comparison to None
will result in an elementwise object comparison in the future.
if width == new_width_const and height == new_height_const:
Traceback (most recent call last):
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/imagenet_train.py", line 41, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/imagenet_train.py", line 37, in main
inception_train.train(dataset)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/inception_train.py", line 235, in train
loss = _tower_loss(images, labels, num_classes, scope)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/inception_train.py", line 110, in _tower_loss
scope=scope)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/inception_model.py", line 90, in inference
scope=scope)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/inception_model.py", line 88, in inception_v3
scope='conv0')
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/scopes.py", line 129, in func_with_args
return func(*args, **current_args)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/ops.py", line 184, in conv2d
restore=restore)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/scopes.py", line 129, in func_with_args
return func(*args, **current_args)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/variables.py", line 224, in variable
trainable=trainable, collections=collections)
TypeError: get_variable() got an unexpected keyword argument 'regularizer'
Any idea what's happening?
I have only one GPU.
Does anyone know how to suspend the training binary while running the evaluation on the same GPU?
While trying to train my own data, by running (as in the README for training own flower data);
bazel-bin/inception/flowers_train --train_dir="${TRAIN_DIR}" --data_dir="${FLOWERS_DATA_DIR}" --pretrained_model_checkpoint_path="${MODEL_PATH}" --fine_tune=True --initial_learning_rate=0.001 --input_queue_memory_factor=1
I get an error:
tensorflow.python.framework.errors.DataLossError: Unable to open table file /Users/asd/workspace/inception-v3-model/inception-v3/model.ckpt-157585: Data loss: corrupted compressed block contents: perhaps your file is in a different file format and you need to use a different restore operator?
Is the model checkpoint really corrupt?
Hi,
First, thanks for open sourcing SyntaxNet. I've been looking forward to playing with it since your 2015 paper. I'm hoping to convince you to run a small experiment :).
I think it's a shame that parsers are always evaluated using gold-standard pre-processing. Dridan et al (2013) showed that error propagation from the tokenization and SBD can have a significant impact on accuracy.[1] More to the point, I speculate that some algorithms are going to be more or less sensitive to this. If we only ever evaluate on gold-standard pre-processing, we never get to measure this sensitivity.
One of the neat things about transition-based parsers is that they can do joint segmentation and parsing. You can feed in a whole document at once --- you don't need to pre-segment. I found this helped a lot when I was doing speech parsing, and I carried the trick over to my parser spaCy.
I never wrote this up, but I found that doing joint segmentation and parsing improved accuracy on the full task (i.e, from raw text) by ~1% (going from memory here...Hope I'm not wrong). I think the improvement is mostly due to seeing segmentation errors during training.
You can't easily do this with a chart or graph-based parser, and document parsing decreases the effectiveness of a beam. On balance, this produces a slight advantage for greedy models. So, I think the algorithmic implications are quite interesting. It's not just a case of finding out that everyone's score decreases by some fixed amount, depending on the pre-processing accuracy.
I have OntoNotes 5 and the EWTB data processed in .json files, in the format below. If I gave you the data, would you be interested in running some whole-document evaluations? I'm reluctant to dive into running them myself, because I think it will probably involve retraining SyntaxNet, for fair comparison.
Thanks,
Matthew Honnibal
spacy.io
[
  {
    "id": "dev_09_c2e_0001",
    "paragraphs": [
      {
        "raw": "...",
        "sentences": [
          {
            "tokens": [
              {
                "head": 1,
                "dep": "compound",
                "tag": "NNP",
                "orth": "U.",
                "ner": "-",
                "id": 0
              }
            ]
          }
        ]
      }
    ]
  }
]
What I'm hoping is that I can convince you to run this experiment on SyntaxNet for me :).
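To make the format concrete, here is how a consumer might read one token out of a document in this schema (pure stdlib; the field names follow the sample above):

```python
import json

# A single-document sample in the proposed schema.
SAMPLE = """
[
  {
    "id": "dev_09_c2e_0001",
    "paragraphs": [
      {
        "raw": "...",
        "sentences": [
          {"tokens": [
            {"head": 1, "dep": "compound", "tag": "NNP",
             "orth": "U.", "ner": "-", "id": 0}
          ]}
        ]
      }
    ]
  }
]
"""

docs = json.loads(SAMPLE)
first_token = docs[0]["paragraphs"][0]["sentences"][0]["tokens"][0]
print(first_token["orth"], first_token["tag"], first_token["head"])
```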