tensorflow / models
Models and examples built with TensorFlow
License: Other
The code (inception_train.py) says:
# With 8 Tesla K40's and a batch size = 256, the following setup achieves
# precision@1 = 73.5% after 100 hours and 100K steps (20 epochs)
I'm hoping to reproduce Inception-v3 with precision similar to the original paper.
Is 73.5% the best precision this code can achieve?
If not, what does it take to reach higher precision? Just more iterations?
If so, why not release code that actually matches the paper?
Thanks
I fixed the scaling issue in VariationalAutoencoderRunner.py (see #23). However, running the default example causes the following:
Epoch: 0001 cost= 1114.439753835
Epoch: 0002 cost= 662.529461080
Epoch: 0003 cost= 594.752329830
Epoch: 0004 cost= 569.599913920
Epoch: 0005 cost= 556.361018750
Epoch: 0006 cost= 545.052694460
Epoch: 0007 cost= 537.334268253
Epoch: 0008 cost= 530.251896875
Epoch: 0009 cost= 523.817275994
Epoch: 0010 cost= 519.874919247
Epoch: 0011 cost= 514.975155966
Epoch: 0012 cost= 510.715168395
Epoch: 0013 cost= 506.326094318
Epoch: 0014 cost= 502.172605824
Epoch: 0015 cost= 498.612383310
Epoch: 0016 cost= 495.592024787
Epoch: 0017 cost= 493.580289986
Epoch: 0018 cost= 490.370449006
Epoch: 0019 cost= 489.957028977
Epoch: 0020 cost= 486.818214844
W tensorflow/core/common_runtime/executor.cc:1102] 0x27f47b0 Compute status: Invalid argument: Incompatible shapes: [10000,200] vs. [128,200]
[[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Sqrt, random_normal)]]
W tensorflow/core/common_runtime/executor.cc:1102] 0x542b0b0 Compute status: Invalid argument: Incompatible shapes: [10000,200] vs. [128,200]
[[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Sqrt, random_normal)]]
[[Node: range_1/_29 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_226_range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
W tensorflow/core/common_runtime/executor.cc:1102] 0x542b0b0 Compute status: Invalid argument: Incompatible shapes: [10000,200] vs. [128,200]
[[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Sqrt, random_normal)]]
[[Node: add_1/_27 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_225_add_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
File "VariationalAutoencoderRunner.py", line 53, in <module>
print "Total cost: " + str(autoencoder.calc_total_cost(X_test))
It seems that fixing gaussian_sample_size causes an error every time we evaluate a batch of data where gaussian_sample_size != batch_size.
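For reference, the mismatch comes from drawing the Gaussian noise with a hard-coded batch dimension. A sketch of the fix in plain NumPy (the function name here is hypothetical; the real model builds the same expression with TF ops) ties the noise shape to the incoming batch instead:

```python
import numpy as np

def reparameterize(mu, log_sigma_sq, rng=np.random):
    # Draw noise with the same leading dimension as the incoming batch,
    # rather than a fixed batch_size / gaussian_sample_size -- this is
    # what avoids the [10000,200] vs [128,200] mismatch at eval time.
    eps = rng.standard_normal(mu.shape)
    return mu + np.sqrt(np.exp(log_sigma_sq)) * eps

train_z = reparameterize(np.zeros((128, 200)), np.zeros((128, 200)))     # training batch
test_z = reparameterize(np.zeros((10000, 200)), np.zeros((10000, 200)))  # full test set
```

In the TF graph the same idea means sizing the noise from the input tensor's dynamic shape rather than from a constant.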
I could not run download_and_preprocess_flowers.sh; it failed on the last line:
(portenv)dhcp-ccc-3940:DashCamAnalytics aub3$ ./models/inception/data/download_and_preprocess_flowers.sh ~/temp/flowers2/
Skipping download of flower data.
./models/inception/data/download_and_preprocess_flowers.sh: line 93: ./models/inception/data/download_and_preprocess_flowers.sh.runfiles/inception/build_image_data: No such file or directory
After I manually echoed the command and ran it, it worked fine.
# Build the TFRecords version of the image data.
cd "${CURRENT_DIR}"
BUILD_SCRIPT="${WORK_DIR}/build_image_data"
OUTPUT_DIRECTORY="${DATA_DIR}"
echo "${BUILD_SCRIPT}"
echo "${CURRENT_DIR}"
echo "python build_image_data.py --train_directory=${TRAIN_DIRECTORY} --validation_directory=${VALIDATION_DIRECTORY} --output_directory=${OUTPUT_DIRECTORY} --labels_file=${LABELS_FILE}"
I'm getting the following error with the latest TensorFlow (head version):
Traceback (most recent call last):
File "/home/gabe/repos/tf-models/inception/bazel-bin/inception/my_train.runfiles/inception/my_train.py", line 41, in <module>
tf.app.run()
File "/home/gabe/anaconda2/envs/tf-inception/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/gabe/repos/tf-models/inception/bazel-bin/inception/my_train.runfiles/inception/my_train.py", line 37, in main
inception_train.train(dataset)
File "/home/gabe/repos/tf-models/inception/bazel-bin/inception/my_train.runfiles/inception/inception_train.py", line 269, in train
if grad:
File "/home/gabe/anaconda2/envs/tf-inception/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 475, in __nonzero__
raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed. "
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use the logical TensorFlow ops to test the value of a tensor.
https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py#L269
Changing it to if grad is not None solved the issue.
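The reason `if grad:` blows up is that TF tensors deliberately refuse boolean coercion. A minimal pure-Python illustration of the pattern (a stand-in class, not TF itself):

```python
class FakeTensor:
    """Mimics tf.Tensor's refusal to be used as a Python bool."""
    def __bool__(self):          # Python 3
        raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed.")
    __nonzero__ = __bool__       # Python 2 spelling of the same hook

grads = [FakeTensor(), None]

# `if g:` would raise TypeError on the first element;
# `g is not None` only checks identity, so it is always safe.
defined = [g for g in grads if g is not None]
```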
I want to use multiple GPUs, but when I use tf.device to specify which GPU to use, I get an error.
with tf.Graph().as_default(), tf.device('/cpu:0'):
    opt, lr_op, global_step = optimizer(INITIAL_LEARNING_RATE, tf.get_variable('global_step', [], initializer=tf.constant_initializer(0), trainable=False))
    classify_opt, classify_lr_op, classify_global_step = optimizer(CLASSIFY_INITIAL_LEARNING_RATE, tf.get_variable('classify_global_step', [], initializer=tf.constant_initializer(0), trainable=False))
    tower_grads = []
    classify_tower_grads = []
    gpu_num = len(os.environ.get('CUDA_VISIBLE_DEVICES', '').split(','))
    if gpu_num == 0:
        raise RuntimeError
    for i in range(gpu_num):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('gpu_%d' % i) as scope:
                loss_op, classify_loss_op = tower_loss(scope)
                tf.get_variable_scope().reuse_variables()
                grads = opt.compute_gradients(loss_op)
                classify_grads = classify_opt.compute_gradients(classify_loss_op)
                tower_grads.append(grads)
                classify_tower_grads.append(classify_grads)
    classify_grads = average_gradients(classify_tower_grads)
    grads = average_gradients(tower_grads)
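The `average_gradients` helper used above (borrowed from the multi-GPU examples) reduces the per-tower lists of `(grad, var)` pairs into one averaged list. A simplified sketch, using plain floats in place of gradient tensors:

```python
def average_gradients(tower_grads):
    """tower_grads: one list of (grad, var) pairs per tower.
    Returns a single list of (averaged_grad, var) pairs.
    Simplified sketch: grads are floats here, tensors in the real code."""
    average_grads = []
    for grad_and_vars in zip(*tower_grads):   # group the same variable across towers
        grads = [g for g, _ in grad_and_vars]
        var = grad_and_vars[0][1]             # the variable is shared between towers
        average_grads.append((sum(grads) / len(grads), var))
    return average_grads

towers = [[(1.0, 'w'), (2.0, 'b')],
          [(3.0, 'w'), (4.0, 'b')]]
# average_gradients(towers) -> [(2.0, 'w'), (3.0, 'b')]
```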
Traceback (most recent call last):
File "cnn_inc_v4.py", line 404, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "cnn_inc_v4.py", line 401, in main
train()
File "cnn_inc_v4.py", line 377, in train
_,loss_value,lr_value=sess.run([train_op,loss_op,lr_op])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 564, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 637, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 659, in _do_call
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'gpu_0/LearnedUnigramCandidateSampler': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available
[[Node: gpu_0/LearnedUnigramCandidateSampler = LearnedUnigramCandidateSampler[num_sampled=100, num_true=1, range_max=195327, seed=0, seed2=0, unique=true, _device="/device:GPU:0"](gpu_0/ToInt64)]]
Caused by op u'gpu_0/LearnedUnigramCandidateSampler', defined at:
File "cnn_inc_v4.py", line 404, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "cnn_inc_v4.py", line 401, in main
train()
File "cnn_inc_v4.py", line 340, in train
loss_op,classify_loss_op=tower_loss(scope)
File "cnn_inc_v4.py", line 282, in tower_loss
loss_model(logits,yy_node,y_embeddings)
File "cnn_inc_v4.py", line 204, in loss_model
neg_ids, _, _ = tf.nn.learned_unigram_candidate_sampler(true_classes=tf.to_int64(tf.reshape(yy,[BATCH_SIZE,1])),num_true=1,num_sampled=NEG_NUM,unique=True,range_max=NUM_LABELS)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/candidate_sampling_ops.py", line 192, in learned_unigram_candidate_sampler
seed2=seed2, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_candidate_sampling_ops.py", line 251, in _learned_unigram_candidate_sampler
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
I'm having little luck getting the flowers sample to train. The most steps I've achieved is 1400, with a batch size of 40 and a learning rate of .00005. I have tried many combinations of these two parameters with little luck. I'm curious if you might have suggestions; my machine is significantly more modest than the machine referenced in the source.
I'm running on an Ubuntu 14.04 based machine with two GTX 980 Ti GPUs and an i7-5930K processor with 64GB of RAM.
the flags are:
fine_tune=True
batch_size=40
num_gpus=2
input_queue_memory_factor=1
Sincerely,
Bob
INFO: Found 65 targets and 12 test targets...
ERROR: /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/external/png_archive/BUILD:23:1: Executing genrule @png_archive//:configure failed: namespace-sandbox failed: error executing command /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet/_bin/namespace-sandbox ... (remaining 5 argument(s) skipped).
/home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet/external/png_archive/libpng-1.2.53 /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet
/tmp/tmp.59Q48wThIm /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet/external/png_archive/libpng-1.2.53 /home/danielkimo/.cache/bazel/_bazel_danielkimo/7769f3941595a75f80309df2fbce8755/syntaxnet
I do not know why.
The SyntaxNet README tells users to check their protobuf version with pip freeze | grep protobuf1.
That should probably be pip freeze | grep protobuf (without the 1 at the end).
I am trying to create a new inception model using:
bazel-bin/inception/flowers_train --train_dir="${TRAIN_DIR}" --data_dir="${FLOWERS_DATA_DIR}" --fine_tune=True --initial_learning_rate=0.001 --input_queue_memory_factor=1
and getting this error:
W tensorflow/core/framework/op_kernel.cc:896] Invalid argument: indices[0] = [0,148] is out of bounds: need 0 <= index < [32,6]
I have run build_image_data to create FLOWERS_DATA_DIR.
Can I create a new inception model this way?
In https://github.com/tensorflow/models/blob/master/inception/inception/slim/ops.py#L77
axis = range(len(inputs_shape) - 1)
This won't create a list in Python 3.
It works after changing to:
axis = list(range(len(inputs_shape) - 1))
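The underlying difference: in Python 3, `range()` returns a lazy range object rather than a list, so code that later mutates or concatenates `axis` breaks; wrapping it in `list()` restores the Python 2 behavior:

```python
inputs_shape = (32, 224, 224, 64)   # e.g. NHWC activations

# Python 2's range() already returned a list; in Python 3 the explicit
# list() is needed before the code can append to or concatenate axis.
axis = list(range(len(inputs_shape) - 1))
print(axis)   # [0, 1, 2]
```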
The code here shows how to set up each replica with a single tower that uses one GPU. I'm wondering if there is a way to change this code a little to make use of multiple GPUs on one machine, as in that example.
The way I currently use all the GPUs on a worker machine is to start as many workers as there are GPUs; the workers then communicate with each other as if they were not on one machine. That is slower than it would be if I could start one worker that controls more than one GPU.
After building syntaxnet and running demo.sh, I got:
macbookproloreto:syntaxnet admin$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
Traceback (most recent call last):
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/parser_eval.py", line 28, in <module>
from syntaxnet import graph_builder
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/graph_builder.py", line 20, in <module>
import syntaxnet.load_parser_ops
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/load_parser_ops.py", line 21, in <module>
tf.load_op_library(
AttributeError: 'module' object has no attribute 'load_op_library'
Traceback (most recent call last):
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/conll2tree.runfiles/syntaxnet/conll2tree.py", line 22, in <module>
import syntaxnet.load_parser_ops
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/conll2tree.runfiles/syntaxnet/load_parser_ops.py", line 21, in <module>
tf.load_op_library(
AttributeError: 'module' object has no attribute 'load_op_library'
Traceback (most recent call last):
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/parser_eval.py", line 28, in <module>
from syntaxnet import graph_builder
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/graph_builder.py", line 20, in <module>
import syntaxnet.load_parser_ops
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/bazel-bin/syntaxnet/parser_eval.runfiles/syntaxnet/load_parser_ops.py", line 21, in <module>
tf.load_op_library(
AttributeError: 'module' object has no attribute 'load_op_library'
Some tests failed during the build
FAIL: //syntaxnet:lexicon_builder_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/lexicon_builder_test/test.log).
FAIL: //syntaxnet:graph_builder_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/graph_builder_test/test.log).
FAIL: //syntaxnet:reader_ops_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/reader_ops_test/test.log).
FAIL: //syntaxnet:beam_reader_ops_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/beam_reader_ops_test/test.log).
FAIL: //syntaxnet:text_formats_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/text_formats_test/test.log).
FAIL: //syntaxnet:parser_trainer_test (see /private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log).
INFO: Elapsed time: 2115,987s, Critical Path: 1370,47s
//syntaxnet:arc_standard_transitions_test PASSED in 0,3s
//syntaxnet:parser_features_test PASSED in 0,2s
//syntaxnet:sentence_features_test PASSED in 0,4s
//syntaxnet:shared_store_test PASSED in 5,2s
//syntaxnet:tagger_transitions_test PASSED in 0,4s
//util/utf8:unicodetext_unittest PASSED in 0,1s
//syntaxnet:beam_reader_ops_test FAILED in 1,1s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/beam_reader_ops_test/test.log
//syntaxnet:graph_builder_test FAILED in 3,1s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/graph_builder_test/test.log
//syntaxnet:lexicon_builder_test FAILED in 3,1s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/lexicon_builder_test/test.log
//syntaxnet:parser_trainer_test FAILED in 1,2s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log
//syntaxnet:reader_ops_test FAILED in 3,0s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/reader_ops_test/test.log
//syntaxnet:text_formats_test FAILED in 1,2s
/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/text_formats_test/test.log
My bazel version is
bazel version
Build label: 0.2.2b
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Apr 25 08:11:19 2016 (1461571879)
Build timestamp: 1461571879
Build timestamp as int: 1461571879
When trying to build tests, I get the following error.
bazel test syntaxnet/... util/utf8/...
ERROR: /Users/Mayank/nonjunk/tensorflow/models/syntaxnet/WORKSPACE:11:1: Traceback (most recent call last):
File "/Users/Mayank/nonjunk/tensorflow/models/syntaxnet/WORKSPACE", line 11
check_version("0.2.0")
File "/private/var/tmp/_bazel_Mayank/62ffc1b5b4632e1ac9339b416e2ec109/external/tf/tensorflow/tensorflow.bzl", line 22, in check_version
_parse_bazel_version(native.bazel_version)
File "/private/var/tmp/_bazel_Mayank/62ffc1b5b4632e1ac9339b416e2ec109/external/tf/tensorflow/tensorflow.bzl", line 15, in _parse_bazel_version
int(number)
invalid literal for int(): "2b".
ERROR: Error evaluating WORKSPACE file.
ERROR: package contains errors: util/utf8.
ERROR: no such package 'external': Package 'external' contains errors.
INFO: Elapsed time: 0.129s
ERROR: Couldn't start the build. Unable to run tests.
I had to disable version checking in tensorflow/models/syntaxnet/WORKSPACE to get this working:
load("@tf//tensorflow:tensorflow.bzl", "check_version")
#check_version("0.2.0")
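The underlying failure is `int("2b")` on the patch component of version `0.2.2b`. A sketch of a more tolerant parser (in Python rather than the Starlark of tensorflow.bzl, and purely illustrative) would drop any non-digit suffix before converting:

```python
import re

def parse_bazel_version(version):
    """Parse '0.2.2b' -> (0, 2, 2) by dropping any non-digit suffix.
    Sketch only; the real check lives in tensorflow.bzl (Starlark)."""
    parts = []
    for number in version.split('.'):
        digits = re.match(r'\d*', number).group()   # leading digits, possibly empty
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

assert parse_bazel_version("0.2.2b") == (0, 2, 2)
```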
I'm trying to get predictions out of my fine-tuned model using a script similar to inception_eval.py.
I don't have true labels for the data, so I want to save both the input filenames and the logit predictions.
I modified image_processing so it returns filenames too, and I can use it like this:
images, labels, filenames = image_processing.inputs(dataset)
The problem is when I add this to the while-loop:
while step < num_iter and not coord.should_stop():
    paths, targets, preds = sess.run([filenames, labels, logits])
    for path in paths:
        print(path)
I get about 0.1% duplicated values (and thereby 0.1% missing). It seems like a thread-safety issue somewhere in the modules, or I'm missing something.
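A quick way to quantify the duplication, assuming the printed paths have been collected into a list (the sample paths below are hypothetical):

```python
from collections import Counter

def duplication_rate(paths):
    """Fraction of fetched paths that are repeats of an earlier fetch."""
    counts = Counter(paths)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / float(len(paths))

paths = ['a.jpg', 'b.jpg', 'c.jpg', 'a.jpg']   # hypothetical fetch results
# one of four fetches is a repeat -> duplication_rate(paths) == 0.25
```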
When solving the issue here, I used the same start script to start my two machines. Then this error occurred:
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 2.2K (2304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 2.2K (2304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 2.2K (2304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
...
I found that the error occurs because the framework assigns all the memory on the first GPU, /gpu:0. I finally solved the problem with this commit.
According to the TensorFlow source code (gpu_device.cc, line 553), the framework creates all the locally available GPU devices for each worker. So all workers allocate memory on the first GPU, causing OUT_OF_MEMORY (however, I just want to assign 1 GPU to 1 worker).
So I'm wondering whether inception_distributed_train.py calls the API in the correct way, given the goal that
multiple worker tasks can run on the same machine with multiple GPUs, so machine_A with 2 GPUs may have 2 workers while machine_B with 1 GPU just has 1 worker
(as described in the documentation).
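One workaround (a sketch under my assumptions, not the project's documented approach) is to pin each worker process to a single device via CUDA_VISIBLE_DEVICES before it starts, so TensorFlow never sees, and never grabs memory on, the other GPUs. A hypothetical launcher for a 2-GPU machine:

```shell
#!/bin/sh
# Hypothetical launcher: one worker per GPU on a 2-GPU machine.
# Each worker would be started with only its own GPU visible, so
# inside every process the single visible device is /gpu:0.
for TASK_ID in 0 1; do
  CUDA_VISIBLE_DEVICES=$TASK_ID \
    echo "worker $TASK_ID sees GPU $TASK_ID only"
done
```

In a real launcher the `echo` line would be the `imagenet_distributed_train` invocation with the matching `--task_id`.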
After successfully downloading and generating the training and validation splits (I had to install coreutils and replace shuf with gshuf on OS X), I encountered the following error immediately:
(portenv)dhcp-ccc-3940:inception aub3$ bazel-bin/inception/flowers_train --train_dir="${TRAIN_DIR}" --data_dir="${FLOWERS_DATA_DIR}" --pretrained_model_checkpoint_path="${MODEL_PATH}" --fine_tune=True --initial_learning_rate=0.001 --input_queue_memory_factor=1
/Users/aub3/portenv/lib/python2.7/site-packages/tensorflow/python/ops/image_ops.py:586: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
if width == new_width_const and height == new_height_const:
Traceback (most recent call last):
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/flowers_train.py", line 41, in <module>
tf.app.run()
File "/Users/aub3/portenv/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/flowers_train.py", line 37, in main
inception_train.train(dataset)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/inception_train.py", line 235, in train
loss = _tower_loss(images, labels, num_classes, scope)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/inception_train.py", line 110, in _tower_loss
scope=scope)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/inception_model.py", line 90, in inference
scope=scope)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/inception_model.py", line 88, in inception_v3
scope='conv0')
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/scopes.py", line 129, in func_with_args
return func(*args, **current_args)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/ops.py", line 184, in conv2d
restore=restore)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/scopes.py", line 129, in func_with_args
return func(*args, **current_args)
File "/Users/aub3/models/inception/bazel-bin/inception/flowers_train.runfiles/inception/slim/variables.py", line 224, in variable
trainable=trainable, collections=collections)
TypeError: get_variable() got an unexpected keyword argument 'regularizer'
Working on download_and_preprocess for ImageNet and running the following:
However, I'm wondering if it should be bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}" in lieu of bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}$".
No test was done.
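The stray `$` does matter: inside double quotes, a `$` that is not followed by a valid variable name stays literal, so the two invocations pass different paths (the directory below is just an example):

```shell
#!/bin/sh
DATA_DIR=/tmp/imagenet-data
echo "${DATA_DIR}$"    # -> /tmp/imagenet-data$  (trailing literal $)
echo "${DATA_DIR}"     # -> /tmp/imagenet-data
```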
I just followed these commands:
git clone --recursive https://github.com/tensorflow/models.git (ok)
cd models/syntaxnet/tensorflow
./configure (pressed n so TensorFlow does not use the GPU)
bazel test syntaxnet/... util/utf8/... (nothing happened)
Why?
Thanks
I get this error when running bazel test.
macbookproloreto:syntaxnet admin$ bazel test --linkopt=-headerpad_max_install_names syntaxnet/... util/utf8/...
ERROR: /Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/WORKSPACE:11:1: Traceback (most recent call last):
File "/Volumes/MacHDD2/Developmemt/AI/models/syntaxnet/WORKSPACE", line 11
check_version("0.2.0")
File "/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/external/tf/tensorflow/tensorflow.bzl", line 22, in check_version
_parse_bazel_version(native.bazel_version)
File "/private/var/tmp/_bazel_admin/7de5e8ad26205d6f3b147001f8f93014/external/tf/tensorflow/tensorflow.bzl", line 15, in _parse_bazel_version
int(number)
invalid literal for int(): "2b".
ERROR: Error evaluating WORKSPACE file.
ERROR: package contains errors: syntaxnet.
ERROR: no such package 'external': Package 'external' contains errors.
INFO: Elapsed time: 0,179s
ERROR: Couldn't start the build. Unable to run tests.
Will there be support for this technology on Android at some point in the future?
Is this repository structure really sustainable for publishing such complex subprojects? Wouldn't a structure like a separate tensorflow-models organization with one repository per model be more suitable?
Things will get overly complex, and issues, pull requests (and maybe releases?) will get mixed up and have to be re-separated manually.
I followed the instructions to install SyntaxNet on VM with Lubuntu 64bit. This is what I am getting when I try it out:
ilya@ilya-VirtualBox:~/dev/models/syntaxnet$ bazel test syntaxnet/... util/utf8/...
/home/ilya/bin/bazel: line 86: /home/ilya/.bazel/bin/bazel-real: cannot execute binary file: Exec format error
/home/ilya/bin/bazel: line 86: /home/ilya/.bazel/bin/bazel-real: Success
ilya@ilya-VirtualBox:~/dev/models/syntaxnet$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
syntaxnet/demo.sh: line 31: bazel-bin/syntaxnet/parser_eval: No such file or directory
syntaxnet/demo.sh: line 43: bazel-bin/syntaxnet/parser_eval: No such file or directory
syntaxnet/demo.sh: line 55: bazel-bin/syntaxnet/conll2tree: No such file or directory
There is a missing validation: when the [data dir] ends with /, some concatenations fail during the process.
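A small guard at the top of the script could normalize the argument before any concatenation (a sketch, not the script's actual code; the example path is hypothetical):

```shell
#!/bin/sh
# Sketch: normalize the [data dir] argument before concatenating.
DATA_DIR="/tmp/flowers/"        # stands in for "$1"
DATA_DIR="${DATA_DIR%/}"        # strip one trailing slash, if present
echo "${DATA_DIR}/train-00000-of-01024"
# -> /tmp/flowers/train-00000-of-01024  (no double slash)
```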
shuf is named gshuf when installed through Homebrew.
On Ubuntu 16.04 LTS I'm getting the following error when executing
bazel test syntaxnet/... util/utf8/...
/usr/local/bin/bazel: line 86: /usr/local/lib/bazel/bin/bazel-real: cannot execute binary file: Exec format error
/usr/local/bin/bazel: line 86: /usr/local/lib/bazel/bin/bazel-real: Success
Does this mean that Bazel has to be built from the source files?
I just built a copy of the most recent version of Bazel, and it threw the following error when I ran the build:
WARNING: /home/zv/.cache/bazel/_bazel_zv/d49f4fd45b21dc0a5977bc0da866df7a/external/tf/WORKSPACE:1: Workspace name in /home/zv/.cache/bazel/_bazel_zv/d49f4fd45b21dc0a5977bc0da866df7a/external/tf/WORKSPACE (@__main__) does not match the name given in the repository's definition (@tf); this will cause a build error in future versions.
ERROR: /home/zv/Development/tflow/models/syntaxnet/WORKSPACE:11:1: Traceback (most recent call last):
File "/home/zv/Development/tflow/models/syntaxnet/WORKSPACE", line 11
check_version("0.2.0")
File "/home/zv/.cache/bazel/_bazel_zv/d49f4fd45b21dc0a5977bc0da866df7a/external/tf/tensorflow/tensorflow.bzl", line 22, in check_version
_parse_bazel_version(native.bazel_version)
File "/home/zv/.cache/bazel/_bazel_zv/d49f4fd45b21dc0a5977bc0da866df7a/external/tf/tensorflow/tensorflow.bzl", line 15, in _parse_bazel_version
int(number)
invalid literal for int(): "2b".
ERROR: Error evaluating WORKSPACE file.
ERROR: Loading of target '@bazel_tools//tools/cpp:toolchain' failed; build aborted: error loading package 'external': Package 'external' contains errors.
ERROR: Loading failed; build aborted.
INFO: Elapsed time: 0.098s
ERROR: Couldn't start the build. Unable to run tests.
This problem can be addressed by removing the version check on line 11 of syntaxnet/WORKSPACE or by converting it to Bazel's check syntax.
I have installed Bazel 0.2.2, but when I run bazel test syntaxnet/... util/utf8/..., 6 tests fail. How can I fix this? Thanks.
cc1plus: warning: unrecognized command line option "-Wno-self-assign" [enabled by default]
FAIL: //syntaxnet:reader_ops_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/reader_ops_test/test.log).
FAIL: //syntaxnet:beam_reader_ops_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/beam_reader_ops_test/test.log).
FAIL: //syntaxnet:graph_builder_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/graph_builder_test/test.log).
FAIL: //syntaxnet:lexicon_builder_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/lexicon_builder_test/test.log).
FAIL: //syntaxnet:text_formats_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/text_formats_test/test.log).
FAIL: //syntaxnet:parser_trainer_test (see /home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log).
INFO: Elapsed time: 341.488s, Critical Path: 294.82s
//syntaxnet:arc_standard_transitions_test (cached) PASSED in 0.0s
//syntaxnet:parser_features_test (cached) PASSED in 0.1s
//syntaxnet:sentence_features_test (cached) PASSED in 0.1s
//syntaxnet:shared_store_test (cached) PASSED in 0.4s
//syntaxnet:tagger_transitions_test (cached) PASSED in 0.1s
//util/utf8:unicodetext_unittest (cached) PASSED in 0.0s
//syntaxnet:beam_reader_ops_test FAILED in 0.4s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/beam_reader_ops_test/test.log
//syntaxnet:graph_builder_test FAILED in 0.4s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/graph_builder_test/test.log
//syntaxnet:lexicon_builder_test FAILED in 0.5s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/lexicon_builder_test/test.log
//syntaxnet:parser_trainer_test FAILED in 0.6s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/parser_trainer_test/test.log
//syntaxnet:reader_ops_test FAILED in 0.4s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/reader_ops_test/test.log
//syntaxnet:text_formats_test FAILED in 0.5s
/home/shaoshu/.cache/bazel/_bazel_shaoshu/54e2b7109927b4e6e7831a50befed0b7/syntaxnet/bazel-out/local-opt/testlogs/syntaxnet/text_formats_test/test.log
Executed 6 out of 12 tests: 6 tests pass and 6 fail locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
After installing the correct dependencies (protobuf, swig, bazel, asciitree), I tried running bazel test syntaxnet/... util/utf8/..., but after a lot of notes and warnings the final output was this:
ERROR: /home/lerax/.cache/bazel/_bazel_lerax/cae2b1799296ff1923cddf0854a31846/external/tf/tensorflow/core/kernels/BUILD:856:1: C++ compilation of rule '@tf//tensorflow/core/kernels:argmax_op' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG ... (remaining 72 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.8/README.Bugs> for instructions.
INFO: Elapsed time: 552.118s, Critical Path: 428.91s
//syntaxnet:arc_standard_transitions_test NO STATUS
//syntaxnet:beam_reader_ops_test NO STATUS
//syntaxnet:graph_builder_test NO STATUS
//syntaxnet:lexicon_builder_test NO STATUS
//syntaxnet:parser_features_test NO STATUS
//syntaxnet:parser_trainer_test NO STATUS
//syntaxnet:reader_ops_test NO STATUS
//syntaxnet:sentence_features_test NO STATUS
//syntaxnet:shared_store_test NO STATUS
//syntaxnet:tagger_transitions_test NO STATUS
//syntaxnet:text_formats_test NO STATUS
//util/utf8:unicodetext_unittest NO STATUS
Executed 0 out of 12 tests: 12 were skipped.
Any ideas?
feature | version
---|---
gcc | 4.8.4
python | 3.4.3
bazel | 0.22
swig | 2.0.11
protobuf | 3.0.0b2
asciitree | 0.3.1
OS | GNU/Linux, Distro Ubuntu 14.0
Hi there, thank you for releasing the slim library; it is elegant, and I have found it extremely useful in my research. I just noticed from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell.py#L709 that a slim RNN has been used. Would it be possible to release the slim RNN? I hope I am not asking too much...
Hi, thank you for releasing the v3 training model! I am wondering: are there any plans to release the v4 model?
Thanks,
Upon running the following command, compilation fails. Any ideas? Here is my output:
bazel test syntaxnet/... util/utf8/...
/home/vertical-3/.cache/bazel/_bazel_vertical-3/0e0937a190400fc5db42694f75128f7d/external/tf/google/protobuf/BUILD:534:1: C++ compilation of rule '@tf//google/protobuf:pyext/_message.so' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG ... (remaining 49 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/tf/google/protobuf/python/google/protobuf/pyext/repeated_composite_container.cc:34:0:
external/tf/google/protobuf/python/google/protobuf/pyext/repeated_composite_container.h:37:20: fatal error: Python.h: No such file or directory
#include <Python.h>
^
compilation terminated.
INFO: Elapsed time: 14.267s, Critical Path: 14.07s
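`fatal error: Python.h: No such file or directory` means the Python development headers are not installed. Assuming a Debian/Ubuntu system, installing them and rerunning the build usually resolves this:

```shell
# Python 2 headers; use python3-dev instead when building against Python 3.
sudo apt-get install python-dev
bazel test syntaxnet/... util/utf8/...
```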
In inception_train.py, model checkpoints were periodically saved in train_dir. Unfortunately, the next time I fine-tuned from a checkpoint, train_dir was deleted first. Then I found:
if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
However, I lost all of my checkpoints before I read the code.
I ran the distributed version of the example on two machines, each with 4 GPUs. If I start one worker on each, it's OK. However, when I run multiple workers per machine, I get "InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [2]".
Both remote servers run CentOS 7.
# machine A with 4 GPU
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--task_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--task_id=1 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--job_name='ps' \
--task_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
# machine B with 4 GPU
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--task_id=2 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--task_id=3 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.29:2221,10.10.102.29:2222'
Error log
INFO:tensorflow:Waiting for model to be ready: Attempting to use uninitialized value mixed_35x35x256a/branch3x3dbl/Conv/weights/ExponentialMovingAverage
[[Node: mixed_35x35x256a/branch3x3dbl/Conv/weights/ExponentialMovingAverage/read = Identity[T=DT_FLOAT, _class=["loc:@mixed_35x35x256a/branch3x3dbl/Conv/weights"], _device="/job:ps/replica:0/task:0/cpu:0"](mixed_35x35x256a/branch3x3dbl/Conv/weights/ExponentialMovingAverage)]]
Caused by op u'mixed_35x35x256a/branch3x3dbl/Conv/weights/ExponentialMovingAverage/read', defined at:
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 65, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 61, in main
inception_distributed_train.train(server.target, dataset, cluster_spec)
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/inception_distributed_train.py", line 220, in train
apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/sync_replicas_optimizer.py", line 427, in apply_gradients
self._variables_to_average)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 282, in apply
colocate_with_primary=True)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 86, in create_slot
return _create_slot_var(primary, val, scope)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 50, in _create_slot_var
slot = variables.Variable(val, name=scope, trainable=False)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 206, in __init__
dtype=dtype)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 275, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 609, in identity
return _op_def_lib.apply_op("Identity", input=input, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
Traceback (most recent call last):
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 65, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 61, in main
inception_distributed_train.train(server.target, dataset, cluster_spec)
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/inception_distributed_train.py", line 260, in train
sess = sv.prepare_or_wait_for_session(target, config=sess_config)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 674, in prepare_or_wait_for_session
config=config, init_feed_dict=self._init_feed_dict)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 158, in prepare_session
max_wait_secs=max_wait_secs, config=config)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 214, in recover_session
saver.restore(sess, ckpt.model_checkpoint_path)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1090, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 564, in _run
feed_dict_string, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 637, in _do_run
target_list, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 659, in _do_call
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [2]
[[Node: save/Assign_103 = Assign[T=DT_INT64, _class=["loc:@local_steps"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](local_steps, save/restore_slice_103)]]
[[Node: save/restore_all/NoOp_S4 = _Recv[client_terminated=false, recv_device="/job:worker/replica:0/task:0/gpu:0", send_device="/job:ps/replica:0/task:0/cpu:0", send_device_incarnation=1831303354831316628, tensor_name="edge_1174_save/restore_all/NoOp", tensor_type=DT_FLOAT, _device="/job:worker/replica:0/task:0/gpu:0"]()]]
Caused by op u'save/Assign_103', defined at:
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 65, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/imagenet_distributed_train.py", line 61, in main
inception_distributed_train.train(server.target, dataset, cluster_spec)
File "/home/models/inception/bazel-bin/inception/imagenet_distributed_train.runfiles/__main__/inception/inception_distributed_train.py", line 233, in train
saver = tf.train.Saver()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 832, in __init__
restore_sequentially=restore_sequentially)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 502, in build
filename_tensor, vars_to_save, restore_sequentially, reshape)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 268, in _AddRestoreOps
validate_shape=validate_shape))
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 40, in assign
use_locking=use_locking, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
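For what it's worth, `lhs shape= [4] rhs shape= [2]` on the `local_steps` variable is consistent with restoring a checkpoint written by a 2-worker run into a graph built for 4 workers (the sync-replicas optimizer keeps one slot per worker). This is a hedged guess, not a confirmed diagnosis: clearing the training directory (shown here as a hypothetical `${TRAIN_DIR}`) before changing the worker count may avoid the mismatch:

```shell
# WARNING: destroys previous checkpoints; back them up first if needed.
rm -rf "${TRAIN_DIR}"
```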
When running the VAE example VariationalAutoencoderRunner.py in models/autoencoder, I get the following output:
Epoch: 0001 cost= nan
Epoch: 0002 cost= nan
Epoch: 0003 cost= nan
Epoch: 0004 cost= nan
Epoch: 0005 cost= nan
Epoch: 0006 cost= nan
Epoch: 0007 cost= nan
Epoch: 0008 cost= nan
Epoch: 0009 cost= nan
Epoch: 0010 cost= nan
Epoch: 0011 cost= nan
Epoch: 0012 cost= nan
Epoch: 0013 cost= nan
Epoch: 0014 cost= nan
Epoch: 0015 cost= nan
Epoch: 0016 cost= nan
Epoch: 0017 cost= nan
Epoch: 0018 cost= nan
Epoch: 0019 cost= nan
Epoch: 0020 cost= nan
Machine: AWS EC2 GPU instance with image ami-77e0da1d
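A common source of `cost= nan` in VAE implementations is taking log(0) in the reconstruction term when the decoder output saturates at exactly 0 or 1. A pure-Python sketch of the usual fix, clamping probabilities with a small epsilon before the log (this illustrates the technique; it is not the actual code in VariationalAutoencoderRunner.py):

```python
import math

EPS = 1e-10  # keeps probabilities strictly inside (0, 1)

def bernoulli_recon_cost(x, p):
    """Pixel-wise cross-entropy between data x and decoder probabilities p."""
    total = 0.0
    for xi, pi in zip(x, p):
        pi = min(max(pi, EPS), 1.0 - EPS)  # avoid log(0) -> -inf -> nan
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total
```

Without the clamp, a decoder output of exactly 0.0 against a target of 1.0 yields -inf, and the running cost becomes nan from the first epoch onward, which matches the output above.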
Following the "Getting Started" part of "How to Fine-Tune a Pre-Trained Model on a New Task", I created a "flowers-data" directory in my home directory and entered the following in the terminal:
FLOWERS_DATA_DIR=$HOME/flowers-data
bazel build -c opt inception/download_and_preprocess_flowers
bazel-bin/inception/download_and_preprocess_flowers "${FLOWERS_DATA_DIR}$"
I want to ask: should "${FLOWERS_DATA_DIR}$" actually be "${FLOWERS_DATA_DIR}"?
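For reference, assuming the trailing `$` is indeed a typo in the README, the invocation without it would be:

```shell
bazel-bin/inception/download_and_preprocess_flowers "${FLOWERS_DATA_DIR}"
```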
No tests were run. I just followed these commands:
git clone --recursive https://github.com/tensorflow/models.git (ok)
cd models/syntaxnet/tensorflow
./configure (pressed n to building TensorFlow with GPU support)
bazel test syntaxnet/... util/utf8/... (nothing happened)
Why?
Thanks
Following the installation instructions on OS X 10.11.4. GCC is from Homebrew, version 5.3.0.
This happened:
$ bazel test --linkopt=-headerpad_max_install_names syntaxnet/... util/utf8/...
...........
INFO: Found 65 targets and 12 test targets...
ERROR: /Users/pdarragh/Programming/Projects/tensorflow-models/syntaxnet/third_party/utf/BUILD:3:1: C++ compilation of rule '//third_party/utf:utf' failed: osx_cc_wrapper.sh failed: error executing command external/local_config_cc/osx_cc_wrapper.sh -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 36 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
gcc: error: unrecognized command line option '-Wthread-safety'
gcc: error: unrecognized command line option '-Wself-assign'
INFO: Elapsed time: 111.199s, Critical Path: 0.37s
//syntaxnet:arc_standard_transitions_test NO STATUS
//syntaxnet:beam_reader_ops_test NO STATUS
//syntaxnet:graph_builder_test NO STATUS
//syntaxnet:lexicon_builder_test NO STATUS
//syntaxnet:parser_features_test NO STATUS
//syntaxnet:parser_trainer_test NO STATUS
//syntaxnet:reader_ops_test NO STATUS
//syntaxnet:sentence_features_test NO STATUS
//syntaxnet:shared_store_test NO STATUS
//syntaxnet:tagger_transitions_test NO STATUS
//syntaxnet:text_formats_test NO STATUS
//util/utf8:unicodetext_unittest NO STATUS
Executed 0 out of 12 tests: 12 were skipped.
I looked at syntaxnet/bazel-syntaxnet/external/local_config_cc/osx_cc_wrapper.sh after the failed install, since the error seems to indicate the problem happened there, but I can't see anything wrong with it. I tried changing the gcc call (line 56) to use OS X's gcc, but that had no effect on the error.
Any ideas?
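`-Wthread-safety` and `-Wself-assign` are Clang-only warnings, so the wrapper is apparently invoking Homebrew's real GCC, which rejects them. A possible workaround, offered as an assumption rather than a verified fix: point the toolchain back at Apple's clang and rebuild from scratch:

```shell
# Apple's clang accepts the Clang-only warning flags that GCC rejects.
export CC=/usr/bin/clang
bazel clean
bazel test --linkopt=-headerpad_max_install_names syntaxnet/... util/utf8/...
```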
In the losses section of the Readme.md of the Slim library for the Inception model, there seem to be many references to functions that are nowhere in the code. The losses.py file seems to be an older version and the Readme newer. losses.py has three functions (l1_loss, l2_loss, cross_entropy_loss), while the Readme mentions slim.losses.ClassificationLoss and slim.losses.SumOfSquaresLoss.
Is the code somewhere else on GitHub, or is this an error, referring to unpublished code?
Quick question: How can I freeze/save my fine-tune model? (To use it later on android)
I'm trying to save it at training checkpoints using graph_util.convert_variables_to_constants(); however, it is not clear what the name of the output classification layer in the retrained graph is.
https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py#L349
if step % 5000 == 0 or (step + 1) == FLAGS.max_steps:
output_graph_def = graph_util.convert_variables_to_constants(
sess, sess.graph.as_graph_def(), ????)
Thanks in advance.
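One way to find the right value for the missing `output_node_names` argument is to list the node names in the graph (e.g. via `[n.name for n in sess.graph.as_graph_def().node]`) and filter for a likely output op. A plain-Python sketch of that filter; the keyword list is a guess on my part, so inspect the surviving names manually before freezing:

```python
def likely_output_nodes(names, keywords=("softmax", "logits", "predictions")):
    """Return graph node names that look like classification outputs."""
    hits = []
    for name in names:
        lowered = name.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(name)
    return hits
```

Once a candidate is confirmed, its name is what you would pass as the final argument to graph_util.convert_variables_to_constants().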
I've run Standard Input Parsing with this sentence: I said , 'what 're you ? Crazy ? said Sandowsky. I can't afford to do that. SyntaxNet parses can't as ca and n't instead of can't or can not. Do I need to train the model first?
Input: I said , 'what 're you ? Crazy ? said Sandowsky. I can't afford to do that .
Parse:
said VBD ROOT
+-- said VBD dep
| +-- I PRP nsubj
| +-- , , punct
| +-- you PRP dep
| | +-- 'what PRP nsubj
| | +-- 're VBP cop
| | +-- ? . punct
| | +-- Crazy NNP dep
| +-- ? . punct
+-- Sandowsky. NNP nsubj
| +-- afford VB ccomp
| +-- I PRP nsubj
| +-- ca MD aux
| +-- n't RB neg
| +-- do VB xcomp
| +-- to TO aux
| +-- that DT dobj
+-- . . punct
Perhaps this is not the right place to ask this, but here goes:
I'd like to use Parsey McParseface in the Unity3d game engine. It seems like a big aspect of why this will be useful is because it is making machine learning easily accessible to all developers with minimal setup. I've looked around a bit, but not sure if I'll be able to use it in Unity seeing as it uses Python. Does anyone know either how to set this up (I'm sure it would useful for many developers), or maybe the next best alternative for powerful NLU? If there is a web service I could request that is already built and ready to go, that would work too.
Thanks, and please tell me the best place to post this question if it is not appropriate here.
Hello,
First, congrats on the release of 0.8.0 and especially for the distributed code!
However, I noticed a performance regression when training inception v3 on a multi-GPU system (NVIDIA DevBox: 4x Titan X, driver 361.48).
I was able to pinpoint the performance regression at 84b58a6:
2016-04-14 20:46:34.577596: step 0, loss = 13.08 (2.0 examples/sec; 127.239 sec/batch)
2016-04-14 20:47:44.064244: step 10, loss = 13.16 (68.1 examples/sec; 3.762 sec/batch)
2016-04-14 20:48:21.515250: step 20, loss = 13.25 (68.9 examples/sec; 3.718 sec/batch)
2016-04-14 20:48:59.089292: step 30, loss = 13.27 (68.7 examples/sec; 3.728 sec/batch)
2016-04-14 20:49:36.558736: step 40, loss = 13.38 (68.2 examples/sec; 3.753 sec/batch)
2016-04-14 20:50:13.988347: step 50, loss = 13.29 (68.9 examples/sec; 3.718 sec/batch)
2016-04-14 20:50:51.547393: step 60, loss = 13.35 (67.5 examples/sec; 3.795 sec/batch)
2016-04-14 20:51:29.056161: step 70, loss = 13.30 (68.1 examples/sec; 3.761 sec/batch)
2016-04-14 20:52:06.558016: step 80, loss = 13.32 (68.4 examples/sec; 3.744 sec/batch)
2016-04-14 20:52:44.036974: step 90, loss = 12.97 (68.4 examples/sec; 3.745 sec/batch)
2016-04-14 20:53:21.528334: step 100, loss = 13.19 (68.3 examples/sec; 3.746 sec/batch)
With the commit just before, 9a1dfdf:
2016-04-14 21:19:02.731016: step 0, loss = 13.12 (2.1 examples/sec; 122.740 sec/batch)
2016-04-14 21:19:54.731369: step 10, loss = 13.73 (116.1 examples/sec; 2.204 sec/batch)
2016-04-14 21:20:16.844538: step 20, loss = 13.59 (116.4 examples/sec; 2.200 sec/batch)
2016-04-14 21:20:38.962047: step 30, loss = 13.98 (116.0 examples/sec; 2.207 sec/batch)
2016-04-14 21:21:01.137735: step 40, loss = 14.15 (114.9 examples/sec; 2.228 sec/batch)
2016-04-14 21:21:23.400314: step 50, loss = 13.68 (114.6 examples/sec; 2.235 sec/batch)
2016-04-14 21:21:45.651567: step 60, loss = 13.66 (115.9 examples/sec; 2.209 sec/batch)
2016-04-14 21:22:07.986970: step 70, loss = 13.35 (115.0 examples/sec; 2.226 sec/batch)
2016-04-14 21:22:30.290880: step 80, loss = 13.14 (115.1 examples/sec; 2.225 sec/batch)
2016-04-14 21:22:52.503986: step 90, loss = 13.15 (116.1 examples/sec; 2.205 sec/batch)
2016-04-14 21:23:14.733461: step 100, loss = 13.11 (114.9 examples/sec; 2.227 sec/batch)
I also tried 5d7612c with --num_readers=16
, with no improvement.
I'm running the inception training inside a Docker container using nvidia-docker (I'm one of the maintainers).
FROM gcr.io/tensorflow/tensorflow:0.8.0rc0-devel-gpu
WORKDIR /models
RUN git clone https://github.com/tensorflow/models.git . && \
git checkout 9a1dfdf263b358a1560b8f7ef6c76595b71ca201
WORKDIR /models/inception
RUN bazel build -c opt --config=cuda inception/imagenet_train
CMD bazel-bin/inception/imagenet_train --num_gpus=4 --batch_size=256 \
--max_steps=110 --log_device_placement=true --num_epochs_per_decay=2.0 --learning_rate_decay_factor=0.94 \
--train_dir=/data/imagenet --data_dir=/raid/tensorflow_imagenet
With --log_device_placement=true
the full log is huge, but I can share it if you want.
ping @jmchen-g, author of PR #44
Let me know how I can help!
I think it might be easier to use TensorFlow as a git submodule rather than symlinks; it took me a while to figure out that I have to clone tensorflow inside models.
Also, bazel ends up building all of TensorFlow, which is very slow and seems needless when you only want to use the models.
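For anyone who wants to try it, the submodule setup I have in mind would look something like this (a sketch, not the repo's supported workflow):

```shell
git clone https://github.com/tensorflow/models.git
cd models
# Pin tensorflow as a submodule at the expected path instead of symlinking.
git submodule add https://github.com/tensorflow/tensorflow.git tensorflow
git submodule update --init --recursive
```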
Hi
Your install instructions do not mention that you need the python numpy module(s) in order to install correctly. You might want to add that to the README.
Stephen
I'm running in CPU-only mode with Python 2.7, but it doesn't work with Python 3 on my machine either.
I get this:
INFO: Elapsed time: 85.654s, Critical Path: 78.81s
//syntaxnet:arc_standard_transitions_test NO STATUS
//syntaxnet:beam_reader_ops_test NO STATUS
//syntaxnet:graph_builder_test NO STATUS
//syntaxnet:lexicon_builder_test NO STATUS
//syntaxnet:parser_features_test NO STATUS
//syntaxnet:parser_trainer_test NO STATUS
//syntaxnet:reader_ops_test NO STATUS
//syntaxnet:sentence_features_test NO STATUS
//syntaxnet:shared_store_test NO STATUS
//syntaxnet:tagger_transitions_test NO STATUS
//syntaxnet:text_formats_test NO STATUS
//util/utf8:unicodetext_unittest NO STATUS
Executed 0 out of 12 tests: 12 were skipped.
if you need more information about my system/install, let me know!
edit: downgrading from GCC 6.1 to 4.9 seems to have done something. Bazel is testing now and it's been running for about 2 hours. Is that normal? Would enabling the GPU speed that up in the future?
I am interested in training Inception from scratch. I managed to build the ImageNet data as instructed. However, the following command:
bazel-bin/inception/imagenet_train.py --num_gpus=1 --batch_size=32 --train_dir=/tmp/imagenet_train --data_dir=/tmp/imagenet_data
Results in this stack trace:
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/image_ops.py:586: FutureWarning: comparison to None
will result in an elementwise object comparison in the future.
if width == new_width_const and height == new_height_const:
Traceback (most recent call last):
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/imagenet_train.py", line 41, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/imagenet_train.py", line 37, in main
inception_train.train(dataset)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/inception_train.py", line 235, in train
loss = _tower_loss(images, labels, num_classes, scope)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/inception_train.py", line 110, in _tower_loss
scope=scope)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/inception_model.py", line 90, in inference
scope=scope)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/inception_model.py", line 88, in inception_v3
scope='conv0')
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/scopes.py", line 129, in func_with_args
return func(*args, **current_args)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/ops.py", line 184, in conv2d
restore=restore)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/scopes.py", line 129, in func_with_args
return func(*args, **current_args)
File "/ssd/esteva/tensorflow_master/models/inception/bazel-bin/inception/imagenet_train.runfiles/inception/slim/variables.py", line 224, in variable
trainable=trainable, collections=collections)
TypeError: get_variable() got an unexpected keyword argument 'regularizer'
Any idea what's happening?
I have only one GPU.
Does anyone know how to suspend the training binary while running the evaluation on the same GPU?
While trying to train my own data, by running (as in the README for training own flower data);
bazel-bin/inception/flowers_train --train_dir="${TRAIN_DIR}" --data_dir="${FLOWERS_DATA_DIR}" --pretrained_model_checkpoint_path="${MODEL_PATH}" --fine_tune=True --initial_learning_rate=0.001 --input_queue_memory_factor=1
I get an error:
tensorflow.python.framework.errors.DataLossError: Unable to open table file /Users/asd/workspace/inception-v3-model/inception-v3/model.ckpt-157585: Data loss: corrupted compressed block contents: perhaps your file is in a different file format and you need to use a different restore operator?
Is the model checkpoint really corrupt?
Hi,
First, thanks for open sourcing SyntaxNet. I've been looking forward to playing with it since your 2015 paper. I'm hoping to convince you to run a small experiment :).
I think it's a shame that parsers are always evaluated using gold-standard pre-processing. Dridan et al (2013) showed that error propagation from the tokenization and SBD can have a significant impact on accuracy.[1] More to the point, I speculate that some algorithms are going to be more or less sensitive to this. If we only ever evaluate on gold-standard pre-processing, we never get to measure this sensitivity.
One of the neat things about transition-based parsers is that they can do joint segmentation and parsing. You can feed in a whole document at once --- you don't need to pre-segment. I found this helped a lot when I was doing speech parsing, and I carried the trick over to my parser spaCy.
I never wrote this up, but I found that doing joint segmentation and parsing improved accuracy on the full task (i.e, from raw text) by ~1% (going from memory here...Hope I'm not wrong). I think the improvement is mostly due to seeing segmentation errors during training.
You can't easily do this with a chart or graph-based parser, and document parsing decreases the effectiveness of a beam. On balance, this produces a slight advantage for greedy models. So, I think the algorithmic implications are quite interesting. It's not just a case of finding out that everyone's score decreases by some fixed amount, depending on the pre-processing accuracy.
I have OntoNotes 5 and the EWTB data processed in .json files, in the format below. If I gave you the data, would you be interested in running some whole-document evaluations? I'm reluctant to dive into running them myself, because I think it will probably involve retraining SyntaxNet, for fair comparison.
Thanks,
Matthew Honnibal
spacy.io
[
  {
    "id": "dev_09_c2e_0001",
    "paragraphs": [
      {
        "raw": "...",
        "sentences": [
          {
            "tokens": [
              {
                "head": 1,
                "dep": "compound",
                "tag": "NNP",
                "orth": "U.",
                "ner": "-",
                "id": 0
              }
            ]
          }
        ]
      }
    ]
  }
]
What I'm hoping is that I can convince you to run this experiment on SyntaxNet for me :).
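To make the format concrete, here is how a consumer might read one token out of a document in this schema (pure stdlib; the field names follow the sample above):

```python
import json

# A single-document sample in the proposed schema.
SAMPLE = """
[
  {
    "id": "dev_09_c2e_0001",
    "paragraphs": [
      {
        "raw": "...",
        "sentences": [
          {"tokens": [
            {"head": 1, "dep": "compound", "tag": "NNP",
             "orth": "U.", "ner": "-", "id": 0}
          ]}
        ]
      }
    ]
  }
]
"""

docs = json.loads(SAMPLE)
first_token = docs[0]["paragraphs"][0]["sentences"][0]["tokens"][0]
print(first_token["orth"], first_token["tag"], first_token["head"])
```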