Comments (7)

mrnikwaws commented on July 25, 2024

Thanks for reporting this problem. We'll investigate and get back to you.

Does this problem occur when the model is run locally with TensorFlow 1.15, or only on serving infrastructure?

from aws-neuron-sdk.

jeisinge commented on July 25, 2024

The exported SavedModel works on TF-Serving. After compilation, it no longer serves on Neuron.

The compilation had the following warnings:

 2020-04-24 21:43:18.973556: I bazel-out/k8-opt/genfiles/tensorflow/python/neuron/convert/segment.cc:460] There are 364 ops of 27 different types in the graph that are not compiled by neuron-cc: Tile, Assert, ExpandDims, Switch, PlaceholderWithDefault, Range, ParseExample, Const, GatherV2, NoOp, OneHot, Placeholder, HashTableV2, SquaredDifference, LookupTableFindV2, AsString, LookupTableSizeV2, SparseFillEmptyRows, SelectV2, Merge, SparseReshape, StringToHashBucketFast, Where, ArgMax, Bucketize, SparseSegmentMean, Unique, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
...
INFO:tensorflow:Number of operations in TensorFlow session: 21918
INFO:tensorflow:Number of operations after tf.neuron optimizations: 11759
INFO:tensorflow:Number of operations placed on Neuron runtime: 202
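
To put those numbers in perspective, here is a minimal sketch that parses the summary lines above using only the Python standard library. The names LOG and summarize are illustrative and not part of tf.neuron, and the regular expressions assume the log format shown in this comment.

```python
import re

# Summary lines copied from the compile log in this comment.
LOG = """\
There are 364 ops of 27 different types in the graph that are not compiled by neuron-cc: Tile, Assert, ExpandDims, Switch, PlaceholderWithDefault, Range, ParseExample, Const, GatherV2, NoOp, OneHot, Placeholder, HashTableV2, SquaredDifference, LookupTableFindV2, AsString, LookupTableSizeV2, SparseFillEmptyRows, SelectV2, Merge, SparseReshape, StringToHashBucketFast, Where, ArgMax, Bucketize, SparseSegmentMean, Unique, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:Number of operations in TensorFlow session: 21918
INFO:tensorflow:Number of operations after tf.neuron optimizations: 11759
INFO:tensorflow:Number of operations placed on Neuron runtime: 202
"""


def summarize(log):
    """Return (op counts, unsupported op types) parsed from a compile log."""
    labels = {
        "session": "in TensorFlow session",
        "optimized": "after tf.neuron optimizations",
        "on_neuron": "placed on Neuron runtime",
    }
    counts = {}
    for key, label in labels.items():
        match = re.search(
            r"Number of operations " + re.escape(label) + r": (\d+)", log)
        counts[key] = int(match.group(1))
    # The op types are a comma-separated list ending just before "(For more".
    match = re.search(r"not compiled by neuron-cc: (.*?)\(For more", log)
    unsupported = [n.strip() for n in match.group(1).split(",") if n.strip()]
    return counts, unsupported


counts, unsupported = summarize(LOG)
share = counts["on_neuron"] / counts["optimized"]
table_ops = [op for op in unsupported if "Table" in op]

print(len(unsupported), "unsupported op types")
print(f"{share:.2%} of the optimized ops were placed on Neuron")
print("table-related ops not compiled by neuron-cc:", table_ops)
```

The three table-related op types (HashTableV2, LookupTableFindV2, LookupTableSizeV2) are among those not compiled by neuron-cc.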

mrnikwaws commented on July 25, 2024

Hi Jeisinge,

This looks like the same issue reported in this Stack Overflow post: https://stackoverflow.com/questions/44236090/how-to-keep-lookup-tables-initialized-for-prediction-and-not-just-training. Please refer to it for example code snippets.

Can you please try the following and let us know if the issue is resolved?

  • Add an initializer operation when you save the model, OR
  • If you are using the tf.estimator API, load the model in Python, add an initializer op, and re-save it as a new SavedModel.

jeisinge commented on July 25, 2024

I don't know if this is the same issue.

The poster there is not exporting the SavedModel correctly, so it does not run anywhere. Our SavedModel, by contrast, works well both in TensorFlow and on TensorFlow Serving. The high-level API also appears to be very different: that model uses tf.contrib.lookup, whereas we use tf.feature_column.embedding_column in an Estimator. In particular, I believe the Estimator does all of this for us, as the author of that solution notes:

NOTE: If you are using the high level libraries (such as tf.estimator) this should be the default

Why does our Estimator SavedModel work well on TensorFlow Serving, but not compile on Neuron TF Serving?

mrnikwaws commented on July 25, 2024

Could you provide a minimal code sample that reproduces the problem, or share your code?

We currently run an optimization pass called convert_variables_to_constants, which may remove some control edges. So your problem may be a combination of your code and our optimizations. However, without sample code we can only make best guesses.

The following script generates a model that contains a table lookup operator, and it works fine with Neuron. However, the reported error can be triggered by removing the statement with sess.graph.control_dependencies([table.initializer]): from the script.

# table_lookup.py
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn


with tf.Session(graph=tf.Graph()) as sess:
    keys_tensor = tf.constant([1, 2])
    vals_tensor = tf.constant([3.0, 4.0])
    input_tensor = tf.placeholder(tf.int32, [2])
    feed_dict = {input_tensor.name: [1, 5]}
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(keys_tensor, vals_tensor), -1.0)
    with sess.graph.control_dependencies([table.initializer]):
        lookup = table.lookup(input_tensor)
    tensor = lookup + 1.0
    out = tensor - 2.0

    print(sess.run(out, feed_dict))
    model_dir = './temp_model'
    shutil.rmtree(model_dir, ignore_errors=True)
    inputs = {input_tensor.name: input_tensor}
    outputs = {out.name: out}
    tf.saved_model.simple_save(sess, model_dir, inputs, outputs)

model_dir_neuron = './temp_model_neuron'
shutil.rmtree(model_dir_neuron, ignore_errors=True)
tfn.saved_model.compile(model_dir, model_dir_neuron)

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], model_dir_neuron)
    input_tensor = sess.graph.get_tensor_by_name(input_tensor.name)
    out = sess.graph.get_tensor_by_name(out.name)
    print(sess.run(out, feed_dict))

Regular output:

(newenv) [test@cdd examples]$ python table_lookup.py 
WARNING:tensorflow:From table_lookup.py:6: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-04-28 20:25:44.972150: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-04-28 20:25:44.972190: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-04-28 20:25:44.972209: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cdd): /proc/driver/nvidia/version does not exist
2020-04-28 20:25:44.972539: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-28 20:25:44.983396: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300050000 Hz
2020-04-28 20:25:44.986656: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x36a8970 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-28 20:25:44.986683: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From table_lookup.py:9: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

[ 2. -2.]
WARNING:tensorflow:From table_lookup.py:23: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
WARNING:tensorflow:From /local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
2020-04-28 20:25:45.703566: I bazel-out/k8-opt/genfiles/tensorflow/neuron/convert/segment.cc:460] There are 6 ops of 5 different types in the graph that are not compiled by neuron-cc: LookupTableImportV2, LookupTableFindV2, HashTableV2, NoOp, Placeholder, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:fusing subgraph neuron_op_794c60f0eaf84c4e with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 11
INFO:tensorflow:Number of operations after tf.neuron optimizations: 12
INFO:tensorflow:Number of operations placed on Neuron runtime: 4
INFO:tensorflow:Successfully converted ./temp_model to ./temp_model_neuron
WARNING:tensorflow:From table_lookup.py:30: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
[ 2. -2.]

Output after removing the with statement:

2020-04-28 20:23:45.879770: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.         
Traceback (most recent call last):                                                                                                                                            
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call                                
    return fn(*args)                                                                                                                                                          
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn                                 
    target_list, run_metadata)                                                                                                                                                
  File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun                     
    run_metadata)                                                                                                                                                             
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.                                                                                       
         [[{{node hash_table_Lookup/LookupTableFindV2}}]]
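
The difference between the two runs can be illustrated without TensorFlow. The sketch below is a toy model only: ToyTable, run, and model are hypothetical names, and this is not how TensorFlow or Neuron implement control dependencies. It only shows why a lookup succeeds when the initializer is forced to run first and fails when that edge is missing.

```python
class ToyTable:
    """Stand-in for a hash table that must be initialized before lookups."""

    def __init__(self, keys, values, default):
        self._pending = dict(zip(keys, values))
        self._default = default
        self._data = None  # None means "not initialized yet"

    def initialize(self):
        self._data = dict(self._pending)

    def lookup(self, key):
        if self._data is None:
            raise RuntimeError("Table not initialized.")
        return self._data.get(key, self._default)


def run(op, control_deps=()):
    """Run every control dependency first, then the op itself."""
    for dep in control_deps:
        dep()
    return op()


def model(table, inputs):
    # Same arithmetic as table_lookup.py: (lookup + 1.0) - 2.0
    return [table.lookup(k) + 1.0 - 2.0 for k in inputs]


# With the control edge: the initializer runs before the lookup.
with_edge = ToyTable([1, 2], [3.0, 4.0], -1.0)
ok = run(lambda: model(with_edge, [1, 5]), control_deps=[with_edge.initialize])
print(ok)

# Without the control edge: nothing initializes the table.
without_edge = ToyTable([1, 2], [3.0, 4.0], -1.0)
try:
    run(lambda: model(without_edge, [1, 5]))
    failure = None
except RuntimeError as err:
    failure = str(err)
print(failure)
```

The first call yields the same values as the script's regular output, and the second fails with the same message as the traceback above.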

jeisinge commented on July 25, 2024

Unfortunately, our model is not easily extracted into sample code. If I get some time this weekend, I'll try to put together a minimal example.

Also, I noticed that this example doesn't use tf.feature_column.embedding_column: https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/feature_column/embedding_column

mrnikwaws commented on July 25, 2024

As discussed offline, a fix for this problem has been created and will appear in a future release. Please re-open this issue if you have concerns.
