Comments (7)
Thanks for reporting this problem. We'll investigate and get back to you.
Does this problem occur when running locally with TensorFlow 1.15, or only on the serving infrastructure?
from aws-neuron-sdk.
The exported SavedModel works on TF-Serving. After compilation, it no longer serves on Neuron.
The compilation had the following warnings:
2020-04-24 21:43:18.973556: I bazel-out/k8-opt/genfiles/tensorflow/python/neuron/convert/segment.cc:460] There are 364 ops of 27 different types in the graph that are not compiled by neuron-cc: Tile, Assert, ExpandDims, Switch, PlaceholderWithDefault, Range, ParseExample, Const, GatherV2, NoOp, OneHot, Placeholder, HashTableV2, SquaredDifference, LookupTableFindV2, AsString, LookupTableSizeV2, SparseFillEmptyRows, SelectV2, Merge, SparseReshape, StringToHashBucketFast, Where, ArgMax, Bucketize, SparseSegmentMean, Unique, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
...
INFO:tensorflow:Number of operations in TensorFlow session: 21918
INFO:tensorflow:Number of operations after tf.neuron optimizations: 11759
INFO:tensorflow:Number of operations placed on Neuron runtime: 202
Hi Jeisinge,
It looks like this is the same issue reported in this stack overflow post: https://stackoverflow.com/questions/44236090/how-to-keep-lookup-tables-initialized-for-prediction-and-not-just-training. Please refer there for example code snippets.
Can you please try the following and let us know if the issue is resolved?
- Add an initializer operation when you save the model, OR
- If you are using the tf.estimator API, load the model in Python, add an initializer op, and re-save it as a new SavedModel.
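For the second option, a rough sketch of the load/re-save round trip might look like the following (TF 1.15 APIs; the `export_dir` and `fixed_dir` paths are hypothetical, not taken from the reported model):

```python
# Hedged sketch: re-save an existing SavedModel with tf.tables_initializer()
# as its main_op, so hash tables are initialized when the model is loaded.
# TF 1.15 APIs; paths below are placeholders, not the reporter's real export.
import tensorflow as tf

export_dir = './export/original'    # hypothetical: the Estimator's export
fixed_dir = './export/with_init'    # hypothetical: destination for re-save

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], export_dir)
    builder = tf.saved_model.builder.SavedModelBuilder(fixed_dir)
    builder.add_meta_graph_and_variables(
        sess, ['serve'],
        signature_def_map=dict(meta_graph.signature_def),
        # tf.tables_initializer() groups all lookup-table initializer ops;
        # passing it as main_op makes serving run it at load time.
        main_op=tf.tables_initializer())
    builder.save()
```

This keeps the original signatures while attaching an explicit initializer to the new SavedModel.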
I don't know if this is the same issue. The poster there is not exporting the SavedModel correctly, so it does not run anywhere; our SavedModel, by contrast, works well on TensorFlow Serving and in TensorFlow itself. The high-level API also looks quite different: the poster uses tf.contrib.lookup directly in the model, while we use tf.feature_column.embedding_column in the Estimator. In particular, I believe the Estimator handles table initialization for us, as the solution author notes:
NOTE: If you are using the high level libraries (such as tf.estimator) this should be the default
Why does our Estimator SavedModel work well on TensorFlow Serving, but fail to compile on Neuron TF Serving?
Could we get some minimal sample code that reproduces the problem, or can you share your code?
We currently run an optimization pass called convert_variables_to_constants, which may cancel some control edges, so your problem may be a combination of your code and our optimizations. Without sample code, however, we are only making best guesses.
The following script generates a model that contains a table-lookup operator, and it works fine with Neuron. The reported error can be triggered, however, if the with sess.graph.control_dependencies([table.initializer]): block is removed.
# table_lookup.py
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn

with tf.Session(graph=tf.Graph()) as sess:
    keys_tensor = tf.constant([1, 2])
    vals_tensor = tf.constant([3.0, 4.0])
    input_tensor = tf.placeholder(tf.int32, [2])
    feed_dict = {input_tensor.name: [1, 5]}
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(keys_tensor, vals_tensor), -1.0)
    # This control dependency is what keeps the table initializer attached
    # to the lookup through the tf.neuron graph optimizations.
    with sess.graph.control_dependencies([table.initializer]):
        lookup = table.lookup(input_tensor)
    tensor = lookup + 1.0
    out = tensor - 2.0
    print(sess.run(out, feed_dict))
    model_dir = './temp_model'
    shutil.rmtree(model_dir, ignore_errors=True)
    inputs = {input_tensor.name: input_tensor}
    outputs = {out.name: out}
    tf.saved_model.simple_save(sess, model_dir, inputs, outputs)

model_dir_neuron = './temp_model_neuron'
shutil.rmtree(model_dir_neuron, ignore_errors=True)
tfn.saved_model.compile(model_dir, model_dir_neuron)

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], model_dir_neuron)
    input_tensor = sess.graph.get_tensor_by_name(input_tensor.name)
    out = sess.graph.get_tensor_by_name(out.name)
    print(sess.run(out, feed_dict))
Regular output:
(newenv) [test@cdd examples]$ python table_lookup.py
WARNING:tensorflow:From table_lookup.py:6: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2020-04-28 20:25:44.972150: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-04-28 20:25:44.972190: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-04-28 20:25:44.972209: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cdd): /proc/driver/nvidia/version does not exist
2020-04-28 20:25:44.972539: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-28 20:25:44.983396: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300050000 Hz
2020-04-28 20:25:44.986656: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x36a8970 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-28 20:25:44.986683: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From table_lookup.py:9: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
[ 2. -2.]
WARNING:tensorflow:From table_lookup.py:23: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
WARNING:tensorflow:From /local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
2020-04-28 20:25:45.703566: I bazel-out/k8-opt/genfiles/tensorflow/neuron/convert/segment.cc:460] There are 6 ops of 5 different types in the graph that are not compiled by neuron-cc: LookupTableImportV2, LookupTableFindV2, HashTableV2, NoOp, Placeholder, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:fusing subgraph neuron_op_794c60f0eaf84c4e with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 11
INFO:tensorflow:Number of operations after tf.neuron optimizations: 12
INFO:tensorflow:Number of operations placed on Neuron runtime: 4
INFO:tensorflow:Successfully converted ./temp_model to ./temp_model_neuron
WARNING:tensorflow:From table_lookup.py:30: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
[ 2. -2.]
Output after removing the with statement:
2020-04-28 20:23:45.879770: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.
Traceback (most recent call last):
File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.
[[{{node hash_table_Lookup/LookupTableFindV2}}]]
Unfortunately, our model is not easily extracted into sample code. If I get some time this weekend, I'll try to put together a minimal example.
Also, I noticed that this example doesn't use https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/feature_column/embedding_column .
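(For reference, a minimal repro along those lines, assuming TF 1.15 and the Estimator API, might look something like the sketch below. All names, vocabulary, and hyperparameters are illustrative, not taken from the real model.)

```python
# Hedged sketch of a minimal embedding_column repro (TF 1.15 APIs).
# Every feature name, vocab entry, and hyperparameter here is illustrative.
import tensorflow as tf

def input_fn():
    features = {'word': tf.constant([['a'], ['b'], ['a'], ['c']])}
    labels = tf.constant([0, 1, 0, 1])
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# categorical_column_with_vocabulary_list creates a lookup table under
# the hood, which is where the initializer question arises.
word = tf.feature_column.categorical_column_with_vocabulary_list(
    'word', ['a', 'b', 'c'])
emb = tf.feature_column.embedding_column(word, dimension=4)

est = tf.estimator.DNNClassifier(hidden_units=[8], feature_columns=[emb])
est.train(input_fn, steps=5)

# Export a parsing SavedModel the same way the real Estimator model does.
feature_spec = {'word': tf.io.FixedLenFeature([1], tf.string)}
serving_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    feature_spec)
export_dir = est.export_saved_model('./export', serving_fn)
# Compiling this export with tfn.saved_model.compile should show whether
# the table-initializer control edges survive the Neuron conversion.
```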
As discussed offline, a fix for this problem has been created and will appear in a future release. Please re-open this issue if you have any remaining concerns.