Comments (7)
Thanks for reporting this problem. We'll investigate and get back to you.
Does this problem occur when running locally with TensorFlow 1.15, or only on the serving infrastructure?
from aws-neuron-sdk.
The exported SavedModel works on TF-Serving. After compilation, it no longer serves on Neuron.
The compilation had the following warnings:
2020-04-24 21:43:18.973556: I bazel-out/k8-opt/genfiles/tensorflow/python/neuron/convert/segment.cc:460] There are 364 ops of 27 different types in the graph that are not compiled by neuron-cc: Tile, Assert, ExpandDims, Switch, PlaceholderWithDefault, Range, ParseExample, Const, GatherV2, NoOp, OneHot, Placeholder, HashTableV2, SquaredDifference, LookupTableFindV2, AsString, LookupTableSizeV2, SparseFillEmptyRows, SelectV2, Merge, SparseReshape, StringToHashBucketFast, Where, ArgMax, Bucketize, SparseSegmentMean, Unique, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
...
INFO:tensorflow:Number of operations in TensorFlow session: 21918
INFO:tensorflow:Number of operations after tf.neuron optimizations: 11759
INFO:tensorflow:Number of operations placed on Neuron runtime: 202
Hi Jeisinge,
It looks like this is the same issue reported in this stack overflow post: https://stackoverflow.com/questions/44236090/how-to-keep-lookup-tables-initialized-for-prediction-and-not-just-training. Please refer there for example code snippets.
Can you please try the following and let us know if the issue is resolved?
- Add an initializer operation when you save the model, OR
- If you are using the tf.estimator API, load the model in Python, add an initializer op, and re-save it as a new SavedModel.
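For the second option, a rough sketch of the load/re-save round trip might look like the following (TF 1.15 APIs; the `export_dir` and `fixed_dir` paths are hypothetical, not taken from the reported model):

```python
# Hedged sketch: re-save an existing SavedModel with tf.tables_initializer()
# as its main_op, so hash tables are initialized when the model is loaded.
# TF 1.15 APIs; paths below are placeholders, not the reporter's real export.
import tensorflow as tf

export_dir = './export/original'    # hypothetical: the Estimator's export
fixed_dir = './export/with_init'    # hypothetical: destination for re-save

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], export_dir)
    builder = tf.saved_model.builder.SavedModelBuilder(fixed_dir)
    builder.add_meta_graph_and_variables(
        sess, ['serve'],
        signature_def_map=dict(meta_graph.signature_def),
        # tf.tables_initializer() groups all lookup-table initializer ops;
        # passing it as main_op makes serving run it at load time.
        main_op=tf.tables_initializer())
    builder.save()
```

This keeps the original signatures while attaching an explicit initializer to the new SavedModel.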
I don't know if this is the same issue. The poster there is not exporting the SavedModel correctly, so it does not run anywhere; our SavedModel, by contrast, works well on TensorFlow Serving and in TensorFlow itself. The high-level API also looks quite different: the poster uses tf.contrib.lookup directly in the model, while we use tf.feature_column.embedding_column in the Estimator. In particular, I believe the Estimator handles table initialization for us, as the solution author notes:
NOTE: If you are using the high level libraries (such as tf.estimator) this should be the default
Why does our Estimator SavedModel work well on TensorFlow Serving, but fail to compile on Neuron TF Serving?
Could we get some minimal sample code that reproduces the problem, or can you share your code?
We currently run an optimization pass called convert_variables_to_constants, which may cancel some control edges, so your problem may be a combination of your code and our optimizations. Without sample code, however, we are only making best guesses.
The following script generates a model that contains a table-lookup operator, and it works fine with Neuron. The reported error can be triggered, however, if the with sess.graph.control_dependencies([table.initializer]): block is removed.
# table_lookup.py
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn

with tf.Session(graph=tf.Graph()) as sess:
    keys_tensor = tf.constant([1, 2])
    vals_tensor = tf.constant([3.0, 4.0])
    input_tensor = tf.placeholder(tf.int32, [2])
    feed_dict = {input_tensor.name: [1, 5]}
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(keys_tensor, vals_tensor), -1.0)
    # This control dependency is what keeps the table initializer attached
    # to the lookup through the tf.neuron graph optimizations.
    with sess.graph.control_dependencies([table.initializer]):
        lookup = table.lookup(input_tensor)
    tensor = lookup + 1.0
    out = tensor - 2.0
    print(sess.run(out, feed_dict))
    model_dir = './temp_model'
    shutil.rmtree(model_dir, ignore_errors=True)
    inputs = {input_tensor.name: input_tensor}
    outputs = {out.name: out}
    tf.saved_model.simple_save(sess, model_dir, inputs, outputs)

model_dir_neuron = './temp_model_neuron'
shutil.rmtree(model_dir_neuron, ignore_errors=True)
tfn.saved_model.compile(model_dir, model_dir_neuron)

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], model_dir_neuron)
    input_tensor = sess.graph.get_tensor_by_name(input_tensor.name)
    out = sess.graph.get_tensor_by_name(out.name)
    print(sess.run(out, feed_dict))
Regular output:
(newenv) [test@cdd examples]$ python table_lookup.py
WARNING:tensorflow:From table_lookup.py:6: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2020-04-28 20:25:44.972150: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-04-28 20:25:44.972190: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-04-28 20:25:44.972209: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cdd): /proc/driver/nvidia/version does not exist
2020-04-28 20:25:44.972539: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-28 20:25:44.983396: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300050000 Hz
2020-04-28 20:25:44.986656: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x36a8970 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-28 20:25:44.986683: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From table_lookup.py:9: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
[ 2. -2.]
WARNING:tensorflow:From table_lookup.py:23: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
WARNING:tensorflow:From /local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
2020-04-28 20:25:45.703566: I bazel-out/k8-opt/genfiles/tensorflow/neuron/convert/segment.cc:460] There are 6 ops of 5 different types in the graph that are not compiled by neuron-cc: LookupTableImportV2, LookupTableFindV2, HashTableV2, NoOp, Placeholder, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:fusing subgraph neuron_op_794c60f0eaf84c4e with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 11
INFO:tensorflow:Number of operations after tf.neuron optimizations: 12
INFO:tensorflow:Number of operations placed on Neuron runtime: 4
INFO:tensorflow:Successfully converted ./temp_model to ./temp_model_neuron
WARNING:tensorflow:From table_lookup.py:30: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
[ 2. -2.]
Output after removing the with statement:
2020-04-28 20:23:45.879770: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at lookup_table_op.cc:809 : Failed precondition: Table not initialized.
Traceback (most recent call last):
File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/local/home/test/bert_neuron/newenv/lib64/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.
[[{{node hash_table_Lookup/LookupTableFindV2}}]]
Unfortunately, our model is not easily extracted into sample code. If I get some time this weekend, I'll try to put together a minimal example.
Also, I noticed that this example doesn't use https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/feature_column/embedding_column .
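(For reference, a minimal repro along those lines, assuming TF 1.15 and the Estimator API, might look something like the sketch below. All names, vocabulary, and hyperparameters are illustrative, not taken from the real model.)

```python
# Hedged sketch of a minimal embedding_column repro (TF 1.15 APIs).
# Every feature name, vocab entry, and hyperparameter here is illustrative.
import tensorflow as tf

def input_fn():
    features = {'word': tf.constant([['a'], ['b'], ['a'], ['c']])}
    labels = tf.constant([0, 1, 0, 1])
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# categorical_column_with_vocabulary_list creates a lookup table under
# the hood, which is where the initializer question arises.
word = tf.feature_column.categorical_column_with_vocabulary_list(
    'word', ['a', 'b', 'c'])
emb = tf.feature_column.embedding_column(word, dimension=4)

est = tf.estimator.DNNClassifier(hidden_units=[8], feature_columns=[emb])
est.train(input_fn, steps=5)

# Export a parsing SavedModel the same way the real Estimator model does.
feature_spec = {'word': tf.io.FixedLenFeature([1], tf.string)}
serving_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    feature_spec)
export_dir = est.export_saved_model('./export', serving_fn)
# Compiling this export with tfn.saved_model.compile should show whether
# the table-initializer control edges survive the Neuron conversion.
```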
As discussed offline, a fix for this problem has been created and will appear in a future release. Please re-open this issue if you have any remaining concerns.