Comments (9)
Thanks David,
I have opened an issue internally to track this error. We'll report back once we know more.
Regards,
Taylor
from aws-neuron-sdk.
Hello David,
It is definitely possible to compile ONNX models.
The particular model you are attempting to compile uncovered a few bugs on our end.
Specifically:
- If the version of ONNX used to train the model differs from the installed version of ONNX, a segfault may occur and you receive this unhelpful error message. I have opened an internal ticket for this issue. At a minimum, we will improve our error messages in a future release.
- If you omit the '--io-config' flag when compiling, you likewise receive an unhelpful error message. I have opened another internal ticket for this issue, and we will improve this error message in a future release as well.
Beyond these two issues, the particular pre-trained model mentioned may have problems. I’m not sure precisely from where you downloaded this model, but the resnet18v1 model from https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v1/resnet18v1.onnx appears to have incorrectly named operators and other issues (#59). Since you mentioned you just picked a random model, I did not spend too much time investigating. If using this specific model is important, could you attach the .onnx model you were using to this issue?
That being said, here's an example of compiling resnet50, using the model at https://github.com/onnx/models/tree/master/vision/classification/resnet/resnet50.
neuron-cc compile --framework ONNX resnet50/model.onnx --output /tmp/onnx.neff --io-config '{"inputs":{"gpu_0/data_0":[[1,3,224,224], "float32"]},"outputs":["gpu_0/softmax_1"]}'
Notice how the inputs and outputs are specified. For this model, the GitHub page above conveniently lists the input and output names and dimensions. For a more general ONNX model, you may find the net_drawer.py script provided by ONNX useful for visualizing the network.
python3 /usr/local/lib/python3.6/dist-packages/onnx/tools/net_drawer.py --input resnet50/model.onnx --output model.dot --embed_docstring
dot -Tpng model.dot -o model.png
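The --io-config value in the resnet50 command above is plain JSON: an "inputs" map from tensor name to [shape, dtype], and an "outputs" list of tensor names. As a minimal sketch, the string can be built in Python rather than typed by hand. Note that build_io_config is a hypothetical helper, not part of neuron-cc, and the tensor names and paths are simply the ones from the example above.

```python
import json
import shlex


def build_io_config(inputs, outputs):
    """Return a JSON string in the shape used by the --io-config example.

    inputs  maps tensor name -> (shape, dtype)
    outputs is a list of output tensor names
    (hypothetical helper for illustration; not part of neuron-cc)
    """
    config = {
        "inputs": {name: [list(shape), dtype] for name, (shape, dtype) in inputs.items()},
        "outputs": list(outputs),
    }
    return json.dumps(config, separators=(",", ":"))


# Names and shape taken from the resnet50 example in this thread.
io_config = build_io_config(
    inputs={"gpu_0/data_0": ([1, 3, 224, 224], "float32")},
    outputs=["gpu_0/softmax_1"],
)

command = [
    "neuron-cc", "compile",
    "--framework", "ONNX", "resnet50/model.onnx",
    "--output", "/tmp/onnx.neff",
    "--io-config", io_config,
]

# Print a copy-pasteable command with the JSON argument safely quoted.
print(" ".join(shlex.quote(part) for part in command))
```

Generating the JSON this way avoids quoting mistakes in the shell when a model has several inputs.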
Hopefully this helps. Please let us know if you experience any further issues.
Regards,
Taylor
from aws-neuron-sdk.
Just got around to testing your suggested solution, and I get the same error message with the resnet50 models as well (I tested all models from the link you gave, opset3 up to opset9).
About ONNX versions: I have installed onnx 1.6.0 and onnxruntime 1.1.0.
What else can I check in my environment? I'm running DLAMI 26 with the aws_neuron_tensorflow_p36 conda env, updated as suggested in the DLAMI with Neuron Release Notes.
from aws-neuron-sdk.
Hello David,
After some debugging, it appears the issue may be related to onnx 1.6.0. I was able to reproduce the issue when using onnx 1.6.0, but compilation works fine when downgrading to 1.5.0.
python3 -m pip install neuron-cc onnx==1.5.0
wget -q https://s3.amazonaws.com/download.onnx/models/opset_9/resnet50.tar.gz
tar xvf resnet50.tar.gz
neuron-cc compile \
--framework ONNX resnet50/model.onnx \
--output onnx.neff \
--io-config '{"inputs":{"gpu_0/data_0":[[1,3,224,224], "float32"]},"outputs":["gpu_0/softmax_1"]}'
ls -la onnx.neff
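Before compiling, it can help to confirm which versions are actually installed in the active environment. The sketch below uses only the standard library; it assumes Python 3.8 or newer for importlib.metadata (on the Python 3.6 environment in this thread, "python3 -m pip show onnx" gives the same information). The set of problematic versions reflects only what was observed in this thread, not an official compatibility list.

```python
from importlib import metadata  # standard library, Python 3.8+

# Assumption based on this thread only: the onnx 1.6.0 binary wheel
# crashed during compilation, while 1.5.0 worked.
KNOWN_BAD_ONNX_WHEELS = {"1.6.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def onnx_wheel_warning(version):
    """Return a warning for versions reported as problematic here, else None."""
    if version in KNOWN_BAD_ONNX_WHEELS:
        return (
            "onnx " + version + " was reported to segfault in this thread; "
            "try onnx==1.5.0 or a source install"
        )
    return None


for package in ("onnx", "onnxruntime", "neuron-cc"):
    print(package, installed_version(package) or "not installed")

warning = onnx_wheel_warning(installed_version("onnx"))
if warning:
    print(warning)
```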
I'll continue to investigate and try to figure out why onnx 1.6.0 is problematic.
-Taylor
from aws-neuron-sdk.
Hello again David,
I have some new information: the issue appears to be related to how the onnx 1.6 binary wheel was compiled and the version of libprotobuf it was built against. Looking at a core file, I see the segfault coming from:
0x00007f1b44b60a35 in pybind11::enum_<onnx::OpSchema::SupportType>::value(char const*, onnx::OpSchema::SupportType, char const*) ()
from /usr/local/lib/python3.6/dist-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so
Notably, this file depends on libprotobuf, and I've found some other GitHub issues that allude to this file being sensitive to the protobuf version.
ldd /usr/local/lib/python3.6/dist-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so
...
libprotobuf.so.10 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.10 (0x00007f610b038000)
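To check the same dependency on another machine, the ldd output can be filtered programmatically. This is a minimal sketch: find_dependencies is a hypothetical helper, and the sample text is the ldd excerpt shown above rather than live output.

```python
import re

# A resolved ldd line looks like: "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
LDD_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>\S+)\s+\((?P<addr>0x[0-9a-fA-F]+)\)\s*$"
)


def find_dependencies(ldd_output, needle):
    """Return (soname, resolved_path) pairs whose soname contains `needle`."""
    matches = []
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m and needle in m.group("name"):
            matches.append((m.group("name"), m.group("path")))
    return matches


# Sample input copied from the ldd excerpt in this thread.
sample = """\
...
libprotobuf.so.10 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.10 (0x00007f610b038000)
"""

print(find_dependencies(sample, "libprotobuf"))
```

On a real system, the input text could come from subprocess.run(["ldd", path], capture_output=True, text=True).stdout, where path points at the onnx_cpp2py_export shared object.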
I'm still investigating, but in the meantime, if you do a source install of onnx, you ought to be able to use 1.6.
python3 -m pip install --force-reinstall --no-binary onnx onnx
-Taylor
from aws-neuron-sdk.
Seems like the same issue: schyun9212/maskrcnn-benchmark#3
from aws-neuron-sdk.
Thanks, that seems to solve the issue and enables me to run a sanity check of my setup.
The actual model I'm trying to compile includes an Upsample op, which looks to be supported based on the ONNX supported ops list. I assume you support opset 9, since Upsample was deprecated in newer ONNX versions.
For some reason the compilation now fails with:
Error message: check_upsampling() takes at least 4 positional arguments (1 given)
I've attached the log and a visualization of one of the Upsample modules in Netron, which is very simple:
neuroncc.log
If needed, I can open an issue with AWS support and share additional data (ONNX file, compiler artifacts, etc.).
from aws-neuron-sdk.
David, could you please share the model with us? Either the full model or a small version that contains the Upsample operator would work. That will speed up our debugging.
Thank you,
Jindrich
from aws-neuron-sdk.
Closing
from aws-neuron-sdk.