There are mentions of this capability in some docs + list of supported ops, but there'

Is it possible to compile ONNX models? about aws-neuron-sdk HOT 9 CLOSED

davidas1 commented on July 25, 2024 2

Is it possible to compile ONNX models?

from aws-neuron-sdk.

Comments (9)

aws-taylor commented on July 25, 2024 2

Thanks David,

I have opened an issue internally to track this error. We'll report back once we know more.

Regards,
Taylor

from aws-neuron-sdk.

aws-taylor commented on July 25, 2024

Hello David,

It is definitely possible to compile ONNX models.

The particular model you are attempting to compile uncovered a few bugs on our end.

Specifically:

If the version of ONNX used to train the model differs from the installed version of ONNX, a segfault may occur and you receive this useless error message. I have opened an internal ticket for this issue. At a minimum, we will improve our error messages in a future release.
If you omit the '--io-config' flag when compiling, you likewise receive a useless error message. I have opened another internal ticket for this issue, and we will improve that error message in a future release as well.

Beyond these two issues, the particular pre-trained model mentioned may have problems. I’m not sure precisely from where you downloaded this model, but the resnet18v1 model from https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v1/resnet18v1.onnx appears to have incorrectly named operators and other issues (#59). Since you mentioned you just picked a random model, I did not spend too much time investigating. If using this specific model is important, could you attach the .onnx model you were using to this issue?

That being said, here’s an example of compiling resnet50, using the model at https://github.com/onnx/models/tree/master/vision/classification/resnet/resnet50.

neuron-cc compile --framework ONNX resnet50/model.onnx --output /tmp/onnx.neff --io-config '{"inputs":{"gpu_0/data_0":[[1,3,224,224], "float32"]},"outputs":["gpu_0/softmax_1"]}'

Notice how the inputs and outputs are specified. For this model, the GitHub page above conveniently lists the input and output names and dimensions. For a more general ONNX model, you may find the net_drawer.py script provided by ONNX useful for visualizing the network.

python3 /usr/local/lib/python3.6/dist-packages/onnx/tools/net_drawer.py --input resnet50/model.onnx --output model.dot --embed_docstring
dot -Tpng model.dot -o model.png
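
The --io-config value in the compile command above is a JSON object, and quoting it by hand in a shell is easy to get wrong. As a minimal sketch (the helper names build_io_config and build_compile_command are hypothetical, not part of neuron-cc), the same string can be generated with Python's json module, using only the tensor names, shape, and flags that appear in the command above:

```python
import json
import shlex


def build_io_config(inputs, outputs):
    """Return the JSON string passed to --io-config.

    inputs:  dict mapping input tensor name -> (shape list, dtype string)
    outputs: list of output tensor names
    """
    config = {
        "inputs": {
            name: [list(shape), dtype] for name, (shape, dtype) in inputs.items()
        },
        "outputs": list(outputs),
    }
    return json.dumps(config, separators=(",", ":"))


def build_compile_command(model_path, neff_path, io_config):
    """Assemble the neuron-cc invocation from this thread as an argument list."""
    return [
        "neuron-cc", "compile",
        "--framework", "ONNX", model_path,
        "--output", neff_path,
        "--io-config", io_config,
    ]


# Names and shape below are the resnet50 values quoted in the thread.
io_config = build_io_config(
    {"gpu_0/data_0": ([1, 3, 224, 224], "float32")},
    ["gpu_0/softmax_1"],
)
cmd = build_compile_command("resnet50/model.onnx", "/tmp/onnx.neff", io_config)

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

The argument list can also be handed directly to subprocess.run, which avoids shell quoting altogether.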

Hopefully this helps. Please let us know if you experience any further issues.

Regards,
Taylor

from aws-neuron-sdk.

davidas1 commented on July 25, 2024

Just got around to testing your suggested solution, and I get the same error message with the resnet50 models as well (I tested all models from the link you gave - opset3 up to opset9)

About ONNX versions - I have installed onnx 1.6.0 and onnxruntime 1.1.0
What else can I check in my environment? I'm running DLAMI 26, aws_neuron_tensorflow_p36 conda env, updated as suggested in the DLAMI with Neuron Release Notes

from aws-neuron-sdk.

aws-taylor commented on July 25, 2024

Hello David,

After some debugging, it appears the issue may be related to onnx 1.6.0. I was able to reproduce the issue when using onnx 1.6.0, but compilation works fine when downgrading to 1.5.0.

python3 -m pip install neuron-cc onnx==1.5.0
wget -q https://s3.amazonaws.com/download.onnx/models/opset_9/resnet50.tar.gz
tar xvf resnet50.tar.gz
neuron-cc compile \
  --framework ONNX resnet50/model.onnx \
  --output onnx.neff \
  --io-config '{"inputs":{"gpu_0/data_0":[[1,3,224,224], "float32"]},"outputs":["gpu_0/softmax_1"]}'

ls -la onnx.neff
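
Before compiling, it may help to compare the installed onnx version against the two data points reported here. The following is only a sketch under that assumption: the function names and the "untested" label are hypothetical, and the only versions it knows about are 1.5.0 (compiled fine) and 1.6.0 (reproduced the failure). It deliberately takes the version as a string instead of importing onnx, since the import itself may be what crashes.

```python
def parse_version(text):
    """Turn a string such as '1.6.0' into a tuple such as (1, 6, 0)."""
    parts = []
    for piece in text.strip().split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


# Assumption: these are the only two versions reported in this thread.
KNOWN_GOOD = {(1, 5, 0)}
KNOWN_BAD = {(1, 6, 0)}


def classify_onnx_version(version_text):
    """Label a version as 'known-good', 'known-bad', or 'untested'."""
    version = parse_version(version_text)
    if version in KNOWN_BAD:
        return "known-bad"
    if version in KNOWN_GOOD:
        return "known-good"
    return "untested"


for candidate in ("1.5.0", "1.6.0", "1.4.1"):
    print(candidate, classify_onnx_version(candidate))
```

The version string can be read from the "Version:" line of python3 -m pip show onnx.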

I'll continue to investigate and try to figure out why onnx 1.6.0 is problematic.

-Taylor

from aws-neuron-sdk.

aws-taylor commented on July 25, 2024

Hello again David,

I have some new information: the issue appears to be related to how the ONNX 1.6 binary wheel was compiled and the version of libprotobuf it was built against. Looking at a corefile, I see the SEGFAULT coming from:

0x00007f1b44b60a35 in pybind11::enum_<onnx::OpSchema::SupportType>::value(char const*, onnx::OpSchema::SupportType, char const*) ()
   from /usr/local/lib/python3.6/dist-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so

Notably, this file has a dependency on libprotobuf, and I've found some other GitHub issues that allude to this file being sensitive to the protobuf version.

ldd /usr/local/lib/python3.6/dist-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so
...
libprotobuf.so.10 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.10 (0x00007f610b038000)
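
The ldd check above can be scripted if it needs to be repeated across machines. This is a rough sketch, not an official diagnostic: find_linked_libs is a hypothetical helper that parses ldd-style text and returns the shared libraries whose names start with a given prefix. The sample line is the one quoted above.

```python
import re

# Matches lines of the form: "libname.so.N => /path/to/libname.so.N (0xADDRESS)"
LDD_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>\S+)\s+\((?P<addr>0x[0-9a-fA-F]+)\)\s*$"
)


def find_linked_libs(ldd_output, prefix):
    """Return (name, path) pairs for resolved libraries starting with prefix."""
    matches = []
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m and m.group("name").startswith(prefix):
            matches.append((m.group("name"), m.group("path")))
    return matches


sample = (
    "libprotobuf.so.10 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.10 "
    "(0x00007f610b038000)\n"
)
print(find_linked_libs(sample, "libprotobuf"))
```

In practice the text would come from running ldd on the onnx_cpp2py_export shared object, for example through subprocess.run with captured output.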

I'm still investigating, but in the meantime, if you do a source install of onnx, you ought to be able to use 1.6.

python3 -m pip install --force-reinstall --no-binary onnx onnx

-Taylor

from aws-neuron-sdk.

aws-taylor commented on July 25, 2024

Seems like the same issue: schyun9212/maskrcnn-benchmark#3

from aws-neuron-sdk.

davidas1 commented on July 25, 2024

Thanks, that seems to solve the issue and enables me to run a sanity check of my setup.

The actual model I'm trying to compile includes an Upsample op (which looks to be supported, based on ONNX supported ops) + I assume you support opset 9, since Upsample was deprecated in newer ONNX versions.

For some reason the compilation now fails with:
Error message: check_upsampling() takes at least 4 positional arguments (1 given)

I've attached the log and a visualization of one of the Upsample modules in Netron, which is very simple:
neuroncc.log

If needed, I can open an issue with AWS support and share additional data (ONNX file, compiler artifacts, etc.).

from aws-neuron-sdk.

aws-zejdaj commented on July 25, 2024

David, could you please share the model with us? Full or a small version that contains the upsample operator. That will speed up our debug process.

Thank you,
Jindrich

from aws-neuron-sdk.

awsrjh commented on July 25, 2024

Closing

from aws-neuron-sdk.
