Comments (9)

aws-taylor commented on July 25, 2024

Thanks David,

I have opened an issue internally to track this error. We'll report back once we know more.

Regards,
Taylor

from aws-neuron-sdk.

aws-taylor commented on July 25, 2024

Hello David,

It is definitely possible to compile ONNX models.

The particular model you are attempting to compile uncovered a few bugs on our end.

Specifically:

  • If the version of ONNX used to train the model differs from the installed version of ONNX, a segfault may occur and you receive this unhelpful error message. I have opened an internal ticket for this issue. At a minimum, we will improve our error messages in a future release.
  • If you omit the '--io-config' flag when compiling, you likewise receive an unhelpful error message. I have opened another internal ticket for this issue, and we will improve this error message in a future release as well.

Beyond these two issues, the particular pre-trained model mentioned may have problems. I’m not sure precisely from where you downloaded this model, but the resnet18v1 model from https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v1/resnet18v1.onnx appears to have incorrectly named operators and other issues (#59). Since you mentioned you just picked a random model, I did not spend too much time investigating. If using this specific model is important, could you attach the .onnx model you were using to this issue?

That being said, here’s an example of compiling resnet50, using the model at https://github.com/onnx/models/tree/master/vision/classification/resnet/resnet50.

neuron-cc compile --framework ONNX resnet50/model.onnx --output /tmp/onnx.neff --io-config '{"inputs":{"gpu_0/data_0":[[1,3,224,224], "float32"]},"outputs":["gpu_0/softmax_1"]}'

Notice how the inputs and outputs are specified. For this model, the GitHub page above conveniently lists the input and output names and dimensions. For a more general ONNX model, you may find the net_drawer.py script provided by ONNX useful for visualizing the network.

python3 /usr/local/lib/python3.6/dist-packages/onnx/tools/net_drawer.py --input resnet50/model.onnx --output model.dot --embed_docstring
dot -Tpng model.dot -o model.png
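
The --io-config value is plain JSON, so it can be generated instead of typed by hand. Below is a minimal Python sketch, assuming the same tensor names and shape as the resnet50 example above. build_io_config is a hypothetical helper written for illustration; it is not part of neuron-cc.

```python
import json
import shlex


def build_io_config(inputs, outputs):
    """Return a JSON string in the shape used by the --io-config example.

    inputs:  dict mapping input tensor name -> (shape, dtype)
    outputs: list of output tensor names
    (Hypothetical helper; not part of neuron-cc.)
    """
    config = {
        "inputs": {
            name: [list(shape), dtype]
            for name, (shape, dtype) in inputs.items()
        },
        "outputs": list(outputs),
    }
    return json.dumps(config)


# Tensor names and shape are taken from the resnet50 example in this thread.
io_config = build_io_config(
    inputs={"gpu_0/data_0": ((1, 3, 224, 224), "float32")},
    outputs=["gpu_0/softmax_1"],
)

# shlex.quote wraps the JSON so it can be pasted into a shell command line.
print("--io-config " + shlex.quote(io_config))
```

For a different model, substitute the input and output names found in the network visualization.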

Hopefully this helps. Please let us know if you experience any further issues.

Regards,
Taylor

davidas1 commented on July 25, 2024

Just got around to testing your suggested solution, and I get the same error message with the resnet50 models as well (I tested all models from the link you gave, opset3 up to opset9).

About ONNX versions - I have installed onnx 1.6.0 and onnxruntime 1.1.0
What else can I check in my environment? I'm running DLAMI 26, aws_neuron_tensorflow_p36 conda env, updated as suggested in the DLAMI with Neuron Release Notes

aws-taylor commented on July 25, 2024

Hello David,

After some debugging, it appears the issue may be related to onnx 1.6.0. I was able to reproduce the issue when using onnx 1.6.0, but compilation works fine when downgrading to 1.5.0.

python3 -m pip install neuron-cc onnx==1.5.0
wget -q https://s3.amazonaws.com/download.onnx/models/opset_9/resnet50.tar.gz
tar xvf resnet50.tar.gz
neuron-cc compile \
  --framework ONNX resnet50/model.onnx \
  --output onnx.neff \
  --io-config '{"inputs":{"gpu_0/data_0":[[1,3,224,224], "float32"]},"outputs":["gpu_0/softmax_1"]}'

ls -la onnx.neff

I'll continue to investigate and try to figure out why onnx 1.6.0 is problematic.
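
To check which onnx version is installed without importing the package (importing could trigger the same crash), something like the following may help. It is a minimal sketch that assumes Python 3.8 or newer for importlib.metadata, which is newer than the Python 3.6 environment used in this thread. KNOWN_BAD reflects only what is reported here (1.6.0 failing, 1.5.0 working), and is_reported_problematic is a hypothetical helper.

```python
from importlib import metadata  # assumption: Python 3.8+

# Only the version reported as failing in this thread; not an exhaustive list.
KNOWN_BAD = {"1.6.0"}


def is_reported_problematic(version):
    """True if this version string is the one reported to fail in this thread."""
    return version in KNOWN_BAD


def installed_onnx_version():
    """Return the installed onnx version string, or None if onnx is absent.

    Reads package metadata only, so the onnx extension module is never loaded.
    """
    try:
        return metadata.version("onnx")
    except metadata.PackageNotFoundError:
        return None


version = installed_onnx_version()
if version is None:
    print("onnx is not installed")
elif is_reported_problematic(version):
    print(f"onnx {version}: reported in this thread to fail with neuron-cc")
else:
    print(f"onnx {version}: not the version reported as failing in this thread")
```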

-Taylor

aws-taylor commented on July 25, 2024

Hello again David,

I have some new information: the issue appears to be related to how the onnx 1.6 binary wheel was compiled and the version of libprotobuf used. Looking at a corefile, I see the segfault coming from:

0x00007f1b44b60a35 in pybind11::enum_<onnx::OpSchema::SupportType>::value(char const*, onnx::OpSchema::SupportType, char const*) ()
   from /usr/local/lib/python3.6/dist-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so

Notably, this file has a dependency on libprotobuf, and I've found some other GitHub issues that allude to this file being sensitive to the protobuf version.

ldd /usr/local/lib/python3.6/dist-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so
...
libprotobuf.so.10 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.10 (0x00007f610b038000)
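
If you want to run the same check in a script, the ldd output can be filtered for libprotobuf entries. This is a minimal sketch; protobuf_links is a hypothetical helper, and the sample line is the one shown above.

```python
def protobuf_links(ldd_output):
    """Return (soname, resolved_path) pairs for libprotobuf lines in ldd output."""
    links = []
    for line in ldd_output.splitlines():
        line = line.strip()
        # Keep only "libprotobuf... => ..." lines.
        if not line.startswith("libprotobuf") or "=>" not in line:
            continue
        soname, _, rest = line.partition("=>")
        # Drop the trailing load address, e.g. "(0x00007f610b038000)".
        path = rest.split("(")[0].strip()
        links.append((soname.strip(), path))
    return links


# Sample line copied from the ldd output in this thread.
sample = (
    "libprotobuf.so.10 => "
    "/usr/lib/x86_64-linux-gnu/libprotobuf.so.10 (0x00007f610b038000)"
)
print(protobuf_links(sample))
```

On a real system, the input could come from subprocess.run(["ldd", path_to_so], capture_output=True, text=True).stdout.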

I'm still investigating, but in the meantime, if you do a source install of onnx, you ought to be able to use 1.6.

python3 -m pip install --force-reinstall --no-binary onnx onnx

-Taylor

aws-taylor commented on July 25, 2024

Seems like the same issue: schyun9212/maskrcnn-benchmark#3

davidas1 commented on July 25, 2024

Thanks, that seems to solve the issue and enables me to run a sanity check of my setup.

The actual model I'm trying to compile includes an Upsample op (which looks to be supported, based on ONNX supported ops), and I assume you support opset 9, since Upsample was deprecated in newer ONNX versions.

For some reason the compilation now fails with:
Error message: check_upsampling() takes at least 4 positional arguments (1 given)

I've attached the log and a visualization of one of the Upsample modules in Netron, which is very simple:
neuroncc.log
onnx_upsample

If needed, I can open an issue with AWS support and share additional data (ONNX file, compiler artifacts, etc.).

aws-zejdaj commented on July 25, 2024

David, could you please share the model with us? Either the full model or a small version that contains the Upsample operator would work. That will speed up our debugging.

Thank you,
Jindrich

awsrjh commented on July 25, 2024

Closing
