
Comments (4)

aws-donkrets commented on July 25, 2024

Hi alexandrekm, since you can't share the actual model, I'm responding based on the error message you provided. The message states: invalid literal for int() with base 10: '0.01'

Is the model attempting to assign a floating-point value (0.01) to an integer-typed variable?
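
For reference, this exact message is what Python's built-in `int()` raises when it is handed a string that holds a decimal number. The sketch below is independent of neuron-cc and only illustrates the error text; the helper name `describe_int_parse` is hypothetical.

```python
def describe_int_parse(text):
    """Return the message int() raises for `text`, or None if it parses cleanly."""
    try:
        int(text)
    except ValueError as exc:
        return str(exc)
    return None


# A string holding a decimal number is rejected by int().
print(describe_int_parse("0.01"))

# An actual float is accepted and truncated, so no error is raised here.
print(int(0.01))

# Parsing the string as a float first also avoids the error.
print(int(float("0.01")))
```

Note that `int()` raises this error only for a string argument. Passing a real float truncates silently, which suggests that somewhere in the pipeline the value reaches `int()` as the text '0.01' rather than as a number.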

from aws-neuron-sdk.

alexandrekm commented on July 25, 2024

Hi @aws-donkrets, thanks for taking a look at this.

The model seems to be converted from PyTorch to a SavedModel format successfully, but the subsequent neuron-cc compilation step fails.

Troubleshooting Steps:

  1. Cast Analysis: I haven't observed any explicit string-to-integer conversion within the model itself. The value being cast is actually a string that contains '0.01'. The cast may come from one of the frameworks we use, but since I do not have access to a debugger within neuron-cc (where this fails), I am not sure. Is this something that I can do myself?
  2. Compilation Breakdown: The compilation process appears to be two-fold (is this correct?):
    Stage 1: Converts the PyTorch model to a SavedModel (presumably using torch.jit.trace, which succeeds on its own).
    Stage 2: Compiles the SavedModel using neuron-cc.

Reproducing the Error:

The failure can be isolated and reproduced by running just the second stage (neuron-cc compilation) with the specific commands extracted from the logs. Here's an example of the recreated command:

```
  /home/ubuntu/code/neuron-cc-inf1/1/graph_def.pb \
  --framework TENSORFLOW \
  --pipeline compile SaveTemps \
  --output /home/ubuntu/code/neuron-cc-inf1/1/graph_def.neff \
  --io-config '{"inputs": {"tensor.1:0": [[1, 3, 448, 768], "float32"]}, "outputs": "... (list of outputs) ..."}' \
  --verbose 35
```


jluntamazon commented on July 25, 2024

> I haven’t observed any explicit string-to-integer conversion within the model itself. It’s actually a string that contains the ‘0.01’ value. This cast can come from one of the frameworks we use but since I do not have access to a debugger within neuron-cc (where this fails) I am not sure. Is this something that I can do myself?

I think the easiest thing you could try yourself is to come up with a minimal reproduction that does not contain any proprietary architectural information. The way you might approach this is to create a model with a single layer (instead of multiple) and then attempt to compile it like before. If this still causes an error, then remove submodules from the end of the layer until just a few operators can reproduce the failure. At this point you should be able to share a minimal set of operations to reproduce the issue.
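
The reduction described above can be sketched as a small loop. This is only an illustration under stated assumptions: `shortest_failing_prefix` and the `fails` callback are hypothetical names, and in real use `fails` would build a model from the given submodules, trace it, and run the compile step, returning True when the error still appears. Here a stand-in predicate is used so the sketch runs on its own.

```python
from typing import Callable, List, Sequence


def shortest_failing_prefix(
    parts: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Drop parts from the end for as long as the remaining prefix still fails."""
    if not fails(parts):
        raise ValueError("the full sequence does not reproduce the failure")
    kept = list(parts)
    # Stop once removing the last part would make the failure disappear.
    while len(kept) > 1 and fails(kept[:-1]):
        kept.pop()
    return kept


# Hypothetical submodule names, listed in execution order.
submodules = ["conv", "norm", "act", "resize", "head"]


def stand_in_fails(prefix: Sequence[str]) -> bool:
    # Stand-in for "compile and check for the error"; assumes "act" is the culprit.
    return "act" in prefix


print(shortest_failing_prefix(submodules, stand_in_fails))
```

The last element of the returned list is the submodule whose removal makes the failure go away, which is a good starting point for a shareable reproduction.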

> Compilation Breakdown: The compilation process appears to be two-fold (is this correct?):
> Stage 1: Converts the PyTorch model to a SavedModel (presumably using torch.jit.trace, which succeeds on its own).
> Stage 2: Compiles the SavedModel using neuron-cc.

Yes, exactly correct. Because there are a few stages to compilation, the easiest thing to do is come up with a minimal reproduction so we can determine exactly which component is failing.


alexandrekm commented on July 25, 2024

I managed to understand what the issue was, and disabling a part of the model solved it. Thanks for the help.
