Comments (4)
Hi @alexandrekm - Since you can't share the actual model, I'm responding based on the error message you provided: `invalid literal for int() with base 10: '0.01'`
Is the model attempting to assign a floating-point value (0.01) to an integer-typed variable?
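For reference, that message is exactly what Python raises when `int()` is handed a string holding a float literal. A toy illustration (not from the model itself):

```python
# Reproduces the error message from the compilation log: int() cannot
# parse a base-10 string that contains a decimal point.
try:
    int("0.01")
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '0.01'

# If truncation is acceptable, converting through float() first works:
print(int(float("0.01")))  # 0
```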
from aws-neuron-sdk.
Hi @aws-donkrets, thanks for taking a look at this.
The model seems to convert from PyTorch to the SavedModel format successfully, but the subsequent neuron-cc compilation step fails.
Troubleshooting Steps:
- Cast Analysis: I haven't observed any explicit string-to-integer conversion within the model itself; the '0.01' value actually lives in a string. The cast may come from one of the frameworks we use, but since I don't have access to a debugger within neuron-cc (where this fails) I can't tell. Is this something that I can do myself?
- Compilation Breakdown: The compilation process appears to be two-fold (is this correct?):
  - Stage 1: Converts the PyTorch model to a SavedModel (presumably using torch.jit.trace, which succeeds on its own).
  - Stage 2: Compiles the SavedModel using neuron-cc.
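For context, Stage 1 can be sketched as follows. This is a hedged toy stand-in, assuming torch is installed: the `Toy` module and its `0.01` constant are illustrative, not the actual model; only the input shape matches the reported input tensor.

```python
# Hedged sketch of Stage 1: tracing a toy module with torch.jit.trace.
# The module is an illustrative stand-in, not the real model.
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        # a float constant like the '0.01' seen in the error message
        return torch.relu(x) * 0.01

model = Toy().eval()
example = torch.rand(1, 3, 448, 768)  # shape of the reported input tensor
traced = torch.jit.trace(model, example)
print(tuple(traced(example).shape))  # (1, 3, 448, 768)
```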
Reproducing the Error:
The failure can be isolated and reproduced by running just the second stage (neuron-cc compilation) with the specific commands extracted from the logs. Here's an example of the recreated command:
```
neuron-cc compile /home/ubuntu/code/neuron-cc-inf1/1/graph_def.pb \
    --framework TENSORFLOW \
    --pipeline compile SaveTemps \
    --output /home/ubuntu/code/neuron-cc-inf1/1/graph_def.neff \
    --io-config '{"inputs": {"tensor.1:0": [[1, 3, 448, 768], "float32"]}, "outputs": "... (list of outputs) ..."}' \
    --verbose 35
```
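As a sanity check on the recreated command, the `--io-config` value is ordinary JSON mapping tensor names to `[shape, dtype]` pairs. The snippet below parses it; the outputs list is elided in the log, so a hypothetical placeholder name stands in for it:

```python
# Parse the --io-config JSON from the neuron-cc command line.
# "output:0" is a hypothetical placeholder for the elided outputs list.
import json

io_config = json.loads(
    '{"inputs": {"tensor.1:0": [[1, 3, 448, 768], "float32"]},'
    ' "outputs": ["output:0"]}'
)
shape, dtype = io_config["inputs"]["tensor.1:0"]
print(shape, dtype)  # [1, 3, 448, 768] float32
```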
> I haven't observed any explicit string-to-integer conversion within the model itself. It's actually a string that contains the '0.01' value. This cast can come from one of the frameworks we use but since I do not have access to a debugger within neuron-cc (where this fails) I am not sure. Is this something that I can do myself?
I think the easiest thing you could try yourself is to come up with a minimal reproduction that does not contain any proprietary architectural information. The way you might approach this is to create a model with a single layer (instead of multiple) and then attempt to compile it like before. If this still causes an error, then remove submodules from the end of the layer until just a few operators can reproduce the failure. At this point you should be able to share a minimal set of operations to reproduce the issue.
> Compilation Breakdown: The compilation process appears to be two-fold (is this correct?):
> Stage 1: Converts the PyTorch model to a SavedModel (presumably using torch.jit.trace, which succeeds on its own).
> Stage 2: Compiles the SavedModel using neuron-cc.
Yes, exactly correct. Because there are a few stages to compilation, the easiest thing to do is come up with a minimal reproduction so we can determine exactly which component is failing.
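The narrowing strategy suggested above can be sketched generically. Here `compiles()` is a hypothetical predicate standing in for an actual neuron-cc run, and the op names are toy stand-ins:

```python
# Generic sketch of the minimal-reproduction strategy: drop submodules
# from the end until the shortest prefix that still fails remains.
def minimal_failing_prefix(ops, compiles):
    """Return the shortest prefix of `ops` that still fails to compile."""
    assert not compiles(ops), "full model must reproduce the failure"
    n = len(ops)
    # shrink from the end while the failure still reproduces
    while n > 1 and not compiles(ops[:n - 1]):
        n -= 1
    return ops[:n]

# Toy stand-in: pretend the third op is the one the compiler chokes on.
ops = ["conv", "relu", "bad_cast", "pool", "fc"]
compiles = lambda prefix: "bad_cast" not in prefix
print(minimal_failing_prefix(ops, compiles))  # ['conv', 'relu', 'bad_cast']
```

In practice, "removing submodules from the end" means rebuilding the traced model with fewer layers each time and re-running the compile until just a few operators reproduce the failure.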
I managed to understand what the issue was and disabling a part of the model solved it. Thanks for the help.