I want to finetune a detection model with int8-awareness and convert it to INT8 IR mod


How to get int8 IR model (via int8 onnx model)? OpenVINO mo.py --data-type options only supports fp16,fp32

openvinotoolkit commented on May 28, 2024

How to get int8 IR model (via int8 onnx model)? OpenVINO mo.py --data-type options only supports fp16,fp32


Comments (6)

vshampor commented on May 28, 2024

Greetings, @nsk-lab !

The INT8 ONNX model differs from an FP32 ONNX model by the additional nodes specifying quantization in-model. No additional Model Optimizer parameters are required to handle such models - the INT8 IR will be produced automatically if you supply an INT8 ONNX as input.
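As a rough illustration of the point above, the conversion call for an INT8 ONNX file would look the same as for an FP32 one. The sketch below only assembles the command line and does not run it; the file name `model_int8.onnx`, the output directory `ir_out` and the helper `build_mo_command` are placeholders introduced here, and it assumes `mo.py` is reachable from the working directory.

```python
def build_mo_command(onnx_path, output_dir, mo_script="mo.py"):
    """Assemble a Model Optimizer call for an ONNX model (hypothetical helper).

    No --data_type flag is passed: per the comment above, the quantization
    nodes already stored in the ONNX graph are what make the resulting IR INT8.
    """
    return [
        "python", mo_script,
        "--input_model", onnx_path,
        "--output_dir", output_dir,
    ]


# Placeholder paths, for illustration only.
cmd = build_mo_command("model_int8.onnx", "ir_out")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.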


nsk-lab commented on May 28, 2024

@vshampor Thank you for your reply.

I was able to train my model with INT8 optimization and export it to an ONNX file (which includes FakeQuantize nodes).
I was also able to convert it to an IR model.

But when I load the IR model with openvino.inference_engine.ie_api.IECore.load_network, the following error occurs:

  File "ie_api.pyx", line 178, in openvino.inference_engine.ie_api.IECore.load_network
  File "ie_api.pyx", line 187, in openvino.inference_engine.ie_api.IECore.load_network
RuntimeError: Incorrect number of input edges for layer 560/variance/Fused_Add_

I have attached a Netron visualization.
The top-left Add node seems fine, but the bottom-left Add node, 560/variance/Fused_Add_, is reported as having the wrong number of inputs, even though Netron shows two input nodes, as I expected.

Could you give me some advice?
Is 560/variance/Fused_Add_ being interpreted as having only one input, or is there some other problem?
Any suggestions are welcome.
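One way to look into the question above is to read the generated IR .xml directly and compare, for the layer named in the error, how many input ports it declares with how many edges actually arrive at it. The following is a minimal sketch using only the standard library; the helper `input_edge_report` and the toy IR are made up for illustration, and the toy deliberately omits one edge to show what a mismatch would look like. It does not claim that this is what happened in the real model.

```python
import xml.etree.ElementTree as ET


def input_edge_report(ir_xml_text, layer_name):
    """Return (declared_input_ports, connected_input_ports) for one IR layer."""
    root = ET.fromstring(ir_xml_text.strip())
    layer = next(
        (l for l in root.iter("layer") if l.get("name") == layer_name), None
    )
    if layer is None:
        raise KeyError(layer_name)
    layer_id = layer.get("id")
    # Input ports the layer itself declares.
    declared = {p.get("id") for p in layer.findall("input/port")}
    # Distinct input ports that some edge actually feeds.
    connected = {
        e.get("to-port") for e in root.iter("edge") if e.get("to-layer") == layer_id
    }
    return len(declared), len(connected)


# Toy IR (illustrative only): the Add layer declares two inputs,
# but only one edge reaches it.
TOY_IR = """
<net name="toy" version="10">
  <layers>
    <layer id="0" name="a" type="Parameter"><output><port id="0"/></output></layer>
    <layer id="1" name="b" type="Const"><output><port id="0"/></output></layer>
    <layer id="2" name="560/variance/Fused_Add_" type="Add">
      <input><port id="0"/><port id="1"/></input>
      <output><port id="2"/></output>
    </layer>
  </layers>
  <edges>
    <edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
  </edges>
</net>
"""

print(input_edge_report(TOY_IR, "560/variance/Fused_Add_"))  # (2, 1)
```

For a real model, the same function could be called on the text of the .xml file produced by Model Optimizer; if the two numbers differ there, that would support the suspicion that the IR, rather than the ONNX file, is malformed.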


vshampor commented on May 28, 2024

@nsk-lab does the problem persist in the FP32 IR, or does it only appear for the INT8 IR?


nsk-lab commented on May 28, 2024

When I trained without NNCF and exported to ONNX, I could convert the model to IR and load it (the inference results were also fine).
I have not tried a model trained with NNCF in FP32 and then converted to IR.


nsk-lab commented on May 28, 2024

@vshampor (I forgot to mention you.)


vshampor commented on May 28, 2024

@nsk-lab strange as it is, I'm inclined to believe that this is a bug in OpenVINO's Model Optimizer. Please raise an issue in the https://github.com/openvinotoolkit/openvino repository, which owns Model Optimizer. In the meantime, you can try using the "export_to_onnx_standard_ops": true option to produce an ONNX INT8 model that uses ONNX's standard QuantizeLinear/DequantizeLinear operators, so that it can be loaded with onnxruntime. Successfully running INT8 inference through onnxruntime can give you additional confidence that NNCF's output ONNX files are OK.
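For reference, here is a sketch of where that option might sit in an NNCF configuration. The placement of the key inside the quantization "compression" section and the `sample_size` value are assumptions made for illustration, not taken from this thread, so they should be checked against the NNCF version in use.

```python
import json

# Hypothetical NNCF config fragment; only "export_to_onnx_standard_ops"
# comes from the comment above, the rest is illustrative.
nncf_config = {
    "input_info": {"sample_size": [1, 3, 300, 300]},  # assumed input shape
    "compression": {
        "algorithm": "quantization",
        # Ask for QuantizeLinear/DequantizeLinear pairs instead of FakeQuantize.
        "export_to_onnx_standard_ops": True,
    },
}

# Serialize to the JSON form that would go into a config file.
config_text = json.dumps(nncf_config, indent=4)
print(config_text)
```

The printed JSON could be saved to a file and passed to the training script in place of the original config, after which the exported ONNX file can be tried in onnxruntime.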
