
onnxconverter-common's Introduction


Introduction

The onnxconverter-common package provides common functions and utilities for use in converters from various AI frameworks to ONNX. It also enables the different converters to work together to convert a model built from mixed frameworks, like a scikit-learn pipeline embedding an xgboost model.

License

MIT License

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

onnxconverter-common's People

Contributors

atinfinity, bowenbao, honzasterba, interesaaat, janjagusch, jcwchen, jiafatom, jtanios, jwfromm, microsoftopensource, msftgits, snnn, stevenlix, synapticarbors, tomwildenhain-microsoft, vinitra, wenbingl, xadupre, xiaowuhu, yetingqiaqia


onnxconverter-common's Issues

Error: onnx.onnx_cpp2py_export.checker.ValidationError: Nodes in a graph must be topologically sorted, however input 'TopK_111_input_cast_0' of node:

Hi. I am trying to convert a SuperPoint float32 model to float16 using the following code.

import onnx
from onnxconverter_common.float16 import convert_float_to_float16


if __name__=="__main__":
    WIDTH = 512
    HEIGHT = 256
    MAX_KEY = 200

    onnx_model32 = onnx.load(f"super_point_p{MAX_KEY}_w{WIDTH}_h{HEIGHT}_nojit.onnx")
    onnx_model16 = convert_float_to_float16(onnx_model32)  
    onnx.save(onnx_model16, f"super_point_p{MAX_KEY}_w{WIDTH}_h{HEIGHT}_fp16_nojit.onnx")
    
    onnx.checker.check_model(onnx_model16)

The conversion to the fp16 model succeeds without any messages, but when I check the converted fp16 model it raises the following error.

Traceback (most recent call last):
  File "/home/seungtaek/superpoint/convertonnxfloat2float16.py", line 31, in <module>
    onnx.checker.check_model(onnx_model16)
  File "/home/seungtaek/anaconda3/envs/pytorch_build/lib/python3.9/site-packages/onnx/checker.py", line 106, in check_model
    C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Nodes in a graph must be topologically sorted, however input 'TopK_111_input_cast_0' of node: 
name: TopK_111 OpType: TopK
 is not output of any previous nodes.

I have linked both models above: one is the fp32 model, the other is the converted fp16 model.
download models

How can I solve this problem?

Thanks.

Security Development Lifecycle review for 2022-06

The security & compliance system is called 1CS. We are preparing a 1CS review. A major part of it is called “Security Development Lifecycle (SDL)”. You can find the SDL requirements at: https://msdata.visualstudio.com/DefaultCollection/Vienna/_compliance/product/3db74f93-ab79-972f-848a-2f402f2497cf/assessments/068b5efe-fd00-52cc-bef4-5676de0a4782 . Please open the link, click each work item, and carefully read the description. At the top of each description, you will see text like:

       [ ] microsoft/onnxruntime-extensions
       [ ] microsoft/onnxconverter-common
       [ ] microsoft/OLive
       [ ] microsoft/onnxruntime

When the work item is finished for your repo, or if you think it doesn't apply to you, mark it as done by entering an x in the brackets. If you have any questions or need help, please let @snnn know. The goal is to finish all of them by 6/30. If you need more time, please let us know in advance and leave a comment there with your ETA. Before you start, please also read the FAQ part of Software Testing and SDL | Executive Order Requirements.

1.12.2 version released?

Hi:
I can install version 1.12.2 with
pip install onnxconverter-common==1.12.2
but I note that on the home page I can only see the release information for version 1.9.0,
and I cannot find any tag/branch information about 1.12.2 other than the commit.
I am curious about the release process of onnxconverter-common,
and I would really like to know the release notes of 1.12.2 compared to version 1.9.

Thank You!

convert_float_to_float16() produces a model that causes ValidationError with onnx.checker.check_model()

With ONNX 1.13.1, an fp32 model passes onnx.checker.check_model() without warnings or errors,

import onnx
onnx_model = onnx.load("/models/ResNet50.onnx")
onnx.checker.check_model(onnx_model)

but after the model is converted to fp16, onnx.checker.check_model()

from onnxconverter_common import float16
onnx_model_fp16 = float16.convert_float_to_float16(onnx_model, keep_io_types = True)
import warnings
warnings.filterwarnings("ignore")
onnx.checker.check_model(onnx_model_fp16)

triggers ValidationError

ValidationError: Nodes in a graph must be topologically sorted, however input 'graph_input_cast_0' of node: name: StatefulPartitionedCall/resnet50/conv1_conv/Conv2D__6 OpType: Transpose is not output of any previous nodes.

The ResNet50.onnx model is attached (as a multi-part archive due to the maximum size restriction; rename ResNet50.z01.zip to ResNet50.z01, ResNet50.z02.zip to ResNet50.z02, and ResNet50.z03.zip to ResNet50.z03).

There is a separate issue microsoft/onnxruntime#15494 about onnxruntime catastrophic failure when attempting to load the fp16 model that does not pass validation.

ResNet50.zip
ResNet50.z01.zip
ResNet50.z02.zip
ResNet50.z03.zip

Test test_to_onnx_type fails in version 1.6.0

One of the tests fails in version 1.6.0. The versions used are
platform linux -- Python 3.7.3, pytest-5.3.5, py-1.8.1, pluggy-0.13.1 -- /usr/bin/python3
The output of the failed test is:
self = <test_onnx.TestTypes testMethod=test_to_onnx_type>

 def test_to_onnx_type(self):
     dt = FloatTensorType((1, 5))
     assert str(dt) == 'FloatTensorType(shape=(1, 5))'
     onx = dt.to_onnx_type()
     assert "dim_value: 5" in str(onx)
     tt = onx.tensor_type
     assert "dim_value: 5" in str(tt)
     assert tt.elem_type == 1
  o = onx.sequence_type

E AttributeError: 'TypeProto' object has no attribute 'sequence_type'

convert to FP16 generates orphan and self-recurring nodes

Hi,
Polygraphy uses onnxmltools.utils.float16_converter.convert_float_to_float16 to convert models to FP16. However, I noticed that it generated some orphan Cast nodes. Has anyone encountered a similar issue, or does anyone have insights on how to address it? Any help or advice would be greatly appreciated! Thanks a bunch!

image

Repro:

   install latest onnxconverter-common
   polygraphy convert -o debug2.onnx --fp-to-fp16 debug.onnx

debug_onnx.zip

fp32 to fp16

I have a .onnx file for a pre-trained model and I am trying to convert it from fp32 to fp16. I used these lines of code to do that:

from onnxmltools.utils.float16_converter import convert_float_to_float16
from onnxmltools.utils import load_model, save_model

onnx_model = load_model('model.onnx')
new_onnx_model = convert_float_to_float16(onnx_model)
save_model(new_onnx_model, 'new_model.onnx')

but it seems the convolution kernels still have fp32 weights. How can I convert all the parameters from fp32 to fp16?
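
As a quick check (a diagnostic sketch, assuming the converted file is the new_model.onnx saved above), counting the initializer element types shows whether the weights were actually converted:

import onnx
from onnx import TensorProto

model = onnx.load("new_model.onnx")
counts = {}
for init in model.graph.initializer:
    type_name = TensorProto.DataType.Name(init.data_type)  # e.g. FLOAT, FLOAT16, INT64
    counts[type_name] = counts.get(type_name, 0) + 1
print(counts)  # after a successful conversion, most entries should be FLOAT16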

Extra Cast nodes cause overflow in onnxruntime 1.17

Some users reported extra Cast nodes appearing after running the auto mixed precision conversion. See the related issue here:
microsoft/onnxruntime#19437

ORT 1.17 has changed the behavior of Cast node removal and no longer removes down casts (like a float32 to float16 Cast). Please fix the fp16 conversion script to avoid adding extra Cast nodes. For example, the Cast nodes in fp32 ---> Cast (to=fp16) --> Cast (to=fp32) --> fp32 can be removed with some post-processing.
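
A rough sketch of that post-processing (an illustration under stated assumptions, not the library's own code: it only collapses a pair when the intermediate fp16 tensor feeds nothing but the second Cast, and it ignores subgraphs):

import onnx
from onnx import TensorProto

def remove_redundant_cast_pairs(model):
    """Collapse fp32 --> Cast(to=fp16) --> Cast(to=fp32) --> fp32 chains."""
    graph = model.graph
    nodes = list(graph.node)
    graph_outputs = {o.name for o in graph.output}
    consumers = {}
    for n in nodes:
        for name in n.input:
            consumers.setdefault(name, []).append(n)
    cast_by_output = {n.output[0]: n for n in nodes if n.op_type == "Cast"}
    drop, rename = set(), {}
    for n in nodes:
        if n.op_type != "Cast" or not n.input or n.input[0] not in cast_by_output:
            continue
        prev = cast_by_output[n.input[0]]
        # Cast has a single attribute, 'to'
        if (prev.attribute[0].i == TensorProto.FLOAT16
                and n.attribute[0].i == TensorProto.FLOAT
                and len(consumers.get(prev.output[0], [])) == 1
                and n.output[0] not in graph_outputs):
            drop.update({id(prev), id(n)})
            rename[n.output[0]] = prev.input[0]  # rewire consumers to the original fp32 tensor
    kept = [n for n in nodes if id(n) not in drop]
    for n in kept:
        n.input[:] = [rename.get(name, name) for name in n.input]
    graph.ClearField("node")
    graph.node.extend(kept)
    return model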

Resize op fails to convert to FP16

There is a model from tensorflow2onnx; the FP32 model runs successfully.

Then I used float16_converter.convert_float_to_float16(onnx_model, keep_io_types=True) to convert it to an FP16 model.
But the FP16 model can't create a session; the error is:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from model_fp16.onnx failed:Node (Resize__846) Op (Resize) [ShapeInferenceError] Either sizes or scales must be provided, but not both of them

The problem is similar to #266.
How can I solve it?

auto_convert_mixed_precision() doesn't support >2GB model

Describe the bug
Hi ORT team,

We use mixed_precision to accelerate the model inference speed. It works fine on many models.
But when the model size is >2GB, auto_convert_mixed_precision() fails with ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 8002093824; see the error message below:

image

Could you help check whether there is anything wrong with the auto_convert_mixed_precision() function?

Urgency
Mixed precision is vitally important for speedup, especially on large models. If the auto_convert_mixed_precision() function cannot be applied to >2GB models, its usefulness is significantly reduced.
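
For reference, a sketch using the model-path variant that appears later on this page; it works from files on disk and saves tensors as external data, which may sidestep the in-memory 2GB protobuf limit hit above. Parameter usage mirrors the call shown further down; fp32_model_path, mixed_precision_model_path and ort_inputs are the values defined in the repro code below:

import numpy as np
from onnxconverter_common import auto_mixed_precision_model_path

def validate(res1, res2):
    # same tolerance-based check as in the repro code
    return all(np.allclose(r1, r2, rtol=0.01, atol=0.001) for r1, r2 in zip(res1, res2))

auto_mixed_precision_model_path.auto_convert_mixed_precision_model_path(
    fp32_model_path, ort_inputs, mixed_precision_model_path,
    customized_validate_func=validate, keep_io_types=True,
    provider=['CUDAExecutionProvider'])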

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 18.04
  • ONNX Runtime installed from (source or binary): binary
  • ONNX Runtime version: 1.10
  • Python version: 3.6
  • Visual Studio version (if applicable): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: CUDA 11.5, CuDNN 8.3.1
  • GPU model and memory: V100, 16GB
  • I also tested on CPU, it failed with the same error message.

To Reproduce

  • Test code:
def convert_float32_to_float16(fp32_model_path, fp16_model_path):
    from onnxmltools.utils.float16_converter import convert_float_to_float16_model_path
    from onnxmltools.utils import save_model

    new_onnx_model = convert_float_to_float16_model_path(fp32_model_path, keep_io_types=True)
    save_model(new_onnx_model, fp16_model_path)

def convert_float32_to_mixed_precision(fp32_model_path, mixed_precision_model_path, ort_inputs):
    from onnxconverter_common import auto_mixed_precision
    import onnx

    model = onnx.load(fp32_model_path)

    import numpy as np
    np.random.seed(123)

    # Could also use rtol/atol attributes directly instead of this
    def validate(res1, res2):
        for r1, r2 in zip(res1, res2):
            if not np.allclose(r1, r2, rtol=0.01, atol=0.001):
                return False
        return True

    model_fp16 = auto_mixed_precision.auto_convert_mixed_precision(model, ort_inputs, validate, keep_io_types=True)
    onnx.save(model_fp16, mixed_precision_model_path)

def test(onnx_model_path, ort_inputs, ort_output_names):
    import numpy as np
    import time
    np.random.seed(123)
    #Load ort model
    import onnxruntime as ort
    sess_options = ort.SessionOptions()
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    sess_options.intra_op_num_threads = 0
    sess = ort.InferenceSession(onnx_model_path, sess_options, providers=['CUDAExecutionProvider'])
    #sess = ort.InferenceSession(onnx_model_path, sess_options, providers=['CPUExecutionProvider'])

    #warm-up run
    warm_up_start_stamp = time.time()
    onnx_outs = sess.run(ort_output_names, ort_inputs)[0]
    print(f"onnx_outs of warm-up:", onnx_outs)
    print(f"It takes {time.time()-warm_up_start_stamp} to finish warm-up.\n")

    start_stamp = time.time()
    num_batches = 0
    for i in range(num_batches):
        print(f"batch id: {i}")
        onnx_outs = sess.run(ort_output_names, ort_inputs)
        print(f"onnx_outs:", onnx_outs)
        print(f"{i}th batch finished successfully. ")
    print(f"It takes {time.time()-start_stamp} to finish {num_batches} batches.\n")


fp32_model_path = './model/8_fp32/graph.onnx'
#fp16_model_path = './model/8_fp16/graph_fp16.onnx'
#convert_float32_to_float16(fp32_model_path, fp16_model_path)
mixed_precision_model_path = './model/8_mixed_precision/graph_mixed_precision.onnx'


ort_inputs = {
    "input_ids1":[
        [1,14297,107121,50708,48,7360,2770,1068,50708,48,40259,36,6752,2771,19239,18035,62618,91,61665,50708,48,14297,107121,269,50708,48,3465,102,62,280,50708,48,40259,1916,3705,1226,4026,25997,6009,50708,48,79,8,71,11,14297,107121,50708,48,460,325,5683,50708,48,1272,583,12321,4722,8,484,11,40259,91,8623,2,14297,107121,34006,11743,227,50708,48,16007,38851,38,4,290,50708,48,36053,50708,48,3153,5,13953,1226,3482,40259,6,13513,50708,48,14297,107121,50708,48,7360,2770,1068,50708,48,40259,36,6752,2771,19239,18035,62618,91,61665,50708,48,79,13,11,14297,107121,34006,11743,227,50708,48,36053,50708,48,3153,5,2],
        [1,2159,1532,228,50708,48,7862,1005,50708,48,8085,2379,37820,119,1314,2902,50708,48,2159,1532,50708,48,884,8848,50708,48,50708,48,2159,485,7862,228,50708,48,7862,1005,50708,48,1314,8085,2379,37820,9,30166,91,7922,12773,4,1758,50708,48,2159,4681,50708,48,7862,1005,50708,48,23,1314,8085,1065,2379,2,697,750,739,50708,48,65888,4202,12996,4,290,50708,48,7862,1005,50708,48,8085,2379,37820,119,1314,2902,50708,48,2159,168,7862,739,50708,48,7862,1005,50708,48,1314,8085,2379,3493,7,20640,2379,677,50708,48,2159,7862,4681,739,50708,48,7862,1005,50708,48,1314,8085,2379,37820,9,30166,91,7922,2]],
    "attention_mask1":[
        [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
        [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]],
    "input_ids2":[
        [1,18347,4,23548,5818,2639,4,18347,42,1805,331,14816,53768,70591,16,8822,4,18347,44,4,3859,140,64,409,4,18347,44,73,57,603,8641,550,4,115,13,160,19,18347,4,353,494,4817,4,8,547,40,1366,13,160,19,18347,16,125,42,1360,594,58,4,18347,7824,4,19053,4,430,2,18347,7824,4,1767,5,10,4,19053,4,430,8,6534,57,1077,18347,6,7824,4,18347,4,23548,5818,2639,4,18347,42,1805,331,14816,53768,70591,16,8822,4,115,24,19,18347,7824,4,19053,4,430,8,6534,57,1077,18347,6,7824,4,353,20,18347,4,365,494,2355,4,18347,6,8548,2],
        [1,187,665,167,4,9957,550,4,187,418,2189,770,40,116,4,187,665,4,583,183,4,4,187,307,9957,167,4,9957,550,4,40,187,418,2189,20,11543,16,488,776,5,246,4,187,2189,4,9957,550,4,8,40,187,48,418,2189,12,125,57,36550,2,1455,418,744,4,17941,5,10,4,9957,550,4,187,418,2189,770,40,116,4,187,48,9957,744,4,9957,550,4,40,187,418,744,9,1398,418,58,4,187,9957,2189,744,4,9957,550,4,40,187,418,2189,20,11543,16,488,776,5,246,4,187,418,744,4,9957,550,4,40,187,418,744,9,1398,2]],
    "attention_mask2":[
        [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
        [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]],
    "labels":[
        [-1,-1,-1,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1],
        [-1,-1,-1,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1]]
    }

ort_output_names = ['prediction0','prediction1','prediction2','prediction3']

print("Test fp32 model: ")
test(fp32_model_path, ort_inputs, ort_output_names)
print("f32 model test finished.")

print("Convert to mixed precision starts...")
convert_float32_to_mixed_precision(fp32_model_path, mixed_precision_model_path, ort_inputs)
print("Conversion finished.")

print("Test mixed precision model: ")
test(mixed_precision_model_path, ort_inputs, ort_output_names)
print("Mixed precision model test finished.")

Issues when converting model to float16

Some issues observed when converting a model to float16:

  • When converting a model with keep_io_types set to True, onnx.checker complains:
    onnx.onnx_cpp2py_export.checker.ValidationError: Nodes in a graph must be topologically sorted,
    
    The fix should be fairly simple: insert the Cast operators for the input tensors at the beginning of the node list (see the sketch after this list).
  • Some operators (e.g., Resize, CumSum) support float16 but are listed in DEFAULT_OP_BLOCK_LIST:
    • Should we update this list, or auto-generate it from the ONNX op schema?
    • For operators like Resize, some of the optional inputs don't support float16, so those inputs need to be excluded from the fp16 conversion.
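
A minimal sketch of that fix as a post-processing step (a hypothetical helper, assuming the out-of-order nodes are exactly the Cast nodes appended for graph inputs; not the library's own code):

import onnx

def move_input_casts_to_front(model: onnx.ModelProto) -> onnx.ModelProto:
    """Move Cast nodes that consume a graph input to the front of graph.node."""
    graph = model.graph
    graph_inputs = {i.name for i in graph.input}
    nodes = list(graph.node)
    # Cast nodes fed directly by graph inputs must come before their consumers.
    input_casts = [n for n in nodes
                   if n.op_type == "Cast" and any(name in graph_inputs for name in n.input)]
    cast_ids = {id(n) for n in input_casts}
    rest = [n for n in nodes if id(n) not in cast_ids]
    graph.ClearField("node")
    graph.node.extend(input_casts + rest)
    return model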

Integrate with ONNX 1.16.0 release branch

We are releasing ONNX 1.16.0. A release branch is created (https://github.com/onnx/onnx/tree/rel-1.16.0). The planned release date is March 25, 2024. Release candidates are also available from TestPyPI: pip install -i https://test.pypi.org/simple/ --pre onnx

It is important to integrate ONNX release branch ASAP so that any issues and incompatibilities can be detected and resolved before the ONNX release.

Key updates:

In case a bug in ONNX is detected during integration of ONNX 1.16.0, please open an ONNX Bug Report and tag the ONNX Release Manager @cjvolzka so that the bug is fixed in the ONNX release branch.

Bad optimization in MergePadConvOptimizer

I have a perhaps strange architecture that pipes two paddings into the same convolution.
A small example can be created with Keras using the following code.

def padding_network_v2(input_layer, nf=64):
    temp = keras.layers.Conv2D(filters=nf, kernel_size=(3, 3), strides=(1, 1), padding="same")(input_layer)
    pad1 = keras.layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(temp)
    pad2 = keras.layers.ZeroPadding2D(padding=((0, 1), (0, 1)))(temp)
    conv = keras.layers.Conv2D(filters=nf, kernel_size=(3, 3), strides=(1, 1), padding="valid")
    output = keras.layers.concatenate([conv(pad1), conv(pad2)], axis=3)

    return output

image

During optimization, the MergePadConvOptimizer function optimizes one of the convolutions and accidentally removes the second.

The optimization code works correctly when the padding operations directly follow the input nodes, for example:

def padding_network(input_layer, nf=64):
    pad1 = keras.layers.ZeroPadding2D(padding=((1,0), (1,0)))(input_layer)
    pad2 = keras.layers.ZeroPadding2D(padding=((0,1), (0,1)))(input_layer)
    conv = keras.layers.Conv2D(filters=nf, kernel_size=(3,3), strides=(2,2), padding="valid")
    output = keras.layers.concatenate([conv(pad1), conv(pad2)], axis=3)
    return output

We think the issue comes from delete_node_nto1. We looked into the function, but given its complexity we are not sure we can fix it without breaking it.

The code to reproduce the issue can be found here:
https://gist.github.com/edmBernard/0f899f76c5b234ec0189ce327a423900

Publish source distribution to pypi

Currently it looks like only a wheel is published to PyPI for this package. It would be useful to also publish the source distribution in tar.gz format.

Converting a model from fp32 to fp16 with auto_mixed_precision_model_path gets NaN

Hi, I was trying to convert a BERT-like model from fp32 to fp16 using the auto_mixed_precision_model_path script. My code is below:

import fire
import onnx

import numpy as np

from onnxconverter_common.auto_mixed_precision_model_path import (
    auto_convert_mixed_precision_model_path,
)


def get_feed_tensor(name: str, max_seq_len: int):
    if name in (
        'token_type_ids',
        'segment_ids',
        'position_ids',
    ):
        return np.zeros((1, max_seq_len), dtype=np.int64)
    elif name in (
        'input_ids',
    ):
        return np.random.randint(0, 1000, (1, max_seq_len), dtype=np.int64)
    elif name in (
        'attention_mask',
    ):
        return np.ones((1, max_seq_len), dtype=np.int64)
    elif name in (
        'bbox',
    ):
        ele = np.array([0, 0, 1, 1], dtype=np.int64)
        return np.resize(ele, (1, max_seq_len, 4))
    else:
        raise NotImplementedError()


def main(
    src_model_path: str,
    tgt_model_path: str,
    max_seq_len: int = 1024,
):
    input_feed = {}
    model = onnx.load(src_model_path)
    for input in model.graph.input:
        input_feed.update({ input.name: get_feed_tensor(input.name, max_seq_len) })

    def validate_fn(res1, res2):
        for r1, r2 in zip(res1, res2):
            if not np.allclose(r1, r2, rtol=1e-2, atol=1e-2):
                return False
        return True

    auto_convert_mixed_precision_model_path(
        src_model_path,
        input_feed,
        tgt_model_path,
        provider=['CUDAExecutionProvider'],
        customized_validate_func=validate_fn,
        keep_io_types=True,
        verbose=False,
    )


if __name__ == '__main__':
    fire.Fire(main)

During conversion, I noticed some warnings about truncation, and then validate_fn fails because res2 is all NaN. I have a screenshot showing this:

image

Do you have any idea how I can fix this?

Failing tests

Tests fail with onnxruntime 1.12.1 built from source. It seems InferenceSession needs to be instantiated with providers.

======================================================================
ERROR: test_auto_mixed_precision (test_auto_mixed_precision.AutoFloat16Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/source/tests/test_auto_mixed_precision.py", line 43, in test_auto_mixed_precision
    expected = transpose_n_matmul(m1)
  File "/build/source/onnxconverter_common/onnx_fx.py", line 274, in __call__
    return Graph.inference_runtime(self.oxml, kwargs)
  File "/build/source/tests/test_onnxfx.py", line 12, in _ort_inference
    sess = _ort.InferenceSession(mdl.SerializeToString())
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 375, in _create_inference_session
    raise ValueError(
ValueError: This ORT build has ['DnnlExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['DnnlExecutionProvider', 'CPUExecutionProvider'], ...)

======================================================================
ERROR: test_auto_mixed_precision_model_path (test_auto_mixed_precision_model_path.AutoFloat16Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/source/tests/test_auto_mixed_precision_model_path.py", line 32, in test_auto_mixed_precision_model_path
    expected = _ort_inference(model32_path, {'modelInput': input_x})
  File "/build/source/tests/test_auto_mixed_precision_model_path.py", line 13, in _ort_inference
    sess = _ort.InferenceSession(model_path)
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 375, in _create_inference_session
    raise ValueError(
ValueError: This ORT build has ['DnnlExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['DnnlExecutionProvider', 'CPUExecutionProvider'], ...)

======================================================================
ERROR: test_float16 (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_float16
Traceback (most recent call last):
  File "/nix/store/sz0j8k8ljh7y8qgyfxgqb3ws11bcy4gs-python3-3.10.6/lib/python3.10/unittest/loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "/nix/store/sz0j8k8ljh7y8qgyfxgqb3ws11bcy4gs-python3-3.10.6/lib/python3.10/unittest/loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "/build/source/tests/test_float16.py", line 7, in <module>
    import onnxmltools
ModuleNotFoundError: No module named 'onnxmltools'


======================================================================
ERROR: test_onnx2py (test_onnx2py.Onnx2PyTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/source/tests/test_onnx2py.py", line 27, in test_onnx2py
    sess1 = _ort.InferenceSession(onnx_model.SerializeToString())
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 375, in _create_inference_session
    raise ValueError(
ValueError: This ORT build has ['DnnlExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['DnnlExecutionProvider', 'CPUExecutionProvider'], ...)

======================================================================
ERROR: test_onnx2py (test_onnx2py.Onnx2PyTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/source/tests/test_onnx2py.py", line 15, in tearDown
    for f in os.listdir(tmp_path):
FileNotFoundError: [Errno 2] No such file or directory: '/build/source/tests/temp'

======================================================================
ERROR: test_core (test_onnxfx.ONNXFunctionTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/source/tests/test_onnxfx.py", line 34, in test_core
    np.allclose(g([2.0], [-5.0]), np.array([2.0])))
  File "/build/source/onnxconverter_common/onnx_fx.py", line 274, in __call__
    return Graph.inference_runtime(self.oxml, kwargs)
  File "/build/source/tests/test_onnxfx.py", line 12, in _ort_inference
    sess = _ort.InferenceSession(mdl.SerializeToString())
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 375, in _create_inference_session
    raise ValueError(
ValueError: This ORT build has ['DnnlExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['DnnlExecutionProvider', 'CPUExecutionProvider'], ...)

======================================================================
ERROR: test_loop (test_onnxfx.ONNXFunctionTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/source/tests/test_onnxfx.py", line 58, in test_loop
    loop_test(np.array([16], dtype=np.int64))[2][4], 3.0)
  File "/build/source/onnxconverter_common/onnx_fx.py", line 274, in __call__
    return Graph.inference_runtime(self.oxml, kwargs)
  File "/build/source/tests/test_onnxfx.py", line 12, in _ort_inference
    sess = _ort.InferenceSession(mdl.SerializeToString())
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 375, in _create_inference_session
    raise ValueError(
ValueError: This ORT build has ['DnnlExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['DnnlExecutionProvider', 'CPUExecutionProvider'], ...)

======================================================================
ERROR: test_matmul_opt (test_onnxfx.ONNXFunctionTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/source/tests/test_onnxfx.py", line 75, in test_matmul_opt
    expected = transpose_n_matmul(m1)
  File "/build/source/onnxconverter_common/onnx_fx.py", line 274, in __call__
    return Graph.inference_runtime(self.oxml, kwargs)
  File "/build/source/tests/test_onnxfx.py", line 12, in _ort_inference
    sess = _ort.InferenceSession(mdl.SerializeToString())
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/nix/store/vy5fnk87f62mzqaaw3l208psi7lvpfns-onnxruntime-1.12.1-python/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 375, in _create_inference_session
    raise ValueError(
ValueError: This ORT build has ['DnnlExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['DnnlExecutionProvider', 'CPUExecutionProvider'], ...)

----------------------------------------------------------------------
Ran 27 tests in 1.060s

If needed, I can provide the exact set of packages/build parameters, as this was built using Nix.

How to convert a model to mixed precision?

Hey guys,

I hope you are doing great. I am trying to convert a model to mixed precision; unfortunately, the doc is wrong and there is no clear instruction or working code that I can use.

The doc just doesn't work; see #251.

I've also tried onnxconverter_common.auto_mixed_precision_model_path.auto_convert_mixed_precision_model_path (I'm not even sure how it differs from onnxconverter_common.auto_mixed_precision.auto_convert_mixed_precision), but it doesn't work either.

It took more than 15 minutes on a quite powerful machine, which seems very odd since PyTorch takes a fraction of a second. May I ask what is going on internally?

It outputs a lot of garbage (see the attached txt file with the logs) and doesn't perform the conversion. The log is more than 100MB because the code outputs a ton of weird stuff. After a while, it crashed from producing too much output.

I am just trying to convert a model to mixed precision, which seems like a fairly common task. (Could you invest a little time in testing and writing docs? This would help the community enormously; otherwise, most people will never be able to use the software in its current undocumented/broken state.)

I am working on an article about exporting a torch model to ONNX in mixed precision. My code:

import onnxruntime as ort
import torch
from torchvision.models import ConvNeXt_Small_Weights, convnext_small
from onnxconverter_common.auto_mixed_precision_model_path import (
    auto_convert_mixed_precision_model_path,
)
import onnx

model_name = "model.onnx"
model = convnext_small(ConvNeXt_Small_Weights.IMAGENET1K_V1).eval().cuda()

x = torch.randn(1, 3, 224, 224, device="cuda")
# Export the model
torch.onnx.export(
    model,  # model being run
    x,  # model input (or a tuple for multiple inputs)
    model_name,  # where to save the model (can be a file or file-like object)
    opset_version=16,
    export_params=True,  # store the trained parameter weights inside the model file
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["image"],  # the model's input names
    output_names=["output"],  # the model's output names
    dynamic_axes={
        "image": {0: "batch_size"},  # variable length axes
        "output": {0: "batch_size"},
    },
)

# model = onnx.load("model.onnx")
# model_fp16 = auto_convert_mixed_precision(model, { "image" : x.cpu().numpy() }, None, rtol=0.01, atol=0.001, keep_io_types=True)
auto_convert_mixed_precision_model_path("./model.onnx", { "image" : x.cpu().numpy() }, "./model_fp16.onnx", provider=["CUDAExecutionProvider"], rtol=0.01, atol=0.001, keep_io_types=True, verbose=False)
# onnx.save(model_fp16, "model_fp16.onnx")
# let's check
print("Checking")
x = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.float16)
ort_session = ort.InferenceSession("model_fp16.onnx", providers=["CUDAExecutionProvider"])
outputs = ort_session.run(None, {"image": x.cpu().numpy()})
print(outputs[0].shape, outputs[0].dtype)

Thanks

add NOTICE file to onnxconverter-common

Xiaowu,

This document has some instructions for manually creating a NOTICE file: https://docs.opensource.microsoft.com/using/required-notice-template/ . I’ll see if I can put a link to that page somewhere it is easier to stumble upon.

If it would help to see completed examples, you can check out some of the most popular repositories https://github.com/microsoft?q=&type=all&language=&sort=stargazers .

Justin

=============

From: Xiaowu Hu <[email protected]>
Sent: Monday, July 11, 2022 1:36 AM
To: Open Source at Microsoft <[email protected]>
Cc: Gary Miguel <[email protected]>; Faith Xu <[email protected]>
Subject: NOTICE file for OSS

Hi,

We have an OSS project here: https://github.com/microsoft/onnxconverter-common
So far we don’t have a NOTICE file according to this link: NOTICE Q&A | Docs - Microsoft Open Source
Do we need to add one to our project above? Where can I get a template, or copy/paste one from somewhere else?

Thanks,
Xiaowu

StrictVersion is deprecated

StrictVersion is being deprecated and cannot parse versions such as "1.12.0rc5", so the code was changed to use verlib.NormalizedVersion instead.
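
For illustration only (the change described above uses verlib.NormalizedVersion; packaging.version is another option that handles pre-release strings StrictVersion rejects):

# distutils.version.StrictVersion is deprecated and rejects "rc" pre-release tags:
# StrictVersion("1.12.0rc5") raises ValueError (invalid version number).
from packaging.version import Version

assert Version("1.12.0rc5") < Version("1.12.0")  # release candidates sort before the final release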

tensorflow max_pool_with_argmax op does not return indices

Hello, I think there is a bug with the tensorflow max_pool_with_argmax op.
When running the op with onnxruntime, I find that it returns the pooled values twice instead of the pooled values and the pooled indices.
I have already opened an issue on the keras-onnx repo. I don't know the ONNX code in detail, but it seems that this is where the multi-output behavior of an op is managed, and the bug appears related to that, so the problem might be here.

Here is a code reproducing the bug:

import tensorflow as tf
import numpy as np

class Bug(tf.keras.Model):   
    def __init__(self):
        super(Bug, self).__init__()
    def call(self, inputs):
        v, i = tf.nn.max_pool_with_argmax(inputs, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        return tf.cast(i, tf.float32)

b = Bug()
np.random.seed(0)
inp = np.random.uniform(0, 10, (2, 4, 4, 3))
keras_indices = b(inp)

import onnx
import keras2onnx

onnx_model = keras2onnx.convert_keras(b, target_opset=12)
onnx.save_model(onnx_model, "test.onnx")

import onnxruntime as rt

session = rt.InferenceSession("test.onnx")
input, output = session.get_inputs()[0], session.get_outputs()[0]
onnx_indices = session.run([output.name], {input.name: inp.astype(np.float32)})[0]

print(inp[0,:,:,2])
print(onnx_indices[0,:,:,2])

In the end it produces:

[[6.02763376 6.45894113 9.63662761 5.2889492 ]
 [0.71036058 8.32619846 9.78618342 7.80529176]
 [1.43353287 4.1466194  4.56150332 6.17635497]
 [9.43748079 4.37031954 6.66766715 1.28926298]]
[[8.326199  9.786183 ]
 [9.437481  6.6676674]]

Note that I had to cast the resulting indices to float, otherwise I get the following error during inference:

FAIL : Load model from test.onnx failed:Type Error: Type (tensor(int64)) of output arg (Identity:0) of node (bug/MaxPoolWithArgmax_transpose_2_1) does not match expected type (tensor(float)).

This was obtained on ubuntu 20.04, with tensorflow 2.4.1, onnx 1.8.1, keras2onnx 1.8.0, onnxruntime 1.7.0, onnxconverter-common 1.8.0.

protobuf version

Is there a reason for the protobuf version to be pinned to 3.20.2? I believe onnx now supports protobuf >= 4, so it would be nice if onnxconverter-common could be updated as well.

Inference issue after convert_float_to_float16

Describe the bug

I tried to use mixed precision on the inception_v2.onnx and vgg19.onnx models on a GPU machine.
At first, I used convert_float_to_float16_model_path with keep_io_types=False, but inference became even slower.
Here is my script.

  • code for conversion
from onnxconverter_common import convert_float_to_float16_model_path
model = "inception_v2.onnx"
new_onnx_model = convert_float_to_float16_model_path(model, keep_io_types=False)
file_path = "new_inception_v2.onnx"
with open(file_path, 'wb') as f:
    f.write(new_onnx_model.SerializeToString())
  • code for inference benchmark
import numpy as np
import time
import onnxruntime as ort
def benchmark(model_path):
    session = ort.InferenceSession(model_path)

    total = 0.0
    runs = 200
    input_dict = {"data_0": np.random.random_sample((1,3,224,224)).astype(np.float32)}

    # Warming up
    for i in range(20):
        _ = session.run([], input_dict)

    for i in range(runs):
        start = time.perf_counter()
        _ = session.run([], input_dict)
        end = (time.perf_counter() - start) * 1000
        total += end
    total /= runs
    print(f"Avg: {total:.4f}ms")

Then I tried convert_float_to_float16_model_path with keep_io_types=True, and this time an error occurred.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from keep_inception_v2.onnx failed:D:\a\_work\1\s\onnxruntime\core\graph\graph.cc:1128 onnxruntime::Graph::Graph [ONNXRuntimeError] : 1 : FAIL : Tensor element type mismatch. 10 != 1

System information

  • OS Platform: tested on both Linux and Windows
  • ONNX Runtime version: onnxruntime-gpu with version 1.7.0 and 1.9.0
  • Python version: 3.6
  • onnx version: 1.10.1
  • onnxconverter-common version: 1.8.1

To Reproduce

  • Code has been shared before.
  • Model can be downloaded here

Thanks !

regarding the keep_io_types in float16 converter

https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py#L135

I think it's better to move the keep_io_types logic to the end (after the main fp16 conversion is done), because in the current implementation many inputs in model.graph.input are still float at the beginning, and many Cast nodes then get added for them.

Current use of convert_float_to_float16 with keep_io_types=True:
model_fp16_ = oc.convert_float_to_float16(model_fp32, keep_io_types=False)
model_fp16 = oc.convert_float_to_float16(model_fp16_, keep_io_types=True)

Ideally, a single call to convert_float_to_float16 should be able to do the conversion.

FP16 conversion yields an unusable model

I'm working with a model in SageMaker (ResNet50 640x640, input size [1, -1, -1, 3]) converted to ONNX. When trying to get more performance out of it by converting it to FP16, the conversion succeeds, but trying to run the model gives this error:

E0907 08:27:25.823138 1379 model_lifecycle.cc:626] failed to load 'sagemaker' version 1: Internal: onnx runtime error 1: 
Load model from /models/sagemaker/1/model.onnx failed:Node (StatefulPartitionedCall/map/while_loop) Op (Loop) TypeInferenceError] 
Graph attribute inferencing failed: Node (Resize__59) Op (Resize) [ShapeInferenceError] 
Either `sizes` or `scales` must be provided, but not both of them

Trying out mixed precision instead fails at shape inferencing:

Traceback (most recent call last):
  File "/workspace/fp-16-onnx-converter.py", line 15, in <module>
    model_fp16 = auto_mixed_precision.auto_convert_mixed_precision(model, input_feed, rtol=0.01, atol=0.001, keep_io_types=True)
  File "/usr/local/lib/python3.10/dist-packages/onnxconverter_common/auto_mixed_precision.py", line 80, in auto_convert_mixed_precision
    if not run_attempt(node_names):
  File "/usr/local/lib/python3.10/dist-packages/onnxconverter_common/auto_mixed_precision.py", line 72, in run_attempt
    res1 = get_tensor_values_using_ort(model, feed_dict)
  File "/usr/local/lib/python3.10/dist-packages/onnxconverter_common/auto_mixed_precision.py", line 132, in get_tensor_values_using_ort
    sess = ort.InferenceSession(model.SerializeToString(), sess_options, providers=['CUDAExecutionProvider'])
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 426, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (StatefulPartitionedCall/map/while_loop) Op (Loop) [TypeInferenceError] Graph attribute inferencing failed: Node (Resize__59) Op (Resize) [ShapeInferenceError] Either `sizes` or `scales` must be provided, but not both of them

It gives the same error with the latest shape inferencing script from GitHub. I am not sure where I should post this issue, as multiple parts of the ONNX stack seem to be involved and not working.

Fp16 model runs slower than fp32 model

Hi @jcwchen,
I converted a model from Keras to ONNX and then converted the float32 model to float16 using float16.py, and I noticed that inference took longer than with the float32 model on CPU. Can you please let me know the reason behind this?

Thanks

F16 file does not convert correctly

The float16.py conversion has a blacklist option, but it does not seem to blacklist correctly and still tries to run Resize at float16, leading to errors.

onnx.onnx_cpp2py_export.checker.ValidationError: Nodes in a graph must be topologically sorted, however input 'Resize__139_input_cast_1' of node: name: Resize__139 OpType: Resize is not output of any previous nodes.

I used onnxconverter-common to quantize model.onnx to float16. When loading the fp16 model, the following error occurs; how should I solve it?
onnx.onnx_cpp2py_export.checker.ValidationError: Nodes in a graph must be topologically sorted, however input 'Resize__139_input_cast_1' of node:
name: Resize__139 OpType: Resize
is not output of any previous nodes.
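
For reference, a sketch of how such a block list is typically passed to convert_float_to_float16 (assumptions: an explicit op_block_list replaces DEFAULT_OP_BLOCK_LIST rather than extending it, so the default list is included here; Resize may already be in that default list):

import onnx
from onnxconverter_common import float16

model = onnx.load("model.onnx")
model_fp16 = float16.convert_float_to_float16(
    model,
    keep_io_types=True,
    op_block_list=float16.DEFAULT_OP_BLOCK_LIST + ["Resize"],  # ops to keep in fp32
)
onnx.save(model_fp16, "model_fp16.onnx")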

ONNX Quantisation

Can we use the convert_float_to_float16 function in the float16 module to convert large ONNX models like owlv2-L/14?
I tried to convert them, but during onnxruntime inference I ran into an issue with the graph.

InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("", ReduceMean, "", -1) : ("_0x7fec01fb3880_XU": tensor(float),) -> ("_0x7fec01fb3880_Mean2D",) , Error Unrecognized attribute: axes for operator ReduceMean

Are there any upgrades to onnxconverter-common?

While installing tf2onnx==1.16.0 I get:
onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 3.20.3 which is incompatible.
but
tensorflow-intel 2.15.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.20.2 which is incompatible.
Using Python 3.11.7
Thanks in advance

Add `convert_float_to_bfloat16` function to avoid overflow

I used the convert_float_to_float16 function to convert fp32 models to fp16, but some models produce outputs that change significantly due to overflow.
I think bf16 would be useful to avoid overflow in such cases.

In the convert_float_to_float16 function, FLOAT16 is hard-coded and cannot be changed to BFLOAT16 or similar.
I think it would be more useful if this could be changed.
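
For what it's worth, the data-level part of such a conversion is essentially a bit truncation; a minimal sketch (assuming simple truncation rather than round-to-nearest-even is acceptable, and leaving aside the graph rewiring that float16.py does for FLOAT16):

import numpy as np

def float32_to_bfloat16_bits(x: np.ndarray) -> np.ndarray:
    """Keep the upper 16 bits of each float32 value; bfloat16 shares float32's exponent range."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)  # raw bfloat16 bit patterns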

Thank you.

FP16 model can not get acceleration on GPU with ONNXRuntime-GPU

Hello,
I used the float16 tool to convert FP32 models to FP16 and used ONNXRuntime-GPU 1.13.1 for inference.
I found that many models do not obtain any inference acceleration.
I want to know what kind of ONNX FP32 models can obtain inference acceleration after converting to FP16?
Looking forward to your answer, thank you.

`auto_convert_mixed_precision` error: two nodes with the same node name

I want to convert a model into an AMP model.
This is my code:

def convert_float32_to_mixed_precision(fp32_model_path, mixed_precision_model_path):
    from onnxconverter_common import auto_mixed_precision
    import onnx

    model = onnx.load(fp32_model_path)

    import numpy as np
    np.random.seed(123)
    test_data = {"image": 2*np.random.rand(1, 3, 640, 640).astype(np.float32)-1.0,"scale_factor":[[1,1]]}

    def validate(res1, res2):
        for r1, r2 in zip(res1, res2):
            if not np.allclose(r1, r2, rtol=0.01, atol=0.001):
                return False
        return True

    model_fp16 = auto_mixed_precision.auto_convert_mixed_precision(model, test_data, validate, keep_io_types=True)
    onnx.save(model_fp16, mixed_precision_model_path)

fp32_model_path = 'F32.onnx'
mixed_precision_model_path = 'AMP.onnx'

print("Convert to mixed precision starts...")
convert_float32_to_mixed_precision(fp32_model_path, mixed_precision_model_path)
print("Conversion finished.")

OR

from onnxconverter_common import auto_mixed_precision
import onnx
import numpy as np

test_data = {"image": 2*np.random.rand(1, 3, 640, 640).astype(np.float32)-1.0,"scale_factor":[[1,1]]}
model = onnx.load("float32.onnx")
model_fp16 = auto_mixed_precision.auto_convert_mixed_precision(model, test_data, rtol=0.01, atol=0.001, keep_io_types=True)
onnx.save(model_fp16, "AMP-float32.onnx")

Running this code on my model resulted in an error: 1 : FAIL : This is an invalid model. Error: two nodes with same node name (_output_cast0).
But I don't have duplicate node names.
This is my model: Google Drive

onnxconverter_common.auto_mixed_precision.auto_convert_mixed_precision never ends

Hey guys,

I hope you are doing great. onnxconverter_common.auto_mixed_precision.auto_convert_mixed_precision takes forever; I let it run for 15 minutes and it was still only about halfway done. Any idea why? My code:

import onnxruntime as ort
import torch
from torchvision.models import ConvNeXt_Small_Weights, convnext_small
from onnxconverter_common.auto_mixed_precision import auto_convert_mixed_precision
import onnx

model_name = "model.onnx"
model = convnext_small(ConvNeXt_Small_Weights.IMAGENET1K_V1).eval().cuda()

x = torch.randn(1, 3, 224, 224, device="cuda")
# # Export the model
torch.onnx.export(
    model,  # model being run
    x,  # model input (or a tuple for multiple inputs)
    model_name,  # where to save the model (can be a file or file-like object)
    opset_version=16,
    export_params=True,  # store the trained parameter weights inside the model file
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["image"],  # the model's input names
    output_names=["output"],  # the model's output names
    dynamic_axes={
        "image": {0: "batch_size"},  # variable length axes
        "output": {0: "batch_size"},
    },
)

model = onnx.load("model.onnx")
model_fp16 = auto_convert_mixed_precision(model, { "image" : x.cpu().numpy() }, None, rtol=0.01, atol=0.001, keep_io_types=True)
onnx.save(model_fp16, "model_fp16.onnx")
# let's check
print("Checking")
x = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.float16)
ort_session = ort.InferenceSession("model_fp16.onnx", providers=["CUDAExecutionProvider"])
outputs = ort_session.run(None, {"image": x.cpu().numpy()})
print(outputs[0].shape, outputs[0].dtype)

Thanks a lot :)

"No space left on device" issue on auto_convert_mixed_precision_model_path()

Hi team,

I found that auto_convert_mixed_precision_model_path() easily runs into a "No space left on device" issue; it seems to consume a lot of temporary disk space while running.

Below is an example failure message, which occurred at running attempt 671:

segments= [*192*, *12*, *6*, *1*, (1), *1*, *3*, (1), *2*, *1*, *1*, (1), *6*, (1), (1), (1), (1), *2*, *1*, *1*, (1), (1), (1), (1), (1), (1), *1*, (1), (1), *1*, (1), (1), 1, 1, 1, 1, 1, *2*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *12*, *3*, 1, 1, 1, *6*, 1, 1, 1, *3*, 1, 1, 1, 1, 1, 1, *3*, 1, 1, 1, 1, *2*, *3*, 1, *2*, *3*, 1, 1, 1, *3*, 1, 1, 1, *3*, 1, *2*, 1, 1, 1, *6*, 1, 1, 1, 1, *2*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *2*, 1, 1, 1, *192*, *48*, 1, 1, 1, 1, 1, 1, 1, *2*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *6*, *6*, *3*, 1, 1, 1, *6*, 1, 1, 1, *3*, 1, 1, 1, *3*, *6*, *24*, *12*, 1, 1, 1, 1, *2*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *2*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *12*, *24*, *24*, *24*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *6*, *24*, 1, 1, 1, *3*, *6*, *12*, *24*, 1, 1, 1, *3*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *24*, *48*, *12*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *6*, *6*, *3*, 1, 1, 1, *3*, 1, 1, 1, 1, *2*, *3*, 1, 1, 1, 1, *2*, *3*, 1, 1, 1, *3*, 1, 1, 1, *6*, *3*, 1, 1, 1, 1, *2*, *3*, *3*, *3*, *3*, *3*, *6*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *12*, *3*, 1, 1, 1, *3*, 1, 1, 1, *3*, *3*, 1, 1, 1, *3*, *24*, *24*, 1, 1, 1, 1, 1, 1, 1, *2*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *6*, *6*, *3*, 1, 1, 1, *3*, 1, 1, 1, 1, *2*, *3*, *12*, *12*, *6*, *6*, *6*, *6*, *6*, *6*, 1, *2*, *3*, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, *2*, 1, 1, 1, *13*]
Running attempt 671 excluding conversion of 445 nodes
convert to float 16...
node block list =
['MatMul_211', 'Reshape_216', 'Where_221', 'Reshape_228', 'Softmax_229', 'MatMul_230', 'Transpose_231', 'Reshape_236', 'MatMul_237', 'Add_238', 'Add_239', 'ReduceMean_240', 'Sub_241', 'Pow_243', 'ReduceMean_244', 'Add_246', 'Sqrt_247', 'Mul_249', 'Add_250', 'MatMul_251', 'Add_252', 'Div_255', 'Erf_256', 'Constant_257', 'Add_258', 'Mul_259', 'Constant_260', 'Mul_261', 'MatMul_262', 'Add_263', 'Add_264', 'ReduceMean_265', 'Sub_266', 'Constant_267', 'Pow_268', 'ReduceMean_269', 'Constant_270', 'Add_271', 'Sqrt_272', 'Div_273', 'Mul_274', 'Add_275', 'Slice_291', 'MatMul_292', 'Add_293', 'MatMul_300', 'Add_301', 'Constant_302', 'Slice_306', 'MatMul_307', 'Add_308', 'Constant_309', 'Mul_310', 'Constant_311', 'Unsqueeze_315', 'Concat_316', 'Reshape_317', 'Transpose_318', 'Reshape_324', 'Concat_330', 'Reshape_331', 'Transpose_332', 'Transpose_336', 'MatMul_337', '......', 'Concat_1369', 'Reshape_1370', 'MatMul_1371', 'Add_1372', 'Add_1373', 'ReduceMean_1374', 'Sub_1375', 'ReduceMean_1378', 'Constant_1379', 'Add_1380', 'Sqrt_1381', 'Div_1382', 'Mul_1383', 'Add_1384', 'MatMul_1385', 'Add_1386', 'Cast_1387', 'Constant_1388', 'Div_1389', 'Erf_1390', 'Constant_1391', 'Add_1392', 'Mul_1393', 'Constant_1394', 'Mul_1395', 'MatMul_1396', 'Add_1397', 'Add_1398', 'ReduceMean_1399', 'Sub_1400', 'Constant_1401', 'Pow_1402', 'ReduceMean_1403', 'Constant_1404', 'Add_1405', 'Sqrt_1406', 'Div_1407', 'Mul_1408', 'Add_1409', 'Shape_1410', 'MatMul_1426', 'Add_1427', 'Constant_1428', 'Constant_1432', 'Slice_1433', 'MatMul_1434', 'Add_1435', 'Sub_1501', 'Sqrt_1507', 'Div_1508', 'Mul_1509', 'Add_1510', 'MatMul_1511', 'Add_1512', 'Cast_1513', 'Constant_1514', 'Div_1515', 'Erf_1516', 'Constant_1517', 'Add_1518', 'Mul_1519', 'MatMul_1522', 'Add_1523', 'Add_1524']
Traceback (most recent call last):
  File "fp16_convert.py", line 104, in <module>
    convert_float32_to_mixed_precision_model_path(fp32_model_path, fp16_mixed_model_path, ort_inputs, providers)
  File "fp16_convert.py", line 18, in convert_float32_to_mixed_precision_model_path
    fp32_model_path, input_feed, mixed_precision_model_path, location="graph_mixed_precision_tensor.data", customized_validate_func=validate, keep_io_types=True, provider=providers, verbose=True)
  File "/usr/local/lib/python3.6/site-packages/onnxconverter_common/auto_mixed_precision_model_path.py", line 123, in auto_convert_mixed_precision_model_path
    final_block_list = _find_nodes_blocking_fp16(**kwargs)
  File "/usr/local/lib/python3.6/site-packages/onnxconverter_common/auto_mixed_precision_model_path.py", line 203, in _find_nodes_blocking_fp16
    if _convert_and_check_inference_result(**kwargs):
  File "/usr/local/lib/python3.6/site-packages/onnxconverter_common/auto_mixed_precision_model_path.py", line 244, in _convert_and_check_inference_result
    save_model(model_16, target_model_path, location=location)
  File "/usr/local/lib/python3.6/site-packages/onnxconverter_common/auto_mixed_precision_model_path.py", line 261, in save_model
    onnx.save_model(model, model_path, save_as_external_data=True, location=location)
  File "/usr/local/lib/python3.6/site-packages/onnx/__init__.py", line 200, in save_model
    proto = write_external_data_tensors(proto, basepath)
  File "/usr/local/lib/python3.6/site-packages/onnx/external_data_helper.py", line 263, in write_external_data_tensors
    save_external_data(tensor, filepath)
  File "/usr/local/lib/python3.6/site-packages/onnx/external_data_helper.py", line 180, in save_external_data
    data_file.write(tensor.raw_data)
OSError: [Errno 28] No space left on device

This is my conversion code:

from onnxconverter_common import auto_mixed_precision_model_path
import onnx
import numpy as np
import onnxruntime as ort
from onnxmltools.utils.float16_converter import convert_float_to_float16_model_path
from onnxmltools.utils import save_model


def convert_float32_to_mixed_precision_model_path(fp32_model_path, mixed_precision_model_path, input_feed, providers):
    # Could also use rtol/atol attributes directly instead of this
    def validate(res1, res2):
        for r1, r2 in zip(res1, res2):
            if not np.allclose(r1, r2, rtol=0.01, atol=0.001):
                return False
        return True

    auto_mixed_precision_model_path.auto_convert_mixed_precision_model_path(
        fp32_model_path, input_feed, mixed_precision_model_path, location="graph_mixed_precision_tensor.data", customized_validate_func=validate, keep_io_types=True, provider=providers, verbose=True)
    #onnx.save(model_fp16, mixed_precision_model_path)


def convert_float32_to_float16(fp32_model_path, fp16_model_path):
    from onnxmltools.utils.float16_converter import convert_float_to_float16_model_path
    from onnxmltools.utils import save_model

    new_onnx_model = convert_float_to_float16_model_path(fp32_model_path, keep_io_types=True)
    save_model(new_onnx_model, fp16_model_path)

def convert_float32_to_mixed_precision(fp32_model_path, mixed_precision_model_path, ort_inputs):
    from onnxconverter_common import auto_mixed_precision
    import onnx

    model = onnx.load(fp32_model_path)

    import numpy as np
    np.random.seed(123)

    # Could also use rtol/atol attributes directly instead of this
    def validate(res1, res2):
        for r1, r2 in zip(res1, res2):
            if not np.allclose(r1, r2, rtol=0.01, atol=0.001):
                return False
        return True

    model_fp16 = auto_mixed_precision.auto_convert_mixed_precision(model, ort_inputs, validate, keep_io_types=True)
    onnx.save(model_fp16, mixed_precision_model_path)

def test(onnx_model_path, ort_inputs, ort_output_names):
    import numpy as np
    import time
    np.random.seed(123)
    #Load ort model
    import onnxruntime as ort
    sess_options = ort.SessionOptions()
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    sess_options.intra_op_num_threads = 0

    sess = ort.InferenceSession(onnx_model_path, sess_options, providers=['CUDAExecutionProvider'])
    #sess = ort.InferenceSession(onnx_model_path, sess_options, providers=['CPUExecutionProvider'])

    #warm-up run
    warm_up_start_stamp = time.time()
    onnx_outs = sess.run(ort_output_names, ort_inputs)[0]
    print(f"onnx_outs of warm-up:", onnx_outs)
    print(f"It takes {time.time()-warm_up_start_stamp} to finish warm-up.\n")

    start_stamp = time.time()
    num_batches = 0
    for i in range(num_batches):
        print(f"batch id: {i}")
        onnx_outs = sess.run(ort_output_names, ort_inputs)
        print(f"onnx_outs:", onnx_outs)
        print(f"{i}th batch finished successfully. ")
    print(f"It takes {time.time()-start_stamp} to finish {num_batches} batches.\n")


fp32_model_path = './model/graph.onnx'
#fp16_model_path = './model/8_fp16/graph_fp16.onnx'
#convert_float32_to_float16(fp32_model_path, fp16_model_path)
fp16_mixed_model_path = './model/mixed_precision/graph_mixed_precision.onnx'


ort_inputs={
        'input_ids':[
            [10093,21382,2094,26264,10093,11106,15879,2705,4132,5099,11057,24577,5576,12379,26743,3468,1998,10003,3857,5099,11057,24577,10093,11106,15879,2705,2565,2005,2151,12233,5901,15770,8231,2007,27084,6162,2565,3112,2085,1999,1996,2925,2007,2256,7721,7661,21335,3468,2578,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
            [2041,5963,21475,4580,5269,2041,5963,5269,2379,2033,4638,2085,2424,2041,5963,21475,4580,5269,2379,2033,2041,5963,21475,4580,12183,14266,1998,5269,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
            [2270,2636,2005,21904,5937,7523,3784,2270,2636,2156,1015,21904,5937,1055,3042,1016,4769,1017,2287,2062,3046,6575,2085,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
            [8285,9323,11742,6084,10846,2047,12609,11742,1018,5479,3193,2783,4107,7523,2256,2312,4989,1997,2047,2109,11742,4683,2800,2085,5959,2307,9144,2326,2651,2123,1056,3335,2256,2783,4107,21134,3942,2085,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
            [2273,2015,3509,16722,4293,2125,2085,4121,5096,2006,2035,2273,2015,3509,16722,2085,9241,3132,3749,3828,2085,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
            [4268,4542,17764,2039,2000,3438,2125,2131,3201,2005,6172,2039,2000,3438,2125,5096,2012,3137,9746,4497,2085,1998,3828,4497,2085,1998,2131,2039,2000,3438,2125,2012,3137,9746,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
            [2424,6100,1997,1037,2610,3189,3784,4607,1037,2171,1998,3945,2424,2026,14757,19040,3229,2610,2636,14757,19040,2015,1998,2062,2074,4607,2151,2171,3945,3229,14757,19040,2015,4606,2610,2636,2062,2797,5851,2074,2828,1999,2171,2110,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
            [3828,2006,13675,12322,2015,2012,15849,17406,4497,2256,4100,3296,5096,3693,2252,1051,2131,14610,2058,1015,2454,9144,2296,4696,2006,5096,2360,7592,2000,3500,2007,4121,2188,3871,10995,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
            ]
}

ort_output_names = ['seq_embedding']

#print("Test fp32 model: ")
#test(fp32_model_path, ort_inputs, ort_output_names)
#print("f32 model test finished.")

providers=['CUDAExecutionProvider']
print("Convert to mixed precision starts...")
convert_float32_to_mixed_precision_model_path(fp32_model_path, fp16_mixed_model_path, ort_inputs, providers)
print("Conversion finished.")

#print("Test mixed precision model: ")
#test(mixed_precision_model_path, ort_inputs, ort_output_names)
#print("Mixed precision model test finished.")

How to reproduce?
Check this zip file:
https://drive.google.com/file/d/1oJayYH4HCeB6VE1Kp-DiNvqLTMAWsUwc/view?usp=sharing
It includes:

  1. code file: fp16_converter.py
  2. model file: model/graph.onnx
  3. failure log file: prophetNet_conversion.log
    For this error, the log file should be enough, as it contains all the information you need; otherwise, running the model conversion code takes several hours before it fails.

Command to run:
python fp16_converter.py
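
Since the failure is OSError: [Errno 28] No space left on device while the tool repeatedly saves candidate models (plus their external tensor data) during the search, it can help to confirm up front that the output directory has enough free space. Below is a minimal pre-flight sketch using only the Python standard library; the 3x safety margin is an arbitrary assumption, not something required by onnxconverter-common.

import os
import shutil

def check_disk_space_for_conversion(fp32_model_path, target_dir, min_free_ratio=3.0):
    # The mixed-precision search writes candidate models and their external
    # tensor data to target_dir on every iteration, so free space should be a
    # comfortable multiple of the fp32 model size. min_free_ratio is only a
    # heuristic margin.
    model_size = os.path.getsize(fp32_model_path)      # size of the fp32 model in bytes
    free_bytes = shutil.disk_usage(target_dir).free    # free space on the target volume
    if free_bytes < min_free_ratio * model_size:
        raise RuntimeError(
            f"Only {free_bytes / 2**30:.1f} GiB free in {target_dir}, "
            f"while the fp32 model is {model_size / 2**30:.1f} GiB; "
            "free up space or point the output to a larger volume."
        )

# Hypothetical usage, matching the paths in the script above:
# check_disk_space_for_conversion('./model/graph.onnx', './model/mixed_precision')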

Performance degrades after sess_options.enable_profiling = True

Hi @xiaowuhu

Using the tool referred to in https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html,
I measured Stable Diffusion with sess_options.enable_profiling = True specified.
Before adding it I get 22.40 it/s; after enabling it I can only achieve 10.45 it/s, so performance degrades from 22.40 to 10.45.

Could this be improved?
The measured data is below:

[screenshot: measured profiling data]

Another question: I have two configurations. Before enabling profiling,

config A is 22.40 it/s and config B is 20.23 it/s,
but after turning on profiling with sess_options.enable_profiling = True,
config A is 10.45 it/s and config B is 11.03 it/s.

Since the total number of measured operations differs between the two, I'm not sure whether that affects the performance measurement:
config A has 56250 counts to measure,
config B has 49800 counts to measure.
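
For reference, here is a minimal sketch of how such a profiled run can be set up and how the trace file is retrieved; the model path and input feed are placeholders, not the Stable Diffusion pipeline above. Profiling records a trace event for every operator invocation, so some overhead per run is expected.

import onnxruntime as ort

def profiled_run(model_path, feed, providers=("CUDAExecutionProvider",)):
    # Run one inference with ORT profiling enabled and return the profile file name.
    sess_options = ort.SessionOptions()
    sess_options.enable_profiling = True
    sess = ort.InferenceSession(model_path, sess_options, providers=list(providers))
    sess.run(None, feed)
    # end_profiling() flushes the trace and returns the JSON file name, which
    # can be inspected in chrome://tracing or with the ORT profiling tools.
    return sess.end_profiling()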

Redundant dependencies in requirements.txt

requirements.txt specifies unversioned dependencies on numpy and protobuf, both of which are (version constrained) dependencies of onnx, and so these are redundant and can be removed. For reference, the dependencies identified from poetry are:

├── onnxconverter-common >=1.7.0
│   ├── numpy *
│   ├── onnx *
│   │   ├── numpy >=1.16.6 (circular dependency aborted here)
│   │   ├── protobuf >=3.12.2
│   │   └── typing-extensions >=3.6.2.1
│   ├── packaging *
│   │   └── pyparsing >=2.0.2,<3.0.5 || >3.0.5
│   └── protobuf * (circular dependency aborted here)
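
Given that tree, a trimmed requirements.txt would only need the packages the project uses directly. A sketch of what it could be reduced to, assuming onnx and packaging remain the only direct, non-transitive requirements (as the tree above suggests):

onnx
packaging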

Fp32-->fp16: original fp32 model works well with input data, but converted fp16 model fails with the same input data

Hi,

I am using onnxmltools to convert an fp32 model to fp16. The original fp32 model was exported from a PyTorch model with opset 12. The fp32 model works well on the input data. However, the fp16 model fails with an error message when running inference on the same input data. Could you help take a look?

This is the code I used to convert the fp32 ONNX model to an fp16 model. It finished successfully.

import onnx
from onnxmltools.utils.float16_converter import convert_float_to_float16_model_path
from onnxmltools.utils import load_model, save_model

onnx_model_path = './graph_opset12.onnx'
new_onnx_model = convert_float_to_float16_model_path(onnx_model_path, keep_io_types=True)
save_model(new_onnx_model, './graph_opset12_fp16.onnx')

After I got the converted fp16 model, I used the code below to run it:

# Import Libraries
import argparse
import torch
import random
from torchvision.transforms import *
from PIL import Image, ImageFile
from io import BytesIO
import base64
import time
from torch.utils.data import IterableDataset
import torchvision as tv

ImageFile.LOAD_TRUNCATED_IMAGES = True
script_start_time = time.time()

class ImageDataset_Base64(IterableDataset):

    def __init__(self, filename, transforms=None):
        print("File from which we are training {}".format(filename))
        self.filename = filename
        self.transform = transforms
        self.parts = {}
        self.lines = open(self.filename).readlines()
        self.length = len(self.lines)
        print("Number of data points {}".format(self.length))
        for i in range(8):
            self.parts[i] = self.lines[int(i*self.length/8):int((i+1)*self.length/8)]

    def preprocess_img(self, img_b64):
        try :
            im = Image.open(BytesIO(base64.b64decode(img_b64)))
            X = im.convert('RGB')
        except :
            X = Image.new('RGB', (480, 480)) # default color is black
        if self.transform is not None:
            X = self.transform(X)
        return X

    def preprocess_id(self, id):
        return int(id)

    def preprocess_label(self, label):
        try :
            y = int(label)
        except:
            return 0
        if y in [0, 1, 2]: return y
        else: return 0

    def line_mapper(self, line):
        # splits the line into text and label and applies preprocessing to the text
        url, id, imgb64  = line.rstrip().split('\t')
        label = random.choice([0,1,2])
        # id_str = random.randint(1,1000000)
        id = self.preprocess_id(id)
        X = self.preprocess_img(imgb64)
        y = self.preprocess_label(label)
        return id, X, y

    def __iter__(self):
        # create an iterator
        worker_info = torch.utils.data.get_worker_info()
        worker_id = worker_info.id
        # map each element using the line_mapper
        mapped_itr = map(self.line_mapper, self.parts[worker_id])
        return mapped_itr

def get_val_loader(args):
    val_tx = tv.transforms.Compose([
        tv.transforms.Resize((args.image_size, args.image_size)),
        tv.transforms.ToTensor(),
        tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    val_set = ImageDataset_Base64(args.test_file_path, transforms=val_tx)
    val_loader = torch.utils.data.DataLoader(val_set, args.batch_size, 
                        num_workers=8, pin_memory=True)
    return val_loader

def evaluate(args):
    val_loader = get_val_loader(args)

    import onnxruntime as ort
    sess_options = ort.SessionOptions()
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    sess_options.intra_op_num_threads = 0
    sess = ort.InferenceSession(args.onnx_model_path, sess_options)
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name
    def to_numpy(torch_tensor):
        return torch_tensor.detach().cpu().numpy() if torch_tensor.requires_grad else torch_tensor.cpu().numpy()
    
    accumulated_inference_time = 0
    with torch.no_grad():
        for i, (id, data, target) in enumerate(val_loader):
            data = data.to(args.device)
            #print(data.size())
            
            #below is onnx inference code
            data = to_numpy(data)
            start_stamp = time.time()
            pred = sess.run([label_name], {input_name: data})[0]
            accumulated_inference_time += time.time() - start_stamp

    print(f"Total Onnx model inference time is {accumulated_inference_time}")
    
if __name__ == "__main__":
    script_start_time = time.time()
    parser = argparse.ArgumentParser()
    args, unknown = parser.parse_known_args()

    args.batch_size = 128
    args.image_size = 480

    args.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    args.n_gpu = torch.cuda.device_count()
    print(" ARGS.device " + str(args.device) + "  ARGS.n_gpu " + str(args.n_gpu))

    args.test_file_path = "./DivideImages_0.tsv"
    args.onnx_model_path = "./graph_opset12.onnx"
    evaluate(args)

    print(f"Total running time: {time.time()-script_start_time}")

However, it fails with the error message below when running the fp16 model, while the fp32 model runs fine.

File "/home/tiy/.local/lib/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running InstanceNormalization node. Name:'InstanceNormalization_31' Status Message: CUDNN error executing cudnnBatchNormalizationForwardTraining( CudnnHandle(), CUDNN_BATCHNORM_SPATIAL, &one, &zero, data_desc, x_data, data_desc, y_data, stats_desc, unused_scale.get(), unused_bias.get(), 1.0f, mean.get(), variance.get(), CUDNN_BN_MIN_EPSILON, nullptr, nullptr)

The code, model and data can be found here: https://www.dropbox.com/s/27nlnm7avp7wins/Resnet_fp16_test.zip?dl=0

  • To run the code: python resnet-opset12_fp16-test.py
  • To test the fp16 model: replace args.onnx_model_path = "./graph_opset12.onnx" in the script with args.onnx_model_path = "./graph_opset12_fp16.onnx" and run python resnet-opset12_fp16-test.py
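
One workaround worth trying is to keep the failing op in fp32 during conversion. Recent versions of onnxconverter-common expose an op_block_list argument on convert_float_to_float16, which leaves the listed op types running in float32 and inserts Cast nodes around them. The sketch below uses the in-memory API rather than convert_float_to_float16_model_path, the exact keyword may differ between versions, and the output filename is arbitrary:

import onnx
from onnxconverter_common import float16

# Keep InstanceNormalization in fp32 while converting the rest of the graph.
model_fp32 = onnx.load("./graph_opset12.onnx")
model_fp16 = float16.convert_float_to_float16(
    model_fp32,
    keep_io_types=True,
    op_block_list=float16.DEFAULT_OP_BLOCK_LIST + ["InstanceNormalization"],
)
onnx.save(model_fp16, "./graph_opset12_fp16_blocked.onnx")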
