
Comments (18)

PINTO0309 commented on June 14, 2024

The yolov8 thread is too long to read.

Please state clearly here which operations should be skipped, and starting from which operation. It's too much trouble to dig that out myself.


adamp87 commented on June 14, 2024

The idea would be to have a general option, maybe in PARAM_REPLACEMENT_FILE, so a user could adjust it per model. YOLOv8 is just one example where it would be beneficial. In the attached yolov8 model visualization, the Concat and every OP after it should be excluded from quantization.


PINTO0309 commented on June 14, 2024

There is more than one Concat. I didn't know which Concat you were referring to, so I cut the model at what seemed a reasonable point. And I still don't understand what you expect me to do.

onnx2tf \
-i yolov8_n.onnx \
-onimc /model.22/Sigmoid_output_0 /model.22/Sub_1_output_0 /model.22/Div_output_0

(image: visualization of the model truncated at the three specified output OPs)


adamp87 commented on June 14, 2024

Hey,

sorry for the confusion. Let me try to explain with a visual example. This is a toy example; the model makes no sense, it's just for illustration.

On the left is the fully quantized INT8 model, which is the output of the onnx2tf tool. On the right is the model I would like to have: a dequantization is inserted, and everything after it is kept in FP32.
(image: toy example comparing the fully quantized model, left, with the desired partially dequantized model, right)


PINTO0309 commented on June 14, 2024

Are you saying that the position of the dequantization should be controllable from onnx2tf? That is impossible. Let me know if you know of any TensorFlow converter parameter below that can handle your request.

I would have implemented such a feature a year ago if I had known how.

https://github.com/tensorflow/tensorflow/blob/6e6ca51e99c8d46c401ad11982cbf846c3a4071f/tensorflow/lite/python/lite.py#L603-L678

class TFLiteConverterBase:
  """Converter superclass to share functionality between V1 and V2 converters."""

  # Stores the original model type temporarily to transmit the information
  # from the factory class methods to TFLiteConverterBase init function.
  _original_model_type = conversion_metdata_fb.ModelType.NONE

  def __init__(self):
    self.optimizations = set()
    self.representative_dataset = None
    self.target_spec = TargetSpec()
    self.allow_custom_ops = False
    self.experimental_new_converter = True
    self.experimental_new_quantizer = True
    self.experimental_enable_resource_variables = True
    self._experimental_calibrate_only = False
    self._experimental_sparsify_model = False
    self._experimental_disable_per_channel = False
    self._debug_info = None  # contains the stack traces of all the original
    # nodes in the `GraphDef` to the converter.
    self.saved_model_dir = None
    self._saved_model_tags = None
    self._saved_model_version = 0
    self._saved_model_exported_names = []
    self._tflite_metrics = metrics.TFLiteConverterMetrics()
    self._collected_converter_params = {}
    self.unfold_batchmatmul = False
    self.legalize_custom_tensor_list_ops = False
    self._experimental_lower_tensor_list_ops = True
    self._experimental_default_to_single_batch_in_tensor_list_ops = False
    self._experimental_unfold_large_splat_constant = False
    self._experimental_tf_quantization_mode = None
    # If unset, bias:int32 is by default except 16x8 quant.
    # For 16x8 quant, bias:int64 is used to prevent any overflow by default.
    # The accumulator type will be the same as bias type set by
    # full_integer_quantization_bias_type.
    self._experimental_full_integer_quantization_bias_type = None
    # Provides specs for quantization, whether preset or custom.
    self._experimental_quantization_options = None  # Deprecated
    self.experimental_use_stablehlo_quantizer = False
    # Initializes conversion metadata.
    self.exclude_conversion_metadata = False
    self._metadata = conversion_metdata_fb.ConversionMetadataT()
    self._metadata.environment = conversion_metdata_fb.EnvironmentT()
    self._metadata.options = conversion_metdata_fb.ConversionOptionsT()
    self._metadata.environment.tensorflowVersion = versions.__version__
    self._metadata.environment.modelType = self._get_original_model_type()
    self._experimental_enable_dynamic_update_slice = False
    self._experimental_preserve_assert_op = False
    self._experimental_guarantee_all_funcs_one_use = False

    # When the value is true, the MLIR quantantizer triggers dynamic range
    # quantization in MLIR instead of the old quantizer. Used only if
    # experimental_new_quantizer is on.
    self.experimental_new_dynamic_range_quantizer = True
    # Experimental flag to enable low-bit QAT in 8 bit.
    self._experimental_low_bit_qat = False
    # Experimental flag to add all TF ops (including custom TF ops) to the
    # converted model as flex ops.
    self._experimental_allow_all_select_tf_ops = False

    self._experimental_variable_quantization = False
    self._experimental_disable_fuse_mul_and_fc = False
    self._experimental_use_buffer_offset = False
    self._experimental_reduce_type_precision = False
    self._experimental_qdq_conversion_mode = None

    # Debug parameters
    self.ir_dump_dir = None
    self.ir_dump_pass_regex = None
    self.ir_dump_func_regex = None
    self.enable_timing = None
    self.print_ir_before = None
    self.print_ir_after = None
    self.print_ir_module_scope = None
    self.elide_elementsattrs_if_larger = None


PINTO0309 commented on June 14, 2024

I understand that it is a toy model, but I don't see the significance of keeping the last Mul and Add inside the tflite model at all. They are Float32 multiplication and addition in the first place, so writing two lines of multiplication and addition on the program side would make no difference in performance.

This workaround only applies when the trailing operations are primitive ones, though.
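
Concretely, the "two lines on the program side" could look like the sketch below. This is illustrative only; the model path, the input, and the Mul/Add constants are placeholders, not taken from this thread.

import numpy as np
import tensorflow as tf

# Run the truncated INT8 model with the standard tflite interpreter.
interpreter = tf.lite.Interpreter(model_path="model_truncated_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantize the float input using the input tensor's scale/zero-point.
scale, zero_point = inp["quantization"]
x = np.random.rand(*inp["shape"]).astype(np.float32)  # placeholder input
interpreter.set_tensor(inp["index"], (x / scale + zero_point).astype(inp["dtype"]))
interpreter.invoke()

# Dequantize the INT8 output back to float.
o_scale, o_zero = out["quantization"]
y = (interpreter.get_tensor(out["index"]).astype(np.float32) - o_zero) * o_scale

# The two ops that were cut off, executed in FP32 on the program side.
MUL_CONST, ADD_CONST = 2.0, 0.5  # placeholders for the toy model's constants
result = y * MUL_CONST + ADD_CONST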


adamp87 commented on June 14, 2024

I don't know if that's possible with TFLite, but it would be good. For example, OpenVINO's NNCF quantize() function accepts a parameter named ignored_scope.
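
For reference, ignored_scope is used roughly as follows. This is a sketch assuming the nncf.quantize() / nncf.IgnoredScope API; ov_model, calib, and the node names are placeholders.

import nncf

quantized_model = nncf.quantize(
    ov_model,                            # an openvino.Model
    calib,                               # an nncf.Dataset for calibration
    ignored_scope=nncf.IgnoredScope(
        names=["/model.22/Concat"],      # exact node names to keep in FP32
        patterns=[".*/model\\.22/.*"],   # or regex patterns over node names
    ),
)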

Yes, it is a toy example, and toy examples are meant to be simple. In YOLOv8, everything after the Concat (not the last one before the output) should be executed in FP32.


PINTO0309 commented on June 14, 2024

I finally understand what you are intending.

I will say it again. It is impossible. Submit a feature request issue to TensorFlow.


adamp87 commented on June 14, 2024

Oh, all right, sad to hear. Thanks for your help.


EpiX-1 commented on June 14, 2024

Hi @adamp87,

I think you can achieve what you want by converting your onnx model to keras using the -oh5 option of onnx2tf.
Then you can quantize your keras model with the TensorFlow API using tf.lite.experimental.QuantizationDebugger(), which lets you specify nodes/operators to skip during the quantization process via tf.lite.experimental.QuantizationDebugOptions (see the sketch below).
Hope this helps.
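
A minimal sketch of that flow, following the TFLite quantization-debugger guide; keras_model, calibration_samples, and the denylisted node name are placeholders.

import tensorflow as tf

def representative_dataset():
    for sample in calibration_samples:  # iterable of float32 numpy inputs
        yield [sample]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

debug_options = tf.lite.experimental.QuantizationDebugOptions(
    denylisted_nodes=["/model.22/Concat"],  # nodes to keep in float
)
debugger = tf.lite.experimental.QuantizationDebugger(
    converter=converter,
    debug_dataset=representative_dataset,
    debug_options=debug_options,
)

# The "non-debug" model is the selectively quantized flatbuffer.
with open("yolov8n_selective_int8.tflite", "wb") as f:
    f.write(debugger.get_nondebug_quantized_model())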


adamp87 commented on June 14, 2024

Hi @EpiX-1,

thank you so much for the suggestion. I think this could be the right way to go. Sadly I'm having an issue and am not sure how to proceed. Are you familiar with onnx2tf, or could you give me a hint as to what the problem could be?

By running the following in Colab:

!git clone https://github.com/adamp87/ultralytics.git
%cd /content/ultralytics
!git checkout tflite_accurate
!pip install -e .
!yolo export model=yolov8n.pt data=coco128.yaml format=tflite imgsz=640

I get this error during the init of QuantizationDebugger:
'/model.10/Resize' is not a valid root scope name. A root scope name has to match the following pattern: ^[A-Za-z0-9.][A-Za-z0-9_.\\/>-]*$

You can find my code here: GitHub Compare

Thanks for your help.


EpiX-1 commented on June 14, 2024

I'm able to reproduce the issue. Your code seems good to me.
I've managed to resolve the issue by reverting to an older commit of YOLOv8. I have no idea why it occurs, though.


adamp87 commented on June 14, 2024


EpiX-1 commented on June 14, 2024

Sure,
What I meant is that I cloned the original YOLOv8 repository at the specific commit I provided you, to export the .pt model to onnx. After manually converting the onnx model to keras using onnx2tf, I reused my code, which is very similar to yours, to quantize the keras model with the QuantizationDebugger.
I hope it's clearer now.
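
Spelled out as code, that workflow might look as follows. This is a sketch: it assumes ultralytics' YOLO.export() and onnx2tf's Python convert() API, and the output_h5 keyword is assumed to mirror the -oh5 CLI flag; paths are placeholders, and the pinned commit is not shown in this thread.

from ultralytics import YOLO
import onnx2tf

# 1. Export the .pt checkpoint to ONNX.
YOLO("yolov8n.pt").export(format="onnx", imgsz=640)

# 2. Convert the ONNX model to Keras (.h5) with onnx2tf.
onnx2tf.convert(
    input_onnx_file_path="yolov8n.onnx",
    output_folder_path="saved_model",
    output_h5=True,  # assumed Python equivalent of the -oh5 flag
)

# 3. Quantize the Keras model with the QuantizationDebugger,
#    as in the sketch earlier in this thread.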


adamp87 commented on June 14, 2024

