
Comments (44)

liamsun2019 commented on May 1, 2024

source code snippet:

center = ret[head]
center_max = torch.sigmoid(center)
center_max = self.maxpool(center_max)
center_peaks = (center_max == center).float()  # peak mask
center = center * center_peaks                 # keep only the peak responses
ret['filtered_hm'] = center                    # extra output

where ret['filtered_hm'] is one of the extra outputs. The error message seems to be related to that.


liamsun2019 commented on May 1, 2024

If I change the code as follows:
center = ret[head]
center_max = torch.sigmoid(center)
center_max = self.maxpool(center_max)
ret['hm_hmax'] = center_max

Another error comes up:
assert tensor.q_zero_point() == 128, "As for symmetric quantization, "
AssertionError: As for symmetric quantization, the zero point of the u8 tensors should be 128. This could happen if you didn't train the model after QAT preparation.
Attached the script.
movenet_qat.zip


liamsun2019 commented on May 1, 2024

The above experiments are based on the recent version.


liamsun2019 commented on May 1, 2024

BTW, the following line
center = ret[head]

is better rewritten as:
center = ret[head].clone()

to avoid it being overwritten. This does not influence the experiment results.


peterjc123 commented on May 1, 2024

@liamsun2019

I added some logic in forward to implement this requirement and the training goes well, but the conversion to tflite fails with the following error message:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/functional_modules.py", line 160, in mul
r = ops.quantized.mul(x, y, scale=self.scale, zero_point=self.zero_point)
RuntimeError: Mul operands should have same data type.

This is because the graph rewriter for quantization doesn't properly handle type casting functions like .float(). At this point, you may rewrite it yourself.

The diff to the model that I made to get it working is shown below.

317c317,318
<         mul_1 = self.float_functional_simple_13.mul(hm_3, float_1)
---
>         fake_dequant_0 = self.fake_dequant_0(hm_3)
>         mul_1 = fake_dequant_0 * float_1
340,341c341,342
<         fake_dequant_0 = self.fake_dequant_0(hm_3)
<         fake_dequant_1 = self.fake_dequant_1(mul_1)
---
>         # fake_dequant_1 = self.fake_dequant_1(mul_1)
>         fake_dequant_1 = mul_1
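
In plain Python, the rewritten part of the generated forward() reads roughly as follows (hm_3, float_1, fake_dequant_0 and float_functional_simple_13 are names taken from the auto-generated model code in the diff above):

# before (fails): quantized multiply between operands of different dtypes
# mul_1 = self.float_functional_simple_13.mul(hm_3, float_1)

# after: dequantize the heatmap first, then multiply in floating point
fake_dequant_0 = self.fake_dequant_0(hm_3)
mul_1 = fake_dequant_0 * float_1

# ... and the later dequant of mul_1 becomes a pass-through, since mul_1 is already float
# fake_dequant_1 = self.fake_dequant_1(mul_1)
fake_dequant_1 = mul_1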

If I change the code as follows: center = ret[head]; center_max = torch.sigmoid(center); center_max = self.maxpool(center_max); ret['hm_hmax'] = center_max

Another error comes up: assert tensor.q_zero_point() == 128, "As for symmetric quantization, " AssertionError: As for symmetric quantization, the zero point of the u8 tensors should be 128. This could happen if you didn't train the model after QAT preparation. Attached the script. movenet_qat.zip

The error message is legit. For symmetric QAT, you must train the model for at least one iteration or invoke the forward function of the model at least once, otherwise the zero point will remain zero, which leads to this error.

import torch
from tinynn.graph.quantization.quantizer import QATQuantizer

from movenet_qat import MoveNet_qat
model = MoveNet_qat()

dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)

# QAT prep
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'rewrite_graph': False, ...})
qat_model = quantizer.quantize()

# Invoke once
qat_model(dummy_input) 

# Conversion goes here


liamsun2019 commented on May 1, 2024

Big Thanks. I'll try it out later and let you know the results.


liamsun2019 commented on May 1, 2024

One more question: is there a simple way for forward to output different tensors under different conditions? For instance, I need o1 and o2 for training, while o3 and o4 are for inference. In the stage of conversion to tflite, I just want o3 and o4 to be output. Do I have to manually edit the .py file to achieve this?


liamsun2019 commented on May 1, 2024

The error message is legit. For symmetric QAT, you must train the model for at least one iteration or invoke the forward function of the model at least once, otherwise the zero point will remain zero, which leads to this error.

==> Based on my experiments, this error only arises after I add some extra outputs to the forward operation. If I remove these extra outputs, the error disappears. I set the max epoch to 2 to conduct the experiments.


liamsun2019 commented on May 1, 2024

My script attached
movenet_qat.zip


liamsun2019 commented on May 1, 2024

My simple guess is that, since the added outputs do not take part in training, the tracer (just a guess) might not trace them correctly.


liamsun2019 commented on May 1, 2024

Any updates?
^_^


peterjc123 commented on May 1, 2024

Any updates? ^_^

Sorry for the late reply, we were working on something else.

The error message is legit. For symmetric QAT, you must train the model for at least one iteration or invoke the forward function of the model at least once, otherwise the zero point will remain zero, which leads to this error.

==> Based on my experiments, this error only arises after I add some extra outputs to the forward operation. If I remove these extra outputs, the error disappears. I set the max epoch to 2 to conduct the experiments.

As can be seen in
https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/observer.py#L272-L273 and https://github.com/pytorch/pytorch/blob/402f2934bf380964a403d2e139ec529d1f5bac0e/torch/ao/quantization/utils.py#L148-L176, if you don't run inference once, the min and max values of the observers will remain -inf and inf, so the scale and the zero point will be set to 1 and 0 accordingly, which leads to the failed asserts in the converter. There's nothing more I can say without the details of your experiment.
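
A quick way to check this (just a sketch, reusing qat_model and dummy_input from the snippet above) is to run one forward pass and then print the qparams reported by the FakeQuantize modules; before any data has passed through, they still report the default scale=1.0 and zero_point=0 that the converter rejects:

import torch

qat_model(dummy_input)  # at least one forward pass so the observers record real ranges
for name, mod in qat_model.named_modules():
    if isinstance(mod, torch.quantization.FakeQuantize):
        scale, zero_point = mod.calculate_qparams()
        print(name, scale, zero_point)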

One more question: is there a simple way for forward to output different tensors under different conditions? For instance, I need o1 and o2 for training, while o3 and o4 are for inference. In the stage of conversion to tflite, I just want o3 and o4 to be output. Do I have to manually edit the .py file to achieve this?

The problem is already covered in our FAQ. So the brief answer is yes, because that's how tracing works.
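
To illustrate why (just an illustration, not the FAQ's exact recipe): the code generator traces a single execution of forward(), so a branch on self.training is frozen at trace time and only one set of outputs survives in the generated .py.

import torch.nn as nn

class Example(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Conv2d(3, 4, 1)  # hypothetical head standing in for the real model

    def forward(self, x):
        o1, o2, o3, o4 = self.head(x).split(1, dim=1)
        if self.training:
            return o1, o2   # outputs needed during training
        return o3, o4       # outputs wanted in the exported tflite

Whichever branch runs during tracing is the one that ends up in the generated script, hence the manual edit (or regenerating the script with the model in the desired mode, as done in the next comment).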


liamsun2019 commented on May 1, 2024

Got it, I will try the proposed way in the FAQ. Big thanks for your help.


liamsun2019 commented on May 1, 2024

I followed the method in the FAQ:

  1. Generate the script for inference. (It looks like quantizer.quantize() forces training mode, so I had to hack my code to generate the script.)

  2. QAT train the model and get the qat_last_model.pth

  3. Based on the script for inference, convert to tflite as follows:
    if __name__ == "__main__":
        qat_model = MoveNet_qat()
        qat_model.load_state_dict(torch.load('qat_last_model.pth'), strict=False)

        dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)
        with torch.no_grad():
            qat_model.eval()
            qat_model.cpu()
            qat_model(dummy_input.to('cpu'))
            torch.backends.quantized.engine = 'qnnpack'
            converter = TFLiteConverter(qat_model, dummy_input, tflite_path="test.tflite", asymmetric=False)
            converter.convert()

The tflite file can be generated, but the weights/biases in it have already been converted to float32. I actually need a QAT tflite whose weights/biases are int8/int32. How can I achieve this?


peterjc123 commented on May 1, 2024

@liamsun2019 You need to go through the quantizer again even though your model is already QAT-rewritten (because a QAT-rewritten model is still a float model, not a quantized one).

qat_model = MoveNet_qat()
qat_model.load_state_dict(torch.load('qat_last_model.pth'), strict=False)

dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)
quantizer = QATQuantizer(qat_model, dummy_input, work_dir='out', config={'rewrite_graph': False, ...})
qat_model = quantizer.quantize()


peterjc123 commented on May 1, 2024

See https://github.com/alibaba/TinyNeuralNetwork/blob/main/examples/qat/qat.py#L30 and https://github.com/alibaba/TinyNeuralNetwork/tree/main/examples/qat#the-quantization-process-in-pytorch for more details.


liamsun2019 commented on May 1, 2024

if __name__ == "__main__":
    model = MoveNet_qat()
    model.load_state_dict(torch.load('qat_last_model.pth'), strict=False)
    dummy_input = torch.ones((1, 3, 256, 256), dtype=torch.float32)
    quantizer = QATQuantizer(model, dummy_input, work_dir='./', config={'backend': "qnnpack", 'force_overwrite': False, 'asymmetric': False, 'per_tensor': False, 'rewrite_graph': False})
    qat_model = quantizer.quantize()
    qat_model(dummy_input.to('cpu'))

    with torch.no_grad():
        qat_model.eval()
        qat_model.cpu()
        qat_model = torch.quantization.convert(qat_model)
        torch.backends.quantized.engine = 'qnnpack'
        converter = TFLiteConverter(qat_model, dummy_input, tflite_path="ohyeah.tflite", asymmetric=False)
        converter.convert()

The following error is raised:
assert tensor.q_zero_point() == 128, "As for symmetric quantization, "
AssertionError: As for symmetric quantization, the zero point of the u8 tensors should be 128. This could happen if you didn't train the model after QAT preparation.

I do not need any more training, but it looks like I have to. Any suggestions?


peterjc123 commented on May 1, 2024

@liamsun2019 Would you please share the code of the class MoveNet_qat?


liamsun2019 commented on May 1, 2024

Sure, FYR
test.zip


peterjc123 commented on May 1, 2024

@liamsun2019 I can reproduce locally. Looking into it now.


peterjc123 commented on May 1, 2024

It seems the problem is with torch.sigmoid. A tensor with the qscheme of per tensor and symmetric will become per tensor and affine after running through this op, so we need to insert a (re)quantize op after it.
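
A small eager-mode sketch of that behaviour (the input qparams here are made up; the point is only that the output's zero point is no longer 128):

import torch

x = torch.quantize_per_tensor(torch.randn(1, 4), scale=0.1, zero_point=128, dtype=torch.quint8)
y = torch.sigmoid(x)        # quantized sigmoid picks its own fixed output qparams
print(x.q_zero_point())     # 128, i.e. "symmetric" for u8
print(y.q_zero_point())     # not 128 anymore, so the tensor is effectively affine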


liamsun2019 commented on May 1, 2024

A tensor with the qscheme of per tensor and symmetric will become per tensor and affine after running through this op, so we need to insert a (re)quantize op after it.
==> I apply the qscheme of per channel and symmetric, instead of per tensor. Does this issue also exist for the qscheme of per channel and symmetric?


peterjc123 commented on May 1, 2024

Just wondering what your target platform is. Taking NNAPI as an example, it supports the common qschemes:

  1. ANEURALNETWORKS_TENSOR_QUANT8_ASYMM (uint8)
  2. ANEURALNETWORKS_TENSOR_QUANT8_ASYMM_SIGNED (int8)
  3. ANEURALNETWORKS_TENSOR_QUANT8_SYMM (int8 with zero point=0)
  4. ANEURALNETWORKS_TENSOR_QUANT8_SYMM_PER_CHANNEL (int8 with zero point=0)

For ops that support per-channel (e.g. Conv2D), you should use (4) for the weight and (2) or (3) for the input. As for other ops, you use (2) or (3). But I don't think the support for (2) is broad enough; usually there is only support for (3).


peterjc123 commented on May 1, 2024

A tensor with the qscheme of per tensor and symmetric will become per tensor and affine after running through this op, so we need to insert a (re)quantize op after it. ==> I apply the qscheme of per channel and symmetric, instead of per tensor. Does this issue also exist for the qscheme of per channel and symmetric?

The activations always use the per-tensor qscheme. That's why I asked the previous question. If your target platform has support for (2), then we may just lift the limitation. But if it only supports (3), then we need to insert the requantize nodes during QAT graph rewriting.


liamsun2019 commented on May 1, 2024

For per-channel QAT, our target platform supports asymmetric_affine int8 for activations, and the weights support perchannel_symmetric_affine int8.


peterjc123 commented on May 1, 2024

For per-channel QAT, our target platform supports asymmetric_affine int8 for activations, and the weights support perchannel_symmetric_affine int8.

OK, we will work on it.


peterjc123 commented on May 1, 2024

@liamsun2019 I've uploaded the related changes. You may have to use the following line instead for defining the converter object.

converter = TFLiteConverter(qat_model, dummy_input, tflite_path="ohyeah.tflite", asymmetric=True, quantize_target_type='int8')


liamsun2019 commented on May 1, 2024

OK. Will update and try it out.


liamsun2019 commented on May 1, 2024

I tried the recent version and can get the outputs for inference. Thanks a lot.


peterjc123 commented on May 1, 2024

@liamsun2019 Looks like this issue is resolved. I'll close it. Please feel free to open a new issue when you encounter new problems. Again, thanks for supporting our project.


peterjc123 commented on May 1, 2024

@liamsun2019 FYI, we've decoupled asymmetric and per_tensor in the quantizer so you are now free to do asymmetric per-channel quantization. Please read here for more details.


liamsun2019 commented on May 1, 2024

My understanding is that you support asym per-channel QAT now, right?


peterjc123 commented on May 1, 2024

My understanding is that you support asym per-channel QAT now, right?

Yes. But actually this should be:

  • OPs that support per-channel, weight: symmetric, int8, per-channel
  • OPs that support per-channel, activation: asymmetric, int8, per-tensor
  • Other OPs, weight: symmetric, int8, per-tensor
  • Other OPs, activation: asymmetric, int8, per-tensor

for config={'asymmetric': True, 'per_tensor': False}.

Previously, we had:

  • OPs that support per-channel, weight: symmetric, int8, per-channel
  • OPs that support per-channel, activation: symmetric, int8, per-tensor
  • Other OPs, weight: symmetric, int8, per-tensor
  • Other OPs, activation: symmetric, int8, per-tensor

for config={'asymmetric': False, 'per_tensor': False}.


liamsun2019 commented on May 1, 2024

I just tried with config:

quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': True, 'per_tensor': False, 'rewrite_graph': True})

and converter:
converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite', asymmetric=True)

The following error is output:
assert tensor.q_zero_point() == asym_s8_offset, "As for asymmetric quantization, "
RuntimeError: Expected quantizer->qscheme() == kPerTensorAffine to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

Given the above message, it looks like some pytorch ops do not support asym per-channel. As you mentioned before, all the activations only support per-tensor QAT, so probably the asym per-channel scheme cannot be applied to my case.


peterjc123 commented on May 1, 2024

@liamsun2019 Please try the following configuration.

quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': True, 'per_tensor': False, 'rewrite_graph': True})

converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite', asymmetric=True, quantize_target_type='int8')


liamsun2019 commented on May 1, 2024

Yes, it works this way and the resulting QAT tflite seems to always hold weights in the int8 data type.


peterjc123 commented on May 1, 2024

@liamsun2019 As far as I know, a model with u8 weights or inputs doesn't support per-channel quantization.


liamsun2019 commented on May 1, 2024

I got it. But based on the recent version, I encountered the following error when doing symmetric per-channel QAT, which was fine with the old version:

WARNING (tinynn.converter.base) Symmetric quantized model with uint8 is unsupported in most backends of TFLite
Traceback (most recent call last):
File "main.py", line 243, in
main_quant(opt)
File "main.py", line 234, in main_quant
converter.convert()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/base.py", line 285, in convert
self.init_operations()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/base.py", line 250, in init_operations
converter.parse(node, attrs, args, self.common_graph)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/operators/torch/quantized.py", line 116, in parse
self.parse_common(graph_converter)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/operators/torch/quantized.py", line 74, in parse_common
weight_tensor = self.create_attr_tensor(weight)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/operators/torch/base.py", line 198, in create_attr_tensor
return tfl.Tensor(tensor, name, has_buffer=True, asymmetric=self.asymmetric, q_type=self.q_type)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+5fe672e315215dc3914d264d5f7756d783b5addb-py3.6.egg/tinynn/converter/operators/tflite/base.py", line 198, in init
asym_s8_offset = tensor.q_zero_point()
RuntimeError: Expected quantizer->qscheme() == kPerTensorAffine to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

I wonder if something has regressed in the latest code. My config is:

quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "qnnpack", 'force_overwrite': True, 'asymmetric': False, 'per_tensor': False, 'rewrite_graph': True})

converter = TFLiteConverter(qat_model, dummy_input, tflite_path='out' + '/qat_model.tflite', asymmetric=False)

I wonder whether the usage has changed. The above config works well with the old version.


liamsun2019 commented on May 1, 2024

One thing to point out: I used exactly the same outputs for training and inference in the above experiment.


liamsun2019 commented on May 1, 2024

Looks like I need to set quantize_target_type='int8' explicitly, which was not required before.


peterjc123 commented on May 1, 2024

Looks like I need to set quantize_target_type='int8' explicitly, which was not required before.

Yes, the changes in 9b656ce are not backward compatible. You have to do it now when you use per-channel quantization.


liamsun2019 commented on May 1, 2024

Thanks. Another question that may not be related to this issue: I notice that 'Dequantize' nodes exist in some QAT tflite models, while the tflite model converted via tinynn does not have them.

(two model graph screenshots attached)

I just wonder how this can happen?


peterjc123 commented on May 1, 2024

@liamsun2019 Just curious how you got the model? It seems that you used onnx2tf and then the TFLiteConverter from official TF to get the first model.
As for the first model, you converted it via dynamic range quantization, in which it tries to quantize all weights and biases. But since Conv2D is not an op that supports this kind of inference (it's called Hybrid kernels internally), they will be converted back to floating point. That's why you see the Dequantize nodes in the graph. So it only reduces the size of the model.
For the second model, it's converted via quantization-aware training. As you can see, the weights and biases are quantized, so they will actually go through the quantized kernels, and you are likely to see a speedup in model inference.
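
For reference, dynamic range quantization with the official TFLite converter looks roughly like this (only a sketch; 'saved_model_dir' is a hypothetical path to the model exported by onnx2tf):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # weight-only / dynamic range quantization
tflite_model = converter.convert()
with open('dynamic_range.tflite', 'wb') as f:
    f.write(tflite_model)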


liamsun2019 commented on May 1, 2024

Big thanks for your detailed illustration. I made a mistake when introducing the 2 graphs. In fact, the 1st graph comes from a tflite model that's not in QAT representation, but I mistook it for a QAT one.

