Comments (6)
You can switch to per-channel QAT by changing the backend to fbgemm, but the current model converter does not support converting per-channel quantized models to TFLite.
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "fbgemm"})
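For reference, a minimal sketch of the full QAT-to-TFLite flow (based on the example scripts in the repository; the import paths match the tracebacks below, but exact entry points may differ between versions, and the training loop is elided):

import torch
from tinynn.graph.quantization.quantizer import QATQuantizer
from tinynn.converter import TFLiteConverter

# Rewrite the model with fake-quantize nodes; 'fbgemm' selects per-channel weights.
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "fbgemm"})
qat_model = quantizer.quantize()

# ... run QAT fine-tuning on qat_model here ...

with torch.no_grad():
    qat_model.eval()
    qat_model.cpu()
    # Fold fake-quantize nodes into real quantized modules.
    qat_model = torch.quantization.convert(qat_model)
    torch.backends.quantized.engine = quantizer.backend
    TFLiteConverter(qat_model, dummy_input, tflite_path='out/qat_model.tflite').convert()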
I tried 3 approaches to achieve per-channel QAT:
- backend: qnnpack
Minor modifications to the following code lines:
if not self.asymmetric:
    sym_fq = torch_q.FakeQuantize.with_args(
        observer=torch_q.MovingAverageMinMaxObserver,
        quant_min=-128, quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        reduce_range=False)
Errors arise during conversion:
File "out/movenet_qat.py", line 158, in forward
backbone_body_0_1 = self.backbone_body_0_1(backbone_body_0_0)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 725, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 709, in _slow_forward
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/conv.py", line 332, in forward
input, self._packed_params, self.scale, self.zero_point)
RuntimeError: expected scalar type QUInt8 but found QInt8
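The failure point here is the activation dtype: PyTorch's quantized kernels only accept quint8 activations, and the modified fake-quantize config produces qint8 ones. A minimal sketch that should reproduce the same class of error (hypothetical shapes, unrelated to the model above):

import torch

# Quantized conv kernels expect quint8 activations; a qint8 input is rejected.
x = torch.quantize_per_tensor(torch.randn(1, 3, 8, 8), 0.1, 0, torch.qint8)
conv = torch.nn.quantized.Conv2d(3, 3, kernel_size=1)
conv(x)  # RuntimeError: expected scalar type QUInt8 but found QInt8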
- backend: qnnpack
if not self.asymmetric:
    sym_fq = torch_q.FakeQuantize.with_args(
        observer=torch_q.MovingAverageMinMaxObserver,
        quant_min=0, quant_max=255,
        dtype=torch.quint8,
        qscheme=torch.per_channel_symmetric,
        reduce_range=False)
A different error is reported:
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/base.py", line 264, in convert
self.init_operations()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/base.py", line 231, in init_operations
converter.parse(node, attrs, args, self.common_graph)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/operators/torch/aten.py", line 479, in parse
self.elementwise_unary(tfl.QuantizeOperator, graph_converter)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/operators/torch/base.py", line 244, in elementwise_unary
outputs = self.to_tfl_tensors(self.output_names, self.output_tensors)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/operators/torch/base.py", line 155, in to_tfl_tensors
t = tfl.Tensor(t, n, has_buffer=non_existent_as_buffer, asymmetric=self.asymmetric)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/operators/tflite/base.py", line 166, in init
assert tensor.q_zero_point() == 128
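This assert reflects the converter's symmetric-uint8 convention: with asymmetric=False it expects every quint8 activation tensor to have its zero point at the midpoint of the range, and the zero point recorded under the modified qscheme evidently differs. A sketch of the convention, with illustrative values:

import torch

# Symmetric quantization mapped onto uint8 centers the zero point at 128:
#   q = round(x / scale) + 128, with q in [0, 255]
x = torch.randn(16)
scale = float(x.abs().max() / 127)
q = torch.quantize_per_tensor(x, scale, 128, torch.quint8)
assert q.q_zero_point() == 128  # the check the converter performs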
- backend: fbgemm
It fails with the following error:
File "main.py", line 196, in main_quant
qat_model = torch.quantization.convert(qat_model)
File "/usr/local/lib/python3.6/dist-packages/torch/quantization/quantize.py", line 414, in convert
_convert(module, mapping, inplace=True)
File "/usr/local/lib/python3.6/dist-packages/torch/quantization/quantize.py", line 459, in _convert
reassign[name] = swap_module(mod, mapping)
File "/usr/local/lib/python3.6/dist-packages/torch/quantization/quantize.py", line 485, in swap_module
new_mod = mapping[type(mod)].from_float(mod)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/conv.py", line 368, in from_float
return cls.get_qconv(mod, activation_post_process, weight_post_process)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/conv.py", line 153, in get_qconv
weight_post_process(mod.weight)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/quantization/fake_quantize.py", line 100, in forward
self.ch_axis, self.quant_min, self.quant_max)
RuntimeError: dimensions of scale and zero-point are not consistent with input tensor
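This dimension error usually means a per-channel qscheme was paired with a per-tensor observer, so scale and zero_point stay scalars while the per-channel fake-quantize op expects one entry per output channel. A hedged sketch of a consistent per-channel weight config, mirroring PyTorch's default_per_channel_weight_fake_quant (whether the converter then accepts the result is a separate question):

import torch
import torch.quantization as torch_q

# A per-channel qscheme needs a per-channel observer, plus ch_axis to pick
# the channel dimension (0 for conv weights).
sym_fq = torch_q.FakeQuantize.with_args(
    observer=torch_q.MovingAveragePerChannelMinMaxObserver,
    quant_min=-128, quant_max=127,
    dtype=torch.qint8,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
    reduce_range=False)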
My questions are:
- Can the errors above be considered known issues?
- Is there a milestone for supporting per-channel QAT conversion to TFLite?
- Is there an alternative way to perform the conversion (QAT PyTorch .pth to TFLite)?
Thanks for your time.
All of the above tests use int8 quantization.
- The fbgemm mode of PyTorch only supports uint8, so the first configuration does not work properly.
- The second configuration can complete model training, but the current model converter does not support it.
- The third configuration can be run in our code, but there may be problems in actual use.
We understand your needs. If no serious problems are encountered, we can support this feature within two weeks :)
Got it. Thanks for your comments. Looking forward to your support for this feature.
@liamsun2019 Per-channel quantization is supported now. Maybe you could give it a try.
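If the feature hooks into the existing config dict, enabling it might look like the sketch below. Note that the 'per_tensor' key is my assumption, not confirmed by this thread; check the repository docs for the actual option name:

from tinynn.graph.quantization.quantizer import QATQuantizer

# Hypothetical config: 'per_tensor': False requests per-channel weight
# quantization (verify the exact key against the current tinynn docs).
quantizer = QATQuantizer(model, dummy_input, work_dir='out',
                         config={'backend': 'qnnpack', 'per_tensor': False})
qat_model = quantizer.quantize()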