Comments (6)
You can switch to per-channel QAT by changing the backend to fbgemm, but the current model converter does not support converting per-channel quantized models to TFLite.
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "fbgemm"})
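For reference, a minimal sketch of the full QAT-to-TFLite flow (based on the example scripts in the repository; the import paths match the tracebacks below, but exact entry points may differ between versions, and the training loop is elided):

import torch
from tinynn.graph.quantization.quantizer import QATQuantizer
from tinynn.converter import TFLiteConverter

# Rewrite the model with fake-quantize nodes; 'fbgemm' selects per-channel weights.
quantizer = QATQuantizer(model, dummy_input, work_dir='out', config={'backend': "fbgemm"})
qat_model = quantizer.quantize()

# ... run QAT fine-tuning on qat_model here ...

with torch.no_grad():
    qat_model.eval()
    qat_model.cpu()
    # Fold fake-quantize nodes into real quantized modules.
    qat_model = torch.quantization.convert(qat_model)
    torch.backends.quantized.engine = quantizer.backend
    TFLiteConverter(qat_model, dummy_input, tflite_path='out/qat_model.tflite').convert()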
I tried 3 approaches to achieve per-channel QAT:
- backend: qnnpack
Minor modifications to the following code lines:
if not self.asymmetric:
    sym_fq = torch_q.FakeQuantize.with_args(
        observer=torch_q.MovingAverageMinMaxObserver,
        quant_min=-128, quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_channel_symmetric,
        reduce_range=False)
Errors arise during conversion:
File "out/movenet_qat.py", line 158, in forward
backbone_body_0_1 = self.backbone_body_0_1(backbone_body_0_0)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 725, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 709, in _slow_forward
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/conv.py", line 332, in forward
input, self._packed_params, self.scale, self.zero_point)
RuntimeError: expected scalar type QUInt8 but found QInt8
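The failure point here is the activation dtype: PyTorch's quantized kernels only accept quint8 activations, and the modified fake-quantize config produces qint8 ones. A minimal sketch that should reproduce the same class of error (hypothetical shapes, unrelated to the model above):

import torch

# Quantized conv kernels expect quint8 activations; a qint8 input is rejected.
x = torch.quantize_per_tensor(torch.randn(1, 3, 8, 8), 0.1, 0, torch.qint8)
conv = torch.nn.quantized.Conv2d(3, 3, kernel_size=1)
conv(x)  # RuntimeError: expected scalar type QUInt8 but found QInt8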
- backend: qnnpack
if not self.asymmetric:
    sym_fq = torch_q.FakeQuantize.with_args(
        observer=torch_q.MovingAverageMinMaxObserver,
        quant_min=0, quant_max=255,
        dtype=torch.quint8,
        qscheme=torch.per_channel_symmetric,
        reduce_range=False)
A different error is reported:
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/base.py", line 264, in convert
self.init_operations()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/base.py", line 231, in init_operations
converter.parse(node, attrs, args, self.common_graph)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/operators/torch/aten.py", line 479, in parse
self.elementwise_unary(tfl.QuantizeOperator, graph_converter)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/operators/torch/base.py", line 244, in elementwise_unary
outputs = self.to_tfl_tensors(self.output_names, self.output_tensors)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/operators/torch/base.py", line 155, in to_tfl_tensors
t = tfl.Tensor(t, n, has_buffer=non_existent_as_buffer, asymmetric=self.asymmetric)
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+3135b58d66119d4580663dcaad444b7809afeaab-py3.6.egg/tinynn/converter/operators/tflite/base.py", line 166, in init
assert tensor.q_zero_point() == 128
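This assert reflects the converter's symmetric-uint8 convention: with asymmetric=False it expects every quint8 activation tensor to have its zero point at the midpoint of the range, and the zero point recorded under the modified qscheme evidently differs. A sketch of the convention, with illustrative values:

import torch

# Symmetric quantization mapped onto uint8 centers the zero point at 128:
#   q = round(x / scale) + 128, with q in [0, 255]
x = torch.randn(16)
scale = float(x.abs().max() / 127)
q = torch.quantize_per_tensor(x, scale, 128, torch.quint8)
assert q.q_zero_point() == 128  # the check the converter performs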
- backend: fbgemm
It fails with the following error:
File "main.py", line 196, in main_quant
qat_model = torch.quantization.convert(qat_model)
File "/usr/local/lib/python3.6/dist-packages/torch/quantization/quantize.py", line 414, in convert
_convert(module, mapping, inplace=True)
File "/usr/local/lib/python3.6/dist-packages/torch/quantization/quantize.py", line 459, in _convert
reassign[name] = swap_module(mod, mapping)
File "/usr/local/lib/python3.6/dist-packages/torch/quantization/quantize.py", line 485, in swap_module
new_mod = mapping[type(mod)].from_float(mod)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/conv.py", line 368, in from_float
return cls.get_qconv(mod, activation_post_process, weight_post_process)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/conv.py", line 153, in get_qconv
weight_post_process(mod.weight)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/quantization/fake_quantize.py", line 100, in forward
self.ch_axis, self.quant_min, self.quant_max)
RuntimeError: dimensions of scale and zero-point are not consistent with input tensor
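This dimension error usually means a per-channel qscheme was paired with a per-tensor observer, so scale and zero_point stay scalars while the per-channel fake-quantize op expects one entry per output channel. A hedged sketch of a consistent per-channel weight config, mirroring PyTorch's default_per_channel_weight_fake_quant (whether the converter then accepts the result is a separate question):

import torch
import torch.quantization as torch_q

# A per-channel qscheme needs a per-channel observer, plus ch_axis to pick
# the channel dimension (0 for conv weights).
sym_fq = torch_q.FakeQuantize.with_args(
    observer=torch_q.MovingAveragePerChannelMinMaxObserver,
    quant_min=-128, quant_max=127,
    dtype=torch.qint8,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
    reduce_range=False)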
My questions are:
- Can the errors above be considered known issues?
- Is there a milestone for supporting per-channel QAT conversion to TFLite?
- Is there an alternative way to perform the conversion (QAT PyTorch .pth to TFLite)?
Thanks for your time.
All of the above tests use int8 quantization.
- The fbgemm mode of PyTorch only supports uint8, so the first configuration does not work properly.
- The second configuration can complete model training, but the current model converter does not support it.
- The third configuration can be run in our code, but there may be problems in actual use.
We understand your needs. If no serious problems are encountered, we can support this feature within two weeks :)
Got it. Thanks for your comments. Looking forward to your support for this feature.
@liamsun2019 Per-channel quantization is supported now. Maybe you could give it a try.
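If the feature hooks into the existing config dict, enabling it might look like the sketch below. Note that the 'per_tensor' key is my assumption, not confirmed by this thread; check the repository docs for the actual option name:

from tinynn.graph.quantization.quantizer import QATQuantizer

# Hypothetical config: 'per_tensor': False requests per-channel weight
# quantization (verify the exact key against the current tinynn docs).
quantizer = QATQuantizer(model, dummy_input, work_dir='out',
                         config={'backend': 'qnnpack', 'per_tensor': False})
qat_model = quantizer.quantize()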