Comments (24)

dinghuanghao commented on May 2, 2024

For example, in a BN layer, even if requires_grad is False, the internal statistics (running_mean, running_var) will still be updated. You must also set layer.eval(). You can add this and try again.
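
A minimal sketch of what that could look like (the helper function here is an illustration, not part of tinynn):

import torch.nn as nn

def freeze_module(module: nn.Module):
    # Stop gradient updates for the weights/biases.
    for p in module.parameters():
        p.requires_grad = False
    # Also stop running_mean/running_var updates, which BatchNorm performs
    # in forward() regardless of requires_grad.
    for m in module.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()

Note that model.train() switches every submodule back to training mode, so this would need to be re-applied after each call to model.train().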

liamsun2019 commented on May 2, 2024

Not yet. I can try that later.

liamsun2019 commented on May 2, 2024

Actually, I tried the approach you just mentioned, but the situation remained unchanged, i.e. the weights/biases are different after dequantization. Is there an alternative way to achieve that? Thanks.

dinghuanghao commented on May 2, 2024

After calling our quantizer, code for a quantized model will be generated. You need to move the operators that do not need to be quantized outside of QuantStub() and DeQuantStub().

dinghuanghao commented on May 2, 2024

-----------before-----------

def forward(self, input_1):
    fake_quant_0 = self.fake_quant_0(input_1)        # quantize
    model_0_0 = self.model_0_0(fake_quant_0)         # qat
    model_0_1 = self.model_0_1(model_0_0)            # qat
    model_0_2 = self.model_0_2(model_0_1)            # qat
    model_1_0 = self.model_1_0(model_0_2)            # qat
    fake_dequant_0 = self.fake_dequant_0(model_1_0)  # dequantize

-----------after-----------

def forward(self, input_1):
    model_0_0 = self.model_0_0(input_1)              # float32
    fake_quant_0 = self.fake_quant_0(model_0_0)      # quantize
    model_0_1 = self.model_0_1(fake_quant_0)         # qat
    model_0_2 = self.model_0_2(model_0_1)            # qat
    model_1_0 = self.model_1_0(model_0_2)            # qat
    fake_dequant_0 = self.fake_dequant_0(model_1_0)  # dequantize

liamsun2019 commented on May 2, 2024

Thanks for your suggestions. Based on your sample, my understanding is that, except for the layers that need to be fine-tuned, the other layers remain non-QAT, i.e. full-precision/float32. The generated model will then be a hybrid QAT one, right?

liamsun2019 commented on May 2, 2024

I modified the description file of the quantized model in the way you suggested. Training goes normally, but the final conversion to TFLite fails with the following errors:

converter.convert()

File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+858a132143b7f4c6f3e1bfc95084a29e37ea8ef7-py3.6.egg/tinynn/converter/base.py", line 260, in convert
self.init_jit_graph()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+858a132143b7f4c6f3e1bfc95084a29e37ea8ef7-py3.6.egg/tinynn/converter/base.py", line 79, in init_jit_graph
script = torch.jit.trace(self.model, self.dummy_input)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 742, in trace
_module_class,
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 940, in trace_module
_force_outplace,
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 725, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 709, in _slow_forward
result = self.forward(*input, **kwargs)
File "out/movenet_qat.py", line 182, in forward
backbone_body_0_1 = self.backbone_body_0_1(backbone_body_0_0)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 725, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 709, in _slow_forward
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/modules/conv.py", line 332, in forward
input, self._packed_params, self.scale, self.zero_point)
RuntimeError: Could not run 'quantized::conv2d.new' with arguments from the 'CPU' backend. 'quantized::conv2d.new' is only available for these backends: [QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].
movenet_qat.zip

dinghuanghao commented on May 2, 2024

When you manually modify the generated model, you need to set rewrite_graph to False when initializing the quantizer:

quantizer = QATQuantizer(model, dummy_input, config={'rewrite_graph': False}, work_dir='out')

This will use your manually modified mixed precision model instead of the fully quantized one (we will automate this check in the future).
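
A rough sketch of how that might fit together after the manual edit (the generated file/class names and the input shape below are placeholders, not the exact ones in your project):

import torch
from tinynn.graph.quantization.quantizer import QATQuantizer
from out.movenet_qat import QMoveNet  # placeholder: the class in the generated file you edited

model = QMoveNet()
dummy_input = torch.randn(1, 3, 224, 224)  # placeholder shape; use your model's real input size

# rewrite_graph=False keeps the hand-edited graph instead of regenerating it
quantizer = QATQuantizer(model, dummy_input, config={'rewrite_graph': False}, work_dir='out')
qat_model = quantizer.quantize()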

dinghuanghao commented on May 2, 2024

Yes, operators outside the range of FakeQuant and FakeDequant will be float32, and a mixed precision model will be generated after conversion.

liamsun2019 commented on May 2, 2024

I set rewrite_graph=False after my modifications, and the following error comes up during conversion:
File "main.py", line 227, in main_quant
converter.convert()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+858a132143b7f4c6f3e1bfc95084a29e37ea8ef7-py3.6.egg/tinynn/converter/base.py", line 260, in convert
self.init_jit_graph()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+858a132143b7f4c6f3e1bfc95084a29e37ea8ef7-py3.6.egg/tinynn/converter/base.py", line 79, in init_jit_graph
script = torch.jit.trace(self.model, self.dummy_input)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 742, in trace
_module_class,
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 940, in trace_module
_force_outplace,
RuntimeError: Encountering a dict at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for list, use a tuple instead. for dict, use a NamedTuple instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.

The original model (before being quantized) does return a dictionary, but the above error did not appear before my modifications to the quantized model description. I have no idea what the root cause is.
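
If the dict output itself is not essential during conversion, one workaround (a sketch; this wrapper is not part of tinynn) is to wrap the model so the traced output is a plain tuple with a fixed key order:

import torch.nn as nn

class TupleOutputWrapper(nn.Module):
    # Converts a dict-returning model into a tuple-returning one so that
    # torch.jit.trace sees a constant container structure.
    def __init__(self, model: nn.Module, keys):
        super().__init__()
        self.model = model
        self.keys = list(keys)

    def forward(self, x):
        out = self.model(x)
        return tuple(out[k] for k in self.keys)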

dinghuanghao commented on May 2, 2024

Tomorrow I will upload a demo of mixed precision quantization :)

dinghuanghao commented on May 2, 2024

The mixed precision QAT example has been added; please update to the latest tinynn.

liamsun2019 commented on May 2, 2024

Yes, I just updated and will try it out.

liamsun2019 commented on May 2, 2024

I modified my code accordingly, but the same error still exists:
File "main.py", line 227, in main_quant
converter.convert()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+a60ddb1432bfd81eebd2d8d232e3a67bb6d2c434-py3.6.egg/tinynn/converter/base.py", line 260, in convert
self.init_jit_graph()
File "/usr/local/lib/python3.6/dist-packages/TinyNeuralNetwork-0.1.0+a60ddb1432bfd81eebd2d8d232e3a67bb6d2c434-py3.6.egg/tinynn/converter/base.py", line 79, in init_jit_graph
script = torch.jit.trace(self.model, self.dummy_input)
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 742, in trace
_module_class,
File "/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py", line 940, in trace_module
_force_outplace,
RuntimeError: Encountering a dict at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for list, use a tuple instead. for dict, use a NamedTuple instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.

code snippet:
fake_quant_0 = self.fake_quant_0(backbone_fpn_layer_blocks_0_conv_2)
#hm_hp_0 = self.hm_hp_0(backbone_fpn_layer_blocks_0_conv_2)
hm_hp_0 = self.hm_hp_0(fake_quant_0)
hm_hp_1 = self.hm_hp_1(hm_hp_0)
hm_hp_2 = self.hm_hp_2(hm_hp_1)
hm_hp_3 = self.hm_hp_3(hm_hp_2)
fake_dequant_0 = self.fake_dequant_0(hm_hp_3)

movenet_qat.zip

dinghuanghao commented on May 2, 2024

Please give me the original generated model file.

liamsun2019 commented on May 2, 2024

FYR
test.zip

dinghuanghao commented on May 2, 2024

Can you run the example correctly?

liamsun2019 commented on May 2, 2024

As for the example, I have not tested it yet since I don't have the CIFAR-10 dataset at the moment (I can download it anyway). I don't see any fatal issues in my attached .py script either, so I'm still unsure what the possible reason for the conversion error is.

dinghuanghao commented on May 2, 2024

If you don't have the CIFAR-10 dataset, just use dummy_input to run inference once.
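
For example, a single forward pass with the dummy input (assuming qat_model and dummy_input are the same objects later handed to the converter) should be enough:

import torch

with torch.no_grad():
    qat_model(dummy_input)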

liamsun2019 commented on May 2, 2024

I ran the mixed QAT example after some minor modifications (path, batch size, etc.). Training goes well and the conversion is successful. Could you help point out any possibly buggy code in my attached file? I guess the big difference is that my model returns an output containing both QAT and non-QAT tensors, while the model in your example returns only the QAT output.

dinghuanghao commented on May 2, 2024

ok

dinghuanghao commented on May 2, 2024

This is a bug caused by the graph tracer; we will fix it soon.

liamsun2019 commented on May 2, 2024

Thanks for the quick feedback; looking forward to your fix.

peterjc123 commented on May 2, 2024

@liamsun2019 A fix has been uploaded to remedy that. BTW, I updated the model description script for you, see movenet_qat.py.zip. For arithmetic operations that happen in the non-quantized part of the computation graph, it is important to rewrite them back from the torch.nn.quantized.FloatFunctional().foo variants to the torch.foo ones.
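
For illustration, such a rewrite could look roughly like this (the tensor and module names are hypothetical):

-----------before-----------

    add_0 = self.float_functional_simple_0.add(hm_hp_2, hm_hp_0)  # FloatFunctional add (quantized-style)

-----------after-----------

    add_0 = torch.add(hm_hp_2, hm_hp_0)  # plain torch op for the float32 part of the graph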
