Comments (19)

peterjc123 commented on May 2, 2024

@liamsun2019 As we all know, quantization is not lossless, so I think it's pointless to perform such a comparison. There will certainly be some differences between the results of the quantized kernels and those of the floating-point kernels. Take the subgraph in the following picture as an example.
[image: subgraph with an add operator whose two input tensors have very different scale values]
This is a common problem of imbalanced scale values between the two operands of the add operator. Ideally, they should be close; otherwise, the operand with the smaller scale value will be ignored to some extent. Since symmetric quantization is applied here, I suggest you try asymmetric quantization: it includes offset (zero-point) values, so it can handle biased distributions better.
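To make that concrete, here is a small framework-agnostic sketch, with made-up numbers and a hypothetical helper, showing why the offset helps on a biased distribution:

import numpy as np

def fake_quant(x, scale, zero_point, qmin, qmax):
    # affine quantize, clamp to the integer range, then dequantize
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# a biased (all-positive) distribution, e.g. post-ReLU activations
x = np.random.uniform(0.0, 6.0, size=10000)

# symmetric int8: zero_point is fixed at 0, so the scale must cover max(|x|)
sym_scale = np.abs(x).max() / 127
x_sym = fake_quant(x, sym_scale, 0, -127, 127)

# asymmetric int8: the zero_point absorbs the bias, so the scale only
# needs to cover the actual [min, max] range, halving the rounding step here
asym_scale = (x.max() - x.min()) / 255
asym_zp = int(np.round(-x.min() / asym_scale)) - 128
x_asym = fake_quant(x, asym_scale, asym_zp, -128, 127)

print('symmetric  MSE:', np.mean((x - x_sym) ** 2))
print('asymmetric MSE:', np.mean((x - x_asym) ** 2))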

liamsun2019 commented on May 2, 2024

Actually, I did train an asymmetric per-channel QAT network based on the same source model, but the resulting tflite model has exactly the same weights/bias as the symmetric per-channel one.

peterjc123 commented on May 2, 2024

@liamsun2019 Are you sure you used the following config for the quantizer?

quantizer = QATQuantizer(model, dummy_input, config={'asymmetric': True, 'per_tensor': False, ...})
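For completeness, a minimal sketch of how that config plugs into the usual TinyNeuralNetwork QAT workflow (the model, input shape, and work_dir below are placeholders):

import torch
import torch.nn as nn
from tinynn.graph.quantization.quantizer import QATQuantizer

class TinyNet(nn.Module):  # placeholder model
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))

model = TinyNet()
dummy_input = torch.randn(1, 3, 224, 224)

# asymmetric (affine), per-channel quantization
quantizer = QATQuantizer(
    model, dummy_input,
    work_dir='out',
    config={'asymmetric': True, 'per_tensor': False},
)
qat_model = quantizer.quantize()
# ... run QAT training on qat_model, then convert to tflite ...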

liamsun2019 commented on May 2, 2024

I will double-check that. The experiments were done a few days ago, so perhaps they were not based on the most recent version.

liamsun2019 commented on May 2, 2024

I just tried an asymmetric per-channel QAT model. It turns out that some ops do have different scale/zero-point values now, for instance the add ops you illustrated. But the inference results are much worse than those of the symmetric per-channel QAT model.

Here is a summary of the scenario: I intend to finetune a pretrained full-precision model that works well, so I freeze most layers and only train a few others. Let A be the tflite model with asymmetric per-channel quantization and B the tflite model with symmetric per-channel quantization.

  1. A and B have the same weights for all layers, while the biases differ.
  2. A and B have different scale/zero-point values for some ops, such as add.
  3. The dequantized onnx models have the same weights/biases and both work well.
  4. A has much worse inference results than B.

Items 1, 2, and 3 are expected and make sense, but item 4 looks abnormal. I would expect A to be better. Any suggestions?
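One way to verify items 1 and 2 concretely is to dump the quantization parameters of both tflite files with the TensorFlow Lite interpreter; a sketch (the file names are placeholders):

import tensorflow as tf

def dump_qparams(path):
    # print the scale/zero-point of every quantized tensor in a tflite file
    interp = tf.lite.Interpreter(model_path=path)
    interp.allocate_tensors()
    for t in interp.get_tensor_details():
        qp = t['quantization_parameters']
        if len(qp['scales']) > 0:
            print(t['name'], qp['scales'][:4], qp['zero_points'][:4])

dump_qparams('model_a_asymmetric.tflite')  # model A
dump_qparams('model_b_symmetric.tflite')   # model B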

peterjc123 commented on May 2, 2024

@liamsun2019 One thing in your model is weird. I've actually set quant_min and quant_max to -127 and 127, but you can still see -128 in the weights.

liamsun2019 commented on May 2, 2024

Yes, I also found -128 in some weights. I noticed that you do set quant_min=-127 and quant_max=127 in quantizer.py; my understanding is that this is to avoid the risk of overflow. So what could be the cause here?

peterjc123 commented on May 2, 2024

It looks like we cannot set quant_min and quant_max this way; the observer has its own logic for re-calculating them: https://github.com/pytorch/pytorch/blob/4a8d4cde6589178e989db89d576108ba6d3e6e9a/torch/ao/quantization/utils.py#L192
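For context, this is roughly how a custom quantization range is requested through torch.ao.quantization; whether the observer honors it is exactly what the linked logic decides (a sketch, not TinyNeuralNetwork's actual code):

import torch
from torch.ao.quantization import FakeQuantize, MovingAveragePerChannelMinMaxObserver

# request a restricted [-127, 127] range for per-channel weight fake-quant
weight_fq = FakeQuantize.with_args(
    observer=MovingAveragePerChannelMinMaxObserver,
    quant_min=-127,
    quant_max=127,
    dtype=torch.qint8,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
)()

w = torch.randn(8, 16)
weight_fq(w)  # observe and fake-quantize once
print(weight_fq.calculate_qparams())  # the scale/zero-point actually used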

peterjc123 commented on May 2, 2024

With e35ef92, the generated weights are within the range [-127, 127]. Would you please try again? BTW, I'm just curious how the model performs during QAT training.

peterjc123 commented on May 2, 2024

If things still don't work out with that patch, you may try bisecting the model, which should be fairly easy since you have the model description script there.

liamsun2019 commented on May 2, 2024

I'm working on it. Meanwhile, what do you mean by 'bisecting'? Could you please explain it in more detail?

peterjc123 commented on May 2, 2024

Suppose you have the following model description file. You may return the intermediate tensors (e.g. a or b) instead of the original output, so that you can figure out which part of the model is not working.

class Model(nn.Module):
    def forward(self, x):
        a = self.a(x)
        b = self.b(a)
        # ... intermediate layers elided ...
        z = self.z(...)
        return z  # return a or b here instead to see where things go wrong

liamsun2019 commented on May 2, 2024

My experiment shows no more -128 weights for the asymmetric per-channel case; not sure if this is what you meant by "work out with that patch". As in my earlier experiments, asymmetric per-channel still produces much worse inference results than symmetric per-channel.

peterjc123 commented on May 2, 2024

My experiment shows no more -128 weights for the asymmetric per-channel case; not sure if this is what you meant by "work out with that patch". As in my earlier experiments, asymmetric per-channel still produces much worse inference results than symmetric per-channel.

See #25 (comment). You may try bisecting to figure out which layer leads to accuracy loss.

liamsun2019 commented on May 2, 2024

OK. Just to confirm: should 'bisecting' be done against the original model or the quantized model (produced by QATQuantizer.quantize)?

peterjc123 commented on May 2, 2024

You can do it directly on the trained quantized model; just load the state dict with strict=False and it will be fine.
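Concretely, something like the following, assuming Model is the class from the generated description script and the checkpoint path is a placeholder:

import torch

# rebuild the quantized model from the generated description script,
# with forward() edited to return an intermediate tensor (e.g. a or b)
model = Model()

ckpt = torch.load('qat_checkpoint.pth', map_location='cpu')
# strict=False skips the parameters of layers after the new return point
model.load_state_dict(ckpt, strict=False)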

liamsun2019 commented on May 2, 2024

Before conducting the experiments, I have a few more questions, since I am still not very clear about the point you mentioned:

Suppose you have the following model description file. You may return the intermediate tensors (e.g. a or b) instead of the original output, so that you can figure out which part of the model is not working.

class Model(nn.Module):
    def forward(self, x):
        a = self.a(x)
        b = self.b(a)
        # ... intermediate layers elided ...
        z = self.z(...)
        return z  # return a or b here instead to see where things go wrong

My current dilemma is that the int8 per-channel QAT tflite model gives bad inference results compared to the dequantized onnx model, and the u8 per-channel QAT model is even worse. Is the above sample meant for debugging which parts of the model contribute most of the quantization accuracy loss?
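For reference, such a float-vs-quantized comparison might look like the following sketch (file names and input shape are placeholders; the tflite model may expect NHWC layout):

import numpy as np
import onnxruntime as ort
import tensorflow as tf

x = np.random.randn(1, 3, 224, 224).astype(np.float32)

# float reference from the dequantized onnx model
sess = ort.InferenceSession('model.onnx')
ref = sess.run(None, {sess.get_inputs()[0].name: x})[0]

# quantized tflite model; transpose x to NHWC first if required
interp = tf.lite.Interpreter(model_path='model_q.tflite')
interp.allocate_tensors()
interp.set_tensor(interp.get_input_details()[0]['index'], x)
interp.invoke()
out = interp.get_tensor(interp.get_output_details()[0]['index'])

print('max abs diff:', np.abs(ref - out).max())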

peterjc123 commented on May 2, 2024

@liamsun2019 Do you have a DingTalk account, so that you can join our discussion group? This thread will grow too lengthy if we keep answering your questions here.

liamsun2019 commented on May 2, 2024

Sure, I will get that done.
