Comments (19)

peterjc123 commented on May 2, 2024

@liamsun2019 As we all know, quantization is not lossless, so I think it's pointless to perform such a comparison. There will certainly be some differences between the results of the quantized kernels and those of the floating-point kernels. Take the subgraph in the following picture as an example.
[image: subgraph with an add operator whose two input tensors have very different scale values]
This is a common problem of imbalanced scale values between the two operands of the add operator. Ideally, they should be close; otherwise, the operand with the smaller scale value will be ignored to some extent. Since symmetric quantization is applied here, I suggest you try asymmetric quantization: it includes offset (zero-point) values, so it can handle biased distributions better.
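To make that concrete, here is a small framework-agnostic sketch, with made-up numbers and a hypothetical helper, showing why the offset helps on a biased distribution:

import numpy as np

def fake_quant(x, scale, zero_point, qmin, qmax):
    # affine quantize, clamp to the integer range, then dequantize
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# a biased (all-positive) distribution, e.g. post-ReLU activations
x = np.random.uniform(0.0, 6.0, size=10000)

# symmetric int8: zero_point is fixed at 0, so the scale must cover max(|x|)
sym_scale = np.abs(x).max() / 127
x_sym = fake_quant(x, sym_scale, 0, -127, 127)

# asymmetric int8: the zero_point absorbs the bias, so the scale only
# needs to cover the actual [min, max] range, halving the rounding step here
asym_scale = (x.max() - x.min()) / 255
asym_zp = int(np.round(-x.min() / asym_scale)) - 128
x_asym = fake_quant(x, asym_scale, asym_zp, -128, 127)

print('symmetric  MSE:', np.mean((x - x_sym) ** 2))
print('asymmetric MSE:', np.mean((x - x_asym) ** 2))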

liamsun2019 commented on May 2, 2024

Actually, I did train an asymmetric per-channel QAT network based on the same source model, but the resulting tflite model has exactly the same weights/bias as the symmetric per-channel one.

peterjc123 commented on May 2, 2024

@liamsun2019 Are you sure you used the following config for the quantizer?

quantizer = QATQuantizer(model, dummy_input, config={'asymmetric': True, 'per_tensor': False, ...})
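For completeness, a minimal sketch of how that config plugs into the usual TinyNeuralNetwork QAT workflow (the model, input shape, and work_dir below are placeholders):

import torch
import torch.nn as nn
from tinynn.graph.quantization.quantizer import QATQuantizer

class TinyNet(nn.Module):  # placeholder model
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))

model = TinyNet()
dummy_input = torch.randn(1, 3, 224, 224)

# asymmetric (affine), per-channel quantization
quantizer = QATQuantizer(
    model, dummy_input,
    work_dir='out',
    config={'asymmetric': True, 'per_tensor': False},
)
qat_model = quantizer.quantize()
# ... run QAT training on qat_model, then convert to tflite ...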

liamsun2019 commented on May 2, 2024

I will double-check that. The experiments were done a few days ago, so perhaps they were not based on the most recent version.

liamsun2019 commented on May 2, 2024

I just tried an asymmetric per-channel QAT model. It turns out that some ops do have different scale/zero-point values now, for instance the add ops you illustrated. But the inference results are much worse than those of the symmetric per-channel QAT model.

Here is a summary of the scenario: I intend to finetune a pretrained full-precision model that works well, so I freeze most layers and only train a few others. Let A be the tflite model with asymmetric per-channel quantization and B the tflite model with symmetric per-channel quantization.

  1. A and B have the same weights for all layers, while the biases differ.
  2. A and B have different scale/zero-point values for some ops, such as add.
  3. The dequantized onnx models have the same weights/biases and both work well.
  4. A has much worse inference results than B.

Items 1, 2, and 3 are expected and make sense, but item 4 looks abnormal. I would expect A to be better. Any suggestions?
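One way to verify items 1 and 2 concretely is to dump the quantization parameters of both tflite files with the TensorFlow Lite interpreter; a sketch (the file names are placeholders):

import tensorflow as tf

def dump_qparams(path):
    # print the scale/zero-point of every quantized tensor in a tflite file
    interp = tf.lite.Interpreter(model_path=path)
    interp.allocate_tensors()
    for t in interp.get_tensor_details():
        qp = t['quantization_parameters']
        if len(qp['scales']) > 0:
            print(t['name'], qp['scales'][:4], qp['zero_points'][:4])

dump_qparams('model_a_asymmetric.tflite')  # model A
dump_qparams('model_b_symmetric.tflite')   # model B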

peterjc123 commented on May 2, 2024

@liamsun2019 One thing in your model is weird. I've actually set quant_min and quant_max to -127 and 127, but you can still see -128 in the weights.

liamsun2019 commented on May 2, 2024

Yes, I also found -128 in some weights. I noticed that you do set quant_min=-127 and quant_max=127 in quantizer.py; my understanding is that this is to avoid the risk of overflow. So what could be the cause here?

peterjc123 commented on May 2, 2024

It looks like we cannot set quant_min and quant_max this way; the observer has its own logic for re-calculating them: https://github.com/pytorch/pytorch/blob/4a8d4cde6589178e989db89d576108ba6d3e6e9a/torch/ao/quantization/utils.py#L192
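For context, this is roughly how a custom quantization range is requested through torch.ao.quantization; whether the observer honors it is exactly what the linked logic decides (a sketch, not TinyNeuralNetwork's actual code):

import torch
from torch.ao.quantization import FakeQuantize, MovingAveragePerChannelMinMaxObserver

# request a restricted [-127, 127] range for per-channel weight fake-quant
weight_fq = FakeQuantize.with_args(
    observer=MovingAveragePerChannelMinMaxObserver,
    quant_min=-127,
    quant_max=127,
    dtype=torch.qint8,
    qscheme=torch.per_channel_symmetric,
    ch_axis=0,
)()

w = torch.randn(8, 16)
weight_fq(w)  # observe and fake-quantize once
print(weight_fq.calculate_qparams())  # the scale/zero-point actually used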

peterjc123 commented on May 2, 2024

With e35ef92, the generated weights are within the range [-127, 127]. Would you please try again? BTW, I'm just curious how the model performs during QAT training.

peterjc123 commented on May 2, 2024

If things still don't work out with that patch, you may try bisecting the model, which should be fairly easy since you have the model description script there.

liamsun2019 commented on May 2, 2024

I'm working on it. Meanwhile, what do you mean by 'bisecting'? Could you please explain it in more detail?

peterjc123 commented on May 2, 2024

Suppose you have the following model description file. You may return the intermediate tensors (e.g. a or b) instead of the original output, so that you can figure out which part of the model is not working.

class Model(nn.Module):
    def forward(self, x):
        a = self.a(x)
        b = self.b(a)
        # ... intermediate layers elided ...
        z = self.z(...)
        return z  # return a or b here instead to see where things go wrong

liamsun2019 commented on May 2, 2024

My experiment shows no more -128 weights for the asymmetric per-channel case; not sure if this is what you meant by "work out with that patch". As in my earlier experiments, asymmetric per-channel still produces much worse inference results than symmetric per-channel.

peterjc123 commented on May 2, 2024

My experiment shows no more -128 weights for the asymmetric per-channel case; not sure if this is what you meant by "work out with that patch". As in my earlier experiments, asymmetric per-channel still produces much worse inference results than symmetric per-channel.

See #25 (comment). You may try bisecting to figure out which layer leads to accuracy loss.

liamsun2019 commented on May 2, 2024

OK. Just to confirm: should 'bisecting' be done against the original model or the quantized model (produced by QATQuantizer.quantize)?

peterjc123 commented on May 2, 2024

You can do it directly on the trained quantized model; just load the state dict with strict=False and it will be fine.
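Concretely, something like the following, assuming Model is the class from the generated description script and the checkpoint path is a placeholder:

import torch

# rebuild the quantized model from the generated description script,
# with forward() edited to return an intermediate tensor (e.g. a or b)
model = Model()

ckpt = torch.load('qat_checkpoint.pth', map_location='cpu')
# strict=False skips the parameters of layers after the new return point
model.load_state_dict(ckpt, strict=False)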

liamsun2019 commented on May 2, 2024

Before conducting the experiments, I have a few more questions, since I am still not very clear about the point you mentioned:

Suppose you have the following model description file. You may return the intermediate tensors (e.g. a or b) instead of the original output, so that you can figure out which part of the model is not working.

class Model(nn.Module):
    def forward(self, x):
        a = self.a(x)
        b = self.b(a)
        # ... intermediate layers elided ...
        z = self.z(...)
        return z  # return a or b here instead to see where things go wrong

My current dilemma is that the int8 per-channel QAT tflite model gives bad inference results compared to the dequantized onnx model, and the u8 per-channel QAT model is even worse. Is the above sample meant for debugging which parts of the model contribute most of the quantization accuracy loss?
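For reference, such a float-vs-quantized comparison might look like the following sketch (file names and input shape are placeholders; the tflite model may expect NHWC layout):

import numpy as np
import onnxruntime as ort
import tensorflow as tf

x = np.random.randn(1, 3, 224, 224).astype(np.float32)

# float reference from the dequantized onnx model
sess = ort.InferenceSession('model.onnx')
ref = sess.run(None, {sess.get_inputs()[0].name: x})[0]

# quantized tflite model; transpose x to NHWC first if required
interp = tf.lite.Interpreter(model_path='model_q.tflite')
interp.allocate_tensors()
interp.set_tensor(interp.get_input_details()[0]['index'], x)
interp.invoke()
out = interp.get_tensor(interp.get_output_details()[0]['index'])

print('max abs diff:', np.abs(ref - out).max())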

peterjc123 commented on May 2, 2024

@liamsun2019 Do you have a DingTalk account, so that you can join our discussion group? This thread will grow too lengthy if we keep answering your questions here.

liamsun2019 commented on May 2, 2024

Sure, I will get that done.
