Comments (19)
@liamsun2019 Quantization is not lossless, so I think a direct comparison of this kind is not very meaningful: there will always be some difference between the results of the quantized kernels and those of the floating-point kernels. Take the subgraph in the following picture as an example.
This is a common problem of imbalanced scale values between the two operands of the add operator. Ideally, the two scales should be close; otherwise, the operand with the smaller scale will be ignored to some extent. Since symmetric quantization is applied here, I suggest you try asymmetric quantization: because it includes zero-point (offset) values, it can handle biased distributions better.
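A minimal sketch of the imbalanced-scale effect described above, using made-up numbers and a plain int8 quantize/dequantize pair (this is illustrative only, not TinyNeuralNetwork's actual implementation):

```python
import numpy as np

def quantize(x, scale, qmin=-127, qmax=127):
    # symmetric int8: q = clamp(round(x / scale))
    return np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

a = np.array([10.0, -20.0, 15.0], dtype=np.float32)   # large dynamic range
b = np.array([0.01, -0.02, 0.015], dtype=np.float32)  # tiny dynamic range

scale_a = np.abs(a).max() / 127
scale_b = np.abs(b).max() / 127
scale_out = np.abs(a + b).max() / 127  # output scale is dominated by a

# one quantization step of the add output is larger than every value in b,
# so b's contribution is mostly rounded away after requantization
print(scale_out > np.abs(b).max())  # prints True
```

When the operands' ranges are close, the output step size is comparable to both, which is why close scales on the two add inputs matter.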
from tinyneuralnetwork.
Actually, I did train an asymmetric per-channel QAT network based on the same source model, but the resulting TFLite model has exactly the same weights/biases as the symmetric per-channel one.
@liamsun2019 Are you sure you use the following config for the quantizer?
quantizer = QATQuantizer(model, dummy_input, config={'asymmetric': True, 'per_tensor': False, ...})
I will double-check that. The experiments were done a few days ago, so they may not have been based on the recent version.
I just tried an asymmetric per-channel QAT model. It turns out that some ops really do have different scale/zero-point values, such as the add ops you illustrated. But the inference results are much worse than those of the symmetric per-channel QAT model.
Here is a summary of the scenario. I intend to fine-tune a pretrained full-precision model that works well: I freeze most layers and train only a few others. Let A be the TFLite model with asymmetric per-channel quantization and B the TFLite model with symmetric per-channel quantization.
- A and B have the same weights for all layers, but their biases are different.
- A and B have different scale/zero-point values for some ops, such as add.
- The dequantized ONNX models have the same weights/biases, and both work well.
- A gives much worse inference results than B.
Items 1-3 are expected and make sense, but item 4 looks abnormal: I would expect A to be better. Any suggestions?
@liamsun2019 One thing is weird in your model: I've actually set quant_min and quant_max to -127 and 127, but you can still see -128 in the weights.
Yes, I also found -128 in some weights. I noticed that you do set quant_min=-127 and quant_max=127 in quantizer.py; my understanding is that this is to avoid the risk of overflow. So what is the possible cause of this case?
It looks like we cannot set quant_min and quant_max this way; the observer has its own logic for recalculating them. https://github.com/pytorch/pytorch/blob/4a8d4cde6589178e989db89d576108ba6d3e6e9a/torch/ao/quantization/utils.py#L192
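The overflow rationale mentioned above can be shown with a little arithmetic. Restricting symmetric int8 to [-127, 127] is a common justification in quantization kernels; it is sketched here as an assumption, not as a statement from the maintainers:

```python
import numpy as np

# The full int8 range [-128, 127] is asymmetric: -128 has no int8 negation,
# so symmetric schemes often restrict weights to [-127, 127]. (This is the
# usual rationale; treat it as an assumption about this project's motivation.)
assert np.iinfo(np.int8).min == -128
assert np.iinfo(np.int8).max == 127

# negating -128 in exact arithmetic gives 128, which overflows int8
assert -(-128) > np.iinfo(np.int8).max

# with the restricted range, every value and its negation fit in int8
for q in range(-127, 128):
    assert np.iinfo(np.int8).min <= -q <= np.iinfo(np.int8).max
```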
With e35ef92, the generated weights are within the range [-127, 127]. Would you please try again? By the way, I'm curious how the model performs during QAT training.
If things still don't work out with that patch, you may try bisecting the model, which should be fairly easy since you have the model description script there.
I'm on it. Meanwhile, what does 'bisecting' mean here? Could you please explain it in more detail?
Suppose you have the following model description file. You may return the intermediate tensors (e.g. a or b) instead of the original output, so that you can figure out which part of the model is not working.
class Model(nn.Module):
    def forward(self, x):
        a = self.a(x)
        b = self.b(a)
        # ... more layers producing z ...
        return z  # return a or b here instead to inspect that part
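The same early-return trick can be sketched framework-free; the pipeline and stage names below are made up for illustration:

```python
# Bisecting by early return: run a pipeline of stages, optionally stopping
# after a named stage to inspect its intermediate result (all stages here
# are toy functions standing in for model layers)
def run_pipeline(x, stages, stop_after=None):
    for name, fn in stages:
        x = fn(x)
        if name == stop_after:
            return x  # the "return a / return b" trick
    return x

stages = [('a', lambda v: v + 1),
          ('b', lambda v: v * 2),
          ('z', lambda v: v - 3)]

print(run_pipeline(5, stages))                  # full output: 9
print(run_pipeline(5, stages, stop_after='a'))  # intermediate a: 6
```

By moving the early-return point forward and backward, you narrow down the first stage whose output diverges.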
My experiment shows no -128 weights anymore in the asymmetric per-channel case; I'm not sure whether this is what you meant by things working out with that patch. As in the former experiments, asymmetric per-channel gives much worse inference results than symmetric per-channel.
> My experiment shows no -128 weights anymore in the asymmetric per-channel case. [...] asymmetric per-channel performs much worse inference results compared to symmetric per-channel.
See #25 (comment). You may try bisecting to figure out which layer leads to accuracy loss.
OK. Just to confirm: should the 'bisecting' be done on the original model or on the quantized model (produced by QATQuantizer.quantize)?
You can just do it on a trained quantized model. Load the state dict with strict=False and it will be fine.
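Conceptually, strict=False loading copies the matching keys and reports the rest instead of raising. A plain-dict sketch of that behavior (illustrative only, not torch's actual implementation):

```python
# Conceptual sketch of strict=False state-dict loading: matching keys are
# copied, non-matching keys are reported instead of raising (the real
# logic lives in torch.nn.Module.load_state_dict)
def load_loose(target, source):
    missing = sorted(k for k in target if k not in source)
    unexpected = sorted(k for k in source if k not in target)
    for k in target.keys() & source.keys():
        target[k] = source[k]
    return missing, unexpected

full = {'conv.weight': 1, 'conv.bias': 2, 'head.weight': 3}
partial = {'conv.weight': 10, 'conv.bias': 20}

missing, unexpected = load_loose(full, partial)
print(missing)  # keys the checkpoint didn't provide
```

This is why a bisected model (with some layers removed or returned early) can still be fed a checkpoint from the full model.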
Before conducting the experiments, I have a few more questions, since I'm still not very clear about the point you mentioned:
> Suppose you have the following model description file, you may return the intermediate tensors (e.g. a or b) instead of the original ones, so that you could figure out which part of the model is not working.
My current dilemma is that the int8 per-channel QAT TFLite model gives bad inference results compared to the dequantized ONNX model, and the u8 per-channel QAT model is even worse. Is the above sample for debugging which parts of the model contribute the most quantization accuracy loss?
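For that kind of debugging, once the same intermediate tensor can be returned from both the float and the quantized model, a simple per-tensor metric locates where the loss is introduced. A hedged sketch with synthetic data (the SNR metric choice is an assumption, not the project's prescribed method):

```python
import numpy as np

# Per-tensor comparison for bisecting: a higher SNR between the float
# reference and the quantized model's intermediate output means that
# layer survived quantization better (all data here is synthetic)
def snr_db(reference, test):
    noise = np.mean((reference - test) ** 2)
    signal = np.mean(reference ** 2)
    return 10.0 * np.log10(signal / (noise + 1e-12))

rng = np.random.default_rng(0)
ref = rng.normal(size=1000)
mild = ref + rng.normal(scale=0.01, size=1000)   # healthy layer
harsh = ref + rng.normal(scale=0.5, size=1000)   # damaged layer

print(snr_db(ref, mild), snr_db(ref, harsh))
```

The first layer whose SNR drops sharply relative to its neighbors is the natural suspect for the accuracy loss.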
@liamsun2019 Do you have a DingTalk account, so that you can join our discussion group? This thread will grow too lengthy if we keep answering your questions here.
Sure, I will get that done.