yhhhli / apot_quantization
PyTorch implementation for the APoT quantization (ICLR 2020)
I use ImageNet to train APoT, and it is really time-consuming: it seems that one epoch needs a day on 8 V100s. Is anything wrong?
Hi~ I am a little puzzled about a difference between the paper and the code in quan_layer.py.
In the paper, when B = 4, P0 ∈ {0, 2^0, 2^-2, 2^-4} (shown in the example in Section 2.2).
But in the code, quan_layer.py line 23 has base_b.append(2 ** (-2 * i - 2)), so for i in range(3) we get P0 ∈ {0, 2^-2, 2^-4, 2^-6}.
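To make the comparison concrete, here is a small sketch (my own code, not the repository's) that just enumerates the two sets I am comparing:

```python
# My own sketch, not the repository code: the base set from the paper's
# Section 2.2 example (B = 4) vs. the set produced by the quoted line.
paper_base = [0.0] + [2 ** (-2 * i) for i in range(3)]      # {0, 2^0, 2^-2, 2^-4}
code_base = [0.0] + [2 ** (-2 * i - 2) for i in range(3)]   # {0, 2^-2, 2^-4, 2^-6}
print(paper_base)  # [0.0, 1.0, 0.25, 0.0625]
print(code_base)   # [0.0, 0.25, 0.0625, 0.015625]
```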
I am a little puzzled about it, thanks a lot!
Thank you for the great contribution!
We are now experimenting with your implementation of QuantConv2d and trying to integrate it into an object detector, namely RetinaNet. Therefore, I would like to ask you some questions to validate my assumptions.
Please point out directly if any of my assumptions are wrong.
Best regards,
The advantage of PoT quantization is replacing multiplications with bit shifts, but I do not see any simulation of shift operations in the code. How do you verify the hardware friendliness of PoT quantization?
Also, the goal of PoT quantization is to use shift operations, yet on a PC the convolutions are still computed with ordinary multiplications; is the accuracy obtained this way credible?
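To illustrate what I mean by simulating a shift, here is a toy sketch (my own illustration, not code from this repository) of how a PoT weight 2^(-k) turns a multiply into a right shift on an integer activation:

```python
# Toy illustration only (not from the repository): a PoT weight 2^(-k)
# applied to an integer activation becomes a right shift. The shift is exact
# when the activation is a multiple of 2^k; otherwise it truncates.
def pot_multiply(act_int: int, k: int) -> int:
    """Multiply a non-negative integer activation by the PoT weight 2^(-k) via a shift."""
    return act_int >> k

act = 96
print(pot_multiply(act, 3), act * 2 ** -3)  # 12 12.0
```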
Hello, I am working on quantization and tried to reproduce your results, but unfortunately they do not match. The full-precision accuracy reported in your paper is 91.6, while the code gives 92.9, which is much higher than the number in the paper; yet for 4-bit quantization the paper reports 92.3, so the comparison between full precision and quantization seems unfair. Please confirm whether the results reported in the paper are wrong. Also, is the accuracy you report the test accuracy at the end of training, or the best test accuracy obtained during training? Since the paper and the code clearly disagree, please give an explanation. In addition, my repeated runs give different results: with all other parameters unchanged, will training on multiple cards versus a single card make the results much worse? Thank you.
This is more a doubt than an issue. I am new to this field, so I do not understand why the QuantConv() layers are not used in the Bottleneck class; only the BasicBlock uses the quantized convolutional layers.
Is there a specific reason for that? Does it affect the accuracy much?
Dear @yhhhli,
This work looks awesome!
I was wondering if you would be interested in making a PR contribution to PyTorch Lightning.
It could be a callback that converts the user's model and applies this quantisation method while training it.
It would fit really nicely with pruning and fine-tuning.
Best,
T.C
Hi,
Based on the provided pretrained model (res18_2bit), I got 64.690%, and the quantized model size is 5 MB (gzip) or 3.4 MB (7zip). This is quite different from the results in your paper. Can you please point out why that is? I just ran: python main.py -a resnet18 --bit 2 --pretrained resnet18_2bit.pth
Thanks
Hello, I have read your paper and your code. I found that Build_power_value in the code matches the description in the paper. However, the weight_quantization_function passes (bit-width - 1) as B to Build_power_value, which means the 4-bit quantization levels use the 3-bit formula. I wonder whether I missed something in the paper or the code?
Greetings Authors,
I have just gone through your paper, trying to understand the concept, but I am really wondering: can I apply the APoT method to MobileNet V2?
Kindly respond.
Hello!
I found that without weight normalization the network stops learning and the loss becomes NaN. Could you please explain why this happens and how it can be fixed?
Thanks for your work. I have a question about the uniform_quantization function: line 142 is different from line 138, and I want to know whether that is right.
Thanks for your nice work and code!
I found that the quant_layer.py files in CIFAR10 and ImageNet are different. Can you please tell me why there exist two versions? BTW, I plan to quantize MobileNet V2. Which version is better to migrate from? Thank you.
Thanks for your great work!
But compared with the ResNet series on ImageNet, I would be more careful about small models such as the MobileNet or ShuffleNet series. Have you tested the QAT function on a small model, and is it useful?
Hello,
Thanks for your contribution to the network quantization field and your open-source code. I met a problem training a ResNet-18 model (quantizing both weights and activations to 4 bits) on the ImageNet dataset: the final best accuracy is only about 68.1%. I kept all the hyper-parameters the same as in the code except for the batch size; due to GPU capacity, 3 RTX 2080 Ti cards are used for training and the batch size is set to 196. I wonder if something went wrong in my training, and I would appreciate it if you could provide a pre-trained ResNet-18 model to help find the problem.
Hello, here are the two uniform quantization functions, from CIFAR10 and ImageNet:
```python
def uniform_quant(x, b=3):
    xdiv = x.mul(2 ** b - 1)              # scale to the integer grid
    xhard = xdiv.round().div(2 ** b - 1)  # round, then rescale back
    return xhard
```
```python
def uniform_quantization(tensor, alpha, bit, is_weight=True, grad_scale=None):
    if grad_scale:
        alpha = gradient_scale(alpha, grad_scale)
    data = tensor / alpha
    if is_weight:
        data = data.clamp(-1, 1)
        data = data * (2 ** (bit - 1) - 1)
        data_q = (data.round() - data).detach() + data  # straight-through estimator round
        data_q = data_q / (2 ** (bit - 1) - 1) * alpha
    else:
        data = data.clamp(0, 1)
        data = data * (2 ** bit - 1)
        data_q = (data.round() - data).detach() + data  # straight-through estimator round
        data_q = data_q / (2 ** (bit - 1) - 1) * alpha
    return data_q
```
I want to know why the input first needs to be multiplied by (2 ** b - 1).
If it is multiplied by (2 ** b - 1) and then divided by (2 ** b - 1) again, does the value stay unchanged? (A small self-contained check is below.)
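Here is my own minimal check (not from the repository) of the scale -> round -> rescale pattern; the round() in between snaps the value to a grid, so it does change unless it already lies on that grid:

```python
# My own toy check: multiply by (2^b - 1), round, divide by (2^b - 1).
# The value changes because round() snaps it to a uniform grid in [0, 1].
import torch

x = torch.tensor([0.00, 0.10, 0.33, 0.70, 1.00])
b = 3
x_q = x.mul(2 ** b - 1).round().div(2 ** b - 1)
print(x_q)  # tensor([0.0000, 0.1429, 0.2857, 0.7143, 1.0000])
```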
thanks
Hi, I have a question about the reported accuracy.
For example, you got 70.75% and 66.46% with 5 bits and 2 bits for ResNet-18 on ImageNet, respectively.
In the paper, however, 70.9% and 67.3% with 5 bits and 2 bits are reported.
Can you explain what has made these differences?
Dear yhhhli, could you tell me what the hyper-parameters for MobileNet V2 training are?
Hi, I was trying to train ResNet-18 on ImageNet with 8-bit quantization. However, the loss is always NaN. I tried smaller learning rates and the results are the same. Note that I have only tried batch sizes of 64 and 128; although those are low, training seemed to work with 5-bit quantization. Any ideas why this could be?
I trained ResNet-18 a4w4 with the following command on the latest repo:
python main.py -a resnet18 --bit 4 --gpu 0 -b 256
The best top-1 accuracy is only 69.01%. Why is it 1.7% lower than the 70.7% reported in the paper?
Hi,
Do you have a specific design for the MUL (multiplication) unit for APoT quantization?
We know that uniform (INT) quantization and PoT quantization are hardware-friendly.
Assume that:
R = real number
S = scale factor
T = quantized value
R1 = S1 * T1
R2 = S2 * T2
Uniform quantization simply adopts the INT MUL unit:
T1 = m
T2 = n
So, we have:
R1 * R2 = (S1 * S2) * (m * n)
For POT:
T1 = 2^m
T2 = 2^n
So, we have:
R1 * R2 = (S1 * S2) * (2^m * 2^n)
= (S1 * S2) * 2^(m + n)
PoT multiplication is thus similar to a floating-point multiply that has only an exponent.
However, APoT values contain additive terms, so I have two questions about the MUL design.
Assume a 4-bit APoT code, split into two 2-bit PoT groups.
The decoder table for the first two bits:
00 | 01 | 10 | 11 |
---|---|---|---|
2^0 | 2^-1 | 2^-3 | 2^-5 |
And for the last two bits:
00 | 01 | 10 | 11 |
---|---|---|---|
2^0 | 2^-2 | 2^-4 | 2^-6 |
For the first two bits, the decoder table exponents are not contiguous: 0, -1, -3, -5.
Assume two APoT numbers:
0101: T1 = 2^-1 + 2^-2
1010: T2 = 2^-3 + 2^-4
T1 * T2 = (2^-1 + 2^-2) * (2^-3 + 2^-4)
= (2^-1 * 2^-3) + (2^-1 * 2^-4) + (2^-2 * 2^-3) + (2^-2 * 2^-4)
= 2^-4 + 2^-5 + 2^-5 + 2^-6
Obviously, the calculation needs 4x (9x for 6-bit) more add operations than PoT in the 4-bit case, and the result violates the definition of APoT, which should not contain the same additive term (such as 2^-5 here) twice in one number.
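A quick numerical check of this expansion (my own toy code, not a hardware design):

```python
# My own toy check of the expansion above: an APoT value is a sum of PoT
# terms, so a product expands into one shift-add partial product per pair.
from itertools import product

t1_exponents = [-1, -2]  # 0101 -> 2^-1 + 2^-2
t2_exponents = [-3, -4]  # 1010 -> 2^-3 + 2^-4

partial_exponents = [e1 + e2 for e1, e2 in product(t1_exponents, t2_exponents)]
print(partial_exponents)                          # [-4, -5, -5, -6]
print(sum(2.0 ** e for e in partial_exponents))   # 0.140625
print((2 ** -1 + 2 ** -2) * (2 ** -3 + 2 ** -4))  # 0.140625
```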
One direct solution is to convert to float and use fake quantization, but isn't that a violation of the principle of quantization?
How did you calculate the number of MACs? Please share the code.
I use the official MobileNetV2 from torchvision.models.
Are there any special tricks to train mobilenet_v2?
I found that in the CIFAR10 main.py, the resume function does not seem to be defined.