zysxmu / intraq

PyTorch implementation of our paper accepted by CVPR 2022 -- IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization

Topics: acceleration, compression, zero-shot quantization

intraq's Introduction

CVPR 2022 paper: IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization

Requirements

Python >= 3.7.10

PyTorch == 1.7.1

Reproduce results

Stage 1: Generate data

cd data_generate

Please install all required packages listed in requirements.txt (pip install -r requirements.txt).

"--save_path_head" in run_generate_cifar10.sh/run_generate_cifar100.sh is the path where you want to save your generated data pickle.

For CIFAR-10/100:

bash run_generate_cifar10.sh
bash run_generate_cifar100.sh

For ImageNet

"--save_path_head" in run_generate.sh is the path where you want to save your generated data pickle.

"--model" in run_generate.sh is the pre-trained model you want (also is the quantized model). You can use resnet18/mobilenet_w1/mobilenetv2_w1.

bash run_generate.sh

Stage 2: Train the quantized network

cd ..
  1. Modify "qw" and "qa" in cifar10_resnet20.hocon/cifar100_resnet20.hocon/imagenet.hocon to select desired bit-width.

  2. Modify "dataPath" in cifar10_resnet20.hocon/cifar100_resnet20.hocon/imagenet.hocon to the real dataset path (for construct the test dataloader).

  3. Modify the "Path_to_data_pickle" in main_direct.py (line 122 and line 135) to the data_path and label_path you just generate from Stage1.

  4. Use the below commands to train the quantized network. Please note that the model that generates the data and the quantized model should be the same.
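To sanity-check that step 3's paths point at valid Stage 1 output, the pickles can be loaded with a short helper. A minimal sketch, assuming each pickle holds a single array-like object (the file names are placeholders and the exact layout expected by main_direct.py may differ):

```python
import pickle

def load_synthetic_set(data_path, label_path):
    """Load the synthetic images and labels produced in Stage 1.

    Assumes each pickle holds one array-like object; the actual
    layout expected by main_direct.py may differ.
    """
    with open(data_path, "rb") as f:
        images = pickle.load(f)
    with open(label_path, "rb") as f:
        labels = pickle.load(f)
    assert len(images) == len(labels), "images and labels must align"
    return images, labels

# Hypothetical usage -- substitute the --save_path_head values from Stage 1:
# images, labels = load_synthetic_set("data.pickle", "labels.pickle")
```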

For CIFAR-10/100:

python main_direct.py --model_name resnet20_cifar10 --conf_path cifar10_resnet20.hocon --id=0

python main_direct.py --model_name resnet20_cifar100 --conf_path cifar100_resnet20.hocon --id=0

For ImageNet, choose the model via "--model_name" (resnet18/mobilenet_w1/mobilenetv2_w1):

python main_direct.py --model_name resnet18 --conf_path imagenet.hocon --id=0

Evaluate pre-trained models

The pre-trained models and corresponding logs can be downloaded here

Please make sure the "qw" and "qa" in *.hocon, *.hocon, "--model_name" and "--model_path" are correct.

For CIFAR-10/100:

python test.py --model_name resnet20_cifar10 --model_path path_to_pretrained_model --conf_path cifar10_resnet20.hocon

python test.py --model_name resnet20_cifar100 --model_path path_to_pretrained_model --conf_path cifar100_resnet20.hocon

For ImageNet

python test.py --model_name resnet18/mobilenet_w1/mobilenetv2_w1 --model_path path_to_pretrained_model --conf_path imagenet.hocon

Results of pre-trained models are shown below:

Model        Bit-width  Dataset    Top-1 Acc.
resnet18     W4A4       ImageNet   66.47%
resnet18     W5A5       ImageNet   69.94%
mobilenetv1  W4A4       ImageNet   51.36%
mobilenetv1  W5A5       ImageNet   68.17%
mobilenetv2  W4A4       ImageNet   65.10%
mobilenetv2  W5A5       ImageNet   71.28%
resnet-20    W3A3       CIFAR-10   77.07%
resnet-20    W4A4       CIFAR-10   91.49%
resnet-20    W3A3       CIFAR-100  48.25%
resnet-20    W4A4       CIFAR-100  64.98%
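In the table, WxAy denotes x-bit weights and y-bit activations. As an illustration of what a b-bit setting means (a minimal sketch of symmetric uniform quantization, not the repository's actual quantizer, which may use clipping or asymmetric ranges):

```python
def uniform_quantize(x, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits
    (illustrative only; IntraQ's quantizer may differ)."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 integer levels per side for W4
    scale = max(abs(v) for v in x) / qmax   # map the largest magnitude onto qmax
    return [round(v / scale) * scale for v in x]
```

Lower bit-widths give fewer representable levels, which is why the W3A3 rows show a larger accuracy drop than W4A4.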

intraq's People

Contributors: zysxmu

intraq's Issues

Confusion about step_S settings

Thanks for your work!
You mentioned in the paper that "Both learning rates are decayed by 0.1 every 100 fine-tuning epochs." But I noticed that in the code, you set the LR decay milestones to [20, 40, 60] for CIFAR-10. Looking forward to your explanation of this.
Thank you again.
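For reference, milestones like [20, 40, 60] correspond to a step schedule in the style of torch.optim.lr_scheduler.MultiStepLR. A minimal pure-Python sketch of the resulting learning rate at each epoch (an illustration, not the repository's code):

```python
def lr_at_epoch(base_lr, epoch, milestones=(20, 40, 60), gamma=0.1):
    """Learning rate under a milestone step schedule: the rate is
    multiplied by `gamma` once for every milestone already passed
    (mirrors torch.optim.lr_scheduler.MultiStepLR)."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed
```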

Results in paper with DSG method

Hi, thanks for your work!
I was wondering why the result of DSG in Table 3 of your paper is lower than ZeroQ's, since their paper claims that DSG performs better than ZeroQ. As far as I know, there is no official open-source code for DSG, but you mention in Section 4.1 that you used open-source code. So did you reproduce DSG yourself, or have I misunderstood something? Maybe there is something wrong with the DSG reproduction?

Thanks for your reply

Why is the accuracy of ZeroQ much higher than the result reported in GDFQ?

Hello!
I was wondering why the result of ZeroQ you report in Table 1 of your paper (W4A4, 60.68% on ImageNet) is so high? It is even higher than GDFQ. I cloned ZeroQ's code and got a result similar to the one in GDFQ's paper (~26%). Besides, since ZeroQ's synthetic data has no labels (without IL), how do you perform fine-tuning on 4-bit ResNet-18 (caption of Table 1)?
Looking forward to your reply! Thx!

Questions on marginal distance constraints in the code

Thanks for sharing your code. It was very helpful in understanding your interesting work.
I have some questions about your code.

For generating the data for CIFAR-100, it seems that the marginal distance constraints (loss_cosineDistance and loss_cosineDistance_upper in distill_data.py) are both 0 during the data generation process.
Is this a code error, or am I missing something?

Also, could you explain why you used 1 - CosineSimilarity instead of CosineSimilarity in your code?

Thanks in advance!
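For context on the last question: 1 - CosineSimilarity is the standard cosine distance, which is 0 for aligned vectors and grows as they diverge, so it can be minimized directly as a loss term. A minimal sketch (illustrative, not the repository's implementation):

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal
    vectors, and 2 for exactly opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)
```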

Question about GAN in training process.

Thanks for your work! I noticed that in the network fine-tuning stage, you used a GAN to generate data and used that data for training. But this is not mentioned in the paper. If you have time to explain the reason, I'd be very grateful.

About Generator

Hi, thanks for your work!
It seems that a generator is being trained in trainer_direct.py, while the paper says the data for fine-tuning is obtained by optimizing Gaussian noise rather than by a generator. I am not familiar with zero-shot quantization, and this confuses me. Could you please help me figure it out? Thanks!
