
huawei-noah / efficient-computing


Efficient computing methods developed by Huawei Noah's Ark Lab

Python 16.88% Shell 0.02% Jupyter Notebook 82.71% CMake 0.03% C++ 0.36%
knowledge-distillation model-compression binary-neural-networks pruning quantization self-supervised

efficient-computing's Introduction

Efficient Computing

This repo is a collection of Efficient-Computing methods developed by Huawei Noah's Ark Lab.

efficient-computing's People

Contributors

dingning97, gaffey, ggjy, hantingchen, haoqing-wang, iamhankai, liu-zhenhua, lose4578, marsrocky, tuzhijun, xiaoshingshing, xinghaochen, xjwu1024, yehuitang, yuchuantian


efficient-computing's Issues

Teacher model's performance fluctuates on USPS dataset.

I am trying to do an adaptation task with DAFL. I trained LeNet5 on MNIST, but when I applied it to the USPS dataset, something strange happened: the performance of the trained LeNet5 varied widely on USPS when I trained it several times with the same hyperparameters, with accuracy ranging from 40% to 80%. Under these circumstances, it's hard to fix a baseline for adaptation.

It is a little off-topic since DAFL is designed for distillation, but can anyone shed some light on why this happens?

About DAFL

self.conv_blocks1 = nn.Sequential(
    nn.Conv2d(128, 128, 3, stride=1, padding=1),
    nn.BatchNorm2d(128, 0.8),
    nn.LeakyReLU(0.2, inplace=True),
)

Hi, your work is inspiring. I have some questions about a claim in the paper and about the code; they are not big deals.

First, in the code, I found that the eps argument of torch.nn.BatchNorm2d is set to 0.8, which is much larger than the default value of 1e-5. I'd like to know whether this setting is marginal or whether it helps generator training.
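
For reference, here is a minimal check (plain PyTorch, not the repository's code) of how the positional argument in the snippet above maps onto eps:

    import torch.nn as nn

    # nn.BatchNorm2d(num_features, eps=1e-5, momentum=0.1, ...): the second
    # positional argument is eps, so BatchNorm2d(128, 0.8) sets eps to 0.8
    # instead of the default 1e-5.
    bn_as_written = nn.BatchNorm2d(128, 0.8)
    bn_explicit = nn.BatchNorm2d(128, eps=0.8)
    assert bn_as_written.eps == bn_explicit.eps == 0.8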

Second, Section 3.1 of the paper claims that "Since filters in the teacher DNNs have been trained to extract intrinsic patterns in training data, feature maps tend to receive higher activation values if input images are real rather than some random vectors." I am confused about the connection between the L1-norm of $f_T^i$ and the authenticity of the input images. Could you explain why, or point me to some references?

That's all. Thanks again for your excellent work, and I look forward to your reply!

avg_loss is computed incorrectly

L199: avg_loss /= len(data_test) should be avg_loss /= len(data_test_loader).
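
A minimal, self-contained sketch of the reported fix (toy stand-ins for the network and data, not the repository's code), assuming avg_loss accumulates one batch-mean loss value per batch:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Toy test set and model; the criterion returns the batch-mean loss by default.
    data_test = TensorDataset(torch.randn(100, 8), torch.randint(0, 10, (100,)))
    data_test_loader = DataLoader(data_test, batch_size=32)
    net, criterion = nn.Linear(8, 10), nn.CrossEntropyLoss()

    avg_loss = 0.0
    with torch.no_grad():
        for images, labels in data_test_loader:
            avg_loss += criterion(net(images), labels).item()  # one value per batch
    avg_loss /= len(data_test_loader)  # number of batches (4 here), not len(data_test) == 100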
BTW, when will you open source your latest work Learning Student Networks in the Wild?

Hello, I've recently been reading your DAFL code

The introduction describes DAFL as a data-free network compression technique.
But while reading the code, I noticed the following line in DAFL-train.py:

teacher = torch.load(opt.teacher_dir + 'teacher').cuda()  # load the teacher network

The teacher loaded here is the model saved by teacher-train.py. Does this mean the teacher must be trained first, and the student is then trained from that teacher?

My understanding of the paper is: first build a teacher network with many parameters, then define a small student network, and use a generator to turn random noise into image-like inputs; the teacher's and student's outputs on these inputs are matched so that the student agrees with the teacher as closely as possible, and the student is finally saved as the compressed network.

However, I don't quite understand the code below:

        pred = outputs_T.data.max(1)[1]
        loss_activation = -features_T.abs().mean()  # activation loss
        loss_one_hot = criterion(outputs_T, pred)  # one-hot (cross-entropy) loss
        softmax_o_T = torch.nn.functional.softmax(outputs_T, dim=1).mean(dim=0)
        loss_information_entropy = (softmax_o_T * torch.log10(softmax_o_T)).sum()  # information-entropy loss
        loss = loss_one_hot * opt.oh + loss_information_entropy * opt.ie + loss_activation * opt.a  # weighted total loss

Is my understanding correct? I'm new to network compression; if anything is inaccurate, please point it out.
Thank you very much.

Training is very slow?

MNIST, which originally needs 10 epochs, requires 200 epochs under the DAFL framework, and CIFAR-10 requires 2000 epochs. The GAN training is also time-consuming; in my experiments it took 20-30x longer than standard KD.

About the back-propagation of kd_loss

The kd_loss is used for training the student network. However, it is also back-propagated to the encoder during training, which is not the same as the loss function in the paper.
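
A minimal, self-contained sketch (toy stand-in modules and hypothetical names, not the repository's code) of one common way to keep the KD loss from updating the generator: detach the generator outputs and teacher logits on the student path.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    generator = nn.Linear(100, 32 * 32)  # stand-in for the image generator
    teacher = nn.Linear(32 * 32, 10)     # stand-in for the teacher network
    student = nn.Linear(32 * 32, 10)     # stand-in for the student network

    z = torch.randn(16, 100)
    gen_imgs = generator(z)
    outputs_T = teacher(gen_imgs)        # used (undetached) for the generator losses

    # KD loss for the student: detach so its gradient reaches only the student.
    outputs_S = student(gen_imgs.detach())
    loss_kd = nn.KLDivLoss(reduction='batchmean')(
        F.log_softmax(outputs_S, dim=1),
        F.softmax(outputs_T.detach(), dim=1),
    )
    loss_kd.backward()
    assert all(p.grad is None for p in generator.parameters())  # no KD gradient reached the generator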

About DAFL-SR

Hello,
Haha, it's me again. I couldn't find the DAFL-SR code in this repository. Has it not been released yet, or am I just failing to find it?
I would really like to read your code, and I would be very grateful if you release it.
Looking forward to your code.

Question about stable accuracy on CIFAR100

Thank you for providing the code for your amazing work! I have some questions about the accuracy of the experiments on CIFAR-100.

Firstly, I can get similar results on CIFAR-10 with the scripts in the README.md; for example, my teacher accuracy is 0.9518 and my DAFL accuracy is 0.9241.

However, I cannot get the ~74% DAFL accuracy on CIFAR-100 reported in your paper. I follow the same parameters as in the README, that is,
python DAFL-train.py --dataset cifar100 --channels 3 --n_epochs 2000 --batch_size 1024 --lr_G 0.02 --lr_S 0.1 --latent_dim 1000 --oh 0.05 --ie 20 --act 0.01.

I ran the DAFL code twice; the first time I got 53.31% accuracy, the second time 62.39%. Both are far from the 74% in the table. I did use your teacher training script, and the teacher accuracy is 0.771, which is a little lower than yours. I also used a batch size of 1024, as I know a small batch size can make a big difference.

The log for my second training is

[Epoch 1/2000] [loss_oh: 1.565716] [loss_ie: -1.733893] [loss_a: -0.346861] [loss_kd: 0.835829]
[Epoch 2/2000] [loss_oh: 1.836624] [loss_ie: -1.873039] [loss_a: -0.330029] [loss_kd: 0.749510]
...
[Epoch 799/2000] [loss_oh: 1.751102] [loss_ie: -1.998407] [loss_a: -0.336999] [loss_kd: 0.165117]
[Epoch 800/2000] [loss_oh: 1.767649] [loss_ie: -1.998450] [loss_a: -0.337684] [loss_kd: 0.163865]
[Epoch 801/2000] [loss_oh: 1.750000] [loss_ie: -1.998442] [loss_a: -0.337093] [loss_kd: 0.126873]
...
[Epoch 1598/2000] [loss_oh: 1.734455] [loss_ie: -1.997838] [loss_a: -0.337273] [loss_kd: 0.084451]
[Epoch 1599/2000] [loss_oh: 1.737263] [loss_ie: -1.998453] [loss_a: -0.337822] [loss_kd: 0.080239]
[Epoch 1600/2000] [loss_oh: 1.697831] [loss_ie: -1.998273] [loss_a: -0.337703] [loss_kd: 0.081537]
...
[Epoch 1997/2000] [loss_oh: 1.739230] [loss_ie: -1.998205] [loss_a: -0.337063] [loss_kd: 0.053484]
[Epoch 1998/2000] [loss_oh: 1.729897] [loss_ie: -1.998376] [loss_a: -0.337201] [loss_kd: 0.051445]
[Epoch 1999/2000] [loss_oh: 1.730324] [loss_ie: -1.997864] [loss_a: -0.337307] [loss_kd: 0.053925]

Also, this model reached 0.4108 accuracy at around 800 epochs and 0.5790 at around 1600 epochs.

I ran it with PyTorch 1.4 + CUDA 10.1.

It seems that the results are not very stable on CIFAR-100, so I would like to know whether this is normal on CIFAR-100, and whether my 62% accuracy is reasonable.

By the way, could you please share your CIFAR-100 training log and hyperparameters?

I would really appreciate your help! Looking forward to your reply.

When training the student network on CIFAR-100, the accuracy is very low and stays at 0.01

[Epoch 181/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000303]
Test Avg. Loss: 19101.246094, Accuracy: 0.010000
[Epoch 182/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000302]
Test Avg. Loss: 19077.779297, Accuracy: 0.010000
[Epoch 183/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000300]
Test Avg. Loss: 19057.632812, Accuracy: 0.010000
[Epoch 184/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000299]
Test Avg. Loss: 19041.076172, Accuracy: 0.010000
[Epoch 185/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000298]
Test Avg. Loss: 19022.105469, Accuracy: 0.010000
[Epoch 186/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000297]
Test Avg. Loss: 18999.925781, Accuracy: 0.010000
[Epoch 187/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000295]
Test Avg. Loss: 18975.453125, Accuracy: 0.010000
[Epoch 188/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000293]
Test Avg. Loss: 18949.039062, Accuracy: 0.010000
[Epoch 189/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000293]
Test Avg. Loss: 18920.974609, Accuracy: 0.010000
[Epoch 190/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000292]
Test Avg. Loss: 18893.566406, Accuracy: 0.010000
[Epoch 191/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000290]
Test Avg. Loss: 18866.751953, Accuracy: 0.010000
[Epoch 192/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000289]
Test Avg. Loss: 18838.226562, Accuracy: 0.010000
[Epoch 193/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000288]
Test Avg. Loss: 18808.751953, Accuracy: 0.010000
[Epoch 194/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000286]
Test Avg. Loss: 18779.259766, Accuracy: 0.010000
[Epoch 195/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000286]
Test Avg. Loss: 18749.382812, Accuracy: 0.010000
[Epoch 196/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000285]
Test Avg. Loss: 18719.992188, Accuracy: 0.010000
[Epoch 197/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000283]
Test Avg. Loss: 18692.478516, Accuracy: 0.010000
[Epoch 198/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000283]
Test Avg. Loss: 18664.859375, Accuracy: 0.010000
[Epoch 199/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000281]
Test Avg. Loss: 18637.296875, Accuracy: 0.010000

[DynamicQuant] Questions about pretrained weight + Some inconsistency

Hello,

I found your CVPR paper to be a very interesting study.
I'm currently trying to reproduce your work using your repository, and I would be grateful if you could give me some advice.

I have a question about the first and last layers in ResNet.
It seems like the code uses full-precision (FP) for both layers, but don't they need to be quantized to 8 bits?

Secondly, I was wondering whether FP pretrained weights are used for DQnet training or if DQnet is trained from scratch without using any pretrained weights.

Thank you for your time and assistance!

Does the student network load a pretrained model?

Hello, I recently read several of your knowledge-distillation papers; they are very good work.
I'd like to ask about two of them, CVPR 2021 'Learning Student Networks in the Wild' and ICCV 'Data-Free Learning of Student Networks':
do they use ImageNet-pretrained models when training the student network?
I didn't see this described in the original papers, though I may have missed it.

GPT4image: image descriptions for short text

Thanks for your excellent work. Can you release the short descriptions for all 1,281,167 training images generated with the [MiniGPT-4] 7B model? Could you also release the code used to generate the short descriptions?

Question about implementation of "DQNet"

Hello, I read your paper from CVPR 2022.

I have a question about your implementation.

In resnet_quan_imagenet.py, the one-hot vector that selects the number of bits for each layer seems to take the same value for all layers.

        one_hot = F.gumbel_softmax(feat, tau=1, hard=True, eps=1e-20)
        
        for m in self.layer2:
            x = m(x, one_hot)
        for m in self.layer3:
            x = m(x, one_hot)
        for m in self.layer4:
            x = m(x, one_hot)

In this case, isn't the same bit-width selected for all layers?
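
For illustration only (hypothetical tensors, not the repository's code), the difference between reusing one Gumbel-softmax draw for every layer and drawing a separate one-hot vector per layer:

    import torch
    import torch.nn.functional as F

    feats = torch.randn(3, 3)  # one logit row per layer, 3 bit-width candidates each

    # One draw reused everywhere: all layers receive the same one-hot choice.
    shared_one_hot = F.gumbel_softmax(feats[0], tau=1, hard=True)

    # One draw per layer: each layer can end up with a different bit-width.
    per_layer_one_hot = [F.gumbel_softmax(row, tau=1, hard=True) for row in feats]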

DAFL hyperparameter settings

Hello, I'd like to ask how the coefficients of the individual losses were set in your experiments on the CelebA dataset. Also, are there any tips for setting these hyperparameters? Thanks.

[PTQ4SR] Unfair comparison in paper?

In the CVPR 2023 paper "Toward Accurate Post-Training Quantization for Image Super Resolution",
the comparison in Table 5 (Sec. 4.2) seems unfair.

  • PAMS and FQSR are implemented on the EDSR-baseline model (layers=16, dimension size=64),

  • while PTQ4SR (this work) is implemented on the EDSR model (layers=32, dimension size=256).

Since the accuracy of EDSR and EDSR-baseline differs substantially, isn't it unfair to compare methods with different backbones?

Are there comparisons made using the same EDSR-baseline backbone?

Looking forward to your code release!

Question about the open-source code

Hello, I'd like to ask where the code for the paper "Instance-Aware Dynamic Neural Network Quantization" has been published. I followed the link from Papers with Code but couldn't find it.

Any instructions about the CelebA experiments?

Hi, thanks for your very helpful code. I have two questions. (1) Do you have any README instructions on how to run the CelebA experiments from the paper? (2) Could you release the trained teacher and student models? I trained the CelebA teacher model myself but found the test accuracy surprisingly high, up to ~90%, using the AlexNet-based architecture. I don't know whether I did it right or whether something is actually wrong... Thank you very much!

Code release

Hi

The code seems to be missing.
Can you please release the code?

Thanks

Cannot reproduce the results on CelebA + AlexNet

Hi,

Great thanks for this inspiring work! I have some trouble reproducing the results on CelebA with AlexNet. Previously, I reached out to the lead author and got this setting for the CelebA experiment: batch_size 512, lr_G 0.002, lr_S 0.0002, oh 2, ie 5, a 0.1, latent_dim 100. The generator is said to be the same as the one already on GitHub, with opt.img_size = 224 and opt.channels = 3. Because I don't have enough GPUs to support a batch size of 512, I can only do 384 (which I thought should be fairly okay, since the problem is essentially a 2-class classification problem and 384 is large enough to approximate the probability). However, when I run the CelebA experiment, the student network does not converge at all. Part of the log looks like this:

[loss_oh: 0.015645] [loss_ie: -0.034770] [loss_a: -0.122255] [loss_kd: 0.599464] -- Epoch 0 Step 0
[loss_oh: 0.229535] [loss_ie: -0.220245] [loss_a: -0.038812] [loss_kd: 0.044472] -- Epoch 0 Step 10
[loss_oh: 0.257547] [loss_ie: -0.232547] [loss_a: -0.034920] [loss_kd: 0.003552] -- Epoch 0 Step 20
[loss_oh: 0.264858] [loss_ie: -0.235487] [loss_a: -0.034071] [loss_kd: 0.003799] -- Epoch 0 Step 30
[loss_oh: 0.266129] [loss_ie: -0.236011] [loss_a: -0.033889] [loss_kd: 0.002580] -- Epoch 0 Step 40
[loss_oh: 0.263640] [loss_ie: -0.235056] [loss_a: -0.034160] [loss_kd: 0.001985] -- Epoch 0 Step 50
[loss_oh: 0.265512] [loss_ie: -0.235800] [loss_a: -0.033916] [loss_kd: 0.001788] -- Epoch 0 Step 60
[loss_oh: 0.260362] [loss_ie: -0.233735] [loss_a: -0.034578] [loss_kd: 0.001534] -- Epoch 0 Step 70
[loss_oh: 0.259178] [loss_ie: -0.233260] [loss_a: -0.034730] [loss_kd: 0.001352] -- Epoch 0 Step 80
[loss_oh: 0.259632] [loss_ie: -0.233448] [loss_a: -0.034670] [loss_kd: 0.001075] -- Epoch 0 Step 90
[loss_oh: 0.265067] [loss_ie: -0.235633] [loss_a: -0.033966] [loss_kd: 0.001200] -- Epoch 0 Step 100
[loss_oh: 0.263322] [loss_ie: -0.234941] [loss_a: -0.034185] [loss_kd: 0.001177] -- Epoch 0 Step 110
Acc1 0.5042 Epoch 0 (after update) (Best_Acc1 0.5042 @ Epoch 0) lr 0.0002
[loss_oh: 0.261791] [loss_ie: -0.234332] [loss_a: -0.034381] [loss_kd: 0.001027] -- Epoch 1 Step 0
[loss_oh: 0.262940] [loss_ie: -0.234793] [loss_a: -0.034234] [loss_kd: 0.001158] -- Epoch 1 Step 10
[loss_oh: 0.260839] [loss_ie: -0.233952] [loss_a: -0.034503] [loss_kd: 0.001056] -- Epoch 1 Step 20
[loss_oh: 0.263265] [loss_ie: -0.234922] [loss_a: -0.034204] [loss_kd: 0.000924] -- Epoch 1 Step 30
[loss_oh: 0.261663] [loss_ie: -0.234280] [loss_a: -0.034408] [loss_kd: 0.000847] -- Epoch 1 Step 40
[loss_oh: 0.262933] [loss_ie: -0.234793] [loss_a: -0.034244] [loss_kd: 0.000844] -- Epoch 1 Step 50
[loss_oh: 0.261370] [loss_ie: -0.234169] [loss_a: -0.034442] [loss_kd: 0.000802] -- Epoch 1 Step 60
[loss_oh: 0.261339] [loss_ie: -0.234159] [loss_a: -0.034447] [loss_kd: 0.000674] -- Epoch 1 Step 70
[loss_oh: 0.265101] [loss_ie: -0.235667] [loss_a: -0.033964] [loss_kd: 0.000684] -- Epoch 1 Step 80
[loss_oh: 0.262763] [loss_ie: -0.234734] [loss_a: -0.034266] [loss_kd: 0.000634] -- Epoch 1 Step 90
[loss_oh: 0.262526] [loss_ie: -0.234639] [loss_a: -0.034297] [loss_kd: 0.000670] -- Epoch 1 Step 100
[loss_oh: 0.260210] [loss_ie: -0.233700] [loss_a: -0.034608] [loss_kd: 0.000737] -- Epoch 1 Step 110
Acc1 0.5042 Epoch 1 (after update) (Best_Acc1 0.5042 @ Epoch 0) lr 0.0002
[loss_oh: 0.263285] [loss_ie: -0.234939] [loss_a: -0.034213] [loss_kd: 0.000686] -- Epoch 2 Step 0
[loss_oh: 0.262520] [loss_ie: -0.234635] [loss_a: -0.034308] [loss_kd: 0.000636] -- Epoch 2 Step 10
[loss_oh: 0.262507] [loss_ie: -0.234629] [loss_a: -0.034317] [loss_kd: 0.000611] -- Epoch 2 Step 20
[loss_oh: 0.263110] [loss_ie: -0.234877] [loss_a: -0.034235] [loss_kd: 0.000559] -- Epoch 2 Step 30
[loss_oh: 0.260770] [loss_ie: -0.233930] [loss_a: -0.034542] [loss_kd: 0.000588] -- Epoch 2 Step 40
[loss_oh: 0.262082] [loss_ie: -0.234462] [loss_a: -0.034376] [loss_kd: 0.000677] -- Epoch 2 Step 50
[loss_oh: 0.261914] [loss_ie: -0.234396] [loss_a: -0.034401] [loss_kd: 0.000535] -- Epoch 2 Step 60
[loss_oh: 0.261639] [loss_ie: -0.234283] [loss_a: -0.034436] [loss_kd: 0.000573] -- Epoch 2 Step 70
[loss_oh: 0.260946] [loss_ie: -0.234001] [loss_a: -0.034535] [loss_kd: 0.000598] -- Epoch 2 Step 80
[loss_oh: 0.262788] [loss_ie: -0.234748] [loss_a: -0.034292] [loss_kd: 0.000538] -- Epoch 2 Step 90
[loss_oh: 0.261890] [loss_ie: -0.234385] [loss_a: -0.034417] [loss_kd: 0.000526] -- Epoch 2 Step 100
[loss_oh: 0.263590] [loss_ie: -0.235067] [loss_a: -0.034201] [loss_kd: 0.000606] -- Epoch 2 Step 110
Acc1 0.5042 Epoch 2 (after update) (Best_Acc1 0.5042 @ Epoch 0) lr 0.0002

So basically the accuracy is at chance level, and there is no sign that it's going to get better. I don't know what I have missed.

My environment: Ubuntu 16.04, PyTorch 1.3.
I trained my own CelebA teacher with 81.88% accuracy (comparable to the 81.59% reported in the paper, so the data should be set up correctly).

Could you tell me where this goes wrong, or post an example in the README (e.g., how to set the hyperparameters) for the CelebA experiment so that we can easily reproduce the results? It would be even better if you could share the trained teacher model. Thanks!

Best,

Missing Code in GitHub Repository

I'm unable to find the code for the paper "Towards Accurate Post-Training Quantization for Image Super Resolution". Can you please release the code soon?

Can you provide a trained model for testing?

Your paper is very instructive. Can you provide a trained model so we can test it? Thank you!

It would be better if there were more detailed steps to reproduce the results. Thank you.

In the Win10 environment, what code should be added to "DAFL-train.py" to solve the following problem?

The "teacher" model has been successfully trained, but errors occurred while running "DAFL-train.py",anybody know what's going on? Thank you!


"D:\Program Files\Python368\python.exe" B:/PyTorch/DAFL/DAFL-train.py
[Epoch 0/200] [loss_oh: 0.306945] [loss_ie: -0.662719] [loss_a: -1.541127] [loss_kd: 1.733359]
[Epoch 0/200] [loss_oh: 0.391318] [loss_ie: -0.769967] [loss_a: -1.384207] [loss_kd: 1.527557]

Traceback (most recent call last):
File "", line 1, in
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="mp_main")
File "D:\Program Files\Python368\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "D:\Program Files\Python368\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "D:\Program Files\Python368\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
### File "B:\PyTorch\DAFL\DAFL-train.py", line 190, in
for i, (images, labels) in enumerate(data_test_loader):

File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 193, in iter
return _DataLoaderIter(self)
File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 469, in init
w.start()
File "D:\Program Files\Python368\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "D:\Program Files\Python368\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\Program Files\Python368\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\Program Files\Python368\lib\multiprocessing\popen_spawn_win32.py", line 33, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 511, in _try_get_batch
data = self.data_queue.get(timeout=timeout)
File "D:\Program Files\Python368\lib\multiprocessing\queues.py", line 105, in get
raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
### File "B:/PyTorch/DAFL/DAFL-train.py", line 190, in
for i, (images, labels) in enumerate(data_test_loader):

File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 576, in next
idx, batch = self._get_batch()
File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 553, in _get_batch
success, data = self._try_get_batch()
File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 519, in _try_get_batch
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 16596) exited unexpectedly

Process finished with exit code 1
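
A common fix for this error on Windows (a sketch under the assumption that the traceback above is the standard multiprocessing "spawn" issue, not the repository's code): put the code that builds and iterates the DataLoader behind a __main__ guard, or set num_workers=0 to avoid worker processes entirely.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def main():
        # Toy stand-in for the test set used in DAFL-train.py.
        data_test = TensorDataset(torch.randn(256, 1, 32, 32),
                                  torch.zeros(256, dtype=torch.long))
        data_test_loader = DataLoader(data_test, batch_size=64, num_workers=1)
        for i, (images, labels) in enumerate(data_test_loader):
            pass  # evaluation / training loop goes here

    if __name__ == '__main__':
        # On Windows, worker processes re-import this script, so the loop above
        # must not run at import time.
        main()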

About the teacher network in DASR

Hello, shouldn't the teacher network in DASR be an already-trained model? Why does the code train the teacher network again? Also, how are the teacher's train.h5 file and the test set generated? Looking forward to your reply, thanks.

loss_kd reduces to zero

Hi, thanks for releasing the code! I simply ran the training code on MNIST.
When training the student network, loss_kd drops to zero after a certain number of training steps, and the test accuracy drops by ~1% at the same time. I checked the generated images, and it seems the generator has fallen into a bad local minimum: all the generated images are the same.
In this case, the generator is useless for learning the student network. Is this a common phenomenon or a bug?

Entropy in DAFL

Hello. Thank you for your great work.

When calculating the information entropy loss in DAFL_train.py, isn't a minus sign needed?

Entropy = -Σ p · ln(p)
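
For reference, a small self-contained check of the sign convention (plain PyTorch, not the repository's code):

    import torch

    # Information entropy is H(p) = -sum(p * ln p) >= 0, so the expression
    # sum(p * ln p) written without the minus equals -H(p). Whether the explicit
    # minus is needed depends on whether the term is minimized or maximized.
    p = torch.softmax(torch.randn(10), dim=0)
    entropy = -(p * torch.log(p)).sum()      # H(p), non-negative
    neg_entropy = (p * torch.log(p)).sum()   # -H(p), as written in the question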

Implementation code

Hello, I read your CVPR 2022 paper with great interest:
Instance-Aware Dynamic Neural Network Quantization

I am trying to re-implement your framework and to find your open-source code repository.

In the paper, the code repositories are given as below:
https://github.com/huawei-noah/Efficient-Computing : I cannot find the code for this paper here; only other papers' implementations.
https://gitee.com/mindspore/models/tree/master/research/cv/DynamicQuant : the URL does not work.

Is the code not ready yet? Could you point me to the code repository for the paper?

That would be very helpful.

Thanks.

Line 69 of birealnet.py

Hello! Should self.alpha_a*(x + self.beta_a) be self.alpha_a*x + self.beta_a instead?

[DynamicQuant] Isn't the model size about 3 times larger than the baseline?

Hello,
Thanks for sharing your code.
Your paper "Instance-Aware Dynamic Neural Network Quantization" is interesting!

I have one question on the model size of your work.

It seems that a layer of the DynamicQuant network has to hold 3 sets of convolution weights with different bit-widths in order to dynamically select one for each input (when there are 3 bit-width candidates).
This seems to enlarge the model size to nearly 3 times that of uniform-bit quantization, even though the Bit-FLOPs stay the same.
I wonder if this impacts the inference time.

If I am wrong, please correct me!

How to handle NAS_Bench_201 arch as tensors

Hi, many thanks for your wonderful work.

I have noticed that the architecture encoding method in ReNAS encodes an adjacency matrix, together with the operations and per-vertex FLOPs and params, into a 19×7×7 tensor. This works for NAS-Bench-101 but not for NAS-Bench-201, since the latter treats edges as operations and nodes as sums of feature maps, which is the opposite of NAS-Bench-101, where nodes are the operations.
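
For what it's worth, a hypothetical illustration (my own assumption of the idea, not ReNAS's exact encoding) of packing an adjacency matrix and per-vertex statistics into a multi-channel tensor:

    import torch

    adj = (torch.rand(7, 7) > 0.5).float()  # adjacency matrix of a cell
    op_id = torch.rand(7)                   # per-vertex operation encoding
    flops = torch.rand(7)                   # per-vertex FLOPs
    params = torch.rand(7)                  # per-vertex parameter counts

    # Broadcast each vertex feature onto the adjacency structure to get one
    # 7x7 channel per statistic; ReNAS stacks 19 such channels in total.
    channels = [adj] + [adj * feat.unsqueeze(0) for feat in (op_id, flops, params)]
    encoding = torch.stack(channels)        # shape (4, 7, 7) in this toy example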

So the question is how to encode NAS-Bench-201 architectures as tensors, since the paper also reports experiments on the 201 dataset.

Many thanks again for your cool work!

The code for PTQ4SR

Hi, I'm very interested in your paper. Can you please release the code soon?
Thank you.

{AttributeError}'VisionTransformer' object has no attribute 'dist_token'

@huawei-noah-admin @ggjy Thanks for your project.
When I train on ImageNet-1K:

To train a DeiT-Tiny student with a Cait-S24 teacher, run:
python -m torch.distributed.launch --nproc_per_node=8 main.py --distributed --model deit_tiny_patch16_224 --teacher-model cait_s24_224 --distillation-type soft --distillation-alpha 1 --distillation-beta 1 --w-sample 0.1 --w-patch 4 --w-rand 0.2 --K 192 --s-id 0 1 2 3 8 9 10 11 --t-id 0 1 2 3 20 21 22 23 --drop-path 0

customized_forward.py

def vit_forward_features(self, x, require_feat: bool = False):
    x = self.patch_embed(x)
    cls_token = self.cls_token.expand(x.shape[0], -1, -1)
    if self.dist_token is None:
        x = torch.cat((cls_token, x), dim=1)

{AttributeError}'VisionTransformer' object has no attribute 'dist_token'
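
A possible workaround, assuming the error comes from a timm version whose VisionTransformer no longer defines dist_token (a sketch, not the repository's fix): guard the attribute access so the customized forward also works when the distillation token is absent.

    import torch

    def vit_forward_features(self, x, require_feat: bool = False):
        x = self.patch_embed(x)
        cls_token = self.cls_token.expand(x.shape[0], -1, -1)
        dist_token = getattr(self, 'dist_token', None)  # None if the model has no distillation token
        if dist_token is None:
            x = torch.cat((cls_token, x), dim=1)
        else:
            x = torch.cat((cls_token, dist_token.expand(x.shape[0], -1, -1), x), dim=1)
        # ... remaining steps as in the original customized_forward.py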

Where does the data-distribution information in the image generator come from?

Hello, I have recently been reading your CVPR 2021 paper Data-Free Knowledge Distillation For Image Super-Resolution. In the discussion of the image generator there is the sentence "where x is the training sample and p_x(x) is the distribution of the original dataset." However, I don't seem to find in the paper where this distribution p_x(x) of the original dataset is obtained, computed, or specified, and I also don't seem to find the source code for this paper in your repository; it may simply be that I missed it.

So I'd like to ask: where does the original dataset distribution p_x(x) come from?

DAFL network setup question

Hi, a quick question: my classification network has a softmax added as its last layer, but DAFL training doesn't work well in this setting. Have you encountered a similar problem?

Hyperparameters to reproduce the results in paper

Hi, should we follow the optimizer setting (NAG) in the paper, or use Adam and SGD as in the code? And what about the learning-rate schedule? Using the default code I can reach 84% accuracy on CIFAR-10, which can still be improved.
