
huawei-noah / efficient-computing


Efficient computing methods developed by Huawei Noah's Ark Lab

Python 16.88% Shell 0.02% Jupyter Notebook 82.71% CMake 0.03% C++ 0.36%
knowledge-distillation model-compression binary-neural-networks pruning quantization self-supervised

efficient-computing's Introduction

Efficient Computing

This repo is a collection of Efficient-Computing methods developed by Huawei Noah's Ark Lab.

efficient-computing's People

Contributors

dingning97, gaffey, ggjy, hantingchen, haoqing-wang, iamhankai, liu-zhenhua, lose4578, marsrocky, tuzhijun, xiaoshingshing, xinghaochen, xjwu1024, yehuitang, yuchuantian


efficient-computing's Issues

Teacher model's performance fluctuates on USPS dataset.

I am trying to do an adaptation task with DAFL. I trained LeNet5 on MNIST, but when I applied it to the USPS dataset, something strange happened: the performance of the trained LeNet5 varied widely on USPS when I trained it several times with the same hyperparameters, with accuracy ranging from 40% to 80%. Under these circumstances, it's hard to fix a baseline for adaptation.

It is a little off-topic since DAFL is designed for distillation, but can anyone shed some light on why this happens?

About DAFL

self.conv_blocks1 = nn.Sequential(
    nn.Conv2d(128, 128, 3, stride=1, padding=1),
    nn.BatchNorm2d(128, 0.8),
    nn.LeakyReLU(0.2, inplace=True),
)

Hi, your work is inspiring. I have some questions about a claim in the paper and about the code; they are not big deals.

First, in the code, I found that the eps argument of torch.nn.BatchNorm2d is set to 0.8, which is much larger than the default value of 1e-5. I'd like to know whether this setting is marginal or whether it helps generator training.
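
For reference, here is a minimal check (plain PyTorch, not the repository's code) of how the positional argument in the snippet above maps onto eps:

    import torch.nn as nn

    # nn.BatchNorm2d(num_features, eps=1e-5, momentum=0.1, ...): the second
    # positional argument is eps, so BatchNorm2d(128, 0.8) sets eps to 0.8
    # instead of the default 1e-5.
    bn_as_written = nn.BatchNorm2d(128, 0.8)
    bn_explicit = nn.BatchNorm2d(128, eps=0.8)
    assert bn_as_written.eps == bn_explicit.eps == 0.8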

Second, Section 3.1 of the paper claims that "Since filters in the teacher DNNs have been trained to extract intrinsic patterns in training data, feature maps tend to receive higher activation values if input images are real rather than some random vectors." I am confused about the connection between the L1-norm of $f_T^i$ and the authenticity of the input images. Could you explain why, or point me to some references?

That's all. Thanks again for your excellent work, and I look forward to your reply!

avg_loss is computed incorrectly

L199: avg_loss /= len(data_test) should be avg_loss /= len(data_test_loader).
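
A minimal, self-contained sketch of the reported fix (toy stand-ins for the network and data, not the repository's code), assuming avg_loss accumulates one batch-mean loss value per batch:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Toy test set and model; the criterion returns the batch-mean loss by default.
    data_test = TensorDataset(torch.randn(100, 8), torch.randint(0, 10, (100,)))
    data_test_loader = DataLoader(data_test, batch_size=32)
    net, criterion = nn.Linear(8, 10), nn.CrossEntropyLoss()

    avg_loss = 0.0
    with torch.no_grad():
        for images, labels in data_test_loader:
            avg_loss += criterion(net(images), labels).item()  # one value per batch
    avg_loss /= len(data_test_loader)  # number of batches (4 here), not len(data_test) == 100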
BTW, when will you open source your latest work Learning Student Networks in the Wild?

Hello, I've recently been reading your DAFL code

The introduction describes DAFL as a data-free network compression technique.
But while reading the code, I noticed the following line in DAFL-train.py:

teacher = torch.load(opt.teacher_dir + 'teacher').cuda()  # load the teacher network

The teacher loaded here is the model saved by teacher-train.py. Does this mean the teacher must be trained first, and the student is then trained from that teacher?

My understanding of the paper is: first build a teacher network with many parameters, then define a small student network, and use a generator to turn random noise into image-like inputs; the teacher's and student's outputs on these inputs are matched so that the student agrees with the teacher as closely as possible, and the student is finally saved as the compressed network.

However, I don't quite understand the code below:

        pred = outputs_T.data.max(1)[1]
        loss_activation = -features_T.abs().mean()  # activation loss
        loss_one_hot = criterion(outputs_T, pred)  # one-hot (cross-entropy) loss
        softmax_o_T = torch.nn.functional.softmax(outputs_T, dim=1).mean(dim=0)
        loss_information_entropy = (softmax_o_T * torch.log10(softmax_o_T)).sum()  # information-entropy loss
        loss = loss_one_hot * opt.oh + loss_information_entropy * opt.ie + loss_activation * opt.a  # weighted total loss

Is my understanding correct? I'm new to network compression; if anything is inaccurate, please point it out.
Thank you very much.

Training is very slow?

MNIST, which originally needs 10 epochs, requires 200 epochs under the DAFL framework, and CIFAR-10 requires 2000 epochs. The GAN training is also time-consuming; in my experiments it took 20-30x longer than standard KD.

About the back-propagation of kd_loss

The kd_loss is used for training the student network. However, it is also back-propagated to the encoder during training, which is not the same as the loss function in the paper.
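
A minimal, self-contained sketch (toy stand-in modules and hypothetical names, not the repository's code) of one common way to keep the KD loss from updating the generator: detach the generator outputs and teacher logits on the student path.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    generator = nn.Linear(100, 32 * 32)  # stand-in for the image generator
    teacher = nn.Linear(32 * 32, 10)     # stand-in for the teacher network
    student = nn.Linear(32 * 32, 10)     # stand-in for the student network

    z = torch.randn(16, 100)
    gen_imgs = generator(z)
    outputs_T = teacher(gen_imgs)        # used (undetached) for the generator losses

    # KD loss for the student: detach so its gradient reaches only the student.
    outputs_S = student(gen_imgs.detach())
    loss_kd = nn.KLDivLoss(reduction='batchmean')(
        F.log_softmax(outputs_S, dim=1),
        F.softmax(outputs_T.detach(), dim=1),
    )
    loss_kd.backward()
    assert all(p.grad is None for p in generator.parameters())  # no KD gradient reached the generator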

About DAFL-SR

Hello,
Haha, it's me again. I couldn't find the DAFL-SR code in this repository. Has it not been released yet, or am I just failing to find it?
I would really like to read your code, and I would be very grateful if you release it.
Looking forward to your code.

Question about stable accuracy on CIFAR100

Thank you for providing the code for your amazing work! I have some questions about the accuracy of the experiments on CIFAR-100.

Firstly, I can get similar results on CIFAR-10 with the scripts in the README.md; for example, my teacher accuracy is 0.9518 and my DAFL accuracy is 0.9241.

However, I cannot get the ~74% DAFL accuracy on CIFAR-100 reported in your paper. I follow the same parameters as in the README, that is,
python DAFL-train.py --dataset cifar100 --channels 3 --n_epochs 2000 --batch_size 1024 --lr_G 0.02 --lr_S 0.1 --latent_dim 1000 --oh 0.05 --ie 20 --act 0.01.

I ran the DAFL code twice; the first time I got 53.31% accuracy, the second time 62.39%. Both are far from the 74% in the table. I did use your teacher training script, and the teacher accuracy is 0.771, which is a little lower than yours. I also used a batch size of 1024, as I know a small batch size can make a big difference.

The log for my second training is

[Epoch 1/2000] [loss_oh: 1.565716] [loss_ie: -1.733893] [loss_a: -0.346861] [loss_kd: 0.835829]
[Epoch 2/2000] [loss_oh: 1.836624] [loss_ie: -1.873039] [loss_a: -0.330029] [loss_kd: 0.749510]
...
[Epoch 799/2000] [loss_oh: 1.751102] [loss_ie: -1.998407] [loss_a: -0.336999] [loss_kd: 0.165117]
[Epoch 800/2000] [loss_oh: 1.767649] [loss_ie: -1.998450] [loss_a: -0.337684] [loss_kd: 0.163865]
[Epoch 801/2000] [loss_oh: 1.750000] [loss_ie: -1.998442] [loss_a: -0.337093] [loss_kd: 0.126873]
...
[Epoch 1598/2000] [loss_oh: 1.734455] [loss_ie: -1.997838] [loss_a: -0.337273] [loss_kd: 0.084451]
[Epoch 1599/2000] [loss_oh: 1.737263] [loss_ie: -1.998453] [loss_a: -0.337822] [loss_kd: 0.080239]
[Epoch 1600/2000] [loss_oh: 1.697831] [loss_ie: -1.998273] [loss_a: -0.337703] [loss_kd: 0.081537]
...
[Epoch 1997/2000] [loss_oh: 1.739230] [loss_ie: -1.998205] [loss_a: -0.337063] [loss_kd: 0.053484]
[Epoch 1998/2000] [loss_oh: 1.729897] [loss_ie: -1.998376] [loss_a: -0.337201] [loss_kd: 0.051445]
[Epoch 1999/2000] [loss_oh: 1.730324] [loss_ie: -1.997864] [loss_a: -0.337307] [loss_kd: 0.053925]

Also, this model reached 0.4108 accuracy at around 800 epochs and 0.5790 at around 1600 epochs.

I ran it with PyTorch 1.4 + CUDA 10.1.

It seems that the results are not very stable on CIFAR-100, so I would like to know whether this is normal on CIFAR-100, and whether my 62% accuracy is reasonable.

By the way, could you please share your CIFAR-100 training log and hyperparameters?

I would really appreciate your help! Looking forward to your reply.

When training the student network on CIFAR-100, the accuracy is very low and stays at 0.01

[Epoch 181/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000303]
Test Avg. Loss: 19101.246094, Accuracy: 0.010000
[Epoch 182/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000302]
Test Avg. Loss: 19077.779297, Accuracy: 0.010000
[Epoch 183/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000300]
Test Avg. Loss: 19057.632812, Accuracy: 0.010000
[Epoch 184/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000299]
Test Avg. Loss: 19041.076172, Accuracy: 0.010000
[Epoch 185/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000298]
Test Avg. Loss: 19022.105469, Accuracy: 0.010000
[Epoch 186/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000297]
Test Avg. Loss: 18999.925781, Accuracy: 0.010000
[Epoch 187/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000295]
Test Avg. Loss: 18975.453125, Accuracy: 0.010000
[Epoch 188/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000293]
Test Avg. Loss: 18949.039062, Accuracy: 0.010000
[Epoch 189/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000293]
Test Avg. Loss: 18920.974609, Accuracy: 0.010000
[Epoch 190/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000292]
Test Avg. Loss: 18893.566406, Accuracy: 0.010000
[Epoch 191/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000290]
Test Avg. Loss: 18866.751953, Accuracy: 0.010000
[Epoch 192/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000289]
Test Avg. Loss: 18838.226562, Accuracy: 0.010000
[Epoch 193/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000288]
Test Avg. Loss: 18808.751953, Accuracy: 0.010000
[Epoch 194/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000286]
Test Avg. Loss: 18779.259766, Accuracy: 0.010000
[Epoch 195/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000286]
Test Avg. Loss: 18749.382812, Accuracy: 0.010000
[Epoch 196/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000285]
Test Avg. Loss: 18719.992188, Accuracy: 0.010000
[Epoch 197/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000283]
Test Avg. Loss: 18692.478516, Accuracy: 0.010000
[Epoch 198/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000283]
Test Avg. Loss: 18664.859375, Accuracy: 0.010000
[Epoch 199/200] [loss_oh: 0.000758] [loss_ie: -0.003434] [loss_a: -0.325599] [loss_kd: 0.000281]
Test Avg. Loss: 18637.296875, Accuracy: 0.010000

[DynamicQuant] Questions about pretrained weight + Some inconsistency

Hello,

I found your CVPR paper to be a very interesting study.
I'm currently trying to reproduce your work using your repository, and I would be grateful if you could give me some advice.

I have a question about the first and last layers in ResNet.
It seems like the code uses full-precision (FP) for both layers, but don't they need to be quantized to 8 bits?

Secondly, I was wondering whether FP pretrained weights are used for DQnet training or if DQnet is trained from scratch without using any pretrained weights.

Thank you for your time and assistance!

Does the student network load a pretrained model?

Hello, I recently read several of your knowledge-distillation papers; they are very good work.
I'd like to ask about two of them, CVPR 2021 'Learning Student Networks in the Wild' and ICCV 'Data-Free Learning of Student Networks':
do they use ImageNet-pretrained models when training the student network?
I didn't see this described in the original papers, though I may have missed it.

GPT4image: image descriptions for short text

Thanks for your excellent work. Can you release the short descriptions for all 1,281,167 training images generated with the [MiniGPT-4] 7B model? Could you also release the code used to generate the short descriptions?

Question about implementation of "DQNet"

Hello, I read your paper from CVPR 2022.

I have a question about your implementation.

In resnet_quan_imagenet.py, the one-hot vector that selects the number of bits for each layer seems to take the same value for all layers.

        one_hot = F.gumbel_softmax(feat, tau=1, hard=True, eps=1e-20)
        
        for m in self.layer2:
            x = m(x, one_hot)
        for m in self.layer3:
            x = m(x, one_hot)
        for m in self.layer4:
            x = m(x, one_hot)

In this case, isn't the same bit-width selected for all layers?
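
For illustration only (hypothetical tensors, not the repository's code), the difference between reusing one Gumbel-softmax draw for every layer and drawing a separate one-hot vector per layer:

    import torch
    import torch.nn.functional as F

    feats = torch.randn(3, 3)  # one logit row per layer, 3 bit-width candidates each

    # One draw reused everywhere: all layers receive the same one-hot choice.
    shared_one_hot = F.gumbel_softmax(feats[0], tau=1, hard=True)

    # One draw per layer: each layer can end up with a different bit-width.
    per_layer_one_hot = [F.gumbel_softmax(row, tau=1, hard=True) for row in feats]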

DAFL hyperparameter settings

Hello, I'd like to ask how the coefficients of the individual losses were set in your experiments on the CelebA dataset. Also, are there any tips for setting these hyperparameters? Thanks.

[PTQ4SR] Unfair comparison in paper?

In the CVPR 2023 paper "Toward Accurate Post-Training Quantization for Image Super Resolution",
the comparison in Table 5 (Sec. 4.2) seems unfair.

  • PAMS and FQSR are implemented on the EDSR-baseline model (layers=16, dimension size=64),

  • while PTQ4SR (this work) is implemented on the EDSR model (layers=32, dimension size=256).

Since the accuracy of EDSR and EDSR-baseline differs substantially, isn't it unfair to compare methods with different backbones?

Are there comparisons made using the same EDSR-baseline backbone?

Looking forward to your code release!

Question about the open-source code

Hello, I'd like to ask where the code for the paper "Instance-Aware Dynamic Neural Network Quantization" has been published. I followed the link from Papers with Code but couldn't find it.

Any instructions about the CelebA experiments?

Hi, thanks for your very helpful code. I have two questions. (1) Do you have any README instructions on how to run the CelebA experiments from the paper? (2) Could you release the trained teacher and student models? I trained the CelebA teacher model myself but found the test accuracy surprisingly high, up to ~90%, using the AlexNet-based architecture. I don't know whether I did it right or whether something is actually wrong... Thank you very much!

Code release

Hi

The code seems to be missing.
Can you please release the code?

Thanks

Cannot reproduce the results on CelebA + AlexNet

Hi,

Great thanks for this inspiring work! I have some trouble reproducing the results on CelebA with AlexNet. Previously, I reached out to the lead author and got this setting for the CelebA experiment: batch_size 512, lr_G 0.002, lr_S 0.0002, oh 2, ie 5, a 0.1, latent_dim 100. The generator is said to be the same as the one already on GitHub, with opt.img_size = 224 and opt.channels = 3. Because I don't have enough GPUs to support a batch size of 512, I can only do 384 (which I thought should be fairly okay, since the problem is essentially a 2-class classification problem and 384 is large enough to approximate the probability). However, when I run the CelebA experiment, the student network does not converge at all. Part of the log looks like this:

[loss_oh: 0.015645] [loss_ie: -0.034770] [loss_a: -0.122255] [loss_kd: 0.599464] -- Epoch 0 Step 0
[loss_oh: 0.229535] [loss_ie: -0.220245] [loss_a: -0.038812] [loss_kd: 0.044472] -- Epoch 0 Step 10
[loss_oh: 0.257547] [loss_ie: -0.232547] [loss_a: -0.034920] [loss_kd: 0.003552] -- Epoch 0 Step 20
[loss_oh: 0.264858] [loss_ie: -0.235487] [loss_a: -0.034071] [loss_kd: 0.003799] -- Epoch 0 Step 30
[loss_oh: 0.266129] [loss_ie: -0.236011] [loss_a: -0.033889] [loss_kd: 0.002580] -- Epoch 0 Step 40
[loss_oh: 0.263640] [loss_ie: -0.235056] [loss_a: -0.034160] [loss_kd: 0.001985] -- Epoch 0 Step 50
[loss_oh: 0.265512] [loss_ie: -0.235800] [loss_a: -0.033916] [loss_kd: 0.001788] -- Epoch 0 Step 60
[loss_oh: 0.260362] [loss_ie: -0.233735] [loss_a: -0.034578] [loss_kd: 0.001534] -- Epoch 0 Step 70
[loss_oh: 0.259178] [loss_ie: -0.233260] [loss_a: -0.034730] [loss_kd: 0.001352] -- Epoch 0 Step 80
[loss_oh: 0.259632] [loss_ie: -0.233448] [loss_a: -0.034670] [loss_kd: 0.001075] -- Epoch 0 Step 90
[loss_oh: 0.265067] [loss_ie: -0.235633] [loss_a: -0.033966] [loss_kd: 0.001200] -- Epoch 0 Step 100
[loss_oh: 0.263322] [loss_ie: -0.234941] [loss_a: -0.034185] [loss_kd: 0.001177] -- Epoch 0 Step 110
Acc1 0.5042 Epoch 0 (after update) (Best_Acc1 0.5042 @ Epoch 0) lr 0.0002
[loss_oh: 0.261791] [loss_ie: -0.234332] [loss_a: -0.034381] [loss_kd: 0.001027] -- Epoch 1 Step 0
[loss_oh: 0.262940] [loss_ie: -0.234793] [loss_a: -0.034234] [loss_kd: 0.001158] -- Epoch 1 Step 10
[loss_oh: 0.260839] [loss_ie: -0.233952] [loss_a: -0.034503] [loss_kd: 0.001056] -- Epoch 1 Step 20
[loss_oh: 0.263265] [loss_ie: -0.234922] [loss_a: -0.034204] [loss_kd: 0.000924] -- Epoch 1 Step 30
[loss_oh: 0.261663] [loss_ie: -0.234280] [loss_a: -0.034408] [loss_kd: 0.000847] -- Epoch 1 Step 40
[loss_oh: 0.262933] [loss_ie: -0.234793] [loss_a: -0.034244] [loss_kd: 0.000844] -- Epoch 1 Step 50
[loss_oh: 0.261370] [loss_ie: -0.234169] [loss_a: -0.034442] [loss_kd: 0.000802] -- Epoch 1 Step 60
[loss_oh: 0.261339] [loss_ie: -0.234159] [loss_a: -0.034447] [loss_kd: 0.000674] -- Epoch 1 Step 70
[loss_oh: 0.265101] [loss_ie: -0.235667] [loss_a: -0.033964] [loss_kd: 0.000684] -- Epoch 1 Step 80
[loss_oh: 0.262763] [loss_ie: -0.234734] [loss_a: -0.034266] [loss_kd: 0.000634] -- Epoch 1 Step 90
[loss_oh: 0.262526] [loss_ie: -0.234639] [loss_a: -0.034297] [loss_kd: 0.000670] -- Epoch 1 Step 100
[loss_oh: 0.260210] [loss_ie: -0.233700] [loss_a: -0.034608] [loss_kd: 0.000737] -- Epoch 1 Step 110
Acc1 0.5042 Epoch 1 (after update) (Best_Acc1 0.5042 @ Epoch 0) lr 0.0002
[loss_oh: 0.263285] [loss_ie: -0.234939] [loss_a: -0.034213] [loss_kd: 0.000686] -- Epoch 2 Step 0
[loss_oh: 0.262520] [loss_ie: -0.234635] [loss_a: -0.034308] [loss_kd: 0.000636] -- Epoch 2 Step 10
[loss_oh: 0.262507] [loss_ie: -0.234629] [loss_a: -0.034317] [loss_kd: 0.000611] -- Epoch 2 Step 20
[loss_oh: 0.263110] [loss_ie: -0.234877] [loss_a: -0.034235] [loss_kd: 0.000559] -- Epoch 2 Step 30
[loss_oh: 0.260770] [loss_ie: -0.233930] [loss_a: -0.034542] [loss_kd: 0.000588] -- Epoch 2 Step 40
[loss_oh: 0.262082] [loss_ie: -0.234462] [loss_a: -0.034376] [loss_kd: 0.000677] -- Epoch 2 Step 50
[loss_oh: 0.261914] [loss_ie: -0.234396] [loss_a: -0.034401] [loss_kd: 0.000535] -- Epoch 2 Step 60
[loss_oh: 0.261639] [loss_ie: -0.234283] [loss_a: -0.034436] [loss_kd: 0.000573] -- Epoch 2 Step 70
[loss_oh: 0.260946] [loss_ie: -0.234001] [loss_a: -0.034535] [loss_kd: 0.000598] -- Epoch 2 Step 80
[loss_oh: 0.262788] [loss_ie: -0.234748] [loss_a: -0.034292] [loss_kd: 0.000538] -- Epoch 2 Step 90
[loss_oh: 0.261890] [loss_ie: -0.234385] [loss_a: -0.034417] [loss_kd: 0.000526] -- Epoch 2 Step 100
[loss_oh: 0.263590] [loss_ie: -0.235067] [loss_a: -0.034201] [loss_kd: 0.000606] -- Epoch 2 Step 110
Acc1 0.5042 Epoch 2 (after update) (Best_Acc1 0.5042 @ Epoch 0) lr 0.0002

So basically the accuracy is at chance level, and there is no sign that it's going to get better. I don't know what I have missed.

My environment: Ubuntu 16.04, PyTorch 1.3.
I trained my own CelebA teacher with 81.88% accuracy (comparable to the 81.59% reported in the paper, so the data should be set up correctly).

Could you tell me where this goes wrong, or post an example in the README (e.g., how to set the hyperparameters) for the CelebA experiment so that we can easily reproduce the results? It would be even better if you could share the trained teacher model. Thanks!

Best,

Missing Code in GitHub Repository

I'm unable to find the code for the paper "Towards Accurate Post-Training Quantization for Image Super Resolution". Can you please release the code soon?

Can you provide a trained model for testing?

Your paper is very instructive. Can you provide a trained model so we can test it? Thank you!

It would be better if there were more detailed steps to reproduce the results. Thank you.

In the Win10 environment, what code should be added to "DAFL-train.py" to solve the following problem?

The "teacher" model has been successfully trained, but errors occurred while running "DAFL-train.py",anybody know what's going on? Thank you!


"D:\Program Files\Python368\python.exe" B:/PyTorch/DAFL/DAFL-train.py
[Epoch 0/200] [loss_oh: 0.306945] [loss_ie: -0.662719] [loss_a: -1.541127] [loss_kd: 1.733359]
[Epoch 0/200] [loss_oh: 0.391318] [loss_ie: -0.769967] [loss_a: -1.384207] [loss_kd: 1.527557]

Traceback (most recent call last):
File "", line 1, in
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="mp_main")
File "D:\Program Files\Python368\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "D:\Program Files\Python368\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "D:\Program Files\Python368\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
### File "B:\PyTorch\DAFL\DAFL-train.py", line 190, in
for i, (images, labels) in enumerate(data_test_loader):

File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 193, in iter
return _DataLoaderIter(self)
File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 469, in init
w.start()
File "D:\Program Files\Python368\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "D:\Program Files\Python368\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\Program Files\Python368\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\Program Files\Python368\lib\multiprocessing\popen_spawn_win32.py", line 33, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "D:\Program Files\Python368\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 511, in _try_get_batch
data = self.data_queue.get(timeout=timeout)
File "D:\Program Files\Python368\lib\multiprocessing\queues.py", line 105, in get
raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
### File "B:/PyTorch/DAFL/DAFL-train.py", line 190, in
for i, (images, labels) in enumerate(data_test_loader):

File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 576, in next
idx, batch = self._get_batch()
File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 553, in _get_batch
success, data = self._try_get_batch()
File "D:\Program Files\Python368\lib\site-packages\torch\utils\data\dataloader.py", line 519, in _try_get_batch
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 16596) exited unexpectedly

Process finished with exit code 1
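
A common fix for this error on Windows (a sketch under the assumption that the traceback above is the standard multiprocessing "spawn" issue, not the repository's code): put the code that builds and iterates the DataLoader behind a __main__ guard, or set num_workers=0 to avoid worker processes entirely.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def main():
        # Toy stand-in for the test set used in DAFL-train.py.
        data_test = TensorDataset(torch.randn(256, 1, 32, 32),
                                  torch.zeros(256, dtype=torch.long))
        data_test_loader = DataLoader(data_test, batch_size=64, num_workers=1)
        for i, (images, labels) in enumerate(data_test_loader):
            pass  # evaluation / training loop goes here

    if __name__ == '__main__':
        # On Windows, worker processes re-import this script, so the loop above
        # must not run at import time.
        main()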

About the teacher network in DASR

Hello, shouldn't the teacher network in DASR be an already-trained model? Why does the code train the teacher network again? Also, how are the teacher's train.h5 file and the test set generated? Looking forward to your reply, thanks.

loss_kd reduces to zero

Hi, thanks for releasing the code! I simply ran the training code on MNIST.
When training the student network, loss_kd drops to zero after a certain number of training steps, and the test accuracy drops by ~1% at the same time. I checked the generated images, and it seems the generator has fallen into a bad local minimum: all the generated images are the same.
In this case, the generator is useless for learning the student network. Is this a common phenomenon or a bug?

Entropy in DAFL

Hello. Thank you for your great work.

When calculating the information entropy loss in DAFL_train.py, isn't a minus sign needed?

Entropy = -Σ p · ln(p)
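
For reference, a small self-contained check of the sign convention (plain PyTorch, not the repository's code):

    import torch

    # Information entropy is H(p) = -sum(p * ln p) >= 0, so the expression
    # sum(p * ln p) written without the minus equals -H(p). Whether the explicit
    # minus is needed depends on whether the term is minimized or maximized.
    p = torch.softmax(torch.randn(10), dim=0)
    entropy = -(p * torch.log(p)).sum()      # H(p), non-negative
    neg_entropy = (p * torch.log(p)).sum()   # -H(p), as written in the question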

Implementation code

Hello, I read your CVPR 2022 paper with great interest:
Instance-Aware Dynamic Neural Network Quantization

I am trying to re-implement your framework and to find your open-source code repository.

In the paper, the code repositories are given as below:
https://github.com/huawei-noah/Efficient-Computing : I cannot find the code for this paper here; only other papers' implementations.
https://gitee.com/mindspore/models/tree/master/research/cv/DynamicQuant : the URL does not work.

Is the code not ready yet? Could you point me to the code repository for the paper?

That would be very helpful.

Thanks.

Line 69 of birealnet.py

Hello! Should self.alpha_a*(x + self.beta_a) be self.alpha_a*x + self.beta_a instead?

[DynamicQuant] Isn't the model size about 3 times larger than the baseline?

Hello,
Thanks for sharing your code.
Your paper "Instance-Aware Dynamic Neural Network Quantization" is interesting!

I have one question on the model size of your work.

It seems that a layer of the DynamicQuant network has to hold 3 sets of convolution weights with different bit-widths in order to dynamically select one for each input (when there are 3 bit-width candidates).
This seems to enlarge the model size to nearly 3 times that of uniform-bit quantization, even though the Bit-FLOPs stay the same.
I wonder if this impacts the inference time.

If I am wrong, please correct me!

How to handle NAS_Bench_201 arch as tensors

Hi, many thanks for your wonderful work.

I have noticed that the architecture encoding method in ReNAS encodes an adjacency matrix, together with the operations and per-vertex FLOPs and params, into a 19×7×7 tensor. This works for NAS-Bench-101 but not for NAS-Bench-201, since the latter treats edges as operations and nodes as sums of feature maps, which is the opposite of NAS-Bench-101, where nodes are the operations.
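
For what it's worth, a hypothetical illustration (my own assumption of the idea, not ReNAS's exact encoding) of packing an adjacency matrix and per-vertex statistics into a multi-channel tensor:

    import torch

    adj = (torch.rand(7, 7) > 0.5).float()  # adjacency matrix of a cell
    op_id = torch.rand(7)                   # per-vertex operation encoding
    flops = torch.rand(7)                   # per-vertex FLOPs
    params = torch.rand(7)                  # per-vertex parameter counts

    # Broadcast each vertex feature onto the adjacency structure to get one
    # 7x7 channel per statistic; ReNAS stacks 19 such channels in total.
    channels = [adj] + [adj * feat.unsqueeze(0) for feat in (op_id, flops, params)]
    encoding = torch.stack(channels)        # shape (4, 7, 7) in this toy example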

So the question is how to encode NAS-Bench-201 architectures as tensors, since the paper also reports experiments on the 201 dataset.

Many thanks again for your cool work!

The code for PTQ4SR

Hi, I'm very interested in your paper. Can you please release the code soon?
Thank you.

{AttributeError}'VisionTransformer' object has no attribute 'dist_token'

@huawei-noah-admin @ggjy Thanks for your project.
When I train on ImageNet-1K:

To train a DeiT-Tiny student with a Cait-S24 teacher, run:
python -m torch.distributed.launch --nproc_per_node=8 main.py --distributed --model deit_tiny_patch16_224 --teacher-model cait_s24_224 --distillation-type soft --distillation-alpha 1 --distillation-beta 1 --w-sample 0.1 --w-patch 4 --w-rand 0.2 --K 192 --s-id 0 1 2 3 8 9 10 11 --t-id 0 1 2 3 20 21 22 23 --drop-path 0

customized_forward.py

def vit_forward_features(self, x, require_feat: bool = False):
    x = self.patch_embed(x)
    cls_token = self.cls_token.expand(x.shape[0], -1, -1)
    if self.dist_token is None:
        x = torch.cat((cls_token, x), dim=1)

{AttributeError}'VisionTransformer' object has no attribute 'dist_token'
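
A possible workaround, assuming the error comes from a timm version whose VisionTransformer no longer defines dist_token (a sketch, not the repository's fix): guard the attribute access so the customized forward also works when the distillation token is absent.

    import torch

    def vit_forward_features(self, x, require_feat: bool = False):
        x = self.patch_embed(x)
        cls_token = self.cls_token.expand(x.shape[0], -1, -1)
        dist_token = getattr(self, 'dist_token', None)  # None if the model has no distillation token
        if dist_token is None:
            x = torch.cat((cls_token, x), dim=1)
        else:
            x = torch.cat((cls_token, dist_token.expand(x.shape[0], -1, -1), x), dim=1)
        # ... remaining steps as in the original customized_forward.py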

Where does the data-distribution information in the image generator come from?

Hello, I have recently been reading your CVPR 2021 paper Data-Free Knowledge Distillation For Image Super-Resolution. In the discussion of the image generator there is the sentence "where x is the training sample and p_x(x) is the distribution of the original dataset." However, I don't seem to find in the paper where this distribution p_x(x) of the original dataset is obtained, computed, or specified, and I also don't seem to find the source code for this paper in your repository; it may simply be that I missed it.

So I'd like to ask: where does the original dataset distribution p_x(x) come from?

DAFL network setup question

Hi, a quick question: my classification network has a softmax added as its last layer, but DAFL training doesn't work well in this setting. Have you encountered a similar problem?

Hyperparameters to reproduce the results in paper

Hi, should we follow the optimizer setting (NAG) in the paper, or use Adam and SGD as in the code? And what about the learning-rate schedule? Using the default code I can reach 84% accuracy on CIFAR-10, which can still be improved.
