eric-mingjie / rethinking-network-pruning
Rethinking the Value of Network Pruning (Pytorch) (ICLR 2019)
License: MIT License
I am trying to prune a simple LeNet5 model with L1-norm pruning on the CIFAR10 dataset. The model has 6 kernels in the first convolutional layer and 16 in the second. In the original model, the output of the last convolutional layer is 16x6x6 and the first dense layer has 120 nodes, which gives a weight matrix of [576, 120]. After pruning (5 kernels pruned from the first layer and 8 from the second), the output of the last convolutional layer is 8x6x6, which gives a matrix of [288, 120]. But training fails with a dimension mismatch error. The problem is in copying the weights from the original model to the pruned model in the dense layer. Here is the code where the weights are being copied.
size mismatch, m1: [2 x 288], m2: [8 x 120] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:290
Any suggestions?
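One possible way to think about the dense-layer copy is sketched below. It assumes the conv output is flattened channel-major (as `x.view(x.size(0), -1)` does on an N x C x H x W tensor), so each kept channel owns a contiguous run of H*W input features of the first dense layer. The helper name `fc_input_indices` and the list of kept channels are hypothetical, and this is not the repository's code.

```python
def fc_input_indices(kept_channels, height, width):
    """Return the flattened feature indices that survive channel pruning.

    Assumes a channel-major flatten: feature index = c * H * W + h * W + w.
    """
    per_channel = height * width
    indices = []
    for c in sorted(kept_channels):
        start = c * per_channel
        indices.extend(range(start, start + per_channel))
    return indices


# Hypothetical example: 8 of the 16 channels are kept, each map is 6x6.
kept = [0, 2, 3, 5, 8, 9, 12, 15]
idx = fc_input_indices(kept, 6, 6)
print(len(idx))  # 288 input features remain out of 576
```

With indices like these, the input dimension of the dense weight can be sliced by flattened feature rather than by channel (in PyTorch, `nn.Linear.weight` has shape [out_features, in_features]). The `m2: [8 x 120]` in the error message is consistent with the weight having been indexed by the 8 channel indices directly, although that cannot be confirmed without the copying code.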
Hi, Thank you for sharing a good experiment.
I have a question about the loss function of network slimming.
The paper shows the training objective as shown below.
But the code only uses the cross-entropy function when training after pruning. Is this the right implementation?
Please explain if I misunderstood.
Why does the ThiNet model in your repo not apply the algorithm from the original paper? Or maybe I just didn't find it?
Dear author,
I just found an issue with the current algorithm: the pruning threshold is selected from the batch norm scaling factors of all layers together. Hence, it is possible that all the scaling factors of a certain layer fall below the threshold, so that all channels in that layer are masked. In such cases, the mask implementation blocks the data flow inside the neural network.
I encountered this problem when setting the pruning percentage to 0.5, as shown in the readme file, and I got almost 0% accuracy after the first round of pruning.
Could you please advise whether this is the correct method? Should I use fine-tuning to recover the accuracy, or should I decrease the pruning ratio first and prune progressively?
As the ratio of 0.5 is suggested in the code, may I check whether you also encountered a similar 0% accuracy after the first round of pruning at 0.5?
Thank you so much for your reply and advice.
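The failure mode described above, and one possible guard against it, can be sketched in plain Python. This is an illustration under assumptions, not the repository's method: the helper names, the `min_keep` floor, and the toy scaling factors are all hypothetical. The threshold is taken over all layers at once, and any layer that would lose every channel keeps its largest factors instead.

```python
def global_threshold(layers, percent):
    """Pick the pruning threshold over the scaling factors of all layers."""
    all_gammas = sorted(abs(g) for layer in layers for g in layer)
    cut = int(len(all_gammas) * percent)
    return all_gammas[min(cut, len(all_gammas) - 1)]


def masks_with_floor(layers, percent, min_keep=1):
    """Build per-layer keep-masks, never keeping fewer than `min_keep` channels."""
    thre = global_threshold(layers, percent)
    masks = []
    for layer in layers:
        mask = [abs(g) > thre for g in layer]
        if sum(mask) < min_keep:
            # Fall back to the largest factors in this layer.
            top = sorted(range(len(layer)), key=lambda i: abs(layer[i]), reverse=True)
            mask = [False] * len(layer)
            for i in top[:min_keep]:
                mask[i] = True
        masks.append(mask)
    return masks


# Toy scaling factors: every factor in the second layer is small.
layers = [[0.9, 0.8, 0.7, 0.6], [0.01, 0.02, 0.03, 0.04]]
masks = masks_with_floor(layers, percent=0.5)
print([sum(m) for m in masks])  # [3, 1]
```

Without the floor, the second toy layer would keep zero channels. With it, the overall pruned fraction ends up slightly below the requested percentage, which is the trade-off of this kind of guard.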
Hi, thanks for the great work! I think it is a really valuable observation that looking for the optimal structure should be the real value of channel pruning.
But I have a question: when you compare the performance of Fine-tuned, Scratch-E and Scratch-B, it seems fine-tuning only takes a small number of epochs. For example, on CIFAR-10 the fine-tuning only takes 40 epochs while Scratch-E takes 160 epochs. Could this be the reason why the Scratch models outperform the fine-tuned ones?
From my experiments, I find that fine-tuning for 40 epochs is really not enough, especially for the higher pruning ratios.
I think it may not be a fair comparison when the two models are trained for such different numbers of epochs.
I tried to compute the FLOPs of the torchvision resnet models (imagenet/network-slimming/compute_flops.py). However, the result is 24.64G FLOPs, which conflicts with https://github.com/albanie/convnet-burden#image-classification-architectures (4G FLOPs).
Code to reproduce:
import torchvision
from compute_flops import count_model_param_flops
from models.resnet import resnet50
from vgg import slimmingvgg as vgg11


def main():
    model_torchvision = torchvision.models.resnet50()
    flops_torchvision = count_model_param_flops(model_torchvision, 224)
    print(flops_torchvision)


if __name__ == '__main__':
    main()
Output:
+ Number of FLOPs: 24.64G
tensor(2.4636e+10)
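One way to sanity-check a counter like this is to work out a single layer by hand. The sketch below is only such a hand check, with a hypothetical helper `conv2d_macs`; it counts multiply-accumulates for one convolution and ignores bias. The layer parameters are those of the first convolution in a standard ResNet-50.

```python
def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w, groups=1):
    """Multiply-accumulate operations for one conv layer (bias ignored)."""
    return kernel * kernel * (in_ch // groups) * out_ch * out_h * out_w


# First layer of a standard ResNet-50 on a 224x224 input:
# 7x7 conv, 3 -> 64 channels, stride 2, padding 3 -> 112x112 output.
macs = conv2d_macs(in_ch=3, out_ch=64, kernel=7, out_h=112, out_w=112)
print(macs)      # 118013952 multiply-accumulates
print(2 * macs)  # 236027904 if each multiply and each add is counted separately
```

Counters differ in whether they report multiply-accumulates or count the multiply and the add separately, so a factor of two between tools is common. That convention alone would not account for the gap between 24.64G and 4G reported above, so comparing per-layer numbers against a hand count like this may help locate where the extra operations come from.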
In Network Slimming, the repo uses BN_grad_zero to apply the mask to the network. I think this should only be used in the fine-tuning phase. Why is it used in the training phase?
The number of output channels of conv1 in preresnet is fixed (16). However, when using a custom config for preresnet, the in_channels of the first layer in layer1 may not be 16 (it depends on cfg). This causes an error.
rethinking-network-pruning/cifar/network-slimming/models/preresnet.py
Lines 70 to 78 in 74166a7
Dear author,
I am trying to prune a resnet-56 on cifar10 using network slimming.
python resprune.py --dataset cifar10 --depth 56 --percent 0.8 --model ~/results_def/resnet56/baseline/model_best.pth.tar --save ~/results_def/resnet56/pruned80/
Does this mean there is no path between input and output? Shouldn't it still work given every layer would be an identity mapping?
What should I do if I want to reproduce the results for aggressive pruning?
We trained the network slimming model with the command at https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/imagenet/network-slimming/README.md#train-with-sparsity and pruned it by 50%. However, we could not obtain the same pruning result as the models you provided.
More specifically, in our result, classifier.1.weight was pruned to 0 channels, and classifier.4.weight keeps almost all of its original channels.
Pruning result:
layer index: 4 total channel: 64 remaining channel: 26
layer index: 8 total channel: 128 remaining channel: 86
layer index: 12 total channel: 256 remaining channel: 111
layer index: 15 total channel: 256 remaining channel: 182
layer index: 19 total channel: 512 remaining channel: 171
layer index: 22 total channel: 512 remaining channel: 176
layer index: 26 total channel: 512 remaining channel: 295
layer index: 29 total channel: 512 remaining channel: 328
layer index: 34 total channel: 4096 remaining channel: 0
layer index: 37 total channel: 4096 remaining channel: 4096
I have used vgg11_bn on CIFAR-10, but the result was totally different from the paper. In the paper, the network architecture obtained by pruning 60% of the channels of VGG-16 (13 conv layers in total) using Network Slimming works quite well, and conv layers 9, 10, 11, 12 and 13 are pruned a lot. But in my experiment I find that the later layers have a higher average gamma. So after pruning, almost all of the early layers have been pruned, and the accuracy is much lower. Have you encountered this situation?
Hello,
I want to know the architecture of the VGG-16 used on the CIFAR-10 dataset. Does it contain 13 convolution layers followed by an average pooling layer and one fully connected layer?
regards
Dear author:
If I want to prune a VGG-16 model on ImageNet using the L1-norm based channel pruning method, should I prune the shallow or the deep convolutional layers first?
I read some related papers, but they don't seem to mention this.
Or is it simply based on the sensitivity of each layer and on experience?
Best regards.
@liuzhuang13 @Eric-mingjie @quelleG Thanks for sharing this wonderful work. I just have a few queries.
I notice that PyTorch applies weight decay to all trainable parameters, including BatchNorm. In training Network Slimming on ImageNet, the weight decay is 1e-4, which is 10x larger than the sparsity 1e-5. Does this affect the effectiveness of the sparsity loss? Could I set the weight decay to 0 for BN layers? Are there any experimental results with 0 weight decay on BN layers?
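As a rough way to compare the two terms, the worked inequality below assumes plain SGD, where a weight decay of $\lambda_{wd}$ contributes $\lambda_{wd}\gamma$ to the gradient of a scaling factor $\gamma$, and a sparsity term of strength $s$ contributes $s\,\operatorname{sign}(\gamma)$. The symbols $\lambda_{wd}$ and $s$ are introduced here for illustration only.

```latex
\frac{\partial}{\partial \gamma}\left(\frac{\lambda_{wd}}{2}\gamma^{2} + s\,|\gamma|\right)
  = \lambda_{wd}\,\gamma + s\,\operatorname{sign}(\gamma), \qquad \gamma \neq 0

\lambda_{wd}\,|\gamma| < s
  \iff |\gamma| < \frac{s}{\lambda_{wd}} = \frac{10^{-5}}{10^{-4}} = 0.1
```

Under these assumptions, the sparsity term is the larger of the two pulls only for scaling factors with magnitude below 0.1, and weight decay is larger above that. Both terms push $\gamma$ toward zero, so this comparison says nothing about whether removing weight decay from BN layers would help; that remains an empirical question.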
Hi, thanks for the great work. I have a question about the experiments in predefined structured pruning methods. I am not sure I am understanding the paper correctly.
For predefined structured pruning methods, given the pruning ratio (e.g. 50%), the only difference between methods is how they find the "least important" channels to prune. But after pruning, they all result in the same pruned structure. According to the paper, all these pruned models should have the same performance, even when trained from scratch. My question is: if this is true, does it mean that predefined structured pruning is meaningless, since all the methods lead to the same pruned model with the same performance? One could just construct a ResNet_0.5x and train it from scratch, and it would have the same performance as the predefined structured pruning methods. I am looking forward to your reply.
The accuracy of pruned VGG19 with sparse rate of 0.5 (before finetune) becomes 10.23, and rises to 72.30 after finetune. This is natural.
However, ResNet164 with a sparse rate of 0.5 has exactly the same accuracy as the original model (75.55), which I think is weird, and the accuracy drops after fine-tuning (75.41). Is this right? I checked whether the model size actually decreased, and I found no problem.
Is this result natural?
If I want to prune a VGG model using the L1-norm pruning method on the CIFAR dataset, do I have to run the following in order?
1- main.py
2- Vggprune.py
3- main_finetune.py
I ask because when I start with Vggprune.py, I obtain a test accuracy of 10% for the model and the same test accuracy for the new (pruned) model.
Also, I don't understand this line:
out_channels = m.weight.data.shape[0]
And why the choice of start_mask = torch.ones(3)? Is it because the in_channels are 3?
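A small sketch may help with both questions. In PyTorch, a `Conv2d` weight has shape [out_channels, in_channels / groups, kH, kW], so `m.weight.data.shape[0]` is the number of filters in that layer. Reading `start_mask = torch.ones(3)` as "keep all three input channels" is an assumption here, based on CIFAR images being RGB. The helper `pruned_conv_shapes` and the masks below are hypothetical and use plain lists instead of tensors.

```python
def pruned_conv_shapes(in_channels, keep_masks, kernel=3):
    """Weight shape of each pruned conv, given one keep-mask per conv layer."""
    start_mask = [1] * in_channels  # nothing is pruned from the input image
    shapes = []
    for end_mask in keep_masks:
        # shape[0] of a conv weight is its number of output channels (filters)
        shapes.append((sum(end_mask), sum(start_mask), kernel, kernel))
        start_mask = end_mask  # this layer's outputs feed the next layer
    return shapes


# Hypothetical masks for two conv layers with 4 and 8 filters.
masks = [[1, 0, 0, 1], [0, 1, 1, 0, 0, 1, 1, 1]]
print(pruned_conv_shapes(3, masks))  # [(2, 3, 3, 3), (5, 2, 3, 3)]
```

The point of the chaining is that the mask applied to one layer's output channels becomes the mask on the next layer's input channels, and the very first input mask has nothing to inherit from, so it is all ones.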
When calculating the threshold, the weights of all BN layers are sorted together. Is this reasonable?
Could the following phenomena occur?
① The values at the front of the network are closer to image pixel values, while the last layers are closer to class probabilities, so the BN weights do not necessarily follow the same distribution.
② There are shortcuts in the middle of the network. After the outputs of two convolutions are added, the values become larger, which may affect the BN weights.
Looking forward to your reply. Thank you very much.
After network slimming, is the size of the model the same as that of the original model?
In the original paper, the authors have applied L1 norm on the scaling factors of the batchnorm. However, in your code, you have obtained a threshold to prune out the nodes with batchNorm scaling factors less than that threshold (thre).
It seems like you have not applied the L1 norm in your code.
Please let me know if I am missing anything.
Hi, I want to prune my model for a tracking task. I'm using resnet22 with my own tracking datasets and training it with my own program. My problem is: when should I train it? If I want to achieve a 60% pruning rate, should I first prune the randomly initialized model to 60% and then train it, or should I prune gradually, alternating fine-tuning and pruning?
I look forward to your help.
Hi, I have one more question.
When vgg19 is pruned (70%) following the guide, the remaining channels are as shown below.
layer index: 3 total channel: 64 remaining channel: 45
layer index: 6 total channel: 64 remaining channel: 64
layer index: 10 total channel: 128 remaining channel: 128
layer index: 13 total channel: 128 remaining channel: 128
layer index: 17 total channel: 256 remaining channel: 256
layer index: 20 total channel: 256 remaining channel: 256
layer index: 23 total channel: 256 remaining channel: 249
layer index: 26 total channel: 256 remaining channel: 184
layer index: 30 total channel: 512 remaining channel: 36
layer index: 33 total channel: 512 remaining channel: 6
layer index: 36 total channel: 512 remaining channel: 2
layer index: 39 total channel: 512 remaining channel: 0
layer index: 43 total channel: 512 remaining channel: 0
layer index: 46 total channel: 512 remaining channel: 0
layer index: 49 total channel: 512 remaining channel: 5
layer index: 52 total channel: 512 remaining channel: 292
Since the remaining channels of index 39,43 and 46 are zero, an error "IndexError: index 0 is out of bounds for dimension 0 with size 0" occurs https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/cifar/network-slimming/vggprune.py#L125
Zero remaining channels means that the layer is not trained. This is a big problem.
Is there anything else I need to do to run 70% pruning as in the paper?
Or do I have to apply the mask implementation you mentioned in #44 (comment) to do 70% pruning?
I also tried this method (the mask implementation), but there were still layers with zero remaining channels, because the pruning method, which eliminates channels below a threshold, was the same. This was not an appropriate solution.
Is there a special way to prevent the zero-remaining channels?
Two errors that I have found:
--arch in prune.py, but it is used in README
--prune in prune.py because it is used in the following code, see https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/imagenet/network-slimming/prune.py#L81
Did you test the code before the release?
Hi, I found the default number of epochs in network-slimming(scratch training VGG-11) for Imagenet (which is 90 in the code) is different from the original paper, which is 60.
Hi,
I notice that this cfg used to prune vgg16 model has a slightly different configuration than the original vgg16 cfg.
vgg16_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512]
Why did you choose to perform pruning only on the first and the last six conv layers? Is there any reason for this?
Does anyone know what model.train() on line 128 and model.eval() on line 154 of network-slimming/main.py are meant for? I did not find the definitions of these two functions. Can I just delete them? Thanks.
Hello. I got a question while reproducing your interesting experiment.
In section 2 of https://arxiv.org/pdf/1708.06519.pdf, "Scaling Factors and Sparsity-induced Penalty" shows the equation below.
Question:
g(γ) means the L1 norm, but https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/imagenet/network-slimming/main_finetune.py#L187 applies torch.sign, as in "m.weight.grad.data.add_(sparsity * torch.sign(m.weight.data))", not the L1 norm.
So wouldn't it be right to use "m.weight.grad.data.add_(sparsity * m.weight.data.abs())" for updateBN?
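For reference, the training objective from Section 2 of the Network Slimming paper and the (sub)gradient of its penalty term can be written as follows. The notation follows the paper; treating the code's `sparsity` value as $\lambda$ is an assumption based on the quoted line.

```latex
L = \sum_{(x, y)} l\big(f(x, W), y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma),
\qquad g(\gamma) = |\gamma|

\frac{\partial}{\partial \gamma}\, \lambda\,|\gamma| = \lambda\, \operatorname{sign}(\gamma)
\qquad (\gamma \neq 0)
```

Since the quoted line modifies the gradient rather than the loss, adding `sparsity * torch.sign(m.weight.data)` to the gradient corresponds to having the L1 penalty in the objective. Adding `sparsity * m.weight.data.abs()` to the gradient would instead correspond to a penalty whose derivative is $|\gamma|$, namely $\tfrac{1}{2}\gamma|\gamma|$, which always pushes $\gamma$ in the negative direction rather than toward zero.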
How to do the pruning step for batchNorm1D layers in an ANN, where you would be using the weights directly rather than using a mask?
If possible, a sample code on the nn.Linear layer and the BatchNorm1d layer would be really helpful!
When I use the same code for batchNorm1D, I get :
Traceback (most recent call last):
File "MLPprune.py", line 156, in
end_mask = cfg_mask[layer_id_in_cfg]
IndexError: list index out of range
First of all, I would like to thank the authors for sharing their excellent research work. However, part of the code confuses me. My main concern is the training/validation/test split. I'm wondering whether it is a common approach to use the test set to find the best model (e.g. in the CIFAR10 experiments).
In other words, is it necessary to explicitly use a validation set to choose the best model?
Also, is this something conventional and widely adopted in other studies as shown in the reimplemented code?
A quick question. In LTH experiments in https://github.com/Eric-mingjie/rethinking-network-pruning/blob/master/cifar/lottery-ticket/weight-level/lottery_ticket.py#L293 gradients are zeroed for the weights that are masked out. But gradients of these zeroed weights take part in the backward pass. In other words the backward pass taken seems not equivalent to a backward pass of the corresponding thin network initialized using the lottery ticket. Is this intentional, or maybe I misunderstood something? Thanks!
I want to ask about the skip in pruning.py. Why were these layers chosen for pruning? I want to prune my own resnet101, and I wonder if there are any rules about how to choose the layers to prune.
Thanks~
While trying to implement MLPprune.py similar to vggprune.py, I get the following error:
Traceback (most recent call last):
File "MLPprune.py", line 174, in
test(model)
File "MLPprune.py", line 123, in test
output = model(data)
...
...
> RuntimeError: size mismatch, m1: [256 x 3072], m2: [3 x 128]
My code for MLP is the same as vggprune.py, except for changing model.arch.
Where to make the change in the vggprune.py model?
count_flops: There is a problem with your FLOPs calculation code.
Hello,
Thanks for an interesting paper. I was looking at the accuracy of Scratch-B models compared to the big unpruned networks, and it seems Scratch-B performs better than the unpruned networks most of the time. This seems counterintuitive, as the bigger network, if it could be trained effectively, should outperform the smaller networks. Do you think the difference is statistically significant?
Thanks
The async keyword argument in conversion calls is deprecated in PyTorch >= 0.4.0, and it has been replaced by non_blocking. This is necessary because async is a keyword in Python >= 3.7
flake8 testing of https://github.com/Eric-mingjie/rethinking-network-pruning on Python 3.7.0
$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
./cifar/weight-level/cifar_finetune.py:229:63: E999 SyntaxError: invalid syntax
inputs, targets = inputs.cuda(), targets.cuda(async=True)
^
./cifar/weight-level/cifar_B.py:273:63: E999 SyntaxError: invalid syntax
inputs, targets = inputs.cuda(), targets.cuda(async=True)
^
./cifar/weight-level/cifar.py:232:63: E999 SyntaxError: invalid syntax
inputs, targets = inputs.cuda(), targets.cuda(async=True)
^
./cifar/weight-level/cifar_E.py:265:63: E999 SyntaxError: invalid syntax
inputs, targets = inputs.cuda(), targets.cuda(async=True)
^
./imagenet/regression-pruning/compute_flops.py:91:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Conv2d):
^
./imagenet/regression-pruning/compute_flops.py:93:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Linear):
^
./imagenet/regression-pruning/compute_flops.py:95:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.BatchNorm2d):
^
./imagenet/regression-pruning/compute_flops.py:97:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.ReLU):
^
./imagenet/regression-pruning/compute_flops.py:99:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.MaxPool2d) or isinstance(net, torch.nn.AvgPool2d):
^
./imagenet/regression-pruning/compute_flops.py:99:71: F821 undefined name 'torch'
if isinstance(net, torch.nn.MaxPool2d) or isinstance(net, torch.nn.AvgPool2d):
^
./imagenet/regression-pruning/compute_flops.py:101:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Upsample):
^
./imagenet/regression-pruning/compute_flops.py:110:22: F821 undefined name 'torch'
input = Variable(torch.rand(3,input_res,input_res).unsqueeze(0), requires_grad = True)
^
./imagenet/regression-pruning/main_E.py:197:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/regression-pruning/main_B.py:211:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/regression-pruning/models/vgg_5x.py:8:1: F822 undefined name 'vgg16_official' in __all__
__all__ = [
^
./imagenet/thinet/compute_flops.py:91:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Conv2d):
^
./imagenet/thinet/compute_flops.py:93:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Linear):
^
./imagenet/thinet/compute_flops.py:95:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.BatchNorm2d):
^
./imagenet/thinet/compute_flops.py:97:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.ReLU):
^
./imagenet/thinet/compute_flops.py:99:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.MaxPool2d) or isinstance(net, torch.nn.AvgPool2d):
^
./imagenet/thinet/compute_flops.py:99:71: F821 undefined name 'torch'
if isinstance(net, torch.nn.MaxPool2d) or isinstance(net, torch.nn.AvgPool2d):
^
./imagenet/thinet/compute_flops.py:101:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Upsample):
^
./imagenet/thinet/compute_flops.py:110:22: F821 undefined name 'torch'
input = Variable(torch.rand(3,input_res,input_res).unsqueeze(0), requires_grad = True)
^
./imagenet/thinet/main_E.py:206:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/thinet/main_B.py:226:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/l1-norm-pruning/main_finetune.py:206:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/l1-norm-pruning/compute_flops.py:91:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Conv2d):
^
./imagenet/l1-norm-pruning/compute_flops.py:93:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Linear):
^
./imagenet/l1-norm-pruning/compute_flops.py:95:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.BatchNorm2d):
^
./imagenet/l1-norm-pruning/compute_flops.py:97:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.ReLU):
^
./imagenet/l1-norm-pruning/compute_flops.py:99:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.MaxPool2d) or isinstance(net, torch.nn.AvgPool2d):
^
./imagenet/l1-norm-pruning/compute_flops.py:99:71: F821 undefined name 'torch'
if isinstance(net, torch.nn.MaxPool2d) or isinstance(net, torch.nn.AvgPool2d):
^
./imagenet/l1-norm-pruning/compute_flops.py:101:32: F821 undefined name 'torch'
if isinstance(net, torch.nn.Upsample):
^
./imagenet/l1-norm-pruning/compute_flops.py:110:22: F821 undefined name 'torch'
input = Variable(torch.rand(3,input_res,input_res).unsqueeze(0), requires_grad = True)
^
./imagenet/l1-norm-pruning/main_E.py:203:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/l1-norm-pruning/prune.py:83:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/l1-norm-pruning/main_B.py:216:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/network-slimming/main.py:206:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/network-slimming/main_finetune.py:212:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/network-slimming/main_E.py:214:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/network-slimming/prune.py:142:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
./imagenet/network-slimming/main_B.py:220:34: E999 SyntaxError: invalid syntax
target = target.cuda(async=True)
^
17 E999 SyntaxError: invalid syntax
24 F821 undefined name 'torch'
1 F822 undefined name 'vgg16_official' in __all__
42
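The E999 entries above all come from the same pattern, so one possible way to fix them in bulk is a text substitution over the source files. The sketch below is a hypothetical helper, not something shipped with the repository; it rewrites `async=` only where it appears as a keyword argument (directly after `(` or `,`).

```python
import re

# Matches `async=` only when used as a keyword argument (preceded by "(" or ",").
ASYNC_KWARG = re.compile(r"(?<=[(,])(\s*)async(\s*=)")


def replace_async_kwarg(source):
    """Rewrite `async=` keyword arguments to `non_blocking=` in source text."""
    return ASYNC_KWARG.sub(r"\1non_blocking\2", source)


line = "inputs, targets = inputs.cuda(), targets.cuda(async=True)"
print(replace_async_kwarg(line))
# inputs, targets = inputs.cuda(), targets.cuda(non_blocking=True)
```

The F821 entries look like a missing `import torch` at the top of each compute_flops.py, which a substitution like this does not address.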
Hello @liuzhuang13 @Eric-mingjie, have you ever tried pruning MobileNetV2?
I have tried to prune MobileNetV2 with several methods, and it seems hard to train the pruned model to convergence on ImageNet.
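I am very interested in your work and am doing related research. During my experiments I found some interesting phenomena that I would like to discuss with you.
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d):
        mask = (m.weight.data != 0)
        mask = mask.float().cuda()
        m.weight.grad.data.mul_(mask)
        m.bias.grad.data.mul_(mask)
This is the code you use to prevent weights that have already been set to zero from receiving further gradient updates. It is a nice idea, and I wanted to add it to my own work, but I found that in PyTorch this code does not strictly prevent the weights from being updated. Intuitively it should stop the parameter updates, but in practice the channels that have been zeroed still receive small updates. A direct consequence is that channels that should be disabled are still quietly having an effect. I wonder whether you have noticed this. I look forward to your reply.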
Hi, thanks for sharing the code.
May I know how you obtained the following pruning scheme for resnet_2x in regression-pruning:
cfg_2x = [35, 64, 55, 101, 51, 39, 97, 50, 37, 144, 128, 106, 205, 105, 72, 198, 105, 72, 288, 128, 110, 278, 256, 225, 418, 209, 147, 407, 204, 158, 423, 212, 155, 412, 211, 148, 595, 256, 213, 606, 512, 433, 1222, 512, 437, 1147, 512, 440]
Is it from the original code, or did you implement it yourself?
BTW, according to compute_flops.py, the reduction in FLOPs does not seem to be 2x, even though the file is named resnet_2x.py.
Is there code that implements the mask for regression-pruning?
Hi, thanks for your hard work!
I am curious about the train-from-scratch experiments in your paper. Specifically, if you prune by magnitude at the fine-grained level, for example, do you prune directly using the initial weights (i.e. sort the initialized weights, pick the k individual weights with the smallest magnitude and prune them)? Or do you use the pre-trained model's weights to prune the model, and then re-initialize the remaining weights and retrain them?
I'm curious about the implementation of the pruning algorithm for weight-level pruning. Specifically, to my understanding, in the cifar/weight-level/ directory you first train a model, prune it and then fine-tune it.
My question is about the code in cifar/weight-level/cifar_finetune.py, lines 246 to 251. Correct me if I'm wrong, but it seems that at each training iteration you check the weights of each Conv2d and mask out the gradient for those weights that are zero. In addition to the weights that are zeroed out in the pruning phase, is it possible that the number of zeroed weights increases as training proceeds? If so, your code seems to freeze these unpruned weights at zero. Thanks for any further feedback.
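The scenario raised in this question can be illustrated with a toy example in plain Python. It is a sketch under assumptions and does not reproduce cifar_finetune.py: the `sgd_step` helper and all numbers are hypothetical, chosen so that one unpruned weight lands exactly on zero. Exact zeros are presumably rare in floating-point training, but the example shows how a mask recomputed from `weight != 0` differs from a mask recorded once after pruning.

```python
def sgd_step(weights, grads, mask, lr=0.5):
    """One SGD step in which masked-out positions receive no update."""
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]


weights = [0.0, 0.25, -0.5]  # position 0 was pruned
fixed_mask = [0, 1, 1]       # recorded once, right after pruning

# A gradient that happens to drive position 1 exactly to zero.
weights = sgd_step(weights, [0.0, 0.5, 0.0], fixed_mask)

recomputed_mask = [1 if w != 0 else 0 for w in weights]
print(weights)          # [0.0, 0.0, -0.5]
print(recomputed_mask)  # [0, 0, 1] -> position 1 would now be frozen too
print(fixed_mask)       # [0, 1, 1] -> position 1 can still move
```

Storing the mask once at pruning time is one way to make sure only the originally pruned positions stay frozen.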
Hi,
I have followed the code here and run the sparse training code as below:
python main.py --arch vgg11_bn --s 0.00001 --save [PATH TO SAVE RESULTS] [IMAGENET]
After training, the accuracy is 71.4%, which is fine. However, the accuracy after pruning is almost 0 with a pruning ratio of 0.5. When I decrease the pruning ratio to 0.2, the top-1 accuracy increases to 15%, which is still far below expectation. Could you please advise whether this is normal or whether something could be wrong?
I would like to prune once and do not want to prune iteratively.
Thanks for your reply.
Best regards,
Traceback (most recent call last):
File "main.py", line 166, in
train(epoch)
File "main.py", line 127, in train
avg_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
Hi, thank you for this great work.
How would one reproduce Figure 4 from your paper i.e "The average sparsity pattern"?
Thanks
Hey,
The link for the weight level pruned resnet 50 @60% is the same for the finetuned model and scratch-E. I think the link for the finetuned version is wrong.
ResNet-50 | 60% | finetune | 76.09 | 92.91 | pytorch model (195 MB) <------ Wrong link
ResNet-50 | 60% | scratch-E | 73.69 | 91.61 | pytorch model (195 MB)
Could you update the link if you have it? I would like to run some experiments with the model.
Best,
Marton
I notice that the ImageNet VGG implementation initializes the weight to 0.5, which differs from the torchvision implementation:
https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py#L55
Are there any reasons? Thanks!
I've installed the correct requirements. But after running this:
python main.py --dataset cifar10 --arch vgg --depth 16
I'm getting the following error:
Traceback (most recent call last):
File "main.py", line 166, in <module>
train(epoch)
File "main.py", line 125, in train
output = model(data)
File "/home/jeferson/repo/rethinking-network-pruning/repense/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/jeferson/repo/rethinking-network-pruning/cifar/l1-norm-pruning/models/vgg.py", line 56, in forward
x = self.feature(x)
File "/home/jeferson/repo/rethinking-network-pruning/repense/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/jeferson/repo/rethinking-network-pruning/repense/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/jeferson/repo/rethinking-network-pruning/repense/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/jeferson/repo/rethinking-network-pruning/repense/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/home/jeferson/repo/rethinking-network-pruning/repense/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
Am I doing something wrong?