
resrep's People

Contributors

dingxiaoh

resrep's Issues

Breakdown of code

First of all, thank you for the amazing work.
Could you provide a short breakdown of the code and explain how the succeeding strategy is decided?
It would be a great help.

resnet-50

Could you offer your pruned resnet-50.hdf5 file as well as the unpruned one? It would save a lot of time.

about the determination of pruning rate

How does ResRep automatically select the pruning rate for each layer without any layer collapsing entirely? Could I ask where this part is implemented in the code? Thank you!
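A note for other readers, not an official answer: as I understand the paper and code, the per-layer pruning rate is never set by hand. All compactor rows compete globally by their L2 norms, and the rows with the globally smallest norms are masked until the target FLOPs reduction is reached, with a floor on the surviving channels per layer so that no layer collapses. A minimal sketch of that idea, with every name below hypothetical:

# Sketch of ResRep-style global channel selection (all names hypothetical).
# Rank every compactor row by L2 norm across ALL layers, then mask the
# globally smallest rows until the FLOPs target is met, keeping a minimum
# number of channels per layer so that no layer collapses.
import torch

def select_channels_to_mask(compactors, flops_per_channel, flops_target, min_channels=1):
    scores = []                                      # (norm, layer_idx, channel_idx)
    for li, comp in enumerate(compactors):           # comp.weight: (out, in, 1, 1)
        norms = comp.weight.detach().flatten(1).norm(dim=1)
        scores += [(n.item(), li, ci) for ci, n in enumerate(norms)]
    scores.sort()                                    # smallest norms first
    remaining = [c.weight.shape[0] for c in compactors]
    total_flops = sum(r * f for r, f in zip(remaining, flops_per_channel))
    masked = set()
    for norm, li, ci in scores:
        if total_flops <= flops_target:
            break
        if remaining[li] <= min_channels:            # never empty a layer out
            continue
        masked.add((li, ci))
        remaining[li] -= 1
        total_flops -= flops_per_channel[li]
    return masked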

Questions about viewing experiment data

After reproducing the experiment, I want to check the saved experiment data (loss, top-1 accuracy). Opening the saved file as txt shows errors. How should I view the experimental data?

the model cannot converge when adding compactors

Hi author, thank you for your great work.
I implemented the ResRep pruner in a TensorFlow environment, following your paper and code. Here is what I did:
first, I added compactors to the conv layers to be pruned, then restored the parameters from the original model into the model to be pruned. I confirmed the restored model gives the same output as the original model. I also implemented the other parts, such as adding the regularization term to the compactors' gradients, the mask schedule, etc.
After that I began training, but the model cannot converge; the loss keeps growing as training goes on (the mask is not applied yet, since the step count is not high enough). It looks like gradient explosion.

Did you run into the same issue before?

my environment:
tf2.2
network: mobilenetv2 + fpn + centernet (compactors added only to the mobilenetv2 backbone)

thank you!
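In case it helps anyone debugging the same divergence: per the paper, and as far as I can tell from the PyTorch code, the compactor is a 1x1 conv initialized to the identity, so the network output is unchanged at insertion time; if training then diverges, likely suspects are a penalty strength that is too large for the new setup, or a penalty gradient scaled inconsistently with the task gradient. A minimal PyTorch sketch of an identity-initialized compactor:

# Minimal sketch of a ResRep-style compactor: a 1x1 conv initialized to the
# identity, so inserting it leaves the network's output unchanged.
import torch
import torch.nn as nn

class Compactor(nn.Module):
    def __init__(self, num_channels):
        super().__init__()
        self.pwc = nn.Conv2d(num_channels, num_channels, kernel_size=1, bias=False)
        with torch.no_grad():                        # identity init: output == input
            self.pwc.weight.copy_(
                torch.eye(num_channels).reshape(num_channels, num_channels, 1, 1))

    def forward(self, x):
        return self.pwc(x)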

Question about compensate_beta when pruning depthwise separable layers

Dear author, thank you for your excellent work and code. Recently, while reading the conversion code, I could not completely understand the logic behind the beta compensation. Can you explain it or point out a reference?
BTW, why do we only need to compensate beta for the pointwise layer rather than for both the depthwise and pointwise layers?

ResRep/rr/resrep_convert.py

Lines 201 to 203 in aac6ed3

for pri in pruned_ids:
    compensate_beta = np.abs(fol_dw_beta_value[pri]) * (pw_kernel_value[:, pri, 0, 0] * pw_gamma_value / np.sqrt(pw_var_value + 1e-5))  # TODO because of relu
    pw_beta_value += compensate_beta
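My reading of this, not an official explanation: once depthwise channel pri is zeroed out, its post-BN output is the spatial constant beta_dw[pri], and after ReLU a constant still reaches the pointwise conv (the np.abs and the "TODO because of relu" suggest the ReLU is handled approximately here). A constant c on input channel pri contributes c * pw_kernel[:, pri, 0, 0] before the pointwise BN, i.e. c * pw_kernel[:, pri, 0, 0] * pw_gamma / sqrt(pw_var + 1e-5) after folding the BN scale, which is exactly the compensate_beta added above. The depthwise layer itself needs no compensation because its entire leftover effect is absorbed at this point. A quick numerical check of that identity (BN mean folded out for brevity, shapes assumed):

# Sanity check (my own sketch): dropping a constant input channel of a
# pointwise conv with folded BN is equivalent to adding compensate_beta.
import numpy as np

out_c, in_c, pri, eps = 8, 4, 2, 1e-5
pw_kernel = np.random.randn(out_c, in_c)               # 1x1 kernel, spatial dims squeezed
pw_gamma, pw_beta = np.random.rand(out_c), np.random.randn(out_c)
pw_var = np.random.rand(out_c)
x = np.random.randn(in_c)
x[pri] = 0.37                                          # pruned dw channel: constant output

scale = pw_gamma / np.sqrt(pw_var + eps)
full = scale * (pw_kernel @ x) + pw_beta               # output with the channel kept

compensate_beta = x[pri] * pw_kernel[:, pri] * scale
pruned = scale * (np.delete(pw_kernel, pri, axis=1) @ np.delete(x, pri)) \
         + pw_beta + compensate_beta                   # channel dropped + compensation

assert np.allclose(full, pruned)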

ResRep with DBB

Hi, a quick question: for a model that has already been trained with DBB, does applying ResRep still bring a clear improvement?

Questions

Hello, I'm a pruning newbie.
Thanks for sharing your great research.

I have a few questions.

  • Many open-source pruning codebases use public classification datasets, but I want to prune a MobileNet-based object detection model using a custom dataset. Would that be a problem for using ResRep?

  • If that's possible, I'd really appreciate it if you could give me an example of how to proceed.

Thank you sincerely.

How to run inference with the finish_converted.hdf5 weights?

Hi @DingXiaoH,
thanks for your great work.
I want to check whether the accuracy has dropped after folding the convs and fusing the BN in finish_converted.hdf5, so I need to load finish_converted.hdf5 into an engine or model and then run inference to evaluate it.
Do you have any ideas?
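Not an official answer, but here is one way to evaluate it, assuming the hdf5 stores one numpy array per parameter keyed by the model's state_dict names (which is how this codebase's save/load helpers appear to work; please verify). build_converted_model below is a hypothetical stand-in for constructing the pruned, BN-fused architecture:

# Sketch: load an hdf5 of numpy weights into a PyTorch model, then evaluate.
import h5py
import torch

def load_hdf5_into_model(model, hdf5_path):
    with h5py.File(hdf5_path, 'r') as f:
        state = {k: torch.from_numpy(f[k][()]) for k in f.keys()}
    model.load_state_dict(state, strict=False)   # converted model has fewer params
    return model

model = build_converted_model()                  # hypothetical: the pruned, fused net
load_hdf5_into_model(model, 'finish_converted.hdf5')
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # then run your usual eval loop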

Is the mask usage inconsistent with the paper?

for compactor_param, mask in compactor_mask_dict.items():  # set the compactor gradients separately: apply the mask, then add the lasso_grad term
    compactor_param.grad.data = mask * compactor_param.grad.data
    lasso_grad = compactor_param.data * ((compactor_param.data ** 2).sum(dim=(1, 2, 3), keepdim=True) ** (-0.5))  # this mask is multiplied with the second term of the loss, which differs from the paper
    compactor_param.grad.data.add_(resrep_config.lasso_strength, lasso_grad)

if not if_accum_grad:
    if gradient_mask_tensor is not None:  # gradient_mask_tensor is always None
        for name, param in net.named_parameters():
            if name in gradient_mask_tensor:
                param.grad = param.grad * gradient_mask_tensor[name]
    optimizer.step()  # at each step only the second term gets masked
    optimizer.zero_grad()
acc, acc5 = torch_accuracy(pred, label, (1, 5))
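A side note for anyone comparing this against the paper: the hand-written lasso_grad is simply the analytic gradient of the group-lasso penalty sum_j ||K_j||_2 over compactor rows, since d||K_j||/dK_j = K_j / ||K_j||. A quick autograd check:

# Verify that lasso_grad matches autograd's gradient of the group-lasso penalty.
import torch

K = torch.randn(16, 16, 1, 1, requires_grad=True)    # compactor kernel
penalty = (K ** 2).sum(dim=(1, 2, 3)).sqrt().sum()   # sum of per-row L2 norms
penalty.backward()

lasso_grad = K.detach() * ((K.detach() ** 2).sum(dim=(1, 2, 3), keepdim=True) ** (-0.5))
assert torch.allclose(K.grad, lasso_grad, atol=1e-6)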

What is the function of pacesetter_dict?

Dear author,

Thanks for your contribution!

When I was reading the code and trying to run my own custom ResNet18 with rr/exp_resrep.py, I was confused by some variables.

For example, in rr/exp_resrep.py there is a variable named pacesetter_dict, which comes from the function resnet_bottleneck_follow_dict. pacesetter_dict is fed into resrep_config. However, when I check ResRepConfig and ResRepBuilder, I only see resrep_config.target_layers being used explicitly. I may have missed something, but I have not found where pacesetter_dict is used.

So I want to ask:

  1. Does it matter if I give a random pacesetter_dict when I implement other models?
  2. What is the meaning of pacesetter_dict and resnet_bottleneck_follow_dict? (One possible reading is sketched after this list.)
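Not an official answer, but my reading: pacesetter_dict groups conv layers whose output channels are tied together by residual additions. Each follower maps to the "pacesetter" layer whose pruning pattern it must copy, so the tensors being added stay shape-compatible; that would make a random pacesetter_dict unsafe for any network with shortcuts. A toy sketch of how such a dict could be consumed (names hypothetical):

# Sketch (my reading): layers tied by a residual addition must keep identical
# surviving channels, so each follower copies its pacesetter's channel mask.
def unify_residual_masks(channel_masks, pacesetter_dict):
    # channel_masks: {layer_idx: boolean mask over output channels}
    for follower, pacesetter in pacesetter_dict.items():
        if follower != pacesetter:
            channel_masks[follower] = channel_masks[pacesetter]
    return channel_masks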

Effects on latency / speedup

Thank you for sharing this great repo and paper.
I wonder if you examined the effects of ResRep pruning on the latency or throughput of a model?

It is not uncommon that pruning, even at a large scale, doesn't yield much improvement in a model's inference speed. Was ResRep able to have a real impact on inference speed?

Thanks in advance.
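One observation from the sidelines: because ResRep is structured pruning, the converted model physically has fewer channels, so any speedup shows up in ordinary inference without sparse kernels. A minimal timing sketch to measure it yourself, with the two model variables assumed to exist:

# Minimal latency comparison: time the dense and the converted (pruned) models.
import time
import torch

@torch.no_grad()
def latency_ms(model, x, warmup=10, iters=50):
    model.eval()
    for _ in range(warmup):
        model(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1e3

x = torch.randn(1, 3, 224, 224)
# print(latency_ms(dense_model, x), latency_ms(pruned_model, x))   # models assumed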

about hdf5 file

What should I do to obtain the pruned architecture, and how do I run a test after pruning?
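Not an official answer; my understanding of the convert step is: after training, the BN following each target conv is fused into the conv, the compactor (a 1x1 conv) is folded into it as a matrix product, and the output channels whose compactor rows were driven to zero are dropped, yielding the pruned architecture that you then test. A sketch of the folding, with names hypothetical and BN fusion assumed already done (see rr/resrep_convert.py for the real code):

# Sketch of folding a compactor into its preceding (BN-fused) conv and
# dropping the zeroed output channels.
import numpy as np

def fold_compactor(W, Q, threshold=1e-5):
    # W: (C_out, C_in, k, k) conv kernel; Q: (C_out, C_out) compactor matrix
    merged = np.einsum('pc,cikl->pikl', Q, W)     # compose the two linear maps
    keep = np.linalg.norm(Q, axis=1) > threshold  # surviving output channels
    return merged[keep]

W = np.random.randn(16, 8, 3, 3)
Q = np.eye(16); Q[3] = 0; Q[11] = 0               # two rows zeroed by the penalty
print(fold_compactor(W, Q).shape)                 # -> (14, 8, 3, 3)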

Pacesetter, target_layers and succeeding strategy

Thank you for sharing the code!
The idea of the paper is awesome and I want to use the pruning method on other networks.
However, when reading the code, I don't understand the meaning of these three variables:
pacesetter, target_layers, succeeding_strategy

pacesetter
{0: 0, 2: 0, 4: 0, 6: 0, 8: 0, 10: 0, 12: 0, 14: 0, 16: 0, 18: 0, 19: 19, 21: 19, 23: 19, 25: 19, 27: 19, 29: 19, 31: 19, 33: 19, 35: 19, 37: 19, 38: 38, 40: 38, 42: 38, 44: 38, 46: 38, 48: 38, 50: 38, 52: 38, 54: 38, 56: 38}
target_layers
[1, 3, 5, 7, 9, 11, 13, 15, 17, 20, 22, 24, 26, 28, 30, 32, 34, 36, 39, 41, 43, 45, 47, 49, 51, 53, 55]
succeeding_strategy
{1: 2, 3: 4, 5: 6, 7: 8, 9: 10, 11: 12, 13: 14, 15: 16, 17: 18, 20: 21, 22: 23, 24: 25, 26: 27, 28: 29, 30: 31, 32: 33, 34: 35, 36: 37, 39: 40, 41: 42, 43: 44, 45: 46, 47: 48, 49: 50, 51: 52, 53: 54, 55: 56, 0: 1, 2: 3, 4: 5, 6: 7, 8: 9, 10: 11, 12: 13, 14: 15, 16: 17, 18: [19, 20], 21: 22, 23: 24, 25: 26, 27: 28, 29: 30, 31: 32, 33: 34, 35: 36, 37: [38, 39], 40: 41, 42: 43, 44: 45, 46: 47, 48: 49, 50: 51, 52: 53, 54: 55, 56: 57}

I am wondering how to set those variables if I want to use another network like VGG or ShuffleNet.
Thank you so much!
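My understanding, for what it's worth: succeeding_strategy maps each conv index to the conv(s) that consume its output, so pruning an output channel also removes the matching input channel downstream; target_layers lists the convs that receive compactors; and pacesetter ties residual-connected layers together, which a plain network does not need. For a VGG-like chain the setup degenerates to something like the toy sketch below; ShuffleNet is harder because channel shuffle and splits change which consumer sees which channel.

# Toy sketch for a plain VGG-like chain of conv layers (my reading of the
# three variables; verify against rr/exp_resrep.py before relying on it).
num_convs = 13                                    # e.g. the conv layers of VGG-16
target_layers = list(range(num_convs))            # every conv gets a compactor
succeeding_strategy = {i: i + 1 for i in range(num_convs - 1)}   # conv i feeds conv i+1
pacesetter_dict = {}                              # no residual additions to tie together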

Question about optimizer strategy

Hi! Thanks for your great work. I have a few small questions:

  1. In the sgd_optimizer function in rr/resrep_train.py, the weight_decay of the bias and BN parameters is set to weight_decay_bias (i.e. 0) and the apply_lr of the bias is set to 2 * lr. I am confused: is such a setting only for ResNet-50 training, or a general trick for ResRep pruning? If the ResRep method is applied to other models, are these weight_decay and lr settings still necessary? (A generic sketch of this trick follows after this list.)
  2. Why introduce the channel-selection limit on channels, and is there a criterion for setting its value?
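On question 1: this looks like the classic "no weight decay on bias/BN, doubled lr on bias" trick, which is a general training recipe rather than something ResRep-specific, so it should carry over to other models. A generic PyTorch sketch of the idea, not the repo's exact code (it lumps all 1-D parameters together, whereas the repo distinguishes bias from BN):

# Generic sketch of the "no decay on bias/BN, 2x lr on bias" parameter groups.
import torch

def make_sgd(model, lr, momentum, weight_decay):
    decay, no_decay = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        (no_decay if p.ndim == 1 else decay).append(p)   # 1-D: biases and BN affine
    return torch.optim.SGD(
        [{'params': decay, 'weight_decay': weight_decay},
         {'params': no_decay, 'weight_decay': 0.0, 'lr': 2 * lr}],
        lr=lr, momentum=momentum)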

Are the reviewers' comments on the paper available?

Hi Xiaohan! This paper is part of the inspiring re-param series and it's really nice work. I wonder if it's possible to learn how the reviewers from the model compression community reacted to this paper?
