Comments (12)
Branch with experiments code: https://github.com/mkaglins/nncf_pytorch/tree/mkaglins/RL_experiments
@mkaglins what is the status of this issue?
LeGR was re-implemented in NNCF on the branch: https://github.com/mkaglins/nncf_pytorch/tree/mkaglins/legr_impl
Baseline results for MobileNet v2 on CIFAR-100 from the LeGR paper were reproduced successfully.
Global ranking coefficients were trained with the settings from the paper and the LeGR GitHub repo (https://github.com/cmu-enyac/LeGR), and the following results were obtained:
Pruning rate | Original (0%) | 80% | 87% | 90% |
---|---|---|---|---|
mobilenet_v2 top1@acc | 73.47% | 73.64% | 72.26% | 71.2% |
To reproduce the results, the following changes/settings were made in NNCF:
- MobileNetV2 model architecture and pretrained weights from https://github.com/cmu-enyac/LeGR
- Dataset (CIFAR-100) normalization params and train/validation transformations from https://github.com/cmu-enyac/LeGR
- Pruning quota = 0.1 (the algorithm can prune at most 90% of every layer)
- Pruning of the last convolution allowed (not possible with the current NNCF settings)
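For reference, the pruning section of an NNCF config for these runs would look roughly like this. The key names follow NNCF's filter-pruning options, but they may differ between NNCF versions, so treat this as a sketch rather than a drop-in config:

```python
# Hypothetical NNCF "compression" section for the LeGR baseline runs.
# Exact key names and accepted values vary across NNCF versions.
compression_config = {
    "algorithm": "filter_pruning",
    "pruning_init": 0.1,                         # initial pruning rate
    "params": {
        "schedule": "exponential",
        "pruning_target": 0.8,                   # target pruning rate
        "filter_importance": "geometric_median",
        "prune_last_conv": True,                 # allow pruning the last convolution
    },
}
```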
@mkaglins, do we have results for the geomean method to compare with what you got on CIFAR?
No, this model with its weights is from the LeGR GitHub repo, but I will run such an experiment to compare.
The Filter Pruning algorithm (with the geometric-median magnitude method) and the same target FLOPs pruning rate showed significantly worse results than LeGR:
- with pruning rate = 80%, top1@acc = 68.6%
- with pruning rate = 90%, top1@acc = 65%
The experiment was conducted with the same MobileNetV2 pretrained weights, dataset params, and fine-tuning scheme as in the LeGR case.
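The "geomean" criterion here is the geometric-median (FPGM-style) filter importance. A minimal numpy sketch of the idea (the function name and toy data are mine, not NNCF's API):

```python
import numpy as np

def geometric_median_importance(weights: np.ndarray) -> np.ndarray:
    """FPGM-style importance: for each filter, the sum of Euclidean
    distances to every other filter in the same layer. Filters nearest
    the layer's geometric median (lowest score) are considered the most
    redundant and are pruned first."""
    flat = weights.reshape(weights.shape[0], -1)       # (filters, features)
    diffs = flat[:, None, :] - flat[None, :, :]        # pairwise differences
    return np.linalg.norm(diffs, axis=-1).sum(axis=1)  # (filters,)

# Toy layer with three flattened "filters": the middle filter lies
# between the other two, so it is the most replaceable and scores lowest.
w = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
scores = geometric_median_importance(w)
```

Note this is different from pure magnitude pruning: a filter with a large norm can still be pruned if other filters in the layer are close to it.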
@mkaglins So the results are specific to these particular MobileNetV2 weights? What about the ones from, say, torchvision? Would there be such a significant gap between the LeGR and geomean+uniform results?
@vanyalzr such experiments to compare LeGR with the current Filter Pruning algorithms on different models are planned and in progress.
Further experiments are planned.
Experiments to compare LeGR with the current FP algorithm:
- LeGR vs the current FP algorithm on ImageNet on the already released pruned models (resnet18, resnet34, resnet50, googlenet, unet); the resnet18 experiment is currently in progress
- LeGR vs the current FP algorithm on CIFAR-100 (resnet18, resnet50, inceptionv3, mobilenetv2)
Also, some experiments on potential LeGR improvements:
- LeGR with fewer generations (200 instead of 400)
- LeGR with progressive fine-tuning (exponential scheduler)
- LeGR with Batch-Norm adaptation instead of a short training run to estimate the pruned model's accuracy
- LeGR with the configuration found by geomean pruning added to the evolutionary algorithm's search space
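For context on the "generations" knob: LeGR learns one affine pair (scale, shift) per layer, ranks each filter by `scale * ||w||_2 + shift`, and searches these pairs with a regularized-evolution loop. A simplified sketch of that search, where the `evaluate` callback stands in for "prune to the target FLOPs, briefly fine-tune, measure accuracy" (all names and hyperparameter defaults here are illustrative, not NNCF's implementation):

```python
import random

def legr_search(num_layers, evaluate, generations=400, pop_size=64, sample_size=16):
    """Simplified regularized-evolution search over per-layer
    (scale, shift) pairs, as in LeGR. `evaluate(pairs)` must return a
    fitness score (higher is better), e.g. accuracy after pruning with
    these ranking coefficients and a short fine-tune."""
    def random_pairs():
        return [(random.uniform(0.0, 1.0), random.uniform(-1.0, 1.0))
                for _ in range(num_layers)]

    def mutate(pairs):
        child = list(pairs)
        layer = random.randrange(num_layers)
        scale, shift = child[layer]
        child[layer] = (scale * random.uniform(0.9, 1.1),
                        shift + random.uniform(-0.1, 0.1))
        return child

    population = [random_pairs() for _ in range(pop_size)]
    fitness = [evaluate(p) for p in population]
    best_fit, best_pairs = max(zip(fitness, population))
    for _ in range(generations):
        # Mutate the fittest of a random sample and replace the oldest
        # individual; the aging keeps the search exploring.
        sample = random.sample(range(pop_size), sample_size)
        parent = max(sample, key=lambda i: fitness[i])
        child = mutate(population[parent])
        population.pop(0); fitness.pop(0)
        population.append(child); fitness.append(evaluate(child))
        if fitness[-1] > best_fit:
            best_fit, best_pairs = fitness[-1], child
    return best_fit, best_pairs
```

Since each generation costs one `evaluate` call (a short fine-tune in the real setting), halving the generation count roughly halves the search cost, which is what the 200-vs-400 experiment probes.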
Experiment results summary:
Imagenet:
Resnet18, FLOPs pruning rate=30%
Algorithm | Original model | Filter pruning + geomean | LeGR |
---|---|---|---|
top1@acc | 69.64 | 68.72 | 69.43 |
LeGR shows significantly better results than Filter Pruning + geomean on resnet-18 / ImageNet.
CIFAR-100:
Algorithm descriptions:
- Filter pruning with geomean – the current Filter Pruning algorithm with the geometric median as the filter importance function.
- LeGR – the LeGR algorithm trained at the biggest pruning rate (0.8); the trained ranking coefficients were then used to prune and fine-tune the model at different pruning rates. Three trials were run to test the algorithm's stability.
- LeGR, 200 generations – same as LeGR above, but with 200 generations of the evolutionary algorithm instead of the default 400.
- LeGR, progressive – same as LeGR above, but with progressive fine-tuning (exponential scheduler and 15 pruning steps).
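The "exponential scheduler" mentioned above ramps the pruning rate up quickly at first and then slows as it nears the target. An illustrative schedule (not NNCF's exact formula; the decay constant 5.0 is an assumption of mine):

```python
import math

def exponential_pruning_rate(step, num_steps, init_rate, target_rate, decay=5.0):
    """Pruning rate after `step` of `num_steps` pruning steps: starts at
    `init_rate` and approaches `target_rate` exponentially, so early
    steps prune aggressively and later steps refine."""
    if step >= num_steps:
        return target_rate
    progress = (1.0 - math.exp(-decay * step / num_steps)) / (1.0 - math.exp(-decay))
    return init_rate + (target_rate - init_rate) * progress

# 15 pruning steps from 5% to 80% of FLOPs, as in the progressive runs.
rates = [exponential_pruning_rate(s, 15, 0.05, 0.8) for s in range(16)]
```

Between steps the model is fine-tuned, so the network adapts to each intermediate pruning rate instead of absorbing the full 0.8 cut at once.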
Resnet-18-cifar results:
Original acc = 75.51%
LeGR:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
Filter pruning + geomean | 74.97 | 74.29 | 68.69 |
LeGR | 74.12 | 73.77 | 68.25 |
LeGR | 74.69 | 74.07 | 68.52 |
LeGR | 74.83 | 73.56 | 72.17 |
MEAN(LeGR) | 74.55 | 73.80 | 69.65 |
STD(LeGR) | 0.38 | 0.26 | 2.19 |
LeGR, 200 generations:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, 200 generations | 74.60 | 72.95 | 71.81 |
LeGR, 200 generations | 74.33 | 73.13 | 69.43 |
LeGR, 200 generations | 74.48 | 73.37 | 68.55 |
MEAN(LeGR, 200 generations) | 74.47 | 73.15 | 69.93 |
STD(LeGR, 200 generations) | 0.14 | 0.21 | 1.69 |
LeGR with progressive fine-tuning:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, progressive | 74.30 | 74.20 | 72.94 |
LeGR, progressive | 75.27 | 74.47 | 73.14 |
LeGR, progressive | 74.58 | 74.01 | 73.38 |
MEAN(LeGR, progressive) | 74.72 | 74.23 | 73.15 |
STD(LeGR, progressive) | 0.50 | 0.23 | 0.22 |
Resnet-50-cifar results:
Original acc = 75.1%
LeGR:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
Filter pruning + geomean | 75.05 | 75.05 | 74.53 |
LeGR | 75.46 | 75.53 | 75.11 |
LeGR | 75.94 | 75.17 | 74.62 |
LeGR | 75.58 | 75.33 | 75.05 |
MEAN(LeGR) | 75.66 | 75.34 | 74.93 |
STD(LeGR) | 0.25 | 0.18 | 0.27 |
LeGR, 200 generations:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, 200 generations | 75.75 | 75.28 | 74.98 |
LeGR, 200 generations | 75.47 | 75.39 | 74.82 |
LeGR, 200 generations | 75.66 | 75.47 | 74.49 |
MEAN(LeGR, 200 generations) | 75.63 | 75.38 | 74.76 |
STD(LeGR, 200 generations) | 0.14 | 0.10 | 0.25 |
LeGR with progressive fine-tuning:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, progressive | 75.02 | 75.56 | 74.51 |
LeGR, progressive | 75.22 | 74.92 | 75.05 |
LeGR, progressive | 75.33 | 75.17 | 74.71 |
MEAN(LeGR, progressive) | 75.19 | 75.22 | 74.76 |
STD(LeGR, progressive) | 0.16 | 0.32 | 0.27 |
Inception_v3 results:
Original acc = 77.7%
LeGR:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
Filter pruning + geomean | 78.17 | 77.68 | 75.51 |
LeGR | 78.10 | 76.63 | 74.90 |
LeGR | 77.80 | 78.00 | 73.59 |
MEAN(LeGR) | 77.95 | 77.32 | 74.25 |
STD(LeGR) | 0.21 | 0.97 | 0.93 |
LeGR, 200 generations:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, 200 generations | 78.02 | 77.60 | 75.59 |
LeGR, 200 generations | 78.01 | 76.86 | 75.73 |
LeGR, 200 generations | 77.88 | 77.24 | 75.72 |
MEAN(LeGR, 200 generations) | 77.97 | 77.23 | 75.68 |
STD(LeGR, 200 generations) | 0.08 | 0.37 | 0.08 |
LeGR with progressive fine-tuning:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, progressive | 78.05 | 77.79 | 77.25 |
LeGR, progressive | 78.13 | 78.13 | 77.07 |
LeGR, progressive | 78.09 | 78.07 | 77.80 |
MEAN(LeGR, progressive) | 78.09 | 78.00 | 77.37 |
STD(LeGR, progressive) | 0.04 | 0.18 | 0.38 |
MobilenetV2 results:
Original acc = 65.65%
LeGR:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
Filter pruning + geomean | 62.68 | 55.46 | 45.78 |
LeGR | 66.02 | 63.62 | 56.70 |
LeGR | 65.52 | 62.51 | 54.29 |
LeGR | 65.32 | 63.54 | 55.79 |
MEAN(LeGR) | 65.62 | 63.22 | 55.59 |
STD(LeGR) | 0.36 | 0.62 | 1.22 |
LeGR, 200 generations:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, 200 generations | 65.64 | 63.40 | 54.36 |
LeGR, 200 generations | 65.41 | 62.43 | 54.90 |
LeGR, 200 generations | 65.39 | 63.48 | 52.75 |
MEAN(LeGR, 200 generations) | 65.48 | 63.10 | 54.00 |
STD(LeGR, 200 generations) | 0.14 | 0.58 | 1.12 |
Results summary:
LeGR vs Filter Pruning:
- There is no definitive conclusion: on resnet18 LeGR is better at the 0.8 pruning rate and worse at the others; on resnet-50 LeGR is significantly better at all pruning rates; on inception_v3 LeGR is unstable and on average worse than the original Filter Pruning at all pruning rates.
- Large variance of results across different trials
LeGR vs LeGR with 200 generations of evolution algorithm:
- LeGR with 200 generations shows on average better (or comparable) results on all models
- The variance of the final accuracy is significantly lower in the 200-generation case
LeGR vs LeGR with progressive fine-tuning:
- Progressive fine-tuning shows on average better results than plain LeGR (much better at the biggest pruning rate, 0.8)
- Progressive fine-tuning shows much lower variance in the final results
LeGR was merged into the code base in #501.