Comments (12)
Branch with experiments code: https://github.com/mkaglins/nncf_pytorch/tree/mkaglins/RL_experiments
@mkaglins what is the status of this issue?
LeGR was re-implemented in NNCF on the branch: https://github.com/mkaglins/nncf_pytorch/tree/mkaglins/legr_impl
Baseline results for MobileNet v2 on CIFAR-100 from the LeGR paper were reproduced successfully.
Global ranking coefficients were trained with the settings from the paper and the LeGR GitHub repo (https://github.com/cmu-enyac/LeGR), and the following results were obtained:
Pruning rate | Original (0%) | 80% | 87% | 90% |
---|---|---|---|---|
mobilenet_v2 top1@acc | 73.47% | 73.64% | 72.26% | 71.2% |
To reproduce the results, the following changes/settings were made in NNCF:
- MobileNetV2 model architecture and pretrained weights from https://github.com/cmu-enyac/LeGR
- Dataset (CIFAR-100) normalization params and train/validation transformations from https://github.com/cmu-enyac/LeGR
- Pruning quota = 0.1 (the algorithm can prune at most 90% of every layer)
- Pruning of the last convolution allowed (not possible with the current NNCF settings)
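For reference, the pruning section of an NNCF config for these runs would look roughly like this. The key names follow NNCF's filter-pruning options, but they may differ between NNCF versions, so treat this as a sketch rather than a drop-in config:

```python
# Hypothetical NNCF "compression" section for the LeGR baseline runs.
# Exact key names and accepted values vary across NNCF versions.
compression_config = {
    "algorithm": "filter_pruning",
    "pruning_init": 0.1,                         # initial pruning rate
    "params": {
        "schedule": "exponential",
        "pruning_target": 0.8,                   # target pruning rate
        "filter_importance": "geometric_median",
        "prune_last_conv": True,                 # allow pruning the last convolution
    },
}
```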
@mkaglins, do we have results for the geomean method to compare with what you got on CIFAR?
No, this model with its weights is from the LeGR GitHub repo, but I will run such an experiment to compare.
The Filter Pruning algorithm (with the geometric-median magnitude method) and the same target FLOPs pruning rate showed significantly worse results than LeGR:
- with pruning rate = 80%, top1@acc = 68.6%
- with pruning rate = 90%, top1@acc = 65%
The experiment was conducted with the same MobileNetV2 pretrained weights, dataset params, and fine-tuning scheme as in the LeGR case.
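The "geomean" criterion here is the geometric-median (FPGM-style) filter importance. A minimal numpy sketch of the idea (the function name and toy data are mine, not NNCF's API):

```python
import numpy as np

def geometric_median_importance(weights: np.ndarray) -> np.ndarray:
    """FPGM-style importance: for each filter, the sum of Euclidean
    distances to every other filter in the same layer. Filters nearest
    the layer's geometric median (lowest score) are considered the most
    redundant and are pruned first."""
    flat = weights.reshape(weights.shape[0], -1)       # (filters, features)
    diffs = flat[:, None, :] - flat[None, :, :]        # pairwise differences
    return np.linalg.norm(diffs, axis=-1).sum(axis=1)  # (filters,)

# Toy layer with three flattened "filters": the middle filter lies
# between the other two, so it is the most replaceable and scores lowest.
w = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
scores = geometric_median_importance(w)
```

Note this is different from pure magnitude pruning: a filter with a large norm can still be pruned if other filters in the layer are close to it.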
@mkaglins So the results are specific to these particular MobileNetV2 weights? What about the ones from, say, torchvision? Would there be such a significant gap between the LeGR and geomean+uniform results?
@vanyalzr such experiments to compare LeGR with the current Filter Pruning algorithms on different models are planned and in progress.
Further experiments are planned.
Experiments to compare LeGR with the current FP algorithm:
- LeGR vs the current FP algorithm on ImageNet on the already released pruned models (resnet18, resnet34, resnet50, googlenet, unet); the resnet18 experiment is currently in progress
- LeGR vs the current FP algorithm on CIFAR-100 (resnet18, resnet50, inceptionv3, mobilenetv2)
Also, some experiments on potential LeGR improvements:
- LeGR with fewer generations (200 instead of 400)
- LeGR with progressive fine-tuning (exponential scheduler)
- LeGR with Batch-Norm adaptation instead of a short training run to estimate the pruned model's accuracy
- LeGR with the configuration found by geomean pruning added to the evolutionary algorithm's search space
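For context on the "generations" knob: LeGR learns one affine pair (scale, shift) per layer, ranks each filter by `scale * ||w||_2 + shift`, and searches these pairs with a regularized-evolution loop. A simplified sketch of that search, where the `evaluate` callback stands in for "prune to the target FLOPs, briefly fine-tune, measure accuracy" (all names and hyperparameter defaults here are illustrative, not NNCF's implementation):

```python
import random

def legr_search(num_layers, evaluate, generations=400, pop_size=64, sample_size=16):
    """Simplified regularized-evolution search over per-layer
    (scale, shift) pairs, as in LeGR. `evaluate(pairs)` must return a
    fitness score (higher is better), e.g. accuracy after pruning with
    these ranking coefficients and a short fine-tune."""
    def random_pairs():
        return [(random.uniform(0.0, 1.0), random.uniform(-1.0, 1.0))
                for _ in range(num_layers)]

    def mutate(pairs):
        child = list(pairs)
        layer = random.randrange(num_layers)
        scale, shift = child[layer]
        child[layer] = (scale * random.uniform(0.9, 1.1),
                        shift + random.uniform(-0.1, 0.1))
        return child

    population = [random_pairs() for _ in range(pop_size)]
    fitness = [evaluate(p) for p in population]
    best_fit, best_pairs = max(zip(fitness, population))
    for _ in range(generations):
        # Mutate the fittest of a random sample and replace the oldest
        # individual; the aging keeps the search exploring.
        sample = random.sample(range(pop_size), sample_size)
        parent = max(sample, key=lambda i: fitness[i])
        child = mutate(population[parent])
        population.pop(0); fitness.pop(0)
        population.append(child); fitness.append(evaluate(child))
        if fitness[-1] > best_fit:
            best_fit, best_pairs = fitness[-1], child
    return best_fit, best_pairs
```

Since each generation costs one `evaluate` call (a short fine-tune in the real setting), halving the generation count roughly halves the search cost, which is what the 200-vs-400 experiment probes.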
Experiment results summary:
Imagenet:
Resnet18, FLOPs pruning rate=30%
Algorithm | Original model | Filter pruning + geomean | LeGR |
---|---|---|---|
top1@acc | 69.64 | 68.72 | 69.43 |
LeGR shows significantly better results than Filter Pruning + geomean on resnet-18 / ImageNet.
CIFAR-100:
Algorithm descriptions:
- Filter pruning with geomean – the current Filter Pruning algorithm with the geometric median as the filter importance function.
- LeGR – the LeGR algorithm trained at the biggest pruning rate (0.8); the trained ranking coefficients were then used to prune and fine-tune the model at different pruning rates. Three trials were run to test the algorithm's stability.
- LeGR, 200 generations – same as LeGR above, but with 200 generations of the evolutionary algorithm instead of the default 400.
- LeGR, progressive – same as LeGR above, but with progressive fine-tuning (exponential scheduler and 15 pruning steps).
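The "exponential scheduler" mentioned above ramps the pruning rate up quickly at first and then slows as it nears the target. An illustrative schedule (not NNCF's exact formula; the decay constant 5.0 is an assumption of mine):

```python
import math

def exponential_pruning_rate(step, num_steps, init_rate, target_rate, decay=5.0):
    """Pruning rate after `step` of `num_steps` pruning steps: starts at
    `init_rate` and approaches `target_rate` exponentially, so early
    steps prune aggressively and later steps refine."""
    if step >= num_steps:
        return target_rate
    progress = (1.0 - math.exp(-decay * step / num_steps)) / (1.0 - math.exp(-decay))
    return init_rate + (target_rate - init_rate) * progress

# 15 pruning steps from 5% to 80% of FLOPs, as in the progressive runs.
rates = [exponential_pruning_rate(s, 15, 0.05, 0.8) for s in range(16)]
```

Between steps the model is fine-tuned, so the network adapts to each intermediate pruning rate instead of absorbing the full 0.8 cut at once.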
Resnet-18-cifar results:
Original acc = 75.51%
LeGR:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
Filter pruning + geomean | 74.97 | 74.29 | 68.69 |
LeGR | 74.12 | 73.77 | 68.25 |
LeGR | 74.69 | 74.07 | 68.52 |
LeGR | 74.83 | 73.56 | 72.17 |
MEAN(LeGR) | 74.55 | 73.80 | 69.65 |
STD(LeGR) | 0.38 | 0.26 | 2.19 |
LeGR, 200 generations:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, 200 generations | 74.60 | 72.95 | 71.81 |
LeGR, 200 generations | 74.33 | 73.13 | 69.43 |
LeGR, 200 generations | 74.48 | 73.37 | 68.55 |
MEAN(LeGR, 200 generations) | 74.47 | 73.15 | 69.93 |
STD(LeGR, 200 generations) | 0.14 | 0.21 | 1.69 |
LeGR with progressive fine-tuning:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, progressive | 74.30 | 74.20 | 72.94 |
LeGR, progressive | 75.27 | 74.47 | 73.14 |
LeGR, progressive | 74.58 | 74.01 | 73.38 |
MEAN(LeGR, progressive) | 74.72 | 74.23 | 73.15 |
STD(LeGR, progressive) | 0.50 | 0.23 | 0.22 |
Resnet-50-cifar results:
Original acc = 75.1%
LeGR:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
Filter pruning + geomean | 75.05 | 75.05 | 74.53 |
LeGR | 75.46 | 75.53 | 75.11 |
LeGR | 75.94 | 75.17 | 74.62 |
LeGR | 75.58 | 75.33 | 75.05 |
MEAN(LeGR) | 75.66 | 75.34 | 74.93 |
STD(LeGR) | 0.25 | 0.18 | 0.27 |
LeGR, 200 generations:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, 200 generations | 75.75 | 75.28 | 74.98 |
LeGR, 200 generations | 75.47 | 75.39 | 74.82 |
LeGR, 200 generations | 75.66 | 75.47 | 74.49 |
MEAN(LeGR, 200 generations) | 75.63 | 75.38 | 74.76 |
STD(LeGR, 200 generations) | 0.14 | 0.10 | 0.25 |
LeGR with progressive fine-tuning:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, progressive | 75.02 | 75.56 | 74.51 |
LeGR, progressive | 75.22 | 74.92 | 75.05 |
LeGR, progressive | 75.33 | 75.17 | 74.71 |
MEAN(LeGR, progressive) | 75.19 | 75.22 | 74.76 |
STD(LeGR, progressive) | 0.16 | 0.32 | 0.27 |
Inception_v3 results:
Original acc = 77.7%
LeGR:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
Filter pruning + geomean | 78.17 | 77.68 | 75.51 |
LeGR | 78.10 | 76.63 | 74.90 |
LeGR | 77.80 | 78.00 | 73.59 |
MEAN(LeGR) | 77.95 | 77.32 | 74.25 |
STD(LeGR) | 0.21 | 0.97 | 0.93 |
LeGR, 200 generations:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, 200 generations | 78.02 | 77.60 | 75.59 |
LeGR, 200 generations | 78.01 | 76.86 | 75.73 |
LeGR, 200 generations | 77.88 | 77.24 | 75.72 |
MEAN(LeGR, 200 generations) | 77.97 | 77.23 | 75.68 |
STD(LeGR, 200 generations) | 0.08 | 0.37 | 0.08 |
LeGR with progressive fine-tuning:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, progressive | 78.05 | 77.79 | 77.25 |
LeGR, progressive | 78.13 | 78.13 | 77.07 |
LeGR, progressive | 78.09 | 78.07 | 77.80 |
MEAN(LeGR, progressive) | 78.09 | 78.00 | 77.37 |
STD(LeGR, progressive) | 0.04 | 0.18 | 0.38 |
MobilenetV2 results:
Original acc = 65.65%
LeGR:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
Filter pruning + geomean | 62.68 | 55.46 | 45.78 |
LeGR | 66.02 | 63.62 | 56.70 |
LeGR | 65.52 | 62.51 | 54.29 |
LeGR | 65.32 | 63.54 | 55.79 |
MEAN(LeGR) | 65.62 | 63.22 | 55.59 |
STD(LeGR) | 0.36 | 0.62 | 1.22 |
LeGR, 200 generations:
Algorithm\FLOPs PR | 0.4 | 0.6 | 0.8 |
---|---|---|---|
LeGR, 200 generations | 65.64 | 63.40 | 54.36 |
LeGR, 200 generations | 65.41 | 62.43 | 54.90 |
LeGR, 200 generations | 65.39 | 63.48 | 52.75 |
MEAN(LeGR, 200 generations) | 65.48 | 63.10 | 54.00 |
STD(LeGR, 200 generations) | 0.14 | 0.58 | 1.12 |
Results summary:
LeGR vs Filter Pruning:
- There is no definitive conclusion: on resnet18 LeGR is better at the 0.8 pruning rate and worse at the others; on resnet-50 LeGR is significantly better at all pruning rates; on inception_v3 LeGR is unstable and on average worse than the original Filter Pruning at all pruning rates.
- Large variance of results across different trials
LeGR vs LeGR with 200 generations of evolution algorithm:
- LeGR with 200 generations shows on average better (or comparable) results on all models
- The variance of the final accuracy is significantly lower in the 200-generation case
LeGR vs LeGR with progressive fine-tuning:
- Progressive fine-tuning shows on average better results than plain LeGR (much better at the biggest pruning rate, 0.8)
- Progressive fine-tuning shows much lower variance in the final results
LeGR was merged into the code base in #501.