
openvinotoolkit / nncf


Neural Network Compression Framework for enhanced OpenVINO™ inference

License: Apache License 2.0

Python 99.31% C++ 0.08% Cuda 0.39% C 0.01% PureBasic 0.11% Makefile 0.09%
bert classification compression deep-learning hawq mixed-precision-training mmdetection nlp object-detection onnx openvino pruning pytorch quantization quantization-aware-training semantic-segmentation sparsity tensorflow transformers

nncf's People

Contributors

0de554k, a-ignatyev, alexanderdokuchaev, alexkoff88, alexsu52, andrey-churkin, andreyanufr, asenina, daniaffch, daniil-lyakhov, evgeniya-egupova, gadylshintr, jpablomch, kodiaqq, kshpv, ksilligan, l-bat, ljaljushkin, lzrvch, maximproshin, mkaglins, negvet, nikita-savelyevv, p-wysocki, pfinashx, skholkin, vinnamkim, vshampor, vuiseng9, wonjuleee


nncf's Issues

PyTorch 1.6.0 seems to leak memory in conv2d

I'm using PyTorch 1.6.0 with a single conv; my code is as follows:
inputs = torch.randn(1, 3, 512, 512).cuda()
conv = torch.nn.Conv2d(3, 64, (7, 7), stride=2, padding=3, bias=False).cuda()
output = conv(inputs)
Before executing the conv operation, the GPU memory usage reported by nvidia-smi is 1019 MB; after the conv operation it is 1429 MB, so the conv consumes about 410 MB. I know im2col may consume a lot of memory when the input size is large. What I can't understand is why the GPU memory usage does not drop back after the conv operation. Is there a memory leak in conv2d, or is something wrong with my experiment?
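A quick way to tell a genuine leak from PyTorch's caching allocator (which keeps freed blocks for reuse, so nvidia-smi stays high) is to read the allocator counters directly; a minimal sketch for PyTorch 1.6:

    import torch

    inputs = torch.randn(1, 3, 512, 512).cuda()
    conv = torch.nn.Conv2d(3, 64, (7, 7), stride=2, padding=3, bias=False).cuda()
    output = conv(inputs)

    print(torch.cuda.memory_allocated())  # bytes held by live tensors
    print(torch.cuda.memory_reserved())   # bytes cached by the allocator

    del output
    torch.cuda.empty_cache()              # return cached blocks to the driver
    print(torch.cuda.memory_allocated())  # drops once the tensor is freed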

How to cite your work?

Hi,
do you have a publication or an arXiv preprint for NNCF? How should I cite your work properly?

Can I use it to train an ssd512 model only?

I used it to train an ssd512_vgg model, but it crashed with a NotImplementedError from compression_ctrl.compression_level(). I did not configure a compression algorithm in ssd512_vgg_voc.json; can I do it this way?
INFO:nncf:Creating compression algorithm: NoCompressionAlgorithmBuilder
WARNING:nncf:Graphviz is not installed - only the .dot model visualization format will be used. Install pygraphviz into your Python environment and graphviz system-wide to enable PNG rendering.
Training ssd_vgg on coco dataset...
/home/mechmind/projects/nncf_pytorch/examples/object_detection/utils/augmentations.py:257: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)
(the above warning is repeated several times)
0: iter 0 epoch 0 || Loss: 2.728 || Time 0.4711s || lr: 0.0001 || CR loss: 0
0: iter 10 epoch 0 || Loss: 2.714 || Time 2.046s || lr: 0.0001 || CR loss: 0
0: iter 20 epoch 0 || Loss: 3.65 || Time 1.951s || lr: 0.0001 || CR loss: 0
0: iter 30 epoch 0 || Loss: 3.013 || Time 1.964s || lr: 0.0001 || CR loss: 0
0: iter 40 epoch 0 || Loss: 2.639 || Time 1.952s || lr: 0.0001 || CR loss: 0
0: iter 50 epoch 0 || Loss: 2.53 || Time 1.956s || lr: 0.0001 || CR loss: 0
0: iter 60 epoch 0 || Loss: 2.034 || Time 1.957s || lr: 0.0001 || CR loss: 0
0: iter 70 epoch 0 || Loss: 1.776 || Time 1.953s || lr: 0.0001 || CR loss: 0
0: iter 80 epoch 0 || Loss: 1.496 || Time 1.965s || lr: 0.0001 || CR loss: 0
0: iter 90 epoch 0 || Loss: 1.95 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 100 epoch 0 || Loss: 1.523 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 110 epoch 0 || Loss: 2.282 || Time 1.975s || lr: 0.0001 || CR loss: 0
0: iter 120 epoch 0 || Loss: 1.326 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 130 epoch 0 || Loss: 1.398 || Time 1.967s || lr: 0.0001 || CR loss: 0
0: iter 140 epoch 0 || Loss: 1.422 || Time 1.981s || lr: 0.0001 || CR loss: 0
0: iter 150 epoch 0 || Loss: 1.011 || Time 1.975s || lr: 0.0001 || CR loss: 0
0: iter 160 epoch 0 || Loss: 1.024 || Time 1.976s || lr: 0.0001 || CR loss: 0
0: iter 170 epoch 0 || Loss: 1.283 || Time 1.974s || lr: 0.0001 || CR loss: 0
0: iter 180 epoch 0 || Loss: 1.035 || Time 1.977s || lr: 0.0001 || CR loss: 0
0: iter 190 epoch 0 || Loss: 0.9065 || Time 1.991s || lr: 0.0001 || CR loss: 0
0: iter 200 epoch 0 || Loss: 1.312 || Time 1.995s || lr: 0.0001 || CR loss: 0
0: iter 210 epoch 0 || Loss: 1.238 || Time 1.976s || lr: 0.0001 || CR loss: 0
Traceback (most recent call last):
File "main.py", line 378, in
main(sys.argv[1:])
File "main.py", line 81, in main
start_worker(main_worker, config)
File "/home/mechmind/projects/nncf_pytorch/examples/common/execution.py", line 99, in start_worker
main_worker(current_gpu=config.gpu_id, config=config)
File "main.py", line 188, in main_worker
train(net, compression_ctrl, train_data_loader, test_data_loader, criterion, optimizer, config, lr_scheduler)
File "main.py", line 301, in train
compression_level = compression_ctrl.compression_level()
File "/home/mechmind/projects/nncf_pytorch/nncf/compression_method_api.py", line 166, in compression_level
raise NotImplementedError()
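Not an official fix, but a minimal workaround sketch for the sample's training loop, assuming compression_level() is only needed for checkpoint bookkeeping: fall back gracefully when no compression algorithm is configured.

    # Hypothetical guard around the failing call in the example's train():
    try:
        compression_level = compression_ctrl.compression_level()
    except NotImplementedError:
        # NoCompressionAlgorithmBuilder's controller does not implement this.
        compression_level = None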

Revise mixed-precision related content

I have three comments/proposals:

  • Please add compression_ratio to the template file inside the quantization README
  • Create a separate folder for the sample configs at the same level as quantization, pruning, etc. Let's call it mixed_precision
  • Move all the HAWQ-related configs into this folder and minimize the scope of parameters in these configs, removing as much as possible and letting the rest be the defaults.

Saving and Loading compressed model in pytorch as pytorch model object

I am facing an issue: when I try torch.save(model, model_path), it throws a TypeError: can't pickle odict_values objects error. For my project I want to save the compressed model as a torch model object and load it to run prediction on new images. If anyone can help me out here, that would be really great.
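A common workaround sketch (standard PyTorch practice, not NNCF-specific guidance): persist the state_dict instead of pickling the whole wrapped model object, then rebuild the model the same way as for training and load the weights for inference.

    import torch

    # Save only parameters and buffers; these pickle cleanly.
    torch.save(model.state_dict(), model_path)

    # Later: recreate the (compressed) model exactly as during training,
    # then restore the weights and switch to inference mode.
    model.load_state_dict(torch.load(model_path))
    model.eval()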

compression_loss

compression_loss always equals zero in my training process; has anyone seen the same problem?

Slower inference with INT8 for NNCF compared to Post-Training Optimization Toolkit and FP32

Hi, thank you for providing these useful tools. Currently, I'm working on INT8 quantization with both NNCF and POT. I've noticed that the POT model's inference is faster than FP32, which totally makes sense; however, the NNCF model is not only slower than the POT one but also slower than the original FP32 model. The benchmark tool results are as follows:

Original FP32:

[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 57.23 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 294.24 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input0' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input0' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 1842 iterations
Duration: 60044.23 ms
Latency: 32.15 ms
Throughput: 30.68 FPS

========================================================================

POT INT8:

[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 86.67 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 411.73 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input0' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input0' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 3245 iterations
Duration: 60032.85 ms
Latency: 18.25 ms
Throughput: 54.05 FPS

===========================================================================

NNCF INT8:

[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 114.43 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 599.62 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'result.1' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'result.1' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 1291 iterations
Duration: 60082.78 ms
Latency: 46.00 ms
Throughput: 21.49 FPS

===========================================================================

These results were collected on an Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz.

We have found a difference between the NNCF and POT IR models: the FakeQuantize layer and the activation function appear in the opposite order, which leads to more parameters in the FakeQuantize layers of the NNCF model. The Netron visualizations are shown below:

POT INT8: (Netron screenshot omitted)

NNCF INT8: (Netron screenshot omitted)

Quantify model acceleration

Hello, how does the quantized model (INT8) compare with the original model (FP32) in inference speed? Thank you!

Revise quantization levels for weights

The original problem is that the QuantizeLinear and DequantizeLinear operations in ONNX do not support a shrunk range of quantization levels, so we can only correctly export 256 levels for weights and activations. On the other hand, 255 levels for weights were introduced to work around the saturation issue on AVX targets. However, we do not know how much this actually affects accuracy or helps.

My proposal is to keep the full range of 2^bits levels but use a different saturation workaround that we know actually works: use 128 of the 256 levels in the following way:

y = w*a = [sw * wq] * [sa * aq] * 1/sw * 1/sa = [sw * wq / 2] * [sa * aq] * 2/sw * 1/sa

It means that we divide the weights by a factor of 2.0 and adjust the output scales of the Dequantize operation (output_high and output_low in FQ) by multiplying them by 2.0.
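A tiny numeric check of the identity above (plain tensors standing in for the dequantized products; no quantizers involved): halving the weights while doubling the output scale leaves the result unchanged.

    import torch

    w = torch.randn(16, 8)  # stands in for s_w * w_q
    a = torch.randn(8, 4)   # stands in for s_a * a_q
    y = w @ a
    y_adjusted = ((w / 2) @ a) * 2  # halved weights, doubled output scale
    assert torch.allclose(y, y_adjusted, atol=1e-6)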

This is relevant for INT8 only!

We need to plan this for the next release. cc'ed @alexsu52, @kchechil

mmdetection onnx convertation

Hi! I've trained a Cascade R-CNN model with mmdetection, using your patch and a modified config. To convert to ONNX I used mmdetection's built-in converter script, but it looks like I got the same model as before optimization.

Is it necessary to use your ONNX exporter? How can I do it with the mmdetection training pipeline?

I concluded the models are the same because after converting to OpenVINO IR I got the same inference performance; also, my ONNX graph doesn't contain anything like FakeQuantize layers.

Thanks,
Vladimir

Training Performance Degradation

We observed a training-time regression of about 28%. Details follow.

For 30 epochs of ResNet50 fine-tuning, the elapsed-time gap between the two commits is 6 hours.

python examples/classification/main.py \
    -m train \
    --config examples/classification/configs/quantization/resnet50_imagenet_mixed_int_manual.json \
    --data <imagenet_dataset_path> \
    --workers 16 \
    --log-dir ./resnet50_train_run

Environment A (commit: a0c1c2b): 43mins per epoch

mkdir nncf-a0c1c2bf && cd $_
python3 -m venv env
source env/bin/activate
git clone https://github.com/openvinotoolkit/nncf_pytorch && cd nncf_pytorch
git checkout a0c1c2bf 
pip install -r requirements.txt

0:: Epoch: [0][8600/8657] Lr: 0.00031 Time: 0.289 (0.297**) Data: 0.000 (0.002) CE_loss: 2.0819 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 2.0819 (2.2970) Acc@1: 54.054 (48.777) Acc@5: 67.568 (73.302)
0:: Epoch: [0][8610/8657] Lr: 0.00031 Time: 0.287 (0.297**) Data: 0.000 (0.002) CE_loss: 2.5287 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 2.5287 (2.2970) Acc@1: 54.054 (48.778) Acc@5: 70.270 (73.305)
0:: Epoch: [0][8620/8657] Lr: 0.00031 Time: 0.324 (0.297**) Data: 0.001 (0.002) CE_loss: 2.4854 (2.2966) CR_loss: 0.0000 (0.0000) Loss: 2.4854 (2.2966) Acc@1: 45.946 (48.784) Acc@5: 75.676 (73.311)
0:: Epoch: [0][8630/8657] Lr: 0.00031 Time: 0.288 (0.297**) Data: 0.000 (0.002) CE_loss: 2.7068 (2.2965) CR_loss: 0.0000 (0.0000) Loss: 2.7068 (2.2965) Acc@1: 37.838 (48.788) Acc@5: 62.162 (73.311)
0:: Epoch: [0][8640/8657] Lr: 0.00031 Time: 0.301 (0.297**) Data: 0.001 (0.002) CE_loss: 2.3907 (2.2962) CR_loss: 0.0000 (0.0000) Loss: 2.3907 (2.2962) Acc@1: 45.946 (48.794) Acc@5: 64.865 (73.316)
0:: Epoch: [0][8650/8657] Lr: 0.00031 Time: 0.281 (0.297**) Data: 0.000 (0.002) CE_loss: 2.2093 (2.2957) CR_loss: 0.0000 (0.0000) Loss: 2.2093 (2.2957) Acc@1: 51.351 (48.805) Acc@5: 72.973 (73.324)

Environment B (commit: a27da4f): 55mins per epoch

mkdir nncf-a27da4fb && cd $_
python3 -m venv env
source env/bin/activate
git clone https://github.com/openvinotoolkit/nncf_pytorch && cd nncf_pytorch
git checkout a27da4fb 
pip install -r requirements.txt

0:: Epoch: [0][8600/8657] Lr: 0.00031 Time: 0.367 (0.382**) Data: 0.072 (0.080) CE_loss: 2.0623 (2.2975) CR_loss: 0.0000 (0.0000) Loss: 2.0623 (2.2975) Acc@1: 56.757 (48.684) Acc@5: 75.676 (73.327)
0:: Epoch: [0][8610/8657] Lr: 0.00031 Time: 0.449 (0.382**) Data: 0.156 (0.080) CE_loss: 2.1209 (2.2974) CR_loss: 0.0000 (0.0000) Loss: 2.1209 (2.2974) Acc@1: 43.243 (48.684) Acc@5: 78.378 (73.327)
0:: Epoch: [0][8620/8657] Lr: 0.00031 Time: 0.367 (0.382**) Data: 0.073 (0.080) CE_loss: 1.9419 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 1.9419 (2.2970) Acc@1: 59.459 (48.691) Acc@5: 81.081 (73.334)
0:: Epoch: [0][8630/8657] Lr: 0.00031 Time: 0.368 (0.382**) Data: 0.073 (0.080) CE_loss: 2.2480 (2.2967) CR_loss: 0.0000 (0.0000) Loss: 2.2480 (2.2967) Acc@1: 45.946 (48.696) Acc@5: 72.973 (73.338)
0:: Epoch: [0][8640/8657] Lr: 0.00031 Time: 0.386 (0.382**) Data: 0.085 (0.080) CE_loss: 2.2206 (2.2964) CR_loss: 0.0000 (0.0000) Loss: 2.2206 (2.2964) Acc@1: 56.757 (48.701) Acc@5: 72.973 (73.344)
0:: Epoch: [0][8650/8657] Lr: 0.00031 Time: 0.346 (0.382**) Data: 0.068 (0.080) CE_loss: 1.9694 (2.2959) CR_loss: 0.0000 (0.0000) Loss: 1.9694 (2.2959) Acc@1: 48.649 (48.710) Acc@5: 78.378 (73.352)

Common Setup for Both Environments

Hardware: Xeon Gold, 4x V100
Python: 3.7.6
torch: 1.6.0
CUDA: 10.2

Performance gap in mmdetection

I'm sorry, I am back.
I have been trying retinanet_r50_fpn_1x_int8.py in mmdetection; there is nothing wrong in training and evaluation.
But I only got the result below on coco_2017_val, while the docs show that RetinaNet can reach 34.7 or 35.3 average box mAP on the coco_2017_val dataset.
My results are as follows:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.260
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.420
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.272
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.143
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.292
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.336
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.258
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.425
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.453
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.260
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600

I didn't change anything in the config file except changing samples_per_gpu from 6 to 4, because my GPU can't allocate enough memory. My CUDA version is 10.2, PyTorch is 1.6.0.
If you need any other information, please contact me.

Quantize Mask-RCNN

Quantize Mask-RCNN to INT8 so that it has a <1% accuracy drop compared to FP32.
This includes generation of the following models in ONNX format as output:

  • model with FakeQuantize
  • model with QuantizeLinear/DequantizeLinear

Accuracy results are needed as well.

cc @AlexKoff88

OpenVINO test

Hi,

I managed to take one of the detection models and successfully convert it with mo_onnx.py (provided by the OpenVINO toolkit) to generate the binary and XML files. However, I have not found any documentation on how to run such models on OpenVINO. I have already run models from the TensorFlow Object Detection API, but I'm interested in running your quantized model on OpenVINO, so if you could provide some sample scripts for this, that would be great. Thank you.
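In the meantime, a minimal inference sketch with the 2020.x Inference Engine Python API (file names, device, and input shape below are placeholders):

    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")
    exec_net = ie.load_network(network=net, device_name="CPU")

    input_name = next(iter(net.input_info))
    image = np.zeros((1, 3, 640, 640), dtype=np.float32)  # NCHW placeholder
    result = exec_net.infer({input_name: image})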

Incompatibility with python 3.8

I noticed an incompatibility of NNCF with Python 3.8.
The problem occurs during installation of one of the dependencies of NNCF and seems to be caused by the fact that platform.linux_distribution was removed in Python 3.8:

  Downloading matplotlib-3.0.3.tar.gz (36.6 MB)
    ERROR: Command errored out with exit status 1:
     command: /opt/home/k8sworker/cibuilds/impt/nncf_for_digits-9/src/model_templates/.venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0zqb86kn/matplotlib/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0zqb86kn/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-0zqb86kn/matplotlib/pip-egg-info
         cwd: /tmp/pip-install-0zqb86kn/matplotlib/
    Complete output (51 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-0zqb86kn/matplotlib/setup.py", line 225, in <module>
        msg = pkg.install_help_msg()
      File "/tmp/pip-install-0zqb86kn/matplotlib/setupext.py", line 650, in install_help_msg
        release = platform.linux_distribution()[0].lower()
    AttributeError: module 'platform' has no attribute 'linux_distribution'

MMDetection fine tuning error

I am getting the following error when running the demo retinanet_r50_fpn_1x_int8.py example. Any suggestions as to what could be causing it?

    args_kwargs_tuple = data_loader.get_inputs(loaded_item)
  File "/home/.conda/envs/nncf2/lib/python3.6/site-packages/nncf-1.4.1-py3.6.egg/nncf/initialization.py", line 56, in get_inputs
    raise NotImplementedError
NotImplementedError

I followed the master branch of nncf and mmdet commit c77ccbbf235c0eb50a4440698eefc2ae199f837f.

[Quantization] Support for fusing non-ReLU activations

In the pattern-based approach, NNCF interprets a non-ReLU activation as a standalone operation whose input must be quantized; that is, a FakeQuantize operation is inserted into the graph before each non-ReLU activation. This blocks fusing the non-ReLU activation into a core operation such as conv.

Percentile-based initialization fails in per-channel quantization case

File "/nncf/initialization.py", line 170, in _apply_initializers initializer.apply_init() File "/nncf/quantization/init_range.py", line 223, in apply_init self.quantize_module.apply_minmax_init(mins_tensor, maxs_tensor, self.log_module_name) File "/nncf/quantization/layers.py", line 293, in apply_minmax_init self.scale.masked_scatter_(torch.gt(abs_max, SCALE_LOWER_THRESHOLD), abs_max) RuntimeError: invalid argument 2: source nElements must be == mask1 elements at /pytorch/aten/src/THC/generic/THCTensorMasked.cu:134

We should cover this case in the pre-commit tests.

Locked when training

It seems that the program creates the /tmp/torch_extensions directory, which will be locked on the second run if the first run failed.
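If this is the lock left behind by torch.utils.cpp_extension after a failed build, a hedged workaround sketch is to clear the extension build cache (or redirect it via the TORCH_EXTENSIONS_DIR environment variable) before retrying:

    import os
    import shutil

    # Default build cache used by torch.utils.cpp_extension; a stale lock
    # here can block the next run after a failed compilation.
    cache_dir = os.environ.get("TORCH_EXTENSIONS_DIR", "/tmp/torch_extensions")
    shutil.rmtree(cache_dir, ignore_errors=True)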

KeyError: 'quantization_range_init_args'

This error occurs when I quantize an FP32 pretrained model; is this a bug?
Traceback (most recent call last):
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 961, in init_range
range_init_args = self.quantization_config.get_extra_struct(QuantizationRangeInitArgs)
File "/home/mechmind/projects/nncf_pytorch/nncf/config.py", line 56, in get_extra_struct
return self.__nncf_extra_structs[struct_cls.get_id()]
KeyError: 'quantization_range_init_args'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 381, in
main(sys.argv[1:])
File "main.py", line 81, in main
start_worker(main_worker, config)
File "/home/mechmind/projects/nncf_pytorch/examples/common/execution.py", line 99, in start_worker
main_worker(current_gpu=config.gpu_id, config=config)
File "main.py", line 152, in main_worker
compression_ctrl, net = create_model(config, resuming_model_state_dict)
File "main.py", line 239, in create_model
compression_ctrl, compressed_model = create_compressed_model(ssd_net, config.nncf_config, resuming_model_sd)
File "/home/mechmind/projects/nncf_pytorch/nncf/model_creation.py", line 126, in create_compressed_model
compression_ctrl = compressed_model.commit_compression_changes()
File "/home/mechmind/projects/nncf_pytorch/nncf/nncf_network.py", line 416, in commit_compression_changes
return self._builders[0].build_controller(self)
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 200, in build_controller
self._hw_precision_constraints)
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 816, in init
self.initialize_quantizer_params()
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 893, in initialize_quantizer_params
self.init_range()
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 964, in init_range
'Should run range initialization as specified via config,'
ValueError: Should run range initialization as specified via config,but the initializing data loader is not provided as an extra struct. Refer to NNCFConfig.register_extra_structs and the QuantizationRangeInitArgs class
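Per the last error message, the fix is to register an initializing data loader as an extra struct on the NNCF config before creating the compressed model. A minimal sketch (class and method names are taken from the error text; exact import paths may differ between NNCF versions):

    from nncf import NNCFConfig, create_compressed_model
    from nncf.structures import QuantizationRangeInitArgs  # path may vary by version

    nncf_config = NNCFConfig.from_json("ssd512_vgg_voc.json")
    # Provide the data loader used for quantizer range initialization.
    nncf_config.register_extra_structs(
        [QuantizationRangeInitArgs(data_loader=train_data_loader)]
    )
    compression_ctrl, compressed_model = create_compressed_model(ssd_net, nncf_config)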

compress loss = 0

After integrating NNCF into mmdetection and training EfficientNet (a classification task),

compression_loss = compression_ctrl.loss()

always evaluates to 0.
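For context, a sketch of how the compression loss is meant to enter a QAT training step (names follow the NNCF samples); note that for plain quantization without sparsity or pruning regularizers, a compression loss of zero is expected, since fake-quantization adds no extra loss term.

    # Typical pattern from the NNCF samples: add the compression loss
    # to the task loss; for pure INT8 quantization it is legitimately 0.
    output = compressed_model(images)
    loss = criterion(output, target) + compression_ctrl.loss()
    loss.backward()
    optimizer.step()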

Subclass QuantizerConfig for INT-N and BFP specialization

Some class-level specialization might be in order here; otherwise we end up with a situation where INT-N only uses half of the available config structs, and certain quantizer configs won't correspond to any real quantizer.
class IntNQuantizerConfig(QuantizerConfig): and class BFPQuantizerConfig(QuantizerConfig): - what do you think?

Originally posted by @vshampor in #137 (comment)

compressed model

How can I get the compressed model and find the compression ratio, which is an important concern in deep compression?
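A hedged pointer, since the exact API varies by NNCF version: the compression controller returned by create_compressed_model exposes the compression statistics (sparsity level, pruning rate, quantizer counts), which is where a compression-ratio figure would come from.

    # Sketch (NNCF ~v1.x API): inspect the controller statistics.
    compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)
    print(compression_ctrl.statistics())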

examples/classification error

Hi, I ran main.py from examples/classification and encountered an error: FileNotFoundError: [Errno 2] No such file or directory: '/home/sky/anaconda3/lib/python3.7/site-packages/nncf-1.3.2-py3.7.egg/nncf/extensions/src/quantization/cpu/functions_cpu.cpp'.
How can I deal with this error? Thank you!

Mixed-precision misquantization

While training an object detection model with a HAWQ config, I noticed there are many more int8 activation quantizers than int8 weight quantizers. According to Netron, some convolutions take int8 activations and int4 weights. I suppose that's not how things should be. What do you think?
(Netron screenshot omitted.)

NNCF skips insert positions for FakeQuantizer

Comparing the graphs of ssd300 compressed by NNCF and by POT, I noticed that they are not the same, although one of the requirements for NNCF is to build a POT-like graph. Moreover, it seems that NNCF did not insert several FakeQuantizers where I expected them. There are two images below:

  1. The start of the NNCF-compressed model graph. I believe there should be one more FakeQuantizer (I underlined the location in red).
  2. One more place where the NNCF graph differs from the POT one. The same model location is underlined in red in both; NNCF did not put any FakeQuantizer there, while POT did. The left image corresponds to POT and the right to NNCF.

If you would like to look at the full model graphs, please contact me and I will share them here or privately.

(Screenshots omitted. Left image: POT; right image: NNCF.)

How to get an INT8 IR model (via an INT8 ONNX model)? OpenVINO mo.py --data-type only supports fp16 and fp32

I want to fine-tune a detection model with INT8 quantization awareness and convert it to an INT8 IR model to achieve acceleration. The problem is that I cannot find a way to export an INT8 IR model.

I found the statement below here, but the tutorial link just points to the OpenVINO top page.

To export a model to OpenVINO IR and run it using Intel Deep Learning Deployment Toolkit please refer to this tutorial.

I also searched precision-related pages like this in the OpenVINO Developer Guides, but could not find any helpful info.

I know the Model Optimizer tool mo.py can convert ONNX to an IR model, but its --data-type option only supports fp16 and fp32.
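For reference, the usual flow (a hedged sketch based on the NNCF export API seen elsewhere in this tracker): export the fine-tuned model to ONNX with FakeQuantize ops via the compression controller, then convert with Model Optimizer using a float --data-type; the FakeQuantize ops carry the quantization parameters, and the OpenVINO runtime executes those layers in INT8.

    # Export the quantization-aware model to ONNX; FakeQuantize ops are kept.
    compression_ctrl.export_model("model_int8.onnx")

    # Then convert with Model Optimizer; fp32 here only describes the
    # remaining floating-point tensors, not the fake-quantized layers:
    #   python mo.py --input_model model_int8.onnx --data_type FP32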

Long graph processing for quantization DENSENET161

Creating a compressed model with the quantization algorithm takes very long for DenseNet-161 (looks to me like an endless loop while processing the graph).

Steps to reproduce:

  1. Create a Python 3.6 env
  2. Install nncf (use the instructions from the README)
  3. Run in a terminal: python examples/classification/main.py --config examples/classification/configs/quantization/densenet161_imagenet_custom_quant_pattern.json --data <path_to_dataset>

Variability in SSD Mixed-Precision Performance

We observed large variability in performance with SSD300 (VGG) when we tried different combinations of weight and activation precision in test mode. From the numbers collected below, the gap between the best and worst cases is about 15x: in the worst case (int2 weights, int8 activations), inference for one batch (size 128) takes about 30 s. The performance impact should affect fine-tuning mode as well.

NNCF version: develop branch, commit 2a681b8 (similar observations with v1.4)
Baseline config: https://github.com/openvinotoolkit/nncf_pytorch/blob/develop/examples/object_detection/configs/ssd300_vgg_voc_int8.json
Platform: V100 GPU

weights | activations | detection elapse (batches 8/39, 9/39, 10/39)
8 | 8 | 1.847s, 1.844s, 1.876s
8 | 4 | 8.792s, 9.021s, 8.928s
8 | 2 | 17.09s, 17.80s, 17.87s
4 | 8 | 2.283s, 2.105s, 2.296s
4 | 4 | 8.285s, 9.583s, 7.425s
4 | 2 | 11.31s, 11.85s, 12.61s
2 | 8 | 29.40s, 30.75s, 29.30s
2 | 4 | 5.684s, 5.703s, 5.539s
2 | 2 | 5.954s, 6.040s, 6.159s

Merge activation quantizers after HAWQ init in propagation mode.

Currently, merging of activation quantizers always happens when all affected quantizers have a consistent bit-width, as in the diagram below.
(diagram omitted)

But HAWQ may choose a more accurate configuration in which the merge is not possible:
(diagram omitted)

Before implementing this feature, some research into the possible performance gain of both schemes is required (consider the overhead of re-quantization and compare which configuration is faster).

@asenina @vshampor @AlexKoff88

cifar10

I don't know how to train on the CIFAR10 dataset; it always reports an error when there is no val folder. Can someone tell me how to do it?

How to get mmdetection ssd300_coco_int8 quantized model?

I followed the branch and ran the ssd300_coco_int8 quantization-aware training:
python tools/train.py configs/nncf_compression/ssd/ssd300_coco_int8.py
It creates an output folder containing .pth files, but when I load these, the weights are of type torch.cuda.FloatTensor, i.e. 32-bit floating point. Please tell me how I can get the INT8 (torch.int8) model weights.

"Class conv_transpose2d is not found" when exporting a pruning-optimized model

I tried pruning optimization (pruning only) for my detection model.
I got the following error when calling compression_ctrl.export_model():

  File "nncf_pytorch/nncf/compression_method_api.py", line 213, in export_model
    self.prepare_for_export()
  File "nncf_pytorch/nncf/pruning/filter_pruning/algo.py", line 204, in prepare_for_export
    model_pruner.prune_model()
  File "nncf_pytorch/nncf/pruning/export_helpers.py", line 392, in prune_model
    self.mask_propagation()
  File "nncf_pytorch/nncf/pruning/export_helpers.py", line 315, in mask_propagation
    cls = self.get_class_by_type_name(node_type)()
  File "nncf_pytorch/nncf/pruning/export_helpers.py", line 303, in get_class_by_type_name
    raise RuntimeError("Class {} is not found".format(type_name))
RuntimeError: Class conv_transpose2d is not found

Is it a bug, or is torch.nn.ConvTranspose2d not supported?

Pruning itself seems to be working, judging from the training log: the Mask zero %, PR, and Filter PR columns printed by the print_statistics function are above 0.

Two questions about using retinanet_r50_fpn_1x_int8 in mmdetection

When I train the retinanet_r50_fpn_1x_int8 demo in mmdetection, the training process has no problems, but when it gets to evaluation it runs into the following error:

File "/home/amax/projects/mech_learning/tools/train.py", line 216, in main
meta=meta)
File "/home/amax/projects/mech_learning/mmdet/apis/train.py", line 149, in train_detector
compression_ctrl=compression_ctrl)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 46, in train
self.call_hook('after_train_epoch')
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 282, in call_hook
getattr(hook, fn_name)(self)
File "/home/amax/projects/mech_learning/mmdet/core/evaluation/eval_hooks.py", line 27, in after_train_epoch
results = single_gpu_test(runner.model, self.dataloader, show=False)
File "/home/amax/projects/mech_learning/mmdet/apis/test.py", line 36, in single_gpu_test
result = model(return_loss=False, rescale=True, **data)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 81, in wrapped
return module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 81, in wrapped
return module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/debug.py", line 82, in decorated
retval = forward_func(self, *args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/nncf_network.py", line 366, in forward
retval = self.get_nncf_wrapped_model()(*args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/core/fp16/decorators.py", line 51, in new_func
return old_func(*args, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/base.py", line 180, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/base.py", line 156, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/single_stage.py", line 111, in simple_test
*outs, img_metas, rescale=rescale)
File "/home/amax/projects/mech_learning/mmdet/core/fp16/decorators.py", line 131, in new_func
return old_func(*args, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/dense_heads/anchor_head.py", line 569, in get_bboxes
scale_factor, cfg, rescale)
File "/home/amax/projects/mech_learning/mmdet/models/dense_heads/anchor_head.py", line 647, in _get_bboxes_single
cfg.max_per_img)
File "/home/amax/projects/mech_learning/mmdet/core/post_processing/bbox_nms.py", line 40, in multiclass_nms
bboxes = bboxes[valid_mask]
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 41, in wrapped
result = operator_info.custom_trace_fn(operator, *args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/patch_pytorch.py", line 71, in call
"input and output tensor count mismatch!".format(operator.name))
RuntimeError: Unable to forward trace through operator getitem - input and output tensor count mismatch!

Should I set --no-validate during training?

Second question: after training one epoch I got a checkpoint file and used the same config file for evaluation. When loading the model I got this error:
unexpected key in source state_dict: nncf_module.backbone.conv1.weight, nncf_module.backbone.conv1.pre_ops.0.op._num_bits, ...
missing keys in source state_dict: backbone.conv1.weight, backbone.bn1.weight, backbone.bn1.bias, backbone.bn1.running_mean, ...
So all the keys in the model mismatch, and the result is empty:

[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 14.3 task/s, elapsed: 350s, ETA: 0s
Evaluating bbox...
Loading and preparing results...
The testing results of the whole dataset is empty.

Am I doing something wrong?
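On the second question, a hedged workaround sketch: the checkpoint was saved from the NNCF-wrapped model, so its keys carry an nncf_module. prefix plus NNCF-internal entries (pre_ops, _num_bits, ...). Either load it into the same wrapped (compressed) model, or strip the prefix and drop the NNCF-specific keys before loading into a plain model. Key names and checkpoint layout below are assumptions:

    import torch

    ckpt = torch.load("epoch_1.pth", map_location="cpu")  # hypothetical path
    state_dict = ckpt.get("state_dict", ckpt)
    cleaned = {
        k[len("nncf_module."):]: v
        for k, v in state_dict.items()
        if k.startswith("nncf_module.")
        and "pre_ops" not in k and "post_ops" not in k
    }
    model.load_state_dict(cleaned, strict=False)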

BatchNorm adaptation results

Some additional results regarding #41.

Forgetting (5 batches with momentum = 0.9, then 10 batches with momentum = 0.1) works as well as the original approach of resetting statistics to zero and using 200 batches.

Iterative layer-by-layer update of BN statistics does not give an accuracy boost: statistics from the previous layers are updated on the go for a given layer due to the rolling stats calculation, and that is sufficient to get good accuracy.

Model | Pruning algo info | Accuracy@1 | Accuracy@5
ResNet18 (BN adapted original, 200 steps) | geometric median criterion, pruning target = 30% | 33.582 | 59.336
ResNet18 (BN adapted w/ forgetting, 10 steps) | geometric median criterion, pruning target = 30% | 33.976 | 58.908
ResNet18 (BN adapted iteratively, 20 steps for each BN node) | geometric median criterion, pruning target = 30% | 33.830 | 58.712

Model | Quantization bitwidths | Quantization mode | Range initializer | Accuracy@1 | Accuracy@5
ResNet18 (BN adapted original, 200 steps) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 66.866 | 87.476
ResNet18 (BN adapted w/ forgetting, 10 steps) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 66.798 | 87.490
ResNet18 (BN adapted iteratively, 20 steps for each BN node) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 66.832 | 87.480
MobilenetV2 (BN adapted original, 200 steps) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 65.216 | 86.304
MobilenetV2 (BN adapted w/ forgetting, 10 steps) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 65.112 | 86.170
MobilenetV2 (BN adapted iteratively, 20 steps for each BN node) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 65.026 | 86.292
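A sketch of the "forgetting" schedule described above (a hypothetical helper, not the NNCF implementation): a few forward passes with high BN momentum to wash out the stale statistics, then more passes with low momentum to settle on the new ones.

    import torch

    def bn_adapt_with_forgetting(model, data_loader, device,
                                 forget_steps=5, settle_steps=10):
        bn_modules = [m for m in model.modules()
                      if isinstance(m, (torch.nn.BatchNorm1d,
                                        torch.nn.BatchNorm2d,
                                        torch.nn.BatchNorm3d))]
        model.train()  # BN running stats update only in train mode
        with torch.no_grad():
            for step, (images, _) in enumerate(data_loader):
                # High momentum first: running stats track each batch closely.
                momentum = 0.9 if step < forget_steps else 0.1
                for bn in bn_modules:
                    bn.momentum = momentum
                model(images.to(device))
                if step + 1 == forget_steps + settle_steps:
                    break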

NMS CUDA kernel fails when it's running on multiple processes and on different GPUs.

The NMS CUDA kernel fails when it is run from multiple processes on different GPUs (even without wrapping in DistributedDataParallel and without dist.init_process_group):

RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at line:
THCudaCheck(cudaMemcpy(&mask_host[0],
                    mask_dev,
                    sizeof(unsigned long long) * boxes_num * col_blocks,
                    cudaMemcpyDeviceToHost));

The same error occurs in multi-process DistributedDataParallel mode on multiple GPUs when the kernel runs after dist.init_process_group but before wrapping in DistributedDataParallel.

It works fine in single-GPU mode, and when run from a single process on multiple GPUs in DataParallel mode.

The workaround is to call the kernel only after wrapping in DistributedDataParallel. But this kernel can be called during creation of the compressed model, which can only happen before the wrapping in DistributedDataParallel; this is where the issue comes from. I wanted to run create_compressed_model for the SSD_VGG model in evaluation mode; this mode calls NMS and fails with the mentioned error.

Compressing models in evaluation mode may also reduce training time by not quantizing auxiliary training branches, and prevents errors from corrupting BatchNorm statistics when dummy_forward is called with random inputs on a model in training mode.

@alexsu52 @vshampor @vanyalzr @AlexKoff88

Quantize Pointrend

I compressed a PointRend model based on mmdet 2.2.1. Training looks normal, but an error occurs when I convert it to ONNX using the functions of pytorch2onnx.py in mmdet 2.3.1 (commit 6495391). It seems _bbox_forward() and _mask_forward() have some problems; do you know how to fix this? The error info is as follows:

File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/base.py", line 180, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/base.py", line 138, in forward_test
return self.forward_dummy(imgs[0])
File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/two_stage.py", line 101, in forward_dummy
roi_outs = self.roi_head.forward_dummy(x, proposals)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/standard_roi_head.py", line 60, in forward_dummy
bbox_results = self._bbox_forward(x, rois)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/standard_roi_head.py", line 139, in _bbox_forward
x[:self.bbox_roi_extractor.num_inputs], rois)
File "/home/mechmind/projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/mechmind/miniconda3/envs/nncf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in call_impl
result = self.forward(*input, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/core/fp16/decorators.py", line 131, in new_func
return old_func(*args, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py", line 73, in forward
roi_feats_t = self.roi_layers[i](feats[i], rois
)
File "/home/mechmind/projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/mechmind/miniconda3/envs/nncf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/roi_align.py", line 144, in forward
self.sample_num, self.aligned)
File "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/roi_align.py", line 30, in forward
aligned)
RuntimeError: roi_width >= 0 && roi_height >= 0 INTERNAL ASSERT FAILED at "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/src/cpu/roi_align_v2.cpp":134, please report a bug to PyTorch. ROIs in ROIAlign cannot have non-negative size!
