openvinotoolkit / nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
License: Apache License 2.0
I'm using PyTorch 1.6.0 with a single conv layer; my code is as follows:
import torch
inputs = torch.randn(1, 3, 512, 512).cuda()
conv = torch.nn.Conv2d(3, 64, (7, 7), stride=2, padding=3, bias=False).cuda()
output = conv(inputs)
Before the conv operation, nvidia-smi reports 1019MB of GPU memory in use; after it, 1429MB, so the conv operation consumes about 410MB. I know im2col may consume a lot of memory when the input is large. What I can't understand is why GPU memory usage does not go back down after the conv operation. Is there a memory leak in conv2d, or is something wrong with my experiment?
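A rough size estimate helps separate tensor data from other GPU memory. The sketch below computes the conv output shape and the size of an explicit im2col buffer for the layer above, assuming float32 and assuming a full im2col matrix is materialized (cuDNN often uses other algorithms that do not need one). The helper names are hypothetical.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_memory_mib(n, c_in, h, w, c_out, k, stride, padding, bytes_per_elem=4):
    """Return (output MiB, im2col MiB) for a conv layer, ignoring allocator overhead."""
    h_out = conv_out_size(h, k, stride, padding)
    w_out = conv_out_size(w, k, stride, padding)
    output_elems = n * c_out * h_out * w_out
    # im2col matrix: (c_in * k * k) rows x (h_out * w_out) columns per sample
    im2col_elems = n * c_in * k * k * h_out * w_out
    mib = 1024 * 1024
    return output_elems * bytes_per_elem / mib, im2col_elems * bytes_per_elem / mib


out_mib, im2col_mib = conv_memory_mib(1, 3, 512, 512, 64, 7, stride=2, padding=3)
print(f"output: {out_mib:.2f} MiB, im2col: {im2col_mib:.2f} MiB")  # output: 16.00 MiB, im2col: 36.75 MiB
```

Together these come to roughly 53 MiB, far below the observed 410 MB, so most of the jump is plausibly one-time overhead (cuDNN/cuBLAS state created on first use) plus memory kept by PyTorch's caching allocator. nvidia-smi counts everything the process has reserved; to see what tensors actually occupy, compare `torch.cuda.memory_allocated()` with `torch.cuda.memory_reserved()`. Cached blocks are only returned to the driver by `torch.cuda.empty_cache()`.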
Hi,
do you have a publication or an arXiv paper for NNCF? How should I cite your work properly?
I used it to train an ssd512_vgg model, but training crashed with a NotImplementedError raised by compression_ctrl.compression_level(). I did not configure a compression algorithm in ssd512_vgg_voc.json; is that allowed?
INFO:nncf:Creating compression algorithm: NoCompressionAlgorithmBuilder
WARNING:nncf:Graphviz is not installed - only the .dot model visualization format will be used. Install pygraphviz into your Python environment and graphviz system-wide to enable PNG rendering.
Training ssd_vgg on coco dataset...
/home/mechmind/projects/nncf_pytorch/examples/object_detection/utils/augmentations.py:257: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)
0: iter 0 epoch 0 || Loss: 2.728 || Time 0.4711s || lr: 0.0001 || CR loss: 0
0: iter 10 epoch 0 || Loss: 2.714 || Time 2.046s || lr: 0.0001 || CR loss: 0
0: iter 20 epoch 0 || Loss: 3.65 || Time 1.951s || lr: 0.0001 || CR loss: 0
0: iter 30 epoch 0 || Loss: 3.013 || Time 1.964s || lr: 0.0001 || CR loss: 0
0: iter 40 epoch 0 || Loss: 2.639 || Time 1.952s || lr: 0.0001 || CR loss: 0
0: iter 50 epoch 0 || Loss: 2.53 || Time 1.956s || lr: 0.0001 || CR loss: 0
0: iter 60 epoch 0 || Loss: 2.034 || Time 1.957s || lr: 0.0001 || CR loss: 0
0: iter 70 epoch 0 || Loss: 1.776 || Time 1.953s || lr: 0.0001 || CR loss: 0
0: iter 80 epoch 0 || Loss: 1.496 || Time 1.965s || lr: 0.0001 || CR loss: 0
0: iter 90 epoch 0 || Loss: 1.95 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 100 epoch 0 || Loss: 1.523 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 110 epoch 0 || Loss: 2.282 || Time 1.975s || lr: 0.0001 || CR loss: 0
0: iter 120 epoch 0 || Loss: 1.326 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 130 epoch 0 || Loss: 1.398 || Time 1.967s || lr: 0.0001 || CR loss: 0
0: iter 140 epoch 0 || Loss: 1.422 || Time 1.981s || lr: 0.0001 || CR loss: 0
0: iter 150 epoch 0 || Loss: 1.011 || Time 1.975s || lr: 0.0001 || CR loss: 0
0: iter 160 epoch 0 || Loss: 1.024 || Time 1.976s || lr: 0.0001 || CR loss: 0
0: iter 170 epoch 0 || Loss: 1.283 || Time 1.974s || lr: 0.0001 || CR loss: 0
0: iter 180 epoch 0 || Loss: 1.035 || Time 1.977s || lr: 0.0001 || CR loss: 0
0: iter 190 epoch 0 || Loss: 0.9065 || Time 1.991s || lr: 0.0001 || CR loss: 0
0: iter 200 epoch 0 || Loss: 1.312 || Time 1.995s || lr: 0.0001 || CR loss: 0
0: iter 210 epoch 0 || Loss: 1.238 || Time 1.976s || lr: 0.0001 || CR loss: 0
Traceback (most recent call last):
File "main.py", line 378, in <module>
main(sys.argv[1:])
File "main.py", line 81, in main
start_worker(main_worker, config)
File "/home/mechmind/projects/nncf_pytorch/examples/common/execution.py", line 99, in start_worker
main_worker(current_gpu=config.gpu_id, config=config)
File "main.py", line 188, in main_worker
train(net, compression_ctrl, train_data_loader, test_data_loader, criterion, optimizer, config, lr_scheduler)
File "main.py", line 301, in train
compression_level = compression_ctrl.compression_level()
File "/home/mechmind/projects/nncf_pytorch/nncf/compression_method_api.py", line 166, in compression_level
raise NotImplementedError()
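Until the no-compression controller implements `compression_level()`, a training loop can guard the call. A minimal sketch: `safe_compression_level` and the stand-in controller class are hypothetical; only the `compression_level()` method name comes from the traceback.

```python
class NoCompressionCtrlStandIn:
    """Hypothetical stand-in that mimics the failing controller from the traceback."""

    def compression_level(self):
        raise NotImplementedError()


def safe_compression_level(compression_ctrl, default=None):
    """Return compression_ctrl.compression_level(), or `default` if it is not implemented."""
    try:
        return compression_ctrl.compression_level()
    except NotImplementedError:
        return default


level = safe_compression_level(NoCompressionCtrlStandIn())
print(level)  # None
```

The caller can then skip whatever logic depends on the compression level when no algorithm is configured.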
I have 3 comments/proposals:
compression_ratio
into the template file inside the Quantization readme
mixed_precision
Hello,
Thanks for this great project. But
Thanks.
I am facing an issue: when I try torch.save(model, model_path), it throws TypeError: can't pickle odict_values objects. For my project I want to save the compressed model as a torch object and load it to run prediction on new images. If anyone can help me out here, it would be really great.
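The error message suggests that some attribute reachable from the model is a dict `values()` view, which `pickle` (and therefore `torch.save(model, ...)`) cannot serialize. Which attribute of the NNCF-wrapped model holds it is not shown here, so treat that as an assumption. The stdlib-only sketch below reproduces the failure mode and shows that materializing the view as a list makes it picklable.

```python
import pickle
from collections import OrderedDict


def is_picklable(obj):
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


params = OrderedDict(weight=1.0, bias=0.0)
print(is_picklable(params.values()))        # False: odict_values views cannot be pickled
print(is_picklable(list(params.values())))  # True
```

The usual PyTorch workaround is to avoid pickling the module object altogether: save `model.state_dict()` with `torch.save`, then at prediction time rebuild the same compressed model (for NNCF, by calling `create_compressed_model` with the same config) and call `load_state_dict` on it.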
compression_loss is always zero in my training process. Is this the same problem?
Hi, thank you for providing these useful tools. Currently, I'm working on INT8 quantization with both NNCF and POT. I've noticed that the POT INT8 model runs faster than FP32, which totally makes sense; however, the NNCF INT8 model is not only slower than the POT model but also slower than the original FP32 model. The benchmark tool results are as follows:
Original FP32:
[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 2020.4.0-359-21e092122f4-releases/2020/4
[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 57.23 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 294.24 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input0' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input0' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 1842 iterations
Duration: 60044.23 ms
Latency: 32.15 ms
Throughput: 30.68 FPS
========================================================================
POT INT8:
[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 2020.4.0-359-21e092122f4-releases/2020/4
[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 86.67 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 411.73 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input0' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input0' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 3245 iterations
Duration: 60032.85 ms
Latency: 18.25 ms
Throughput: 54.05 FPS
===========================================================================
NNCF INT8:
[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 2020.4.0-359-21e092122f4-releases/2020/4
[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 114.43 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 599.62 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'result.1' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'result.1' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 1291 iterations
Duration: 60082.78 ms
Latency: 46.00 ms
Throughput: 21.49 FPS
===========================================================================
These results were obtained on an Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz.
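For a quick comparison, the throughput and latency figures from the three benchmark_app reports can be normalized to the FP32 run; the numbers below are copied from the logs and nothing else is assumed.

```python
# Figures copied from the benchmark_app reports (batch 1, 1 stream, CPU).
results = {
    "FP32":      {"fps": 30.68, "latency_ms": 32.15},
    "POT INT8":  {"fps": 54.05, "latency_ms": 18.25},
    "NNCF INT8": {"fps": 21.49, "latency_ms": 46.00},
}

baseline = results["FP32"]["fps"]
speedups = {name: r["fps"] / baseline for name, r in results.items()}

for name, r in results.items():
    print(f"{name:10s} {r['fps']:6.2f} FPS  {r['latency_ms']:6.2f} ms  x{speedups[name]:.2f} vs FP32")
```

POT reaches about 1.76x the FP32 throughput, while the NNCF IR reaches only about 0.70x.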
We found a difference between the IR models produced by NNCF and POT: the FakeQuantize layer and the activation function are in the opposite order, which leads to more parameters in the NNCF FakeQuantize layers. The Netron visualizations show this as follows:
Hello, how much faster is inference with the quantized (INT8) model compared with the original (FP32) model? Thank you!
The original problem is that the ONNX QuantizeLinear and DequantizeLinear operations do not support a shrunk range of quantization levels, so we can correctly export only 256 levels for weights and activations. On the other hand, 255 levels for weights were introduced to work around the saturation issue on AVX targets. However, we do not know how much this actually affects accuracy or whether it helps.
My proposal is to use the full range of 2^bits levels but a different workaround for saturation, one that we know actually works: using 128 of the 256 levels in the following way:
y = w*a = [sw * wq] * [sa * aq] * 1/sw * 1/sa = [sw * wq / 2] * [sa * aq] * 2/sw * 1/sa
That is, we divide the weights by a factor of 2.0 and compensate by multiplying the output scales of the Dequantize operation (output_high and output_low in FQ) by 2.0.
This is relevant for INT8 only!
We need to plan this for the next release. cc'ed @alexsu52, @kchechil
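The AVX saturation issue is commonly attributed to the u8 x s8 multiply with a saturating pairwise add into int16 (the `pmaddubsw` instruction on AVX2 targets without VNNI): two products of 255 x -128 sum to -65280, which does not fit in int16. The pure-Python sketch below emulates one such lane and checks the proposed workaround, halving the 8-bit weight codes (so roughly half of the 256 levels are used) and multiplying the output scale by 2.0. It illustrates the arithmetic only, not any particular kernel; the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_int16(value):
    """Clamp to the signed 16-bit range, as a saturating add does."""
    return max(INT16_MIN, min(INT16_MAX, value))


def u8s8_pair_madd(acts, weights):
    """One output lane: u8 activations times s8 weights, pair summed with int16 saturation."""
    (a0, a1), (w0, w1) = acts, weights
    return saturate_int16(a0 * w0 + a1 * w1)


def halve_codes(weights):
    """Divide 8-bit weight codes by 2.0 and round, so roughly half of the levels are used."""
    return tuple(round(w / 2) for w in weights)


acts = (255, 255)       # worst-case unsigned activations
weights = (-128, -128)  # worst-case signed weights
exact = acts[0] * weights[0] + acts[1] * weights[1]  # -65280, outside int16

naive = u8s8_pair_madd(acts, weights)                        # saturates to -32768
workaround = 2 * u8s8_pair_madd(acts, halve_codes(weights))  # output scale x 2.0

print(exact, naive, workaround)  # -65280 -32768 -65280
```

The price is one bit of weight resolution: an odd code such as 127 becomes round(63.5) = 64 and is reconstructed as 128.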
Is filter pruning supported for SSD-based object detection models?
Hi! I've trained a Cascade RCNN model with mmdetection using your patch and a modified config. To convert it to ONNX I used the built-in converter script in mmdetection. It looks like I got the same model as before optimization.
Is it necessary to use your ONNX converter? How can I do it with the mmdetection training pipeline?
I concluded that the models are the same because after converting to OpenVINO IR I got the same inference performance. Also, my ONNX graph does not contain anything like 'FakeQuantize' layers.
Thanks,
Vladimir
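One way to tell whether an exported ONNX file actually carries quantization is to count its FakeQuantize nodes. The sketch assumes the `onnx` package is installed and that NNCF's own exporter (`compression_ctrl.export_model(...)`, which appears in a pruning traceback further down this page) emits FakeQuantize nodes, whereas a generic export of the model may not; the helper names are hypothetical.

```python
from collections import Counter


def count_op_types(op_types):
    """Count node types, e.g. to see how many FakeQuantize nodes a graph has."""
    return Counter(op_types)


def onnx_op_types(path):
    """Return the op_type of every node in an ONNX file (requires `pip install onnx`)."""
    import onnx  # imported lazily so the rest of the sketch runs without it
    model = onnx.load(path)
    return [node.op_type for node in model.graph.node]


# With a real file:
# counts = count_op_types(onnx_op_types("cascade_rcnn_int8.onnx"))
# print(counts["FakeQuantize"])  # 0 means the export dropped the quantizers
counts = count_op_types(["Conv", "FakeQuantize", "Relu", "FakeQuantize", "Conv"])
print(counts["FakeQuantize"])  # 2
```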
We observed a training slowdown of about 28%. Details are as follows.
For 30 epochs of ResNet-50 fine-tuning, the elapsed-time gap between the two commits is 6 hours.
python examples/classification/main.py \
-m train \
--config examples/classification/configs/quantization/resnet50_imagenet_mixed_int_manual.json \
--data <imagenet_dataset_path> \
--workers 16 \
--log-dir ./resnet50_train_run
43 mins per epoch:
mkdir nncf-a0c1c2bf && cd $_
python3 -m venv env
source env/bin/activate
git clone https://github.com/openvinotoolkit/nncf_pytorch && cd nncf_pytorch
git checkout a0c1c2bf
pip install -r requirements.txt
0:: Epoch: [0][8600/8657] Lr: 0.00031 Time: 0.289 (0.297**) Data: 0.000 (0.002) CE_loss: 2.0819 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 2.0819 (2.2970) Acc@1: 54.054 (48.777) Acc@5: 67.568 (73.302)
0:: Epoch: [0][8610/8657] Lr: 0.00031 Time: 0.287 (0.297**) Data: 0.000 (0.002) CE_loss: 2.5287 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 2.5287 (2.2970) Acc@1: 54.054 (48.778) Acc@5: 70.270 (73.305)
0:: Epoch: [0][8620/8657] Lr: 0.00031 Time: 0.324 (0.297**) Data: 0.001 (0.002) CE_loss: 2.4854 (2.2966) CR_loss: 0.0000 (0.0000) Loss: 2.4854 (2.2966) Acc@1: 45.946 (48.784) Acc@5: 75.676 (73.311)
0:: Epoch: [0][8630/8657] Lr: 0.00031 Time: 0.288 (0.297**) Data: 0.000 (0.002) CE_loss: 2.7068 (2.2965) CR_loss: 0.0000 (0.0000) Loss: 2.7068 (2.2965) Acc@1: 37.838 (48.788) Acc@5: 62.162 (73.311)
0:: Epoch: [0][8640/8657] Lr: 0.00031 Time: 0.301 (0.297**) Data: 0.001 (0.002) CE_loss: 2.3907 (2.2962) CR_loss: 0.0000 (0.0000) Loss: 2.3907 (2.2962) Acc@1: 45.946 (48.794) Acc@5: 64.865 (73.316)
0:: Epoch: [0][8650/8657] Lr: 0.00031 Time: 0.281 (0.297**) Data: 0.000 (0.002) CE_loss: 2.2093 (2.2957) CR_loss: 0.0000 (0.0000) Loss: 2.2093 (2.2957) Acc@1: 51.351 (48.805) Acc@5: 72.973 (73.324)
55 mins per epoch:
mkdir nncf-a27da4fb && cd $_
python3 -m venv env
source env/bin/activate
git clone https://github.com/openvinotoolkit/nncf_pytorch && cd nncf_pytorch
git checkout a27da4fb
pip install -r requirements.txt
0:: Epoch: [0][8600/8657] Lr: 0.00031 Time: 0.367 (0.382**) Data: 0.072 (0.080) CE_loss: 2.0623 (2.2975) CR_loss: 0.0000 (0.0000) Loss: 2.0623 (2.2975) Acc@1: 56.757 (48.684) Acc@5: 75.676 (73.327)
0:: Epoch: [0][8610/8657] Lr: 0.00031 Time: 0.449 (0.382**) Data: 0.156 (0.080) CE_loss: 2.1209 (2.2974) CR_loss: 0.0000 (0.0000) Loss: 2.1209 (2.2974) Acc@1: 43.243 (48.684) Acc@5: 78.378 (73.327)
0:: Epoch: [0][8620/8657] Lr: 0.00031 Time: 0.367 (0.382**) Data: 0.073 (0.080) CE_loss: 1.9419 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 1.9419 (2.2970) Acc@1: 59.459 (48.691) Acc@5: 81.081 (73.334)
0:: Epoch: [0][8630/8657] Lr: 0.00031 Time: 0.368 (0.382**) Data: 0.073 (0.080) CE_loss: 2.2480 (2.2967) CR_loss: 0.0000 (0.0000) Loss: 2.2480 (2.2967) Acc@1: 45.946 (48.696) Acc@5: 72.973 (73.338)
0:: Epoch: [0][8640/8657] Lr: 0.00031 Time: 0.386 (0.382**) Data: 0.085 (0.080) CE_loss: 2.2206 (2.2964) CR_loss: 0.0000 (0.0000) Loss: 2.2206 (2.2964) Acc@1: 56.757 (48.701) Acc@5: 72.973 (73.344)
0:: Epoch: [0][8650/8657] Lr: 0.00031 Time: 0.346 (0.382**) Data: 0.068 (0.080) CE_loss: 1.9694 (2.2959) CR_loss: 0.0000 (0.0000) Loss: 1.9694 (2.2959) Acc@1: 48.649 (48.710) Acc@5: 78.378 (73.352)
Hardware: Xeon-Gold, 4xV100
python: 3.7.6
torch: 1.6.0
cuda:10.2
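The per-iteration averages in the two logs account for the per-epoch difference, and the `Data` column hints at where it comes from. A small calculation using only logged values (8657 iterations per epoch, average `Time` and `Data` from the last log line of each run):

```python
ITERS_PER_EPOCH = 8657

# Average seconds per iteration, copied from the last log line of each run.
runs = {
    "a0c1c2bf": {"time_s": 0.297, "data_s": 0.002},
    "a27da4fb": {"time_s": 0.382, "data_s": 0.080},
}

for commit, r in runs.items():
    minutes = r["time_s"] * ITERS_PER_EPOCH / 60
    print(f"{commit}: {minutes:.1f} min/epoch")

slowdown = runs["a27da4fb"]["time_s"] / runs["a0c1c2bf"]["time_s"] - 1
extra_time = runs["a27da4fb"]["time_s"] - runs["a0c1c2bf"]["time_s"]
extra_data = runs["a27da4fb"]["data_s"] - runs["a0c1c2bf"]["data_s"]
data_share = extra_data / extra_time
print(f"slowdown: {slowdown:.0%}, share of extra time spent in Data: {data_share:.0%}")
```

If this reading is right, about nine tenths of the extra time per iteration shows up in `Data` (0.002 s vs 0.080 s), which would point at the data pipeline rather than the compressed model's forward and backward passes; this is an inference from the logs, not a confirmed diagnosis.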
I'm sorry. I am back.
I have tried retinanet_r50_fpn_1x_int8.py in mmdetection; training and evaluation ran without errors.
But I only got the result below on COCO 2017 val, while the docs show that RetinaNet can reach 34.7 or 35.3 average box mAP on the coco_2017_val dataset.
My result are as follows:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.260
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.420
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.272
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.143
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.292
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.336
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.258
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.425
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.453
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.260
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
I didn't change anything in the config file except changing samples_per_gpu from 6 to 4, because my GPU can't allocate enough memory. My CUDA version is 10.2 and PyTorch is 1.6.0.
If you need any other information please contact me
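Reducing `samples_per_gpu` from 6 to 4 shrinks the total batch by a third. mmdetection's documentation recommends the linear scaling rule (set the learning rate proportional to the total batch size) when training with a batch size different from the one a config was tuned for; whether that closes the accuracy gap here is untested. A sketch; the base learning rate and GPU count are placeholders, not values from the config.

```python
def linearly_scaled_lr(base_lr, base_samples_per_gpu, new_samples_per_gpu, base_gpus=1, new_gpus=1):
    """Linear scaling rule: the learning rate is proportional to the total batch size."""
    base_batch = base_samples_per_gpu * base_gpus
    new_batch = new_samples_per_gpu * new_gpus
    return base_lr * new_batch / base_batch


# Placeholder values: read the real optimizer lr and GPU count from your config and launch command.
BASE_LR = 0.01
NUM_GPUS = 8
lr = linearly_scaled_lr(BASE_LR, 6, 4, NUM_GPUS, NUM_GPUS)
print(f"{lr:.5f}")
```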
Quantize Mask-RCNN to INT8 with less than 1% accuracy drop compared to FP32.
This includes generation of the following models in ONNX format as output:
Accuracy results are needed as well.
cc @AlexKoff88
Hi,
I managed to take one of the detection models and successfully converted it with mo_onnx.py (provided by the OpenVINO toolkit) to generate the .bin and .xml files. However, I have not found any documentation on how to run such models on OpenVINO. I have already run models from TensorFlow object detection, but I'm interested in running your quantized model on OpenVINO. If you could provide some sample scripts for this, that would be great. Thank you.
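A minimal sketch of running a Model Optimizer IR through the Inference Engine Python API, assuming an OpenVINO 2020.x to 2021.x installation (`IECore`, `read_network`, `load_network`, `infer`; later releases deprecated this API in favour of a different one). The helper names are hypothetical, and pre- and post-processing (resizing, layout, decoding detections) depend on the model and are left out.

```python
import os


def ir_weights_path(xml_path):
    """Model Optimizer writes the weights next to the .xml with a .bin extension."""
    return os.path.splitext(xml_path)[0] + ".bin"


def run_ir(xml_path, input_array, device="CPU"):
    """Run one inference; input_array is a NCHW numpy array matching the network input."""
    # Imported lazily so this module loads without OpenVINO installed.
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model=xml_path, weights=ir_weights_path(xml_path))
    # input_info replaced the older `inputs` attribute; both are dicts keyed by input name.
    inputs = getattr(net, "input_info", None) or net.inputs
    input_name = next(iter(inputs))
    exec_net = ie.load_network(network=net, device_name=device)
    # infer() returns a dict mapping output names to numpy arrays.
    return exec_net.infer(inputs={input_name: input_array})
```

Usage would look like `outputs = run_ir("ssd300_int8.xml", image_nchw)`, where the file name and the preprocessed `image_nchw` array are placeholders.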
I noticed that NNCF is incompatible with Python 3.8.
The problem occurs during installation of one of NNCF's dependencies, and it seems to be caused by platform.linux_distribution having been removed in Python 3.8:
Downloading matplotlib-3.0.3.tar.gz (36.6 B)
ERROR: Command errored out with exit status :
command: /opt/home/k8sworker/cibuilds/impt/nncf_for_digits-9/src/model_templates/.venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0zqb86kn/matplotlib/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0zqb86kn/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-0zqb86kn/matplotlib/pip-egg-info
cwd: /tmp/pip-install-0zqb86kn/matplotlib/
Complete output (51 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-0zqb86kn/matplotlib/setup.py", line 225, in <module>
msg = pkg.install_help_msg()
File "/tmp/pip-install-0zqb86kn/matplotlib/setupext.py", line 650, in install_help_msg
release = platform.linux_distribution()[0].lower()
AttributeError: module 'platform' has no attribute 'linux_distribution'
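`platform.linux_distribution()` was deprecated in Python 3.5 and removed in 3.8, so matplotlib 3.0.3's setup.py cannot run there; the practical fix is presumably a matplotlib release that provides Python 3.8 wheels or no longer calls that function. For code that needs the distribution name, a version-independent replacement can read `/etc/os-release` (the third-party `distro` package does this more thoroughly). A stdlib-only sketch with hypothetical helper names:

```python
import platform


def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"').strip("'")
    return info


def distribution_id(os_release_path="/etc/os-release"):
    """Lower-case distribution ID such as 'ubuntu'; '' if it cannot be determined."""
    try:
        with open(os_release_path) as f:
            return parse_os_release(f.read()).get("ID", "").lower()
    except OSError:
        return ""


print(hasattr(platform, "linux_distribution"))  # False on Python 3.8+
print(distribution_id())
```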
I am getting the following error on running the demo retinanet_r50_fpn_1x_int8.py example. Any suggestions of what could be causing it?
args_kwargs_tuple = data_loader.get_inputs(loaded_item)
File "/home/.conda/envs/nncf2/lib/python3.6/site-packages/nncf-1.4.1-py3.6.egg/nncf/initialization.py", line 56, in get_inputs
raise NotImplementedError
NotImplementedError
I followed the master branch of NNCF and mmdetection commit id: c77ccbbf235c0eb50a4440698eefc2ae199f837f
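The traceback shows that `get_inputs()` on the data loader wrapper is abstract: the wrapper handed to NNCF for initialization has to say how a batch maps onto the model's call arguments. A sketch of that pattern; the base class here is a local stand-in for NNCF's wrapper (assumed to be `nncf.initialization.InitializingDataLoader` in this version, so check your install), and the subclass assumes mmdetection batches are dicts passed as keyword arguments.

```python
class InitializingDataLoaderStandIn:
    """Local stand-in for NNCF's initializing data loader wrapper (assumed interface)."""

    def __init__(self, data_loader):
        self.data_loader = data_loader

    def __iter__(self):
        return iter(self.data_loader)

    def get_inputs(self, dataloader_output):
        # Mirrors the base-class behaviour seen in the traceback
        raise NotImplementedError


class MMDetInitializingDataLoader(InitializingDataLoaderStandIn):
    """Hypothetical subclass: each batch is a dict of keyword arguments for the detector."""

    def get_inputs(self, dataloader_output):
        # Return (args, kwargs) so that the model can be called as model(*args, **kwargs)
        return (), dict(dataloader_output)


loader = MMDetInitializingDataLoader([{"img": "tensor", "img_metas": ["meta"]}])
args, kwargs = loader.get_inputs(next(iter(loader)))
print(args, sorted(kwargs))  # () ['img', 'img_metas']
```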
In the pattern-based approach, NNCF interprets non-relu activations as a single operation for which the input must be quantized. That is, the Fake Quantize operation is inserted into the graph before non-relu activations. This blocks fusing non-relu activation to a core operation like conv.
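A toy illustration of this effect on a linear chain of op names rather than a real NNCF graph (all names are hypothetical, and the quantizer on ReLU's own output is omitted): quantizing the input of a non-ReLU activation places a FakeQuantize between the convolution and the activation, so the pair is no longer adjacent and cannot be fused.

```python
ACTIVATIONS = {"relu", "sigmoid", "swish", "hswish"}


def insert_fake_quantize(ops):
    """Toy pass: put a FakeQuantize ('fq') on the input of every non-ReLU activation."""
    result = []
    for op in ops:
        if op in ACTIVATIONS and op != "relu":
            result.append("fq")
        result.append(op)
    return result


def fusable_pairs(ops):
    """A conv immediately followed by an activation is a fusion candidate."""
    return [(a, b) for a, b in zip(ops, ops[1:]) if a == "conv" and b in ACTIVATIONS]


graph = ["conv", "relu", "conv", "sigmoid"]
quantized = insert_fake_quantize(graph)
print(quantized)                 # ['conv', 'relu', 'conv', 'fq', 'sigmoid']
print(fusable_pairs(graph))      # [('conv', 'relu'), ('conv', 'sigmoid')]
print(fusable_pairs(quantized))  # [('conv', 'relu')]
```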
File "/nncf/initialization.py", line 170, in _apply_initializers
initializer.apply_init()
File "/nncf/quantization/init_range.py", line 223, in apply_init
self.quantize_module.apply_minmax_init(mins_tensor, maxs_tensor, self.log_module_name)
File "/nncf/quantization/layers.py", line 293, in apply_minmax_init
self.scale.masked_scatter_(torch.gt(abs_max, SCALE_LOWER_THRESHOLD), abs_max)
RuntimeError: invalid argument 2: source nElements must be == mask 1 elements at /pytorch/aten/src/THC/generic/THCTensorMasked.cu:134
Should cover this case in pre-commit tests.
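When writing such a test, note how `torch.Tensor.masked_scatter_` behaves: it fills the masked positions of the target with consecutive elements of `source`, taken in order, and requires `source` to have at least as many elements as the mask selects; it is not a position-wise select like `torch.where`. The failing shapes are not shown in the traceback, so the pure-Python emulation below only illustrates those semantics on flat lists; the function names and threshold value are hypothetical.

```python
def masked_scatter(target, mask, source):
    """Emulate torch.Tensor.masked_scatter_ on flat lists (no broadcasting).

    Masked positions receive consecutive elements of `source`, in order.
    """
    if len(mask) != len(target):
        raise ValueError("mask and target must have the same length in this sketch")
    selected = sum(1 for m in mask if m)
    if len(source) < selected:
        raise RuntimeError(f"source has {len(source)} elements but mask selects {selected}")
    it = iter(source)
    return [next(it) if m else t for t, m in zip(target, mask)]


def positional_where(target, mask, source):
    """Element-wise select, like torch.where(mask, source, target)."""
    return [s if m else t for t, m, s in zip(target, mask, source)]


scale = [1.0, 1.0, 1.0]
abs_max = [0.0, 2.0, 3.0]
mask = [v > 1e-4 for v in abs_max]  # illustrative threshold
print(masked_scatter(scale, mask, abs_max))    # [1.0, 0.0, 2.0]
print(positional_where(scale, mask, abs_max))  # [1.0, 2.0, 3.0]
```

If a position-wise update is intended, `torch.where(mask, abs_max, scale)` or passing `abs_max[mask]` as the source expresses it directly.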
It seems that the program creates the directory /tmp/torch_extensions, which stays locked on the second run if the first run failed.
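PyTorch's JIT extension builder (`torch.utils.cpp_extension.load`) guards each build directory with a file named `lock`; if a build is interrupted, that file can remain and the next run waits on it. A sketch that lists such leftover files; the default location used here (`<tempdir>/torch_extensions`, overridable with the `TORCH_EXTENSIONS_DIR` environment variable) matches PyTorch releases of this era, while newer releases use a different default. Delete a listed file only when no other build is running.

```python
import os
import tempfile


def extensions_root():
    """Build root used by torch.utils.cpp_extension (PyTorch 1.x-era default)."""
    return os.environ.get(
        "TORCH_EXTENSIONS_DIR",
        os.path.realpath(os.path.join(tempfile.gettempdir(), "torch_extensions")),
    )


def find_stale_locks(root=None):
    """Return paths of 'lock' files left behind under the extensions build root."""
    root = root or extensions_root()
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):  # yields nothing if root is missing
        if "lock" in filenames:
            found.append(os.path.join(dirpath, "lock"))
    return sorted(found)


for path in find_stale_locks():
    print("possible stale lock:", path)
```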
This error occurs when I quantize an FP32 pretrained model. Is this a bug?
Traceback (most recent call last):
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 961, in init_range
range_init_args = self.quantization_config.get_extra_struct(QuantizationRangeInitArgs)
File "/home/mechmind/projects/nncf_pytorch/nncf/config.py", line 56, in get_extra_struct
return self.__nncf_extra_structs[struct_cls.get_id()]
KeyError: 'quantization_range_init_args'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 381, in <module>
main(sys.argv[1:])
File "main.py", line 81, in main
start_worker(main_worker, config)
File "/home/mechmind/projects/nncf_pytorch/examples/common/execution.py", line 99, in start_worker
main_worker(current_gpu=config.gpu_id, config=config)
File "main.py", line 152, in main_worker
compression_ctrl, net = create_model(config, resuming_model_state_dict)
File "main.py", line 239, in create_model
compression_ctrl, compressed_model = create_compressed_model(ssd_net, config.nncf_config, resuming_model_sd)
File "/home/mechmind/projects/nncf_pytorch/nncf/model_creation.py", line 126, in create_compressed_model
compression_ctrl = compressed_model.commit_compression_changes()
File "/home/mechmind/projects/nncf_pytorch/nncf/nncf_network.py", line 416, in commit_compression_changes
return self._builders[0].build_controller(self)
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 200, in build_controller
self._hw_precision_constraints)
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 816, in init
self.initialize_quantizer_params()
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 893, in initialize_quantizer_params
self.init_range()
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 964, in init_range
'Should run range initialization as specified via config,'
ValueError: Should run range initialization as specified via config,but the initializing data loader is not provided as an extra struct. Refer to NNCFConfig.register_extra_structs and the QuantizationRangeInitArgs class
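The error message names the fix: register a `QuantizationRangeInitArgs` struct carrying an initialization data loader on the NNCF config before calling `create_compressed_model`. A sketch; the import path `nncf.structures` and the single data-loader constructor argument are assumptions about this NNCF version, and the helper itself is hypothetical (the struct class can be injected so the logic can be exercised without NNCF).

```python
def register_range_init_loader(nncf_config, init_data_loader, struct_cls=None):
    """Attach the data loader that quantizer range initialization needs."""
    if struct_cls is None:
        # Assumed import path for NNCF 1.x; adjust to the installed version.
        from nncf.structures import QuantizationRangeInitArgs as struct_cls
    nncf_config.register_extra_structs([struct_cls(init_data_loader)])
    return nncf_config


# Typical call order (not executed here):
# nncf_config = register_range_init_loader(nncf_config, train_loader)
# compression_ctrl, model = create_compressed_model(model, nncf_config)
```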
After integrating NNCF into mmdetection and training EfficientNet (a classification task), the compression loss stays at zero:
compression_loss = compression_ctrl.loss()
compression_loss = 0
Does NNCF support 1D convolutions?
I am trying to compress a CNN with 1D convolutions that serves as the encoder of an autoencoder (AE) model.
Thank you
Some class-level specialization might be in order here; otherwise we end up in a situation where INT-N uses only half of the available config structs, and certain quantizer configs won't correspond to any real quantizer. Something like
class IntNQuantizerConfig(QuantizerConfig):
class BFPQuantizerConfig(QuantizerConfig):
What do you think?
Originally posted by @vshampor in #137 (comment)
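A minimal sketch of the proposed specialization using dataclasses: each subclass carries only the fields that make sense for its number format, so an INT-N config cannot hold block-floating-point settings and vice versa. The field names (`per_channel`, `bits`, `signed`, `mantissa_bits`, `block_size`) are illustrative assumptions, not NNCF's actual `QuantizerConfig` attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizerConfig:
    """Fields shared by every quantizer kind (illustrative)."""
    per_channel: bool = False


@dataclass(frozen=True)
class IntNQuantizerConfig(QuantizerConfig):
    bits: int = 8
    signed: bool = True


@dataclass(frozen=True)
class BFPQuantizerConfig(QuantizerConfig):
    mantissa_bits: int = 4
    block_size: int = 16


def describe(config: QuantizerConfig) -> str:
    # Dispatch on the concrete type instead of checking which fields are "in use"
    if isinstance(config, IntNQuantizerConfig):
        return f"INT{config.bits} ({'signed' if config.signed else 'unsigned'})"
    if isinstance(config, BFPQuantizerConfig):
        return f"BFP m{config.mantissa_bits}/block{config.block_size}"
    raise TypeError(f"unsupported quantizer config: {type(config).__name__}")


print(describe(IntNQuantizerConfig(bits=4, signed=False)))  # INT4 (unsigned)
print(describe(BFPQuantizerConfig()))                        # BFP m4/block16
```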
If inscribe
How can I get the compressed model and find its compression ratio, which is an important concern in deep compression?
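One common way to report a compression ratio for quantization is the ratio of the FP32 model size to the quantized model size, counting each parameter at its stored bit-width; for pruning or sparsity, the number of remaining non-zero parameters plays the same role. A sketch with made-up layer sizes (not measurements) and a hypothetical helper name:

```python
def compression_ratio(layers, reference_bits=32):
    """layers: iterable of (num_params, bits) pairs; returns reference size / compressed size."""
    layers = list(layers)
    reference = sum(n * reference_bits for n, _ in layers)
    compressed = sum(n * bits for n, bits in layers)
    return reference / compressed


# Hypothetical model: two INT8 layers and one layer left in FP32.
layers = [(1_000_000, 8), (500_000, 8), (10_000, 32)]
print(f"{compression_ratio(layers):.2f}x")
```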
Hi, running main.py from examples/classification fails with: FileNotFoundError: [Errno 2] No such file or directory: '/home/sky/anaconda3/lib/python3.7/site-packages/nncf-1.3.2-py3.7.egg/nncf/extensions/src/quantization/cpu/functions_cpu.cpp'
How do I deal with this error? Thank you!
While training an object detection model with the HAWQ config, I realized there are many more INT8 activation quantizations than INT8 weight quantizations. According to Netron, some convolutions take INT8 activations and INT4 weights. I suppose that's not how things should be. What do you think?
Is there any way to support two inputs?
Comparing the graphs of SSD300 compressed by NNCF and by POT, we noticed that they are not the same, although one of the requirements for NNCF is to build a POT-like graph. Moreover, it seems that NNCF did not put several FakeQuantizers where I expected. There are two images below:
If you would like to look at the full model graphs, please contact me; I will share them here or privately.
I want to fine-tune a detection model with INT8 awareness (quantization-aware training) and convert it to an INT8 IR model to achieve acceleration.
The problem is that I cannot find a way to export an INT8 IR model.
I found the statement below, but the tutorial link just points to the OpenVINO top page.
To export a model to OpenVINO IR and run it using Intel Deep Learning Deployment Toolkit please refer to this tutorial.
I also searched precision-related pages like this in the OpenVINO Developer Guides, but could not find any helpful info.
I know the Model Optimizer tool mo.py can convert ONNX to an IR model, but its --data_type option only supports FP16 and FP32.
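In OpenVINO's low-precision flow, INT8 is expressed by FakeQuantize operations rather than by `--data_type`: a model exported by NNCF (`compression_ctrl.export_model(...)`, as in the pruning traceback further down this page) should contain FakeQuantize nodes, Model Optimizer keeps them in the IR even with `--data_type FP32`, and the CPU plugin then executes the quantized parts in INT8. A sketch that builds the conversion command and counts FakeQuantize layers in an IR .xml; the `mo.py` location, file names, and helper names are placeholders.

```python
import sys
import xml.etree.ElementTree as ET


def mo_command(onnx_path, output_dir, mo_script="mo.py"):
    """Model Optimizer invocation; FP32 is the storage type, FakeQuantize layers carry INT8."""
    return [sys.executable, mo_script,
            "--input_model", onnx_path,
            "--output_dir", output_dir,
            "--data_type", "FP32"]


def count_fake_quantize_layers(ir_xml_text):
    """Count <layer type="FakeQuantize"> elements in an IR .xml document."""
    root = ET.fromstring(ir_xml_text)
    return sum(1 for layer in root.iter("layer") if layer.get("type") == "FakeQuantize")


# A tiny hand-written IR fragment for illustration
sample = """<net name="demo" version="10"><layers>
<layer id="0" name="input" type="Parameter" version="opset1"/>
<layer id="1" name="fq" type="FakeQuantize" version="opset1"/>
<layer id="2" name="conv" type="Convolution" version="opset1"/>
</layers></net>"""
print(count_fake_quantize_layers(sample))  # 1
```

In practice one would run `subprocess.run(mo_command("ssd_int8.onnx", "ir"), check=True)` and then pass the generated .xml text to `count_fake_quantize_layers`; a count of zero means the quantizers were lost before conversion.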
Creating a compressed model with the quantization algorithm takes a very long time for DenseNet-161 (it looks to me like a loop while processing the graph).
Steps to reproduce:
0. Create a Python 3.6 environment.
1. Run:
python examples/classification/main.py --config examples/classification/configs/quantization/densenet161_imagenet_custom_quant_pattern.json --data <path_to_dataset>
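To check whether the time really goes into a loop during graph processing, the call can be profiled with the standard library. A sketch; `profile_call` is a hypothetical helper, and `create_compressed_model` is the NNCF entry point that appears in the tracebacks above.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn(*args, **kwargs) under cProfile; return (result, report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# With NNCF (not executed here):
# (compression_ctrl, model), report = profile_call(create_compressed_model, model, nncf_config, top=25)
# print(report)
result, report = profile_call(sum, [1, 2, 3])
print(result)  # 6
```

Sorting by cumulative time surfaces the outermost functions that dominate, which is usually enough to locate a graph-traversal hotspot.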
We observed large variability in performance with SSD300 (VGG) when we tried different combinations of weight and activation precision in test mode. In the numbers collected below, the gap between the best and worst cases is about 16X: in the worst case (INT2 weights, INT8 activations), inference takes about 30 s per batch (batch size 128). This likely affects fine-tuning mode as well.
NNCF Version: Develop branch with commit 2a681b8
Similar observations with v1.4
Baseline config: https://github.com/openvinotoolkit/nncf_pytorch/blob/develop/examples/object_detection/configs/ssd300_vgg_voc_int8.json
Platform: V100 GPU
Weight bits | Activation bits | Detection time per batch (batches 8/39, 9/39, 10/39) |
---|---|---|
8 | 8 | 1.847 s, 1.844 s, 1.876 s |
8 | 4 | 8.792 s, 9.021 s, 8.928 s |
8 | 2 | 17.09 s, 17.80 s, 17.87 s |
4 | 8 | 2.283 s, 2.105 s, 2.296 s |
4 | 4 | 8.285 s, 9.583 s, 7.425 s |
4 | 2 | 11.31 s, 11.85 s, 12.61 s |
2 | 8 | 29.40 s, 30.75 s, 29.30 s |
2 | 4 | 5.684 s, 5.703 s, 5.539 s |
2 | 2 | 5.954 s, 6.040 s, 6.159 s |
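Averaging the three logged batches per configuration makes the spread easier to read; the values are copied from the table and nothing else is assumed.

```python
timings = {  # (weight bits, activation bits): seconds for batches 8, 9, 10
    (8, 8): [1.847, 1.844, 1.876],
    (8, 4): [8.792, 9.021, 8.928],
    (8, 2): [17.09, 17.80, 17.87],
    (4, 8): [2.283, 2.105, 2.296],
    (4, 4): [8.285, 9.583, 7.425],
    (4, 2): [11.31, 11.85, 12.61],
    (2, 8): [29.40, 30.75, 29.30],
    (2, 4): [5.684, 5.703, 5.539],
    (2, 2): [5.954, 6.040, 6.159],
}

means = {cfg: sum(t) / len(t) for cfg, t in timings.items()}
fastest = min(means, key=means.get)
slowest = max(means, key=means.get)

for (w, a), m in sorted(means.items(), key=lambda kv: kv[1]):
    print(f"W{w}/A{a}: {m:6.2f} s  ({m / means[fastest]:.1f}x the fastest)")
print(f"slowest/fastest: {means[slowest] / means[fastest]:.1f}x")
```

The slowdown does not track bit-width monotonically: W2/A4 is faster than W4/A4, and W2/A8 is the slowest configuration.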
Conv-BN folding is described in section 3.2.2 of https://arxiv.org/pdf/1806.08342.pdf as a way to get better QAT accuracy.
It has been implemented in PyTorch (https://github.com/pytorch/pytorch/blob/master/torch/nn/intrinsic/qat/modules/conv_fused.py#L82-L92).
Will it be implemented in NNCF? Thanks. (I think it is important for QAT from scratch.)
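For reference, the inference-time form of Conv-BN folding rewrites each output channel's weights and bias so that one convolution reproduces conv followed by BatchNorm: w' = w * gamma / sqrt(var + eps) and b' = beta + (b - mean) * gamma / sqrt(var + eps). The QAT variant in the paper additionally handles batch statistics during training, which this sketch does not attempt; it uses a single-output-channel 1x1 "convolution" so the identity is easy to check, and the function names are hypothetical.

```python
import math


def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters of one output channel into conv weight and bias."""
    factor = gamma / math.sqrt(var + eps)
    return [w * factor for w in weight], beta + (bias - mean) * factor


def conv1x1(x, weight, bias):
    """Single-output-channel 1x1 conv over a vector of input channels."""
    return sum(xi * wi for xi, wi in zip(x, weight)) + bias


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm for one channel."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


x = [0.5, -1.0, 2.0]
w, b = [0.2, 0.4, -0.1], 0.3
bn = dict(gamma=1.5, beta=-0.2, mean=0.1, var=0.8)

reference = batchnorm(conv1x1(x, w, b), **bn)
w_folded, b_folded = fold_bn(w, b, **bn)
folded = conv1x1(x, w_folded, b_folded)
print(abs(reference - folded) < 1e-12)  # True
```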
The idea is to have a more advanced Filter Pruning method to be able to show SOTA results in model compression/optimization.
I suggest reimplementing the method from here: https://github.com/cmu-enyac/LeGR and reproduce baseline results for MobileNet v2 on CIFAR100 as the first step.
cc'ed @vshampor, @vanyalzr.
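As a rough picture of what LeGR learns: each layer l gets an affine transform (alpha_l, kappa_l) applied to its filters' norms, and filters are pruned by one global ranking of the transformed scores; in the paper the transforms themselves are found by an evolutionary search. The sketch below covers only the ranking step, with made-up norms and transform values and a hypothetical function name; it is not the reference implementation from the linked repository.

```python
def legr_prune(filter_norms, transforms, prune_ratio):
    """Return {layer: sorted filter indices to prune} from one global ranking.

    filter_norms: {layer: [norm of each filter]}
    transforms:   {layer: (alpha, kappa)} learned per layer
    prune_ratio:  fraction of all filters (across layers) to remove
    """
    scored = []
    for layer, norms in filter_norms.items():
        alpha, kappa = transforms[layer]
        scored.extend((alpha * n + kappa, layer, i) for i, n in enumerate(norms))
    scored.sort()
    num_prune = int(len(scored) * prune_ratio)
    pruned = {layer: [] for layer in filter_norms}
    for _score, layer, i in scored[:num_prune]:
        pruned[layer].append(i)
    return {layer: sorted(idx) for layer, idx in pruned.items()}


norms = {"conv1": [0.9, 0.1, 0.5, 0.7], "conv2": [0.3, 0.8, 0.2, 0.6]}
transforms = {"conv1": (1.0, 0.0), "conv2": (2.0, 0.05)}
print(legr_prune(norms, transforms, prune_ratio=0.25))  # {'conv1': [1], 'conv2': [2]}
```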
Currently, activation quantizers are always merged when all affected quantizers have a consistent bit-width,
for example, as in the diagram below.
But HAWQ may choose a more accurate configuration when the merge is not possible.
Before implementing this feature, some research on the possible performance gain of both schemes is required (consider the overhead of re-quantization and compare which configuration is faster).
I don't know how to train on the CIFAR10 dataset; it always reports an error when there is no val folder. Can someone help me?
Original paper: https://arxiv.org/pdf/1806.08342.pdf
PyTorch implementation: https://github.com/pytorch/pytorch/blob/master/torch/nn/intrinsic/qat/modules/conv_fused.py#L82-L92
Experiment with low-batch training scenarios such as Mask-R-CNN to determine whether adding BatchNorm folding to NNCF will improve general quantized model quality.
I followed the branch and ran the ssd300_coco_int8 quantization-aware training. I want to know how to get the INT8 models. I ran
python tools/train.py configs/nncf_compression/ssd/ssd300_coco_int8.py
and it creates an output folder containing .pth files. But when I load them, they contain weights of type torch.cuda.FloatTensor, i.e. 32-bit floating point. How can I get the INT8 (torch.int8) model weights?
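During quantization-aware training the checkpoint typically keeps FP32 weights together with the quantizers' parameters; integer weights are produced later by the inference runtime or by an explicit conversion. As an illustration of such a conversion, a generic symmetric per-tensor INT8 scheme derives the scale from the largest absolute weight and rounds each weight to an integer code. This is a sketch of the general technique with hypothetical function names, not NNCF's exact formula or its stored parameter names.

```python
def quantize_symmetric_int8(weights):
    """Map float weights to int8 codes with a per-tensor scale: w ~= code * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate float weights from int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -0.3, 0.1, -0.5]
codes, scale = quantize_symmetric_int8(weights)
print(codes)  # [127, -76, 25, -127]
```

The reconstruction error per weight is at most half a quantization step (`scale / 2`), which is the precision FakeQuantize simulates during training.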
HAWQ analysis may require more GPU memory, hence it would be beneficial to have different batch sizes for training and for precision initialization
@RikAllen
By default, activation quantizers should not be merged if they are connected to weight quantizers with different supported bit-widths.
The merge can happen if the corresponding flag (e.g. allow_different_bitwidth_for_weight_and_activation or something shorter) is specified in the HW Config.
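The proposed default can be stated as a small rule. A sketch, where the flag name follows the example in the proposal and the function itself is hypothetical: merging is allowed only when every weight quantizer connected to the merged activation quantizers supports the same set of bit-widths, unless the HW config opts in.

```python
def can_merge_activation_quantizers(weight_bitwidth_options, hw_config):
    """Decide whether activation quantizers may be merged.

    weight_bitwidth_options: for each connected weight quantizer, the set of
        bit-widths it supports, e.g. {4, 8}.
    hw_config: dict of HW config flags (illustrative).
    """
    if hw_config.get("allow_different_bitwidth_for_weight_and_activation", False):
        return True
    options = [frozenset(o) for o in weight_bitwidth_options]
    return all(o == options[0] for o in options)


print(can_merge_activation_quantizers([{8}, {8}], {}))     # True
print(can_merge_activation_quantizers([{8}, {4, 8}], {}))  # False
```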
I tried pruning optimization (pruning only) for my detection model.
I got the following error when calling compression_ctrl.export_model():
File "nncf_pytorch/nncf/compression_method_api.py", line 213, in export_model
self.prepare_for_export()
File "nncf_pytorch/nncf/pruning/filter_pruning/algo.py", line 204, in prepare_for_export
model_pruner.prune_model()
File "nncf_pytorch/nncf/pruning/export_helpers.py", line 392, in prune_model
self.mask_propagation()
File "nncf_pytorch/nncf/pruning/export_helpers.py", line 315, in mask_propagation
cls = self.get_class_by_type_name(node_type)()
File "nncf_pytorch/nncf/pruning/export_helpers.py", line 303, in get_class_by_type_name
raise RuntimeError("Class {} is not found".format(type_name))
RuntimeError: Class conv_transpose2d is not found
Is this a bug, or is torch.nn.ConvTranspose2d not supported?
Pruning itself seems to work, judging from the training log: the Mask zero %, PR and Filter PR columns printed by the print_statistics function are above 0.
When I try to train the retinanet_r50_fpn_1x_int8 demo in mmdetection, the training process has no problem, but when it reaches evaluation it fails as follows:
File "/home/amax/projects/mech_learning/tools/train.py", line 216, in main
meta=meta)
File "/home/amax/projects/mech_learning/mmdet/apis/train.py", line 149, in train_detector
compression_ctrl=compression_ctrl)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 46, in train
self.call_hook('after_train_epoch')
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 282, in call_hook
getattr(hook, fn_name)(self)
File "/home/amax/projects/mech_learning/mmdet/core/evaluation/eval_hooks.py", line 27, in after_train_epoch
results = single_gpu_test(runner.model, self.dataloader, show=False)
File "/home/amax/projects/mech_learning/mmdet/apis/test.py", line 36, in single_gpu_test
result = model(return_loss=False, rescale=True, **data)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 81, in wrapped
return module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 81, in wrapped
return module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/debug.py", line 82, in decorated
retval = forward_func(self, *args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/nncf_network.py", line 366, in forward
retval = self.get_nncf_wrapped_model()(*args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/core/fp16/decorators.py", line 51, in new_func
return old_func(*args, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/base.py", line 180, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/base.py", line 156, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/single_stage.py", line 111, in simple_test
*outs, img_metas, rescale=rescale)
File "/home/amax/projects/mech_learning/mmdet/core/fp16/decorators.py", line 131, in new_func
return old_func(*args, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/dense_heads/anchor_head.py", line 569, in get_bboxes
scale_factor, cfg, rescale)
File "/home/amax/projects/mech_learning/mmdet/models/dense_heads/anchor_head.py", line 647, in _get_bboxes_single
cfg.max_per_img)
File "/home/amax/projects/mech_learning/mmdet/core/post_processing/bbox_nms.py", line 40, in multiclass_nms
bboxes = bboxes[valid_mask]
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 41, in wrapped
result = operator_info.custom_trace_fn(operator, *args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/patch_pytorch.py", line 71, in call
"input and output tensor count mismatch!".format(operator.name))
RuntimeError: Unable to forward trace through operator getitem - input and output tensor count mismatch!
Should I set --no-validate during training?
Second question: after training for one epoch I got a checkpoint file and used the same config file for evaluation. When loading the model I got this error:
unexpected key in source state_dict: nncf_module.backbone.conv1.weight, nncf_module.backbone.conv1.pre_ops.0.op._num_bits, ...
missing keys in source state_dict: backbone.conv1.weight, backbone.bn1.weight, backbone.bn1.bias, backbone.bn1.running_mean, ...
So none of the keys in the model match, and the result is empty:
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 14.3 task/s, elapsed: 350s, ETA: 0s
Evaluating bbox...
Loading and preparing results...
The testing results of the whole dataset is empty.
Am I doing something wrong?
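A small diagnostic sketch for the key mismatch above. `diagnose_prefix_mismatch` is a hypothetical helper, not an NNCF API. Checkpoint keys prefixed with `nncf_module.` and containing `pre_ops` suggest the checkpoint was saved from the NNCF-wrapped model, so the evaluation model presumably needs to be wrapped with the same compression config before the state dict is loaded.

```python
from typing import Iterable, Set


def diagnose_prefix_mismatch(checkpoint_keys: Iterable[str],
                             model_keys: Iterable[str],
                             prefix: str = "nncf_module.") -> dict:
    """Compare state_dict key sets and report whether they differ only by a prefix."""
    ckpt: Set[str] = set(checkpoint_keys)
    model: Set[str] = set(model_keys)
    # Checkpoint keys with the wrapper prefix removed.
    stripped = {k[len(prefix):] for k in ckpt if k.startswith(prefix)}
    return {
        "exact_matches": len(ckpt & model),
        "matches_after_stripping_prefix": len(stripped & model),
        # Quantizer parameters such as ...pre_ops.0.op._num_bits have no
        # counterpart in an uncompressed model.
        "extra_compression_keys": sorted(
            k for k in stripped - model if ".pre_ops." in k
        ),
    }
```

Assuming an mmcv-style checkpoint with a `state_dict` entry, it could be called as `diagnose_prefix_mismatch(torch.load(path)["state_dict"].keys(), model.state_dict().keys())`.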
Why is quantization-aware training so slow (almost 2 times slower than floating-point training)? Is there any way to speed it up?
Some additional results regarding #41.
Forgetting (5 batches with momentum = 0.9, then 10 batches with momentum = 0.1) works as well as the original approach of resetting the statistics to zero and running 200 batches.
Updating BN statistics iteratively, layer by layer, does not improve accuracy. For a given layer, the statistics of the preceding layers are already updated on the fly by the running-statistics calculation, and that is sufficient to get good accuracy.
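The forgetting schedule can be sketched with the running-mean update rule alone. This is a minimal NumPy illustration assuming PyTorch's BatchNorm momentum convention (running = (1 - momentum) * running + momentum * batch statistic). The synthetic batch means are made up for the example and are unrelated to the results in the tables below.

```python
import numpy as np


def update_running_mean(running, batch_mean, momentum):
    # PyTorch BatchNorm convention: larger momentum weights the new batch more.
    return (1.0 - momentum) * running + momentum * batch_mean


def adapt_with_schedule(running, batch_means, schedule):
    """Apply (num_batches, momentum) phases in order to a running mean."""
    batches = iter(batch_means)
    for num_batches, momentum in schedule:
        for _ in range(num_batches):
            running = update_running_mean(running, next(batches), momentum)
    return running


rng = np.random.default_rng(0)
true_mean = np.array([2.0, -1.0])
stale = np.array([10.0, 10.0])  # statistics that no longer match the compressed model
batches = [true_mean + 0.05 * rng.standard_normal(2) for _ in range(15)]

# High momentum first "forgets" the stale values, low momentum then smooths.
forgetting = adapt_with_schedule(stale, batches, [(5, 0.9), (10, 0.1)])
plain = adapt_with_schedule(stale, batches, [(15, 0.1)])
print(forgetting, plain)
```

With momentum 0.9, the stale value's weight shrinks by a factor of 10 per batch, so after 5 batches it is negligible. A plain momentum of 0.1 still keeps about 0.9^15 ≈ 0.21 of the stale value after the same 15 batches.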
| Model | Pruning algo info | Accuracy@1 | Accuracy@5 |
|---|---|---|---|
| ResNet18 (BN adapted original, 200 steps) | geometric median criterion, pruning target = 30% | 33.582 | 59.336 |
| ResNet18 (BN adapted w/ forgetting, 10 steps) | geometric median criterion, pruning target = 30% | 33.976 | 58.908 |
| ResNet18 (BN adapted iteratively, 20 steps for each BN node) | geometric median criterion, pruning target = 30% | 33.830 | 58.712 |

| Model | Quantization bitwidths | Quantization mode | Range initializer | Accuracy@1 | Accuracy@5 |
|---|---|---|---|---|---|
| ResNet18 (BN adapted original, 200 steps) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 66.866 | 87.476 |
| ResNet18 (BN adapted w/ forgetting, 10 steps) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 66.798 | 87.490 |
| ResNet18 (BN adapted iteratively, 20 steps for each BN node) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 66.832 | 87.480 |
| MobilenetV2 (BN adapted original, 200 steps) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 65.216 | 86.304 |
| MobilenetV2 (BN adapted w/ forgetting, 10 steps) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 65.112 | 86.170 |
| MobilenetV2 (BN adapted iteratively, 20 steps for each BN node) | a8w4 | asymmetric, per-channel for weights | mean min max, 100 batches | 65.026 | 86.292 |
The NMS CUDA kernel fails when it runs in multiple processes on different GPUs (even without wrapping the model in DistributedDataParallel or calling dist.init_process_group):
RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at line:
THCudaCheck(cudaMemcpy(&mask_host[0],
mask_dev,
sizeof(unsigned long long) * boxes_num * col_blocks,
cudaMemcpyDeviceToHost));
The same error occurs in multi-process DistributedDataParallel mode on multiple GPUs when the kernel runs after dist.init_process_group but before the model is wrapped in DistributedDataParallel.
It works in single-GPU mode and when running from a single process on multiple GPUs in DataParallel mode.
The workaround is to call the kernel only after wrapping the model in DistributedDataParallel. However, the kernel can be called while the compressed model is being created, and that can only happen before the DistributedDataParallel wrapping. This is where the issue comes from: I wanted to run create_compressed_model for the SSD_VGG model in evaluation mode, which calls NMS and fails with the error above.
Creating compressed models in evaluation mode may reduce training time by not quantizing auxiliary training branches. It also prevents the BatchNorm statistics from being corrupted when dummy_forward is called with random inputs on a model in training mode.
@alexsu52 @vshampor @vanyalzr @AlexKoff88
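Here is a minimal PyTorch illustration of the BatchNorm point above. It uses plain torch.nn.BatchNorm2d rather than NNCF's dummy_forward. A forward pass with random input in training mode overwrites the running statistics, while the same pass in evaluation mode leaves them untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
dummy_input = torch.randn(2, 3, 8, 8)  # random input, like a dummy forward pass

# Training mode: running statistics are updated from the random batch.
bn.train()
before = bn.running_mean.clone()
bn(dummy_input)
changed_in_train = not torch.equal(bn.running_mean, before)

# Evaluation mode: running statistics are only read, never updated.
bn.eval()
before = bn.running_mean.clone()
with torch.no_grad():
    bn(dummy_input)
changed_in_eval = not torch.equal(bn.running_mean, before)

print(changed_in_train, changed_in_eval)
```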
I compressed a PointRend model based on mmdet-2.2.1. Training looks normal, but an error occurs when I convert it to ONNX using the functions in pytorch2onnx.py from mmdet-2.3.1 (commit id: 6495391). It seems that _bbox_forward() and _mask_forward() have some problems. Do you know how to fix this? The error is as follows:
File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/base.py", line 180, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/base.py", line 138, in forward_test
return self.forward_dummy(imgs[0])
File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/two_stage.py", line 101, in forward_dummy
roi_outs = self.roi_head.forward_dummy(x, proposals)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/standard_roi_head.py", line 60, in forward_dummy
bbox_results = self._bbox_forward(x, rois)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/standard_roi_head.py", line 139, in _bbox_forward
x[:self.bbox_roi_extractor.num_inputs], rois)
File "/home/mechmind/projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/mechmind/miniconda3/envs/nncf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/core/fp16/decorators.py", line 131, in new_func
return old_func(*args, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py", line 73, in forward
roi_feats_t = self.roi_layers[i](feats[i], rois)
File "/home/mechmind/projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/mechmind/miniconda3/envs/nncf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/roi_align.py", line 144, in forward
self.sample_num, self.aligned)
File "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/roi_align.py", line 30, in forward
aligned)
RuntimeError: roi_width >= 0 && roi_height >= 0 INTERNAL ASSERT FAILED at "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/src/cpu/roi_align_v2.cpp":134, please report a bug to PyTorch. ROIs in ROIAlign cannot have non-negative size!