openvinotoolkit / nncf

Neural Network Compression Framework for enhanced OpenVINO™ inference

License: Apache License 2.0


quantization pruning sparsity quantization-aware-training mixed-precision-training compression semantic-segmentation object-detection classification nlp

nncf's Introduction

Neural Network Compression Framework (NNCF)

Key Features • Installation • Documentation • Usage • Tutorials and Samples • Third-party integration • NNCF Model Zoo

Neural Network Compression Framework (NNCF) provides a suite of post-training and training-time algorithms for optimizing inference of neural networks in OpenVINO™ with a minimal accuracy drop.

NNCF is designed to work with models from PyTorch, TensorFlow, ONNX and OpenVINO™.

NNCF provides samples that demonstrate the usage of compression algorithms for different use cases and models. See compression results achievable with the NNCF-powered samples on the NNCF Model Zoo page.

The framework is organized as a Python* package that can be built and used in a standalone mode. The framework architecture is unified to make it easy to add different compression algorithms for both PyTorch and TensorFlow deep learning frameworks.

Key Features

Post-Training Compression Algorithms

Compression algorithm	OpenVINO	PyTorch	TensorFlow	ONNX
Post-Training Quantization	Supported	Supported	Supported	Supported
Weights Compression	Supported	Supported	Not supported	Not supported

Training-Time Compression Algorithms

Compression algorithm	PyTorch	TensorFlow
Quantization Aware Training	Supported	Supported
Mixed-Precision Quantization	Supported	Not supported
Sparsity	Supported	Supported
Filter pruning	Supported	Supported
Movement pruning	Experimental	Not supported

Automatic, configurable model graph transformation to obtain the compressed model.

NOTE: Limited support for TensorFlow models. Only models created using Sequential or Keras Functional API are supported.
Common interface for compression methods.
GPU-accelerated layers for faster compressed model fine-tuning.
Distributed training support.
Git patch for prominent third-party repository (huggingface-transformers) demonstrating the process of integrating NNCF into custom training pipelines.
Seamless combination of pruning, sparsity, and quantization algorithms. Please refer to optimum-intel for examples of joint (movement) pruning, quantization, and distillation (JPQD), end-to-end from NNCF optimization to compressed OpenVINO IR.
Exporting PyTorch compressed models to ONNX* checkpoints and TensorFlow compressed models to SavedModel or Frozen Graph format, ready to use with OpenVINO™ toolkit.
Support for Accuracy-Aware model training pipelines via the Adaptive Compression Level Training and Early Exit Training.

Documentation

This documentation covers detailed information about NNCF algorithms and functions needed for the contribution to NNCF.

The latest user documentation for NNCF is available here.

NNCF API documentation can be found here.

Usage

Post-Training Quantization

The NNCF PTQ is the simplest way to apply 8-bit quantization. To run the algorithm you only need your model and a small (~300 samples) calibration dataset.

OpenVINO is the preferred backend to run PTQ with, while PyTorch, TensorFlow, and ONNX are also supported.
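To make the idea of 8-bit quantization with a calibration set more concrete, here is a minimal plain-Python sketch. It derives a scale and zero point from the minimum and maximum of a few observed values, then quantizes and dequantizes them. This illustrates the general technique only and is not NNCF's implementation; the helper names `calibrate` and `fake_quantize` are hypothetical.

```python
def calibrate(samples, num_bits=8):
    """Derive an asymmetric scale and zero point from observed values."""
    lo = min(min(samples), 0.0)  # extend the range so that 0.0 is representable
    hi = max(max(samples), 0.0)
    levels = 2 ** num_bits - 1   # 255 steps for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point, levels


def fake_quantize(x, scale, zero_point, levels):
    """Map x to an integer level, clamp it, then map it back to float."""
    q = round(x / scale) + zero_point
    q = max(0, min(levels, q))
    return (q - zero_point) * scale


# Hypothetical "calibration" values standing in for collected statistics.
calibration_values = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point, levels = calibrate(calibration_values)
restored = [fake_quantize(v, scale, zero_point, levels) for v in calibration_values]
max_error = max(abs(a - b) for a, b in zip(calibration_values, restored))
```

Values inside the calibrated range are reproduced to within one quantization step, while values outside it are clamped, which is why the calibration data should be representative of real inputs.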

OpenVINO

import nncf
import openvino.runtime as ov
import torch
from torchvision import datasets, transforms

# Instantiate your uncompressed model
model = ov.Core().read_model("/model_path")

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)

# Step 1: Initialize transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)
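The three steps above follow a simple pattern: a data source yields items, and the transformation function turns each item into the input the model expects. The sketch below mimics that pattern in plain Python. `SketchDataset` is a hypothetical stand-in written for illustration, not `nncf.Dataset` itself, and the toy list stands in for a real data loader.

```python
from itertools import islice


class SketchDataset:
    """Hypothetical stand-in: pairs a data source with a transform function."""

    def __init__(self, data_source, transform_fn):
        self.data_source = data_source
        self.transform_fn = transform_fn

    def inference_inputs(self, subset_size=None):
        """Yield model inputs only, applying transform_fn to each item lazily."""
        items = iter(self.data_source)
        if subset_size is not None:
            items = islice(items, subset_size)
        for item in items:
            yield self.transform_fn(item)


def transform_fn(data_item):
    images, _ = data_item  # drop the label; calibration needs inputs only
    return images


# Toy data source of (input, label) pairs standing in for a DataLoader.
toy_loader = [([0.1 * i, 0.2 * i], i % 2) for i in range(1000)]
dataset = SketchDataset(toy_loader, transform_fn)
calibration_inputs = list(dataset.inference_inputs(subset_size=300))
```

Only the first 300 items are consumed here, which matches the small calibration set size mentioned earlier.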

PyTorch

import nncf
import torch
from torchvision import datasets, models, transforms

# Instantiate your uncompressed model
model = models.mobilenet_v2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset)

# Step 1: Initialize the transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)

NOTE: If the Post-Training Quantization algorithm does not meet quality requirements, you can fine-tune the quantized PyTorch model. You can find an example of the Quantization-Aware Training pipeline for a PyTorch model here.

TensorFlow

import nncf
import tensorflow as tf
import tensorflow_datasets as tfds

# Instantiate your uncompressed model
model = tf.keras.applications.MobileNetV2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = tfds.load("/path", split="validation",
                        shuffle_files=False, as_supervised=True)

# Step 1: Initialize transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(val_dataset, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)

ONNX

import onnx
import nncf
import torch
from torchvision import datasets, transforms

# Instantiate your uncompressed model
onnx_model = onnx.load_model("/model_path")

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1)

# Step 1: Initialize transformation function
input_name = onnx_model.graph.input[0].name
def transform_fn(data_item):
    images, _ = data_item
    return {input_name: images.numpy()}

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(onnx_model, calibration_dataset)

Training-Time Quantization

Here is an example of an Accuracy-Aware Quantization pipeline in which model weights and compression parameters may be fine-tuned to achieve higher accuracy.

PyTorch

import nncf
import torch
from torchvision import datasets, models, transforms

# Instantiate your uncompressed model
model = models.mobilenet_v2()

# Provide validation part of the dataset to collect statistics needed for the compression algorithm
val_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
dataset_loader = torch.utils.data.DataLoader(val_dataset)

# Step 1: Initialize the transformation function
def transform_fn(data_item):
    images, _ = data_item
    return images

# Step 2: Initialize NNCF Dataset
calibration_dataset = nncf.Dataset(dataset_loader, transform_fn)
# Step 3: Run the quantization pipeline
quantized_model = nncf.quantize(model, calibration_dataset)

# Now use quantized_model as a usual torch.nn.Module
# to fine-tune compression parameters along with the model weights

# Save quantization modules and the quantized model parameters
checkpoint = {
    'state_dict': quantized_model.state_dict(),
    'nncf_config': quantized_model.nncf.get_config(),
    ... # the rest of the user-defined objects to save
}
torch.save(checkpoint, path_to_checkpoint)  # path_to_checkpoint is a user-defined path

# ...

# Load quantization modules and the quantized model parameters
resuming_checkpoint = torch.load(path_to_checkpoint)
nncf_config = resuming_checkpoint['nncf_config']
state_dict = resuming_checkpoint['state_dict']

# example_input is a user-provided sample input for the model
quantized_model = nncf.torch.load_from_config(model, nncf_config, example_input)
quantized_model.load_state_dict(state_dict)
# ... the rest of the usual PyTorch-powered training pipeline

Training-Time Compression

Here is an example of an Accuracy-Aware RB Sparsification pipeline in which model weights and compression parameters may be fine-tuned to achieve higher accuracy.

PyTorch

import torch
import nncf.torch  # Important - must be imported before any other external package that depends on torch

from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Instantiate your uncompressed model
from torchvision.models.resnet import resnet50
model = resnet50()

# Load a configuration file to specify compression
nncf_config = NNCFConfig.from_json("resnet50_imagenet_rb_sparsity.json")

# Provide data loaders for compression algorithm initialization, if necessary
from torchvision import datasets, transforms
representative_dataset = datasets.ImageFolder("/path", transform=transforms.Compose([transforms.ToTensor()]))
init_loader = torch.utils.data.DataLoader(representative_dataset)
nncf_config = register_default_init_args(nncf_config, init_loader)

# Apply the specified compression algorithms to the model
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Now use compressed_model as a usual torch.nn.Module
# to fine-tune compression parameters along with the model weights

# ... the rest of the usual PyTorch-powered training pipeline

# Export to ONNX or .pth when done fine-tuning
compression_ctrl.export_model("compressed_model.onnx")
torch.save(compressed_model.state_dict(), "compressed_model.pth")

NOTE (PyTorch): Due to the way NNCF works within the PyTorch backend, import nncf must be done before any other import of torch in your package or in third-party packages that your code utilizes. Otherwise, the compression may be applied incompletely.
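The PyTorch example above loads its compression settings from a JSON file. As a rough sketch of what such a file might contain, the snippet below builds a small configuration as a Python dictionary and serializes it with the standard `json` module. The field names (`input_info`, `compression`, `algorithm`, `sparsity_init`, `params`) reflect an assumed reading of the NNCF config schema and all values are illustrative, so check them against the official schema before use.

```python
import json

# Assumed structure and illustrative values; not copied from the repository.
config = {
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {
        "algorithm": "rb_sparsity",
        "sparsity_init": 0.01,
        "params": {
            "sparsity_target": 0.6,
            "sparsity_target_epoch": 5,
            "sparsity_freeze_epoch": 10,
        },
    },
}

# Serialize to the text that would be stored in the .json file,
# then parse it back to confirm that it round-trips.
config_text = json.dumps(config, indent=4)
loaded = json.loads(config_text)
```

Writing `config_text` to a file yields something that `NNCFConfig.from_json` could be pointed at, assuming the fields match the schema of the installed NNCF version.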

TensorFlow

import tensorflow as tf

from nncf import NNCFConfig
from nncf.tensorflow import create_compressed_model, register_default_init_args

# Instantiate your uncompressed model
from tensorflow.keras.applications import ResNet50
model = ResNet50()

# Load a configuration file to specify compression
nncf_config = NNCFConfig.from_json("resnet50_imagenet_rb_sparsity.json")

# Provide dataset for compression algorithm initialization
representative_dataset = tf.data.Dataset.list_files("/path/*.jpeg")
nncf_config = register_default_init_args(nncf_config, representative_dataset, batch_size=1)

# Apply the specified compression algorithms to the model
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Now use compressed_model as a usual Keras model
# to fine-tune compression parameters along with the model weights

# ... the rest of the usual TensorFlow-powered training pipeline

# Export to Frozen Graph, TensorFlow SavedModel or .h5 when done fine-tuning
compression_ctrl.export_model("compressed_model.pb", save_format="frozen_graph")

For a more detailed description of NNCF usage in your training code, see this tutorial.

Demos, Tutorials and Samples

For a quicker start with NNCF-powered compression, try sample notebooks and scripts presented below.

Jupyter* Notebook Tutorials and Demos

Ready-to-run Jupyter* notebook tutorials and demos are available to explain and display NNCF compression algorithms for optimizing models for inference with the OpenVINO Toolkit:

Notebook Tutorial Name	Compression Algorithm	Backend	Domain
BERT Quantization	Post-Training Quantization	OpenVINO	NLP
MONAI Segmentation Model Quantization	Post-Training Quantization	OpenVINO	Segmentation
PyTorch Model Quantization	Post-Training Quantization	PyTorch	Image Classification
Quantization with Accuracy Control	Post-Training Quantization with Accuracy Control	OpenVINO	Speech-to-Text, Object Detection
PyTorch Training-Time Compression	Training-Time Compression	PyTorch	Image Classification
TensorFlow Training-Time Compression	Training-Time Compression	TensorFlow	Image Classification
Joint Pruning, Quantization and Distillation for BERT	Joint Pruning, Quantization and Distillation	OpenVINO	NLP

A list of notebooks demonstrating OpenVINO conversion and inference together with NNCF compression for models from various domains:

Demo Model	Compression Algorithm	Backend	Domain
YOLOv8	Post-Training Quantization	OpenVINO	Object Detection, KeyPoint Detection, Instance Segmentation
YOLOv7	Post-Training Quantization	OpenVINO	Object Detection
EfficientSAM	Post-Training Quantization	OpenVINO	Image Segmentation
Segment Anything Model	Post-Training Quantization	OpenVINO	Image Segmentation
OneFormer	Post-Training Quantization	OpenVINO	Image Segmentation
InstructPix2Pix	Post-Training Quantization	OpenVINO	Image-to-Image
CLIP	Post-Training Quantization	OpenVINO	Image-to-Text
BLIP	Post-Training Quantization	OpenVINO	Image-to-Text
Segmind-VegaRT	Post-Training Quantization	OpenVINO	Text-to-Image
Latent Consistency Model	Post-Training Quantization	OpenVINO	Text-to-Image
Würstchen	Post-Training Quantization	OpenVINO	Text-to-Image
ControlNet QR Code Monster	Post-Training Quantization	OpenVINO	Text-to-Image
SDXL-turbo	Post-Training Quantization	OpenVINO	Text-to-Image, Image-to-Image
ImageBind	Post-Training Quantization	OpenVINO	Multi-Modal Retrieval
Distil-Whisper	Post-Training Quantization	OpenVINO	Speech-to-Text
Whisper	Post-Training Quantization	OpenVINO	Speech-to-Text
MMS Speech Recognition	Post-Training Quantization	OpenVINO	Speech-to-Text
Grammar Error Correction	Post-Training Quantization	OpenVINO	NLP, Grammar Correction
LLM Instruction Following	Weight Compression	OpenVINO	NLP, Instruction Following
Dolly 2.0	Weight Compression	OpenVINO	NLP, Instruction Following
Stable-Zephyr-3b	Weight Compression	OpenVINO	NLP, Chat Bot
LLM Chat Bots	Weight Compression	OpenVINO	NLP, Chat Bot

Post-Training Quantization Examples

Compact scripts demonstrating quantization and corresponding inference speed boost:

Example Name	Compression Algorithm	Backend	Domain
OpenVINO MobileNetV2	Post-Training Quantization	OpenVINO	Image Classification
OpenVINO YOLOv8	Post-Training Quantization	OpenVINO	Object Detection
OpenVINO YOLOv8 QwAC	Post-Training Quantization with Accuracy Control	OpenVINO	Object Detection
OpenVINO Anomaly Classification	Post-Training Quantization with Accuracy Control	OpenVINO	Anomaly Classification
PyTorch MobileNetV2	Post-Training Quantization	PyTorch	Image Classification
PyTorch SSD	Post-Training Quantization	PyTorch	Object Detection
TensorFlow MobileNetV2	Post-Training Quantization	TensorFlow	Image Classification
ONNX MobileNetV2	Post-Training Quantization	ONNX	Image Classification

Training-Time Compression Examples

Examples of full pipelines including compression, training, and inference for classification, detection, and segmentation tasks:

Example Name	Compression Algorithm	Backend	Domain
PyTorch Image Classification	Training-Time Compression	PyTorch	Image Classification
PyTorch Object Detection	Training-Time Compression	PyTorch	Object Detection
PyTorch Semantic Segmentation	Training-Time Compression	PyTorch	Semantic Segmentation
TensorFlow Image Classification	Training-Time Compression	TensorFlow	Image Classification
TensorFlow Object Detection	Training-Time Compression	TensorFlow	Object Detection
TensorFlow Instance Segmentation	Training-Time Compression	TensorFlow	Instance Segmentation

Third-party repository integration

NNCF may be easily integrated into training/evaluation pipelines of third-party repositories.

Used by

OpenVINO Training Extensions

NNCF is integrated into OpenVINO Training Extensions as a model optimization backend. You can train, optimize, and export new models based on available model templates as well as run the exported models with OpenVINO.
HuggingFace Optimum Intel

NNCF is used as a compression backend within the renowned transformers repository in HuggingFace Optimum Intel.

Installation Guide

For detailed installation instructions, refer to the Installation guide.

NNCF can be installed as a regular PyPI package via pip:

pip install nncf

NNCF is also available via conda:

conda install -c conda-forge nncf

System requirements

Ubuntu* 18.04 or later (64-bit)
Python* 3.8 or later
Supported frameworks:
- PyTorch* >=2.2, <2.4
- TensorFlow* >=2.8.4, <=2.15.1
- ONNX* ==1.16.0
- OpenVINO* >=2022.3.0

This repository is tested on Python* 3.8.10, PyTorch* 2.3.0 (NVidia CUDA* Toolkit 12.1) and TensorFlow* 2.12.1 (NVidia CUDA* Toolkit 11.8).

NNCF Compressed Model Zoo

List of models and compression results for them can be found at our NNCF Model Zoo page.

Citing

@article{kozlov2020neural,
    title =   {Neural network compression framework for fast model inference},
    author =  {Kozlov, Alexander and Lazarevich, Ivan and Shamporov, Vasily and Lyalyushkin, Nikolay and Gorbachev, Yury},
    journal = {arXiv preprint arXiv:2002.08679},
    year =    {2020}
}

Contributing Guide

Refer to the CONTRIBUTING.md file for guidelines on contributions to the NNCF repository.

Useful links

Documentation
Example scripts (model objects available through links in respective README.md files):
- PyTorch
- TensorFlow
FAQ
Notebooks
HuggingFace Optimum Intel
OpenVINO Model Optimization Guide

Telemetry

NNCF as part of the OpenVINO™ toolkit collects anonymous usage data for the purpose of improving OpenVINO™ tools. You can opt-out at any time by running the following command in the Python environment where you have NNCF installed:

opt_in_out --opt_out

More information available on OpenVINO telemetry.


nncf's Issues

NMS CUDA kernel fails when it's running on multiple processes and on different GPUs.

The NMS CUDA kernel fails when it runs in multiple processes on different GPUs (even without DistributedDataParallel wrapping and dist.init_process_group).

RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at line:

THCudaCheck(cudaMemcpy(&mask_host[0],
                    mask_dev,
                    sizeof(unsigned long long) * boxes_num * col_blocks,
                    cudaMemcpyDeviceToHost));

The same error occurs in multi-process DistributedDataParallel mode on multiple GPUs when the kernel runs after dist.init_process_group and before wrapping in DistributedDataParallel.

It works in single-GPU mode and when running from a single process on multiple GPUs in DataParallel mode.

The workaround is to call the kernel after wrapping in DistributedDataParallel. However, this kernel can be called during creation of the compressed model, which can only happen before wrapping in DistributedDataParallel. This is where the issue comes from: I wanted to run create_compressed_model for the SSD_VGG model in evaluation mode, which calls NMS and fails with the error above.

Compressing models in evaluation mode may reduce training time by not quantizing auxiliary training branches, and it prevents corruption of BatchNorm statistics when dummy_forward is called with random inputs on a model in training mode.

@alexsu52 @vshampor @vanyalzr @AlexKoff88

How to get an INT8 IR model (via an INT8 ONNX model)? The OpenVINO mo.py --data-type option only supports fp16, fp32

I want to fine-tune a detection model with INT8 awareness and convert it to an INT8 IR model to achieve acceleration.
The problem is that I cannot find a way to export an INT8 IR model.

I found the statement below here, but the tutorial link just points to the OpenVINO top page.

To export a model to OpenVINO IR and run it using Intel Deep Learning Deployment Toolkit please refer to this tutorial.

I also searched precision-related pages like this in the OpenVINO Developer Guides,
but could not find any helpful information.

I know the Model Optimizer tool mo.py can convert ONNX to an IR model,
but the --data-type option only supports fp16 and fp32.

Quantize Mask-RCNN

Quantize Mask-RCNN to INT8 so that it has less than a 1% accuracy drop compared to FP32.
This includes generation of the following models in ONNX format as output:

model with FakeQuantize
model with QuantizeLinear/DequantizeLinear

Accuracy results are needed as well.

cc @AlexKoff88

compression loss = 0

After integrating NNCF into mmdetection, the compression loss is 0 when training EfficientNet (classification task):
compression_loss = compression_ctrl.loss()

compression_loss = 0

Quantized model acceleration

Hello, how much faster is inference with the quantized model (INT8) compared with the original model (FP32)? Thank you!

Why is quantization-aware training very slow?

Why is quantization-aware training very slow (almost 2 times slower than float model training)? Is there any way to speed it up?

Investigate benefits of adding BatchNorm folding to NNCF quantization

Original paper: https://arxiv.org/pdf/1806.08342.pdf
PyTorch implementation: https://github.com/pytorch/pytorch/blob/master/torch/nn/intrinsic/qat/modules/conv_fused.py#L82-L92

Experiment with low-batch training scenarios such as Mask-R-CNN to determine whether adding BatchNorm folding to NNCF will improve general quantized model quality.
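For readers unfamiliar with the term, BatchNorm folding merges the BatchNorm scale and shift into the preceding layer's weight and bias, so that a single linear operation gives the same result. The sketch below shows the arithmetic for one channel, with a scalar multiply-add standing in for the convolution. It is a simplified illustration of the standard folding formula, not NNCF or PyTorch code, and the function names are hypothetical.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold one channel's BatchNorm parameters into the preceding weight and bias."""
    factor = gamma / math.sqrt(var + eps)
    return weight * factor, beta + (bias - mean) * factor


def layer_then_batchnorm(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: linear layer followed by BatchNorm in inference mode."""
    y = weight * x + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


x = 1.5
params = dict(weight=0.8, bias=0.1, gamma=1.2, beta=-0.3, mean=0.4, var=2.0)

folded_weight, folded_bias = fold_batchnorm(**params)
reference = layer_then_batchnorm(x, **params)
folded = folded_weight * x + folded_bias  # matches the reference up to rounding
```

In a quantization-aware setting, the folded weight is the tensor that would be quantized, which is why folding during training can change the quality of the quantized model.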

Mixed Precision misquantization

While training an object detection model with a HAWQ config, I noticed there are many more int8 activation quantizations than int8 weight quantizations. According to Netron, some convolutions take int8 activations and int4 weights. I suppose that's not how things should be. What do you think?

Configure batch size for HAWQ separately from training batch size

HAWQ analysis may require more GPU memory, hence it would be beneficial to have different batch sizes for training and for precision initialization
@RikAllen

Subclass QuantizerConfig for INT-N and BFP specialization

Some class-level specialization might be in order here, otherwise we end up with a situation when INT-N only uses a half of the available config structs, and certain quantizer configs won't correspond to any real quantizer.
class IntNQuantizerConfig(QuantizerConfig):, class BFPQuantizerConfig(QuantizerConfig): - what do you think?

Originally posted by @vshampor in #137 (comment)

PyTorch 1.6.0 seems to leak memory in conv2d

I'm using PyTorch 1.6.0 with a single conv layer. My code is as follows:
import torch
inputs = torch.randn(1, 3, 512, 512).cuda()
conv = torch.nn.Conv2d(3, 64, (7, 7), stride=2, padding=3, bias=False).cuda()
output = conv(inputs)
Before executing the conv operation, the GPU memory usage reported by nvidia-smi is 1019MB; after the conv operation it is 1429MB, so the conv operation consumes about 410MB. I know that im2col may consume a lot of memory when the input size is large. What I can't understand is why the GPU memory usage does not drop after the conv operation. Is there a memory leak in conv2d, or is something wrong with my experiment?

[Quantization] Support for fusing non-relu activations

In the pattern-based approach, NNCF interprets non-relu activations as a single operation for which the input must be quantized. That is, the Fake Quantize operation is inserted into the graph before non-relu activations. This blocks fusing non-relu activation to a core operation like conv.

examples/classification error

Hi, I ran main.py from examples/classification and encountered an error: FileNotFoundError: [Errno 2] No such file or directory: '/home/sky/anaconda3/lib/python3.7/site-packages/nncf-1.3.2-py3.7.egg/nncf/extensions/src/quantization/cpu/functions_cpu.cpp'
How can I deal with this error? Thank you!

cifar10

I don't know how to train on the CIFAR10 dataset. It always reports an error when there is no val folder. Can someone tell me how to do it?

Filter pruning for SSD

Is filter pruning supported for SSD-based object detection models?

BatchNorm adaptation results

Some additional results regarding #41.

Forgetting (5 batches with momentum = 0.9, then 10 batches with momentum = 0.1) works as well as original resetting to zero and using 200 batches.

Iterative update of BN statistics layer by layer does not give an accuracy boost. Statistics from previous layers for a given layer are updated on-the-go due to rolling stats calculation and that is sufficient to get good accuracy.
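The effect of the forgetting schedule can be checked with a little arithmetic. The sketch below assumes the PyTorch convention for BatchNorm momentum, where the new running statistic is `(1 - momentum) * running + momentum * batch_stat`; that convention is an assumption here, since the issue does not spell it out. The function names and the toy numbers are hypothetical.

```python
def update_running_stat(running, batch_stat, momentum):
    """PyTorch-style running statistic update: momentum weights the new batch."""
    return (1.0 - momentum) * running + momentum * batch_stat


def adapt_with_forgetting(running, batch_stats):
    """Apply 5 batches with momentum 0.9, then 10 batches with momentum 0.1."""
    schedule = [0.9] * 5 + [0.1] * 10
    for momentum, batch_stat in zip(schedule, batch_stats):
        running = update_running_stat(running, batch_stat, momentum)
    return running


# A stale running mean far from the mean the compressed model actually produces.
stale_mean = 100.0
batch_means = [2.0] * 15
adapted_mean = adapt_with_forgetting(stale_mean, batch_means)

# Fraction of the stale statistic that survives the whole schedule.
stale_weight = (1 - 0.9) ** 5 * (1 - 0.1) ** 10
```

Under this assumption the stale statistic keeps a weight of only a few millionths after 15 batches, which is consistent with the observation that forgetting needs far fewer steps than resetting to zero and averaging over 200 batches.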

Model	Pruning algo info	Accuracy@1	Accuracy@5
ResNet18 (BN adapted original, 200 steps)	geometric median criterion, pruning target = 30%	33.582	59.336
ResNet18 (BN adapted w/ forgetting, 10 steps)	geometric median criterion, pruning target = 30%	33.976	58.908
ResNet18 (BN adapted iteratively, 20 steps for each BN node)	geometric median criterion, pruning target = 30%	33.830	58.712

Model	Quantization bitwidths	Quantization mode	Range initializer	Accuracy@1	Accuracy@5
ResNet18 (BN adapted original, 200 steps)	a8w4	asymmetric, per-channel for weights	mean min max, 100 batches	66.866	87.476
ResNet18 (BN adapted w/ forgetting, 10 steps)	a8w4	asymmetric, per-channel for weights	mean min max, 100 batches	66.798	87.490
ResNet18 (BN adapted iteratively, 20 steps for each BN node)	a8w4	asymmetric, per-channel for weights	mean min max, 100 batches	66.832	87.480
MobilenetV2 (BN adapted original, 200 steps)	a8w4	asymmetric, per-channel for weights	mean min max, 100 batches	65.216	86.304
MobilenetV2 (BN adapted w/ forgetting, 10 steps)	a8w4	asymmetric, per-channel for weights	mean min max, 100 batches	65.112	86.170
MobilenetV2 (BN adapted iteratively, 20 steps for each BN node)	a8w4	asymmetric, per-channel for weights	mean min max, 100 batches	65.026	86.292

mmdetection ONNX conversion

Hi! I've trained a Cascade RCNN model with mmdetection using your patch and a modified config. To convert it to ONNX, I used the built-in converter script in mmdetection. It looks like I got the same model as before optimization.

Is it necessary to use your ONNX converter? How can I do that with the mmdetection training pipeline?

I concluded that the models are the same because after converting to OpenVINO IR I got the same inference performance. Also, my ONNX graph doesn't contain anything like 'FakeQuantize' layers.

Thanks,
Vladimir

Slower inference with INT8 for NNCF compared to Post-Training Optimization Toolkit and FP32

Hi, thank you for providing these useful tools. Currently, I'm working on INT8 quantization on both NNCF and POT. I've noticed that the inference time of POT is faster than FP32, which totally makes sense; however, the inference time of NNCF not only is slower than POT but also slower than the original FP32. The benchmark tool results are as follows:

Original FP32:

[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version............. 2.1.2020.4.0-359-21e092122f4-releases/2020/4
[ INFO ] Device info
CPU
MKLDNNPlugin............ version 2.1
Build................... 2020.4.0-359-21e092122f4-releases/2020/4

[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 57.23 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 294.24 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input0' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input0' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 1842 iterations
Duration: 60044.23 ms
Latency: 32.15 ms
Throughput: 30.68 FPS

========================================================================

POT INT8:

[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 86.67 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 411.73 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'input0' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'input0' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 3245 iterations
Duration: 60032.85 ms
Latency: 18.25 ms
Throughput: 54.05 FPS

===========================================================================

NNCF INT8:

[Step 3/11] Setting device configuration
[Step 4/11] Reading the Intermediate Representation network
[ INFO ] Read network took 114.43 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 599.62 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'result.1' precision U8, dimensions (NCHW): 1 3 640 640
/opt/intel/openvino_2020.4.287/python/python3.6/openvino/tools/benchmark/utils/inputs_filling.py:71: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("No input files were given: all inputs will be filled with random values!")
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'result.1' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asyncronously, 1 inference requests using 1 streams for CPU, limits: 60000 ms duration)
[Step 11/11] Dumping statistics report
Count: 1291 iterations
Duration: 60082.78 ms
Latency: 46.00 ms
Throughput: 21.49 FPS

===========================================================================

These results were measured on an Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz.

We found a difference between the IR models produced by NNCF and POT: the FakeQuantize layer and the activation function appear in the opposite order, which leads to more parameters in the FakeQuantize layer in the NNCF model. The Netron visualizations are shown below:

POT INT8:

NNCF INT8:

Quantization with 1D Convolutions

Does NNCF support 1D convolutions?
I am trying to compress a CNN with 1D convolutions that serves as the encoder of an autoencoder (AE) model.
Thank you

compressed model

How can I get the compressed model and find its compression ratio, which is an important concern in deep compression?

Can I use it only to train an ssd512 model?

I used it to train an ssd512_vgg model, but training crashed with a NotImplementedError raised by compression_ctrl.compression_level(). I did not configure a compression algorithm in ssd512_vgg_voc.json. Is it possible to use it this way?
INFO:nncf:Creating compression algorithm: NoCompressionAlgorithmBuilder
WARNING:nncf:Graphviz is not installed - only the .dot model visualization format will be used. Install pygraphviz into your Python environment and graphviz system-wide to enable PNG rendering.
Training ssd_vgg on coco dataset...
/home/mechmind/projects/nncf_pytorch/examples/object_detection/utils/augmentations.py:257: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)
/home/mechmind/projects/nncf_pytorch/examples/object_detection/utils/augmentations.py:257: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)
/home/mechmind/projects/nncf_pytorch/examples/object_detection/utils/augmentations.py:257: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)
/home/mechmind/projects/nncf_pytorch/examples/object_detection/utils/augmentations.py:257: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)
0: iter 0 epoch 0 || Loss: 2.728 || Time 0.4711s || lr: 0.0001 || CR loss: 0
0: iter 10 epoch 0 || Loss: 2.714 || Time 2.046s || lr: 0.0001 || CR loss: 0
0: iter 20 epoch 0 || Loss: 3.65 || Time 1.951s || lr: 0.0001 || CR loss: 0
0: iter 30 epoch 0 || Loss: 3.013 || Time 1.964s || lr: 0.0001 || CR loss: 0
0: iter 40 epoch 0 || Loss: 2.639 || Time 1.952s || lr: 0.0001 || CR loss: 0
0: iter 50 epoch 0 || Loss: 2.53 || Time 1.956s || lr: 0.0001 || CR loss: 0
0: iter 60 epoch 0 || Loss: 2.034 || Time 1.957s || lr: 0.0001 || CR loss: 0
0: iter 70 epoch 0 || Loss: 1.776 || Time 1.953s || lr: 0.0001 || CR loss: 0
0: iter 80 epoch 0 || Loss: 1.496 || Time 1.965s || lr: 0.0001 || CR loss: 0
0: iter 90 epoch 0 || Loss: 1.95 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 100 epoch 0 || Loss: 1.523 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 110 epoch 0 || Loss: 2.282 || Time 1.975s || lr: 0.0001 || CR loss: 0
0: iter 120 epoch 0 || Loss: 1.326 || Time 1.969s || lr: 0.0001 || CR loss: 0
0: iter 130 epoch 0 || Loss: 1.398 || Time 1.967s || lr: 0.0001 || CR loss: 0
0: iter 140 epoch 0 || Loss: 1.422 || Time 1.981s || lr: 0.0001 || CR loss: 0
0: iter 150 epoch 0 || Loss: 1.011 || Time 1.975s || lr: 0.0001 || CR loss: 0
0: iter 160 epoch 0 || Loss: 1.024 || Time 1.976s || lr: 0.0001 || CR loss: 0
0: iter 170 epoch 0 || Loss: 1.283 || Time 1.974s || lr: 0.0001 || CR loss: 0
0: iter 180 epoch 0 || Loss: 1.035 || Time 1.977s || lr: 0.0001 || CR loss: 0
0: iter 190 epoch 0 || Loss: 0.9065 || Time 1.991s || lr: 0.0001 || CR loss: 0
0: iter 200 epoch 0 || Loss: 1.312 || Time 1.995s || lr: 0.0001 || CR loss: 0
0: iter 210 epoch 0 || Loss: 1.238 || Time 1.976s || lr: 0.0001 || CR loss: 0
Traceback (most recent call last):
File "main.py", line 378, in
main(sys.argv[1:])
File "main.py", line 81, in main
start_worker(main_worker, config)
File "/home/mechmind/projects/nncf_pytorch/examples/common/execution.py", line 99, in start_worker
main_worker(current_gpu=config.gpu_id, config=config)
File "main.py", line 188, in main_worker
train(net, compression_ctrl, train_data_loader, test_data_loader, criterion, optimizer, config, lr_scheduler)
File "main.py", line 301, in train
compression_level = compression_ctrl.compression_level()
File "/home/mechmind/projects/nncf_pytorch/nncf/compression_method_api.py", line 166, in compression_level
raise NotImplementedError()
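
The traceback suggests that the controller created when no algorithm is configured does not override compression_level(). A minimal sketch of a guard a training loop could use in that situation follows. Both classes are hypothetical stand-ins written for this example, not NNCF classes, and only the method name compression_level comes from the traceback:

```python
class BaseControllerStub:
    """Hypothetical stand-in for a controller base class (not NNCF code)."""

    def compression_level(self):
        raise NotImplementedError()


class NoCompressionControllerStub(BaseControllerStub):
    """Stand-in for the controller used when no algorithm is configured."""


def safe_compression_level(ctrl):
    """Return the compression level, or None if the controller has none."""
    try:
        return ctrl.compression_level()
    except NotImplementedError:
        return None


level = safe_compression_level(NoCompressionControllerStub())
```

With this guard the loop can skip level-dependent logic (for example, picking the best checkpoint per level) when the value is None.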

Reimplementation of Filter Pruning Method from LeGR paper

The idea is to have a more advanced Filter Pruning method to be able to show SOTA results in model compression/optimization.

I suggest reimplementing the method from here: https://github.com/cmu-enyac/LeGR and reproducing the baseline results for MobileNet v2 on CIFAR100 as the first step.

cc'ed @vshampor, @vanyalzr.

MMDetection fine tuning error

I am getting the following error on running the demo retinanet_r50_fpn_1x_int8.py example. Any suggestions of what could be causing it?
args_kwargs_tuple = data_loader.get_inputs(loaded_item)
File "/home/.conda/envs/nncf2/lib/python3.6/site-packages/nncf-1.4.1-py3.6.egg/nncf/initialization.py", line 56, in get_inputs
raise NotImplementedError
NotImplementedError

I used the master branch of nncf and mmdet commit id: c77ccbbf235c0eb50a4440698eefc2ae199f837f

OpenVINO test

Hi,

I took one of the detection models and successfully converted it with mo_onnx.py (provided by the OpenVINO toolkit) to generate the binary and the xml file. However, I have not found any documentation on how to run such models on OpenVINO. I have already run models from the TensorFlow object detection API, but I'm interested in running your quantized model on OpenVINO. So, if you could provide some sample scripts for this, that would be great. Thank you.

Is there a plan to support QAT with conv_bn folding?

Conv_bn folding is mentioned in https://arxiv.org/pdf/1806.08342.pdf 3.2.2 for getting better QAT accuracy.
It has been implemented in pytorch (https://github.com/pytorch/pytorch/blob/master/torch/nn/intrinsic/qat/modules/conv_fused.py#L82-L92)

Would it be implemented in nncf? Thanks. (I think it is important for QAT from scratch.)
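
For reference, the folding discussed in section 3.2.2 of the linked paper merges the batch-norm statistics into the preceding convolution's weights and bias. A minimal sketch in plain Python, for a single output channel and with the convolution at one spatial position written as a dot product; the function names are illustrative and are neither NNCF nor PyTorch API:

```python
import math


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the batch-norm parameters of one output channel into conv weights and bias."""
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = beta + (bias - mean) * scale
    return folded_weights, folded_bias


def conv_at_position(weights, bias, patch):
    """A convolution evaluated at one position is a dot product plus bias."""
    return sum(w * x for w, x in zip(weights, patch)) + bias


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm for one channel."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


weights, bias = [0.5, -1.0, 2.0], 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0
patch = [1.0, 2.0, 3.0]

# Conv followed by BN ...
reference = batch_norm(conv_at_position(weights, bias, patch), gamma, beta, mean, var)
# ... equals a single conv with folded parameters.
folded_weights, folded_bias = fold_bn(weights, bias, gamma, beta, mean, var)
folded = conv_at_position(folded_weights, folded_bias, patch)
```

In quantization-aware training the point of folding is that the weight quantizer sees the folded weights, which are the ones actually used at inference time.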

Check consistency of all affected quantizers on activation quantizers merge in propagation mode.

By default, the merge of activation quantizers should not happen if they are connected with weight quantizers that have different supported bit-width.

The merge can happen if the corresponding flag (e.g. allow_different_bitwidth_for_weight_and_activation or something shorter) is specified in the HW Config.

@vshampor @asenina @RikAllen

I wonder if this framework supports Windows?

If inscribe

NNCF skips insert positions for FakeQuantizer

Comparing the model graphs of ssd300 compressed by NNCF and by POT, I noticed that they are not the same, although one of the requirements for NNCF is to build a POT-like graph. Moreover, it seems that NNCF did not insert several FakeQuantizers where I expected them. There are two images below:

The first shows the start of the NNCF-compressed model graph. I suppose there should be one more FakeQuantizer (I underlined the location with a red pen).
The second shows one more place where the NNCF and POT graphs differ. The same location in both models is underlined in red. NNCF did not put any FakeQuantizer there, while POT did. The left image corresponds to POT and the right one to NNCF.

If you would like to have a look at the full model graphs, please contact me and I will share them here or privately.

Left image - POT; right image - NNCF

Training Performance Degradation

We observed that training became about 28% slower between two commits. Details follow.

For 30 epochs of ResNet50 fine-tuning, the elapsed time gap between the two commits is 6 hours.

python examples/classification/main.py \
    -m train \
    --config examples/classification/configs/quantization/resnet50_imagenet_mixed_int_manual.json \
    --data <imagenet_dataset_path> \
    --workers 16 \
    --log-dir ./resnet50_train_run

Environment A (commit: `a0c1c2b`): `43mins` per epoch

mkdir nncf-a0c1c2bf && cd $_
python3 -m venv env
source env/bin/activate
git clone https://github.com/openvinotoolkit/nncf_pytorch && cd nncf_pytorch
git checkout a0c1c2bf 
pip install -r requirements.txt

0:: Epoch: [0][8600/8657] Lr: 0.00031 Time: 0.289 (0.297) Data: 0.000 (0.002) CE_loss: 2.0819 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 2.0819 (2.2970) Acc@1: 54.054 (48.777) Acc@5: 67.568 (73.302)
0:: Epoch: [0][8610/8657] Lr: 0.00031 Time: 0.287 (0.297) Data: 0.000 (0.002) CE_loss: 2.5287 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 2.5287 (2.2970) Acc@1: 54.054 (48.778) Acc@5: 70.270 (73.305)
0:: Epoch: [0][8620/8657] Lr: 0.00031 Time: 0.324 (0.297) Data: 0.001 (0.002) CE_loss: 2.4854 (2.2966) CR_loss: 0.0000 (0.0000) Loss: 2.4854 (2.2966) Acc@1: 45.946 (48.784) Acc@5: 75.676 (73.311)
0:: Epoch: [0][8630/8657] Lr: 0.00031 Time: 0.288 (0.297) Data: 0.000 (0.002) CE_loss: 2.7068 (2.2965) CR_loss: 0.0000 (0.0000) Loss: 2.7068 (2.2965) Acc@1: 37.838 (48.788) Acc@5: 62.162 (73.311)
0:: Epoch: [0][8640/8657] Lr: 0.00031 Time: 0.301 (0.297) Data: 0.001 (0.002) CE_loss: 2.3907 (2.2962) CR_loss: 0.0000 (0.0000) Loss: 2.3907 (2.2962) Acc@1: 45.946 (48.794) Acc@5: 64.865 (73.316)
0:: Epoch: [0][8650/8657] Lr: 0.00031 Time: 0.281 (0.297) Data: 0.000 (0.002) CE_loss: 2.2093 (2.2957) CR_loss: 0.0000 (0.0000) Loss: 2.2093 (2.2957) Acc@1: 51.351 (48.805) Acc@5: 72.973 (73.324)

Environment B (commit: `a27da4f`): `55mins` per epoch

mkdir nncf-a27da4fb && cd $_
python3 -m venv env
source env/bin/activate
git clone https://github.com/openvinotoolkit/nncf_pytorch && cd nncf_pytorch
git checkout a27da4fb 
pip install -r requirements.txt

0:: Epoch: [0][8600/8657] Lr: 0.00031 Time: 0.367 (0.382) Data: 0.072 (0.080) CE_loss: 2.0623 (2.2975) CR_loss: 0.0000 (0.0000) Loss: 2.0623 (2.2975) Acc@1: 56.757 (48.684) Acc@5: 75.676 (73.327)
0:: Epoch: [0][8610/8657] Lr: 0.00031 Time: 0.449 (0.382) Data: 0.156 (0.080) CE_loss: 2.1209 (2.2974) CR_loss: 0.0000 (0.0000) Loss: 2.1209 (2.2974) Acc@1: 43.243 (48.684) Acc@5: 78.378 (73.327)
0:: Epoch: [0][8620/8657] Lr: 0.00031 Time: 0.367 (0.382) Data: 0.073 (0.080) CE_loss: 1.9419 (2.2970) CR_loss: 0.0000 (0.0000) Loss: 1.9419 (2.2970) Acc@1: 59.459 (48.691) Acc@5: 81.081 (73.334)
0:: Epoch: [0][8630/8657] Lr: 0.00031 Time: 0.368 (0.382) Data: 0.073 (0.080) CE_loss: 2.2480 (2.2967) CR_loss: 0.0000 (0.0000) Loss: 2.2480 (2.2967) Acc@1: 45.946 (48.696) Acc@5: 72.973 (73.338)
0:: Epoch: [0][8640/8657] Lr: 0.00031 Time: 0.386 (0.382) Data: 0.085 (0.080) CE_loss: 2.2206 (2.2964) CR_loss: 0.0000 (0.0000) Loss: 2.2206 (2.2964) Acc@1: 56.757 (48.701) Acc@5: 72.973 (73.344)
0:: Epoch: [0][8650/8657] Lr: 0.00031 Time: 0.346 (0.382) Data: 0.068 (0.080) CE_loss: 1.9694 (2.2959) CR_loss: 0.0000 (0.0000) Loss: 1.9694 (2.2959) Acc@1: 48.649 (48.710) Acc@5: 78.378 (73.352)

Common Setup for Both Environments

Hardware: Xeon-Gold, 4xV100
python: 3.7.6
torch: 1.6.0
cuda: 10.2
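
The reported figures can be cross-checked against the logs. The sketch below assumes that the value in parentheses after Time and Data is the running average in seconds per iteration, and that an epoch has 8657 iterations as the log counter suggests. Under those assumptions it recovers the per-epoch times, the roughly 28% slowdown and the 6-hour gap, and shows how much of the per-iteration gap coincides with the increase in data-loading time:

```python
ITERATIONS_PER_EPOCH = 8657
EPOCHS = 30

# Running averages taken from the last logged line of each environment.
runs = {
    "a0c1c2b": {"batch_time": 0.297, "data_time": 0.002},
    "a27da4f": {"batch_time": 0.382, "data_time": 0.080},
}


def epoch_minutes(batch_time):
    """Estimated wall-clock minutes per epoch from mean seconds per iteration."""
    return batch_time * ITERATIONS_PER_EPOCH / 60.0


minutes_a = epoch_minutes(runs["a0c1c2b"]["batch_time"])
minutes_b = epoch_minutes(runs["a27da4f"]["batch_time"])
slowdown = minutes_b / minutes_a - 1.0
extra_hours = (minutes_b - minutes_a) * EPOCHS / 60.0

batch_gap = runs["a27da4f"]["batch_time"] - runs["a0c1c2b"]["batch_time"]
data_gap = runs["a27da4f"]["data_time"] - runs["a0c1c2b"]["data_time"]
data_share = data_gap / batch_gap

print(f"epoch A: {minutes_a:.1f} min, epoch B: {minutes_b:.1f} min")
print(f"slowdown: {slowdown:.1%}, extra time over {EPOCHS} epochs: {extra_hours:.1f} h")
print(f"share of the per-iteration gap matching the data-time increase: {data_share:.0%}")
```

If the interpretation of the Data column is right, most of the gap appears in data loading rather than in the forward or backward pass, which may help narrow down the responsible change.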

Does NNCF support post-training quantization and Jupyter notebooks?

Hello,
Thanks for this great project. I have two questions:

Does NNCF support post-training quantization? I have not found any related information or example yet.
It seems NNCF does not work in a Jupyter notebook? Importing the modules in a notebook raises an error.

Thanks.

KeyError: 'quantization_range_init_args'

This error occurs when I quantize an FP32 pretrained model. Is this a bug?
Traceback (most recent call last):
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 961, in init_range
range_init_args = self.quantization_config.get_extra_struct(QuantizationRangeInitArgs)
File "/home/mechmind/projects/nncf_pytorch/nncf/config.py", line 56, in get_extra_struct
return self.__nncf_extra_structs[struct_cls.get_id()]
KeyError: 'quantization_range_init_args'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 381, in
main(sys.argv[1:])
File "main.py", line 81, in main
start_worker(main_worker, config)
File "/home/mechmind/projects/nncf_pytorch/examples/common/execution.py", line 99, in start_worker
main_worker(current_gpu=config.gpu_id, config=config)
File "main.py", line 152, in main_worker
compression_ctrl, net = create_model(config, resuming_model_state_dict)
File "main.py", line 239, in create_model
compression_ctrl, compressed_model = create_compressed_model(ssd_net, config.nncf_config, resuming_model_sd)
File "/home/mechmind/projects/nncf_pytorch/nncf/model_creation.py", line 126, in create_compressed_model
compression_ctrl = compressed_model.commit_compression_changes()
File "/home/mechmind/projects/nncf_pytorch/nncf/nncf_network.py", line 416, in commit_compression_changes
return self._builders[0].build_controller(self)
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 200, in build_controller
self._hw_precision_constraints)
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 816, in init
self.initialize_quantizer_params()
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 893, in initialize_quantizer_params
self.init_range()
File "/home/mechmind/projects/nncf_pytorch/nncf/quantization/algo.py", line 964, in init_range
'Should run range initialization as specified via config,'
ValueError: Should run range initialization as specified via config,but the initializing data loader is not provided as an extra struct. Refer to NNCFConfig.register_extra_structs and the QuantizationRangeInitArgs class
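
The two tracebacks describe a lookup of an "extra struct" by id that fails because nothing was registered under 'quantization_range_init_args'. The mechanism can be sketched with self-contained stand-ins. The two classes below are hypothetical and only mimic the names that appear in the error message (register_extra_structs, get_extra_struct, QuantizationRangeInitArgs); the real NNCF classes and their constructor arguments may differ:

```python
class QuantizationRangeInitArgsStub:
    """Hypothetical stand-in for the QuantizationRangeInitArgs named in the error."""

    def __init__(self, data_loader):
        self.data_loader = data_loader

    @classmethod
    def get_id(cls):
        return "quantization_range_init_args"


class ConfigStub(dict):
    """Hypothetical stand-in for a config that stores extra structs by id."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._extra_structs = {}

    def register_extra_structs(self, structs):
        for struct in structs:
            self._extra_structs[struct.get_id()] = struct

    def get_extra_struct(self, struct_cls):
        # Raises KeyError when nothing was registered, as in the first traceback.
        return self._extra_structs[struct_cls.get_id()]


config = ConfigStub({"compression": {"algorithm": "quantization"}})

try:
    config.get_extra_struct(QuantizationRangeInitArgsStub)
    missing_before = False
except KeyError:
    missing_before = True

init_loader = [[0.0, 1.0], [2.0, 3.0]]  # placeholder for a real data loader
config.register_extra_structs([QuantizationRangeInitArgsStub(init_loader)])
registered = config.get_extra_struct(QuantizationRangeInitArgsStub)
```

In other words, the error message points to a missing registration step before the compressed model is created, not to a failure inside range initialization itself.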

FileNotFoundError: [Errno 2] No such file or directory: '/home/sroot/work/nncf_pytorch-master/nncf/install_type'

Percentile-based initialization fails in per-channel quantization case

File "/nncf/initialization.py", line 170, in _apply_initializers
initializer.apply_init()
File "/nncf/quantization/init_range.py", line 223, in apply_init
self.quantize_module.apply_minmax_init(mins_tensor, maxs_tensor, self.log_module_name)
File "/nncf/quantization/layers.py", line 293, in apply_minmax_init
self.scale.masked_scatter_(torch.gt(abs_max, SCALE_LOWER_THRESHOLD), abs_max)
RuntimeError: invalid argument 2: source nElements must be == mask1 elements at /pytorch/aten/src/THC/generic/THCTensorMasked.cu:134

Should cover this case in pre-commit tests.

Is there any way to support two inputs?

Saving and Loading compressed model in pytorch as pytorch model object

I am facing an issue: when I call torch.save(model, model_path), it throws "TypeError: can't pickle odict_values objects". For my project I want to save the compressed model as a torch model object and load it later to run prediction on new images. If anyone can help me out here, it would be really great.

compression_loss

compression_loss is always zero in my training process. Has anyone else seen the same problem?

Performance gap in mmdetection

I'm sorry. I am back.
I tried retinanet_r50_fpn_1x_int8.py in mmdetection, and nothing went wrong in training or evaluation.
But I only got the result below on coco2017 val, while the docs show that retinanet can reach 34.7 or 35.3 average box mAP on the coco_2017_val dataset.
My results are as follows:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.260
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.420
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.272
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.143
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.292
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.336
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.258
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.425
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.453
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.260
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600

I didn't change anything in the config file except samples_per_gpu, which I reduced from 6 to 4 because my GPU can't allocate enough memory. My CUDA version is 10.2 and PyTorch is 1.6.0.
If you need any other information, please contact me.

Incompatibility with python 3.8

I noticed incompatibility of NNCF with python 3.8.
The problem occurs during installation of one of the dependencies of NNCF and it seems to be caused by the fact that platform.linux_distribution was removed in Python 3.8:

  Downloading matplotlib-3.0.3.tar.gz (36.6 MB)
    ERROR: Command errored out with exit status 1:
     command: /opt/home/k8sworker/cibuilds/impt/nncf_for_digits-9/src/model_templates/.venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-0zqb86kn/matplotlib/setup.py'"'"'; __file__='"'"'/tmp/pip-install-0zqb86kn/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-0zqb86kn/matplotlib/pip-egg-info
        cwd: /tmp/pip-install-0zqb86kn/matplotlib/
    Complete output (51 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-0zqb86kn/matplotlib/setup.py", line 225, in <module>
       msg = pkg.install_help_msg()
      File "/tmp/pip-install-0zqb86kn/matplotlib/setupext.py", line 650, in install_help_msg
       release = platform.linux_distribution()[0].lower()
    AttributeError: module 'platform' has no attribute 'linux_distribution'
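
According to the traceback, the failing call sits in the setupext.py of matplotlib 3.0.3, so it happens while pip builds that dependency and not in NNCF code. For illustration only, here is a minimal sketch of how code can obtain the distribution name on Python versions with and without platform.linux_distribution. The helper name and the /etc/os-release fallback are choices made for this example, not the fix adopted by matplotlib:

```python
import platform


def linux_distribution_name():
    """Return a lowercase distribution name, or "" if it cannot be determined."""
    legacy = getattr(platform, "linux_distribution", None)
    if legacy is not None:
        # Only available before Python 3.8.
        return legacy()[0].lower()
    try:
        # Fallback: read the ID field of /etc/os-release where it exists.
        with open("/etc/os-release", encoding="utf-8") as handle:
            for line in handle:
                key, _, value = line.strip().partition("=")
                if key == "ID":
                    return value.strip('"').lower()
    except OSError:
        pass
    return ""


name = linux_distribution_name()
```

The getattr check avoids the AttributeError shown above, and the function degrades to an empty string on systems without /etc/os-release.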

Revise quantization levels for weights

The original problem is that the QuantizeLinear and DequantizeLinear operations from ONNX do not support a shrunk range of quantization levels, so we can correctly export only 256 levels for weights and activations. On the other hand, 255 levels for weights were introduced to work around the saturation issue on AVX targets. However, we do not know how much it really affects accuracy or helps.

My proposal is to have the full range of 2^bits levels but use a different workaround for saturation, one that we know really works. It uses 128 levels out of 256 in the following way:

y = w*a = [sw * wq] * [sa * aq] * 1/sw * 1/sa = [sw * wq / 2] * [sa * aq] * 2/sw * 1/sa

It means that we divide the weights by a factor of 2.0 and adjust the output scales of the Dequantize operation (output_high and output_low in FQ) by multiplying them by 2.0.

This is relevant for INT8 only!
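
A small numeric sketch of why halving the weights helps. It assumes the saturation in question is that of a signed 16-bit intermediate holding the sum of two adjacent u8 × s8 products, which is an interpretation and is not stated in this issue. The helper names are illustrative, and even weight values are used so that halving is exact:

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturating_pair_sum(a0, w0, a1, w1):
    """Sum two u8*s8 products and clamp the result to the int16 range."""
    exact = a0 * w0 + a1 * w1
    return max(INT16_MIN, min(INT16_MAX, exact))


def dot_with_pair_saturation(activations, weights):
    """Dot product in which every pair of products goes through an int16 sum."""
    total = 0
    for i in range(0, len(activations), 2):
        total += saturating_pair_sum(
            activations[i], weights[i], activations[i + 1], weights[i + 1]
        )
    return total


activations = [255, 255, 10, 3]  # unsigned 8-bit values
weights = [126, 126, -8, 4]      # signed 8-bit values, even so halving is exact

exact = sum(a * w for a, w in zip(activations, weights))
saturated = dot_with_pair_saturation(activations, weights)

# Proposed workaround: halve the weights, then multiply the output scale by 2.
halved = [w // 2 for w in weights]
compensated = 2 * dot_with_pair_saturation(activations, halved)

print(exact, saturated, compensated)
```

With full-range weights the first pair overflows int16 and the result is wrong. With halved weights the largest pair sum is 2 × 255 × 63 = 32130, which fits, and doubling the output recovers the exact value. For odd weight values, halving would additionally lose one bit of precision.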

We need to plan this for the next release. cc'ed @alexsu52, @kchechil

Add ConvTranspose2d to the scope of prunable ops in filter pruning

How to cite your work?

Hi,
do you have a publication or arxiv for NNCF? How should I cite your work properly?

"Class conv_transpose2d is not found" when exporting a pruning-optimized model

I tried pruning optimization (pruning only) for my detection model.
I got following error when calling compression_ctrl.export_model()

  File "nncf_pytorch/nncf/compression_method_api.py", line 213, in export_model
    self.prepare_for_export()
  File "nncf_pytorch/nncf/pruning/filter_pruning/algo.py", line 204, in prepare_for_export
    model_pruner.prune_model()
  File "nncf_pytorch/nncf/pruning/export_helpers.py", line 392, in prune_model
    self.mask_propagation()
  File "nncf_pytorch/nncf/pruning/export_helpers.py", line 315, in mask_propagation
    cls = self.get_class_by_type_name(node_type)()
  File "nncf_pytorch/nncf/pruning/export_helpers.py", line 303, in get_class_by_type_name
    raise RuntimeError("Class {} is not found".format(type_name))
RuntimeError: Class conv_transpose2d is not found

Is it a bug, or is torch.nn.ConvTranspose2d not supported?

Pruning itself seems to be working, judging from the training log: the Mask zero %, PR, and Filter PR columns printed by the print_statistics function are above 0.

Merge activation quantizers after HAWQ init in propagation mode.

Currently, activation quantizers are always merged when the bit-widths of all affected quantizers are consistent.
For example, as in the diagram below.

But HAWQ may choose a more accurate configuration in which the merge is not possible.

Before implementing this feature, some research of possible performance gain for both schemes is required (consider overhead for re-quantizations and compare which configuration is faster)

@asenina @vshampor @AlexKoff88

How to get mmdetection ssd300_coco_int8 quantized model?

I followed the branch and was running the ssd300_coco_int8 quantization aware training. I wanted to know how I can get the int8 models. I ran
python tools/train.py configs/nncf_compression/ssd/ssd300_coco_int8.py
and it creates an output folder containing .pth files. But when I load these, they contain weights of type torch.cuda.FloatTensor, which is 32-bit floating point. Please tell me how I can get the int8 (torch.int8) model weights.
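
FP32 tensors in the checkpoint are consistent with how quantization-aware training is commonly implemented: weights stay in floating point and are snapped to an integer grid on the fly ("fake quantization"), while the integer values are only materialized by the inference runtime. A minimal sketch of symmetric per-tensor 8-bit quantization in plain Python follows; the function names and the choice of scale are illustrative assumptions, not NNCF's implementation:

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integer levels and return (levels, scale)."""
    level_high = 2 ** (num_bits - 1) - 1   # 127 for 8 bits
    level_low = -(2 ** (num_bits - 1))     # -128 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / level_high if max_abs > 0 else 1.0
    levels = [max(level_low, min(level_high, round(v / scale))) for v in values]
    return levels, scale


def dequantize(levels, scale):
    """Map integer levels back to floats on the quantization grid."""
    return [q * scale for q in levels]


weights = [0.731, -0.052, 0.0, -1.27, 0.4]
int_levels, scale = quantize_symmetric(weights)
fake_quantized = dequantize(int_levels, scale)  # still floats, but on an int8 grid

print(int_levels)
print(fake_quantized)
```

The list int_levels is what an int8 runtime would store, and fake_quantized is what the training graph sees: the same information, expressed in floating point.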

Long graph processing for quantization DENSENET161

Creating a compressed model with the quantization algorithm takes a very long time for DENSENET161 (it looks to me like a loop while processing the graph).

Steps to reproduce:
0. Create python3.6 env
1. Install nncf (use instructions from README)
2. Run in terminal: python examples/classification/main.py --config examples/classification/configs/quantization/densenet161_imagenet_custom_quant_pattern.json --data <path_to_dataset>

Quantize Pointrend

I compressed a Pointrend model based on mmdet-2.2.1. The training looks normal, but an error occurs when I convert it to ONNX using the functions of pytorch2onnx.py in mmdet-2.3.1 (commit id: 6495391). It seems that _bbox_forward() and _mask_forward() have some problems. Do you know how to fix it? The error info is as follows:

File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/base.py", line 180, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/base.py", line 138, in forward_test
return self.forward_dummy(imgs[0])
File "/home/mechmind/projects/mech_learning/mmdet/models/detectors/two_stage.py", line 101, in forward_dummy
roi_outs = self.roi_head.forward_dummy(x, proposals)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/standard_roi_head.py", line 60, in forward_dummy
bbox_results = self._bbox_forward(x, rois)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/standard_roi_head.py", line 139, in _bbox_forward
x[:self.bbox_roi_extractor.num_inputs], rois)
File "/home/mechmind/projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/mechmind/miniconda3/envs/nncf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in call_impl
result = self.forward(*input, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/core/fp16/decorators.py", line 131, in new_func
return old_func(*args, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py", line 73, in forward
roi_feats_t = self.roi_layers[i](feats[i], rois)
File "/home/mechmind/projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/mechmind/miniconda3/envs/nncf/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/roi_align.py", line 144, in forward
self.sample_num, self.aligned)
File "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/roi_align.py", line 30, in forward
aligned)
RuntimeError: roi_width >= 0 && roi_height >= 0 INTERNAL ASSERT FAILED at "/home/mechmind/projects/mech_learning/mmdet/ops/roi_align/src/cpu/roi_align_v2.cpp":134, please report a bug to PyTorch. ROIs in ROIAlign cannot have non-negative size!

Locked when training

It seems that the program creates the directory "/tmp/torch_extensions", and that this directory stays locked on the second run if the first run failed.
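
A minimal sketch for locating and clearing leftover lock files after a failed build. It assumes that the lock is a plain file named "lock" inside a per-extension subdirectory of the extensions directory; this naming is an assumption and is not confirmed in this issue. The helper names are hypothetical:

```python
import os


def find_stale_locks(extensions_root):
    """Return paths of files named "lock" anywhere under extensions_root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(extensions_root):
        if "lock" in filenames:
            found.append(os.path.join(dirpath, "lock"))
    return sorted(found)


def remove_stale_locks(extensions_root):
    """Delete the lock files found; call only when no other build is running."""
    removed = find_stale_locks(extensions_root)
    for path in removed:
        os.remove(path)
    return removed
```

For example, remove_stale_locks("/tmp/torch_extensions") would clear the directory mentioned above. Removing a lock while another process is still compiling can corrupt that build, so this is only appropriate after a crashed run.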

Two questions about using retinanet_r50_fpn_1x_int8 in mmdetection

When I try to train the retinanet_r50_fpn_1x_int8 demo in mmdetection, the training process has no problem, but evaluation fails with the following error:

File "/home/amax/projects/mech_learning/tools/train.py", line 216, in main
meta=meta)
File "/home/amax/projects/mech_learning/mmdet/apis/train.py", line 149, in train_detector
compression_ctrl=compression_ctrl)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 46, in train
self.call_hook('after_train_epoch')
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 282, in call_hook
getattr(hook, fn_name)(self)
File "/home/amax/projects/mech_learning/mmdet/core/evaluation/eval_hooks.py", line 27, in after_train_epoch
results = single_gpu_test(runner.model, self.dataloader, show=False)
File "/home/amax/projects/mech_learning/mmdet/apis/test.py", line 36, in single_gpu_test
result = model(return_loss=False, rescale=True, **data)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 81, in wrapped
return module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 81, in wrapped
return module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/debug.py", line 82, in decorated
retval = forward_func(self, *args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/nncf_network.py", line 366, in forward
retval = self.get_nncf_wrapped_model()(*args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 83, in wrapped
retval = module_call(self, *args, **kwargs)
File "/home/amax/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/core/fp16/decorators.py", line 51, in new_func
return old_func(*args, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/base.py", line 180, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/base.py", line 156, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/detectors/single_stage.py", line 111, in simple_test
*outs, img_metas, rescale=rescale)
File "/home/amax/projects/mech_learning/mmdet/core/fp16/decorators.py", line 131, in new_func
return old_func(*args, **kwargs)
File "/home/amax/projects/mech_learning/mmdet/models/dense_heads/anchor_head.py", line 569, in get_bboxes
scale_factor, cfg, rescale)
File "/home/amax/projects/mech_learning/mmdet/models/dense_heads/anchor_head.py", line 647, in _get_bboxes_single
cfg.max_per_img)
File "/home/amax/projects/mech_learning/mmdet/core/post_processing/bbox_nms.py", line 40, in multiclass_nms
bboxes = bboxes[valid_mask]
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/wrappers.py", line 41, in wrapped
result = operator_info.custom_trace_fn(operator, *args, **kwargs)
File "/home/amax/git_projects/nncf_pytorch/nncf/dynamic_graph/patch_pytorch.py", line 71, in call
"input and output tensor count mismatch!".format(operator.name))
RuntimeError: Unable to forward trace through operator getitem - input and output tensor count mismatch!

Should I set --no-validate during training?

Second question: after training one epoch I got a checkpoint file and used the same config file for evaluation. When loading the model I got this error:
unexpected key in source state_dict: nncf_module.backbone.conv1.weight, nncf_module.backbone.conv1.pre_ops.0.op._num_bits, ...
missing keys in source state_dict: backbone.conv1.weight, backbone.bn1.weight, backbone.bn1.bias, backbone.bn1.running_mean, ...
So all the keys in the model are mismatched and the result is empty:

[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 14.3 task/s, elapsed: 350s, ETA: 0s
Evaluating bbox...
Loading and preparing results...
The testing results of the whole dataset is empty.

Is there something wrong in how I am doing this?
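
The key names in the error suggest that the checkpoint was saved from a wrapped model (keys prefixed with nncf_module. plus quantizer parameters under pre_ops) and then loaded into an unwrapped one. A minimal sketch of a diagnostic that makes this visible from two key lists; the helper name is hypothetical and the sample keys are the ones quoted in the message above:

```python
def diagnose_state_dict(checkpoint_keys, model_keys, prefix="nncf_module."):
    """Compare checkpoint keys with model keys and test a prefix hypothesis."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    stripped = {k[len(prefix):] for k in checkpoint_keys if k.startswith(prefix)}
    return {
        "unexpected": sorted(checkpoint_keys - model_keys),
        "missing": sorted(model_keys - checkpoint_keys),
        # Keys that would line up if the prefix were removed.
        "matched_after_stripping_prefix": sorted(stripped & model_keys),
        # Keys that exist only because of compression (e.g. quantizer state).
        "compression_only": sorted(stripped - model_keys),
    }


checkpoint_keys = [
    "nncf_module.backbone.conv1.weight",
    "nncf_module.backbone.conv1.pre_ops.0.op._num_bits",
]
model_keys = ["backbone.conv1.weight", "backbone.bn1.weight"]

report = diagnose_state_dict(checkpoint_keys, model_keys)
for name, keys in report.items():
    print(name, keys)
```

A non-empty compression_only list indicates that simply renaming keys would drop the quantizer state, so the checkpoint most likely needs to be loaded into a model that has been wrapped with the same compression config.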

Revise mixed-precision related content

I have 3 comments/proposals:

Please add compression_ratio to the template file inside the Quantization readme.
Create a separate folder for the sample configs on the same level as quantization, pruning, etc. Let's call it mixed_precision.
Move all the hawq-related configs into this folder and minimize the set of parameters in these configs, removing as many as possible so that they take their default values.

Variability in SSD Mixed-Precision Performance

We observed large variability in performance with SSD300 (VGG) when we tried different combinations of weight and activation precision in test mode. In the numbers collected below, the gap between the best and worst cases is about 15X: in the worst case, Int2 weights with Int8 activations, inference takes about 30 seconds per batch (size: 128). This should impact performance in fine-tuning mode as well.

NNCF Version: Develop branch with commit 2a681b8
Similar observations with v1.4
Baseline config: https://github.com/openvinotoolkit/nncf_pytorch/blob/develop/examples/object_detection/configs/ssd300_vgg_voc_int8.json
Platform: V100 GPU

weights (bits)	activations (bits)	detection time, batch 8/39 (s)	batch 9/39 (s)	batch 10/39 (s)
8	8	1.847	1.844	1.876
8	4	8.792	9.021	8.928
8	2	17.09	17.80	17.87
4	8	2.283	2.105	2.296
4	4	8.285	9.583	7.425
4	2	11.31	11.85	12.61
2	8	29.40	30.75	29.30
2	4	5.684	5.703	5.539
2	2	5.954	6.040	6.159
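
The size of the gap can be recomputed from the table. The sketch below averages the three logged batches for each (weights, activations) bit-width pair and compares the slowest configuration with the fastest; the numbers are copied from the table above and nothing else is assumed:

```python
# (weight bits, activation bits) -> detection time in seconds for batches 8, 9, 10
timings = {
    (8, 8): [1.847, 1.844, 1.876],
    (8, 4): [8.792, 9.021, 8.928],
    (8, 2): [17.09, 17.80, 17.87],
    (4, 8): [2.283, 2.105, 2.296],
    (4, 4): [8.285, 9.583, 7.425],
    (4, 2): [11.31, 11.85, 12.61],
    (2, 8): [29.40, 30.75, 29.30],
    (2, 4): [5.684, 5.703, 5.539],
    (2, 2): [5.954, 6.040, 6.159],
}

means = {bits: sum(values) / len(values) for bits, values in timings.items()}
fastest = min(means, key=means.get)
slowest = max(means, key=means.get)
gap = means[slowest] / means[fastest]

for bits, mean in sorted(means.items(), key=lambda item: item[1]):
    print(f"W{bits[0]}/A{bits[1]}: {mean:6.2f} s  ({mean / means[fastest]:4.1f}x)")
print(f"slowest/fastest: {gap:.1f}x")
```

Sorting by mean time also makes it clear that the slowdown is not monotonic in bit-width: for example, W2/A4 and W2/A2 are several times faster than W2/A8.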
