GithubHelp home page GithubHelp logo

hshen14 / cnn-quantization Goto Github PK

View Code? Open in Web Editor NEW

This project forked from submission2019/cnn-quantization

0.0 0.0 0.0 339 KB

Python 24.86% Shell 0.06% Cuda 0.27% C++ 0.06% Jupyter Notebook 74.74%

cnn-quantization's Introduction

cnn-quantization

Dependencies

pip install torch torchvision bokeh pandas sklearn

Data

  • To run this code you need validation set from ILSVRC2012 data
  • Configure your dataset path by providing --data "PATH_TO_ILSVRC" or copy ILSVRC dir to ~/datasets/ILSVRC2012.
  • To get the ILSVRC2012 data, you should register on their site for access: http://www.image-net.org/

Building cuda kernels for GEMMLOWP

To improve performance GEMMLOWP quantization was implemented in cuda and requires to compile kernels.

  • Create virtual environment for python3 and activate:
virtualenv --system-site-packages -p python3 venv3
. ./venv3/bin/activate
  • build kernels
cd kernels
./build_all.sh

Prepare setup for Inference

We use a profiling data-set of approximately 256 images sampled from the training-set to collect statistics of activations at offline time. These are needed later to calculate the scale/clipping value of each activation tenor.

Collect statistics

# Collect per layer statistics
python inference/inference_sim.py -a resnet50 -b 256 -sm collect -ac --qtype int4
# Collect per output channel statistics
python inference/inference_sim.py -a resnet50 -b 256 -sm collect -ac --qtype int4 -pcq_a

Statistics will be saved under ~/mxt_sim/statistics folder.

Generate quantized model

python pytorch_quantizer/quantization/kmeans_quantization.py -a resnet50 -bits 4 -t quantize

Quantized model will be saved to ~/mxt_sim/models

Run inference experiments

Post-training quantization of Res50 to 8-bit weights and 4-bit activations using the suggested quantization pipeline:

python inference/inference_sim.py -a resnet50 -b 512 -sm use --qtype int4 -pcq_w -pcq_a -c laplace

* Prec@1 74.114 Prec@5 91.904

Post-training quantization of Res50 to 4-bit weights and 8-bit activations using kmeans clustering:

python inference/inference_sim.py -a resnet50 -b 512 -sm use --qtype int8 -qm 4 -qw f32

* Prec@1 74.242 Prec@5 91.764

Post-training quantization of Res50 to 4-bit weights and 4-bit activations using the suggested quantization pipeline:

python inference/inference_sim.py -a resnet50 -b 512 -sm use --qtype int4 -pcq_w -pcq_a -c laplace -qm 4 -qw f32

* Prec@1 72.182 Prec@5 90.634

  • Note that results could vary due to randomization of statistic gathering and kmeans initialization.

Solution for optimal clipping

To find optimal clipping values for the Laplace/Gaussian case, we numerically solve Equations (12)/ (A4)

optimal_alpha.ipynb

Gaussian case, linear dependency Gaussian case

Quantization with optimal clipping

In order to quantize tensor to M bit with optimal clipping we use GEMMLOWP quantization with small modification. We replace dynamic range in scale computation by 2*alpha where alpha is optimal clipping value.

Quantization code can be found here: int_quantizer.py

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.