PTQ4SAM: Post-Training Quantization for Segment Anything (CVPR 2024)

Chengtao Lv*, Hong Chen*, Jinyang Guo📧, Yifu Ding, Xianglong Liu

(* denotes equal contribution, 📧 denotes corresponding author.)

Overview

The Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, its immense memory and computation costs hinder practical deployment. In this paper, we propose PTQ4SAM, a post-training quantization (PTQ) framework for SAM. First, we investigate an inherent bottleneck of SAM quantization: the bimodal distribution of post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives and propose a Bimodal Integration strategy, which uses a mathematically equivalent sign operation to transform the bimodal distribution offline into a normal distribution that is relatively easy to quantize. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), which leads to substantial variation in the post-Softmax distributions. We therefore introduce Adaptive Granularity Quantization for Softmax, which searches for the optimal power-of-two base and is hardware-friendly.
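The sign operation behind Bimodal Integration can be illustrated with a small sketch. This is not the code from this repo: it assumes one sign per key channel (negative where the channel mean is negative) that is folded into both the query and key projections, and all weights, biases, and helper names (linear, logits, fold) are hypothetical. It is written in plain Python so it runs without extra dependencies.

```python
import random

random.seed(0)
tokens, dim = 16, 4

def linear(x, w, b):
    """Apply y = x @ w.T + b to a list of row vectors."""
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) + b_c
             for w_row, b_c in zip(w, b)] for row in x]

def logits(q, k):
    """Attention logits q @ k.T."""
    return [[sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
            for q_row in q]

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical projections (not taken from SAM). The +/-6 key biases put the
# key channels into two groups, so the whole tensor looks bimodal.
w_q, b_q = rand_matrix(dim, dim), [random.gauss(0.0, 1.0) for _ in range(dim)]
w_k, b_k = rand_matrix(dim, dim), [6.0 if c % 2 == 0 else -6.0 for c in range(dim)]
x = rand_matrix(tokens, dim)

q, k = linear(x, w_q, b_q), linear(x, w_k, b_k)

# One sign per key channel: -1 where the channel mean is negative, else +1.
gamma = [-1.0 if sum(row[c] for row in k) < 0 else 1.0 for c in range(dim)]

def fold(w, b):
    """Fold the per-channel signs into a projection's weights and bias."""
    return ([[g * v for v in w_row] for g, w_row in zip(gamma, w)],
            [g * v for g, v in zip(gamma, b)])

q2, k2 = linear(x, *fold(w_q, b_q)), linear(x, *fold(w_k, b_k))

# gamma * gamma == 1 for every channel, so the attention logits are unchanged.
max_diff = max(abs(a - b)
               for ra, rb in zip(logits(q, k), logits(q2, k2))
               for a, b in zip(ra, rb))
# After the flip, no key channel has a negative mean: one mode remains.
all_means_non_negative = all(sum(row[c] for row in k2) >= 0 for c in range(dim))
```

Because the signs are folded into the weights and biases once, the flip adds no work at inference time, which is what "offline" refers to in the paragraph above.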

Create Environment

🍺🍺🍺 You can refer to environment.sh in the root directory or install step by step.

Install PyTorch

conda create -n ptq4sam python=3.7 -y
conda activate ptq4sam
pip install torch torchvision

Install MMCV

pip install -U openmim
mim install "mmcv-full<2.0.0"

Install other requirements

pip install -r requirements.txt

Compile CUDA operators

cd projects/instance_segment_anything/ops
python setup.py build install
cd ../../..

Install mmdet

cd mmdetection/
python3 setup.py build develop
cd ..

Prepare Dataset and Models

Download the official COCO dataset, place the files in the corresponding folders under datasets/, and organize them as follows:

├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
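Before launching a long run, it can help to confirm that the folders are in place. The following is a minimal sketch, not part of this repo: the helper name missing_coco_dirs is hypothetical, and the root folder is assumed to be data as in the tree above.

```python
from pathlib import Path

# Sub-folders listed in the layout above, relative to <root>/coco.
EXPECTED_COCO_DIRS = ("annotations", "train2017", "val2017", "test2017")

def missing_coco_dirs(root):
    """Return the expected COCO sub-folders that are missing under root/coco."""
    coco = Path(root) / "coco"
    return [name for name in EXPECTED_COCO_DIRS if not (coco / name).is_dir()]

if __name__ == "__main__":
    # "data" is an assumption taken from the tree above; adjust if needed.
    missing = missing_coco_dirs("data")
    print("missing folders:", missing if missing else "none")
```

An empty list means every folder from the layout was found.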

Download the pretrained weights (SAM and detectors) and place them in the corresponding folders under ckpt/:

sam_b: ViT-B SAM
sam_l: ViT-L SAM
sam_h: ViT-H SAM
faster rcnn: R-50-FPN Faster R-CNN
yolox: YOLOX-l
detr: H-Deformable-DETR
dino: DINO

Usage

To quantize a model, specify the model configuration and the quantization configuration. For example, to apply W6A6 quantization to SAM-L with a YOLOX detector, use the following command:

python ptq4sam/solver/test_quant.py \
--config ./projects/configs/yolox/yolo_l-sam-vit-l.py \
--q_config exp/config66.yaml --quant-encoder

yolo_l-sam-vit-l.py: configuration file for the SAM-L model with the YOLOX detector.
config66.yaml: configuration file for W6A6 quantization (6-bit weights and 6-bit activations).
--quant-encoder: quantize the encoder of SAM.
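To make the W6A6 notation concrete, here is a minimal sketch of uniform quantization to 6 bits. It only shows what a 6-bit grid does to a tensor; it is not the quantizer implemented in this repo, and the function name fake_quantize and the sample values are hypothetical.

```python
def fake_quantize(values, bits):
    """Map values onto a uniform grid of 2**bits levels and back to floats."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    # Fall back to a scale of 1.0 when all values are identical.
    scale = (hi - lo) / levels or 1.0
    return [round((v - lo) / scale) * scale + lo for v in values]

# Hypothetical weights; W6 means each one is stored as one of 2**6 = 64 levels.
weights = [-0.82, -0.31, 0.0, 0.27, 0.94]
w6 = fake_quantize(weights, bits=6)

# The rounding error is bounded by half the grid step.
max_error = max(abs(a - b) for a, b in zip(weights, w6))
```

The same function applied to activations with bits=6 corresponds to the A6 half of the name; lower bit widths give a coarser grid and a larger rounding error.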

We recommend using a GPU with more than 40 GB of memory for experiments. To visualize the prediction results, specify --show-dir. Bimodal distributions mainly occur in the mask decoder of SAM-B and SAM-L.

Reference

If you find this repo useful for your research, please consider citing the paper.

@inproceedings{lv2024ptq4sam,
  title={PTQ4SAM: Post-Training Quantization for Segment Anything},
  author={Lv, Chengtao and Chen, Hong and Guo, Jinyang and Ding, Yifu and Liu, Xianglong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15941--15951},
  year={2024}
}

Acknowledgments

The code of PTQ4SAM is based on Prompt-Segment-Anything and QDrop. We thank the authors for open-sourcing their code.
