kMaX-DeepLab (ECCV 2022)

This is a PyTorch re-implementation, based on Detectron2, of our ECCV 2022 paper: k-means Mask Transformer.

Disclaimer: This is a re-implementation of kMaX-DeepLab in PyTorch. While we have tried our best to reproduce all the numbers reported in the paper, please refer to the original numbers in the paper or the TensorFlow repo when making performance or speed comparisons.

kMaX-DeepLab is an end-to-end method for general segmentation tasks. Built upon MaX-DeepLab and CMT-DeepLab, kMaX-DeepLab views the mask transformer as a process that iteratively performs cluster-assignment and cluster-update steps.

Inspired by the similarity between cross-attention and the k-means clustering algorithm, kMaX-DeepLab proposes k-means cross-attention, a simple modification that changes the activation function in cross-attention from spatial-wise softmax to cluster-wise argmax.

As a result, kMaX-DeepLab not only produces much more plausible attention maps but also achieves much better performance.
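The difference between the two activations can be illustrated with a toy sketch in plain Python. This is not the code used in this repository: the function names, the tiny hand-written logits, and the residual form of the center update are assumptions made for illustration, and the learned query/key/value projections are omitted entirely.

```python
import math

def spatial_softmax_attention(logits):
    """Standard cross-attention: softmax over the pixel axis for each cluster.

    logits[k][n] is the affinity between cluster center k and pixel n.
    """
    weights = []
    for row in logits:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def cluster_argmax_attention(logits):
    """k-means style cross-attention: each pixel is hard-assigned to the
    single cluster with the highest affinity (argmax over the cluster axis)."""
    num_clusters, num_pixels = len(logits), len(logits[0])
    assignment = [[0.0] * num_pixels for _ in range(num_clusters)]
    for n in range(num_pixels):
        best = max(range(num_clusters), key=lambda k: logits[k][n])
        assignment[best][n] = 1.0
    return assignment

def update_centers(centers, attention, values):
    """Cluster-update step (assumed residual form): add the attention-weighted
    sum of pixel values to each center."""
    updated = []
    for center, weights in zip(centers, attention):
        aggregated = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(center))
        ]
        updated.append([c + a for c, a in zip(center, aggregated)])
    return updated

# Toy example: 2 cluster centers, 4 pixels, 2 feature channels.
logits = [[2.0, 1.0, -1.0, 0.5],
          [0.0, 3.0, 1.0, 0.4]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [3.0, 0.0]]
centers = [[0.0, 0.0], [0.0, 0.0]]

soft = spatial_softmax_attention(logits)   # every pixel contributes to every cluster
hard = cluster_argmax_attention(logits)    # every pixel belongs to exactly one cluster
new_centers = update_centers(centers, hard, values)
```

With the hard assignment, pixels 0 and 3 go to the first cluster and pixels 1 and 2 to the second, so each center is updated only from its own pixels, which mirrors the assignment and update steps of k-means.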

Installation

The code-base is verified with pytorch==1.12.1, torchvision==0.13.1, cudatoolkit==11.3, and detectron2==0.6. Please install the other libraries through pip3 install -r requirements.txt
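To compare an existing environment against the verified versions, a small check along the following lines may help. It is a sketch, not part of this repository: the helper names are hypothetical, it assumes the packages are registered under the pip distribution names torch, torchvision, and detectron2, and it skips cudatoolkit because that is a conda package.

```python
from importlib import metadata

# Versions the README reports as verified (cudatoolkit is a conda package,
# so it is not visible to importlib.metadata and is left out here).
VERIFIED = {"torch": "1.12.1", "torchvision": "0.13.1", "detectron2": "0.6"}

def installed_versions(packages):
    """Return {package: version or None} for the current environment."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

def find_mismatches(expected, found):
    """List packages that are missing or whose version differs.

    Local build tags such as '+cu113' are ignored when comparing.
    """
    problems = []
    for name, want in expected.items():
        have = found.get(name)
        if have is None or have.split("+")[0] != want:
            problems.append((name, want, have))
    return problems

if __name__ == "__main__":
    for name, want, have in find_mismatches(VERIFIED, installed_versions(VERIFIED)):
        print(f"{name}: verified with {want}, found {have}")
```

A mismatch does not necessarily mean the code will fail; it only flags a combination that differs from the one listed above.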

Please refer to Mask2Former's script for data preparation.

Model Zoo

Note that the models in the model zoo below are trained from scratch using this PyTorch code-base. We also offer code for porting and evaluating the TensorFlow checkpoints in the section Porting TensorFlow Weights.

COCO Panoptic Segmentation

Backbone	PQ	SQ	RQ	PQ^thing	PQ^stuff	ckpt
ResNet-50	53.3	83.2	63.3	58.8	45.0	download
ConvNeXt-Tiny	55.5	83.3	65.9	61.4	46.7	download
ConvNeXt-Small	56.7	83.4	67.2	62.7	47.7	download
ConvNeXt-Base	57.2	83.4	67.9	63.4	47.9	download
ConvNeXt-Large	57.9	83.5	68.5	64.3	48.4	download

Cityscapes Panoptic Segmentation

Backbone	PQ	SQ	RQ	PQ^thing	PQ^stuff	AP	IoU	ckpt
ResNet-50	63.5	82.0	76.5	57.8	67.7	38.6	79.5	download
ConvNeXt-Large	68.4	83.3	81.3	62.6	72.6	45.1	83.0	download

ADE20K Panoptic Segmentation

Backbone	PQ	SQ	RQ	PQ^thing	PQ^stuff	ckpt
ResNet-50	42.2	81.6	50.4	41.9	42.7	download
ConvNeXt-Large	50.0	83.3	59.1	49.5	50.8	download

Example Commands for Training and Testing

To train kMaX-DeepLab with ResNet-50 backbone:

python3 train_net.py --num-gpus 8 --num-machines 4 \
--machine-rank MACHINE_RANK --dist-url DIST_URL \
--config-file configs/coco/panoptic_segmentation/kmax_r50.yaml

Training takes 53 hours with 32 V100 GPUs on our end.
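In the multi-machine command above, MACHINE_RANK and DIST_URL are placeholders. Assuming the same command is started once on each of the 4 machines with its own rank (0 to 3) and a shared URL, the per-machine commands could be generated with a helper like the one below. The function name and the address tcp://10.0.0.1:29500 are hypothetical and should be replaced with values for your cluster.

```python
import shlex

def build_launch_commands(config_file, dist_url, num_machines=4, gpus_per_machine=8):
    """Return one train_net.py command per machine, indexed by machine rank."""
    commands = []
    for rank in range(num_machines):
        args = [
            "python3", "train_net.py",
            "--num-gpus", str(gpus_per_machine),
            "--num-machines", str(num_machines),
            "--machine-rank", str(rank),
            "--dist-url", dist_url,
            "--config-file", config_file,
        ]
        commands.append(shlex.join(args))
    return commands

commands = build_launch_commands(
    "configs/coco/panoptic_segmentation/kmax_r50.yaml",
    "tcp://10.0.0.1:29500",  # placeholder address, not a real endpoint
)
for command in commands:
    print(command)
```

With the defaults this yields 4 commands of 8 GPUs each, matching the 32 GPUs mentioned above.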

To test kMaX-DeepLab with ResNet-50 backbone and the provided weights:

python3 train_net.py --num-gpus NUM_GPUS \
--config-file configs/coco/panoptic_segmentation/kmax_r50.yaml \
--eval-only MODEL.WEIGHTS kmax_r50.pth

A web demo, built with Gradio, is integrated into Hugging Face Spaces 🤗.

Porting TensorFlow Weights

We also provide a script to convert the official TensorFlow weights into PyTorch format and use them in this code-base.

Example for porting and evaluating kMaX with ConvNeXt-Large on ADE20K from TensorFlow weights:

pip3 install tensorflow==2.9 keras==2.9
wget https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/kmax_convnext_large_res1281_ade20k_train.tar.gz
tar -xvf kmax_convnext_large_res1281_ade20k_train.tar.gz
python3 convert-tf-weights-to-d2.py ./kmax_convnext_large_res1281_ade20k_train/ckpt-100000 kmax_convnext_large_res1281_ade20k_train.pkl
python3 train_net.py --num-gpus 8 --config-file configs/ade20k/kmax_convnext_large.yaml \
--eval-only MODEL.WEIGHTS ./kmax_convnext_large_res1281_ade20k_train.pkl

This is expected to give PQ = 50.6620. Note that minor performance differences may exist due to numeric differences across deep learning frameworks and implementation details.
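Before running the evaluation, it can be useful to sanity-check the converted file. The sketch below assumes the .pkl written by the conversion script is a pickled dictionary, optionally with the weights nested under a "model" key as in other Detectron2 conversion scripts. That layout is an assumption and is not confirmed by this README, and summarize_checkpoint is a hypothetical helper. Only unpickle files you trust.

```python
import pickle

def summarize_checkpoint(path, max_entries=5):
    """Load a pickled checkpoint and describe the first few weight entries."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    # Assumption: weights sit under "model" if that key exists; otherwise the
    # top-level dict itself is treated as the name -> array mapping.
    weights = checkpoint.get("model", checkpoint)
    summary = []
    for name in sorted(weights)[:max_entries]:
        shape = getattr(weights[name], "shape", None)
        summary.append((name, tuple(shape) if shape is not None else None))
    return len(weights), summary

# Example usage (path taken from the commands above):
# count, entries = summarize_checkpoint("kmax_convnext_large_res1281_ade20k_train.pkl")
# print(count, entries)
```

An unexpectedly small entry count or missing shapes would suggest the conversion did not complete as intended.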

Citing kMaX-DeepLab

If you find this code helpful in your research or wish to refer to the baseline results, please use the following BibTeX entry.

kMaX-DeepLab:

@inproceedings{kmax_deeplab_2022,
  author={Qihang Yu and Huiyu Wang and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={{k-means Mask Transformer}},
  booktitle={ECCV},
  year={2022}
}

CMT-DeepLab:

@inproceedings{cmt_deeplab_2022,
  author={Qihang Yu and Huiyu Wang and Dahun Kim and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation},
  booktitle={CVPR},
  year={2022}
}

Acknowledgements

We express gratitude to the following open-source projects which this code-base is based on:

DeepLab2

Mask2Former
