This project forked from bytedance/kmax-deeplab

a PyTorch re-implementation of ECCV 2022 paper based on Detectron2: k-means mask Transformer.

License: Apache License 2.0

kMaX-DeepLab (ECCV 2022)

This is a PyTorch re-implementation of our ECCV 2022 paper based on Detectron2: k-means mask Transformer.

Disclaimer: This is a re-implementation of kMaX-DeepLab in PyTorch. While we have tried our best to reproduce all the numbers reported in the paper, please refer to the original numbers in the paper or the TensorFlow repo when making performance or speed comparisons.

kMaX-DeepLab is an end-to-end method for general segmentation tasks. Built upon MaX-DeepLab and CMT-DeepLab, kMaX-DeepLab proposes a novel view to regard the mask transformer as a process of iteratively performing cluster-assignment and cluster-update steps.

Inspired by the similarity between cross-attention and the k-means clustering algorithm, kMaX-DeepLab proposes k-means cross-attention, which adopts a simple modification by changing the activation function in cross-attention from spatial-wise softmax to cluster-wise argmax.
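The idea can be illustrated with a minimal pure-Python sketch. It is not the code in this repository: it omits the learned query/key/value projections and normalization layers, and the function name `kmeans_cross_attention` and the residual sum update are assumptions made for illustration. Each pixel picks one cluster via argmax over the cluster axis (the assignment step), and each center then aggregates the pixels assigned to it (the update step).

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def kmeans_cross_attention(centers, pixels):
    """One k-means-style cross-attention step on plain Python lists.

    centers: K cluster-center vectors; pixels: N pixel-feature vectors.
    Returns (updated_centers, assignment), where assignment[n] is the
    index of the cluster chosen by pixel n.
    """
    num_clusters = len(centers)
    # Affinity logits between every cluster center and every pixel feature.
    logits = [[dot(c, p) for p in pixels] for c in centers]
    # Cluster-wise argmax: each pixel selects its single best cluster.
    # (Standard cross-attention would apply a softmax over pixels per center.)
    assignment = [
        max(range(num_clusters), key=lambda k: logits[k][n])
        for n in range(len(pixels))
    ]
    # Cluster update: each center adds the features of its assigned pixels
    # (hypothetical simplification: residual sum, no projections).
    updated = [list(c) for c in centers]
    for n, k in enumerate(assignment):
        updated[k] = [u + x for u, x in zip(updated[k], pixels[n])]
    return updated, assignment


centers = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[2.0, 0.1], [0.2, 3.0], [1.5, 0.0]]
new_centers, assignment = kmeans_cross_attention(centers, pixels)
print(assignment)
print(new_centers)
```

Because the assignment is hard, every pixel contributes to exactly one center, which is what makes each step resemble a k-means iteration.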

As a result, kMaX-DeepLab not only produces much more plausible attention maps but also achieves much better performance.

Installation

The code-base is verified with pytorch==1.12.1, torchvision==0.13.1, cudatoolkit==11.3, and detectron2==0.6. Please install the other libraries through pip3 install -r requirements.txt
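To check an environment against the verified versions, something like the following sketch may help. It is not part of this repository. The helper name `find_mismatches` is hypothetical, and the distribution names (`torch`, `torchvision`, `detectron2`) are assumed to match what pip reports. cudatoolkit is a conda package and is not covered here.

```python
from importlib import metadata

# Versions stated above; the pip distribution names are an assumption.
VERIFIED = {"torch": "1.12.1", "torchvision": "0.13.1", "detectron2": "0.6"}


def find_mismatches(verified, get_version=metadata.version):
    """Return {name: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in verified.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "1.12.1+cu113" still count as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches(VERIFIED).items():
    print(f"{name}: verified with {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the code will fail, only that the combination differs from the one the code-base was verified with.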

Please refer to Mask2Former's script for data preparation.

Model Zoo

Note that the models in the model zoo below are trained from scratch using this PyTorch code-base. We also offer code for porting and evaluating the TensorFlow checkpoints in the section Porting TensorFlow Weights.

COCO Panoptic Segmentation

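| Backbone | PQ | SQ | RQ | PQ (thing) | PQ (stuff) | ckpt |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 53.3 | 83.2 | 63.3 | 58.8 | 45.0 | download |
| ConvNeXt-Tiny | 55.5 | 83.3 | 65.9 | 61.4 | 46.7 | download |
| ConvNeXt-Small | 56.7 | 83.4 | 67.2 | 62.7 | 47.7 | download |
| ConvNeXt-Base | 57.2 | 83.4 | 67.9 | 63.4 | 47.9 | download |
| ConvNeXt-Large | 57.9 | 83.5 | 68.5 | 64.3 | 48.4 | download |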
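
For readers unfamiliar with the PQ, SQ, and RQ columns, the sketch below shows how panoptic quality is commonly defined for a single class: SQ is the mean IoU of matched segments, RQ is an F1-style detection score, and PQ is their product. This is an illustrative helper, not the evaluation code used by this repository, and the name `panoptic_quality` is hypothetical. Benchmark numbers average each metric over classes, so the reported PQ is generally not equal to the reported SQ times the reported RQ.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each true-positive match, i.e. a predicted and a
    ground-truth segment of the same class with IoU > 0.5.
    num_fp / num_fn: unmatched predicted / ground-truth segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # Simplification: a class with no segments at all scores zero here.
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    return sq * rq, sq, rq


pq, sq, rq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")
```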

Cityscapes Panoptic Segmentation

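| Backbone | PQ | SQ | RQ | PQ (thing) | PQ (stuff) | AP | IoU | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 63.5 | 82.0 | 76.5 | 57.8 | 67.7 | 38.6 | 79.5 | download |
| ConvNeXt-Large | 68.4 | 83.3 | 81.3 | 62.6 | 72.6 | 45.1 | 83.0 | download |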

ADE20K Panoptic Segmentation

| Backbone | PQ | SQ | RQ | PQ (thing) | PQ (stuff) | ckpt |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 42.2 | 81.6 | 50.4 | 41.9 | 42.7 | download |
| ConvNeXt-Large | 50.0 | 83.3 | 59.1 | 49.5 | 50.8 | download |

Example Commands for Training and Testing

To train kMaX-DeepLab with ResNet-50 backbone:

python3 train_net.py --num-gpus 8 --num-machines 4 \
--machine-rank MACHINE_RANK --dist-url DIST_URL \
--config-file configs/coco/panoptic_segmentation/kmax_r50.yaml

Training takes about 53 hours on 32 V100 GPUs (8 GPUs on each of 4 machines) on our end.

To test kMaX-DeepLab with ResNet-50 backbone and the provided weights:

python3 train_net.py --num-gpus NUM_GPUS \
--config-file configs/coco/panoptic_segmentation/kmax_r50.yaml \
--eval-only MODEL.WEIGHTS kmax_r50.pth

Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo: Hugging Face Spaces

Porting TensorFlow Weights

We also provide a script to convert the official TensorFlow weights into PyTorch format and use them in this code-base.

Example for porting and evaluating kMaX with ConvNeXt-Large on ADE20K from TensorFlow weights:

pip3 install tensorflow==2.9 keras==2.9
wget https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/kmax_convnext_large_res1281_ade20k_train.tar.gz
tar -xvf kmax_convnext_large_res1281_ade20k_train.tar.gz
python3 convert-tf-weights-to-d2.py ./kmax_convnext_large_res1281_ade20k_train/ckpt-100000 kmax_convnext_large_res1281_ade20k_train.pkl
python3 train_net.py --num-gpus 8 --config-file configs/ade20k/kmax_convnext_large.yaml \
--eval-only MODEL.WEIGHTS ./kmax_convnext_large_res1281_ade20k_train.pkl 

This is expected to give PQ = 50.6620. Note that minor performance differences may exist due to numeric differences across deep learning frameworks and implementation details.

Citing kMaX-DeepLab

If you find this code helpful in your research or wish to refer to the baseline results, please use the following BibTeX entry.

  • kMaX-DeepLab:
@inproceedings{kmax_deeplab_2022,
  author={Qihang Yu and Huiyu Wang and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={{k-means Mask Transformer}},
  booktitle={ECCV},
  year={2022}
}
  • CMT-DeepLab:
@inproceedings{cmt_deeplab_2022,
  author={Qihang Yu and Huiyu Wang and Dahun Kim and Siyuan Qiao and Maxwell Collins and Yukun Zhu and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
  title={CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation},
  booktitle={CVPR},
  year={2022}
}

Acknowledgements

We express gratitude to the following open-source projects which this code-base is based on:

DeepLab2

Mask2Former
