
This project is a fork of shi-labs/oneformer.



Home Page: https://praeclarumjj3.github.io/oneformer

License: MIT License



OneFormer: One Transformer to Rule Universal Image Segmentation


Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi

Equal Contribution

[Project Page] [arXiv] [pdf] [BibTeX]

This repo contains the code for our paper OneFormer: One Transformer to Rule Universal Image Segmentation.

Features

  • OneFormer is the first multi-task universal image segmentation framework based on transformers.
  • OneFormer needs to be trained only once with a single universal architecture, a single model, and on a single dataset, to outperform existing frameworks across semantic, instance, and panoptic segmentation tasks.
  • OneFormer uses a task-conditioned joint training strategy, uniformly sampling different ground-truth domains (semantic, instance, or panoptic) by deriving all labels from panoptic annotations to train its multi-task model.
  • OneFormer uses a task token to condition the model on the task in focus, making our architecture task-guided for training and task-dynamic for inference, all with a single model (see the inference sketch below).

Figure: OneFormer architecture overview.
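As a concrete illustration of the task token, the 🤗 transformers port (see News below) exposes it as a plain string at inference time. A minimal sketch, assuming the publicly released shi-labs/oneformer_ade20k_swin_tiny checkpoint on the Hugging Face hub:

```python
import requests
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The task token is a plain string: "semantic", "instance", or "panoptic"
inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
outputs = model(**inputs)

# Recover a per-pixel class map at the input image's resolution
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
```

Passing "instance" or "panoptic" instead (with the matching post-processing call) switches tasks with the same weights, which is what task-dynamic inference means here.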

Contents

  1. News
  2. Installation Instructions
  3. Dataset Preparation
  4. Execution Instructions
  5. Results
  6. Citation
  7. Segmentation Inference on Recorded Flight Operation Video Frames

News

  • [February 27, 2023]: OneFormer is accepted to CVPR 2023!
  • [January 26, 2023]: OneFormer sets new SOTA performance on the Mapillary Vistas val (both panoptic & semantic segmentation) and Cityscapes test (panoptic segmentation) sets. We’ve released the checkpoints too!
  • [January 19, 2023]: OneFormer is now available as a part of the 🤗 HuggingFace transformers library and model hub! 🚀
  • [December 26, 2022]: Checkpoints for Swin-L OneFormer and DiNAT-L OneFormer trained on ADE20K with 1280×1280 resolution released!
  • [November 23, 2022]: Roboflow covered OneFormer on YouTube! Thanks to @SkalskiP for making the video!
  • [November 18, 2022]: Our demo is available on 🤗 Huggingface Space!
  • [November 10, 2022]: Project Page, ArXiv Preprint and GitHub Repo are public!
    • OneFormer sets new SOTA on Cityscapes val with single-scale inference on Panoptic Segmentation with 68.5 PQ score and Instance Segmentation with 46.7 AP score!
    • OneFormer sets new SOTA on ADE20K val on Panoptic Segmentation with 51.5 PQ score and on Instance Segmentation with 37.8 AP!
    • OneFormer sets new SOTA on COCO val on Panoptic Segmentation with 58.0 PQ score!

Installation Instructions

  • We use Python 3.8, PyTorch 1.10.1 (CUDA 11.3 build).
  • We use Detectron2-v0.6.
  • For complete installation instructions, please see INSTALL.md (a quick version check follows below).
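Once installed, a quick sanity check for the pinned versions (a sketch; exact version strings may carry build suffixes such as +cu113):

```python
import torch
import detectron2

print(torch.__version__)       # expect 1.10.1 (CUDA 11.3 build)
print(torch.version.cuda)      # expect 11.3
print(detectron2.__version__)  # expect 0.6
assert torch.cuda.is_available(), "CUDA-enabled PyTorch build required"
```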

Dataset Preparation

  • We experiment on three major benchmark datasets: ADE20K, Cityscapes, and COCO 2017.
  • Please see Preparing Datasets for OneFormer for complete instructions on preparing the datasets (a note on the dataset root follows below).
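Since OneFormer builds on Detectron2, the prepared datasets are discovered through the DETECTRON2_DATASETS environment variable (Detectron2 defaults to ./datasets relative to the working directory). A one-line sketch; the path is a placeholder:

```python
import os

# Set before importing detectron2/oneformer so dataset registration sees it;
# "/path/to/datasets" is a placeholder for your actual dataset root.
os.environ["DETECTRON2_DATASETS"] = "/path/to/datasets"
```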

Execution Instructions

Training

  • We train all our models using 8 A6000 (48 GB each) GPUs.
  • We use 8 A100 GPUs (80 GB each) for training Swin-L OneFormer and DiNAT-L OneFormer on COCO and all models with the ConvNeXt-XL backbone. We also train the 896×896 models on ADE20K on 8 A100 GPUs.
  • Please see Getting Started with OneFormer for training commands.

Evaluation

  • Please see Getting Started with OneFormer for evaluation commands.

Demo

  • We provide quick-to-run demos on Colab and Hugging Face Spaces.
  • Please see OneFormer Demo for command line instructions on running the demo.

Results


  • † denotes the backbones were pretrained on ImageNet-22k.
  • Pre-trained models can be downloaded following the instructions given under tools.

ADE20K

| Method | Backbone | Crop Size | PQ | AP | mIoU (s.s) | mIoU (ms+flip) | #params | config | Checkpoint |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OneFormer | Swin-L | 640×640 | 49.8 | 35.9 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | Swin-L | 896×896 | 51.1 | 37.6 | 57.4 | 58.3 | 219M | config | model |
| OneFormer | Swin-L | 1280×1280 | 51.4 | 37.8 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | ConvNeXt-L | 640×640 | 50.0 | 36.2 | 56.6 | 57.4 | 220M | config | model |
| OneFormer | DiNAT-L | 640×640 | 50.5 | 36.0 | 58.3 | 58.4 | 223M | config | model |
| OneFormer | DiNAT-L | 896×896 | 51.2 | 36.8 | 58.1 | 58.6 | 223M | config | model |
| OneFormer | DiNAT-L | 1280×1280 | 51.5 | 37.1 | 58.3 | 58.7 | 223M | config | model |
| OneFormer (COCO-Pretrained) | DiNAT-L | 1280×1280 | 53.4 | 40.2 | 58.4 | 58.8 | 223M | config | model \| pretrained |
| OneFormer | ConvNeXt-XL | 640×640 | 50.1 | 36.3 | 57.4 | 58.8 | 372M | config | model |

Cityscapes

| Method | Backbone | PQ | AP | mIoU (s.s) | mIoU (ms+flip) | #params | config | Checkpoint |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OneFormer | Swin-L | 67.2 | 45.6 | 83.0 | 84.4 | 219M | config | model |
| OneFormer | ConvNeXt-L | 68.5 | 46.5 | 83.0 | 84.0 | 220M | config | model |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-L | 70.1 | 48.7 | 84.6 | 85.2 | 220M | config | model \| pretrained |
| OneFormer | DiNAT-L | 67.6 | 45.6 | 83.1 | 84.0 | 223M | config | model |
| OneFormer | ConvNeXt-XL | 68.4 | 46.7 | 83.6 | 84.6 | 372M | config | model |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-XL | 69.7 | 48.9 | 84.5 | 85.8 | 372M | config | model \| pretrained |

COCO

| Method | Backbone | PQ | PQ<sup>Th</sup> | PQ<sup>St</sup> | AP | mIoU | #params | config | Checkpoint |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OneFormer | Swin-L | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | config | model |
| OneFormer | DiNAT-L | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | config | model |

Mapillary Vistas

| Method | Backbone | PQ | mIoU (s.s) | mIoU (ms+flip) | #params | config | Checkpoint |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OneFormer | Swin-L | 46.7 | 62.9 | 64.1 | 219M | config | model |
| OneFormer | ConvNeXt-L | 47.9 | 63.2 | 63.8 | 220M | config | model |
| OneFormer | DiNAT-L | 47.8 | 64.0 | 64.9 | 223M | config | model |

Citation

If you found OneFormer useful in your research, please consider starring ⭐ us on GitHub and citing 📚 the paper!

@inproceedings{jain2023oneformer,
      title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
      author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
      booktitle={CVPR},
      year={2023}
    }

Segmentation Inference on Recorded Flight Operation Video Frames

Setup Instructions

  • Install NVIDIA CUDA Toolkit 11.3

    sudo apt update
    sudo apt upgrade -y
    
    mkdir cudatoolkits
    cd cudatoolkits
    wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run
    sudo sh cuda_11.3.0_465.19.01_linux.run --toolkit --silent --override
    cd ..
    
    # Add CUDA 11.3 to PATH and LD_LIBRARY_PATH via ~/.bashrc:
    echo 'export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
    
    # Reload ~/.bashrc in the current shell
    source ~/.bashrc
  • Create a conda environment

    conda create --name oneformer python=3.8 -y
    conda activate oneformer
  • Install packages and other dependencies.

    git clone https://github.com/sjhpark/OneFormer.git
    cd OneFormer
    
    # Install PyTorch 1.10.1 (CUDA 11.3 build)
    conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
    
    # Install opencv (required for running the demo)
    pip3 install -U opencv-python
    
    # Build Detectron2 from Source (reference: https://github.com/sjhpark/OneFormer/blob/main/INSTALL.md)
    git clone https://github.com/facebookresearch/detectron2.git
    python -m pip install -e detectron2
    
    # Install other dependencies
    pip3 install git+https://github.com/cocodataset/panopticapi.git
    pip3 install git+https://github.com/mcordts/cityscapesScripts.git
    pip3 install -r requirements.txt
  • Run the make.sh script to build the custom CUDA kernels for multi-scale deformable attention (a verification sketch follows these setup steps).

    cd oneformer/modeling/pixel_decoder/ops
    sh make.sh
    cd ../../../../
  • Download pretrained weights.

    mkdir checkpoints
    cd checkpoints
    
    # Download weights of the DiNAT-L OneFormer model trained on the COCO dataset
    wget https://shi-labs.com/projects/oneformer/coco/150_16_dinat_l_oneformer_coco_100ep.pth
    
    cd ..
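If the steps above succeeded, a short Python check run from the OneFormer repo root should pass. A sketch, assuming the op import path mirrors the make.sh directory and the checkpoint filename downloaded above:

```python
import os

import torch

# Custom multi-scale deformable attention op compiled by make.sh;
# this import fails if the CUDA build did not succeed.
from oneformer.modeling.pixel_decoder.ops.functions import MSDeformAttnFunction  # noqa: F401

# Checkpoint downloaded in the previous step
ckpt = "checkpoints/150_16_dinat_l_oneformer_coco_100ep.pth"
assert os.path.isfile(ckpt), f"missing {ckpt}"
torch.load(ckpt, map_location="cpu")  # loads without error if the file is intact
print("Setup verified")
```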

Segmentation Inference

  • Run segmentation inference (a sketch of the frame-extraction step follows this block).

    cd inference
    
    # Chop video into frames
    python video2frames.py --path {your video path e.g. vids/P00_OBS.mkv} --fps {e.g. 15} --id {participant ID e.g. P00}
    
    # Resize each extracted frame
    python resize.py --in_dir {your frames path e.g. frames_fps15} --scale 0.5 0.5
    
    # Run inference and record
    python inference.py --in_dir {your frames path e.g. frames_fps15/resized} --gaze_path {Path to the gaze data e.g. gaze_data/gaze_projection} --model dinat --prior coco --task semantic
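For orientation, the frame-extraction step amounts to sampling the video at a fixed rate with OpenCV and optionally downscaling each frame. A minimal sketch of that idea; the actual video2frames.py and resize.py may differ in naming, options, and output layout, and extract_frames is a hypothetical helper:

```python
import os

import cv2

def extract_frames(video_path: str, out_dir: str, fps: int = 15, scale: float = 0.5) -> int:
    """Save roughly `fps` frames per second of `video_path` into `out_dir` (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or fps  # fall back if metadata is missing
    step = max(1, round(native_fps / fps))         # keep every `step`-th frame
    read_idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if read_idx % step == 0:
            # Downscale, mirroring resize.py's --scale 0.5 0.5
            frame = cv2.resize(frame, None, fx=scale, fy=scale)
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        read_idx += 1
    cap.release()
    return saved
```

For example, extract_frames("vids/P00_OBS.mkv", "frames_fps15/resized", fps=15) would produce frames ready for inference.py.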

Acknowledgement

We thank the authors of Mask2Former, GroupViT, and Neighborhood Attention Transformer for releasing their helpful codebases.

