This project forked from xb534/sed

[CVPR 2024] Official PyTorch implementation of SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation.

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

This is our official PyTorch implementation of SED.

🔥 News

  • SED has been accepted to CVPR 2024.

Introduction


  • We propose an encoder-decoder for open-vocabulary semantic segmentation that comprises hierarchical encoder-based cost map generation and a gradual fusion decoder.
  • We introduce a category early rejection scheme that rejects non-existent categories at an early layer, markedly increasing inference speed without significant degradation in segmentation performance. For instance, it provides a 4.7× speedup on PC-459.
  • Our method, SED, achieves superior performance on multiple open-vocabulary segmentation datasets and offers a good trade-off between segmentation performance and speed. With ConvNeXt-L, SED obtains mIoU scores of 35.2% on A-150 and 22.6% on PC-459.
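As a rough illustration of the category early rejection idea described above, the sketch below keeps only the categories that rank in the top-k for at least one pixel of a cost map and drops the rest before any later processing. The selection rule (union of per-pixel top-k), the function names, and the nested-list cost map layout are assumptions made for illustration; they are not taken from the SED code, which operates on PyTorch tensors.

```python
from typing import List, Set, Tuple

# Hypothetical layout: cost_map[c][y][x] is the score of category c at pixel (y, x).
CostMap = List[List[List[float]]]


def topk_union(cost_map: CostMap, k: int) -> List[int]:
    """Return the sorted category indices that are in the top-k of at least one pixel."""
    num_classes = len(cost_map)
    height = len(cost_map[0])
    width = len(cost_map[0][0])
    k = min(k, num_classes)
    kept: Set[int] = set()
    for y in range(height):
        for x in range(width):
            scores = [(cost_map[c][y][x], c) for c in range(num_classes)]
            # Highest score first; ties go to the lower category index.
            scores.sort(key=lambda item: (-item[0], item[1]))
            kept.update(c for _, c in scores[:k])
    return sorted(kept)


def reject_categories(cost_map: CostMap, k: int) -> Tuple[List[int], CostMap]:
    """Drop every category that is not selected by topk_union (assumed rule)."""
    kept = topk_union(cost_map, k)
    return kept, [cost_map[c] for c in kept]


if __name__ == "__main__":
    # Four categories on a 1x2 image (toy values, not real model outputs).
    toy = [
        [[0.90, 0.10]],
        [[0.05, 0.20]],
        [[0.03, 0.60]],
        [[0.02, 0.10]],
    ]
    kept, pruned = reject_categories(toy, k=1)
    print(kept)         # [0, 2]
    print(len(pruned))  # 2
```

With many candidate categories and few of them present in an image, later layers only need to process the surviving categories, which is where a speedup would come from under this assumed rule.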

For further details and visualization results, please check out our paper.

Installation

Please follow the installation instructions.

Data Preparation

Please follow the dataset preparation instructions.

Training

We provide shell scripts for training and evaluation. run.sh trains the model with the default configuration and evaluates it after training.

To train or evaluate the model in different environments, modify the given shell script and config files accordingly.

Training script

sh run.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

# For ConvNeXt-B variant
sh run.sh configs/convnextB_768.yaml 4 output/
# For ConvNeXt-L variant
sh run.sh configs/convnextL_768.yaml 4 output/

Evaluation

eval.sh automatically evaluates the model following our evaluation protocol, using the weights in the output directory unless specified otherwise. To evaluate the model on individual datasets, please refer to the commands in eval.sh.

Evaluation script

sh eval.sh [CONFIG] [NUM_GPUS] [OUTPUT_DIR] [OPTS]

sh eval.sh configs/convnextB_768.yaml 4 output/ MODEL.WEIGHTS path/to/weights.pth

# Fast version.
sh eval.sh configs/convnextB_768.yaml 4 output/ MODEL.WEIGHTS path/to/weights.pth TEST.FAST_INFERENCE True TEST.TOPK 8

Results


We provide pretrained weights for the models reported in the paper. All models were evaluated on 4 NVIDIA A6000 GPUs, and the results can be reproduced with the evaluation script above. Inference time is reported on a single NVIDIA A6000 GPU.

Name          CLIP        A-847  PC-459  A-150  PC-59  PAS-20  Download
SED (B)       ConvNeXt-B  11.2   18.6    31.8   57.7   94.4    ckpt
SED-fast (B)  ConvNeXt-B  11.4   18.6    31.6   57.3   94.4    ckpt
SED (L)       ConvNeXt-L  13.7   22.1    35.3   60.9   96.1    ckpt
SED-fast (L)  ConvNeXt-L  13.9   22.6    35.2   60.6   96.1    ckpt
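For readers unfamiliar with the metric, the following is a minimal sketch of how a mean IoU score, reported in percent like the numbers above, can be computed from flattened prediction and ground-truth label lists. The function name, the ignore label of 255, and the choice to average only over classes that occur in the ground truth are assumptions for illustration; the evaluator used by this repository may follow a different convention.

```python
from typing import List


def mean_iou(pred: List[int], gt: List[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU in percent over classes that occur in the ground truth (assumed convention)."""
    tp = [0] * num_classes          # pixels where prediction and label agree
    pred_count = [0] * num_classes  # pixels predicted as each class
    gt_count = [0] * num_classes    # pixels labelled as each class
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabelled pixel, excluded from the metric
        gt_count[g] += 1
        pred_count[p] += 1
        if p == g:
            tp[g] += 1
    ious = []
    for c in range(num_classes):
        if gt_count[c] == 0:
            continue  # class absent from the ground truth
        union = gt_count[c] + pred_count[c] - tp[c]
        ious.append(tp[c] / union)
    if not ious:
        return float("nan")
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy labels for four pixels; the last one is ignored.
    print(mean_iou(pred=[0, 0, 1, 1], gt=[0, 1, 1, 255], num_classes=2))  # 50.0
```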

Citation

@misc{xie2023sed,
      title={SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation}, 
      author={Bin Xie and Jiale Cao and Jin Xie and Fahad Shahbaz Khan and Yanwei Pang},
      year={2023},
      eprint={2311.15537},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

We would like to acknowledge the contributions of public projects, such as CAT-Seg, whose code has been utilized in this repository.
