GithubHelp home page GithubHelp logo

lizhiwen19900709 / x-decoder Goto Github PK

View Code? Open in Web Editor NEW

This project forked from microsoft/x-decoder

0.0 0.0 0.0 140.82 MB

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

License: Apache License 2.0

Python 100.00%

x-decoder's Introduction

X-Decoder: Generalized Decoding for Pixel, Image, and Language

[Project Page] [Paper] [HuggingFace All-in-One Demo] [HuggingFace Instruct Demo] [Video]

by Xueyan Zou*, Zi-Yi Dou*, Jianwei Yang*, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee^, Jianfeng Gao^ in CVPR 2023.

๐ŸŒถ๏ธ Getting Started

We release the following contents for both SEEM and X-Decoderโ—

  • Demo Code
  • Model Checkpoint
  • Comprehensive User Guide
  • Training Code
  • Evaluation Code

๐Ÿ‘‰ One-Line SEEM Demo with Linux:

git clone [email protected]:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git && sh aasets/scripts/run_demo.sh

๐Ÿ“ [New] Getting Started:

๐Ÿ“ [New] Latest Checkpoints and Numbers:

COCO Ref-COCOg VOC SBD
Method Checkpoint backbone PQ โ†‘ mAP โ†‘ mIoU โ†‘ cIoU โ†‘ mIoU โ†‘ AP50 โ†‘ NoC85 โ†“ NoC90 โ†“ NoC85 โ†“ NoC90 โ†“
X-Decoder ckpt Focal-T 50.8 39.5 62.4 57.6 63.2 71.6 - - - -
X-Decoder-oq201 ckpt Focal-L 56.5 46.7 67.2 62.8 67.5 76.3 - - - -
SEEM_v0 ckpt Focal-T 50.6 39.4 60.9 58.5 63.5 71.6 3.54 4.59 * *
SEEM_v0 - Davit-d3 56.2 46.8 65.3 63.2 68.3 76.6 2.99 3.89 5.93 9.23
SEEM_v0 ckpt Focal-L 56.2 46.4 65.5 62.8 67.7 76.2 3.04 3.85 * *
SEEM_v1 ckpt Focal-T 50.8 39.4 60.7 58.5 63.7 72.0 3.19 4.13 * *
SEEM_v1 ckpt SAM-ViT-B 52.0 43.5 60.2 54.1 62.2 69.3 2.53 3.23 * *
SEEM_v1 ckpt SAM-ViT-L 49.0 41.6 58.2 53.8 62.2 69.5 2.40 2.96 * *

SEEM_v0: Supporting Single Interactive object training and inference
SEEM_v1: Supporting Multiple Interactive objects training and inference

๐Ÿ”ฅ News

  • [2023.10.04] We are excited to release โœ… training/evaluation/demo code, โœ… new checkpoints, and โœ… comprehensive readmes for both X-Decoder and SEEM!
  • [2023.09.24] We are providing new demo command/code for inference (DEMO.md)!
  • [2023.07.19] ๐ŸŽข We are excited to release the x-decoder training code (INSTALL.md, DATASET.md, TRAIN.md, EVALUATION.md)!
  • [2023.07.10] We release Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity. Code and checkpoint are available!
  • [2023.04.14] We are releasing SEEM, a new universal interactive interface for image segmentation! You can use it for any segmentation tasks, way beyond what X-Decoder can do!

  • [2023.03.20] As an aspiration of our X-Decoder, we developed OpenSeeD ([Paper][Code]) to enable open-vocabulary segmentation and detection with a single model, Check it out!
  • [2023.03.14] We release X-GPT which is an conversational version of our X-Decoder through GPT-3 langchain!
  • [2023.03.01] The Segmentation in the Wild Challenge had been launched and ready for submitting results!
  • [2023.02.28] We released the SGinW benchmark for our challenge. Welcome to build your own models on the benchmark!
  • [2023.02.27] Our X-Decoder has been accepted by CVPR 2023!
  • [2023.02.07] We combine X-Decoder (strong image understanding), GPT-3 (strong language understanding) and Stable Diffusion (strong image generation) to make an instructional image editing demo, check it out!
  • [2022.12.21] We release inference code of X-Decoder.
  • [2022.12.21] We release Focal-T pretrained checkpoint.
  • [2022.12.21] We release open-vocabulary segmentation benchmark.

๐Ÿ–Œ๏ธ DEMO

๐Ÿซ [X-GPT] โ€‚ ๐Ÿ“[Instruct X-Decoder]

demo

๐ŸŽถ Introduction

github_figure

X-Decoder is a generalized decoding model that can generate pixel-level segmentation and token-level texts seamlessly!

It achieves:

  • State-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets;
  • Better or competitive finetuned performance to generalist and specialist models on segmentation and VL tasks;
  • Friendly for efficient finetuning and flexible for novel task composition.

It supports:

  • One suite of parameters pretrained for Semantic/Instance/Panoptic Segmentation, Referring Segmentation, Image Captioning, and Image-Text Retrieval;
  • One model architecture finetuned for Semantic/Instance/Panoptic Segmentation, Referring Segmentation, Image Captioning, Image-Text Retrieval and Visual Question Answering (with an extra cls head);
  • Zero-shot task composition for Region Retrieval, Referring Captioning, Image Editing.

Acknowledgement

Citation

@article{zou2022xdecoder,
  author      = {Zou*, Xueyan and Dou*, Zi-Yi and Yang*, Jianwei and Gan, Zhe and Li, Linjie and Li, Chunyuan and Dai, Xiyang and Wang, Jianfeng and Yuan, Lu and Peng, Nanyun and Wang, Lijuan and Lee*, Yong Jae and Gao*, Jianfeng},
  title       = {Generalized Decoding for Pixel, Image and Language},
  publisher   = {arXiv},
  year        = {2022},
}

x-decoder's People

Contributors

eltociear avatar jwyang avatar maureenzou avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.