
idea-research / openseed


[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

License: Apache License 2.0

Python 93.50% Shell 0.07% C++ 0.64% Cuda 5.79%

openseed's Introduction

OpenSeeD


This is the official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection".


A more detailed demo video is available on YouTube.

👉 [New] Demo code is available.
👉 [New] OpenSeeD has been accepted to ICCV 2023! Training code is available!

🚀 Key Features

  • A Simple Framework for Open-Vocabulary Segmentation and Detection.
  • Supports interactive segmentation: a box input generates a mask.

💡 Installation

pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
python -m pip install -r requirements.txt
export DATASET=/path/to/dataset

Download the pretrained checkpoint from here.

💡 Demo script

python demo/demo_panoseg.py evaluate --conf_files configs/openseed/openseed_swint_lang.yaml  --image_path images/animals.png --overrides WEIGHT /path/to/ckpt/model_state_dict_swint_51.2ap.pt

🔥 Remember to modify the vocabulary lists thing_classes and stuff_classes in demo_panoseg.py if you want to segment open-vocabulary objects.
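As a rough illustration of what editing the vocabulary involves, the sketch below builds two class lists, one color per class, and contiguous id maps. The class names, the helper `random_colors`, and the id layout (things first, then stuff) are assumptions made for this example; check demo_panoseg.py for the names and layout the script actually uses.

```python
import random

# Hypothetical open-vocabulary class names; replace them with your own.
thing_classes = ["cup", "spoon", "fork", "bottle"]
stuff_classes = ["table", "wall", "floor"]


def random_colors(n, seed=0):
    """Return n reproducible RGB colors as lists of ints in [0, 255]."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(3)] for _ in range(n)]


# One color per class, so the lists must stay the same length as the class lists.
thing_colors = random_colors(len(thing_classes), seed=0)
stuff_colors = random_colors(len(stuff_classes), seed=1)

# Assumed id layout: things occupy ids 0..T-1, stuff occupies ids T..T+S-1.
thing_ids = {i: i for i in range(len(thing_classes))}
stuff_ids = {i + len(thing_classes): i for i in range(len(stuff_classes))}
```

Whatever names you choose, the color lists and id maps have to be regenerated whenever the class lists change length.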

Evaluation on coco

python train_net.py --original_load --eval_only --num-gpus 8 --config-file configs/openseed/openseed_swint_lang.yaml MODEL.WEIGHTS=/path/to/lang/weight

Weight file: https://github.com/IDEA-Research/OpenSeeD/releases/download/openseed/model_state_dict_swint_51.2ap.pt

You are expected to get 55.4 PQ.

💡 Some coco-format data

Here is the coco-format json file for evaluating BDD and SUN.

Training OpenSeeD baseline

Training on coco

python train_net.py --num-gpus 8 --config-file configs/openseed/openseed_swint_lang.yaml --lang_weight /path/to/lang/weight

Language weight file: https://github.com/IDEA-Research/OpenSeeD/releases/download/training/model_state_dict_only_language.pt

Training on coco+o365

python train_net.py --num-gpus 8 --config-file configs/openseed/openseed_swint_lang_o365.yaml --lang_weight /path/to/lang/weight

Language weight file: https://github.com/IDEA-Research/OpenSeeD/releases/download/training/model_state_dict_only_language.pt

Checkpoints

  • Swin-T model trained on COCO panoptic segmentation and Objects365 weights.
  • Swin-L model fine-tuned on COCO panoptic segmentation weights.
  • Swin-L model fine-tuned on ADE20K semantic segmentation weights.

🦄 Model Framework

[Figure: OpenSeeD model framework]

🌋 Results

Results on open segmentation: [figure]

Results on task transfer and segmentation in the wild: [figure]

Citing OpenSeeD

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@article{zhang2023simple,
  title={A Simple Framework for Open-Vocabulary Segmentation and Detection},
  author={Zhang, Hao and Li, Feng and Zou, Xueyan and Liu, Shilong and Li, Chunyuan and Gao, Jianfeng and Yang, Jianwei and Zhang, Lei},
  journal={arXiv preprint arXiv:2303.08131},
  year={2023}
}

openseed's People

Contributors

fengli-ust, haozhang534, ideacvr


openseed's Issues

No Prediction on Custom Image

The model is not producing any output on custom images.
However, if I use the images provided in the repo, it works.

Any help would be appreciated.

Task-specific transfer

Hi!

Thanks for the great work and for making the code publicly available!
I would like to reproduce your results on task-transfer of Tab.3 in the paper, especially on Cityscapes and ADE20K.
Can you provide any guidance? Is the code ready for finetuning on any dataset (assuming mapper is implemented)?

Thank you in advance!

RuntimeError: numel: integer multiplication overflow

Hi, thanks for sharing your work.

I followed the recommended environment setup instructions and ran into the following error while trying to run the demo. I am using python 3.9 (if that plays a role here). Looking forward to any suggestions to fix the error. Thanks.

RuntimeError: numel: integer multiplication overflow

The complete traceback for your reference:

Traceback (most recent call last):
  File "/home/jj/OpenSeeD/demo/demo_panoseg.py", line 107, in <module>
    main()
  File "/home/jj/OpenSeeD/demo/demo_panoseg.py", line 86, in main
    outputs = model.forward(batch_inputs)
  File "/home/jj/OpenSeeD/openseed/BaseModel.py", line 19, in forward
    outputs = self.model(*inputs, **kwargs)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jj/OpenSeeD/openseed/architectures/openseed_model.py", line 299, in forward
    processed_results = self.forward_seg(batched_inputs, task=inference_task)
  File "/home/jj/OpenSeeD/openseed/architectures/openseed_model.py", line 434, in forward_seg
    instance_r = retry_if_cuda_oom(self.instance_inference)(mask_cls_result, mask_pred_result, mask_box_result)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/detectron2/utils/memory.py", line 70, in wrapped
    return func(*args, **kwargs)
  File "/home/jj/OpenSeeD/openseed/architectures/openseed_model.py", line 558, in instance_inference
    print(scores, keep)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor.py", line 427, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor_str.py", line 637, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor_str.py", line 568, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor_str.py", line 328, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor_str.py", line 115, in __init__
    nonzero_finite_vals = torch.masked_select(
RuntimeError: numel: integer multiplication overflow

Not able to change to other COCO classes

I tested demo_panoseg in Colab. I also had to compile Deformable-DETR and restart the Colab runtime.
When I changed the thing classes to COCO classes such as cup, spoon, fork, and bottle, I got this error:
Attribute 'thing_colors' in the metadata of 'demo' cannot be set to a different value!
It would be useful if text2seg- or GroundedSAM-style prompts could be used as class names.
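One possible workaround, offered as an assumption rather than a confirmed fix: detectron2 refuses to overwrite a metadata attribute with a different value, so re-running the demo in the same Python process (for example a Colab cell) with new classes or freshly randomized colors can trigger this message. The sketch below removes the existing entry before setting new values. The helper name `reset_metadata` is hypothetical; it relies only on the `list`, `remove`, and `get` methods of the catalog and the `set` method of the metadata object.

```python
def reset_metadata(catalog, name, **attrs):
    """Drop any existing entry called `name`, then register fresh attributes.

    `catalog` is expected to behave like detectron2's MetadataCatalog.
    """
    if name in catalog.list():
        catalog.remove(name)
    return catalog.get(name).set(**attrs)


# Intended usage inside the demo script (requires detectron2):
# from detectron2.data import MetadataCatalog
# metadata = reset_metadata(
#     MetadataCatalog, "demo",
#     thing_classes=thing_classes, thing_colors=thing_colors,
#     stuff_classes=stuff_classes, stuff_colors=stuff_colors,
# )
```

Restarting the runtime has the same effect, since the catalog lives only in the current process.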

About OpenSeeD(L)

The COCO PQ result of OpenSeeD(L) is 59.5. Can you release the corresponding weights?

How should I train if I want to reproduce the result of COCO PQ of OpenSeeD(L)?

train_net.py error

File "E:\docker\OpenSeeD\train_net.py", line 442, in main
trainer._trainer.model.module = trainer._trainer.model.module.from_pretrained(cfg.MODEL.WEIGHTS)
File "E:\ProgramData\Anaconda3\envs\torch113\lib\site-packages\torch\nn\modules\module.py", line 1265, in getattr
load original language language weight!!!!!!
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BaseModel' object has no attribute 'module'. Did you mean: 'modules'?
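A possible explanation, offered as an assumption rather than a confirmed diagnosis: a `.module` attribute exists only when the model has been wrapped by a DistributedDataParallel-style wrapper, which typically happens only for multi-GPU launches. On a single GPU the model is left unwrapped, so `model.module` fails. The sketch below uses hypothetical stand-in classes (`Wrapped`, `Plain`) and a hypothetical helper `unwrap` to show a lookup that works in both cases; it is not the repository's code.

```python
class Plain:
    """Stand-in for an unwrapped model (hypothetical)."""


class Wrapped:
    """Stand-in for a DistributedDataParallel-style wrapper (hypothetical)."""

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Return the inner model if `model` is wrapped, otherwise `model` itself."""
    return getattr(model, "module", model)


inner = Plain()
same_from_wrapped = unwrap(Wrapped(inner)) is inner  # wrapped case
same_from_plain = unwrap(inner) is inner             # unwrapped case
```

In train_net.py the failing line both reads and assigns `.module`, so an adapted version would need to handle the assignment for the unwrapped case as well.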

Is it possible to generate embeddings that can be queried later

The title pretty much says it. SAM has the idea of encode once, decode any time later, which helps when engineering systems around it. Can something similar be implemented in OpenSeeD? Can the encoder's embeddings be cached so that they can be decoded and matched against an input text prompt at runtime?

Can inference latency be further compressed?

Hi, nice work! I am interested in it. I tried to mask an object of interest in my own picture using OpenSeeD, but it takes around 90 seconds. My CPU is an Intel Xeon and my GPU is a Tesla V100S. Are there any methods to reduce the inference latency?

The demo has relatively low performance.

I ran demo_instseg.py with your released weights and found that it did not detect any object, but if I used images and text descriptions that included people, then it worked normally. This indicates that the performance is relatively low for OV objects. I don't know why, have you run this script?

Evaluation

Hi, thanks for your great work! How can I run zero-shot evaluation on other datasets, such as ADE20K/LVIS/SUN? Could you provide a script?

LVIS evaluation

Thank you for open sourcing the great work!
I have implemented the Fixed AP version of LVIS evaluation according to GLIP and tested the available 51.4 ap model.

However, I found that the box mAP (15.9) and the mask mAP (14.3) are much lower than the results in the paper.
Is there any special configuration for LVIS evaluation?

BaseModel().cuda() out of memory

When I run

def build_model(cls, cfg):
    model = BaseModel(cfg, build_model(cfg)).cuda()

in train_net.py, memory usage keeps increasing until CUDA runs out of memory.
I have tried many times and cannot tell where it comes from.

issues about your config

Hello, thank you very much for your contribution, but I don't fully understand the configs you released.
For openseed_swint_lang.yaml, why is the ADE20K train set included?
For openseed_swint_lang_o365.yaml, why is the Objects365 val set included in training?
For openseed_swint_lang_o365_decouple.yaml, are there any misspellings?
Also, can the latest version of the code reproduce the SOTA experiments in the paper?
Thanks again for your wonderful work. Hopefully the experimental config file for Swin-L can be released soon so that more work can build on it.

The weights of OpenSeed(Swin-L).

Hi! Thanks for your excellent work.

Are you planning to release the weights of OpenSeed(Swin-L)? How long should we expect to wait?

Implementation about the stuff and thing queries

Hello authors, thanks for the great work!

I just have a quick question about the code. One of the main contributions of the paper is the use of foreground and background queries. But in the provided script for COCO panoptic evaluation, I saw that only the 300 foreground queries were used; the 100 stuff-class queries mentioned in the paper are absent. I did see the variable num_queries_stuff in openseed_decoder_decouple.py, but it is not used in the implementation. Is the released code not the full implementation of the paper, or am I missing something?

Number of queries for panoptic segmentation

According to the paper, there should be 400 (300 things and 100 stuff) queries for panoptic segmentation. However, I do not find this setting in the demo or evaluation codes. It seems that only 300 queries are used.

Some questions of background queries

Dear author, after carefully reading through the code I have two queries that I hope you could answer at your convenience:

  1. Conditioned Mask Decoding was not open sourced, is that correct?
  2. I did not find the background queries part. I think what is currently open sourced is only an open vocabulary mask dino algorithm that can train on detection and segmentation datasets simultaneously, without specifically distinguishing foreground and background queries. I'm not sure if I understand correctly?

Failed to replicate 55.4 PQ

I trained the model on COCO and on COCO+Obj365. The model performs well on AP, but PQ is very low on stuff classes. Can you provide your log or help me find the problem?

         PQ      SQ      RQ      #categories
All      43.291  80.736  51.350  133
Things   58.728  84.347  69.128  80
Stuff    19.990  75.284  24.514  53

[01/04 20:35:52 d2.evaluation.coco_evaluation]: Evaluation results for bbox:

AP      AP50    AP75    APs     APm     APl
49.908  69.194  54.247  32.613  53.047  65.220

[01/04 20:36:11 d2.evaluation.coco_evaluation]: Evaluation results for segm:

AP      AP50    AP75    APs     APm     APl
44.594  67.810  48.344  24.946  47.901  64.468
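To see how the stuff classes drag down the overall score, the "All" row of the panoptic table can be recomputed from the other two rows. The sketch below assumes that "All" is a plain per-category mean, so it equals the average of the Things and Stuff rows weighted by their category counts; the helper name `group_mean` is hypothetical.

```python
def group_mean(rows):
    """Category-weighted mean of group scores; rows = [(score, n_categories), ...]."""
    total = sum(n for _, n in rows)
    return sum(score * n for score, n in rows) / total


# Values copied from the Things and Stuff rows reported in this issue.
things = {"PQ": 58.728, "SQ": 84.347, "RQ": 69.128, "n": 80}
stuff = {"PQ": 19.990, "SQ": 75.284, "RQ": 24.514, "n": 53}

combined = {
    metric: group_mean([(things[metric], things["n"]), (stuff[metric], stuff["n"])])
    for metric in ("PQ", "SQ", "RQ")
}

for metric, value in combined.items():
    print(f"{metric}: {value:.3f}")
```

Under that assumption the recomputed values agree with the reported "All" row to within rounding, which suggests the low overall PQ comes almost entirely from the 53 stuff categories (low RQ in particular) rather than from the thing categories.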

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

Hello,

I don't know if I was the unlucky one but for running the code I needed to compile MultiScaleDeformableAttention CUDA Op.
In order to solve the issue:

cd openseed/body/encoder/ops
sh make.sh

Hoping this will help people having the same issue!
You could add it in the installation instructions just in case!
Best regards,
saltoricristiano
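A quick way to check whether the compiled op is visible to Python before running the demo is sketched below. It uses only the standard library; the helper name `has_module` is hypothetical, and the printed hint simply repeats the build step from this issue.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if not has_module("MultiScaleDeformableAttention"):
    print(
        "MultiScaleDeformableAttention not found; "
        "build it with make.sh in openseed/body/encoder/ops"
    )
```

Run it inside the same environment used for the demo, since the op is installed per environment.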

Finetuning capabilities?

Is it possible to finetune this model on a custom dataset if the data follows the COCO panoptic segmentation format?
What I am hoping to achieve is being able to train on a set of very domain specific data I have.

I am currently trying to set up a Colab Notebook where I can try this out.
I have managed to sort my data correctly and generated the segmentation .png files, but I am running into problems when trying to run the model, and I'm unsure where to specify the provided checkpoint .pt as the base for finetuning.

I have followed the steps provided in the readme with the setup.
I have gotten far enough that when I try to start the training I get this error:

Traceback (most recent call last):
  File "/content/OpenSeeD/train_net.py", line 466, in <module>
    launch(
  File "/usr/local/lib/python3.10/dist-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "/content/OpenSeeD/train_net.py", line 442, in main
    trainer._trainer.model.module = trainer._trainer.model.module.from_pretrained(cfg.MODEL.WEIGHTS)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BaseModel' object has no attribute 'module'. Did you mean: 'modules'?

Not sure where to go from here or if what I am trying to do is even possible. Any help or guidance would be very much appreciated!
Really interesting to see what you guys have achieved with this model, great work!

Was the provided pretrained OpenSeed(T) checkpoint trained on coco+o365?

Was the provided pretrained OpenSeed(T) checkpoint (model_state_dict_swint_51.2ap.pt) trained on coco+o365 (v1) with the command python train_net.py --num-gpus 8 --config-file configs/openseed/openseed_swint_lang_o365.yaml --lang_weight /path/to/lang/weight (language weight: https://github.com/IDEA-Research/OpenSeeD/releases/download/training/model_state_dict_only_language.pt)?

Does the command include Online Mask Assistance? Was the pretrained OpenSeed(T) checkpoint trained with Offline Mask Assistance?

SUN-37 dataset

How should the SUN RGB-D dataset be processed? Can you add this part to the README?
