idea-research / openseed

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

License: Apache License 2.0

Python 93.50% Shell 0.07% C++ 0.64% Cuda 5.79%

openseed's Introduction

OpenSeeD

This is the official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection".

A more detailed demo video is also available on YouTube.

👉 [New] Demo code is available.
👉 [New] OpenSeeD has been accepted to ICCV 2023! Training code is available!

🚀 Key Features

A Simple Framework for Open-Vocabulary Segmentation and Detection.
Supports interactive segmentation: a box input is used to generate a mask.

💡 Installation

pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
python -m pip install -r requirements.txt
export DATASET=/path/to/dataset

Download the pretrained checkpoint from here.
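
The download step could be scripted roughly as follows. This is a minimal sketch, assuming the checkpoint is the release asset whose URL appears in the evaluation command of this README; the `checkpoints` directory and the helper names `checkpoint_path` and `fetch_checkpoint` are hypothetical and not part of the repository.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URL taken from the evaluation command in this README.
CKPT_URL = (
    "https://github.com/IDEA-Research/OpenSeeD/releases/download/"
    "openseed/model_state_dict_swint_51.2ap.pt"
)


def checkpoint_path(url, ckpt_dir="checkpoints"):
    """Return the local path where the file behind `url` would be stored."""
    name = os.path.basename(urlparse(url).path)
    return os.path.join(ckpt_dir, name)


def fetch_checkpoint(url=CKPT_URL, ckpt_dir="checkpoints"):
    """Download the checkpoint once and return its local path (hypothetical helper)."""
    dest = checkpoint_path(url, ckpt_dir)
    if not os.path.exists(dest):
        os.makedirs(ckpt_dir, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `fetch_checkpoint()` returns a path that can be passed as the `WEIGHT` override in the demo command.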

💡 Demo script

python demo/demo_panoseg.py evaluate --conf_files configs/openseed/openseed_swint_lang.yaml --image_path images/animals.png --overrides WEIGHT /path/to/ckpt/model_state_dict_swint_51.2ap.pt

🔥 Remember to modify the vocabulary lists thing_classes and stuff_classes in demo_panoseg.py if you want to segment open-vocabulary objects.
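
A custom vocabulary might be laid out as in the sketch below. The class names are made up for illustration, and the id layout (thing ids first, stuff ids offset by the number of thing classes) is an assumption about how the demo maps class names to ids; check demo_panoseg.py for the actual logic before relying on it.

```python
# Hypothetical open-vocabulary class lists; replace them with your own names.
thing_classes = ["cup", "spoon", "fork", "bottle"]
stuff_classes = ["table", "wall", "floor"]


def build_id_maps(things, stuff):
    """Map dataset ids to contiguous ids, things first and stuff after.

    Assumed layout, not confirmed against the repository.
    """
    thing_map = {i: i for i in range(len(things))}
    stuff_map = {i + len(things): i for i in range(len(stuff))}
    return thing_map, stuff_map


thing_map, stuff_map = build_id_maps(thing_classes, stuff_classes)
all_classes = thing_classes + stuff_classes
print(f"{len(all_classes)} classes; stuff ids start at {min(stuff_map)}")
```

Keeping both lists in one place makes it easier to keep colors and id maps the same length as the class lists when the vocabulary changes.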

Evaluation on COCO

python train_net.py --original_load --eval_only --num-gpus 8 --config-file configs/openseed/openseed_swint_lang.yaml MODEL.WEIGHTS=/path/to/lang/weight

The weights are available at https://github.com/IDEA-Research/OpenSeeD/releases/download/openseed/model_state_dict_swint_51.2ap.pt

You should get 55.4 PQ.

💡 Some COCO-format data

Here is the COCO-format JSON file for evaluating BDD and SUN.

Training OpenSeeD baseline

Training on COCO

python train_net.py --num-gpus 8 --config-file configs/openseed/openseed_swint_lang.yaml --lang_weight /path/to/lang/weight

Training on COCO+O365

python train_net.py --num-gpus 8 --config-file configs/openseed/openseed_swint_lang_o365.yaml --lang_weight /path/to/lang/weight

The language weights for --lang_weight are available at https://github.com/IDEA-Research/OpenSeeD/releases/download/training/model_state_dict_only_language.pt

Checkpoints

Swin-T model trained on COCO panoptic segmentation and Objects365 weights.
Swin-L model fine-tuned on COCO panoptic segmentation weights.
Swin-L model fine-tuned on ADE20K semantic segmentation weights.

🦄 Model Framework

🌋 Results

Results on open segmentation
Results on task transfer and segmentation in the wild

Citing OpenSeeD

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@article{zhang2023simple,
  title={A Simple Framework for Open-Vocabulary Segmentation and Detection},
  author={Zhang, Hao and Li, Feng and Zou, Xueyan and Liu, Shilong and Li, Chunyuan and Gao, Jianfeng and Yang, Jianwei and Zhang, Lei},
  journal={arXiv preprint arXiv:2303.08131},
  year={2023}
}

openseed's Issues

No Prediction on Custom Image

The model is not producing any output on custom images.
However, if I use the images provided in the repo, it works.

Any help would be appreciated.

Thanks for the great work and for making the code publicly available!
I would like to reproduce your results on task-transfer of Tab.3 in the paper, especially on Cityscapes and ADE20K.
Can you provide any guidance? Is the code ready for finetuning on any dataset (assuming mapper is implemented)?

Thank you in advance!

[Question] How should I train on my own training set?

I am new to this field. Now that I have converted my dataset to COCO format, what should I do to train the model on it?
I would appreciate it if anyone could answer!

RuntimeError: numel: integer multiplication overflow

Hi, thanks for sharing your work.

I followed the recommended environment setup instructions and ran into the following error while trying to run the demo. I am using python 3.9 (if that plays a role here). Looking forward to any suggestions to fix the error. Thanks.

RuntimeError: numel: integer multiplication overflow

The complete traceback for your reference:

Traceback (most recent call last):
  File "/home/jj/OpenSeeD/demo/demo_panoseg.py", line 107, in <module>
    main()
  File "/home/jj/OpenSeeD/demo/demo_panoseg.py", line 86, in main
    outputs = model.forward(batch_inputs)
  File "/home/jj/OpenSeeD/openseed/BaseModel.py", line 19, in forward
    outputs = self.model(*inputs, **kwargs)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jj/OpenSeeD/openseed/architectures/openseed_model.py", line 299, in forward
    processed_results = self.forward_seg(batched_inputs, task=inference_task)
  File "/home/jj/OpenSeeD/openseed/architectures/openseed_model.py", line 434, in forward_seg
    instance_r = retry_if_cuda_oom(self.instance_inference)(mask_cls_result, mask_pred_result, mask_box_result)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/detectron2/utils/memory.py", line 70, in wrapped
    return func(*args, **kwargs)
  File "/home/jj/OpenSeeD/openseed/architectures/openseed_model.py", line 558, in instance_inference
    print(scores, keep)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor.py", line 427, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor_str.py", line 637, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor_str.py", line 568, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor_str.py", line 328, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/jj/anaconda3/envs/openseed/lib/python3.9/site-packages/torch/_tensor_str.py", line 115, in __init__
    nonzero_finite_vals = torch.masked_select(
RuntimeError: numel: integer multiplication overflow

PublicAccessNotPermitted

The URL "https://projects4jw.blob.core.windows.net/x-decoder/release/coco_caption.zip" in the "install_cococapeval.sh" file is giving an error stating that it cannot be publicly accessed.

Not able to change to other COCO classes

I tested demo_panoseg in Colab. I also had to compile Deformable-DETR and restart the Colab runtime.
When I changed the thing classes to COCO classes such as cup, spoon, fork, and bottle, I got this error:
Attribute 'thing_colors' in the metadata of 'demo' cannot be set to a different value!
It would be useful if text2seg- or Grounded-SAM-style prompts could be used as class names.
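
This message comes from detectron2's metadata objects, which refuse to overwrite an attribute with a different value. A likely trigger, stated here as an assumption, is re-running the demo in the same Python process (for example a Colab cell) so that new class lists or freshly generated colors get set on the already existing 'demo' entry. The sketch below uses a small stand-in class, not detectron2 itself, to show the behavior and one way around it: drop the old entry before setting new values. In detectron2 the corresponding call would be MetadataCatalog.remove("demo"); `WriteOnceMetadata`, `registry`, `get_metadata`, and `fresh_metadata` are hypothetical names.

```python
class WriteOnceMetadata:
    """Stand-in that mimics write-once attributes (not the detectron2 class)."""

    def __init__(self, name):
        object.__setattr__(self, "name", name)

    def __setattr__(self, key, value):
        # Setting the same value again is fine; a different value is rejected.
        if key in self.__dict__ and self.__dict__[key] != value:
            raise AssertionError(
                f"Attribute '{key}' in the metadata of '{self.name}' "
                "cannot be set to a different value!"
            )
        object.__setattr__(self, key, value)


registry = {}


def get_metadata(name):
    """Return the existing entry for `name`, creating it on first use."""
    return registry.setdefault(name, WriteOnceMetadata(name))


def fresh_metadata(name):
    """Drop any existing entry for `name` and return a new, empty one."""
    registry.pop(name, None)
    return get_metadata(name)
```

Registering the metadata under a new name for each vocabulary would avoid the conflict as well.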

About OpenSeeD(L)

The COCO PQ result of OpenSeeD(L) is 59.5. Can you release the corresponding weights?

How should I train if I want to reproduce the result of COCO PQ of OpenSeeD(L)?

Does this code run on Windows?

train_net.py error

File "E:\docker\OpenSeeD\train_net.py", line 442, in main
trainer._trainer.model.module = trainer._trainer.model.module.from_pretrained(cfg.MODEL.WEIGHTS)
File "E:\ProgramData\Anaconda3\envs\torch113\lib\site-packages\torch\nn\modules\module.py", line 1265, in getattr
load original language language weight!!!!!!
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BaseModel' object has no attribute 'module'. Did you mean: 'modules'?
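
One plausible reading of this traceback, not confirmed by the maintainers, is that the `.module` attribute only exists when the model has been wrapped for multi-process training (wrappers such as DistributedDataParallel store the wrapped model there), so a single-process run hits the AttributeError. The sketch below shows a defensive unwrap pattern with stand-in classes; `Core`, `ParallelWrapper`, `unwrap`, and `load_weights` are hypothetical names for illustration.

```python
class Core:
    """Stand-in for the bare model (hypothetical)."""

    def from_pretrained(self, path):
        self.loaded_from = path
        return self


class ParallelWrapper:
    """Stand-in for a wrapper that keeps the wrapped model in `.module`."""

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Return `model.module` when the model is wrapped, else the model itself."""
    return getattr(model, "module", model)


def load_weights(model, path):
    """Load weights into the underlying model whether or not it is wrapped."""
    core = unwrap(model).from_pretrained(path)
    if hasattr(model, "module"):
        model.module = core
        return model
    return core
```

With this pattern the same loading code runs for both single-GPU and multi-GPU launches.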

Is it possible to generate embeddings that can be queried later

The title pretty much says it. SAM has the idea of encode once, decode anytime later, which helps when engineering systems around it. Can something similar be implemented in OpenSeeD? Can the encoder's embeddings be cached so that they can be decoded and matched against an input text prompt at runtime?

Which parameters should be set for detection only, without segmentation?

Asking for training resources.

How many GPUs and training time do we need for OpenSeeD? Thanks.

Can inference latency be further compressed?

Hi, this is nice work and I am interested in it. I tried to mask an object of interest in one of my own pictures using OpenSeeD, but it takes around 90 seconds. My CPU is an Intel Xeon processor and my GPU is a Tesla V100S. Are there any methods to reduce the inference latency?
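
Before optimizing, it may help to check how much of the 90 seconds is one-time setup (model loading, first-call initialization) rather than the forward pass. The sketch below is a generic timing helper, not code from this repository; `fake_inference` is a hypothetical stand-in for one forward pass.

```python
import time


def time_calls(fn, repeats=3, warmup=1):
    """Run `fn` a few times untimed, then return per-call durations in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-time initialization costs
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def fake_inference():
    """Hypothetical stand-in for a single model forward pass."""
    return sum(i * i for i in range(10000))


timings = time_calls(fake_inference)
print("mean seconds per call:", sum(timings) / len(timings))
```

When timing GPU code, the device has to be synchronized (for example with torch.cuda.synchronize()) before each clock reading, since CUDA kernels run asynchronously.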

Are there tutorials for evaluating in other datasets?

The demo has relatively low performance.

I ran demo_instseg.py with your released weights and found that it did not detect any object, but if I used images and text descriptions that included people, then it worked normally. This indicates that the performance is relatively low for OV objects. I don't know why, have you run this script?

Evaluation

Hi, thanks for your great work! How can I run a zero-shot evaluation on other datasets such as ADE20K, LVIS, or SUN? Could you provide a script?

How to train on the ADE20K dataset for panoptic segmentation

Thanks,

I am trying to use the ADE20K dataset for panoptic segmentation.
How do I prepare the dataset for panoptic segmentation, and how do I train using the ADE20K dataset?

How do I deal with this error?

RuntimeError: CUDA error: device-side assert triggered

LVIS evaluation

Thank you for open sourcing the great work!
I have implemented the Fixed AP version of LVIS evaluation following GLIP and tested the available 51.4 AP model.

However, I found that the box mAP (15.9) and the mask mAP (14.3) are much lower than the results in the paper.
Is there any special configuration for LVIS evaluation?

When will the code be released?

BaseModel().cuda() out of memory

When I run
def build_model(cls, cfg):
    model = BaseModel(cfg, build_model(cfg)).cuda()
in train_net.py, memory usage keeps increasing until CUDA runs out of memory.
I have tried many times and I cannot tell where it comes from.

issues about your config

Hello, thank you very much for your contribution, but I don't really understand the configs that you released.
For openseed_swint_lang.yaml, why is the ADE20K train set included?

For openseed_swint_lang_o365.yaml, why is the Objects365 val set included in training?

For openseed_swint_lang_o365_decouple.yaml, are there any misspellings?

Also, is the latest version of the code sufficient to reproduce the SOTA experiments in the paper?
Thanks again for your wonderful work. Hopefully the experimental config file for Swin-L can be released soon so that more work can build on it.

The weights of OpenSeed(Swin-L).

Hi! Thanks for your excellent work.

Are you planning to release the weights of OpenSeed(Swin-L)? How long should we expect to wait?

[BUG] The demo does not need to consider the background.

Panoptic segmentation and semantic segmentation are trained in sigmoid mode, so demo_panoseg.py and demo_semseg.py do not need to consider the background, otherwise it will report an index error.

Implementation about the stuff and thing queries

Hello authors, thanks for the great work!

I just have a quick question about the code. One of the main contributions of the paper is the use of foreground queries and background queries. But in the given script for COCO panoptic evaluation, I saw that only the 300 foreground queries were used. The 100 stuff-class queries mentioned in the paper are not there. I did see the variable num_queries_stuff in openseed_decoder_decouple.py, but it is not used in the implementation. Is the released code not the full implementation of the paper, or am I missing something?

Number of queries for panoptic segmentation

According to the paper, there should be 400 (300 things and 100 stuff) queries for panoptic segmentation. However, I do not find this setting in the demo or evaluation codes. It seems that only 300 queries are used.

Some questions of background queries

Dear author, after carefully reading through the code I have two questions that I hope you could answer at your convenience:

1. Conditioned Mask Decoding was not open-sourced, is that correct?
2. I did not find the background queries part. I think what is currently open-sourced is only an open-vocabulary Mask DINO algorithm that can train on detection and segmentation datasets simultaneously, without specifically distinguishing foreground and background queries. Is my understanding correct?

Failed to replicate 55.4 PQ

I trained the model on COCO and on COCO+Objects365. The model performs well on AP, but PQ is very low on stuff classes. Can you provide your training log or help me find the problem?

	PQ	SQ	RQ	#categories
All	43.291	80.736	51.350	133
Things	58.728	84.347	69.128	80
Stuff	19.990	75.284	24.514	53
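
As a consistency check on the table above: the "All" row is an average over all 133 categories, so it should equal the category-weighted mean of the Things and Stuff rows. The short sketch below recomputes it from the reported numbers; the variable and function names are illustrative only.

```python
# (PQ, SQ, RQ, number of categories), copied from the table above.
rows = {
    "Things": (58.728, 84.347, 69.128, 80),
    "Stuff": (19.990, 75.284, 24.514, 53),
}


def weighted_average(rows, column):
    """Average one metric column over groups, weighted by category count."""
    total_categories = sum(r[3] for r in rows.values())
    weighted_sum = sum(r[column] * r[3] for r in rows.values())
    return weighted_sum / total_categories


pq_all = weighted_average(rows, 0)
sq_all = weighted_average(rows, 1)
rq_all = weighted_average(rows, 2)
print(f"PQ={pq_all:.3f} SQ={sq_all:.3f} RQ={rq_all:.3f}")
```

The recomputed values agree with the "All" row up to rounding, which shows that the gap to 55.4 PQ comes almost entirely from the 53 stuff categories.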

[01/04 20:35:52 d2.evaluation.coco_evaluation]: Evaluation results for bbox:

AP	AP50	AP75	APs	APm	APl
49.908	69.194	54.247	32.613	53.047	65.220

[01/04 20:36:11 d2.evaluation.coco_evaluation]: Evaluation results for segm:

AP	AP50	AP75	APs	APm	APl
44.594	67.810	48.344	24.946	47.901	64.468

The environment

Pretrained checkpoint (model_state_dict_swint_51.2ap.pt)

Great job! I would like to ask what data was used to train the pretrained checkpoint (model_state_dict_swint_51.2ap.pt).

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

Hello,

I don't know if I was the unlucky one but for running the code I needed to compile MultiScaleDeformableAttention CUDA Op.
In order to solve the issue:

cd openseed/body/encoder/ops
sh make.sh

Hoping this will help people having the same issue!
You could add it in the installation instructions just in case!
Best regards,
saltoricristiano
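
A quick way to tell whether the op still needs compiling is to check if the module can be found before launching the demo. This is a minimal sketch; `has_module` is a hypothetical helper, and the module name is taken from the error message in this issue.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if not has_module("MultiScaleDeformableAttention"):
    # Build step reported in this issue.
    print("Op not found; run `sh make.sh` in openseed/body/encoder/ops first.")
```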

Finetuning capabilities?

Is it possible to finetune this model on a custom dataset if it follows the COCO panoptic segmentation format?
What I am hoping to achieve is to train on a set of very domain-specific data I have.

I am currently trying to set up a Colab notebook where I can try this out.
I have managed to sort my data correctly and generated the segmentation.png files, but I am running into some problems when trying to run the model, and I'm unsure where to specify the provided .pt checkpoint as the base for finetuning.

I have followed the setup steps provided in the README.
I have gotten as far as starting the training, at which point I get this error:

Traceback (most recent call last):
  File "/content/OpenSeeD/train_net.py", line 466, in <module>
    launch(
  File "/usr/local/lib/python3.10/dist-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "/content/OpenSeeD/train_net.py", line 442, in main
    trainer._trainer.model.module = trainer._trainer.model.module.from_pretrained(cfg.MODEL.WEIGHTS)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BaseModel' object has no attribute 'module'. Did you mean: 'modules'?

Not sure where to go from here or if what I am trying to do is even possible. Any help or guidance would be very much appreciated!
Really interesting to see what you guys have achieved with this model, great work!

How to train this model?

Was the provided pretrained OpenSeeD(T) checkpoint trained on COCO+O365?

Was the provided pretrained OpenSeeD(T) checkpoint (model_state_dict_swint_51.2ap.pt) trained on COCO+O365 (v1) with the following command?

python train_net.py --num-gpus 8 --config-file configs/openseed/openseed_swint_lang_o365.yaml --lang_weight /path/to/lang/weight

(language weights: https://github.com/IDEA-Research/OpenSeeD/releases/download/training/model_state_dict_only_language.pt)

Does the command include Online Mask Assistance? Was the pretrained OpenSeeD(T) checkpoint trained with Offline Mask Assistance?

SUN-37 dataset

How should the SUN RGB-D dataset be processed? Can you add this part to the README?

Training code.

When will your group release the training code?