
ux-decoder / segment-everything-everywhere-all-at-once


[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

License: Apache License 2.0

Python 94.88% C++ 0.50% Cuda 4.55% Shell 0.07%

segment-everything-everywhere-all-at-once's People

Contributors

eltociear, fengli-ust, haozhang534, jwyang, linjieli222, maureenzou, simonebonato


segment-everything-everywhere-all-at-once's Issues

Code for: Referring image to mask

Hello,
Is this repo only for a demo, or do you plan to release the code, at least for the referring-image-to-mask feature?
Thanks

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

(comparison images attached)

Best Wishes,

Qiao

Checkpoint for Gradio Demo

Thanks for open-sourcing your great work!

I have tried your Gradio demo and it works very well. I wonder whether your released checkpoint, i.e., `seem_focalt_v1.pt`, is the same one that was used in the demo. Thanks!

RuntimeError: "upsample_bilinear2d_channels_last" not implemented for 'Byte'

I encountered an error when using the sample with the video task. How can I solve this problem?

Traceback (most recent call last):
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/gradio/blocks.py", line 1025, in call_function
prediction = await anyio.to_thread.run_sync(
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "app.py", line 65, in inference
return interactive_infer_video(model, audio, image, task, *args, **kwargs)
File "/data2/nkm/Segment-Everything-Everywhere-All-At-Once-main/demo_code/tasks/interactive.py", line 226, in interactive_infer_video
refimg_mask = (F.interpolate(refimg_mask, (_height, _width), mode='bilinear', align_corners=True) > 0)
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/torch/nn/functional.py", line 3731, in interpolate
return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
RuntimeError: "upsample_bilinear2d_channels_last" not implemented for 'Byte'

Demo code installation

I am using Windows WSL and trying to run the demo code.

I get an error while installing the requirements. For some reason the installation starts with the GitHub repositories linked in requirements.txt.

While installing from https://github.com/MaureenZOU/detectron2-xyz.git I get an error saying ModuleNotFoundError: No module named 'torch'.

Installing the torch module manually solves the error, but I don't think this is ideal.

Another issue comes from run_demo.sh: in my case it does not recognize the sudo apt commands, which makes it fail. Again, if I run the commands manually it works.

Questions about water surfaces and mirrors

Thanks for your great work! I noticed that your demo can identify water surfaces and mirrors well without misclassifying them due to reflections. Is there any specific design for this?

Code release and some typos in the paper

Hi authors,

Thanks for the amazing work!
I really like this paper.
I'd like to know when you plan to open-source the code and checkpoints.

There are some typos in this paper. (I will update this list if I find more.)

  1. Line 3 of the Fig. 3 caption: "object".
  2. First paragraph of Section 3: the red marks for the three types of prompts are not consistent. Also, they are not reflected in Eq. (3), but in Eq. (1).

Visual referring data processing

Hi,
Thank you for your work!
I am very interested in the visual referring part. There is little information in the paper about the data processing used for training, so could you please share some key points of the visual referring data processing? Thanks!

Question about training with non-prompt, visual-prompt, and text-prompt

Thanks for sharing the cool results! I have one more detailed question.
When you train with no prompt, text prompts, and visual prompts, how do you train them all together?
Does it mean that during training, for every batch, you randomly pick 1 task from the 3 tasks (no prompt, visual prompt, text prompt)?
It is not clear how you can train them together when the given prompts are not aligned with each other; for example, the text prompt (e.g., a caption of the whole image describing all the objects, say 10) may cover everything, while the visual prompt only has 1 or 2 points representing 1 or 2 objects.

Thank you!
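
For illustration only, the per-batch task sampling this question describes could look like the following sketch; `compute_loss` and the task names are hypothetical placeholders, not the authors' code.

```python
import random

TASKS = ["no_prompt", "visual_prompt", "text_prompt"]

def training_step(model, batch, optimizer):
    # Hypothetical sketch: draw one task at random for each batch and train
    # the single shared model on that task's prompt configuration.
    task = random.choice(TASKS)
    loss = model.compute_loss(batch, task=task)  # placeholder API
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task, loss.item()
```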

Demo effectiveness

In some scenarios, the results do not seem very good...

Code release

This is great work. When is the code estimated to be released? Thanks for your contribution to the community.

Run error

When I run the app.py script, I get the error below. How can I run it successfully? Thank you.

Traceback (most recent call last):
  File "D:\Projects\Segment-Everything-Everywhere-All-At-Once\demo_code\app.py", line 18, in <module>
    import whisper
  File "D:\Applications\Anaconda3\envs\seem\lib\site-packages\whisper.py", line 69, in <module>
    libc = ctypes.CDLL(libc_name)
  File "D:\Applications\Anaconda3\envs\seem\lib\ctypes\__init__.py", line 364, in __init__
    if '/' in name or '\\' in name:
TypeError: argument of type 'NoneType' is not iterable
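
A likely cause (an assumption, not confirmed by the traceback alone): the PyPI package named `whisper` is an unrelated project, while the demo's dependency list includes `openai-whisper`; installing the former makes `import whisper` resolve to the wrong module. A quick check:

```python
# Sanity check: the demo is assumed to need "openai-whisper", not the
# unrelated "whisper" package on PyPI.
import importlib.metadata as md

for dist in ("openai-whisper", "whisper"):
    try:
        print(dist, md.version(dist))
    except md.PackageNotFoundError:
        print(dist, "not installed")
```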

One or All prompts?

May I know whether a model corresponds to a specific prompt, or whether a single model can handle all the prompts you have listed at the same time (that is, do you use all of these prompts in different combinations at the same time to train a single model)?

No such file or directory: Segment-Everything-Everywhere-All-At-Once-main/demo_code/null

I have a problem with this, can you explain it to me?

ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/nts1/miniconda3/envs/SEEM/lib/python3.9/site-packages/starlette/responses.py", line 335, in call
stat_result = await anyio.to_thread.run_sync(os.stat, self.path)
File "/home/nts1/miniconda3/envs/SEEM/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/nts1/miniconda3/envs/SEEM/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/nts1/miniconda3/envs/SEEM/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
FileNotFoundError: [Errno 2] No such file or directory: '/home/nts1/users/tungdd7/Segment-Everything-Everywhere-All-At-Once-main/demo_code/null'
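
A guess at the cause: the Gradio audio input was left empty, so the demo receives a 'null' path and then fails when trying to open it. A defensive sketch with hypothetical names would skip the audio branch when no real file is provided:

```python
import os

def load_audio_if_present(audio_path):
    # Hypothetical guard: only use the audio input when Gradio actually
    # provided a file; otherwise return None instead of touching 'null'.
    if not audio_path or not os.path.isfile(audio_path):
        return None
    return audio_path
```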

This needs a license

Current licensing is ambiguous because the repo has no license. It makes it hard to adopt this work for further R&D, commercial applications, etc.

File Not Found: seem_focall_v1.pt

When I try to run app.py I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'seem_focall_v1.pt'

The link (https://projects4jw.blob.core.windows.net/x-decoder/release/seem_focall_v1.pt) seems to go to this page:
This XML file does not appear to have any style information associated with it. The document tree is shown below.

PublicAccessNotPermitted
Public access is not permitted on this storage account. RequestId:b7249e77-701e-0067-6579-968f11000000 Time:2023-06-04T00:17:46.1304179Z

Any assistance would be appreciated.

Differences between SEEM Focal-L and the Huggingface Demo model?

The demo outputs a warning:

The current model is run on SEEM Focal-L, for best performance refer to our demo.

And the performance of the model seems to be worse than the SEEM demo on Hugging Face. In particular, I've noticed that the segmentations with referring text are worse; they "splash" onto neighboring objects. What are the differences between the official demo on Hugging Face and the published SEEM Focal-L checkpoint and config?

Losses used in training

Hi! Wonderful project you have here :)
I am just wondering if you can provide us with all the losses you use in the training step, since they are not mentioned in the paper.
Thanks a lot!

Results on open-vocabulary panoptic segmentation

In your paper, you mention

"...strong performance on many segmentation tasks including closed-set and open-set
panoptic segmentation, ...

I cannot seem to find the section on open-set panoptic segmentation. Do you have results on this task?

Video demo with Referring Text

As far as I currently understand, the video demo only supports drawing on a referring image.
Is there any limitation here? Can we use a text or audio prompt?

Would you mind adding a link to SA3D?

Hi there,

This is Lingxi Xie, one of the co-authors of SA3D.

I read the paper and code and found the project quite interesting and inspiring!

I saw that you referred to SA3D and its segmentation on NeRF. If the SA3D mentioned in the repo was our project (sorry if it was not), would you mind adding a link in the readme file so that readers can get the message?

Best,
Lingxi

About the testing results

(screenshot attached)
Hello, authors! Nice work. Does your project provide the code to reproduce the results reported in the paper?

How can I suppress the "other" label

I would like to know how to suppress the "other" label in the predicted video.
Also, how can I use my own COCO-format dataset to replace the default one?
Thank you!
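
One possible workaround (a sketch with hypothetical names, not the repo's API) is to filter the panoptic segment info before it reaches the visualizer, dropping segments whose category name is "other":

```python
def drop_other_segments(pano_seg_info, class_names):
    # Hypothetical filter: keep only segments whose category name is not
    # "other" before calling draw_panoptic_seg.
    return [
        seg for seg in pano_seg_info
        if class_names[seg["category_id"]].lower() != "other"
    ]
```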

AttributeError: module 'enum' has no attribute 'IntFlag'

When I created the environment in Anaconda and ran "pip install -r requirements.txt", this error appeared. How can I solve it?
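
A common cause of this error (an assumption here, since the full traceback is not shown) is a Python 2 era `enum` backport such as `enum34` shadowing the standard-library `enum` module. A quick diagnostic:

```python
# If enum resolves to a file under site-packages rather than the standard
# library, a backport package is likely shadowing it and should be removed.
import enum

print(enum.__file__)
print(hasattr(enum, "IntFlag"))  # should be True on Python 3.6+
```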

ValueError: RGBA values should be within 0-1 range

Sometimes the color values go beyond 1.

File "/home/aadalarasan/coda/Segment-Everything-Everywhere-All-At-Once/demo_code/tasks/interactive.py", line 113, in interactive_infer_image
demo = visual.draw_panoptic_seg(pano_seg.cpu(), pano_seg_info) # rgb Image
File "/home/aadalarasan/coda/Segment-Everything-Everywhere-All-At-Once/demo_code/utils/visualizer.py", line 543, in draw_panoptic_seg
self.overlay_instances(masks=masks, labels=labels, assigned_colors=colors, alpha=alpha)
File "/home/aadalarasan/coda/Segment-Everything-Everywhere-All-At-Once/demo_code/utils/visualizer.py", line 745, in overlay_instances
self.draw_text(
File "/home/aadalarasan/coda/Segment-Everything-Everywhere-All-At-Once/demo_code/utils/visualizer.py", line 890, in draw_text
color = np.maximum(list(mplc.to_rgb(color)), 0.2)
File "/home/aadalarasan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/matplotlib/colors.py", line 496, in to_rgb
return to_rgba(c)[:3]
File "/home/aadalarasan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/matplotlib/colors.py", line 299, in to_rgba
rgba = _to_rgba_no_colorcycle(c, alpha)
File "/home/aadalarasan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/matplotlib/colors.py", line 395, in _to_rgba_no_colorcycle
raise ValueError("RGBA values should be within 0-1 range")
ValueError: RGBA values should be within 0-1 range
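
A minimal local workaround sketch (not the upstream fix) is to clip colors into the [0, 1] range before they are handed to matplotlib:

```python
import numpy as np

def safe_rgb(color):
    # Clip an (r, g, b) triple into the [0, 1] range matplotlib expects.
    return tuple(np.clip(np.asarray(color, dtype=float), 0.0, 1.0))

print(safe_rgb((1.2, -0.1, 0.5)))  # -> (1.0, 0.0, 0.5)
```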

Code Release

Thanks for the great work! The demo is very inspiring and we are excited about its potential applications in downstream tasks, especially in robotics. Do you have an ETA on when the code will be released?

We would love to play with the thresholds (the demo seems to produce many false positives) to see if this is useful out of the box. Please let me know! Happy to reach out privately as well.

Thanks
Dhruv (UC Berkeley)

Always segments person when using text

Hello!

Thank you for releasing this great work! I'm trying out the demo on some fashion images, and when using text I primarily get the entire person in the segmentation. I have provided some examples below. Thank you!

(example images attached)

About Ref-COCO dataset overlapping.

I'd like to express my appreciation for your excellent work; it is both engaging and insightful. However, I have some confusion regarding the experiment detailed in Chapter 4.

In the paper, you mentioned using a combination of Ref-COCO, Ref-COCOg, and Ref-COCO+ for COCO image annotations in the referring segmentation task. Then you report your evaluation on Ref-COCOg. While I find this approach interesting, I'm not quite sure what you mean by "combination." Additionally, I am concerned about the potential for data leakage, since Ref-COCO, Ref-COCOg, and Ref-COCO+ are three sets of annotations on the same image dataset, which might lead to overlap between the training and test sets of the different annotations. Could you please provide further clarification on this part of the experiments? Thank you!

The code doesn't work for me

Following the guide, I managed to install all requirements in a brand-new conda env. I tried to run the zebra example (which I am most interested in) but got no segmentation results. I tried other examples, but in vain (no segmentation results at all).
(screenshot attached)

The output from my terminal console seemed all right: no error messages, only some warnings. Here is the package list I installed; would you be so kind as to tell me why I failed?

absl-py 1.4.0
accelerate 0.19.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.0.1
antlr4-python3-runtime 4.9.3
anyio 3.7.0
appdirs 1.4.4
astunparse 1.6.3
async-timeout 4.0.2
attrs 23.1.0
black 21.4b2
cachetools 5.3.1
certifi 2023.5.7
charset-normalizer 3.1.0
cityscapesScripts 2.2.2
click 8.1.3
cloudpickle 2.2.1
cmake 3.26.3
coloredlogs 15.0.1
contourpy 1.0.7
cycler 0.11.0
detectron2 0.6
diffdist 0.1
diffusers 0.11.1
einops 0.6.1
exceptiongroup 1.1.1
fastapi 0.95.2
ffmpy 0.3.0
filelock 3.12.0
flatbuffers 23.5.26
fonttools 4.39.4
frozenlist 1.3.3
fsspec 2023.5.0
ftfy 6.1.1
future 0.18.3
fvcore 0.1.5.post20221221
gast 0.4.0
google-auth 2.19.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
gradio 3.31.0
gradio_client 0.2.5
grpcio 1.54.2
h11 0.14.0
h5py 3.8.0
httpcore 0.17.2
httpx 0.24.1
huggingface-hub 0.14.1
humanfriendly 10.0
hydra-core 1.3.2
idna 3.4
imageio 2.30.0
importlib-metadata 6.6.0
importlib-resources 5.12.0
invisible-watermark 0.1.5
iopath 0.1.9
Jinja2 3.1.2
joblib 1.2.0
json-tricks 3.17.0
jsonschema 4.17.3
keras 2.11.0
kiwisolver 1.4.4
kornia 0.6.4
lazy_loader 0.2
libclang 16.0.0
linkify-it-py 2.0.2
lit 16.0.5
llvmlite 0.40.0
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
more-itertools 9.1.0
mpmath 1.3.0
multidict 6.0.4
mup 1.0.0
mypy-extensions 1.0.0
networkx 3.1
nltk 3.8.1
numba 0.57.0
numpy 1.23.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
omegaconf 2.3.0
onnx 1.12.0
onnxruntime 1.15.0
openai 0.27.7
openai-whisper 20230314
opencv-python 4.7.0.72
opt-einsum 3.3.0
orjson 3.8.14
packaging 23.1
pandas 2.0.2
pathspec 0.11.1
Pillow 9.5.0
pip 23.0.1
pkgutil_resolve_name 1.3.10
portalocker 2.7.0
protobuf 3.19.6
psutil 5.9.5
pyarrow 12.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycocotools 2.0.4
pydantic 1.10.8
pydot 1.4.2
pydub 0.25.1
Pygments 2.15.1
pyparsing 3.0.9
pyquaternion 0.9.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3
PyWavelets 1.4.1
PyYAML 6.0
regex 2023.5.5
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
scann 1.2.9
scikit-image 0.20.0
scikit-learn 1.2.2
scipy 1.9.1
seaborn 0.12.2
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 67.8.0
shapely 2.0.1
six 1.16.0
sniffio 1.3.0
starlette 0.27.0
sympy 1.12
tabulate 0.9.0
tenacity 8.2.2
tensorboard 2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.11.1
tensorflow-estimator 2.11.0
tensorflow-io-gcs-filesystem 0.32.0
termcolor 2.3.0
threadpoolctl 3.1.0
tifffile 2023.4.12
tiktoken 0.3.3
timm 0.4.12
tokenizers 0.12.1
toml 0.10.2
toolz 0.12.0
torch 2.0.1
torchmetrics 0.6.0
torchvision 0.15.2
tqdm 4.65.0
transformers 4.19.2
triton 2.0.0
typing 3.7.4.3
typing_extensions 4.6.2
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 1.26.16
uvicorn 0.22.0
vision-datasets 0.2.2
wcwidth 0.2.6
websockets 11.0.3
Werkzeug 2.3.4
wheel 0.38.4
wrapt 1.15.0
yacs 0.1.8
yarl 1.9.2
zipp 3.15.0

Question about the cost function

Dear author,

Thanks so much for your contribution and the inference code.

I missed the explanation of your loss function in the paper; could you please briefly describe it? Is it based on a linear combination of the focal and dice losses, like the one in MaskFormer?

Kind Regards,
Yuyuan
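
For reference, the MaskFormer-style combination the question alludes to is a weighted sum of a sigmoid focal term and a dice term on the predicted masks; the sketch below is illustrative, with placeholder weights, and is not taken from the SEEM code or paper.

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Standard sigmoid focal loss on per-pixel mask logits.
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)
    loss = ce * ((1 - p_t) ** gamma)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * loss).mean()

def dice_loss(logits, targets, eps=1.0):
    # Soft dice loss on flattened masks.
    prob = logits.sigmoid().flatten(1)
    targets = targets.flatten(1)
    inter = (prob * targets).sum(-1)
    union = prob.sum(-1) + targets.sum(-1)
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def mask_loss(logits, targets, w_focal=20.0, w_dice=1.0):
    # Illustrative linear combination; the weights are placeholders.
    return w_focal * sigmoid_focal_loss(logits, targets) + w_dice * dice_loss(logits, targets)
```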

Some questions about the paper

Thank you for the great work! I am having some trouble understanding Sec. 3, subsection "Compositional", of the paper: what does the function "Match" in Eq. (5) refer to? Are there papers I can turn to?
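
For context (an assumption based on common practice in DETR/MaskFormer-style models, not a statement about the paper), "Match" typically denotes Hungarian bipartite matching between predicted and ground-truth masks; a minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Given a [num_predictions, num_targets] cost matrix, find the assignment
# of predictions to ground-truth objects with minimal total cost.
cost = np.array([[0.2, 0.9, 0.5],
                 [0.8, 0.1, 0.7]])
pred_idx, tgt_idx = linear_sum_assignment(cost)
print(list(zip(pred_idx, tgt_idx)))  # e.g. [(0, 0), (1, 1)]
```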

Can't download the released models

I couldn't download the released models.
How can I get them?

SEEM Focal-L and X-Decoder Focal-L checkpoints.

Below is the error message.

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>PublicAccessNotPermitted</Code>
<Message>Public access is not permitted on this storage account. RequestId:5c2d36f8-601e-0012-2ca8-93d64b000000 Time:2023-05-31T10:11:10.4213753Z</Message>
</Error>
