
ux-decoder / segment-everything-everywhere-all-at-once


[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

License: Apache License 2.0

Python 94.88% C++ 0.50% Cuda 4.55% Shell 0.07%

segment-everything-everywhere-all-at-once's People

Contributors

eltociear, fengli-ust, haozhang534, jwyang, linjieli222, maureenzou, simonebonato


segment-everything-everywhere-all-at-once's Issues

Code for: Referring image to mask

Hello,
Is this repo only for a demo, or do you plan to release the code, at least for the referring-image-to-mask feature?
Thanks

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

(comparison images attached)

Best Wishes,

Qiao

Checkpoint for Gradio Demo

Thanks for open-sourcing your great work!

I have tried your Gradio demo and it works very well. I wonder whether your released checkpoint, i.e., `seem_focalt_v1.pt`, is the same one that was used in the demo. Thanks!

RuntimeError: "upsample_bilinear2d_channels_last" not implemented for 'Byte'

I encountered an error when using the sample with the video task. How can I solve this problem?

Traceback (most recent call last):
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/gradio/blocks.py", line 1025, in call_function
prediction = await anyio.to_thread.run_sync(
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "app.py", line 65, in inference
return interactive_infer_video(model, audio, image, task, *args, **kwargs)
File "/data2/nkm/Segment-Everything-Everywhere-All-At-Once-main/demo_code/tasks/interactive.py", line 226, in interactive_infer_video
refimg_mask = (F.interpolate(refimg_mask, (_height, _width), mode='bilinear', align_corners=True) > 0)
File "/data1/anaconda3/envs/nkm2/lib/python3.8/site-packages/torch/nn/functional.py", line 3731, in interpolate
return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
RuntimeError: "upsample_bilinear2d_channels_last" not implemented for 'Byte'

Demo code installation

I am using Windows WSL and trying to run the demo code.

I get an error while installing the requirements. For some reason the installation starts with the GitHub repositories linked in requirements.txt.

While installing from https://github.com/MaureenZOU/detectron2-xyz.git I get an error saying ModuleNotFoundError: No module named 'torch'.

Installing the torch module manually solves the error, but I don't think this is ideal.

Another issue comes from run_demo.sh: in my case it does not recognize the sudo apt commands, which makes it fail. Again, if I run the commands manually it works.

Questions about water surfaces and mirrors

Thanks for your great work! I noticed that your demo can identify water surfaces and mirrors well without misclassifying them due to reflections. Is there any specific design for this?

Code release and some typos in the paper

Hi authors,

Thanks for the amazing work!
I really like this paper.
I'd like to know when you plan to open-source the code and checkpoints.

There are some typos in this paper. (I will update this list if I find more.)

  1. Line 3 of the Fig. 3 caption: "object".
  2. First paragraph of Section 3: the red marks for the three types of prompts are not consistent. Also, they are not reflected in Eq. (3), but in Eq. (1).

Visual referring data processing

Hi,
Thank you for your work!
I am very interested in the visual referring part. There is little information in the paper about the data processing used for training, so could you please share some key points of the visual referring data processing? Thanks!

Question about training with non-prompt, visual-prompt, and text-prompt

Thanks for sharing the cool results! I have one more detailed question.
When you train with no prompt, text prompts, and visual prompts, how do you train them all together?
Does it mean that during training, for every batch, you randomly pick 1 task from the 3 tasks (no prompt, visual prompt, text prompt)?
It is not clear how you can train them together when the given prompts are not aligned with each other; for example, the text prompt (e.g., a caption of the whole image describing all the objects, say 10) may cover everything, while the visual prompt only has 1 or 2 points representing 1 or 2 objects.

Thank you!
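
For illustration only, the per-batch task sampling this question describes could look like the following sketch; `compute_loss` and the task names are hypothetical placeholders, not the authors' code.

```python
import random

TASKS = ["no_prompt", "visual_prompt", "text_prompt"]

def training_step(model, batch, optimizer):
    # Hypothetical sketch: draw one task at random for each batch and train
    # the single shared model on that task's prompt configuration.
    task = random.choice(TASKS)
    loss = model.compute_loss(batch, task=task)  # placeholder API
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task, loss.item()
```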

Demo effectiveness

In some scenarios, the results do not seem very good...

Code release

This is great work. When is the code estimated to be released? Thanks for your contribution to the community.

Run error

When I run the app.py script, I get the error below. How can I run it successfully? Thank you.

Traceback (most recent call last):
  File "D:\Projects\Segment-Everything-Everywhere-All-At-Once\demo_code\app.py", line 18, in <module>
    import whisper
  File "D:\Applications\Anaconda3\envs\seem\lib\site-packages\whisper.py", line 69, in <module>
    libc = ctypes.CDLL(libc_name)
  File "D:\Applications\Anaconda3\envs\seem\lib\ctypes\__init__.py", line 364, in __init__
    if '/' in name or '\\' in name:
TypeError: argument of type 'NoneType' is not iterable
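
A likely cause (an assumption, not confirmed by the traceback alone): the PyPI package named `whisper` is an unrelated project, while the demo's dependency list includes `openai-whisper`; installing the former makes `import whisper` resolve to the wrong module. A quick check:

```python
# Sanity check: the demo is assumed to need "openai-whisper", not the
# unrelated "whisper" package on PyPI.
import importlib.metadata as md

for dist in ("openai-whisper", "whisper"):
    try:
        print(dist, md.version(dist))
    except md.PackageNotFoundError:
        print(dist, "not installed")
```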

One or All prompts?

May I know whether a model corresponds to a specific prompt, or whether a single model can handle all the prompts you have listed at the same time (that is, do you use all of these prompts in different combinations at the same time to train a single model)?

No such file or directory: Segment-Everything-Everywhere-All-At-Once-main/demo_code/null

I have a problem with this, can you explain it to me?

ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/nts1/miniconda3/envs/SEEM/lib/python3.9/site-packages/starlette/responses.py", line 335, in call
stat_result = await anyio.to_thread.run_sync(os.stat, self.path)
File "/home/nts1/miniconda3/envs/SEEM/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/nts1/miniconda3/envs/SEEM/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/nts1/miniconda3/envs/SEEM/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
FileNotFoundError: [Errno 2] No such file or directory: '/home/nts1/users/tungdd7/Segment-Everything-Everywhere-All-At-Once-main/demo_code/null'
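
A guess at the cause: the Gradio audio input was left empty, so the demo receives a 'null' path and then fails when trying to open it. A defensive sketch with hypothetical names would skip the audio branch when no real file is provided:

```python
import os

def load_audio_if_present(audio_path):
    # Hypothetical guard: only use the audio input when Gradio actually
    # provided a file; otherwise return None instead of touching 'null'.
    if not audio_path or not os.path.isfile(audio_path):
        return None
    return audio_path
```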

This needs a license

Current licensing is ambiguous because the repo has no license. It makes it hard to adopt this work for further R&D, commercial applications, etc.

File Not Found: seem_focall_v1.pt

When I try to run app.py I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'seem_focall_v1.pt'

The link (https://projects4jw.blob.core.windows.net/x-decoder/release/seem_focall_v1.pt) seems to go to this page:
This XML file does not appear to have any style information associated with it. The document tree is shown below.

PublicAccessNotPermitted
Public access is not permitted on this storage account. RequestId:b7249e77-701e-0067-6579-968f11000000 Time:2023-06-04T00:17:46.1304179Z

Any assistance would be appreciated.

Differences between SEEM Focal-L and the Huggingface Demo model?

The demo outputs a warning:

The current model is run on SEEM Focal-L, for best performance refer to our demo.

And the performance of the model seems to be worse than the SEEM demo on Hugging Face. In particular, I've noticed that the segmentations with referring text are worse; they "splash" onto neighboring objects. What are the differences between the official demo on Hugging Face and the published SEEM Focal-L checkpoint and config?

Losses used in training

Hi! Wonderful project you have here :)
I am just wondering if you can provide us with all the losses you use in the training step, since they are not mentioned in the paper.
Thanks a lot!

Results on open-vocabulary panoptic segmentation

In your paper, you mention

"...strong performance on many segmentation tasks including closed-set and open-set
panoptic segmentation, ...

I cannot seem to find the section on open-set panoptic segmentation. Do you have results on this task?

Video demo with Referring Text

As far as I currently understand, the video demo only supports drawing on a referring image.
Is there any limitation here? Can we use a text or audio prompt?

Would you mind adding a link to SA3D?

Hi there,

This is Lingxi Xie, one of the co-authors of SA3D.

I read the paper and code and found the project quite interesting and inspiring!

I saw that you referred to SA3D and its segmentation on NeRF. If the SA3D mentioned in the repo was our project (sorry if it was not), would you mind adding a link in the readme file so that readers can get the message?

Best,
Lingxi

About the testing results

(screenshot attached)
Hello, authors! Nice work. Does your project provide the code to reproduce the results reported in the paper?

How can I suppress the "other" label

I would like to know how to suppress the "other" label in the predicted video.
Also, how can I use my own COCO-format dataset to replace the default one?
Thank you!
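
One possible workaround (a sketch with hypothetical names, not the repo's API) is to filter the panoptic segment info before it reaches the visualizer, dropping segments whose category name is "other":

```python
def drop_other_segments(pano_seg_info, class_names):
    # Hypothetical filter: keep only segments whose category name is not
    # "other" before calling draw_panoptic_seg.
    return [
        seg for seg in pano_seg_info
        if class_names[seg["category_id"]].lower() != "other"
    ]
```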

AttributeError: module 'enum' has no attribute 'IntFlag'

When I created the environment in Anaconda and ran "pip install -r requirements.txt", this error appeared. How can I solve it?
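
A common cause of this error (an assumption here, since the full traceback is not shown) is a Python 2 era `enum` backport such as `enum34` shadowing the standard-library `enum` module. A quick diagnostic:

```python
# If enum resolves to a file under site-packages rather than the standard
# library, a backport package is likely shadowing it and should be removed.
import enum

print(enum.__file__)
print(hasattr(enum, "IntFlag"))  # should be True on Python 3.6+
```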

ValueError: RGBA values should be within 0-1 range

Sometimes the color values go beyond 1.

File "/home/aadalarasan/coda/Segment-Everything-Everywhere-All-At-Once/demo_code/tasks/interactive.py", line 113, in interactive_infer_image
demo = visual.draw_panoptic_seg(pano_seg.cpu(), pano_seg_info) # rgb Image
File "/home/aadalarasan/coda/Segment-Everything-Everywhere-All-At-Once/demo_code/utils/visualizer.py", line 543, in draw_panoptic_seg
self.overlay_instances(masks=masks, labels=labels, assigned_colors=colors, alpha=alpha)
File "/home/aadalarasan/coda/Segment-Everything-Everywhere-All-At-Once/demo_code/utils/visualizer.py", line 745, in overlay_instances
self.draw_text(
File "/home/aadalarasan/coda/Segment-Everything-Everywhere-All-At-Once/demo_code/utils/visualizer.py", line 890, in draw_text
color = np.maximum(list(mplc.to_rgb(color)), 0.2)
File "/home/aadalarasan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/matplotlib/colors.py", line 496, in to_rgb
return to_rgba(c)[:3]
File "/home/aadalarasan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/matplotlib/colors.py", line 299, in to_rgba
rgba = _to_rgba_no_colorcycle(c, alpha)
File "/home/aadalarasan/.pyenv/versions/3.10.4/lib/python3.10/site-packages/matplotlib/colors.py", line 395, in _to_rgba_no_colorcycle
raise ValueError("RGBA values should be within 0-1 range")
ValueError: RGBA values should be within 0-1 range
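
A minimal local workaround sketch (not the upstream fix) is to clip colors into the [0, 1] range before they are handed to matplotlib:

```python
import numpy as np

def safe_rgb(color):
    # Clip an (r, g, b) triple into the [0, 1] range matplotlib expects.
    return tuple(np.clip(np.asarray(color, dtype=float), 0.0, 1.0))

print(safe_rgb((1.2, -0.1, 0.5)))  # -> (1.0, 0.0, 0.5)
```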

Code Release

Thanks for the great work! The demo is very inspiring and we are excited about its potential applications in downstream tasks, especially in robotics. Do you have an ETA on when the code will be released?

We would love to play with the thresholds (the demo seems to produce many false positives) to see if this is useful out of the box. Please let me know! Happy to reach out privately as well.

Thanks
Dhruv (UC Berkeley)

Always segments person when using text

Hello!

Thank you for releasing this great work! I'm trying out the demo on some fashion images, and when using text I primarily get the entire person in the segmentation. I have provided some examples below. Thank you!

(example images attached)

About Ref-COCO dataset overlapping.

I'd like to express my appreciation for your excellent work; it is both engaging and insightful. However, I have some confusion regarding the experiment detailed in Chapter 4.

In the paper, you mentioned using a combination of Ref-COCO, Ref-COCOg, and Ref-COCO+ for COCO image annotations in the referring segmentation task. Then you report your evaluation on Ref-COCOg. While I find this approach interesting, I'm not quite sure what you mean by "combination." Additionally, I am concerned about the potential for data leakage, since Ref-COCO, Ref-COCOg, and Ref-COCO+ are three sets of annotations on the same image dataset, which might lead to overlap between the training and test sets of the different annotations. Could you please provide further clarification on this part of the experiments? Thank you!

The code doesn't work for me

Following the guide, I managed to install all requirements in a brand-new conda env. I tried to run the zebra example (which I am most interested in) but got no segmentation results. I tried other examples, but in vain (no segmentation results at all).
(screenshot attached)

The output from my terminal console seemed all right: no error messages, only some warnings. Here is the package list I installed; would you be so kind as to tell me why I failed?

absl-py 1.4.0
accelerate 0.19.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.0.1
antlr4-python3-runtime 4.9.3
anyio 3.7.0
appdirs 1.4.4
astunparse 1.6.3
async-timeout 4.0.2
attrs 23.1.0
black 21.4b2
cachetools 5.3.1
certifi 2023.5.7
charset-normalizer 3.1.0
cityscapesScripts 2.2.2
click 8.1.3
cloudpickle 2.2.1
cmake 3.26.3
coloredlogs 15.0.1
contourpy 1.0.7
cycler 0.11.0
detectron2 0.6
diffdist 0.1
diffusers 0.11.1
einops 0.6.1
exceptiongroup 1.1.1
fastapi 0.95.2
ffmpy 0.3.0
filelock 3.12.0
flatbuffers 23.5.26
fonttools 4.39.4
frozenlist 1.3.3
fsspec 2023.5.0
ftfy 6.1.1
future 0.18.3
fvcore 0.1.5.post20221221
gast 0.4.0
google-auth 2.19.0
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
gradio 3.31.0
gradio_client 0.2.5
grpcio 1.54.2
h11 0.14.0
h5py 3.8.0
httpcore 0.17.2
httpx 0.24.1
huggingface-hub 0.14.1
humanfriendly 10.0
hydra-core 1.3.2
idna 3.4
imageio 2.30.0
importlib-metadata 6.6.0
importlib-resources 5.12.0
invisible-watermark 0.1.5
iopath 0.1.9
Jinja2 3.1.2
joblib 1.2.0
json-tricks 3.17.0
jsonschema 4.17.3
keras 2.11.0
kiwisolver 1.4.4
kornia 0.6.4
lazy_loader 0.2
libclang 16.0.0
linkify-it-py 2.0.2
lit 16.0.5
llvmlite 0.40.0
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
more-itertools 9.1.0
mpmath 1.3.0
multidict 6.0.4
mup 1.0.0
mypy-extensions 1.0.0
networkx 3.1
nltk 3.8.1
numba 0.57.0
numpy 1.23.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
omegaconf 2.3.0
onnx 1.12.0
onnxruntime 1.15.0
openai 0.27.7
openai-whisper 20230314
opencv-python 4.7.0.72
opt-einsum 3.3.0
orjson 3.8.14
packaging 23.1
pandas 2.0.2
pathspec 0.11.1
Pillow 9.5.0
pip 23.0.1
pkgutil_resolve_name 1.3.10
portalocker 2.7.0
protobuf 3.19.6
psutil 5.9.5
pyarrow 12.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycocotools 2.0.4
pydantic 1.10.8
pydot 1.4.2
pydub 0.25.1
Pygments 2.15.1
pyparsing 3.0.9
pyquaternion 0.9.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3
PyWavelets 1.4.1
PyYAML 6.0
regex 2023.5.5
requests 2.31.0
requests-oauthlib 1.3.1
rsa 4.9
scann 1.2.9
scikit-image 0.20.0
scikit-learn 1.2.2
scipy 1.9.1
seaborn 0.12.2
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 67.8.0
shapely 2.0.1
six 1.16.0
sniffio 1.3.0
starlette 0.27.0
sympy 1.12
tabulate 0.9.0
tenacity 8.2.2
tensorboard 2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.11.1
tensorflow-estimator 2.11.0
tensorflow-io-gcs-filesystem 0.32.0
termcolor 2.3.0
threadpoolctl 3.1.0
tifffile 2023.4.12
tiktoken 0.3.3
timm 0.4.12
tokenizers 0.12.1
toml 0.10.2
toolz 0.12.0
torch 2.0.1
torchmetrics 0.6.0
torchvision 0.15.2
tqdm 4.65.0
transformers 4.19.2
triton 2.0.0
typing 3.7.4.3
typing_extensions 4.6.2
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 1.26.16
uvicorn 0.22.0
vision-datasets 0.2.2
wcwidth 0.2.6
websockets 11.0.3
Werkzeug 2.3.4
wheel 0.38.4
wrapt 1.15.0
yacs 0.1.8
yarl 1.9.2
zipp 3.15.0

Question about the cost function

Dear author,

Thanks so much for your contribution and the inference code.

I missed the explanation of your loss function in the paper; could you please briefly describe it? Is it based on a linear combination of the focal and dice losses, like the one in MaskFormer?

Kind Regards,
Yuyuan
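
For reference, the MaskFormer-style combination the question alludes to is a weighted sum of a sigmoid focal term and a dice term on the predicted masks; the sketch below is illustrative, with placeholder weights, and is not taken from the SEEM code or paper.

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Standard sigmoid focal loss on per-pixel mask logits.
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)
    loss = ce * ((1 - p_t) ** gamma)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * loss).mean()

def dice_loss(logits, targets, eps=1.0):
    # Soft dice loss on flattened masks.
    prob = logits.sigmoid().flatten(1)
    targets = targets.flatten(1)
    inter = (prob * targets).sum(-1)
    union = prob.sum(-1) + targets.sum(-1)
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def mask_loss(logits, targets, w_focal=20.0, w_dice=1.0):
    # Illustrative linear combination; the weights are placeholders.
    return w_focal * sigmoid_focal_loss(logits, targets) + w_dice * dice_loss(logits, targets)
```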

Some questions about the paper

Thank you for the great work! I am having some trouble understanding Sec. 3, subsection "Compositional", of the paper: what does the function "Match" in Eq. (5) refer to? Are there papers I can turn to?
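
For context (an assumption based on common practice in DETR/MaskFormer-style models, not a statement about the paper), "Match" typically denotes Hungarian bipartite matching between predicted and ground-truth masks; a minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Given a [num_predictions, num_targets] cost matrix, find the assignment
# of predictions to ground-truth objects with minimal total cost.
cost = np.array([[0.2, 0.9, 0.5],
                 [0.8, 0.1, 0.7]])
pred_idx, tgt_idx = linear_sum_assignment(cost)
print(list(zip(pred_idx, tgt_idx)))  # e.g. [(0, 0), (1, 1)]
```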

Can't download the released models

I couldn't download the released models.
How can I get them?

SEEM Focal-L and X-Decoder Focal-L checkpoints.

Below is the error message.

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>PublicAccessNotPermitted</Code>
<Message>Public access is not permitted on this storage account. RequestId:5c2d36f8-601e-0012-2ca8-93d64b000000 Time:2023-05-31T10:11:10.4213753Z</Message>
</Error>
