
Semantic-SAM: Segment and Recognize Anything at Any Granularity

This is the official implementation of the ECCV 2024 paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity". We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity. We train on the whole SA-1B dataset, and our model can reproduce SAM and go beyond it.

🍇 [Read our arXiv Paper]  

🍎 [Try Auto Generation with Controllable Granularity Demo]   🍎 [Try Interactive Multi-Granularity Demo]  

🚀 Features

🔥 Reproduce SAM. SAM training is a sub-task of ours, and we have released the training code to reproduce it.

🔥 Beyond SAM. Our newly proposed model offers the following attributes from instance to part level:

  • Granularity Abundance. Our model can produce all possible segmentation granularities for a user click with high quality, enabling more controllable and user-friendly interactive segmentation.
  • Semantic Awareness. We jointly train on SA-1B and semantically labeled datasets to learn semantics at both the object level and the part level.
  • High Quality. We build on a DETR-based model to implement both generic and interactive segmentation, and validate that SA-1B helps generic and part segmentation. The multi-granularity masks are of high quality.

🚀 News

🔥 We release the training and inference code and demo link of DINOv, which can handle in-context visual prompts for open-set and referring detection & segmentation. Check it out!

🔥 We release the demo code for controllable mask auto-generation with different granularity prompts! [figure: levels_dog2]

Segment everything in one image. We output controllable-granularity masks from semantic and instance level down to part level when using different granularity prompts.

🔥 We release the demo code for mask auto-generation! [figure: tank_auto]

Segment everything in one image. We output more masks at more granularities.

🔥 We release the demo code for interactive segmentation! [figure: character] One click outputs up to 6 granularity masks. Try it in our demo!

🔥 We release the training and inference code and checkpoints (SwinT, SwinL) trained on SA-1B!

🔥 We release the training code to reproduce SAM!

[figure: teaser_xyz]

Our model supports a wide range of segmentation tasks and their related applications, including:

  • Generic Segmentation
  • Part Segmentation
  • Interactive Multi-Granularity Segmentation with Semantics
  • Multi-Granularity Image Editing

👉 Related projects:

  • Mask DINO: We build upon Mask DINO, a unified detection and segmentation model, to implement our model.
  • OpenSeeD: A strong open-set segmentation method based on Mask DINO. We build on it to implement our open-vocabulary segmentation.
  • SEEM: Segment using a wide range of user prompts.
  • VLPart: Going denser with open-vocabulary part segmentation.

🦄 Getting Started

🛠️ Installation

```shell
pip3 install torch==1.13.1 torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu113
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt

export DATASET=/pth/to/dataset  # path to your coco data
```
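If the MultiScaleDeformableAttention CUDA op is reported missing at runtime (see the issues further down this page), it may need to be compiled manually. A minimal sketch, assuming the ops live under semantic_sam/body/encoder/ops as mentioned in the issues:

```shell
# Optional: compile the multi-scale deformable attention CUDA op if it is missing at runtime.
# The path is an assumption based on the repository layout referenced in the issues below.
cd semantic_sam/body/encoder/ops
sh make.sh
cd -
```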

⭐ A few lines to get generated results

First, download a checkpoint from the model zoo.

  • For interactive multi-granularity segmentation:

```python
from semantic_sam import prepare_image, plot_multi_results, build_semantic_sam, SemanticSAMPredictor

original_image, input_image = prepare_image(image_pth='examples/dog.jpg')  # change the image path to your image
mask_generator = SemanticSAMPredictor(build_semantic_sam(model_type='<model_type>', ckpt='</your/ckpt/path>'))  # model_type: 'L' / 'T', depending on your checkpoint
iou_sort_masks, area_sort_masks = mask_generator.predict_masks(original_image, input_image, point='<your prompts>')  # input point [[w, h]] in relative coordinates, i.e., [[0.5, 0.5]] is the center of the image (see the helper sketch after these examples)
plot_multi_results(iou_sort_masks, area_sort_masks, original_image, save_path='../vis/')  # results and original images will be saved at save_path
```

  • For mask auto-generation:

```python
from semantic_sam import prepare_image, plot_results, build_semantic_sam, SemanticSamAutomaticMaskGenerator

original_image, input_image = prepare_image(image_pth='examples/dog.jpg')  # change the image path to your image
mask_generator = SemanticSamAutomaticMaskGenerator(build_semantic_sam(model_type='<model_type>', ckpt='</your/ckpt/path>'))  # model_type: 'L' / 'T', depending on your checkpoint
masks = mask_generator.generate(input_image)
plot_results(masks, original_image, save_path='../vis/')  # results and original images will be saved at save_path
```
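The predictor expects the click in relative coordinates. A small hypothetical helper (click_to_relative is not part of the package) to convert a pixel click into that format:

```python
# Hypothetical helper (not part of semantic_sam): convert a pixel click on the
# original image into the relative [[w, h]] format expected by predict_masks.
def click_to_relative(x_px, y_px, original_image):
    h, w = original_image.shape[:2]  # original_image is an HWC numpy array
    return [[x_px / w, y_px / h]]

# e.g. point = click_to_relative(320, 240, original_image)
```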

Advanced usage:

  • Level is set to [1,2,3,4,5,6] by default to use all six prompts
  • You can change the input prompt for controllable mask auto-generation to get the granularity results you want. An example is shown here.
  • Here are some examples of mask_generator for generating results at different granularities (a sketch combining multiple levels follows below):

```python
mask_generator = SemanticSamAutomaticMaskGenerator(semantic_sam, level=[1])  # [1] and [2] for semantic level
mask_generator = SemanticSamAutomaticMaskGenerator(semantic_sam, level=[3])  # [3] for instance level
mask_generator = SemanticSamAutomaticMaskGenerator(semantic_sam, level=[6])  # [4], [5], [6] for different part levels
```
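Since level accepts a list (the default uses all six), multiple granularities can presumably be combined in a single run; a minimal sketch under that assumption:

```python
# Sketch (assumption): combine instance level [3] with the finest part level [6] in one pass.
# `semantic_sam` and `input_image` are assumed to be built as in the examples above.
mask_generator = SemanticSamAutomaticMaskGenerator(semantic_sam, level=[3, 6])
masks = mask_generator.generate(input_image)
```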

🕌 Data preparation

Please refer to prepare SA-1B data. Let us know if you need more instructions about it.

🌋 Model Zoo

The currently released checkpoints are only trained with SA-1B data.

| Name | Training Dataset | Backbone | 1-IoU@Multi-Granularity | 1-IoU@COCO (Max \| Oracle) | Download |
|------|------------------|----------|-------------------------|----------------------------|----------|
| Semantic-SAM \| config | SA-1B | SwinT | 88.1 | 54.5 \| 73.8 | model |
| Semantic-SAM \| config | SA-1B | SwinL | 89.0 | 55.1 \| 74.1 | model |

▶️ Demo

For interactive segmentation:

```shell
python demo.py --ckpt /your/ckpt/path
```

For mask auto-generation:

```shell
python demo_auto_generation.py --ckpt /your/ckpt/path
```

🌻 Evaluation

We perform zero-shot evaluation on COCO val2017. $n is the number of GPUs you use.

For the SwinL backbone:

```shell
python train_net.py --eval_only --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinL.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n MODEL.WEIGHTS=/path/to/weights
```

For the SwinT backbone:

```shell
python train_net.py --eval_only --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinT.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n MODEL.WEIGHTS=/path/to/weights
```

⭐ Training

We currently release only the code for training on SA-1B; complete training with semantics will be released later. $n is the number of GPUs you use. Before running the training code, you need to specify your SA-1B training data:

```shell
export SAM_DATASETS=/pth/to/dataset
export SAM_SUBSET_START=$start
export SAM_SUBSET_END=$end
```

We convert the SA-1B data into 100 tsv files. start (int, 0-99) is the start index of your SA-1B data and end (int, 0-99) is the end index. If you are not using the tsv data format, you can refer to this json registration for SAM as a reference.
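For example, to train on only the first ten tsv shards, the exports might look like this (the shard range is just an illustration):

```shell
# Example: use SA-1B tsv shards 0 through 9 (valid indices are 0-99).
export SAM_DATASETS=/pth/to/dataset
export SAM_SUBSET_START=0
export SAM_SUBSET_END=9
```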

For the SwinL backbone:

```shell
python train_net.py --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinL.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n SAM.TEST.BATCH_SIZE_TOTAL=$n SAM.TRAIN.BATCH_SIZE_TOTAL=$n
```

For the SwinT backbone:

```shell
python train_net.py --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinT.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n SAM.TEST.BATCH_SIZE_TOTAL=$n SAM.TRAIN.BATCH_SIZE_TOTAL=$n
```

**We also support training to reproduce SAM:**

```shell
python train_net.py --resume --num-gpus $n --config-file configs/semantic_sam_reproduce_sam_swinL.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n SAM.TEST.BATCH_SIZE_TOTAL=$n SAM.TRAIN.BATCH_SIZE_TOTAL=$n
```

This uses the SwinL backbone. The only difference in this script is that it uses many-to-one matching and 3 prompts, as in SAM.

👀 Comparison with SAM and SA-1B Ground-truth

[figure: compare_sam_v3]

(a)(b) are the output masks of our model and SAM, respectively. The red points on the left-most image of each row are the user clicks. (c) shows the GT masks that contain the user clicks. The outputs of our model have been processed to remove duplicates.

🌳 Learned prompt semantics

[figure: levels]

We visualize the prediction of each content prompt embedding of points in a fixed order for our model. We find that the output masks go from small to large, indicating that each prompt embedding represents a semantic level. The red point in the first column is the click.

🦕 Method

[figure: method_xyz]

🎖️ Experiments

We also show that jointly training SA-1B interactive segmentation and generic segmentation can improve generic segmentation performance. [figure: coco]

We also outperform SAM on both mask quality and granularity completeness; please refer to our paper for more experimental details.

📑 Todo list
  • Release demo

  • Release code and checkpoints trained on SA-1B

  • Release demo with semantics

  • Release code and checkpoints trained on SA-1B and semantically-labeled datasets

♥️ Acknowledgement

Our model is related to Mask DINO and OpenSeeD. We also thank Segment Anything for the SA-1B data.

✒️ Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

```bibtex
@article{li2023semantic,
  title={Semantic-SAM: Segment and Recognize Anything at Any Granularity},
  author={Li, Feng and Zhang, Hao and Sun, Peize and Zou, Xueyan and Liu, Shilong and Yang, Jianwei and Li, Chunyuan and Zhang, Lei and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2307.04767},
  year={2023}
}
```

semantic-sam's People

Contributors

fengli-ust, haozhang534, jwyang, keyky, zlin-monarch


semantic-sam's Issues

How to define the semantic class set and get the semantics of the mask

Hi, thanks for your work.
I think Semantic-SAM can define a class set and output semantic segmentation results since it has open-set ability. Where should I define the class set, and how do I get the semantics of the masks?

I tried the following code, and there are no semantics in the masks.

```python
from semantic_sam import prepare_image, plot_results, build_semantic_sam, SemanticSamAutomaticMaskGenerator

original_image, input_image = prepare_image(image_pth='examples/dog.jpg')  # change the image path to your image
mask_generator = SemanticSamAutomaticMaskGenerator(build_semantic_sam(model_type='T', ckpt='pretrained/swint_only_sam_many2many.pth'), level=[3])  # model_type: 'L' / 'T', depending on your checkpoint
masks = mask_generator.generate(input_image)
print(masks[0].keys())
```

The keys of the masks are:

dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])

model and ckpt don't match

Thanks for your great work!
Why do some of the weights not match?

*UNLOADED* norm0.bias, Model Shape: torch.Size([192])
*UNLOADED* norm0.weight, Model Shape: torch.Size([192])
*UNLOADED* norm1.bias, Model Shape: torch.Size([384])
*UNLOADED* norm1.weight, Model Shape: torch.Size([384])
*UNLOADED* norm2.bias, Model Shape: torch.Size([768])
*UNLOADED* norm2.weight, Model Shape: torch.Size([768])
*UNLOADED* norm3.bias, Model Shape: torch.Size([1536])
*UNLOADED* norm3.weight, Model Shape: torch.Size([1536])
$UNUSED$ head.bias, Ckpt Shape: torch.Size([21841])
$UNUSED$ head.weight, Ckpt Shape: torch.Size([21841, 1536])
$UNUSED$ layers.0.blocks.1.attn_mask, Ckpt Shape: torch.Size([64, 144, 144])
$UNUSED$ layers.1.blocks.1.attn_mask, Ckpt Shape: torch.Size([16, 144, 144])
$UNUSED$ layers.2.blocks.1.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ layers.2.blocks.11.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ layers.2.blocks.13.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ layers.2.blocks.15.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ layers.2.blocks.17.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ layers.2.blocks.3.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ layers.2.blocks.5.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ layers.2.blocks.7.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ layers.2.blocks.9.attn_mask, Ckpt Shape: torch.Size([4, 144, 144])
$UNUSED$ norm.bias, Ckpt Shape: torch.Size([1536])
$UNUSED$ norm.weight, Ckpt Shape: torch.Size([1536])

segmentation result imperfect

Hi, great work! I want to ask why this mask prediction using 'All Prompt' isn't perfect. (There should be a mask in my circle, but there isn't.)
[image: result]

Can you release an inference script?

Can you release a script that just takes one image or a folder of images and segments them, with command-line arguments for all the inference hyperparameters?

Problem running on cuda:1

Why is it that when I set the device in demo.py to cuda:1, and also set interactive_infer_image under the task directory to cuda:1, my results are wrong? [image] How can I migrate the program to cuda:1?

semantic label function

A question for everyone:

How do I get the semantic label of each mask? I found the SEMANTIC_ON: True parameter in the config files; if I set it to True, will the labels appear in the output?

What should the directory structure look like after downloading the dataset?

Do I need to separately run semantic_sam/body/encoder/ops/make.sh?

input image size

Are we forced to resize the input image to 1024x1024 at inference?

Is there a way to run inference at higher resolutions without losing details in preprocessing?

Inquiry about the 'prompt_switch' function in the code

I have been going through the code in the repository, and I came across the 'prompt_switch' function. I am trying to understand how this function determines the switching between different levels of granularity. It seems that there is no specific handling for different granularity in the training code. Could you please shed some light on how it works and how it handles the switching between various levels of granularity?
Thank you in advance for your assistance!

Training questions

Hello, thank you for your work!

I have some technical questions regarding your training of Semantic-SAM. On how many GPUs did you perform the training, and which type of GPU (A100, V100, other)?
Did you use an advanced parallelism method such as ZeRO or FSDP, or did you only use the classical DistributedDataParallel?

Next to that, did you implement an Automatic Mask Generator script like the original work, or not at all?

Thank you in advance for your answers!

The generated masks are overlapped

Hello,

It seems that the generated masks sometimes overlap each other, even with a single granularity prompt (e.g., [3]). I just need non-overlapping masks. What should I do to achieve this?

Thanks
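One possible post-processing approach (not from the repository) is to greedily assign each pixel to the highest-scoring mask, using the 'segmentation' and 'predicted_iou' fields returned by the auto mask generator; a minimal sketch:

```python
import numpy as np

def remove_overlaps(masks):
    """Greedy sketch: keep higher-confidence masks intact and clip lower-ranked ones.

    `masks` is the list of dicts from SemanticSamAutomaticMaskGenerator.generate(),
    each containing a boolean 'segmentation' array and a 'predicted_iou' score.
    """
    ordered = sorted(masks, key=lambda m: m['predicted_iou'], reverse=True)
    taken = np.zeros_like(ordered[0]['segmentation'], dtype=bool)
    results = []
    for m in ordered:
        seg = m['segmentation'] & ~taken  # drop pixels already claimed by a better mask
        if seg.any():
            taken |= seg
            results.append({**m, 'segmentation': seg})
    return results
```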

Which model is loaded during training?

```shell
python train_net.py --resume --num-gpus $n --config-file configs/semantic_sam_only_sa-1b_swinL.yaml COCO.TEST.BATCH_SIZE_TOTAL=$n SAM.TEST.BATCH_SIZE_TOTAL=$n SAM.TRAIN.BATCH_SIZE_TOTAL=$n MODEL.WEIGHTS=/path/to/weights
```

Can MODEL.WEIGHTS be omitted?

Can the current training code reproduce the 1-IoU@COCO (Max|Oracle) 54.5|73.8 results when training from scratch? It looks like the code only uses SA-1B data indices 0-100; is it not using the full dataset?

Running inference on a single image?

Is there a way to run inference on a single image where a coordinate is input and the model segments the object at that coordinate, similar to how SAM lets you segment by clicking?
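A minimal sketch based on the interactive example earlier on this page, assuming the click is given in relative [[w, h]] coordinates and the checkpoint path is filled in:

```python
from semantic_sam import prepare_image, plot_multi_results, build_semantic_sam, SemanticSAMPredictor

# Load one image and a SwinL checkpoint (paths are placeholders).
original_image, input_image = prepare_image(image_pth='examples/dog.jpg')
predictor = SemanticSAMPredictor(build_semantic_sam(model_type='L', ckpt='/your/ckpt/path'))

# Segment at a single clicked coordinate; [[0.5, 0.5]] is the image center.
iou_sort_masks, area_sort_masks = predictor.predict_masks(original_image, input_image, point=[[0.5, 0.5]])
plot_multi_results(iou_sort_masks, area_sort_masks, original_image, save_path='./vis/')
```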

Code question

Hi, I want to ask why the inference function isn't called after I click on the image in Gradio.
[image: issue]

I may have found a code bug

Line 30 of task/interactive_predictor.py:

```python
    def predict(self, image_ori, image, point=None):
        """
        produce up to 6 prediction results for each click
        """
        width = image_ori.shape[0]  # this should be image_ori.shape[1]
        height = image_ori.shape[1] # this should be image_ori.shape[0]
```

It only affects visualization (the point-drawing code is wrong). I have submitted an MR.

How did you combine generic segmentation and prompt based segmentation?

From the code, I understand that you convert the point/bbox prompt embeddings into query embeddings to perform interactive mask decoder training. But where is the generic segmentation branch in this implementation?

Should I change the config file and select a different decoder or head to train another model? That is, are prompt-based segmentation and generic segmentation not in the same model?

Does your code support CPU inference?

I have been trying to clone and install your repo without CUDA availability, but I am stuck. In case you managed to run it on CPU, which Python version are you using?

Why can't PyTorch 2.0 get the right results?

I installed PyTorch 1.13 by following the installation instructions; however, I can't load the MultiScaleDeformableAttention module.
I tried installing PyTorch 2.0 with conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia, and then I can load the MultiScaleDeformableAttention module. However, I got a new error and I can't get the masks. Can you help me check it? Thanks!
D:\Users\Administrator\anaconda3\envs\yolov8-streamlit5\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:3484.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
UNLOADED backbone.layers.0.blocks.0.attn.proj.bias, Model Shape: torch.Size([192])
UNLOADED backbone.layers.0.blocks.0.attn.proj.weight, Model Shape: torch.Size([192, 192])
UNLOADED backbone.layers.0.blocks.0.attn.qkv.bias, Model Shape: torch.Size([576])
UNLOADED backbone.layers.0.blocks.0.attn.qkv.weight, Model Shape: torch.Size([576, 192])
UNLOADED backbone.layers.0.blocks.0.attn.relative_position_bias_table, Model Shape: torch.Size([529, 6])
UNLOADED backbone.layers.0.blocks.0.attn.relative_position_index, Model Shape: torch.Size([144, 144])
UNLOADED backbone.layers.0.blocks.0.mlp.fc1.bias, Model Shape: torch.Size([768])
UNLOADED backbone.layers.0.blocks.0.mlp.fc1.weight, Model Shape: torch.Size([768, 192])
UNLOADED backbone.layers.0.blocks.0.mlp.fc2.bias, Model Shape: torch.Size([192])
UNLOADED backbone.layers.0.blocks.0.mlp.fc2.weight, Model Shape: torch.Size([192, 768])
UNLOADED backbone.layers.0.blocks.0.norm1.bias, Model Shape: torch.Size([192])
UNLOADED backbone.layers.0.blocks.0.norm1.weight, Model Shape: torch.Size([192])
UNLOADED backbone.layers.0.blocks.0.norm2.bias, Model Shape: torch.Size([192])
UNLOADED backbone.layers.0.blocks.0.norm2.weight, Model Shape: torch.Size([192])
UNLOADED backbone.layers.0.blocks.1.attn.proj.bias, Model Shape: torch.Size([192])
UNLOADED backbone.layers.0.blocks.1.attn.proj.weight, Model Shape: torch.Size([192, 192])
UNLOADED backbone.layers.0.blocks.1.attn.qkv.bias, Model Shape: torch.Size([576])
UNLOADED backbone.layers.0.blocks.1.attn.qkv.weight, Model Shape: torch.Size([576, 192])
UNLOADED backbone.layers.0.blocks.1.attn.relative_position_bias_table, Model Shape: torch.Size([529, 6])
UNLOADED backbone.layers.0.blocks.1.attn.relative_position_index, Model Shape: torch.Size([144, 144])
UNLOADED backbone.layers.0.blocks.1.mlp.fc1.bias, Model Shape: torch.Size([768])
UNLOADED backbone.layers.0.blocks.1.mlp.fc1.weight, Model Shape: torch.Size([768, 192])
UNLOADED backbone.layers.0.blocks.1.mlp.fc2.bias, Model Shape: torch.Size([192])
UNLOADED backbone.layers.0.blocks.1.mlp.fc2.weight, Model Shape: torch.Size([192, 768])
UNLOADED backbone.layers.0.blocks.1.norm1.bias, Model Shape: torch.Size([192])
......
masks = mask_generator.generate(input_image)
mask=[]

Are Granularity Prompt Level 2 and Level 3 swapped?

Dear authors, thanks for the great paper introducing more granularity levels than SAM.

Btw, I have an issue where:

  • the granularity level 2 that is supposed to be "Semantic Level" seems to be doing "Instance Level"/Level 3.
  • the granularity level 3 that is supposed to be "Instance Level" seems to be doing "Semantic Level"/Level 2.

Here is the script to generate the output:

```python
import os
import torch
from semantic_sam import prepare_image, plot_results, build_semantic_sam, SemanticSamAutomaticMaskGenerator

original_image, input_image = prepare_image(image_pth='examples/dog.jpg')  # change the image path to your image
# for level 2
# mask_generator = SemanticSamAutomaticMaskGenerator(build_semantic_sam(model_type='T', ckpt='swint_only_sam_many2many.pth'), level=[2])
# for level 3
mask_generator = SemanticSamAutomaticMaskGenerator(build_semantic_sam(model_type='T', ckpt='swint_only_sam_many2many.pth'), level=[3])

outputs = mask_generator.generate(input_image)

save_path = 'vis/'
os.makedirs(save_path, exist_ok=True)
plot_results(outputs, original_image, save_path=save_path)
```
[images: Level-2 vs. Level-3 outputs]

Is the granularity swapped? Or does the ordering in the paper have an issue? Or is my test flawed? Thanks in advance!

Problem running the demo

/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1659484810403/work/aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
$UNUSED$ criterion.empty_weight, Ckpt Shape: torch.Size([2])
/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/gradio/components/image.py:390: UserWarning: The style method is deprecated. Please set these arguments in the constructor instead.
warnings.warn(
/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/gradio/components/gallery.py:205: UserWarning: The style method is deprecated. Please set these arguments in the constructor instead.
warnings.warn(
/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/gradio/components/gallery.py:209: UserWarning: The 'grid' parameter will be deprecated. Please use 'grid_cols' in the constructor instead.
warnings.warn(
sys:1: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 20.0}
sys:1: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 1.0}
sys:1: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 2.0}
sys:1: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 6.0}
sys:1: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 9.0}
Running on local URL: http://127.0.0.1:6082
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in call
return await self.app(scope, receive, send)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/fastapi/applications.py", line 289, in call
await super().call(scope, receive, send)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/applications.py", line 122, in call
await self.middleware_stack(scope, receive, send)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in call
raise exc
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in call
await self.app(scope, receive, _send)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/middleware/cors.py", line 83, in call
await self.app(scope, receive, send)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in call
raise e
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in call
await self.app(scope, receive, send)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/fastapi/routing.py", line 273, in app
raw_response = await run_endpoint_function(
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/fastapi/routing.py", line 192, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/gradio/routes.py", line 289, in api_info
return gradio.blocks.get_api_info(config, serialize) # type: ignore
File "/home/user2/miniconda3/envs/semanticsam/lib/python3.8/site-packages/gradio/blocks.py", line 518, in get_api_info
serializer = serializing.COMPONENT_MAPPINGtype
KeyError: 'dataset'
This problem occurred: the Gradio interface launches, but after uploading an image and clicking Run, nothing happens.

No module named 'MultiScaleDeformableAttention'

When I try to run any demo or eval script, I come across this problem:

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'

Please compile MultiScaleDeformableAttention CUDA op with the following commands:
cd mask2former/modeling/pixel_decoder/ops
sh make.sh

BUT there is no path mask2former/modeling/pixel_decoder/ops.
How can I fix it?

About SAM's 11 rounds of training

This is really nice work, with pretty cool performance and wonderful generalization ability.
However, I am curious about the training code for reproducing SAM's performance.

Does this repository contain all of SAM's training strategies, i.e., 11 rounds of training, handling ambiguity, ...?
Looking forward to a reply :) Thanks!

Issue when running Semantic-SAM with box input

Hi, thank you for releasing the code and checkpoint!

I was trying to run Semantic-SAM with bounding box input (instead of a point click), but the result on the dog example seems to be wrong (see the attached images).

Based on the paper, the click and box prompts should be in the same (x, y, w, h) format, with point clicks having a small w and h, which is what I did below. Is there anything I'm missing when running Semantic-SAM with box input, or is the code for that not released yet?

Thank you in advance!

```python
import torch
from semantic_sam import prepare_image, plot_multi_results, build_semantic_sam, SemanticSAMPredictor

def box_xyxy_to_cxcywh(x):
    x0, y0, x1, y1 = x.unbind(-1)
    b = [(x0 + x1) / 2, (y0 + y1) / 2,
         (x1 - x0), (y1 - y0)]
    return torch.stack(b, dim=-1)

# Resize image s.t. smaller edge is 640
# original_image is np.ndarray HWC, input_image is torch.Tensor CHW
original_image, input_image = prepare_image(image_pth=f'{REPO_PATH}/examples/dog.jpg')  # change the image path to your image
height, width = original_image.shape[:2]
mask_generator = SemanticSAMPredictor(build_semantic_sam(
    model_type='L',
    ckpt=f'{REPO_PATH}/models/swinl_only_sam_many2many.pth'))  # model_type: 'L' / 'T', depending on your checkpoint
bbox_xyxy = [350.0, 100, 650, 600]
iou_sort_masks, area_sort_masks = mask_generator.predict_masks(
    original_image, input_image,
    point=box_xyxy_to_cxcywh(torch.tensor([bbox_xyxy])) / torch.tensor([width, height, width, height]),  # [xywh]
)  # input point [[w, h]] relative location, i.e., [[0.5, 0.5]] is the center of the image
plot_multi_results(iou_sort_masks, area_sort_masks, original_image, save_path='./vis/')  # results and original images will be saved at save_path

import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(8, 8))
ax.imshow(original_image)
ax.set_axis_off()
x_min, y_min, x_max, y_max = bbox_xyxy
ax.plot([x_min, x_max, x_max, x_min, x_min],
        [y_min, y_min, y_max, y_max, y_min], "r")
```

[images: box-prompt segmentation results]

Running the auto-generation demo takes too long

I'm trying to run the auto-generation demo on a T4 GPU (16 GB of memory), but it is taking too long (more than one hour). Is that normal behavior? What GPU did you use for inference?

Thanks in advance!

I installed and ran Semantic-SAM on Windows 11 and got the following error. Can you help me check it? Thanks a lot.

usage: SemanticSAM Demo [--conf_files FILE] [--ckpt FILE]
SemanticSAM Demo: error: unrecognized arguments: -f C:\Users\Administrator\AppData\Roaming\jupyter\runtime\kernel-29c3123a-4d5a-4c00-8855-f032f2577c65.json
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2
D:\Users\Administrator\anaconda3\envs\yolov8-streamlit5\lib\site-packages\IPython\core\interactiveshell.py:3516: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

What does "reproduce SAM" mean?

SAM uses a ViT backbone; however, you are using Swin. Hence, I am not quite sure what you mean by "reproduce SAM" here. Can you clarify? Have you actually reproduced the SAM released by Meta such that the performance and backbone are the same?

The SAM training loss isn't focal loss

As the code shows, the training loss is CE loss, not focal loss:

```python
losses = {
    "loss_mask_bce_0": sigmoid_ce_loss_jit(point_logits, point_labels, num_masks),
    "loss_mask_dice_0": dice_loss_jit(point_logits, point_labels, num_masks),
}
```

Installation doesn't work

The installation stalls with:

INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
Using cached importlib_resources-5.7.1-py3-none-any.whl (28 kB)
Using cached importlib_resources-5.7.0-py3-none-any.whl (28 kB)
Using cached importlib_resources-5.6.0-py3-none-any.whl (28 kB)
Using cached importlib_resources-5.4.0-py3-none-any.whl (28 kB)
Using cached importlib_resources-5.3.0-py3-none-any.whl (28 kB)

Do you have any idea why? Can you check which version of importlib_resources your code uses?
