
bowang-lab / MedSAM


Segment Anything in Medical Images

Home Page: https://www.nature.com/articles/s41467-024-44824-z

License: Apache License 2.0

Languages: Jupyter Notebook 86.64%, Python 13.27%, Shell 0.09%

medsam's People

Contributors

ajinkya-kulkarni, ctrlaltf2, frexg, joseangelgarciasanchez, junma11, linhandev, sarrabenyahia


medsam's Issues

what is label_id

In the file "pre_CT.py", the preprocessing function is defined on line 54:
def preprocess_ct(gt_path, nii_path, gt_name, image_name, label_id, image_size, sam_model):

What is label_id? Could you please help answer this? Thank you.
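A hedged reading of what label_id does (not the repo's exact code): in multi-organ CT ground truths each structure is stored as a different integer, and label_id appears to select which structure is converted into the binary mask used for fine-tuning. A minimal sketch:

    import numpy as np

    def binary_mask_for_label(gt_volume: np.ndarray, label_id: int) -> np.ndarray:
        # Keep only the voxels whose label equals label_id (e.g. label_id=9 for one organ)
        # and return a 0/1 mask; all other labels become background.
        return np.uint8(gt_volume == label_id)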

Provide full configuration for DRIVE vessel segmentation dataset

Hi, first of all, thanks for your work!

However, when trying to reproduce the result on the DRIVE dataset (vessel segmentation), I could not match the results you report in Table 2 of your paper. Could you kindly provide more details on how to reach a DSC of around 66? My best results are only around 60.

in finetune_and_inference_tutorial_2D_dataset

Hi @JunMa11

When I create the npz dataset for a new custom dataset of mine, which is in PNG format, the npz file is not saved to the folder.

tqdm runs with no errors, but no file is saved.


Can you please help?

preprocessing error

Hi,
when I run preprocessing with pre_CT.py on my own dataset, it returns
0it [00:00, ?it/s]
0it [00:00, ?it/s]
What is the problem? I am not sure what is causing it and need your help. Could we discuss it over QQ or WeChat? Thank you!

train error with Google Drive data

I am trying to test this out before turning it over to the researchers, and I have been going over the various steps. I was able to successfully run

(medsam) [root@lri-uapps-1 MedSAM]# python utils/precompute_img_embed.py -i /data/train -o /data/Tr_emb

However, the actual training seems to be failing due to too many open files:

(medsam) [root@lri-uapps-1 MedSAM]# python train.py -i /data/Tr_emb --task_name SAM-ViT-B --num_epochs 1000 --batch_size 8 --lr 1e-5
Traceback (most recent call last):
File "/usr/local/MedSAM/train.py", line 83, in
train_dataset = NpzDataset(args.npz_tr_path)
File "/usr/local/MedSAM/train.py", line 24, in init
self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
File "/usr/local/MedSAM/train.py", line 24, in
self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
File "/usr/local/anaconda3/envs/medsam/lib/python3.10/site-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
OSError: [Errno 24] Too many open files: '/data/Tr_emb/Tr_000000990.npz'

(medsam) [root@lri-uapps-1 MedSAM]# ls /data/Tr_emb/ | wc -l
161857

Could this be a numpy error perhaps?
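A hedged workaround sketch, not the repo's implementation: the stack trace shows NpzDataset opening all 161,857 archives up front, which exhausts the per-process file-handle limit. Raising the limit (ulimit -n) is one fix; another is to keep only the file names and open each archive inside __getitem__, as below. Note this indexes one sample per file, which may differ from the repo's per-slice indexing.

    import os
    from os.path import join

    import numpy as np
    from torch.utils.data import Dataset

    class LazyNpzDataset(Dataset):
        def __init__(self, data_root):
            self.data_root = data_root
            self.npz_files = sorted(os.listdir(data_root))

        def __len__(self):
            return len(self.npz_files)

        def __getitem__(self, index):
            # Open, read, and close one archive per sample so only a handful
            # of file handles are ever open at the same time.
            with np.load(join(self.data_root, self.npz_files[index])) as d:
                img_embed = d['img_embeddings'].copy()
                gt2d = d['gts'].copy()
            return img_embed, gt2d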

For multiple objects in an image

Firstly, thanks for the great work!

I was trying to fine-tune on my custom dataset. My dataset contains several 2D images of cells, and I have a ground-truth mask ndarray in which EACH cell is represented by a distinct positive integer.

So my ground-truth array looks as follows: 0 is the background, and the different positive integers indicate different cells.

I have read your demo code for 2D images preprocessing: pre_grey_rgb2D.py

However, if I am not mistaken, since your demo dataset has only one mask per image, your code is designed for binary ground truth rather than multiple objects in one image.

I am trying to modify the code you provided to handle multiple objects. I can save my ground truth into gt_data in its original form, since it is already in the format I expect. I would like to ask:

  1. Do I need to modify anything about the embedding? Or does the embedding have any relation to the masks?
  2. I understand I should also modify the finetune_and_inference_tutorial_2D_dataset.ipynb file to produce multiple bounding boxes instead of one. Where exactly should I modify it?

Thank you.
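A hedged sketch (not the repo's code) of the usual approach to the instance-mask question above: the image embedding depends only on the image, so it can be shared across instances, while the mask and bounding box become per-instance. Each positive integer in the instance mask is converted into its own binary mask and box:

    import numpy as np

    def instances_to_masks_and_boxes(gt_data: np.ndarray):
        # Split an instance-labelled mask (0 = background, k > 0 = cell k) into
        # one binary mask and one (x_min, y_min, x_max, y_max) box per cell.
        masks, boxes = [], []
        for inst_id in np.unique(gt_data):
            if inst_id == 0:  # skip background
                continue
            mask = np.uint8(gt_data == inst_id)
            ys, xs = np.where(mask > 0)
            boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])
            masks.append(mask)
        return masks, boxes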

Question about image size

Thank you so much for this great work. I notice that the image size is 256x256 in your example, while my image size is 512x512. How can I modify the parameters so that these images fit the SAM checkpoint?
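A hedged sketch based on the repo's preprocessing: the SAM encoder always works at a 1024-pixel longest side regardless of the raw image size, so a 512x512 image only needs the same ResizeLongestSide transform the scripts already apply; the 256x256 size in the example is just the resolution at which the tutorial stores images and ground truth.

    import numpy as np
    from segment_anything.utils.transforms import ResizeLongestSide

    image_data = np.zeros((512, 512, 3), dtype=np.uint8)  # hypothetical 512x512 RGB slice
    sam_transform = ResizeLongestSide(1024)               # sam_model.image_encoder.img_size
    resize_img = sam_transform.apply_image(image_data)    # longest side rescaled to 1024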

Inference on New Data

Good morning. I have been applying MedSAM to my data and have run into some confusion about inference. When we have ground truths, everything runs well. However, the inference scripts provided all seem to require a ground truth, as it is used to generate a bounding box that is then fed to the segmenter as a prompt. Is there a script demonstrating inference on new data that I missed, or an obvious workaround?

Thank you,
Chris
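A hedged sketch of the usual workaround (assuming sam_model and device are set up as in the inference tutorial): the box prompt does not have to come from a ground-truth mask; any (x_min, y_min, x_max, y_max) box, drawn by hand or produced by a detector, can replace the np.where(gt > 0) box before the prompt-encoder call.

    import numpy as np
    import torch

    # Hypothetical user-drawn box in the coordinates of the preprocessed image.
    bbox = np.array([[60, 80, 180, 200]], dtype=np.float32)
    box_torch = torch.as_tensor(bbox, dtype=torch.float32, device=device)

    with torch.no_grad():
        sparse_embeddings, dense_embeddings = sam_model.prompt_encoder(
            points=None, boxes=box_torch, masks=None
        )
    # ...then run the mask decoder exactly as in the tutorial.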

Have you resampled the 3D volumes?

As the volume size differs among 3D medical images, when I fine-tune SAM on my own dataset I resample the 3D CT volume to a fixed spacing, such as (1, 1, 1), before converting it to 2D slices, but the results do not seem to change. Is it necessary to resample the 3D volume before training or fine-tuning SAM for medical images?
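For reference, a hedged resampling sketch with SimpleITK (not part of the repo); whether it helps is dataset-dependent, since the 2D slices are resized to a fixed size during preprocessing anyway:

    import SimpleITK as sitk

    def resample_to_unit_spacing(image: sitk.Image, is_label: bool = False) -> sitk.Image:
        # Resample a volume to 1x1x1 mm spacing; nearest neighbour for label maps.
        new_spacing = (1.0, 1.0, 1.0)
        old_size, old_spacing = image.GetSize(), image.GetSpacing()
        new_size = [int(round(sz * sp / nsp))
                    for sz, sp, nsp in zip(old_size, old_spacing, new_spacing)]
        interpolator = sitk.sitkNearestNeighbor if is_label else sitk.sitkLinear
        return sitk.Resample(image, new_size, sitk.Transform(), interpolator,
                             image.GetOrigin(), new_spacing, image.GetDirection(),
                             0, image.GetPixelID())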

Fine-Tune SAM with own dataset

I fine-tuned the SAM model based on this project. I prepared my own dataset and saved the ground truth as single-channel images.
I used the "pre_grey_rgb2D.py" script to convert the dataset into npz format. (Note: the images in the dataset have different sizes.)
When I trained the model following "finetune_and_inference_tutorial_2D_dataset.ipynb", I encountered an error while computing the loss in the training loop: "ground truth has different shape (torch.Size([44, 1, 1024, 1024])) from input (torch.Size([44, 1, 256, 256]))".

Did I make any mistakes in the steps? How should I handle this?
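A hedged sketch of one way to resolve the mismatch, assuming the tutorial computes the loss against 256x256 masks (the mask decoder's low-resolution output) and gt_data is a 1024x1024 single-channel ground-truth mask: resize the ground truth to 256x256 with nearest-neighbour interpolation during preprocessing so the shapes agree.

    import numpy as np
    from skimage.transform import resize

    gt_resized = resize(gt_data, (256, 256), order=0, preserve_range=True,
                        anti_aliasing=False).astype(np.uint8)  # nearest-neighbour, keeps 0/1 values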

How to run train.py on multi GPUs?

I tried to " CUDA_VISIBLE_DEVICES=0,1", but it is still run on GPU 0.
I also tried "sam_model = DataParallel(sam_model, device_ids=[0,1])", but it does not work either.
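A hedged sketch (the released train.py appears to be written for a single GPU): CUDA_VISIBLE_DEVICES must be exported before the process starts (e.g. CUDA_VISIBLE_DEVICES=0,1 python train.py ...), and the DataParallel wrapper has to be applied before the training loop; the checkpoint path below is hypothetical.

    import torch

    sam_model = torch.nn.DataParallel(sam_model, device_ids=[0, 1]).cuda()
    # ... training loop as in train.py ...
    # Unwrap before saving so the state-dict keys match the single-GPU model.
    torch.save(sam_model.module.state_dict(), "work_dir/medsam_dataparallel.pth")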

Missing branch in "pre-compute image embedding"

Hi
Thanks for your effort on this great repo!

You've covered these branches:

  • 3 dimensions and 3 channels
  • 2 dimensions (grayscale)

but missed the 3-dimensional, 1-channel (grayscale) case.

Here is the correct version:

for name in tqdm(names[::]):
    image_data = io.imread(join(data_tr_path, 'images', name))
    if image_data.ndim == 3 and image_data.shape[-1] > 3:
        # more than 3 channels (e.g. RGBA): keep the first three
        image_data = image_data[:, :, :3]
    if image_data.ndim == 3 and image_data.shape[-1] == 1:
        # 3-dimensional, single-channel (grayscale): drop the channel axis
        image_data = image_data[:, :, 0]
    if image_data.ndim == 2:
        # grayscale: replicate to 3 channels
        image_data = np.repeat(image_data[:, :, None], 3, axis=-1)

    sam_transform = ResizeLongestSide(sam_model.image_encoder.img_size)
    resize_img = sam_transform.apply_image(image_data)
    ........
    ........
    ........

This is in the finetune_and_inference_tutorial_auto_seg.ipynb notebook.

Cheers

How to pre-process normal dataset

If my dataset's images and masks are in a PASCAL VOC-like format, i.e. .png or .jpg files, how should I convert that kind of dataset?
Thank you very much for your kind help.
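A hedged sketch, loosely following the spirit of pre_grey_rgb2D.py (paths and file layout here are illustrative, not the repo's): read each image/mask pair, collapse the mask to a binary foreground, and save the pair as an .npz so the 2D fine-tuning tutorial can pick it up.

    import os
    from os.path import join

    import numpy as np
    from skimage import io

    img_dir, mask_dir, out_dir = "data/images", "data/masks", "data/npz"  # hypothetical paths
    os.makedirs(out_dir, exist_ok=True)

    for name in sorted(os.listdir(img_dir)):
        image = io.imread(join(img_dir, name))
        mask = io.imread(join(mask_dir, os.path.splitext(name)[0] + ".png"))
        gt = np.uint8(mask > 0)  # collapse class labels to foreground/background
        np.savez_compressed(join(out_dir, os.path.splitext(name)[0] + ".npz"),
                            imgs=image, gts=gt)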

Python Version Mismatch issue

The documentation says to use Python 3.10, but when I run the install in my environment, an error occurs saying my Python version is incompatible. Please look into this. The error is pasted below.

ERROR: Package 'medsam' requires a different Python: 3.8.10 not in '>=3.9'

Error while training

I'm trying out the code with a 2D dataset, as suggested in the documentation, but I'm getting a runtime error: "The size of tensor a (4096) must match the size of tensor b (64) at non-singleton dimension 0".
self.img_embeddings.shape=(456, 256, 64, 64), self.ori_gts.shape=(456, 256, 256)
img_embed.shape=torch.Size([8, 256, 64, 64]), gt2D.shape=torch.Size([8, 1, 256, 256]), bboxes.shape=torch.Size([8, 4])

The tensor shapes seem okay.

License, MIT?

Hey, this is really cool work and the result looks really promising!

There is no license on this; I'm assuming the intent is that it's freely usable everywhere? If so, could you please attach a standard MIT license to the repo? 🙏

Thanks again for the awesome research!

How to preprocess my own 2D X-ray dataset? (jpgs)

I have read the preprocessing scripts in the repo (pre_CT, pre_MR), but they both handle 3D data. I want to test MedSAM on my 2D X-ray rib segmentation dataset, but I don't know how to rewrite the scripts for 2D data. Does anyone have the same issue, or can someone help me?

ValueError: operands could not be broadcast together with shapes (480,600) (256,256)

Hello, while trying to run inference on 2D images of different sizes (which were trained successfully), I found a problem: the model output medsam_seg_prob is 256x256, which is inconsistent with the segmentation position in the original image. I believe that after inference, sam_model.postprocess_masks(medsam_seg, input_image.shape, gt_data.shape) is required, but this leads directly to losing the predicted mask.
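A hedged sketch of one way to map the prediction back, assuming medsam_seg is the 256x256 binary mask from the tutorial and original_shape is the original image height/width (e.g. (480, 600)): resize the mask with nearest-neighbour interpolation before comparing or overlaying it at the original resolution.

    import numpy as np
    from skimage.transform import resize

    seg_full = resize(medsam_seg, original_shape, order=0, preserve_range=True,
                      anti_aliasing=False).astype(np.uint8)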

Question about the data splitting

Hi,

Thanks for sharing a nice work. I have a question about the data splitting.

As mentioned, pre_CT.py is used to split the medical dataset, with 80% for training and 20% for testing.

Also, we need to download the testing dataset from Google Drive.

So, are these two parts different? What is the goal of generating the 20% testing split in pre_CT.py?

Best.

image embeddings and bbox for prompt_encoder

Thanks for sharing and building this repo! I have two questions:

  1. Why use the 3D image itself as image embeddings? Why not use the concatenation of 2D embeddings derived from SAM's own image encoder?

  2. You pass the bbox to the prompt encoder, which presumes the labels are known for any input image. But for lesion detection tasks, there is usually no way to locate the area of all lesions in advance. Is it more practical to fine-tune the model without prompt inputs?

What is the meaning of the gt_data size threshold?

Hello, Dr Ma and Dr Wang!

First of all, thanks for sharing your code and paper!
I have a question about a specific part on line 60 of pre_MR.py:
if np.sum(gt_data)>1000:
Why did you set the threshold to 1000? Does it have any special meaning?

In my case, empty masks and disease masks are mixed in the same nii.gz, so I will change the code to
if np.sum(gt_data)>=0:
Could that be a problem?

Thank you,

Multi-class segmentation?

Hi, does it only support single-class segmentation?
As far as I understood, your code only supports single-class segmentation. Am I correct?

using the network for brain MRI dataset

Hi, I am trying to use the code on a new dataset, BraTS, for brain tumor segmentation. I have run into a problem with the ground truth: it is not binary, it has 4 labels.
Can this model be used for this dataset?
Sincerely,

How do you conduct the evaluation

I am curious about the evaluation step. It requires a bounding box input, so do you input the bounding boxes manually, or do you have a better approach?

The semantic labels for the training npz files

Hi, Thanks for the great work.

It seems the training data is class-agnostic, with only the image and a binary gt-mask provided. Could you please provide the specific lesion/organ label for each training sample?

Image embeddings size

Hi there,

I have been trying to follow your code for custom fine-tuning on 3D images. However, I have some doubts.
In the pre_CT.py file, after the image embedding computation, the image size is (1, 3, 1024, 1024), but when the embeddings are stacked you give the shape as (n, 1, 256, 64, 64), and the same is used in train.py as well.

img_embeddings = np.stack(img_embeddings, axis=0) # (n, 1, 256, 64, 64)

I don't see any other transform being applied to the embeddings before being stacked. What am I missing here?
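A hedged sketch of the shape bookkeeping, assuming sam_model and input_image are set up as in pre_CT.py: (1, 3, 1024, 1024) is the encoder's input, but its output is (1, 256, 64, 64), so stacking n of those outputs with np.stack gives the (n, 1, 256, 64, 64) array that train.py loads; no extra transform is applied.

    import torch

    with torch.no_grad():
        embedding = sam_model.image_encoder(input_image)  # input_image: (1, 3, 1024, 1024)
    print(embedding.shape)  # torch.Size([1, 256, 64, 64])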

About 3D MRI data in precompute_img_embed.py

Thanks for your project. I want to know whether I can do the following:
sam_transform.apply_image needs a (256, 256, 3) image, but my npz 'imgs' are (159, 256, 256, 3), where 159 is the number of slices.

for name in tqdm(npz_files):
img = np.load(join(pre_img_path, name))['imgs'] # (256, 256, 3)
gt = np.load(join(pre_img_path, name))['gts']
#resize_img = sam_transform.apply_image(img)
#fixme:maybe make a loop better
resize_img = sam_transform.apply_image(img)
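A hedged sketch answering the "maybe make a loop better" comment (assuming sam_transform and img are as in the snippet above): apply_image expects a single HxWx3 image, so a (159, 256, 256, 3) stack is resized slice by slice and restacked.

    import numpy as np

    # Resize each of the 159 slices independently, then stack them back together.
    resize_img = np.stack([sam_transform.apply_image(slc) for slc in img], axis=0)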

Possible Issue with CUDA Device and CPU Support in Inference Script

Hello,

I believe there may be an issue with line 45 of the inference script. Specifically, the script is forcing the CUDA device, which may prevent CPU support when passing the argument '--device cpu'. Would it be possible for you to investigate this further?

Thank you.
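A hedged sketch of the usual fix, reusing the registry call that already appears in pre_CT.py: build the device from the CLI argument instead of hard-coding "cuda", so --device cpu actually runs on the CPU.

    import torch

    device = torch.device(args.device)  # "cuda", "cuda:0", or "cpu"
    sam_model = sam_model_registry[args.model_type](checkpoint=args.checkpoint).to(device)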

Runtime error for batch size > 1

Thank you for the great work.

I am getting the following error for batch size > 1.

src = src + dense_prompt_embeddings
RuntimeError: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 0

Could you please tell me how I can fix this error?

I really appreciate your help.

Sincerely,
Mostafij

image encoder

Why not use a pretrained ViT-L model as an image encoder? In the original paper, ViT-L performed better than ViT-B.

No such file or directory: 'work_dir/SAM/sam_vit_b_01ec64.pth'

Traceback (most recent call last):
  File "/xxx/MedSAM/pre_CT.py", line 116, in <module>
    sam_model = sam_model_registry[args.model_type](checkpoint=args.checkpoint).to(args.device)
  File "/xxx/MedSAM/segment_anything/build_sam.py", line 38, in build_sam_vit_b
    return _build_sam(
  File "/xxx/MedSAM/segment_anything/build_sam.py", line 104, in _build_sam
    with open(checkpoint, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'work_dir/SAM/sam_vit_b_01ec64.pth'

I downloaded the missing file from https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth, put it in that location, and it runs normally. I thought SAM downloaded the checkpoint automatically, so I'm not sure why it complained about a missing file when I ran it. Maybe this could be added to the README?
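A minimal sketch for fetching the checkpoint into the path pre_CT.py expects, using the official release URL quoted above:

    import os
    import urllib.request

    os.makedirs("work_dir/SAM", exist_ok=True)
    urllib.request.urlretrieve(
        "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth",
        "work_dir/SAM/sam_vit_b_01ec64.pth",
    )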

Metrics reported in the paper differ from other reports

Hello Dr Ma and Dr Wang ,

Nice to see you again! (I am a participant in the NeurIPS cell segmentation challenge.) Thanks for this wonderful work!
The DSC in your Table 2 for breast ultrasound with the original SAM is 78.01. However, the DSC in another report on breast ultrasound is around 0.4/0.6 (a different setting, and they use some information from the ground truth). In my experiment, without any information from the ground truth, the zero-shot inference DSC I got is around 0.3. Would you mind giving some hints on running SAM inference on a breast ultrasound dataset?

Best regards,
BIzhe

About instance segmentation

I wonder whether MedSAM can only work on one segmentation target class because it uses a box as the prompt; maybe instance segmentation is not possible for now?

I do not know the difference between SAM and MedSAM

In my opinion, the definitions of SAM and MedSAM look the same, e.g.:
ori_sam_model = sam_model_registry[model_type](...).to(device)
sam_model = sam_model_registry[model_type](...).to(device)
so I cannot tell the difference between SAM and MedSAM. I modified the code in 'segment-anything' and got the result shown in the attached picture.
So why is the SAM DSC zero?

Bounding box training

Hey.
As far as I understand your fine-tuning method, you mark a bounding box around the target area and then train the model to segment that region better (in general). But what if we don't know where the bounding box should start? Say we want to find a tumor inside a breast area, but we cannot determine the tumor's real spatial extent. Have you tried doing it automatically?

question about released model

Thank you for your work.
I have a question about the model you released under "Download the model checkpoint (GoogleDrive)".

Is it for a single application scenario with one label_id, for example only label 9? Because in the fine-tuning code, you need to set label_id.

About the implementation of NpzDataset

I found that the NpzDataset in finetune_and_inference_tutorial.py is implemented mostly with numpy, which made it run very slowly on my machine. I changed it to the following tensor-based implementation and got a significant speed increase. At the same time, the DSC on the sample dataset MICCAI FLARE2022 is 0.9008, which is not lower than the result of the original code. I hope you will also try the following code.

import os
from os.path import join

import numpy as np
import torch
from torch.utils.data import Dataset


class NpzDataset(Dataset):
    def __init__(self, data_root, image_size=256):
        self.data_root = data_root
        self.image_size = image_size
        self.npz_files = sorted(os.listdir(self.data_root))
        self.npz_data = [
            np.load(join(data_root, f), allow_pickle=True)
            for f in self.npz_files
        ]
        # Stack all slices from all files along the first dimension.
        self.ori_gts = torch.vstack(
            [torch.from_numpy(d['gts']) for d in self.npz_data])
        self.img_embeddings = torch.vstack(
            [torch.from_numpy(d['img_embeddings']) for d in self.npz_data])
        print(self.ori_gts.shape, self.img_embeddings.shape)

    def __len__(self):
        return self.ori_gts.shape[0]

    def __getitem__(self, index):
        img_embed = self.img_embeddings[index]
        gt2D = self.ori_gts[index]
        # Derive a bounding box from the ground-truth mask and jitter it by up to 20 px.
        y_indices, x_indices = torch.where(gt2D > 0)
        x_min, x_max = torch.min(x_indices), torch.max(x_indices)
        y_min, y_max = torch.min(y_indices), torch.max(y_indices)
        H, W = gt2D.shape
        x_min = max(0, x_min - torch.randint(0, 20, (1,)).item())
        x_max = min(W, x_max + torch.randint(0, 20, (1,)).item())
        y_min = max(0, y_min - torch.randint(0, 20, (1,)).item())
        y_max = min(H, y_max + torch.randint(0, 20, (1,)).item())
        bboxes = torch.tensor([x_min, y_min, x_max, y_max]).float()
        return img_embed.float(), gt2D[None, :, :].long(), bboxes
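A usage sketch for the tensor-based dataset above (the path, batch size, and worker count are illustrative):

    from torch.utils.data import DataLoader

    train_dataset = NpzDataset("data/Tr_emb")  # hypothetical embedding folder
    train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)
    for img_embed, gt2D, bboxes in train_loader:
        pass  # feed each batch to the fine-tuning loop from the tutorial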

Bounding Box prompts during inference

Thank you for the detailed explanation of using the MedSAM model.

I have a dataset where bounding boxes are available during training but not during inference. If I train the model using the bounding boxes and perform inference without them, will I get comparable performance?
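A hedged note with a small sketch: if no boxes are available at inference time, a common fallback is to prompt with the whole-image box, though performance is usually best when the box is close to the target (for example, produced by a separate detector), matching how the model was trained.

    import numpy as np

    H, W = 256, 256                                  # size of the preprocessed image
    full_image_box = np.array([0, 0, W - 1, H - 1])  # x_min, y_min, x_max, y_max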
