
bowang-lab / MedSAM


Segment Anything in Medical Images

Home Page: https://www.nature.com/articles/s41467-024-44824-z

License: Apache License 2.0

Languages: Jupyter Notebook 86.64%, Python 13.27%, Shell 0.09%

medsam's People

Contributors

ajinkya-kulkarni, ctrlaltf2, frexg, joseangelgarciasanchez, junma11, linhandev, sarrabenyahia


medsam's Issues

what is label_id

In the file "pre_CT.py", the preprocessing function is defined on line 54:
def preprocess_ct(gt_path, nii_path, gt_name, image_name, label_id, image_size, sam_model):

What is label_id? Could you please help answer this? Thank you.
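A hedged reading of what label_id does (not the repo's exact code): in multi-organ CT ground truths each structure is stored as a different integer, and label_id appears to select which structure is converted into the binary mask used for fine-tuning. A minimal sketch:

    import numpy as np

    def binary_mask_for_label(gt_volume: np.ndarray, label_id: int) -> np.ndarray:
        # Keep only the voxels whose label equals label_id (e.g. label_id=9 for one organ)
        # and return a 0/1 mask; all other labels become background.
        return np.uint8(gt_volume == label_id)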

Provide full configuration for DRIVE vessel segmentation dataset

Hi, first of all, thanks for your work!

However, when trying to reproduce the result on the DRIVE dataset (vessel segmentation), I could not match the results you report in Table 2 of your paper. Could you kindly provide more details on how to reach a DSC of around 66? My best results are only around 60.

in finetune_and_inference_tutorial_2D_dataset

Hi @JunMa11

When I create the npz dataset for a new custom dataset of mine, which is in PNG format, the npz file is not saved to the folder.

tqdm runs with no errors, but no file is saved.


Can you please help?

preprocessing error

Hi,
when I run preprocessing with pre_CT.py on my own dataset, it returns
0it [00:00, ?it/s]
0it [00:00, ?it/s]
What is the problem? I am not sure what is causing it and need your help. Could we discuss it over QQ or WeChat? Thank you!

train error with Google Drive data

I am trying to test this out before turning it over to the researchers, and I have been going over the various steps. I was able to successfully run

(medsam) [root@lri-uapps-1 MedSAM]# python utils/precompute_img_embed.py -i /data/train -o /data/Tr_emb

However, the actual training seems to be failing due to too many open files:

(medsam) [root@lri-uapps-1 MedSAM]# python train.py -i /data/Tr_emb --task_name SAM-ViT-B --num_epochs 1000 --batch_size 8 --lr 1e-5
Traceback (most recent call last):
File "/usr/local/MedSAM/train.py", line 83, in
train_dataset = NpzDataset(args.npz_tr_path)
File "/usr/local/MedSAM/train.py", line 24, in init
self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
File "/usr/local/MedSAM/train.py", line 24, in
self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
File "/usr/local/anaconda3/envs/medsam/lib/python3.10/site-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
OSError: [Errno 24] Too many open files: '/data/Tr_emb/Tr_000000990.npz'

(medsam) [root@lri-uapps-1 MedSAM]# ls /data/Tr_emb/ | wc -l
161857

Could this be a numpy error perhaps?
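A hedged workaround sketch, not the repo's implementation: the stack trace shows NpzDataset opening all 161,857 archives up front, which exhausts the per-process file-handle limit. Raising the limit (ulimit -n) is one fix; another is to keep only the file names and open each archive inside __getitem__, as below. Note this indexes one sample per file, which may differ from the repo's per-slice indexing.

    import os
    from os.path import join

    import numpy as np
    from torch.utils.data import Dataset

    class LazyNpzDataset(Dataset):
        def __init__(self, data_root):
            self.data_root = data_root
            self.npz_files = sorted(os.listdir(data_root))

        def __len__(self):
            return len(self.npz_files)

        def __getitem__(self, index):
            # Open, read, and close one archive per sample so only a handful
            # of file handles are ever open at the same time.
            with np.load(join(self.data_root, self.npz_files[index])) as d:
                img_embed = d['img_embeddings'].copy()
                gt2d = d['gts'].copy()
            return img_embed, gt2d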

For multiple objects in an image

Firstly, thanks for the great work!

I was trying to fine-tune on my custom dataset. My dataset contains several 2D images of cells, and I have a ground-truth mask ndarray in which EACH cell is represented by a distinct positive integer.

So my ground-truth array looks as follows: 0 is the background, and the different positive integers indicate different cells.

I have read your demo code for 2D images preprocessing: pre_grey_rgb2D.py

However, if I am not mistaken, since your demo dataset has only one mask per image, your code is designed for binary ground truth rather than multiple objects in one image.

I am trying to modify the code you provided to handle multiple objects. I can save my ground truth into gt_data in its original form, since it is already in the format I expect. I would like to ask:

  1. Do I need to modify anything about the embedding? Or does the embedding have any relation to the masks?
  2. I understand I should also modify the finetune_and_inference_tutorial_2D_dataset.ipynb file to produce multiple bounding boxes instead of one. Where exactly should I modify it?

Thank you.
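A hedged sketch (not the repo's code) of the usual approach to the instance-mask question above: the image embedding depends only on the image, so it can be shared across instances, while the mask and bounding box become per-instance. Each positive integer in the instance mask is converted into its own binary mask and box:

    import numpy as np

    def instances_to_masks_and_boxes(gt_data: np.ndarray):
        # Split an instance-labelled mask (0 = background, k > 0 = cell k) into
        # one binary mask and one (x_min, y_min, x_max, y_max) box per cell.
        masks, boxes = [], []
        for inst_id in np.unique(gt_data):
            if inst_id == 0:  # skip background
                continue
            mask = np.uint8(gt_data == inst_id)
            ys, xs = np.where(mask > 0)
            boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])
            masks.append(mask)
        return masks, boxes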

Question about image size

Thank you so much for this great work. I notice that the image size is 256x256 in your example, while my image size is 512x512. How can I modify the parameters so that these images fit the SAM checkpoint?
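A hedged sketch based on the repo's preprocessing: the SAM encoder always works at a 1024-pixel longest side regardless of the raw image size, so a 512x512 image only needs the same ResizeLongestSide transform the scripts already apply; the 256x256 size in the example is just the resolution at which the tutorial stores images and ground truth.

    import numpy as np
    from segment_anything.utils.transforms import ResizeLongestSide

    image_data = np.zeros((512, 512, 3), dtype=np.uint8)  # hypothetical 512x512 RGB slice
    sam_transform = ResizeLongestSide(1024)               # sam_model.image_encoder.img_size
    resize_img = sam_transform.apply_image(image_data)    # longest side rescaled to 1024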

Inference on New Data

Good morning. I have been applying MedSAM to my data and have run into some confusion about inference. When we have ground truths, everything runs well. However, the inference scripts provided all seem to require a ground truth, as it is used to generate a bounding box that is then fed to the segmenter as a prompt. Is there a script demonstrating inference on new data that I missed, or an obvious workaround?

Thank you,
Chris
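A hedged sketch of the usual workaround (assuming sam_model and device are set up as in the inference tutorial): the box prompt does not have to come from a ground-truth mask; any (x_min, y_min, x_max, y_max) box, drawn by hand or produced by a detector, can replace the np.where(gt > 0) box before the prompt-encoder call.

    import numpy as np
    import torch

    # Hypothetical user-drawn box in the coordinates of the preprocessed image.
    bbox = np.array([[60, 80, 180, 200]], dtype=np.float32)
    box_torch = torch.as_tensor(bbox, dtype=torch.float32, device=device)

    with torch.no_grad():
        sparse_embeddings, dense_embeddings = sam_model.prompt_encoder(
            points=None, boxes=box_torch, masks=None
        )
    # ...then run the mask decoder exactly as in the tutorial.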

Have you resampled the 3D volumes?

As the volume size differs among 3D medical images, when I fine-tune SAM on my own dataset I resample the 3D CT volume to a fixed spacing, such as (1, 1, 1), before converting it to 2D slices, but the results do not seem to change. Is it necessary to resample the 3D volume before training or fine-tuning SAM for medical images?
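For reference, a hedged resampling sketch with SimpleITK (not part of the repo); whether it helps is dataset-dependent, since the 2D slices are resized to a fixed size during preprocessing anyway:

    import SimpleITK as sitk

    def resample_to_unit_spacing(image: sitk.Image, is_label: bool = False) -> sitk.Image:
        # Resample a volume to 1x1x1 mm spacing; nearest neighbour for label maps.
        new_spacing = (1.0, 1.0, 1.0)
        old_size, old_spacing = image.GetSize(), image.GetSpacing()
        new_size = [int(round(sz * sp / nsp))
                    for sz, sp, nsp in zip(old_size, old_spacing, new_spacing)]
        interpolator = sitk.sitkNearestNeighbor if is_label else sitk.sitkLinear
        return sitk.Resample(image, new_size, sitk.Transform(), interpolator,
                             image.GetOrigin(), new_spacing, image.GetDirection(),
                             0, image.GetPixelID())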

Fine-Tune SAM with own dataset

I fine-tuned the SAM model based on this project. I prepared my own dataset and saved the ground truth as single-channel images.
I used the "pre_grey_rgb2D.py" script to convert the dataset into npz format. (Note: the images in the dataset have different sizes.)
When I trained the model following "finetune_and_inference_tutorial_2D_dataset.ipynb", I encountered an error while computing the loss in the training loop: "ground truth has different shape (torch.Size([44, 1, 1024, 1024])) from input (torch.Size([44, 1, 256, 256]))".

Did I make any mistakes in the steps? How should I handle this?
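A hedged sketch of one way to resolve the mismatch, assuming the tutorial computes the loss against 256x256 masks (the mask decoder's low-resolution output) and gt_data is a 1024x1024 single-channel ground-truth mask: resize the ground truth to 256x256 with nearest-neighbour interpolation during preprocessing so the shapes agree.

    import numpy as np
    from skimage.transform import resize

    gt_resized = resize(gt_data, (256, 256), order=0, preserve_range=True,
                        anti_aliasing=False).astype(np.uint8)  # nearest-neighbour, keeps 0/1 values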

How to run train.py on multi GPUs?

I tried to " CUDA_VISIBLE_DEVICES=0,1", but it is still run on GPU 0.
I also tried "sam_model = DataParallel(sam_model, device_ids=[0,1])", but it does not work either.
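A hedged sketch (the released train.py appears to be written for a single GPU): CUDA_VISIBLE_DEVICES must be exported before the process starts (e.g. CUDA_VISIBLE_DEVICES=0,1 python train.py ...), and the DataParallel wrapper has to be applied before the training loop; the checkpoint path below is hypothetical.

    import torch

    sam_model = torch.nn.DataParallel(sam_model, device_ids=[0, 1]).cuda()
    # ... training loop as in train.py ...
    # Unwrap before saving so the state-dict keys match the single-GPU model.
    torch.save(sam_model.module.state_dict(), "work_dir/medsam_dataparallel.pth")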

Missing branch in "pre-compute image embedding"

Hi
Thanks for your effort on this great repo!

You've covered these branches:

  • 3 dimensions and 3 channels
  • 2 dimensions (grayscale)

but missed the 3-dimensional, 1-channel (grayscale) case.

Here is the correct version:

for name in tqdm(names[::]):
    image_data = io.imread(join(data_tr_path, 'images', name))
    if image_data.ndim == 3 and image_data.shape[-1] > 3:
        # more than 3 channels (e.g. RGBA): keep the first three
        image_data = image_data[:, :, :3]
    if image_data.ndim == 3 and image_data.shape[-1] == 1:
        # 3-dimensional, single-channel (grayscale): drop the channel axis
        image_data = image_data[:, :, 0]
    if image_data.ndim == 2:
        # grayscale: replicate to 3 channels
        image_data = np.repeat(image_data[:, :, None], 3, axis=-1)

    sam_transform = ResizeLongestSide(sam_model.image_encoder.img_size)
    resize_img = sam_transform.apply_image(image_data)
    ........
    ........
    ........

This is in the finetune_and_inference_tutorial_auto_seg.ipynb notebook.

Cheers

How to pre-process normal dataset

If my dataset's images and masks are in a PASCAL VOC-like format, i.e. .png or .jpg files, how should I convert that kind of dataset?
Thank you very much for your kind help.
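A hedged sketch, loosely following the spirit of pre_grey_rgb2D.py (paths and file layout here are illustrative, not the repo's): read each image/mask pair, collapse the mask to a binary foreground, and save the pair as an .npz so the 2D fine-tuning tutorial can pick it up.

    import os
    from os.path import join

    import numpy as np
    from skimage import io

    img_dir, mask_dir, out_dir = "data/images", "data/masks", "data/npz"  # hypothetical paths
    os.makedirs(out_dir, exist_ok=True)

    for name in sorted(os.listdir(img_dir)):
        image = io.imread(join(img_dir, name))
        mask = io.imread(join(mask_dir, os.path.splitext(name)[0] + ".png"))
        gt = np.uint8(mask > 0)  # collapse class labels to foreground/background
        np.savez_compressed(join(out_dir, os.path.splitext(name)[0] + ".npz"),
                            imgs=image, gts=gt)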

Python Version Mismatch issue

The documentation says to use Python 3.10, but when I run the install in my environment, an error occurs saying my Python version is incompatible. Please look into this. The error is pasted below.

ERROR: Package 'medsam' requires a different Python: 3.8.10 not in '>=3.9'

Error while training

I'm trying out the code with a 2D dataset, as suggested in the documentation, but I'm getting a runtime error: "The size of tensor a (4096) must match the size of tensor b (64) at non-singleton dimension 0".
self.img_embeddings.shape=(456, 256, 64, 64), self.ori_gts.shape=(456, 256, 256)
img_embed.shape=torch.Size([8, 256, 64, 64]), gt2D.shape=torch.Size([8, 1, 256, 256]), bboxes.shape=torch.Size([8, 4])

The tensor shapes seem okay.

License, MIT?

Hey, this is really cool work and the result looks really promising!

There is no license on this; I'm assuming the intent is that it's freely usable everywhere? If so, could you please attach a standard MIT license to the repo? 🙏

Thanks again for the awesome research!

How to preprocess my own 2D X-ray dataset? (jpgs)

I have read the preprocessing scripts in the repo (pre_CT, pre_MR), but they both handle 3D data. I want to test MedSAM on my 2D X-ray rib segmentation dataset, but I don't know how to rewrite the scripts for 2D data. Does anyone have the same issue, or can someone help me?

ValueError: operands could not be broadcast together with shapes (480,600) (256,256)

Hello, while trying to run inference on 2D images of different sizes (which were trained successfully), I found a problem: the model output medsam_seg_prob is 256x256, which is inconsistent with the segmentation position in the original image. I believe that after inference, sam_model.postprocess_masks(medsam_seg, input_image.shape, gt_data.shape) is required, but this leads directly to losing the predicted mask.
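A hedged sketch of one way to map the prediction back, assuming medsam_seg is the 256x256 binary mask from the tutorial and original_shape is the original image height/width (e.g. (480, 600)): resize the mask with nearest-neighbour interpolation before comparing or overlaying it at the original resolution.

    import numpy as np
    from skimage.transform import resize

    seg_full = resize(medsam_seg, original_shape, order=0, preserve_range=True,
                      anti_aliasing=False).astype(np.uint8)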

Question about the data splitting

Hi,

Thanks for sharing a nice work. I have a question about the data splitting.

As mentioned, pre_CT.py is used to split the medical dataset, with 80% for training and 20% for testing.

Also, we need to download the testing dataset from Google Drive.

So, are these two parts different? What is the goal of generating the 20% testing split in pre_CT.py?

Best.

image embeddings and bbox for prompt_encoder

Thanks for sharing and building this repo! I have two questions:

  1. Why use the 3D image itself as image embeddings? Why not use the concatenation of 2D embeddings derived from SAM's own image encoder?

  2. You pass the bbox to the prompt encoder, which presumes the labels are known for any input image. But for lesion detection tasks, there is usually no way to locate the area of all lesions in advance. Is it more practical to fine-tune the model without prompt inputs?

What is the meaning of the gt_data size threshold?

Hello, Dr Ma and Dr Wang!

First of all, thanks for sharing your code and paper!
I have a question about a specific part on line 60 of pre_MR.py:
if np.sum(gt_data)>1000:
Why did you set the threshold to 1000? Does it have any special meaning?

In my case, empty masks and disease masks are mixed in the same nii.gz, so I will change the code to
if np.sum(gt_data)>=0:
Could that be a problem?

Thank you,

Multi-class segmentation?

Hi, does it only support single-class segmentation?
As far as I understood, your code only supports single-class segmentation. Am I correct?

using the network for brain MRI dataset

Hi, I am trying to use the code on a new dataset, BraTS, for brain tumor segmentation. I have run into a problem with the ground truth: it is not binary, it has 4 labels.
Can this model be used for this dataset?
Sincerely,

How do you conduct the evaluation

I am curious about the evaluation step. It requires a bounding box input, so do you input the bounding boxes manually, or do you have a better approach?

The semantic labels for the training npz files

Hi, Thanks for the great work.

It seems the training data is class-agnostic, with only the image and a binary gt-mask provided. Could you please provide the specific lesion/organ label for each training sample?

Image embeddings size

Hi there,

I have been trying to follow your code for custom fine-tuning on 3D images. However, I have some doubts.
In the pre_CT.py file, after the image embedding computation, the image size is (1, 3, 1024, 1024), but when the embeddings are stacked you give the shape as (n, 1, 256, 64, 64), and the same is used in train.py as well.

img_embeddings = np.stack(img_embeddings, axis=0) # (n, 1, 256, 64, 64)

I don't see any other transform being applied to the embeddings before being stacked. What am I missing here?
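A hedged sketch of the shape bookkeeping, assuming sam_model and input_image are set up as in pre_CT.py: (1, 3, 1024, 1024) is the encoder's input, but its output is (1, 256, 64, 64), so stacking n of those outputs with np.stack gives the (n, 1, 256, 64, 64) array that train.py loads; no extra transform is applied.

    import torch

    with torch.no_grad():
        embedding = sam_model.image_encoder(input_image)  # input_image: (1, 3, 1024, 1024)
    print(embedding.shape)  # torch.Size([1, 256, 64, 64])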

About 3D MRI data in precompute_img_embed.py

Thanks for your project. I want to know whether I can do the following:
sam_transform.apply_image needs a (256, 256, 3) image, but my npz 'imgs' are (159, 256, 256, 3), where 159 is the number of slices.

for name in tqdm(npz_files):
img = np.load(join(pre_img_path, name))['imgs'] # (256, 256, 3)
gt = np.load(join(pre_img_path, name))['gts']
#resize_img = sam_transform.apply_image(img)
#fixme:maybe make a loop better
resize_img = sam_transform.apply_image(img)
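A hedged sketch answering the "maybe make a loop better" comment (assuming sam_transform and img are as in the snippet above): apply_image expects a single HxWx3 image, so a (159, 256, 256, 3) stack is resized slice by slice and restacked.

    import numpy as np

    # Resize each of the 159 slices independently, then stack them back together.
    resize_img = np.stack([sam_transform.apply_image(slc) for slc in img], axis=0)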

Possible Issue with CUDA Device and CPU Support in Inference Script

Hello,

I believe there may be an issue with line 45 of the inference script. Specifically, the script is forcing the CUDA device, which may prevent CPU support when passing the argument '--device cpu'. Would it be possible for you to investigate this further?

Thank you.
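A hedged sketch of the usual fix, reusing the registry call that already appears in pre_CT.py: build the device from the CLI argument instead of hard-coding "cuda", so --device cpu actually runs on the CPU.

    import torch

    device = torch.device(args.device)  # "cuda", "cuda:0", or "cpu"
    sam_model = sam_model_registry[args.model_type](checkpoint=args.checkpoint).to(device)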

Runtime error for batch size > 1

Thank you for the great work.

I am getting the following error for batch size > 1.

src = src + dense_prompt_embeddings
RuntimeError: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 0

Could you please tell me how I can fix this error?

I really appreciate your help.

Sincerely,
Mostafij

image encoder

Why not use a pretrained ViT-L model as an image encoder? In the original paper, ViT-L performed better than ViT-B.

No such file or directory: 'work_dir/SAM/sam_vit_b_01ec64.pth'

Traceback (most recent call last):
  File "/xxx/MedSAM/pre_CT.py", line 116, in <module>
    sam_model = sam_model_registry[args.model_type](checkpoint=args.checkpoint).to(args.device)
  File "/xxx/MedSAM/segment_anything/build_sam.py", line 38, in build_sam_vit_b
    return _build_sam(
  File "/xxx/MedSAM/segment_anything/build_sam.py", line 104, in _build_sam
    with open(checkpoint, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'work_dir/SAM/sam_vit_b_01ec64.pth'

I downloaded the missing file from https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth, put it in that location, and it runs normally. I thought SAM downloaded the checkpoint automatically, so I'm not sure why it complained about a missing file when I ran it. Maybe this could be added to the README?
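A minimal sketch for fetching the checkpoint into the path pre_CT.py expects, using the official release URL quoted above:

    import os
    import urllib.request

    os.makedirs("work_dir/SAM", exist_ok=True)
    urllib.request.urlretrieve(
        "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth",
        "work_dir/SAM/sam_vit_b_01ec64.pth",
    )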

Metrics reported in the paper differ from other reports

Hello Dr Ma and Dr Wang ,

Nice to see you again! (I am a participant in the NeurIPS cell segmentation challenge.) Thanks for this wonderful work!
The DSC in your Table 2 for breast ultrasound with the original SAM is 78.01. However, the DSC in another report on breast ultrasound is around 0.4/0.6 (a different setting, and they use some information from the ground truth). In my experiment, without any information from the ground truth, the zero-shot inference DSC I got is around 0.3. Would you mind giving some hints on running SAM inference on a breast ultrasound dataset?

Best regards,
BIzhe

About instance segmentation

I wonder whether MedSAM can only work on one segmentation target class because it uses a box as the prompt; maybe instance segmentation is not possible for now?

I do not know the difference between SAM and MedSAM

In my opinion, the definitions of SAM and MedSAM look the same, e.g.:
ori_sam_model = sam_model_registry[model_type](...).to(device)
sam_model = sam_model_registry[model_type](...).to(device)
so I cannot tell the difference between SAM and MedSAM. I modified the code in 'segment-anything' and got the result shown in the attached picture.
So why is the SAM DSC zero?

Bounding box training

Hey.
As far as I understand your fine-tuning method, you mark a bounding box around the target area and then train the model to segment that region better (in general). But what if we don't know where the bounding box should start? Say we want to find a tumor inside a breast area, but we cannot determine the tumor's real spatial extent. Have you tried doing it automatically?

question about released model

Thank you for your work.
I have a question about the model you released under "Download the model checkpoint (GoogleDrive)".

Is it for a single application scenario with one label_id, for example only label 9? Because in the fine-tuning code, you need to set label_id.

About the implementation of NpzDataset

I found that the NpzDataset in finetune_and_inference_tutorial.py is implemented mostly with numpy, which made it run very slowly on my machine. I changed it to the following tensor-based implementation and got a significant speed increase. At the same time, the DSC on the sample dataset MICCAI FLARE2022 is 0.9008, which is not lower than the result of the original code. I hope you will also try the following code.

import os
from os.path import join

import numpy as np
import torch
from torch.utils.data import Dataset


class NpzDataset(Dataset):
    def __init__(self, data_root, image_size=256):
        self.data_root = data_root
        self.image_size = image_size
        self.npz_files = sorted(os.listdir(self.data_root))
        self.npz_data = [
            np.load(join(data_root, f), allow_pickle=True)
            for f in self.npz_files
        ]
        # Stack all slices from all files along the first dimension.
        self.ori_gts = torch.vstack(
            [torch.from_numpy(d['gts']) for d in self.npz_data])
        self.img_embeddings = torch.vstack(
            [torch.from_numpy(d['img_embeddings']) for d in self.npz_data])
        print(self.ori_gts.shape, self.img_embeddings.shape)

    def __len__(self):
        return self.ori_gts.shape[0]

    def __getitem__(self, index):
        img_embed = self.img_embeddings[index]
        gt2D = self.ori_gts[index]
        # Derive a bounding box from the ground-truth mask and jitter it by up to 20 px.
        y_indices, x_indices = torch.where(gt2D > 0)
        x_min, x_max = torch.min(x_indices), torch.max(x_indices)
        y_min, y_max = torch.min(y_indices), torch.max(y_indices)
        H, W = gt2D.shape
        x_min = max(0, x_min - torch.randint(0, 20, (1,)).item())
        x_max = min(W, x_max + torch.randint(0, 20, (1,)).item())
        y_min = max(0, y_min - torch.randint(0, 20, (1,)).item())
        y_max = min(H, y_max + torch.randint(0, 20, (1,)).item())
        bboxes = torch.tensor([x_min, y_min, x_max, y_max]).float()
        return img_embed.float(), gt2D[None, :, :].long(), bboxes
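A usage sketch for the tensor-based dataset above (the path, batch size, and worker count are illustrative):

    from torch.utils.data import DataLoader

    train_dataset = NpzDataset("data/Tr_emb")  # hypothetical embedding folder
    train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)
    for img_embed, gt2D, bboxes in train_loader:
        pass  # feed each batch to the fine-tuning loop from the tutorial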

Bounding Box prompts during inference

Thank you for the detailed explanation of using the MedSAM model.

I have a dataset where bounding boxes are available during training but not during inference. If I train the model using the bounding boxes and perform inference without them, will I get comparable performance?
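A hedged note with a small sketch: if no boxes are available at inference time, a common fallback is to prompt with the whole-image box, though performance is usually best when the box is close to the target (for example, produced by a separate detector), matching how the model was trained.

    import numpy as np

    H, W = 256, 256                                  # size of the preprocessed image
    full_image_box = np.array([0, 0, W - 1, H - 1])  # x_min, y_min, x_max, y_max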
