bowang-lab / medsam
Segment Anything in Medical Images
Home Page: https://www.nature.com/articles/s41467-024-44824-z
License: Apache License 2.0
In the file "pre_CT.py", the preprocessing function is defined on line 54:
def preprocess_ct(gt_path, nii_path, gt_name, image_name, label_id, image_size, sam_model):
    …
What is label_id?
Please help answer this, thank you.
Hi, first of all, thanks for your work!
However, when trying to reproduce the results on the DRIVE dataset (vessel segmentation), I could not match the results in Table 2 of your paper. Could you kindly provide more details on how to reach a DSC of around 66? My best results are only around 60.
Hi, great work. I noticed that the article does not describe the hardware used. May I ask which GPU you used?
Hi @JunMa11
When I create the npz dataset for a new custom dataset of mine, which is in PNG format, the npz file is not saved in the folder.
tqdm runs and there is no error, but no file is saved.
Can you please help?
Hi ,
When I run preprocessing with pre_CT.py on my own dataset, it returns
0it [00:00, ?it/s]
0it [00:00, ?it/s]
What is the problem?
I don't know what the problem is and need your help. Could I add you on QQ or WeChat to discuss it? Thank you!
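A `0it [00:00, ?it/s]` line from tqdm means the loop was given an empty sequence, so no file was ever processed. A minimal sketch for checking the input folder before preprocessing, assuming the script builds its work list by listing a folder and filtering on a filename suffix. The function name `list_cases` and the `.nii.gz` default are illustrative assumptions, not taken from `pre_CT.py`:

```python
import os


def list_cases(folder, suffix=".nii.gz"):
    """Return the sorted file names in `folder` that end with `suffix`."""
    names = sorted(f for f in os.listdir(folder) if f.endswith(suffix))
    if not names:
        # An empty list here is what makes tqdm print "0it".
        raise FileNotFoundError(
            f"no files ending with {suffix!r} found in {folder!r}"
        )
    return names
```

If this raises, the path or the expected file extension is the first thing to check.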
Hi professor,
Can I just use NpzDataset to read in a .png formatted dataset, or do I need to modify it to fit a .png formatted dataset?
Thank you for your patience and kindness!
Best regards
It really takes a very long time to train MedSAM on all 21 3D segmentation tasks by myself with only one RTX 3090Ti.
I am a beginner in medical imaging, how can I modify the code so that it does not require '-gt' to make predictions?
I am trying to test this out before turning it over to the researchers, and I have been going over the various steps. I was able to successfully run
(medsam) [root@lri-uapps-1 MedSAM]# python utils/precompute_img_embed.py -i /data/train -o /data/Tr_emb
However, the actual training seems to fail because of too many files:
(medsam) [root@lri-uapps-1 MedSAM]# python train.py -i /data/Tr_emb --task_name SAM-ViT-B --num_epochs 1000 --batch_size 8 --lr 1e-5
Traceback (most recent call last):
  File "/usr/local/MedSAM/train.py", line 83, in <module>
    train_dataset = NpzDataset(args.npz_tr_path)
  File "/usr/local/MedSAM/train.py", line 24, in __init__
    self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
  File "/usr/local/MedSAM/train.py", line 24, in <listcomp>
    self.npz_data = [np.load(join(data_root, f)) for f in self.npz_files]
  File "/usr/local/anaconda3/envs/medsam/lib/python3.10/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
OSError: [Errno 24] Too many open files: '/data/Tr_emb/Tr_000000990.npz'
(medsam) [root@lri-uapps-1 MedSAM]# ls /data/Tr_emb/ | wc -l
161857
Could this be a numpy error perhaps?
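For context, `np.load` on a `.npz` archive returns a lazy `NpzFile` object that keeps its file handle open until it is closed, so holding one per file in a list can exhaust the per-process file descriptor limit. A minimal sketch of one workaround, which copies the arrays out and closes each file immediately. The helper name `load_npz_arrays` is hypothetical, and the keys `gts` and `img_embeddings` are the ones used by the `NpzDataset` code quoted elsewhere on this page:

```python
import os

import numpy as np


def load_npz_arrays(data_root, keys=("gts", "img_embeddings")):
    """Read the requested arrays from every .npz file, closing each file at once."""
    loaded = []
    for name in sorted(os.listdir(data_root)):
        if not name.endswith(".npz"):
            continue
        # The context manager releases the file handle once the arrays are read.
        with np.load(os.path.join(data_root, name)) as data:
            loaded.append({k: data[k] for k in keys})
    return loaded
```

Note that this keeps every array in memory, which may not be feasible for 161857 files; loading one file per `__getitem__` call, or raising the limit with `ulimit -n`, are alternatives.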
Firstly, thanks for the great work!
I am trying to fine-tune on my custom dataset. It contains several 2D images, each showing a number of cells. I have a ground truth mask stored as an ndarray, in which EACH cell is represented by a distinct positive int.
So in my ground truth array, 0 is the background and the different positive ints indicate different cells.
I have read your demo code for 2D image preprocessing: pre_grey_rgb2D.py
However, if I am not mistaken, since your demo dataset has only one mask per image, your code is designed for a binary ground truth rather than multiple objects in one image.
I am trying to modify the code you provided to handle multiple objects. I am able to save my ground truth unchanged into gt_data, since it is already in the form I expect. I would like to ask how to modify the
finetune_and_inference_tutorial_2D_dataset.ipynb
file to produce multiple bounding boxes instead of one. Where exactly should I modify it? Thank you.
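A minimal sketch of deriving one box per object from such an instance mask, assuming 0 is background and each positive integer marks one cell as described above. The function name is hypothetical and this is not the notebook's own code; the box order `(x_min, y_min, x_max, y_max)` follows the `NpzDataset` snippet quoted elsewhere on this page:

```python
import numpy as np


def boxes_from_instance_mask(mask):
    """Return {label: (x_min, y_min, x_max, y_max)} for every positive label."""
    boxes = {}
    for label in np.unique(mask):
        if label == 0:  # 0 is background
            continue
        ys, xs = np.where(mask == label)
        boxes[int(label)] = (
            int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
        )
    return boxes
```

Each box can then be paired with the binary mask `mask == label`, so that every object becomes its own training sample.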
Hey,
I just wanted to let you know that I integrated MedSam already into my Napari SAM plugin: https://github.com/MIC-DKFZ/napari-sam
So you can check the mark on "3D slicer and napari support" on your todo list if you want ;)
Best,
Karol
Thank you so much for this great work. I notice that the image size in your example is 256×256. Since my images are 512×512, how can I modify the parameters so that these images fit the SAM checkpoint?
Good morning. I have been implementing MedSAM on my data and I have run into a confusion with inference. When we have ground truths, everything runs well. However, when looking at the inference scripts provided, they all seem to require a ground truth, as that is used to generate a bounding box which is then fed into the segmenter as a prompt. Is there a script demonstrating inference on new data that I missed or an obvious work-around?
Thank you,
Chris
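The box prompt does not have to come from a ground truth mask; it can equally be drawn by a user or produced by a detector. A minimal sketch of rescaling such a box to the model's input resolution, assuming the image is resized to a square input without padding. The function name `scale_box` and the default `target_size=1024` are assumptions for illustration (1024×1024 inputs are mentioned elsewhere on this page):

```python
import numpy as np


def scale_box(box_xyxy, orig_hw, target_size=1024):
    """Rescale an [x_min, y_min, x_max, y_max] box from image to model coordinates."""
    h, w = orig_hw
    box = np.asarray(box_xyxy, dtype=np.float64)
    # x coordinates scale with the width, y coordinates with the height.
    return box / np.array([w, h, w, h], dtype=np.float64) * target_size
```

For example, `scale_box([10, 20, 110, 220], orig_hw=(400, 200))` maps a box drawn on a 400×200 (height×width) image into 1024×1024 coordinates.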
As the volume size differs among 3D medical images, when I fine-tune SAM on my own dataset I resample each 3D CT volume to a fixed spacing, such as (1, 1, 1), before converting it to 2D slices, but this does not seem to change the result. Is it necessary to resample the 3D volume before training or fine-tuning SAM for medical images?
I fine-tuned the SAM model based on this project. I prepared my own dataset and saved the ground truth as single-channel images.
I used the "pre_grey_rgb2D.py" script to convert the dataset into npz format. (Note: the images in the dataset have different sizes.)
When I trained the model following "finetune_and_inference_tutorial_2D_dataset.ipynb", I encountered an error while computing the loss in the training loop: "ground truth has different shape (torch.Size([44, 1, 1024, 1024])) from input (torch.Size([44, 1, 256, 256]))".
Did I make a mistake in any of the steps? How should I handle this?
I tried setting "CUDA_VISIBLE_DEVICES=0,1", but training still runs on GPU 0 only.
I also tried "sam_model = DataParallel(sam_model, device_ids=[0,1])", but that does not work either.
Hi
Thanks for your effort on this great repo!
You've covered these branches:
Here is the correct version:
for name in tqdm(names[::]):
    image_data = io.imread(join(data_tr_path, 'images', name))
    if image_data.ndim == 3 and image_data.shape[-1]>3:
        image_data = image_data[:,:,:3]
    if image_data.ndim == 2:
        image_data = np.repeat(image_data[:,:,None], 3, axis=-1)
    sam_transform = ResizeLongestSide(sam_model.image_encoder.img_size)
    resize_img = sam_transform.apply_image(image_data)
    ........
    ........
    ........
This is in the finetune_and_inference_tutorial_auto_seg.ipynb notebook.
Cheers
If my dataset's images and masks are like Pascal VOC, i.e. .png or .jpg files, how should I convert that kind of dataset?
Thank you very much for your kind help.
The documentation says to use Python 3.10, but when I run the install in my environment it fails with a Python version error. Please look at this. The error is pasted below.
ERROR: Package 'medsam' requires a different Python: 3.8.10 not in '>=3.9'
I'm trying out the code with a 2D dataset, as suggested in the documentation, but I'm getting a runtime error: "The size of tensor a (4096) must match the size of tensor b (64) at non-singleton dimension 0".
self.img_embeddings.shape=(456, 256, 64, 64), self.ori_gts.shape=(456, 256, 256)
img_embed.shape=torch.Size([8, 256, 64, 64]), gt2D.shape=torch.Size([8, 1, 256, 256]), bboxes.shape=torch.Size([8, 4])
The tensor shapes seem okay.
Hey, this is really cool work and the result looks really promising!
There is no license on this, I'm assuming the intent is that its freely usable everywhere? If so, could you please attach a standard MIT license to the repo? 🙏
Thanks again for the awesome research!
I have read the preprocessing scripts in the directory (pre_CT, pre_MR), but both are meant for 3D data. I want to test MedSAM on my 2D X-ray rib segmentation dataset, but I don't know how to rewrite the script to fit 2D data. Does anyone have the same issue, or can someone help me?
Hello, while running inference on 2D images of different sizes (with a model that trained successfully), I found that the model output medsam_seg_prob is 256 × 256, so the predicted segmentation does not line up with the original image. I believe that "sam_model.postprocess_masks(medsam_seg, input_image.shape, gt_data.shape)" is required after inference, but calling it causes the predicted mask to be lost entirely.
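One generic way to bring a fixed-size prediction back to the original image grid is nearest-neighbour resizing of the thresholded mask. The sketch below is a numpy stand-in and not the repository's `postprocess_masks`; it assumes the image was resized directly to the model resolution, with no padding, and the function name is hypothetical:

```python
import numpy as np


def resize_mask_nearest(mask, out_hw):
    """Resize a 2D label mask to `out_hw` with nearest-neighbour sampling."""
    in_h, in_w = mask.shape
    out_h, out_w = out_hw
    # Map each output pixel to the source pixel that covers it.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return mask[rows[:, None], cols[None, :]]
```

Nearest-neighbour sampling keeps the values binary, which interpolating the thresholded mask with a smoother kernel would not.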
Hi,
Thanks for sharing a nice work. I have a question about the data splitting.
As mentioned, pre_CT.py splits the medical dataset into 80% for training and 20% for testing.
We also need to download the testing dataset from GoogleDrive.
So, are these two test sets different? What is the goal of generating the 20% testing data in pre_CT.py?
Best.
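For readers unfamiliar with the 80/20 split mentioned above, a reproducible split over case names can be sketched as follows. This is an illustration under assumptions (seeded shuffle, split by file name, hypothetical function name) and not necessarily how `pre_CT.py` implements it:

```python
import random


def split_names(names, train_fraction=0.8, seed=0):
    """Shuffle case names with a fixed seed and split them into train and test lists."""
    names = sorted(names)  # sort first so the result does not depend on input order
    rng = random.Random(seed)
    rng.shuffle(names)
    n_train = int(len(names) * train_fraction)
    return names[:n_train], names[n_train:]
```

Splitting by case rather than by slice keeps all slices of one volume on the same side of the split.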
Thanks for sharing and building this repo! I got two questions:
Why use the 3D image itself as image embeddings? Why not use the concatenation of 2D embeddings derived from SAM's own image encoder?
You passed the bbox to the prompt encoder providing that you know the labels for any input image. But for lesion detection tasks, there's usually no way we can somehow locate the area of all lesions. Is it more practical to fine-tune the model without prompt inputs?
Hello, Dr Ma and Dr Wang!
First of all, thanks for sharing your code and paper!
I have a question about a specific part, line 60 of pre_MR.py:
if np.sum(gt_data)>1000:
Why did you set 1000? Does it have any special meaning?
In my case, empty masks and disease masks are mixed in the same nii.gz, so I will change the code to
if np.sum(gt_data)>=0:
Could that be a problem?
Thank you,
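Two things are worth noting about the conditions quoted above: `np.sum(gt_data) >= 0` is true for any mask without negative values, so it filters nothing, and `np.sum` adds up label values rather than counting pixels when labels are larger than 1. A minimal sketch of an explicit per-slice filter, assuming `gt_volume` is a 3D array with slices along axis 0 (the function name and the `min_pixels` parameter are hypothetical):

```python
import numpy as np


def slices_with_foreground(gt_volume, min_pixels=1):
    """Return indices of slices (axis 0) with at least `min_pixels` labelled pixels."""
    flat = gt_volume.reshape(gt_volume.shape[0], -1)
    counts = np.count_nonzero(flat, axis=1)
    return np.flatnonzero(counts >= min_pixels).tolist()
```

With `min_pixels=0` every slice is kept, including empty ones, which matters if a later step computes a bounding box from the foreground pixels.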
Hello, can this model be used for the vocdevkit dataset?
How to convert PNG images and masks to NPZ format?
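A minimal sketch of packing already-loaded image and mask arrays into one `.npz` file, using the keys `imgs` and `gts` that appear in code quoted elsewhere on this page. Reading the PNG files themselves is left to an image library (for example `io.imread`, as used in another snippet here). The function name, the binarisation of the masks, and the requirement that all images share one size are assumptions:

```python
import numpy as np


def save_pairs_as_npz(images, masks, out_path):
    """Stack same-sized images and masks and write them to one compressed .npz file."""
    imgs = np.stack([np.asarray(im, dtype=np.uint8) for im in images], axis=0)
    # Assumption: any non-zero mask pixel is foreground.
    gts = np.stack([(np.asarray(m) > 0).astype(np.uint8) for m in masks], axis=0)
    if imgs.shape[:3] != gts.shape:
        raise ValueError(
            f"image stack {imgs.shape} and mask stack {gts.shape} do not align"
        )
    np.savez_compressed(out_path, imgs=imgs, gts=gts)
    return imgs.shape, gts.shape
```

Images of different sizes would need to be resized to a common shape before stacking.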
Hi, does it only support single-class segmentation?
As far as I understood, your code only supports single-class segmentation. Am I correct?
Hi, I am trying to use the code on a new dataset, BraTS, for brain tumor segmentation. I ran into a problem with the ground truth: it is not binary, it has 4 labels.
Can this model be used for this dataset?
Sincerely,
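One common way to feed a multi-label ground truth into a pipeline that expects binary masks is to select one label value at a time. The sketch below borrows the parameter name `label_id` from the `preprocess_ct` signature quoted earlier on this page; whether the repository's scripts binarise in exactly this way is an assumption:

```python
import numpy as np


def binary_mask_for_label(gt, label_id):
    """Return a uint8 mask that is 1 where `gt` equals `label_id` and 0 elsewhere."""
    return (np.asarray(gt) == label_id).astype(np.uint8)
```

Running this once per label value yields one binary segmentation task per class.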
I am curious about the evaluation step. It requires the bounding box input, so do you guys input the bounding box manually or do you use a better idea?
Hi, Thanks for the great work.
It seems the training data is class-agnostic, with only image and binary gt-mask provided. Could you please provide specific lesion/organ label for each training sample?
Hi there,
I have been trying to follow your code for my own custom fine tuning on 3d images. However, I have some doubts.
So, in the pre_CT.py file, after the image embedding computation, the image size is (1,3,1024,1024), but when the embeddings are being stacked, you have mentioned the shape as (n, 1, 256, 64, 64) and the same has been used in the train.py as well.
img_embeddings = np.stack(img_embeddings, axis=0) # (n, 1, 256, 64, 64)
I don't see any other transform being applied to the embeddings before being stacked. What am I missing here?
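As far as the SAM ViT architecture goes, the shape in that comment describes the encoder output rather than the input image: a (1, 3, 1024, 1024) input yields a (1, 256, 64, 64) embedding, and stacking n of those adds a leading axis. The stacking step alone can be checked with placeholder arrays (the zeros below stand in for real embeddings):

```python
import numpy as np

n_slices = 3
# Placeholder per-slice embeddings with the shape the image encoder returns.
per_slice = [np.zeros((1, 256, 64, 64), dtype=np.float32) for _ in range(n_slices)]
stacked = np.stack(per_slice, axis=0)
print(stacked.shape)  # (3, 1, 256, 64, 64)
```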
Thanks for your project. I want to know what I can try here:
sam_transform.apply_image needs an input of shape (256, 256, 3), but the imgs array in my npz file has shape (159, 256, 256, 3), where 159 is the number of slices.
for name in tqdm(npz_files):
    img = np.load(join(pre_img_path, name))['imgs'] # (256, 256, 3)
    gt = np.load(join(pre_img_path, name))['gts']
    #resize_img = sam_transform.apply_image(img)
    #fixme: maybe a loop would be better
    resize_img = sam_transform.apply_image(img)
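A minimal sketch of the loop hinted at in the `fixme` comment: apply a 2D transform to each slice and stack the results again. The helper name `apply_per_slice` is hypothetical, and the transform is passed in as a plain callable so the sketch does not depend on any particular library:

```python
import numpy as np


def apply_per_slice(volume, transform):
    """Apply a 2D image transform to every slice of an (n, H, W, C) array."""
    return np.stack([transform(slice_2d) for slice_2d in volume], axis=0)
```

In the snippet above this would be called as `apply_per_slice(img, sam_transform.apply_image)`, assuming `apply_image` returns arrays of equal shape for equally sized slices.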
Hi,
Changing 'cuda:0' to CPU does not work to run the MedSAM_Inference.py.
Hello,
I believe there may be an issue with line 45 of the inference script. Specifically, the script is forcing the CUDA device, which may prevent CPU support when passing the argument '--device cpu'. Would it be possible for you to investigate this further?
Thank you.
Thank you for the great work.
I am getting the following error for batch size > 1.
src = src + dense_prompt_embeddings
RuntimeError: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 0
Could you please tell me how can I fix this error?
I really appreciate your help.
Sincerely,
Mostafij
Why not use a pretrained ViT-L model as an image encoder? In the original paper, ViT-L performed better than ViT-B.
Traceback (most recent call last):
  File "/xxx/MedSAM/pre_CT.py", line 116, in <module>
    sam_model = sam_model_registry[args.model_type](checkpoint=args.checkpoint).to(args.device)
  File "/xxx/MedSAM/segment_anything/build_sam.py", line 38, in build_sam_vit_b
    return _build_sam(
  File "/xxx/MedSAM/segment_anything/build_sam.py", line 104, in _build_sam
    with open(checkpoint, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'work_dir/SAM/sam_vit_b_01ec64.pth'
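I downloaded the missing file from https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth and put it in place, after which everything runs fine. I thought SAM downloads the checkpoint automatically, so I am not sure why it reported a missing file when I ran it. Maybe this could be added to the README?
How do I convert a COCO dataset to the required format?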
Hello Dr Ma and Dr Wang ,
Nice to see you again! (I was a participant in NIPS-cell-seg.) Thanks for this wonderful work!
The DICE you report in Table 2 for the original SAM on breast ultrasound is 78.01. However, the DICE reported in this report (breast ultrasound) is around 0.4/0.6 (with different settings, and they use some information from the ground truth). In my experiment, without any information from the ground truth, the zero-shot inference DICE I got is around 0.3. Would you mind giving some hints on running SAM inference on the breast ultrasound dataset?
Best regards,
BIzhe
I wonder whether MedSAM can only work on one class of segmentation target because it uses a box as the prompt, and whether instance segmentation is not possible for now?
In my opinion, the definitions of SAM and MedSAM are the same, for example:
ori_sam_model = sam_model_registry[model_type](…).to(device)
sam_model = sam_model_registry[model_type](…).to(device)
so I cannot tell what the difference between SAM and MedSAM is.
I modified the code in 'segment-anything', and it produced this picture.
So why is the SAM DSC zero?
Hey.
As far as I understood your fine-tuning method, you marked a bounding box around your target area and then trained the model to segment that region better(In general). But if we don't know the bounding box starting point? Let's say we want to find a tumor inside a breast area, but we cannot decide where is the real tumor's spatial area. Have you tried to do it automatically?
Thanks for your work.
I have a question about the model you released under "Download the model checkpoint (GoogleDrive)".
Is it for a single application scenario with one label_id, for example only 9? I ask because the fine-tuning code requires setting label_id.
I found that the NpzDataset in finetune_and_inference_tutorial.py is mostly implemented using numpy, which caused this code to run very slowly on my machine. I changed it to the following code implemented using tensor, and got a significant speed increase. At the same time The DSC on the sample data set MICCAI FLARE2022 is 0.9008, which is not lower than the result of the original code. I hope you can also try the following code.
import os
from os.path import join

import numpy as np
import torch
from torch.utils.data import Dataset


class NpzDataset(Dataset):
    def __init__(self, data_root, image_size=256):
        self.data_root = data_root
        self.image_size = image_size
        self.npz_files = sorted(os.listdir(self.data_root))
        self.npz_data = [
            np.load(join(data_root, f), allow_pickle=True)
            for f in self.npz_files
        ]
        # Stack everything once as tensors so __getitem__ does no numpy work.
        self.ori_gts = torch.vstack(
            [torch.from_numpy(d['gts']) for d in self.npz_data])
        self.img_embeddings = torch.vstack(
            [torch.from_numpy(d['img_embeddings']) for d in self.npz_data])
        print(self.ori_gts.shape, self.img_embeddings.shape)

    def __len__(self):
        return self.ori_gts.shape[0]

    def __getitem__(self, index):
        img_embed = self.img_embeddings[index]
        gt2D = self.ori_gts[index]
        # Tight bounding box around the foreground pixels.
        y_indices, x_indices = torch.where(gt2D > 0)
        x_min, x_max = torch.min(x_indices), torch.max(x_indices)
        y_min, y_max = torch.min(y_indices), torch.max(y_indices)
        # Enlarge each side by a random 0 to 19 pixels, clipped to the image.
        H, W = gt2D.shape
        x_min = max(0, x_min - torch.randint(0, 20, (1, )).item())
        x_max = min(W, x_max + torch.randint(0, 20, (1, )).item())
        y_min = max(0, y_min - torch.randint(0, 20, (1, )).item())
        y_max = min(H, y_max + torch.randint(0, 20, (1, )).item())
        bboxes = torch.tensor([x_min, y_min, x_max, y_max]).float()
        return img_embed.float(), gt2D[None, :, :].long(), bboxes
Thank you for the detailed explanation of using the MedSAM model.
I have a dataset where bounding boxes are available during training but not during inference. If I train the model using the bounding boxes and perform inference without them, will I get comparable performance?