
nyukat / gmic

161 stars · 10 watchers · 49 forks · 492.53 MB

An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization

Home Page: https://doi.org/10.1016/j.media.2020.101908

License: GNU Affero General Public License v3.0

Shell 0.55% · Python 39.51% · Jupyter Notebook 59.95%
breast-cancer medical-imaging deep-learning pytorch breast-cancer-diagnosis breast-cancer-screening

gmic's People

Contributors

jamesjjcondon, kilj4eden, seyiqi


gmic's Issues

Training

Hi there,

Thanks for sharing the code. I have a question about the training part.
Since GMIC can be trained end-to-end, f_g and f_l are updated simultaneously during training. But at an early stage, the saliency maps could make mistakes and cause the retrieve_roi function to extract incorrect patches (e.g., background). Would that affect the convergence of the local module?
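For context, my mental model of the patch retrieval, as a rough sketch (top-k selection on the saliency map; the repo's retrieve_roi uses a greedy criterion that may differ):

```python
# Rough sketch, not the repo's exact retrieve_roi: take the K highest
# saliency-map activations and map them back to full-resolution corners.
import torch

def retrieve_topk_rois(saliency_map, k=6, scale=16):
    """saliency_map: (H, W) tensor; returns k (y, x) patch corners."""
    _, w = saliency_map.shape
    topk = torch.topk(saliency_map.flatten(), k).indices
    # Integer index selection: no gradient flows back through this step.
    return [((i // w).item() * scale, (i % w).item() * scale) for i in topk]
```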

Cheers

Fine tuning

Hi Yiqiu,

I am considering fine-tuning the models in this repo on a set of mammograms with image-level labels (normal and malignant only). I would appreciate your kind advice on the following concerns.

  1. To fine-tune the pretrained model, what loss function should I use? Should I use the binary cross-entropy between the predicted output (line 183 of run_model.py) and the ground-truth image label, or should I use the loss function given in Eq. (13) of your paper to re-train the model?

  2. Is it reasonable to use the predicted benign probability, generated at line 183 of run_model.py, as the probability of a normal case when fine-tuning the model?

  3. For image pre-processing, which method should be applied: a) for each training image, subtract its own mean and divide by its own std; b) for each mini-batch, subtract the mean and divide by the std computed over all images in that mini-batch; or c) for the whole training set, subtract the mean and divide by the std computed over all training images? (A sketch of option (a) follows below.)
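A minimal sketch of option (a), per-image standardization, which is a common default for mammography pipelines (I am not asserting this is what GMIC used):

```python
# Option (a): standardize each image by its own mean and std.
import numpy as np

def standardize_image(img):
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)  # epsilon avoids /0
```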

Thanks for your time; I look forward to your feedback.

Batch Size

Hi, first of all, congratulations for your work!
While reading the code and the paper, I couldn't find the batch size you used during training. Could you share that information?
Moreover, the code you are sharing here appears to be an inference-only version. I would like to fine-tune the model on images from a private dataset, so it would be very helpful if you could share the training version you used. Furthermore, it would be nice if this repo were self-contained: to run inference on your own data, you currently have to preprocess the images (cropping and reshaping) separately using the 'breast_cancer_classifier' repo's code.
Thank you!

How to run GMIC on CBIS-DDSM?

Hi. I want to use GMIC to predict the benign/malignant labels of CBIS-DDSM images. This repository provides five models pretrained on NYUBCS, and a run.sh script. To predict, I simply replaced the four NYUBCS samples (16 images) in sample_data/images/ with four CBIS-DDSM samples and then executed run.sh. However, sample_output/predictions.csv gave small probability values, mostly < 0.1, so it is hard to tell benign from malignant. Which part went wrong?

I read the paper, which mentions: "To preprocess mammography images in CBIS-DDSM, we first found the largest connected component containing only non-zero pixels to locate the breast. We then applied erosion and dilation to refine the breast margin. Lastly, we re-oriented all mammography images so that the breasts are always on the left side of the image. All images are resized to 2944 × 1920 pixels and pixel values were normalized to the range [0,1]." So there clearly are some preprocessing steps. Are they included in run.sh? If not, where can I obtain the code, or how should I implement these steps?
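My attempt at implementing those steps, as a sketch (OpenCV usage is my assumption; the authors' actual implementation may differ):

```python
# Sketch of the CBIS-DDSM preprocessing described in the paper.
import cv2
import numpy as np

def preprocess_cbis(img):
    # 1. Largest connected component of non-zero pixels -> breast mask.
    mask = (img > 0).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])  # skip background
    mask = (labels == largest).astype(np.uint8)
    # 2. Erosion then dilation to refine the breast margin.
    kernel = np.ones((15, 15), np.uint8)
    mask = cv2.dilate(cv2.erode(mask, kernel), kernel)
    img = img * mask
    # 3. Orient the breast to the left: flip if the right half is brighter.
    mid = img.shape[1] // 2
    if img[:, mid:].sum() > img[:, :mid].sum():
        img = np.fliplr(img)
    # 4. Resize to 2944 x 1920 (H x W) and normalize to [0, 1].
    img = cv2.resize(img.astype(np.float32), (1920, 2944))
    return (img - img.min()) / (img.max() - img.min() + 1e-8)
```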

Thank you.

Missing parameter?

When loading models I get:

_IncompatibleKeys(missing_keys=[], unexpected_keys=['shared_rep_filter.weight'])

It looks like a 256×256×4×4 weight (a 4×4 conv?) that is not implemented in the repo. Any chance this can be added?
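In the meantime, a workaround sketch (model and model_path are placeholders; this simply drops the extra key rather than using it):

```python
# Remove the unexpected key before loading the state dict.
import torch

state = torch.load(model_path, map_location="cpu")
state.pop("shared_rep_filter.weight", None)
model.load_state_dict(state)  # should now load without complaints
```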

Does this model need pixel-level segmentation masks of malignant and benign lesions?

Hi, according to your paper, it seems that training and inference with this model only require image-level labels, with no need for annotations of malignant and benign lesions. However, judging from the code in this repo, segmentation paths for malignant and benign lesions are required to run it. If I don't have segmentation masks for malignant and benign lesions, how can I train and test your model on my own images? I await and appreciate your response.

ROI

Hi, I am wondering whether there is any gradient flow through the retrieve_roi process?

The reason for setting the classification up as a 2-class multi-label problem

Hi, thank you very much for sharing your work.
I am new to mammography classification.
I have read several papers, and some of them treat this as a 2-class classification problem.
May I know why this paper uses a multi-label formulation to deal with it?
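For what it's worth, my understanding of the distinction, as a sketch: the 2-class multi-label setup gives benign and malignant independent sigmoid probabilities (a breast can contain both findings at once), instead of one softmax over two mutually exclusive classes:

```python
# 2-class multi-label: each finding gets an independent probability.
import torch
import torch.nn as nn

logits = torch.randn(8, 2)                # (batch, [benign, malignant])
targets = torch.tensor([[1.0, 1.0]] * 8)  # both findings may co-occur
loss = nn.BCEWithLogitsLoss()(logits, targets)
probs = torch.sigmoid(logits)             # per label; need not sum to 1
```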

Thank you very much.

Potential issue with dataloading flips

Hi,

Thanks a lot for the nice codebase. I'm trying to run inference on my own dataset and I'm seeing poor performance. I see that in the run_model script, horizontal_flip is always set to False (a boolean), but in flip_image it is checked against a string ('YES' or 'NO'). Is this intended behaviour?
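The mismatch in question, as a minimal sketch:

```python
# A boolean False never equals the string 'YES', so the flip branch can
# never trigger, and never fails loudly either.
horizontal_flip = False           # set as a boolean in run_model
if horizontal_flip == 'YES':      # compared as a string in flip_image
    print("flipping")             # unreachable as written
```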

Request for help regarding how to implement the article titled: GMIC

Greetings,

My name is Mohsen Rostami; I am a final-semester computer engineering student majoring in artificial intelligence.

Sorry to take up your time; let me give a brief explanation.

Out of great interest in artificial intelligence, I have been spending time studying and reviewing the paper you designed. Unfortunately, because my scientific level is still introductory, I have not yet managed to find the dataset used in your paper or to work out how to implement it.
I would therefore ask you to guide me on how to obtain and download the datasets introduced in the paper. Please also explain how the images are labeled, and point me to the code that links this dataset to the main program and loads it, so that I can learn something useful from you.
Thank you for your time.

Tool for running the code

I want to reproduce the results of your paper. I have tried Google Colab, but there are lots of issues. Could you please tell me which tool you used for the implementation?

Potential issue in patch map display

First, congratulations on the great project!
I tried to run the run.sh file with the --visualization-flag on; the resulting patch maps were always aligned to the left border of the image and did not correspond to the activated regions in the heat maps.

[screenshots: patch maps pinned to the left border of each image]

I haven't figured out whether this is a problem only in the visualization or whether it also affects the prediction accuracy.
Thank you very much!

How to interpret GMIC's prediction result on CBIS-DDSM ?

Hi. We tried to reproduce the results described in the GMIC paper on CBIS-DDSM, but did not seem to succeed. Here is what we did:

  • We downloaded CBIS-DDSM from https://wiki.cancerimagingarchive.net/display/Public/CBIS-DDSM, along with the four csv files (Mass/Calc, Training/Test)
  • The paper mentions that GMIC was evaluated on only a subset of CBIS-DDSM, which contains 188 exams defined by Shen et al. We identified and extracted this subset.
  • The sample_data/images directory contains 4 exams, each of which includes the 4 original mammography images (L-CC, L-MLO, R-CC, R-MLO). Specifically, 0_R-CC, 0_R-MLO, 2_R-CC, 2_R-MLO have a benign_label of 1; 1_R-CC, 1_R-MLO, 3_L-CC, 3_L-MLO have a malignant_label of 1. To match this configuration, we selected four exams from the 188-exam subset with the same label layout. The selected exams were P_02409, P_00146, P_01678, P_01669; the images were in DICOM format.
  • We used the Python code snippet described in the metarepository's README to convert DICOM to PNG, with the bitdepth parameter set to 16 (a sketch of this conversion appears at the end of this issue). https://github.com/nyukat/mammography_metarepository#images
  • After the DICOM-to-PNG conversion, we replaced the corresponding PNG files in sample_data/images with the converted PNG files of the four selected CBIS-DDSM exams:
    • 0_L-CC: Unaltered
    • 0_L-MLO: Unaltered
    • 0_R-CC: Replaced by CBIS-DDSM-All-doiJNLP-zzWs5zfZ/CBIS-DDSM/Calc-Training_P_02409_RIGHT_CC/08-07-2016-DDSM-41108/1.000000-full mammogram images-67359/1-1.png
    • 0_R-MLO: Replaced by CBIS-DDSM-All-doiJNLP-zzWs5zfZ/CBIS-DDSM/Calc-Training_P_02409_RIGHT_MLO/08-07-2016-DDSM-46691/1.000000-full mammogram images-54510/1-1.png
    • 1_L-CC: Unaltered
    • 1_L-MLO: Unaltered
    • 1_R-CC: Replaced by P_00146 CBIS-DDSM-All-doiJNLP-zzWs5zfZ/CBIS-DDSM/Mass-Training_P_00146_RIGHT_CC/07-20-2016-DDSM-61365/1.000000-full mammogram images-07790/1-1.png
    • 1_R-MLO: Replaced by P_00146 CBIS-DDSM-All-doiJNLP-zzWs5zfZ/CBIS-DDSM/Mass-Training_P_00146_RIGHT_MLO/07-20-2016-DDSM-90212/1.000000-full mammogram images-33341/1-1.png
    • 2_L-CC: Unaltered
    • 2_L-MLO: Unaltered
    • 2_R-CC: Replaced by P_01678 CBIS-DDSM-All-doiJNLP-zzWs5zfZ/CBIS-DDSM/Calc-Training_P_01678_RIGHT_CC/08-07-2016-DDSM-63063/1.000000-full mammogram images-39590/1-1.png
    • 2_R-MLO: Replaced by P_01678 CBIS-DDSM-All-doiJNLP-zzWs5zfZ/CBIS-DDSM/Calc-Training_P_01678_RIGHT_MLO/08-07-2016-DDSM-33342/1.000000-full mammogram images-59283/1-1.png
    • 3_L-CC: Replaced by P_01669 CBIS-DDSM-All-doiJNLP-zzWs5zfZ/CBIS-DDSM/Mass-Training_P_01669_LEFT_CC/07-20-2016-DDSM-68732/1.000000-full mammogram images-80465/1-1.png
    • 3_L-MLO: Replaced by P_01669 CBIS-DDSM-All-doiJNLP-zzWs5zfZ/CBIS-DDSM/Mass-Training_P_01669_LEFT_MLO/07-20-2016-DDSM-14752/1.000000-full mammogram images-57568/1-1.png
    • 3_R-CC: Unaltered
    • 3_R-MLO: Unaltered.
      Note that eight files remained unaltered because their benign_label and malignant_label are both 0, and CBIS-DDSM has no normal images to substitute. Here is a snapshot of the 16 input images: https://freeimage.host/i/irlidu
  • We executed run.sh, and then got the output predictions.csv:
image_index benign_pred malignant_pred benign_label malignant_label
0_L-CC 0.1356 0.0081 0 0
0_R-CC 0.1747 0.0323 1 0
0_L-MLO 0.2368 0.0335 0 0
0_R-MLO 0.0696 0.0104 1 0
1_L-CC 0.0508 0.0144 0 0
1_R-CC 0.0515 0.0087 0 1
1_L-MLO 0.0545 0.0154 0 0
1_R-MLO 0.1115 0.0149 0 1
2_L-CC 0.0746 0.0160 0 0
2_R-CC 0.0809 0.0228 1 0
2_L-MLO 0.0953 0.0086 0 0
2_R-MLO 0.1155 0.0168 1 0
3_L-CC 0.2134 0.0407 0 1
3_R-CC 0.2945 0.2116 0 0
3_L-MLO 0.1639 0.0165 0 1
3_R-MLO 0.0722 0.0303 0 0
  • We were confused by the above result. The eight CBIS-DDSM-substituted images had very low probability values for both benign_pred and malignant_pred. For instance,
    • 0_R-CC and 0_R-MLO have a benign_label of 1, but their benign_pred values are just 0.1747 and 0.0696.
    • 3_L-CC and 3_L-MLO have a malignant_label of 1, but their malignant_pred values are just 0.0407 and 0.0165.

We wonder which part went wrong.

The five pretrained models provided in the models directory were trained on the NYUBCS dataset, which is proprietary and thus unavailable to us. Do we have to retrain GMIC on CBIS-DDSM in order to get good results on CBIS-DDSM? If so, how should we perform the retraining, and where can we find the code for it?
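For reference, the DICOM-to-PNG conversion we followed matches this pattern, as a sketch (pydicom/imageio usage is my assumption; this is not the exact metarepository snippet):

```python
# 16-bit DICOM -> PNG: rescale pixel data to the full uint16 range.
import imageio
import numpy as np
import pydicom

def dicom_to_png16(dicom_path, png_path):
    pixels = pydicom.dcmread(dicom_path).pixel_array.astype(np.float64)
    pixels = (pixels - pixels.min()) / (pixels.max() - pixels.min() + 1e-8)
    imageio.imwrite(png_path, (pixels * 65535).astype(np.uint16))
```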

Thank you.

GMIC- Segmentation

Hello, I'm testing the code on new images at a Brazilian federal hospital. When I download the samples from the repository, the images in the sample images folder already come cropped, and their histograms also look modified. Is this an error? Shouldn't the images come uncropped? Also, the sample images are accompanied by a segmentation folder that exists before the code is run, but when I put new patients' images in the same place and run the code again, no segmentation folder appears for my own images.
Thanks for your attention,

Andressa.

Discrepancies between readme and example files

Hi!
I was trying to reproduce the results for the shared example images, but it seems that the directory "sample_data" wasn't updated properly.
The readme says:
"As a part of this repository, we provide 4 sample exams (in sample_data/cropped_images directory and exam list stored in sample_data/data.pkl), each of which includes 2 CC view images and 2 MLO view images.",
but there's no cropped_images folder in sample_data.
Moreover, it seems to me that the images stored in "sample_data/images" are not the "original" ones but the cropped ones, yet the exam_list_before_cropping.pkl file doesn't include the "best center" coordinates. I tried using them anyway, as if they were the originals, because the cropping and best-center steps only use the breast region of the image, but I couldn't reproduce your results. Perhaps applying the erosion and dilation steps to an already-cropped image prevents me from finding the rightmost and bottommost pixel positions correctly, so my extraction of the best center differs from your computation.
If you could please upload the complete versions of the images in sample_data/images, or the exam_list.pkl file with the best_centers coordinates included, I would really appreciate it.
Thank you in advance!

SystemExit: 2

Thanks for sharing your helpful code. I installed all dependencies from requirements.txt, and I am in the project directory when running bash run.sh. I get:
usage: ipykernel_launcher.py [-h] --model-path MODEL_PATH --data-path
DATA_PATH --image-path IMAGE_PATH
--segmentation-path SEGMENTATION_PATH
--output-path OUTPUT_PATH
[--device-type {gpu,cpu}]
[--gpu-number GPU_NUMBER]
[--model-index MODEL_INDEX]
[--visualization-flag]
ipykernel_launcher.py: error: the following arguments are required: --model-path, --data-path, --image-path, --segmentation-path, --output-path
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

Please help!
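For context: the ipykernel_launcher.py in the usage message suggests the script is being run from inside a notebook, where argparse sees Jupyter's own command line instead of yours. A workaround sketch is to invoke the script with explicit arguments (the placeholder paths are mine):

```python
# Call run_model.py with explicit flags, bypassing the notebook's argv.
import subprocess

subprocess.run([
    "python", "src/scripts/run_model.py",
    "--model-path", "<model_path>",
    "--data-path", "<data_path>",
    "--image-path", "<image_path>",
    "--segmentation-path", "<segmentation_path>",
    "--output-path", "<output_path>",
], check=True)
```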

Fine Tuning

Hi again!
In order to fine-tune your model on my own images, I would like to know the exact hyperparameter combination used for each of the five models whose weights you are sharing.
In your paper, you present the ranges of values used in the random search and state that you chose the 5 best-performing models. Could you share those hyperparameters?
Thanks!

without segmentation path

hi,

How can I run the model without a segmentation path and a segmentation folder inside sample_data?
I have cropped images and I want the visualization only.
Is that possible?
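A workaround sketch, under my assumption that the segmentation masks are only used as overlays in the visualizations: generating all-zero masks of the right size might let the pipeline run (make_dummy_mask and its paths are hypothetical, not from the repo):

```python
# Create an empty (all-zero) mask matching each cropped image's size.
import imageio
import numpy as np

def make_dummy_mask(image_path, mask_path):
    img = imageio.imread(image_path)
    imageio.imwrite(mask_path, np.zeros(img.shape[:2], dtype=np.uint8))
```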

train from scratch

Hi,

Just wondering whether the code for training can be released? And could you please provide some hints on how to train this model from scratch?

Thanks

modules.py error

After running breast_cancer_classifier, I copied the cropped images, data.pkl, and cropped_exam_list.pkl into GMIC/sample_data and then ran bash run.sh. I get the error below:

Traceback (most recent call last):
  File "src/scripts/run_model.py", line 297, in <module>
    main()
  File "src/scripts/run_model.py", line 293, in main
    turn_on_visualization=args.visualization_flag,
  File "src/scripts/run_model.py", line 253, in start_experiment
    output_df = run_single_model(single_model_path, data_path, parameters, turn_on_visualization)
  File "src/scripts/run_model.py", line 218, in run_single_model
    output_df = run_model(model, exam_list, parameters, turn_on_visualization)
  File "src/scripts/run_model.py", line 181, in run_model
    output = model(tensor_batch)
  File "/home/username/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/username/GMIC/src/modeling/gmic.py", line 123, in forward
    small_x_locations = self.retrieve_roi_crops.forward(x_original, self.cam_size, self.saliency_map)
  File "/home/username/GMIC/src/modeling/modules.py", line 339, in forward
    assert h_h == h, "h_h!=h"
AssertionError: h_h!=h

RetrieveROIModule on left border and mask not updated for some images

Hey,

See the visualization for 0_R-MLO.png:

[screenshot: visualization for 0_R-MLO.png, with ROI crops pinned to the left border]

This occurs for only some ROIs, e.g. 0_L-MLO.png:

[screenshot: visualization for 0_L-MLO.png]

I'm exploring around here but any pointers would be much appreciated.

Also, the max-value approach will be confounded by markers, and will probably perform poorly for women with very dense breasts, e.g.:

[screenshot: example of a very dense breast]

In my case, none of the patches gets past x = 0 (they are stuck on the left border).

Is this just me?
