
pablojrios / kaggle_rsna_breast_cancer


This project forked from dangnh0611/kaggle_rsna_breast_cancer


1st place of Kaggle's RSNA Screening Mammography Breast Cancer Detection competition

License: MIT License

Dockerfile 0.05% Python 91.90% Shell 0.41% JavaScript 0.04% C++ 3.27% CMake 0.06% Java 0.18% Makefile 0.01% CSS 0.01% Jupyter Notebook 3.73% Cuda 0.34%

kaggle_rsna_breast_cancer's Introduction

Below you can find an outline of how to reproduce my solution for the RSNA Screening Mammography Breast Cancer Detection competition. If you run into any trouble with the setup/code or have any questions, please contact me at [email protected].

Solution write up: https://www.kaggle.com/competitions/rsna-breast-cancer-detection/discussion/392449

Notes:

Please download the trained models and put them in assets/trained/:

# this assumes that the Kaggle API is installed: https://github.com/Kaggle/kaggle-api
kaggle datasets download -d dangnh0611/rsna-breast-cancer-detection-best-ckpts -p assets/trained
unzip assets/trained/rsna-breast-cancer-detection-best-ckpts.zip -d assets/trained/
rm assets/trained/rsna-breast-cancer-detection-best-ckpts.zip

TABLE OF CONTENTS

1. ARCHIVE CONTENTS

  • assets/: contains necessary data files and trained models
    • assets/data/: CSV labels for the external datasets (BMCD and CMMD) and breast ROI box annotations in YOLOv5 format
    • assets/public_pretrains/: publicly available pretrains
    • assets/trained/: trained models, used for winning submission
  • datasets/: where to store datasets (competition + external), expected to contain both raw and cleaned versions.
    • datasets/raw/: raw version of the competition data + all external datasets: BMCD, CDD-CESM, CMMD, MiniDDSM, Vindr. For how to correctly structure the datasets, please refer to docs/DATASETS.md
  • docker/: Dockerfile
  • docs/: documentation
  • src/: contains most of the source code for this project
    • src/roi_det: for training breast ROI detection model (YOLOX)
    • src/pytorch-image-models: for training classification model (Convnext-small)
    • src/submit: code to generate predictions (submission)
    • src/tools: Python and Bash scripts to prepare datasets, train, and convert models, etc.
    • src/utils: utilities for DICOM processing, etc.
  • SETTINGS.json: define relative paths for IO

SETTINGS.json defines base paths for IO:

  • RAW_DATA_DIR: Where to store raw dataset, including both competition dataset and external datasets.
  • PROCESSED_DATA_DIR: Where to store processed/cleaned datasets
  • MODEL_CHECKPOINT_DIR: Store intermediate checkpoints during training
  • MODEL_FINAL_SELECTION_DIR: Where to store final (best) models used for submission
  • SUBMISSION_DIR: Where to store final submission/inference results
  • ASSETS_DIR: Store trained models and manually annotated datasets/files. This must not be changed; it is defined here only for easier lookup.
  • TEMP_DIR: Where to store intermediate results/files
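Because the paths throughout this README are written as {KEY}/... templates, a small helper that loads SETTINGS.json and expands those templates may be convenient when scripting around the pipeline. This is a minimal sketch, not the repository's own loader: it assumes SETTINGS.json is a flat JSON object mapping each key above to a path string, and that relative paths are resolved against the directory containing SETTINGS.json. The helper names load_settings and expand_path are hypothetical.

```python
import json
from pathlib import Path

# Keys described in this README; the flat JSON layout is an assumption.
SETTINGS_KEYS = (
    "RAW_DATA_DIR",
    "PROCESSED_DATA_DIR",
    "MODEL_CHECKPOINT_DIR",
    "MODEL_FINAL_SELECTION_DIR",
    "SUBMISSION_DIR",
    "ASSETS_DIR",
    "TEMP_DIR",
)


class _KeepUnknown(dict):
    """Leave placeholders such as {dataset_name} untouched during expansion."""

    def __missing__(self, key):
        return "{" + key + "}"


def load_settings(settings_path):
    """Load SETTINGS.json and resolve relative paths against its directory."""
    settings_path = Path(settings_path)
    root = settings_path.parent.resolve()
    with open(settings_path, encoding="utf-8") as f:
        raw = json.load(f)
    missing = [key for key in SETTINGS_KEYS if key not in raw]
    if missing:
        raise KeyError(f"SETTINGS.json is missing keys: {missing}")
    return {key: str((root / raw[key]).resolve()) for key in SETTINGS_KEYS}


def expand_path(template, settings):
    """Expand a README-style path such as '{TEMP_DIR}/pngs/'."""
    return template.format_map(_KeepUnknown(settings))
```

For example, expand_path("{MODEL_CHECKPOINT_DIR}/timm_classification/", load_settings("SETTINGS.json")) gives the directory that the training step expects to be empty. Unknown placeholders like {dataset_name} are kept as-is so they can be filled in later.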

2. HARDWARE

The following machine was used to create the final solution: NVIDIA DGX A100. Most of my experiments can be done using 1-3 A100 GPUs. However, the final results can easily be reproduced using a single A100 GPU (40 GB of GPU memory).

3. DATA SETUP

Refer to docs/DATASETS.md for details on how to correctly setup datasets.

4. SOLUTION PIPELINE

Reproducing the entire solution involves four stages. I briefly describe them below to make the rest of this guide easier to follow.

  1. Train a YOLOX model on a subset of the competition images for breast ROI detection
    • Convert competition DICOM files to 8-bit PNG images
    • Convert detection labels in YOLOv5 format to COCO format (YOLOX accepts COCO format without any modifications)
    • Train a YOLOX-nano 416x416 model on those images (521 train images, 50 val images)
    • Convert the trained YOLOX model from Torch to a TensorRT engine.
  2. Use the trained YOLOX TensorRT engine to crop the breast ROI region and save it to disk as 8-bit PNGs
    • Clean and restructure the raw datasets (competition data + external data) in a unified way (standardize the format/structure)
    • DICOM decoding --> ROI detection (YOLOX) --> ROI crop --> normalization --> save to disk
  3. Train Convnext-small models for classification using those saved ROI images
    • Do a 4-fold split on the competition data.
    • Train 4 Convnext-small models, one on each fold
    • Select the best checkpoint for each fold
    • Convert those models from Torch to TensorRT
  4. Inference on test data (submission)
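The "ROI crop --> normalization" part of stage 2 can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the image is a plain 2-D list of pixel values, the ROI box is assumed to be (x0, y0, x1, y1) in pixel coordinates with exclusive x1/y1, the optional padding fraction is a hypothetical parameter, and simple min-max scaling stands in for whatever normalization the real pipeline applies before saving 8-bit PNGs.

```python
def crop_roi(image, box, pad=0.0):
    """Crop box (x0, y0, x1, y1) from a 2-D list, optionally padding it by a
    fraction of its width/height and clipping the result to the image bounds."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    pad_x = int(round(pad * (x1 - x0)))
    pad_y = int(round(pad * (y1 - y0)))
    x0, y0 = max(0, x0 - pad_x), max(0, y0 - pad_y)
    x1, y1 = min(width, x1 + pad_x), min(height, y1 + pad_y)
    return [row[x0:x1] for row in image[y0:y1]]


def to_uint8(pixels):
    """Min-max scale pixel values to the 0-255 range of an 8-bit PNG."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:  # flat region: avoid division by zero
        return [[0] * len(row) for row in pixels]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in pixels]
```

Cropping before normalizing means the intensity range is computed from the breast region only, so background pixels outside the ROI do not compress the contrast of the tissue.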

5. SOLUTION REPRODUCING

All the following instructions assume that the datasets (competition + external data) are all set up. There are 3 options to reproduce the solution:

  1. Use trained models

    • No training, just use trained models in assets/trained to make predictions
  2. Do not re-train YOLOX, fully reproduce Convnext-small classification models

    • Skip re-training the YOLOX part and use (my) trained YOLOX for the further steps
    • Re-train the 4x Convnext-small classification models. This part can be 100% reproduced (giving identical models, training logs, and results) without any randomness.
    • This method should give 100% identical scores on CV/LB/PB
  3. Re-train all parts (reproduce from scratch)

    • Don't use any of (my) trained models in any part; re-train all of them from scratch
    • This may not give 100% identical results/scores, because YOLOX training can't be fully reproduced to get EXACTLY the same model as the one used in the winning submission. More details here
    • Note that the dataset used for training the Convnext-small classification models is generated based on YOLOX's predictions, so changes in YOLOX will cause changes in the Convnext-small classification models --> the Convnext-small classification models will also not be 100% reproducible.
    • But in general, it should give nearly identical results/scores within a reasonable margin.

5.1. Use trained models to make predictions

5.1.1. Convert trained YOLOX to TensorRT

A YOLOX-nano 416 engine optimized for NVIDIA A100 is provided at assets/trained/yolox_nano_416_roi_trt_a100.pth. However, the recommended way is to convert the trained Torch checkpoint to a TensorRT engine optimized for your own environment/hardware:

PYTHONPATH=$(pwd)/src/roi_det/YOLOX:$PYTHONPATH python3 src/roi_det/YOLOX/tools/trt.py \
    -expn trained_yolox_nano_416_to_tensorrt \
    -f src/roi_det/YOLOX/exps/projects/rsna/yolox_nano_bre_416.py \
    -c assets/trained/yolox_nano_416_roi_torch.pth \
    --save-path assets/trained/yolox_nano_416_roi_trt.pth \
    -b 1

Behaviors:

  • Create a new directory {MODEL_CHECKPOINT_DIR}/yolox_roi_det/trained_yolox_nano_416_to_tensorrt/.
  • The converted YOLOX TensorRT engine will also be saved to ./assets/trained/yolox_nano_416_roi_trt.pth
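If you prefer to drive these commands from Python (for example, from a notebook), the PYTHONPATH=... prefix can be reproduced by copying the environment and prepending the extra source directories. This is a hedged sketch: run_with_pythonpath is a hypothetical helper that only rebuilds the shell invocation shown above, using the current interpreter (sys.executable) in place of python3; it knows nothing about trt.py's options.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_with_pythonpath(script_args, extra_paths, dry_run=False):
    """Run `<python> <script_args...>` with extra_paths prepended to PYTHONPATH.

    Returns (cmd, env); with dry_run=True nothing is executed.
    """
    env = os.environ.copy()
    parts = [str(Path(p).resolve()) for p in extra_paths]
    if env.get("PYTHONPATH"):
        parts.append(env["PYTHONPATH"])  # keep existing entries after ours
    env["PYTHONPATH"] = os.pathsep.join(parts)
    cmd = [sys.executable, *script_args]
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)  # raises on non-zero exit
    return cmd, env
```

For the conversion above, call run_with_pythonpath(["src/roi_det/YOLOX/tools/trt.py", "-expn", "trained_yolox_nano_416_to_tensorrt", "-f", "src/roi_det/YOLOX/exps/projects/rsna/yolox_nano_bre_416.py", "-c", "assets/trained/yolox_nano_416_roi_torch.pth", "--save-path", "assets/trained/yolox_nano_416_roi_trt.pth", "-b", "1"], ["src/roi_det/YOLOX"]) from the repository root. Unlike the shell form, it does not append an empty entry when PYTHONPATH was previously unset.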

5.1.2. Convert the 4 trained Convnext-small models to TensorRT

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/convert_convnext_tensorrt.py --mode trained

Behaviours: Save a combined 4-fold TensorRT engine to ./assets/trained/best_ensemble_convnext_small_batch2_fp32.engine.

It takes 5-10 minutes on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).
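Since the engine bundles all four fold models, each image gets one output per fold that has to be merged into a single score. How best_ensemble_convnext_small_batch2_fp32.engine merges them is not described in this README; the sketch below assumes, purely for illustration, that each fold emits one logit per image and that the folds are averaged in probability space.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ensemble_folds(fold_logits):
    """Average per-fold probabilities.

    fold_logits: one list of logits per fold, all the same length
    (one logit per image). Returns one probability per image.
    """
    if len({len(f) for f in fold_logits}) != 1:
        raise ValueError("every fold must score the same images")
    n_folds = len(fold_logits)
    return [
        sum(sigmoid(logit) for logit in per_image) / n_folds
        for per_image in zip(*fold_logits)
    ]
```

Averaging probabilities rather than raw logits keeps one overconfident fold from dominating; averaging logits is an equally plausible alternative, which is why this is only a sketch.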

5.1.3. Submission

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/submit.py --mode trained --trt

Behaviours:

  • Create a temporary directory {TEMP_DIR}/pngs/ to store 8-bit PNG images; it is expected to be removed once inference is done.
  • Save submission csv result to {SUBMISSION_DIR}/submission.csv
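The competition scores one prediction per breast, identified by a prediction_id of the form {patient_id}_{laterality} (e.g. <patient_id>_L or <patient_id>_R), with the predicted probability in a cancer column. A minimal sketch of turning per-image probabilities into such a CSV follows; it is not submit.py, the input field names (patient_id, laterality, prob) are hypothetical, and the choice between mean and max aggregation is an assumption to check against the write-up.

```python
import csv
from collections import defaultdict


def aggregate_predictions(rows, reduce="mean"):
    """Group per-image probabilities by '{patient_id}_{laterality}'.

    rows: iterable of dicts with hypothetical keys 'patient_id',
    'laterality' ('L' or 'R') and 'prob'.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[f"{row['patient_id']}_{row['laterality']}"].append(float(row["prob"]))
    if reduce == "max":
        return {pid: max(p) for pid, p in groups.items()}
    if reduce == "mean":
        return {pid: sum(p) / len(p) for pid, p in groups.items()}
    raise ValueError(f"unknown reduce: {reduce!r}")


def write_submission(preds, path):
    """Write a prediction_id,cancer CSV, sorted by prediction_id."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["prediction_id", "cancer"])
        for pid in sorted(preds):
            writer.writerow([pid, preds[pid]])
```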

5.2. Keep trained YOLOX, re-train Convnext-small classification models

5.2.1. Convert trained YOLOX to TensorRT

A YOLOX-nano 416 engine optimized for NVIDIA A100 is provided at assets/trained/yolox_nano_416_roi_trt_a100.pth. However, the recommended way is to convert the trained Torch checkpoint to a TensorRT engine optimized for your own environment/hardware:

PYTHONPATH=$(pwd)/src/roi_det/YOLOX:$PYTHONPATH python3 src/roi_det/YOLOX/tools/trt.py \
    -expn trained_yolox_nano_416_to_tensorrt \
    -f src/roi_det/YOLOX/exps/projects/rsna/yolox_nano_bre_416.py \
    -c assets/trained/yolox_nano_416_roi_torch.pth \
    --save-path assets/trained/yolox_nano_416_roi_trt.pth \
    -b 1

Behaviors:

  • Create a new directory {MODEL_CHECKPOINT_DIR}/yolox_roi_det/trained_yolox_nano_416_to_tensorrt/.
  • The converted YOLOX TensorRT engine will also be saved to ./assets/trained/yolox_nano_416_roi_trt.pth

5.2.2. Prepare datasets to train classification models

python3 src/tools/prepair_classification_dataset.py --num-workers 8 --roi-yolox-engine-path assets/trained/yolox_nano_416_roi_trt.pth

Behaviors:

  • Create a stage1_images directory inside each raw dataset directory ({RAW_DATA_DIR}/{dataset_name}/stage1_images) for the intermediate stage.
  • Create a new directory {PROCESSED_DATA_DIR}/classification/ containing, for each dataset, 8-bit PNG images in {PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_images/ and a cleaned label file {PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_label.csv.
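After this step, a quick sanity check that every dataset under {PROCESSED_DATA_DIR}/classification/ has both a cleaned_images/ directory and a cleaned_label.csv can save a failed training run later. The sketch below is a hypothetical helper, not part of the repository; it only checks the layout described above and counts PNG files (recursively, in case images are grouped in sub-directories), without validating the CSV contents.

```python
from pathlib import Path


def check_classification_layout(processed_data_dir):
    """Return {dataset_name: number_of_pngs}; raise if the layout is incomplete."""
    root = Path(processed_data_dir) / "classification"
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist")
    summary, problems = {}, []
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        images_dir = dataset_dir / "cleaned_images"
        label_csv = dataset_dir / "cleaned_label.csv"
        if not images_dir.is_dir():
            problems.append(f"missing {images_dir}")
        else:
            summary[dataset_dir.name] = sum(1 for _ in images_dir.rglob("*.png"))
        if not label_csv.is_file():
            problems.append(f"missing {label_csv}")
    if problems:
        raise FileNotFoundError("; ".join(problems))
    return summary
```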

5.2.3. Perform 4-fold splitting on competition data

python3 src/tools/cv_split.py

Behaviors: Create a new directory and save CSV files in {PROCESSED_DATA_DIR}/rsna-breast-cancer-detection/cv/v2/
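Splitting at the patient level keeps all images of one patient in a single fold, so no patient appears in both the training and validation data of the same fold. The following is only an illustration of a patient-grouped, label-balanced 4-fold split; the exact strategy in cv_split.py (stratification keys, seed, and the v2 CSV format) is not described here, so treat patient_kfold, its input fields, and the default seed as hypothetical.

```python
import random
from collections import defaultdict


def patient_kfold(rows, n_folds=4, seed=42):
    """Assign each patient to one fold, spreading positive patients evenly.

    rows: iterable of dicts with 'patient_id' and 'cancer' (0 or 1), one per
    image. A patient counts as positive if any of their images is positive.
    Returns {patient_id: fold_index}.
    """
    patient_label = defaultdict(int)
    for row in rows:
        pid = row["patient_id"]
        patient_label[pid] = max(patient_label[pid], int(row["cancer"]))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    fold_of = {}
    for label in (1, 0):  # place the rare positive patients first
        patients = sorted(p for p, y in patient_label.items() if y == label)
        rng.shuffle(patients)
        for i, pid in enumerate(patients):
            fold_of[pid] = i % n_folds  # round-robin keeps folds balanced
    return fold_of
```

Each image then inherits the fold of its patient, e.g. fold = fold_of[row["patient_id"]].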

5.2.4. Train 4 x Convnext-small classification models

python3 src/tools/make_train_bash_script.py --mode fully_reproduce

This will save a file named _train_script_auto_generated.sh in the current directory, which includes the commands and instructions to train the Convnext-small classification models. To reproduce using a single GPU, simply run

sh ./_train_script_auto_generated.sh

This could take about 8 days to finish training (around 2 days per fold).

Alternatively, if you have multiple GPUs and want to speed up training, follow the instructions in the generated training script _train_script_auto_generated.sh and run the commands in parallel on different GPUs. For more details on the training process, take a look at part 4.3 (Training) of my write-up.
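One way to spread the commands over several GPUs is to group them into chains that must run in order (judging from the sub-directory names listed below, presumably a fold's stage1 command before its stage2 command) and give each GPU its own sequential queue via CUDA_VISIBLE_DEVICES. This is a hedged sketch: the grouping into chains should be taken from the instructions inside the generated script, and plan_gpu_queues and launch are hypothetical helpers.

```python
import os
import subprocess


def plan_gpu_queues(chains, gpu_ids):
    """Assign chains round-robin to GPUs; each GPU runs its commands in order.

    chains: list of command lists; commands inside a chain keep their order.
    Returns [(gpu_id, shell_command), ...] with commands joined by ' && ',
    so a failing command stops the rest of that GPU's queue.
    """
    queues = {gpu: [] for gpu in gpu_ids}
    for i, chain in enumerate(chains):
        queues[gpu_ids[i % len(gpu_ids)]].extend(chain)
    return [(gpu, " && ".join(cmds)) for gpu, cmds in queues.items() if cmds]


def launch(plan):
    """Start one shell per GPU in parallel and wait; returns exit codes."""
    procs = []
    for gpu, command in plan:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(command, shell=True, env=env))
    return [p.wait() for p in procs]
```

With four GPUs, a plan such as plan_gpu_queues([[stage1_fold0, stage2_fold0], [stage1_fold1, stage2_fold1], [fold2], [fold3]], [0, 1, 2, 3]) would run every chain on its own GPU; the command variables here are placeholders for lines copied from the generated script.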

Behaviours:

  • This assumes that the directory {MODEL_CHECKPOINT_DIR}/timm_classification/ is empty before starting any training commands
  • Save checkpoints/logs to {MODEL_CHECKPOINT_DIR}/timm_classification/, which will contain 6 sub-directories named
    • fully_reproduce_train_fold_2
    • fully_reproduce_train_fold_3
    • stage1_fully_reproduce_train_fold_0
    • stage1_fully_reproduce_train_fold_1
    • stage2_fully_reproduce_train_fold_0
    • stage2_fully_reproduce_train_fold_1

5.2.5. Checkpoint selection

python3 src/tools/select_classification_best_ckpts.py --mode fully_reproduce

Behaviours:

  • This may overwrite existing Convnext checkpoint files in {MODEL_FINAL_SELECTION_DIR}/
  • Select the best checkpoint for each of the 4 folds and copy it to {MODEL_FINAL_SELECTION_DIR}/:
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_0.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_1.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_2.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_3.pth.tar
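Conceptually, this step keeps one checkpoint per fold and copies it under a fixed name. The sketch below assumes you already have a validation score for every candidate checkpoint and that higher is better; the actual metric and candidate list used by select_classification_best_ckpts.py are not specified here, and select_best_per_fold is a hypothetical helper.

```python
import shutil
from pathlib import Path


def select_best_per_fold(candidates, dst_dir):
    """Copy the highest-scoring checkpoint of each fold to dst_dir.

    candidates: {fold: [(checkpoint_path, score), ...]}; higher score is
    assumed to be better. Returns {fold: destination_path}.
    """
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    selected = {}
    for fold, items in sorted(candidates.items()):
        best_path, _ = max(items, key=lambda item: item[1])
        dst = dst_dir / f"best_convnext_fold_{fold}.pth.tar"
        shutil.copy2(best_path, dst)  # overwrites an existing file, as noted above
        selected[fold] = dst
    return selected
```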

5.2.6. Convert selected best Convnext models to TensorRT

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/convert_convnext_tensorrt.py --mode reproduce

Behaviours: Save a combined 4-fold TensorRT engine to {MODEL_FINAL_SELECTION_DIR}/best_ensemble_convnext_small_batch2_fp32.engine.

It takes 5-10 minutes on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).

5.2.7. Submission

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/submit.py --mode partial_reproduce --trt

Behaviours:

  • Create a temporary directory {TEMP_DIR}/pngs/ to store 8-bit PNG images; it is expected to be removed once inference is done.
  • Save submission csv result to {SUBMISSION_DIR}/submission.csv

5.3. Re-train all parts from scratch

5.3.1. Prepare the dataset for training the YOLOX ROI detector

python3 src/tools/prepair_roi_det_dataset.py --num-workers 4

Behaviors:

  • Copy the manually annotated breast ROI boxes in YOLOv5 format from ./assets/data/roi_det_yolov5_format/ to {PROCESSED_DATA_DIR}/roi_det_yolox/yolov5_format/
  • Decode 571 DICOM files from the competition dataset to 8-bit PNGs, stored at {PROCESSED_DATA_DIR}/roi_det_yolox/yolov5_format/images/
  • Convert the dataset from YOLOv5 format to COCO format, stored at {PROCESSED_DATA_DIR}/roi_det_yolox/coco_format/
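For reference, the conversion in the last step maps YOLOv5's normalized "class x_center y_center width height" rows to COCO's absolute [x_min, y_min, width, height] boxes. The sketch below is not prepair_roi_det_dataset.py: the image sizes must come from the decoded PNGs, and the +1 category-id offset and the category name "breast" are assumptions.

```python
def yolo_line_to_coco_bbox(line, img_w, img_h):
    """Convert one YOLOv5 label row to (class_id, [x_min, y_min, w, h])."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls), [xc - w / 2, yc - h / 2, w, h]


def yolo_to_coco(samples):
    """samples: list of (file_name, img_w, img_h, [YOLOv5 label lines])."""
    images, annotations = [], []
    ann_id = 1
    for img_id, (file_name, img_w, img_h, lines) in enumerate(samples, start=1):
        images.append({"id": img_id, "file_name": file_name,
                       "width": img_w, "height": img_h})
        for line in lines:
            if not line.strip():  # skip blank lines in label files
                continue
            cls, bbox = yolo_line_to_coco_bbox(line, img_w, img_h)
            annotations.append({
                "id": ann_id, "image_id": img_id,
                "category_id": cls + 1,  # assumed 1-based COCO category ids
                "bbox": bbox, "area": bbox[2] * bbox[3], "iscrowd": 0,
            })
            ann_id += 1
    categories = [{"id": 1, "name": "breast"}]  # hypothetical category name
    return {"images": images, "annotations": annotations, "categories": categories}
```

The returned dictionary can be written with json.dump to produce a COCO-style annotation file.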

5.3.2. Retrain YOLOX for breast ROI detection

sh src/tools/train_and_convert_yolox_trt.sh

Behaviors:

  • Train YOLOX, saving checkpoints to {MODEL_CHECKPOINT_DIR}/yolox_roi_det/yolox_nano_416_reproduce/
  • (Optional) Evaluate the best checkpoint and print the results
  • Convert the newly trained best checkpoint to TensorRT, stored in {MODEL_CHECKPOINT_DIR}/yolox_roi_det/yolox_nano_416_reproduce/
  • Copy the best Torch checkpoint to {MODEL_FINAL_SELECTION_DIR}/yolox_nano_416_roi_torch.pth
  • Copy the TensorRT engine converted in the previous step to {MODEL_FINAL_SELECTION_DIR}/yolox_nano_416_roi_trt.pth

5.3.3. Prepare datasets to train classification models

This uses the YOLOX model trained in the previous step as the breast ROI extractor.

python3 src/tools/prepair_classification_dataset.py --num-workers 8

Behaviors:

  • Create a stage1_images directory inside each raw dataset directory ({RAW_DATA_DIR}/{dataset_name}/stage1_images) for the intermediate stage.
  • Create a new directory {PROCESSED_DATA_DIR}/classification/ containing, for each dataset, 8-bit PNG images in {PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_images/ and a cleaned label file {PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_label.csv.

5.3.4. Perform 4-fold splitting on competition data

python3 src/tools/cv_split.py

Behaviors: Create a new directory and save CSV files in {PROCESSED_DATA_DIR}/rsna-breast-cancer-detection/cv/v2/

5.3.5. Train 4 x Convnext-small classification models

python3 src/tools/make_train_bash_script.py --mode fully_reproduce

This will save a file named _train_script_auto_generated.sh in the current directory, which includes the commands and instructions to train the Convnext-small classification models. To reproduce using a single GPU, simply run

sh ./_train_script_auto_generated.sh

This could take about 8 days to finish training (around 2 days per fold).

Alternatively, if you have multiple GPUs and want to speed up training, follow the instructions in the generated training script _train_script_auto_generated.sh and run the commands in parallel on different GPUs. For more details on the training process, take a look at part 4.3 (Training) of my write-up.

Behaviours:

  • This assumes that the directory {MODEL_CHECKPOINT_DIR}/timm_classification/ is empty before starting any training commands
  • Save checkpoints/logs to {MODEL_CHECKPOINT_DIR}/timm_classification/, which will contain 6 sub-directories named
    • fully_reproduce_train_fold_2
    • fully_reproduce_train_fold_3
    • stage1_fully_reproduce_train_fold_0
    • stage1_fully_reproduce_train_fold_1
    • stage2_fully_reproduce_train_fold_0
    • stage2_fully_reproduce_train_fold_1

5.3.6. Checkpoint selection

python3 src/tools/select_classification_best_ckpts.py --mode fully_reproduce

Behaviours:

  • This may overwrite existing Convnext checkpoint files in {MODEL_FINAL_SELECTION_DIR}/
  • Select the best checkpoint for each of the 4 folds and copy it to {MODEL_FINAL_SELECTION_DIR}/:
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_0.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_1.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_2.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_3.pth.tar

5.3.7. Convert selected best Convnext models to TensorRT

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/convert_convnext_tensorrt.py --mode reproduce

Behaviours: Save a combined 4-fold TensorRT engine to {MODEL_FINAL_SELECTION_DIR}/best_ensemble_convnext_small_batch2_fp32.engine.

It takes 5-10 minutes on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).

5.3.8. Submission

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/submit.py --mode reproduce --trt

Behaviours:

  • Create a temporary directory {TEMP_DIR}/pngs/ to store 8-bit PNG images; it is expected to be removed once inference is done.
  • Save submission csv result to {SUBMISSION_DIR}/submission.csv

kaggle_rsna_breast_cancer's People

Contributors

dangnh0611, pablojrios
