
EPLA

Development and interpretation of a pathomics-based model for the prediction of microsatellite instability in Colorectal Cancer

This package implements deep-learning-based prediction of microsatellite instability (MSI) from whole slide images (WSIs) of colorectal cancer patients.

Citation

Cao R, Yang F, Ma SC, Liu L, Zhao Y, Li Y, Wu DH, Wang T, Lu WJ, Cai WJ, Zhu HB, Guo XJ, Lu YW, Kuang JJ, Huan WJ, Tang WM, Huang K, Huang J, Yao J, Dong ZY. Development and interpretation of a pathomics-based model for the prediction of microsatellite instability in Colorectal Cancer. Theranostics 2020; 10(24):11080-11091. doi:10.7150/thno.49864. Available from http://www.thno.org/v10p11080.htm

Setup

Dependencies

The processing pipelines are implemented in Python.

Python 3.6+

  • torch 1.1.0
  • torchvision 0.2.1
  • numpy 1.15.2
  • pandas 1.0.3
  • xgboost 0.90
  • pillow 5.3.0
  • scikit-learn 0.23.1
  • logging 0.5.1.2
  • joblib 0.15.1
  • pickle 4.0

Hardware and average test time per patient

  • 0.5118 s/patient on NVIDIA GPUs (P40)
  • 20.9291 s/patient on regular CPU machines

Data

Image data are PNG files, the patient info table is a CSV file, and results are saved as CSV files.

Input patient info table format

A CSV file in the following format is needed for prediction:

patient_ID
TCGA-AA-3812-01
TCGA-AA-A00E-01
TCGA-AA-A01Q-01
TCGA-AA-A02R-01

If your patient IDs follow a different naming scheme, change the related code accordingly. Some cropped patches are also provided in the folder to help you walk through the whole process.
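If you need to build the patient info table yourself, the following is a minimal sketch using only the Python standard library. The helper name write_patient_table and the TCGA-style ID pattern are assumptions for illustration, derived from the example IDs above; the pipeline itself may not validate IDs this way.

```python
import csv
import re
from pathlib import Path

# Hypothetical check based on the example IDs above: "TCGA-" + 2-character
# site code + 4-character participant code + 2-digit sample type.
TCGA_SAMPLE_ID = re.compile(r"^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-\d{2}$")


def write_patient_table(patient_ids, path):
    """Write a one-column CSV with the header 'patient_ID'.

    Every ID is written. The return value lists IDs that do not match the
    TCGA-style pattern, since other naming schemes may require changes to
    the pipeline code.
    """
    unusual = [pid for pid in patient_ids if not TCGA_SAMPLE_ID.match(pid)]
    with Path(path).open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["patient_ID"])
        for pid in patient_ids:
            writer.writerow([pid])
    return unusual
```

For example, write_patient_table(["TCGA-AA-3812-01", "TCGA-AA-A00E-01"], "gt_tbl.csv") writes a table that can be passed as --gt_table=./gt_tbl.csv and returns an empty list.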

Model checkpoints

All models and feature matrices needed for processing and interpretation are stored in this folder:

  • dnnPatchClser.pt
  • bow.model
  • palhi.model
  • bow_feature.pkl
  • bow_tfidftransformer.pkl
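Before running the pipeline, it can help to confirm that the model folder is complete. The sketch below assumes the ./models layout used in the commands of the following steps; the function name missing_checkpoints is hypothetical. Note that the Step 2 commands pass palhi.pickle.dat and bow.pickle.dat as --model_file rather than palhi.model and bow.model, so edit the expected names to match the files you actually have.

```python
from pathlib import Path

# File names as listed above. Adjust this tuple if your model folder uses
# the names from the Step 2 commands (palhi.pickle.dat, bow.pickle.dat).
EXPECTED_FILES = (
    "dnnPatchClser.pt",
    "bow.model",
    "palhi.model",
    "bow_feature.pkl",
    "bow_tfidftransformer.pkl",
)


def missing_checkpoints(model_folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_folder."""
    folder = Path(model_folder)
    return [name for name in expected if not (folder / name).is_file()]
```

Calling missing_checkpoints("./models") returns an empty list when every expected file is present.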

Calculate the MSI likelihood of each patch (Step 1)

The patch-level prediction code makes a prediction for each patch. To run it:

python dnn_patch_clser.py --model_folder=./models --test_data_search_path=./data --gt_table=./gt_tbl.csv --model_name=dnnPatchClser.pt --result_folder=./results

The arguments are:

  • model_folder: the folder that stores the model.
  • result_folder: the folder where the results are saved.
  • test_data_search_path: the folder that stores the image data.
  • gt_table: the path to the patient info table, in the format shown above.
  • model_name: the file name of the patch-level model checkpoint (dnnPatchClser.pt).

A file named "pred.csv" containing the patch-level prediction results will be saved in result_folder. Use this file as the input to the next step.
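To inspect pred.csv before Step 2, the patch likelihoods can be grouped by patient. This is a sketch only: the column layout of pred.csv is not documented here, so the default column names patient_ID and score are hypothetical and should be replaced with the actual header. The per-patient mean is a naive sanity check, not the PALHI or bow aggregation.

```python
import csv
from collections import defaultdict


def load_patch_likelihoods(pred_csv, id_col="patient_ID", score_col="score"):
    """Group patch-level MSI likelihoods by patient.

    Assumption: one row per patch, with a patient ID column and a
    likelihood column. The default column names are hypothetical.
    """
    per_patient = defaultdict(list)
    with open(pred_csv, newline="") as f:
        for row in csv.DictReader(f):
            per_patient[row[id_col]].append(float(row[score_col]))
    return dict(per_patient)


def mean_likelihood(per_patient):
    """Naive per-patient average, useful only as a quick sanity check."""
    return {pid: sum(v) / len(v) for pid, v in per_patient.items() if v}
```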

Prediction on the patient level (Step 2)

This step predicts the MSI probability for each patient.

PALHI (Step 2.1)

python palhi.py --model_folder=./models --llh_file=./results/pred.csv --log_file=PALHI.log --model_file=palhi.pickle.dat --result_folder=./results

The process is logged in log_file. The required model is provided in this GitHub repository and should be stored in model_folder. A file named "PALHI_wsi_pred.csv" containing the patient-level prediction results will be saved in result_folder. Use this file as an input to Step 3.
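The name PALHI suggests a patch likelihood histogram: each patient's patch likelihoods are summarized as a fixed-length histogram, which a trained classifier (presumably XGBoost, given the dependency list) then scores. The sketch below illustrates only that feature-building idea under this assumption; the function name and the default of 10 bins are hypothetical, and palhi.py may bin differently.

```python
def likelihood_histogram(likelihoods, n_bins=10):
    """Normalized histogram of patch likelihoods in [0, 1].

    Returns n_bins fractions that sum to 1 (all zeros for an empty input).
    Bins are equal-width; a likelihood of exactly 1.0 goes in the last bin.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be positive")
    counts = [0] * n_bins
    for p in likelihoods:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"likelihood out of range: {p}")
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    total = len(likelihoods)
    return [c / total for c in counts] if total else [0.0] * n_bins
```

A fixed-length vector like this lets patients with very different patch counts be compared by the same classifier.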

BoW (Step 2.2)

python bow.py --model_folder=./models --llh_file=./results/pred.csv --log_file=bow.log --feature_file=bow_feature.pkl --tfidftransformer_file=bow_tfidftransformer.pkl --model_file=bow.pickle.dat --result_folder=./results

The process is logged in log_file. The required model is provided in this GitHub repository and should be stored in model_folder. A file named "BOW_wsi_pred.csv" containing the patient-level prediction results will be saved in result_folder. Use this file as an input to Step 3.
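A bag-of-words (BoW) pipeline treats each patient as a document and each patch as a word. Assuming words are obtained by quantizing patch likelihoods into bins (an illustrative choice; bow_feature.pkl and bow_tfidftransformer.pkl presumably hold the repository's actual vocabulary and fitted transformer), the TF-IDF weighting step can be sketched in pure Python. The tfidf function follows the formula of scikit-learn's TfidfTransformer with its default settings: raw counts, smoothed idf (smooth_idf=True), and L2 row normalization (norm="l2"). Function names are hypothetical.

```python
import math
from collections import Counter


def patch_words(likelihoods, n_bins=10):
    """Map each patch likelihood in [0, 1] to a discrete word (its bin index)."""
    return [min(int(p * n_bins), n_bins - 1) for p in likelihoods]


def tfidf(docs, n_bins=10):
    """TF-IDF vectors for a list of patients' patch-word lists.

    idf(w) = ln((1 + n) / (1 + df(w))) + 1, where n is the number of
    patients and df(w) the number of patients containing word w.
    Each vector is then scaled to unit L2 norm.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    df = [sum(1 for c in counts if c[w] > 0) for w in range(n_bins)]
    idf = [math.log((1 + n) / (1 + d)) + 1.0 for d in df]
    vectors = []
    for c in counts:
        v = [c[w] * idf[w] for w in range(n_bins)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v] if norm else v)
    return vectors
```

Words that appear in every patient get the minimum idf of 1, so rarer likelihood ranges carry more weight in the resulting vectors.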

Ensemble (Step 3)

python ensemble.py --PALHI_tbl=./results/PALHI_wsi_pred.csv --BOW_tbl=./results/BOW_wsi_pred.csv --result_folder=./results

The inputs PALHI_tbl and BOW_tbl are the files generated in Step 2.1 and Step 2.2, respectively. A file named "EPLA_output.csv" is saved in result_folder as the final output.
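Because each step consumes the previous step's output, the whole pipeline can be chained from one script. This sketch simply replays the commands above with subprocess; the default paths are the ones shown in this README and are assumptions about your layout. Steps 2.1 and 2.2 are independent of each other, so their relative order does not matter.

```python
import subprocess
import sys

# Commands copied from Steps 1 to 3 above. Edit the paths to match your layout.
PIPELINE = [
    ["dnn_patch_clser.py", "--model_folder=./models", "--test_data_search_path=./data",
     "--gt_table=./gt_tbl.csv", "--model_name=dnnPatchClser.pt", "--result_folder=./results"],
    ["palhi.py", "--model_folder=./models", "--llh_file=./results/pred.csv",
     "--log_file=PALHI.log", "--model_file=palhi.pickle.dat", "--result_folder=./results"],
    ["bow.py", "--model_folder=./models", "--llh_file=./results/pred.csv",
     "--log_file=bow.log", "--feature_file=bow_feature.pkl",
     "--tfidftransformer_file=bow_tfidftransformer.pkl",
     "--model_file=bow.pickle.dat", "--result_folder=./results"],
    ["ensemble.py", "--PALHI_tbl=./results/PALHI_wsi_pred.csv",
     "--BOW_tbl=./results/BOW_wsi_pred.csv", "--result_folder=./results"],
]


def build_commands(python=sys.executable):
    """Prefix each step with the Python interpreter used to run it."""
    return [[python] + step for step in PIPELINE]


def run_pipeline(python=sys.executable, cwd=None):
    """Run the steps in order, stopping at the first one that fails."""
    for cmd in build_commands(python):
        subprocess.run(cmd, check=True, cwd=cwd)
```

With check=True, a failing step raises subprocess.CalledProcessError, so later steps never run on missing or stale inputs.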

The output contains the following columns:

Column        Explanation
Sample.ID     The input patient ID
WSI.Score_x   MSI probability from PALHI
WSI.pred_x    MSI status from PALHI
WSI.Score_y   MSI probability from BoW
WSI.pred_y    MSI status from BoW
WSI.Score     EPLA final MSI probability
WSI.pred      EPLA final MSI status
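The _x and _y suffixes match the default suffixes of pandas' merge, which suggests (but does not confirm) that ensemble.py joins the PALHI table (left) and the BoW table (right) on Sample.ID. The sketch below reproduces the output columns under further assumptions: each input table has Sample.ID and WSI.Score columns, the final WSI.Score is the plain mean of the two scores, and a status is 1 (MSI) when its score is at least the threshold. The actual weighting and label encoding in ensemble.py may differ, and all function names here are hypothetical.

```python
import csv

COLUMNS = ["Sample.ID", "WSI.Score_x", "WSI.pred_x", "WSI.Score_y",
           "WSI.pred_y", "WSI.Score", "WSI.pred"]


def read_scores(path, id_col="Sample.ID", score_col="WSI.Score"):
    """Read a WSI-level table into {patient ID: score}."""
    with open(path, newline="") as f:
        return {row[id_col]: float(row[score_col]) for row in csv.DictReader(f)}


def ensemble(palhi, bow, threshold=0.5):
    """Combine two {patient ID: score} dicts into output rows.

    Keeps only patients present in both tables (an inner join). The final
    score is assumed to be the mean of the two pipeline scores.
    """
    rows = []
    for sid, x in palhi.items():
        if sid not in bow:
            continue
        y = bow[sid]
        score = (x + y) / 2
        rows.append({
            "Sample.ID": sid,
            "WSI.Score_x": x, "WSI.pred_x": int(x >= threshold),
            "WSI.Score_y": y, "WSI.pred_y": int(y >= threshold),
            "WSI.Score": score, "WSI.pred": int(score >= threshold),
        })
    return rows


def write_output(rows, path):
    """Write rows with the column order of the table above."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```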

The decision threshold is 0.5 by default and can be reset to an optimal threshold.
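The source does not say how the optimal threshold is chosen. One common criterion is Youden's J statistic (sensitivity + specificity - 1, equivalently TPR - FPR), computed on patients with known MSI status. A minimal sketch, assuming labels are 1 for MSI and 0 otherwise; the function name is hypothetical. Choosing the threshold on the same patients you report results on will overstate performance, so use a separate labeled set.

```python
def youden_threshold(scores, labels):
    """Pick the threshold that maximizes Youden's J = TPR - FPR.

    Candidate thresholds are the observed scores; a patient is called MSI
    when score >= threshold. Returns (threshold, J). The first threshold
    reaching the maximum J wins ties.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```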

Disclaimer

This tool is for research purposes only and is not approved for clinical use.

This is not an official Tencent product.

Contributors

tencentailabhealthcare, yfzon
