Methods for training and interpreting deep radiogenomic neural networks

License: GNU General Public License v3.0

Python 100.00%
neural-networks imaging-genomics radiogenonmics glioblastoma gene-expression-profiles autoencoder

Radiogenomic neural networks

deepRadiogenomics contains the source code for the analyses in the paper:

Nova F Smedley, Suzie El-Saden, William Hsu, Discovering and interpreting transcriptomic drivers of imaging traits using neural networks, Bioinformatics, btaa126, https://doi.org/10.1093/bioinformatics/btaa126

Note: an updated version has been made and will be added in the future.

Repo contents

  • data: full GBM dataset used in the analysis (to be released)
  • demo_data: toy data
  • R: scripts for most post-modeling analyses, association testing, etc.
  • general modeling functions:

    • neuralnet.py
    • other_models_utils.py
    • bootstrap.py
    • custom_callbacks.py
    • sparse_x_generator.py
  • glioblastoma (gbm) specific functions:

    • training:
      • setup.py
      • train_gene_ae.py: deep transcriptomic autoencoder
      • train_nn.py: supervised radiogenomic neural network
      • train_others.py: comparative models (logit, gbt, rf, svm)
    • extract radiogenomic associations
      • gene_masking.py
      • get_masking.py
      • gene_saliency.py
      • get_saliency.py
    • misc.
      • parse_cv.py
      • demo_gene_masking.py
      • demo_gene_saliency.py
    • all others, see R

Data

All data was originally taken from public repositories, where identifiable information was scrubbed.

  • Transcriptomic data was downloaded from the legacy version of The Cancer Genome Atlas (TCGA).

  • Imaging studies were downloaded from The Cancer Imaging Archive (TCIA). Vasari traits were annotated by Dr. Suzie El-Saden and based on pre-operative magnetic resonance imaging studies.

    • Vasari MR Feature Guide_v1.1.pdf: Vasari guidelines for imaging annotations
    • Our annotation form was based on the Round 2 Google Form used by the Vasari Project

Datasets are available in the data folder.

For more details, see our paper.

Install

  • Neural networks were trained on Amazon Web Services using the Deep Learning AMI with Ubuntu 16.04.4 LTS and the tensorflow_p36 environment. All other classifiers were implemented on an Ubuntu 18.04.1 LTS machine.

    • Check out AWS's environment documentation.

    • Python 3.6 dependencies (training of comparative models, gene masking, gene saliency):

      keras 2.2.2
      keras-vis 0.4.1
      numpy 1.14.3
      pandas 0.23.0
      scikit-learn 0.20.0
      scipy 1.1.0
      seaborn 0.8.1
      tensorflow 1.10.0
      xgboost 0.80
      
    • R 3.4.4 dependencies (gene set enrichment analysis, but mostly figure generation):

      awtools 0.1.0
      broom 0.5.1
      cowplot 0.9.3
      data.table 1.11.8
      doParallel 1.0.14
      dplyr 0.7.6
      egg 0.4.0
      fgsea 1.4.1
      foreach 1.4.4
      ggrepel 0.8.0
      ggplot2 3.1.0
      grid 3.4.4
      gridExtra 2.3
      ggpubr 0.2
      pheatmap 1.0.12
      plyr 1.8.4
      qvalue 2.10.1
      rcartocolor 1.0.0
      RColorBrewer 1.1
      reshape2 1.4.3
      scales 1.0.0
      tidyr 0.8.1
      tidyverse 1.2.1
      viridis 0.5.1
      wesanderson 0.3.6
      
  • Install from GitHub using git:

    git clone https://github.com/novasmedley/deepRadiogenomics.git
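
The pinned Python versions listed above can be compared against the active environment before running the demos. The following is a minimal sketch, not part of the repository: `PINNED`, `installed_version`, and `compare` are hypothetical names, and the check only reports differences without installing anything.

```python
# Pinned Python package versions, copied from the dependency list above.
PINNED = {
    "keras": "2.2.2",
    "keras-vis": "0.4.1",
    "numpy": "1.14.3",
    "pandas": "0.23.0",
    "scikit-learn": "0.20.0",
    "scipy": "1.1.0",
    "seaborn": "0.8.1",
    "tensorflow": "1.10.0",
    "xgboost": "0.80",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6/3.7 (ships with setuptools)
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def compare(pinned, lookup=installed_version):
    """Map each package to (pinned, installed) where the two differ."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(compare(PINNED).items()):
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

An empty report means the environment matches the versions listed above; whether newer versions also work is not stated here.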
    

Usage

Demos were run on the demo data, a small subset of the published dataset, on Ubuntu 18.04.1 LTS with 15.5 GB of memory. They have also been tested on macOS 10.14.5.

Neural network pipeline:

  1. Train gene expression autoencoder (ae) - cross-validation (cv), 15 secs:

    $ python3 train_gene_ae.py --exp ae_cv --dir demo --data demo_data \
    --label autoencoder --predType regression --loss mae --opt Nadam --act tanh \
    --h1 200 --h2 100 --h3 50 --epoch 2 --folds 2 --patience 2
    
  2. (optional) Parse cv results: $ python3 parse_cv.py --dir demo/ae_cv --model nn

  3. Retrain ae - 15 secs:

    Run the train_gene_ae.py command from Step 1, but set --exp ae_retrain and add --retrain 1

  4. Train radiogenomic model - cv, 21 secs:

    $ python3 train_nn.py --exp nn_cv --dir demo --data demo_data \
    --pretrain demo_results/ae_retrain/autoencoder/neuralnets/200_100_50_0_0_tanh_decay_0_drop_0_opt_Nadam_loss_mae_bat_10_eph_2 \
    --label f5 --opt Nadam --act tanh \
    --h1 200 --h2 100 --h3 50 --epoch 2 --folds 2 --patience 2 --freeze 0 --num_ae_layers 3
    
  5. (optional) Parse cv results: $ python3 parse_cv.py --dir demo/nn_cv --model nn

  6. Retrain radiogenomic model - 16 secs:

    Run the train_nn.py command from Step 4, but set --exp nn_retrain and add --retrain 1

  7. Gene masking - 11 secs

    $ python3 demo_gene_masking.py --label f5 --geneset verhaak --cpus 7

  8. Gene saliency - 19 secs

    $ python3 demo_gene_saliency.py
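
Steps 3 and 6 reuse the Step 1 and Step 4 commands with two changes: a new --exp value and an added --retrain 1. A minimal sketch of that substitution for the autoencoder follows. `to_retrain` is a hypothetical helper, not part of the repository; it only rewrites the argument list and does not validate the flags against the training scripts.

```python
import shlex

# Step 1 command from the list above, split into an argument list.
AE_CV = shlex.split(
    "python3 train_gene_ae.py --exp ae_cv --dir demo --data demo_data "
    "--label autoencoder --predType regression --loss mae --opt Nadam --act tanh "
    "--h1 200 --h2 100 --h3 50 --epoch 2 --folds 2 --patience 2"
)


def to_retrain(cv_args, exp_name):
    """Return a copy of cv_args with --exp set to exp_name and --retrain 1 appended."""
    args = list(cv_args)            # copy so the cross-validation command is unchanged
    idx = args.index("--exp")       # raises ValueError if --exp is missing
    args[idx + 1] = exp_name
    return args + ["--retrain", "1"]


ae_retrain = to_retrain(AE_CV, "ae_retrain")

# Print the derived Step 3 command in a shell-safe form.
print(" ".join(shlex.quote(a) for a in ae_retrain))
```

The resulting list can be passed to `subprocess.run(ae_retrain, check=True)` from the directory that contains the training scripts. The same helper applies to the Step 4 command with `exp_name="nn_retrain"`.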

Train other models:

  1. Fit logistic regression (logit) with L1 regularization - 2 secs, fitting 1000 hyperparameter settings

    $ python3 train_others.py --exp other_cv --dir demo --data demo_data \
    --dataType vasari --predType binaryClass --label f5 --model logit1 --folds 2 --cpus 7
    
  2. Parse cv results:

    $ python3 parse_cv.py --dir demo/other_cv --model other
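
For readers unfamiliar with this kind of comparative model, the sketch below shows one common way to cross-validate an L1-regularized logistic regression with scikit-learn. It is not the code in train_others.py. The synthetic data, the log-spaced grid over `C`, and the helper name `fit_l1_logit` are all assumptions made for illustration, including the guess that the hyperparameter being searched is the regularization strength.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def fit_l1_logit(X, y, n_grid=1000, folds=2):
    """Grid-search the inverse regularization strength C of an L1 logit."""
    # Assumed grid: n_grid log-spaced values of C between 1e-4 and 1e4.
    grid = {"C": np.logspace(-4, 4, n_grid)}
    # liblinear is one of the scikit-learn solvers that supports the L1 penalty.
    model = LogisticRegression(penalty="l1", solver="liblinear")
    search = GridSearchCV(model, grid, cv=folds, scoring="accuracy")
    search.fit(X, y)
    return search


# Synthetic stand-in data: 60 samples, 20 features, one binary label.
X, y = make_classification(n_samples=60, n_features=20, random_state=0)

# A smaller grid keeps the example fast; n_grid=1000 mirrors the count above.
search = fit_l1_logit(X, y, n_grid=50)
print("best C:", search.best_params_["C"])
print("nonzero coefficients:", int(np.count_nonzero(search.best_estimator_.coef_)))
```

The count of nonzero coefficients shows the effect of the L1 penalty: smaller values of `C` drive more coefficients to exactly zero.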

Citation

If you want to cite this work, please cite the paper:

TBA!

and the repo:

@misc{smedleyRadiogenomics,
  title={deepRadiogenomics},
  author={Smedley, Nova F},
  year={2019},
  publisher={GitHub},
  howpublished={\url{https://github.com/novasmedley/deepRadiogenomics}},
}
