This project forked from yangence/mtm

A unified multi-task learning framework for prediction of individualized tissue gene expression profiles.

License: GNU General Public License v3.0

MTM

Introduction

MTM (Multi-tissue Transcriptome Mapping) is a unified deep multi-task learning approach to predict tissue-specific gene expression profiles using any available tissue expression profile (such as blood gene expression) from the same donor.

Requirements

  • Python 3.8
  • PyTorch 1.10.2
  • NumPy 1.20.3
  • Pandas 1.2.4
  • scikit-learn 0.24.2

Pre-processing

Download data from GTEx Portal:

  • expression data: GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz
  • sample attributes: GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt
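
As a minimal sketch of loading the expression file with pandas: a GCT file starts with a version line and a dimensions line, followed by a tab-delimited matrix whose first two columns are `Name` and `Description`. The helper name `read_gct` and the tiny in-memory table are illustrative assumptions, not part of this repository.

```python
import io
import pandas as pd

def read_gct(path_or_buffer):
    """Read a GCT v1.2 table: two header lines, then a tab-delimited matrix."""
    # Line 1 is the version tag ("#1.2"); line 2 holds the matrix dimensions.
    df = pd.read_csv(path_or_buffer, sep="\t", skiprows=2)
    # Rows are genes ("Name" = gene id, "Description" = gene symbol);
    # the remaining columns are samples.
    return df.set_index("Name").drop(columns="Description")

# Tiny in-memory stand-in for the real file (ids and values are made up).
demo = (
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tGTEX-AAAA-0001-SM-1\tGTEX-BBBB-0002-SM-2\n"
    "ENSG1.1\tG1\t1.0\t2.0\n"
    "ENSG2.1\tG2\t3.0\t4.0\n"
)
gct = read_gct(io.StringIO(demo))
expr = gct.T  # transpose so that rows are samples and columns are genes
```

Passing the path of the downloaded `.gct.gz` file instead of the in-memory buffer should work as well, since `pandas.read_csv` infers gzip compression from the file extension. The full matrix is large, so reading it may need a substantial amount of memory.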

Filter the downloaded data (expression data and sample attribute data) based on:

  • tissue types: select tissues with at least 50 samples
  • individuals: select individuals with at least 2 tissue samples
  • genes: select genes of interest (for example, protein-coding genes)
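
The tissue and individual filters above could be sketched with pandas as follows. This is an illustration under stated assumptions, not the authors' pre-processing script: `filter_samples` is a hypothetical helper, the donor id is taken to be the first two hyphen-separated fields of the GTEx `SAMPID` column, and "at least 2 tissue samples" is read as samples from at least 2 distinct tissues.

```python
import pandas as pd

def filter_samples(attr, min_tissue_samples=50, min_indiv_tissues=2):
    """Filter a sample attribute table by tissue size, then by donor coverage."""
    attr = attr.copy()
    # Assumption: sample ids look like "GTEX-1117F-0226-SM-5GZZ7",
    # so the donor id is the first two hyphen-separated fields.
    attr["Subject_id"] = attr["SAMPID"].str.split("-").str[:2].str.join("-")
    # Keep tissue types that have enough samples.
    tissue_counts = attr["SMTSD"].value_counts()
    keep_tissues = tissue_counts[tissue_counts >= min_tissue_samples].index
    attr = attr[attr["SMTSD"].isin(keep_tissues)]
    # Keep donors represented in enough distinct tissues among the kept samples.
    tissues_per_donor = attr.groupby("Subject_id")["SMTSD"].nunique()
    keep_donors = tissues_per_donor[tissues_per_donor >= min_indiv_tissues].index
    return attr[attr["Subject_id"].isin(keep_donors)]

# Toy table with lowered thresholds (ids are made up).
demo = pd.DataFrame({
    "SAMPID": ["GTEX-A-1-SM-1", "GTEX-A-2-SM-2", "GTEX-B-1-SM-3",
               "GTEX-B-2-SM-4", "GTEX-C-1-SM-5", "GTEX-C-2-SM-6"],
    "SMTSD": ["Lung", "Whole Blood", "Lung", "Whole Blood", "Lung", "Liver"],
})
filtered = filter_samples(demo, min_tissue_samples=2, min_indiv_tissues=2)
```

This is a single pass: dropping donors can push a tissue back below the sample threshold, so the two filters may need to be repeated until the table stops changing.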

The donor id should be added to the sample attributes file as the 'Subject_id' column. The filtered data should be saved as tab-delimited text files for model training, including:

  • expr: expression data, row - sample, column - gene
  • sample_attr: sample attributes, row - sample, column - attributes, including tissue type (the 'SMTSD' column) and individual id (the 'Subject_id' column)
  • gene_id: filtered gene ids
  • indiv_id: filtered individual ids
  • tissue_type: filtered tissue types
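
One possible way to write these five files is sketched below. The exact layout expected by `train.py` is not documented here, so the details are assumptions: `save_inputs` is a hypothetical helper, sample ids are written as the first column, and each id list is stored as one value per line without a header. The file names follow the training example.

```python
import os
import tempfile
import pandas as pd

def save_inputs(expr, sample_attr, out_dir):
    """Write the five training inputs as text files (file layout is assumed)."""
    os.makedirs(out_dir, exist_ok=True)
    # Keep sample attributes in the same row order as the expression matrix.
    sample_attr = sample_attr.loc[expr.index]
    expr.to_csv(os.path.join(out_dir, "GTEx_expr.txt"), sep="\t")
    sample_attr.to_csv(os.path.join(out_dir, "GTEx_sample_attributes.txt"), sep="\t")
    # Assumption: each id list is one value per line, without a header.
    id_lists = {
        "GTEx_gene_id.txt": list(expr.columns),
        "GTEx_individual_id.txt": sorted(sample_attr["Subject_id"].unique()),
        "GTEx_tissue_type.txt": sorted(sample_attr["SMTSD"].unique()),
    }
    for file_name, values in id_lists.items():
        with open(os.path.join(out_dir, file_name), "w") as handle:
            handle.write("\n".join(values) + "\n")

# Toy inputs (made up); both tables are indexed by sample id.
expr = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["S1", "S2"], columns=["G1", "G2"])
sample_attr = pd.DataFrame(
    {"SMTSD": ["Lung", "Whole_Blood"], "Subject_id": ["D1", "D1"]}, index=["S2", "S1"]
)
out_dir = tempfile.mkdtemp()
save_inputs(expr, sample_attr, out_dir)
```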

Train

Train the MTM model to learn the mapping between different tissues with GTEx data:

Example:

python train.py \
    --input_dir ../input_dir \
    --expr GTEx_expr.txt \
    --sample_attr GTEx_sample_attributes.txt \
    --gene_id GTEx_gene_id.txt \
    --indiv_id GTEx_individual_id.txt \
    --tissue_type GTEx_tissue_type.txt \
    --device "cuda:0" \
    --output_dir ../output_dir

The trained model is saved in the models subdirectory of the output directory (../output_dir/models in the example above). Individuals are randomly split into a training set (80%) and an evaluation set (20%), and their ids are saved in the data_split subdirectory (../output_dir/data_split).
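
To illustrate what a split at the individual level means, here is a minimal sketch with NumPy. It is not the code used by `train.py`; the function name `split_individuals`, the seed, and the rounding rule are assumptions made for the example. The point is that whole donors, not single samples, are assigned to one side, so no individual contributes samples to both sets.

```python
import numpy as np

def split_individuals(indiv_ids, val_fraction=0.2, seed=0):
    """Randomly assign whole individuals to a training or an evaluation set."""
    rng = np.random.default_rng(seed)
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = rng.permutation(np.asarray(sorted(indiv_ids)))
    n_val = int(round(len(shuffled) * val_fraction))
    val_ids = sorted(shuffled[:n_val].tolist())
    train_ids = sorted(shuffled[n_val:].tolist())
    return train_ids, val_ids

# Ten made-up donor ids give 8 training and 2 evaluation individuals.
donors = [f"GTEX-{i:04d}" for i in range(10)]
train_ids, val_ids = split_individuals(donors)
```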

Predict

Use the trained MTM model to predict tissue-specific gene expression profiles for unseen individuals.

Example

Prepare the input expression profiles of a specific tissue type (source tissue), such as "Whole_Blood". We can make use of the GTEx expression data as the input expression profiles by the following steps:

  • Filter the GTEx expression data based on:
    • the selected genes (GTEx_gene_id.txt)
    • individuals (../output_dir/data_split/val_indivs.txt)
    • tissue type (Whole_Blood)
  • Save the resulting filtered expression data as a tab-delimited text file, such as GTEx_expr.val_set.Whole_Blood.txt.
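
The filtering steps above might look like this in pandas. It is a sketch under assumptions: `select_source_profiles` is a hypothetical helper, both tables are indexed by sample id and cover the same samples, and the tissue label matches the value stored in the 'SMTSD' column.

```python
import pandas as pd

def select_source_profiles(expr, sample_attr, gene_ids, indiv_ids, tissue):
    """Return source-tissue profiles of the chosen individuals and genes."""
    # Samples of the requested tissue that belong to the requested individuals.
    mask = (sample_attr["SMTSD"] == tissue) & sample_attr["Subject_id"].isin(indiv_ids)
    samples = sample_attr.index[mask]
    # Rows: selected samples; columns: selected genes, in the given order.
    return expr.loc[samples, gene_ids]

# Toy data (made up); in practice these come from the filtered GTEx files.
expr = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
    index=["S1", "S2", "S3", "S4"], columns=["G1", "G2", "G3"],
)
sample_attr = pd.DataFrame(
    {"SMTSD": ["Whole_Blood", "Lung", "Whole_Blood", "Whole_Blood"],
     "Subject_id": ["D1", "D1", "D2", "D3"]},
    index=["S1", "S2", "S3", "S4"],
)
blood = select_source_profiles(expr, sample_attr, ["G1", "G3"], ["D1", "D3"], "Whole_Blood")
```

The result can then be written with `blood.to_csv("GTEx_expr.val_set.Whole_Blood.txt", sep="\t")`.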

Then predict the expression profiles of another tissue type (the target tissue), such as "Lung", with the trained MTM model (../output_dir/models/model_ckpt.tar) using the following command:

python predict.py \
    --expr GTEx_expr.txt \
    --sample_attr GTEx_sample_attributes.txt \
    --gene_id GTEx_gene_id.txt \
    --indiv_id GTEx_individual_id.txt \
    --tissue_type GTEx_tissue_type.txt \
    --input_expr GTEx_expr.val_set.Whole_Blood.txt \
    --input_tissue_type "Whole_Blood" \
    --output_tissue_type "Lung" \
    --model_path ../output_dir/models/model_ckpt.tar \
    --output_expr ../output_dir/predicted/GTEx_expr.val_set.Whole_Blood.to.Lung.txt

Contributors

kewalinsamart, averyhe
