iGTEx

This project describes an automated pipeline to quantify isoform expression in the GTEx V8 RNA sequencing data. The protocol uses XAEM, a powerful method for isoform expression estimation across multiple samples. Detailed information about XAEM is available on the XAEM website and in the paper published in Bioinformatics.

Prerequisites

R (recommended version >= 3.5.1)
Python (recommended version >= 3.7)

Installation

Step 1: Setup XAEM

Install the XAEM tool for this protocol via the bash command:

git clone https://github.com/ZhengCQ/iGTEx.git

wget https://github.com/ZhengCQ/iGTEx/archive/refs/tags/iGTEx_XAEM_v0.1.2.zip
unzip iGTEx_XAEM_v0.1.2.zip
ln -fs iGTEx-iGTEx_XAEM_v0.1.2 iGTEx_XAEM

Step 2: Setup R dependencies

In R, install the R dependencies via:

install.packages("foreach")
install.packages("doParallel")

Step 3: Download the annotation reference

Run the following commands to download the reference annotating the transcripts:

cd /path/to/iGTEx_XAEM
python down_ref.py

(Optional) To download a specific reference, for example GENCODE hg38, use:

python down_ref.py -db gencode_38

Example

An example is provided in the project's Example folder. Run it with:

cd /path/to/iGTEx_XAEM/Example
sh run_example.sh

Isoform Estimation using GTEx data

XAEM performs better when multiple samples of a similar data type are analyzed together. For the GTEx V8 data, we therefore create one project directory per tissue:

mkdir -p /path/to/Tissue1
cd /path/to/Tissue1

Input files

In /path/to/Tissue1, create a file /path/to/Tissue1/infastq_lst.tsv listing the FASTQ input files. The file is a tab-delimited text file with 4 columns: Sample name, Source name, FASTQ file name for paired-end read 1, and FASTQ file name for paired-end read 2. Source name indicates the batch or sequencing library of the sample, so the same sample may correspond to more than one source. An example covering both cases, samples with a single batch and samples with multiple batches, is given in /path/to/iGTEx_XAEM/Example/infastq_lst.tsv:

single batch

sample4 S0007   S0007_1.fg.gz   S0007_2.fg.gz
sample5 S0008   S0008_1.fg.gz   S0008_2.fg.gz

multiple batches

sample1 S0001   S0001_1.fg.gz   S0001_2.fg.gz
sample1 S0002   S0002_1.fg.gz   S0002_2.fg.gz
sample2 S0003   S0003_1.fg.gz   S0003_2.fg.gz
sample2 S0004   S0004_1.fg.gz   S0004_2.fg.gz
sample3 S0005   S0005_1.fg.gz   S0005_2.fg.gz
sample3 S0006   S0006_1.fg.gz   S0006_2.fg.gz
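Before starting a long run, it can help to check that the sample list is well formed. The following is a minimal sketch, not part of iGTEx itself: the function name read_fastq_list is hypothetical, and the check simply assumes the 4-column tab-delimited layout described above. It groups rows by sample name so that samples with multiple sources are easy to spot.

```python
import csv
import io
from collections import defaultdict


def read_fastq_list(handle):
    """Parse a 4-column tab-delimited sample list.

    Returns {sample_name: [(source_name, read1, read2), ...]}.
    Hypothetical helper for illustration; not shipped with iGTEx.
    """
    samples = defaultdict(list)
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 tab-separated columns, got {len(row)}"
            )
        sample, source, read1, read2 = (field.strip() for field in row)
        samples[sample].append((source, read1, read2))
    return dict(samples)


# Inline example data mirroring the layout shown above.
example = (
    "sample1\tS0001\tS0001_1.fg.gz\tS0001_2.fg.gz\n"
    "sample1\tS0002\tS0002_1.fg.gz\tS0002_2.fg.gz\n"
    "sample4\tS0007\tS0007_1.fg.gz\tS0007_2.fg.gz\n"
)
parsed = read_fastq_list(io.StringIO(example))
for sample, sources in parsed.items():
    print(sample, len(sources), "source(s)")
```

To check a real file, pass an open file handle instead of the in-memory example, for instance open("/path/to/Tissue1/infastq_lst.tsv", newline="").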

Run XAEM

XAEM can be easily run with:

python /path/to/iGTEx_XAEM/run_xaem.py -i /path/to/Tissue1/infastq_lst.tsv

(Optional) To specify a particular reference, for example GENCODE hg38, use:

--ref gencode_38

(Optional) To specify a particular output directory, use:

-o /path/to/Tissue1_output_directory

(Optional) Further customized configuration of XAEM can be set up with:

-c /path/to/Tissue1_config.ini

An example of the config.ini file can be found in /path/to/iGTEx_XAEM/.
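When processing several tissues, the optional flags above can be assembled programmatically. The sketch below is illustrative only: build_xaem_command is a hypothetical helper, the paths are the placeholders used in this README, and only the flags documented above (-i, --ref, -o, -c) are used. It builds the command without running it.

```python
import shlex


def build_xaem_command(fastq_list, ref=None, out_dir=None, config=None,
                       script="/path/to/iGTEx_XAEM/run_xaem.py"):
    """Assemble the run_xaem.py command line as a list of arguments.

    Hypothetical helper; optional flags are appended only when given.
    """
    cmd = ["python", script, "-i", fastq_list]
    if ref:
        cmd += ["--ref", ref]
    if out_dir:
        cmd += ["-o", out_dir]
    if config:
        cmd += ["-c", config]
    return cmd


cmd = build_xaem_command(
    "/path/to/Tissue1/infastq_lst.tsv",
    ref="gencode_38",
    out_dir="/path/to/Tissue1_output_directory",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

To execute the command rather than print it, the list can be passed to subprocess.run(cmd, check=True) once the placeholder paths have been replaced with real ones.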

Calculate isoform ratio

Isoform ratios can be calculated with:

python /path/to/iGTEx_XAEM/exp2ratio.py -i /path/to/XAEM_isoform_expression.RData

(Optional) To specify a particular reference, for example GENCODE hg38, use:

--ref gencode_38

(Optional) By default, the output is written to the same directory as the input isoform file. To specify a different output directory, use:

-o /path/to/Tissue1_output_directory

(Optional) To specify a covariates file for linear regression, use:

--covariates /path/to/covariates_file

(For GTEx) For GTEx v8, we have prepared the covariate information needed for the calculation. To use it, specify a tissue name:

--tissue Brain_Amygdala

(For GTEx) A complete GTEx v8 example:

python /path/to/iGTEx_XAEM/exp2ratio.py -i /path/to/Brain_Amygdala_XAEM_isoform_expression.RData --ref gencode_38 --tissue Brain_Amygdala

sQTL THISTLE

BOD file

/path/to/osca --efile /path/to/TissueName/isoform_splice_ratio.tsv --gene-expression --make-bod --no-fid --out TissueName 
/path/to/osca --befile TissueName --update-opi /path/to/iGTEx_XAEM/ref/gencode_38/anno_gene_info.opi

sQTL

/path/to/osca --sqtl --bfile /path/to/Genotype/BED_All/TissueName_Genotype --befile TissueName --maf 0.05 --call 0.85 --cis-wind 1000 --thread-num 10 --task-num 1 --task-id 1 --to-smr --bed /path/to/iGTEx_XAEM/ref/gencode_38/anno_gene_info.bed --out sQTL_results/TissueName
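The THISTLE commands above use TissueName as a placeholder, so they lend themselves to templating when many tissues are processed. The following sketch only substitutes a tissue name into the commands shown in this section; thistle_commands is a hypothetical helper, the paths are the README's placeholders, and no options beyond those listed above are assumed.

```python
import shlex

OSCA = "/path/to/osca"                            # placeholder path from this README
REF_DIR = "/path/to/iGTEx_XAEM/ref/gencode_38"    # placeholder path from this README


def thistle_commands(tissue):
    """Return the BOD-creation, OPI-update, and sQTL commands for one tissue.

    Hypothetical helper; it fills the tissue name into the commands shown above.
    """
    make_bod = [
        OSCA, "--efile", f"/path/to/{tissue}/isoform_splice_ratio.tsv",
        "--gene-expression", "--make-bod", "--no-fid", "--out", tissue,
    ]
    update_opi = [
        OSCA, "--befile", tissue,
        "--update-opi", f"{REF_DIR}/anno_gene_info.opi",
    ]
    sqtl = [
        OSCA, "--sqtl",
        "--bfile", f"/path/to/Genotype/BED_All/{tissue}_Genotype",
        "--befile", tissue,
        "--maf", "0.05", "--call", "0.85", "--cis-wind", "1000",
        "--thread-num", "10", "--task-num", "1", "--task-id", "1",
        "--to-smr", "--bed", f"{REF_DIR}/anno_gene_info.bed",
        "--out", f"sQTL_results/{tissue}",
    ]
    return [make_bod, update_opi, sqtl]


for tissue in ["Brain_Amygdala"]:
    for cmd in thistle_commands(tissue):
        # Print each command in shell-safe form; nothing is executed here.
        print(" ".join(shlex.quote(part) for part in cmd))
```

The printed commands can be written to a job script or run one by one after the placeholder paths are replaced.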
