The tigar-1 from xmeng34

TIGAR

"TIGAR" stands for Transcriptome-Integrated Genetic Association Resource. It is developed using Python and BASH scripts. TIGAR can fit both Elastic-Net and nonparametric Bayesian (Dirichlet Process Regression, i.e. DPR) models for gene expression imputation, impute genetically regulated gene expression (GReX) from individual-level genotype data, and conduct transcriptome-wide association studies (TWAS) using both individual-level and summary-level GWAS data for univariate and multivariate phenotypes.

Software

Please add the executable file ./Model_Train_Pred/DPR to a directory on your Linux ${PATH}. Assuming ~/bin/ is a directory listed in your ${PATH} environment variable, you can adapt the following example command:

cp ./Model_Train_Pred/DPR ~/bin/

BGZIP, TABIX, Python 3.5, and the following Python libraries are required to run TIGAR:

BGZIP: http://www.htslib.org/doc/bgzip.html
TABIX: http://www.htslib.org/doc/tabix.html
python 3.5
- dfply
- io
- subprocess
- multiprocess
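
Before running the pipeline it can help to confirm that the external executables are reachable. The following is a minimal sketch, assuming the DPR binary keeps the name `DPR` after copying and that `bgzip` and `tabix` are installed under those names; the helper `find_tools` is hypothetical and not part of TIGAR.

```python
import shutil

# Executables the README lists as prerequisites. The lookup names are
# assumptions (for example, that the copied binary is still called "DPR").
REQUIRED_TOOLS = ("DPR", "bgzip", "tabix")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print("{}: {}".format(name, path if path else "NOT FOUND on PATH"))
```

A `None` entry means the corresponding tool still needs to be installed or its directory added to ${PATH}.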

Input file format

Example data provided here are generated artificially. All input files are tab delimited text files.

1. Gene Expression File (`./example_data/Gene_Exp.txt`)

First 5 columns specify chromosome number, gene start position, gene end position, target gene ID, gene name (optional, could be the same as gene ID).
Sample gene expression data start from the 6th column.

CHROM	GeneStart	GeneEnd	TargetID	GeneName	sample1	sample...
1	100	200	ENSG0000	X	0.2	...
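
The layout above can be read with the standard library alone. This is a rough sketch, not TIGAR's own reader: the function name `read_gene_expression` and the two-sample input are made up for illustration, and expression values are assumed to be numeric.

```python
import csv
import io

N_ANNOTATION_COLS = 5  # CHROM, GeneStart, GeneEnd, TargetID, GeneName


def read_gene_expression(handle):
    """Yield (annotation dict, {sample: expression}) for each gene row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    anno_names = header[:N_ANNOTATION_COLS]
    sample_ids = header[N_ANNOTATION_COLS:]
    for row in reader:
        if not row:
            continue  # skip blank lines
        annotation = dict(zip(anno_names, row[:N_ANNOTATION_COLS]))
        values = [float(v) for v in row[N_ANNOTATION_COLS:]]
        yield annotation, dict(zip(sample_ids, values))


# Made-up two-sample input in the layout shown above.
example = io.StringIO(
    "CHROM\tGeneStart\tGeneEnd\tTargetID\tGeneName\tsample1\tsample2\n"
    "1\t100\t200\tENSG0000\tX\t0.2\t-0.5\n"
)
genes = list(read_gene_expression(example))
```

Each element of `genes` pairs the five annotation fields with a per-sample dictionary, which mirrors the split between columns 1-5 and columns 6 onward.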

2. Genotype File

vcf file (./example_data/example.vcf.gz)

Sorted by chromosome and base pair position, zipped by bgzip, and tabixed.
Example tabix command: tabix -f -p vcf *.vcf.gz
Genotype data start from the 10th column.
More information about VCF file format: http://www.internationalgenome.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-40/

CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample1	sample...
1	100	rs1	C	T	.	PASS	.	GT:DS	0/0:0.01	...
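
The FORMAT column tells how to read each sample entry, which is presumably what the GT and DS choices in the usage commands refer to. The sketch below is one way to pull a dosage out of a single sample field; `genotype_dosage` is a hypothetical helper, and the GT branch assumes biallelic variants (any non-reference allele counts as one ALT copy).

```python
def genotype_dosage(format_field, sample_field, key="DS"):
    """Return the ALT-allele dosage for one sample entry of a VCF line.

    key="DS" reads the dosage value directly; key="GT" counts ALT alleles.
    Returns None when the value is missing.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if key not in keys:
        raise KeyError("{} not in FORMAT {!r}".format(key, format_field))
    idx = keys.index(key)
    if idx >= len(values):
        return None  # VCF allows trailing sample fields to be dropped
    value = values[idx]
    if key == "DS":
        return None if value == "." else float(value)
    # GT: alleles are separated by "/" (unphased) or "|" (phased).
    alleles = value.replace("|", "/").split("/")
    if "." in alleles:
        return None
    # Assumption: biallelic site, so every non-"0" allele is the ALT allele.
    return float(sum(1 for allele in alleles if allele != "0"))
```

For the example row above, `genotype_dosage("GT:DS", "0/0:0.01")` reads the DS value, while passing `key="GT"` counts alleles instead.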

dosages file

The first 5 columns are of the same format as VCF file.
Dosage genotype data start from the 6th column.

CHROM	POS	ID	REF	ALT	sample1	sample...
1	100	rs1	C	T	0.01	...

3. PED File (`./example_data/example_PED.ped`)

More information about PED file format: http://zzz.bwh.harvard.edu/plink/data.shtml#ped

FAM_ID	IND_ID	FAT_ID	MOT_ID	SEX	PHENO	COV1	COV...
11A	11A	X	X	1	0.2	0.3	...

4. Asso_Info file (`./example_data/Asso_Info_*.txt`)

Two columns with the first column specifying the Phenotype (P) and Covariate variables (C) from the PED file, and the second column specifying the corresponding variable names in the PED file. The variables specified in the Asso_Info file will be used in TWAS.

P	PHENO
C	COV1
C	COV2
C	SEX
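
To illustrate how the Asso_Info file selects columns from the PED file, here is a small sketch using only the standard library. The function names are hypothetical, the PED file is assumed to carry the header row shown above, and values are kept as raw strings rather than converted.

```python
import csv
import io


def read_asso_info(handle):
    """Return (phenotype names, covariate names) from an Asso_Info file."""
    phenotypes, covariates = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        kind, name = row[0].strip(), row[1].strip()
        if kind == "P":
            phenotypes.append(name)
        elif kind == "C":
            covariates.append(name)
        else:
            raise ValueError("unknown variable type: {!r}".format(kind))
    return phenotypes, covariates


def select_ped_columns(handle, names):
    """Return {IND_ID: {name: raw string value}} for the requested PED columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    missing = [n for n in names if n not in fieldnames]
    if missing:
        raise KeyError("columns not found in PED file: {}".format(missing))
    return {row["IND_ID"]: {n: row[n] for n in names} for row in reader}


# Made-up inputs following the two layouts shown above.
asso = io.StringIO("P\tPHENO\nC\tCOV1\nC\tSEX\n")
ped = io.StringIO(
    "FAM_ID\tIND_ID\tFAT_ID\tMOT_ID\tSEX\tPHENO\tCOV1\n"
    "11A\t11A\tX\tX\t1\t0.2\t0.3\n"
)
phenotypes, covariates = read_asso_info(asso)
selected = select_ped_columns(ped, phenotypes + covariates)
```

Raising on a missing column surfaces a mismatch between the two files early, before any association test is attempted.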

5. Zscore File (`./example_data/CHR1_GWAS_Zscore.txt.gz`)

Sorted by chromosome and base pair position, zipped by bgzip, and tabixed.
Example tabix command: tabix -f -p vcf *_Zscore.txt.gz
The first 4 columns are of the same format as VCF file.

CHROM	POS	REF	ALT	Zscore
1	100	C	T	0.01

6. Genome block annotation file (`./example_data/block_annotation_EUR.txt`)

The block annotation file is a tab-delimited text file with the header row CHROM Start End File, denoting the chromosome number, block start position, block end position, and the name of the corresponding reference VCF file under the directory given by --geno_path. Reference VCF files should be provided either one per chromosome or as a single file covering all genome-wide variants. An example block annotation file for European samples is provided at ./TIGAR/example_data/block_annotation_EUR.txt.

CHROM	Start	End	File
1	100	20000	CHR1.vcf.gz

Block annotation files for other ethnicities can be adapted from the genome segmentation generated by LDetect: https://bitbucket.org/nygcresearch/ldetect-data/src/master/
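
As an illustration of how a block annotation file maps a variant position to its reference VCF file, consider the sketch below. The helper names are hypothetical, and treating both Start and End as inclusive is an assumption that the README does not confirm.

```python
import csv
import io


def read_blocks(handle):
    """Parse a block annotation file into (chrom, start, end, file) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["CHROM"], int(row["Start"]), int(row["End"]), row["File"])
        for row in reader
    ]


def blocks_containing(blocks, chrom, pos):
    """Return the blocks on `chrom` whose interval covers `pos`.

    Assumption: Start and End are both inclusive.
    """
    return [b for b in blocks if b[0] == str(chrom) and b[1] <= pos <= b[2]]


# The single example row from above.
blocks = read_blocks(
    io.StringIO("CHROM\tStart\tEnd\tFile\n1\t100\t20000\tCHR1.vcf.gz\n")
)
hits = blocks_containing(blocks, 1, 150)
```

A linear scan is enough for a handful of blocks; for genome-wide use an interval index or a sorted search would scale better.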

7. Gene Annotation File (`./example_data/Gene_annotation.txt`)

The same format as the first five columns of the Gene Expression File.

CHROM	GeneStart	GeneEnd	TargetID	GeneName
1	100	200	ENSG0000	X

8. Weight File used for TWAS with GWAS summary statistics (`./example_data/weight.txt`)

The first 5 columns must follow the format below, specifying chromosome number, base pair position, reference allele, alternative allele, and target gene ID.
The column ES (Effect Size) gives the weight of the given SNP for the given target gene.

CHROM	POS	REF	ALT	TargetID	ES
1	100	C	T	ENSG0000	0.2
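
To show how such weights relate to imputed expression, the sketch below computes GReX for each gene as the sum of ES times dosage over its SNPs. That weighted-sum form is the usual linear imputation and is an assumption here, since this README does not spell out the formula; `predict_grex` is a hypothetical helper, and SNPs are matched on exact CHROM, POS, REF, and ALT with no allele flipping.

```python
from collections import defaultdict


def predict_grex(weights, dosages):
    """Sum ES * dosage per gene and sample.

    weights: iterable of (chrom, pos, ref, alt, target_id, es) tuples.
    dosages: {(chrom, pos, ref, alt): {sample_id: dosage}}.
    SNPs absent from `dosages` are skipped.
    """
    grex = defaultdict(lambda: defaultdict(float))
    for chrom, pos, ref, alt, target_id, es in weights:
        sample_dosages = dosages.get((chrom, pos, ref, alt))
        if sample_dosages is None:
            continue
        for sample, dosage in sample_dosages.items():
            grex[target_id][sample] += es * dosage
    return {gene: dict(samples) for gene, samples in grex.items()}


# Made-up weights and dosages for one gene and two samples.
weights = [
    ("1", 100, "C", "T", "ENSG0000", 0.2),
    ("1", 150, "A", "G", "ENSG0000", -0.5),
]
dosages = {
    ("1", 100, "C", "T"): {"s1": 1.0, "s2": 2.0},
    ("1", 150, "A", "G"): {"s1": 2.0, "s2": 0.0},
}
grex = predict_grex(weights, dosages)
```

Keying on the full (CHROM, POS, REF, ALT) tuple avoids silently combining a weight with a dosage coded against a different allele.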

Example Usage

Train gene expression imputation model

Train DPR imputation model

./TIGAR_Model_Train.sh --model DPR \
--Gene_Exp ${Gene_Exp_train_file} --train_sample ${train_sample_path} \
--chr 1 --train_dir ${train_dir} \
--geno_train vcf --FT DS \
--out ${out_prefix}

Train Elastic-Net imputation model

./TIGAR_Model_Train.sh --model elastic_net \
--Gene_Exp ${Gene_Exp_train_file} --train_sample ${train_sample_path} \
--chr 1 --train_dir ${train_dir} \
--geno_train vcf --FT DS \
--out ${out_prefix}
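
When the same training call is repeated for both models or for several chromosomes, the argument list can be assembled programmatically. The sketch below only builds and prints the commands using the flags shown above; the function `train_command` and all file paths are placeholders, and nothing is executed.

```python
import shlex


def train_command(model, chrom, gene_exp, train_sample, train_dir, out_prefix,
                  geno_train="vcf", ft="DS"):
    """Build the argument list for one TIGAR_Model_Train.sh call."""
    if model not in ("DPR", "elastic_net"):
        raise ValueError("model must be 'DPR' or 'elastic_net'")
    return [
        "./TIGAR_Model_Train.sh", "--model", model,
        "--Gene_Exp", gene_exp, "--train_sample", train_sample,
        "--chr", str(chrom), "--train_dir", train_dir,
        "--geno_train", geno_train, "--FT", ft,
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute your own files.
    for model in ("DPR", "elastic_net"):
        cmd = train_command(model, 1, "Gene_Exp.txt", "sampleID.txt",
                            "train_geno_dir", "out/" + model)
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the resulting list to `subprocess.run` (without `shell=True`) would execute it and avoids quoting problems with unusual paths.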

Predict GReX

./TIGAR_Model_Pred.sh --chr 1 \
--train_result_path ${train_result_path} \
--train_info_path ${train_info_path} \
--genofile_dir ${genofile_dir} \
--genofile_type vcf --Format GT \
--out ${out_prefix}

TWAS

Using individual-level GWAS data. Take the output *_GReX_prediction.txt from gene expression prediction as the input for --Gene_EXP here.

./TIGAR_TWAS.sh --asso 1 \
--Gene_EXP ${Gene_Exp_prediction_file} --PED ${PED} --Asso_Info ${asso_Info} \
--out ${out_prefix}

Using summary-level GWAS data. Take the output *_training_param.txt from imputation model training as the input Weight file here. Only the first five columns of the file given to --Gene_anno are used as gene annotation, so the same gene expression file can be supplied as input for --Gene_anno.

./TIGAR_TWAS.sh --asso 2 \
--Gene_anno ${Gene_anno_file} --Zscore ${Zscore} --Weight ${Weight} \
--Covar ${Ref_Covariance_file} --chr 22 \
--out ${out_prefix}

Generate reference covariance files

./TIGAR_Covar.sh --block ${block_annotation} \
--geno_path ${geno_path} --geno vcf \
--chr 22 --Format GT \
--out ${out_prefix}

Reference

Elastic Net: https://github.com/hakyimlab/PrediXcan
DPR: https://github.com/biostatpzeng/DPR
