TIGAR

"TIGAR" stands for Transcriptome-Integrated Genetic Association Resource, and is developed using Python and BASH scripts. TIGAR can fit both Elastic-Net and nonparametric Bayesian (Dirichlet Process Regression, i.e. DPR) models for gene expression imputation, impute genetically regulated gene expression (GReX) from individual-level genotype data, and conduct transcriptome-wide association studies (TWAS) using both individual-level and summary-level GWAS data for univariate and multivariate phenotypes.

Software

  • Please add the executable file ./Model_Train_Pred/DPR to a directory on your Linux ${PATH}. Assuming ~/bin/ is a directory included in your ${PATH} environment variable, you can adapt the following example command:
cp ./Model_Train_Pred/DPR ~/bin/
  • BGZIP, TABIX, Python 3.5, and the following Python libraries are required to run TIGAR:
  1. BGZIP: http://www.htslib.org/doc/bgzip.html
  2. TABIX: http://www.htslib.org/doc/tabix.html
  3. Python 3.5
    • dfply
    • io (Python standard library)
    • subprocess (Python standard library)
    • multiprocess

Input file format

Example data provided here are generated artificially. All input files are tab delimited text files.

1. Gene Expression File (./example_data/Gene_Exp.txt)

  • First 5 columns specify chromosome number, gene start position, gene end position, target gene ID, gene name (optional, could be the same as gene ID).
  • Sample gene expression data start from the 6th column.
CHROM GeneStart GeneEnd TargetID GeneName sample1 sample...
1 100 200 ENSG0000 X 0.2 ...
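
The layout above can be checked with a few lines of Python before running TIGAR. The following is a minimal sketch, not part of TIGAR itself: the function name `read_gene_expression` and the in-memory `EXAMPLE` text are hypothetical, and the code only assumes what is stated above (tab-delimited text, five annotation columns, expression values from the 6th column onward).

```python
import csv
import io

# Hypothetical in-memory example mirroring the layout described above.
EXAMPLE = (
    "CHROM\tGeneStart\tGeneEnd\tTargetID\tGeneName\tsample1\tsample2\n"
    "1\t100\t200\tENSG0000\tX\t0.2\t0.5\n"
)


def read_gene_expression(handle):
    """Split each row into its 5 annotation columns and its expression values."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[5:]  # sample columns start at the 6th column
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        annotation = dict(zip(header[:5], row[:5]))
        expression = [float(value) for value in row[5:]]
        if len(expression) != len(sample_ids):
            raise ValueError("row length does not match header: %r" % row)
        records.append((annotation, expression))
    return sample_ids, records


sample_ids, records = read_gene_expression(io.StringIO(EXAMPLE))
print(sample_ids)
print(records[0])
```

To apply the same check to a real file, pass an open file handle (for example `open("./example_data/Gene_Exp.txt", newline="")`) in place of the `io.StringIO` object.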

2. Genotype File

  1. vcf file (./example_data/example.vcf.gz)
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample...
1 100 rs1 C T . PASS . GT:DS 0/0:0.01 ...
  2. dosage file
  • The first 5 columns have the same format as the first 5 columns of a VCF file.
  • Dosage genotype data start from the 6th column.
CHROM POS ID REF ALT sample1 sample...
1 100 rs1 C T 0.01 ...
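
To illustrate how the two genotype layouts relate, here is a small sketch that pulls a per-sample dosage out of a VCF sample field such as `0/0:0.01` using the FORMAT column (`GT:DS`). The function name `sample_dosage` is hypothetical and this is not TIGAR's own parser; the `GT` branch assumes biallelic variants and simply counts non-reference alleles.

```python
def sample_dosage(format_field, sample_field, use="DS"):
    """Return the dosage for one sample from one VCF record.

    format_field: the FORMAT column, e.g. "GT:DS"
    sample_field: the sample column, e.g. "0/0:0.01"
    use: "DS" to read the dosage directly, "GT" to count ALT alleles
    """
    entry = dict(zip(format_field.split(":"), sample_field.split(":")))
    if use == "DS":
        return float(entry["DS"])
    if use == "GT":
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        alleles = entry["GT"].replace("|", "/").split("/")
        if "." in alleles:
            return None  # missing genotype
        # Assumption: biallelic variant, so any non-"0" allele is the ALT allele.
        return float(sum(1 for allele in alleles if allele != "0"))
    raise ValueError("use must be 'DS' or 'GT'")


print(sample_dosage("GT:DS", "0/0:0.01"))            # dosage read from DS
print(sample_dosage("GT:DS", "0|1:0.98", use="GT"))  # dosage counted from GT
```

A dosage file row is then just the first five VCF columns followed by one such value per sample.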

3. PED File (./example_data/example_PED.ped)

FAM_ID IND_ID FAT_ID MOT_ID SEX PHENO COV1 COV...
11A 11A X X 1 0.2 0.3 ...

4. Asso_Info file (./example_data/Asso_Info_*.txt)

  • Two columns with the first column specifying the Phenotype (P) and Covariate variables (C) from the PED file, and the second column specifying the corresponding variable names in the PED file. The variables specified in the Asso_Info file will be used in TWAS.
P PHENO
C COV1
C COV2
C SEX
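
As a sketch of how the Asso_Info file maps onto PED columns, the snippet below splits the two-column file into phenotype and covariate names. The function name `parse_asso_info` is hypothetical, and the code assumes only the tab-delimited `P`/`C` layout shown above; it is not taken from the TIGAR scripts.

```python
def parse_asso_info(lines):
    """Return (phenotypes, covariates) as lists of PED column names."""
    phenotypes, covariates = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # ignore blank or malformed lines in this sketch
        kind, name = fields
        if kind == "P":
            phenotypes.append(name)
        elif kind == "C":
            covariates.append(name)
        else:
            raise ValueError("unknown variable type: %r" % kind)
    return phenotypes, covariates


example = ["P\tPHENO\n", "C\tCOV1\n", "C\tCOV2\n", "C\tSEX\n"]
phenotypes, covariates = parse_asso_info(example)
print(phenotypes, covariates)
```

Each returned name should match a column header in the PED file, so a quick membership check against the PED header catches spelling mismatches early.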

5. Zscore File (./example_data/CHR1_GWAS_Zscore.txt.gz)

  • Sorted by chromosome and base pair position, compressed with bgzip, and indexed with tabix.
  • Example tabix command: tabix -f -p vcf *_Zscore.txt.gz
  • The first 4 columns (CHROM, POS, REF, ALT) follow the conventions of the corresponding VCF columns.
CHROM POS REF ALT Zscore
1 100 C T 0.01
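
Because tabix indexing expects each chromosome to form one contiguous block with positions in non-decreasing order, it can help to check the ordering before compressing the file. The following is a minimal sketch under that assumption; the function name `is_tabix_ready` is hypothetical and not part of TIGAR.

```python
def is_tabix_ready(rows):
    """Check ordering of (chrom, pos) pairs given in file order.

    Returns True when every chromosome appears as a single contiguous
    block and positions never decrease within a block.
    """
    seen = set()
    current = None
    last_pos = None
    for chrom, pos in rows:
        if chrom != current:
            if chrom in seen:
                return False  # chromosome block is split across the file
            seen.add(chrom)
            current, last_pos = chrom, pos
        elif pos < last_pos:
            return False  # positions out of order within a chromosome
        else:
            last_pos = pos
    return True


print(is_tabix_ready([("1", 100), ("1", 250), ("2", 50)]))
print(is_tabix_ready([("1", 250), ("1", 100)]))
```

The `(chrom, pos)` pairs can be produced by reading the first two tab-delimited columns of the uncompressed Zscore file and converting the position to `int`.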

6. Genome block annotation file (./example_data/block_annotation_EUR.txt)

  • The block annotation file is a tab delimited text file with the header row CHROM Start End File, denoting the chromosome number, starting position, ending position, and the name of the corresponding reference VCF file under the directory specified by --geno_path. Reference VCF files should be provided either one per chromosome or as a single file covering all genome-wide variants. An example block annotation file for European samples is provided at ./TIGAR/example_data/block_annotation_EUR.txt.
CHROM Start End File
1 100 20000 CHR1.vcf.gz

7. Gene Annotation File (./example_data/Gene_annotation.txt)

  • The same format as the first five columns of the Gene Expression File.
CHROM GeneStart GeneEnd TargetID GeneName
1 100 200 ENSG0000 X

8. Weight File used for TWAS with GWAS summary statistics (./example_data/weight.txt)

  • First 5 columns have to be of the following format, specifying chromosome number, base pair position, reference allele, alternative allele, and target gene ID.

  • The column ES (Effect Size) denotes the weights for this given SNP/TargetGene

CHROM POS REF ALT TargetID ES
1 100 C T ENSG0000 0.2
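
Conceptually, the ES weights are combined with genotype dosages as a weighted sum to give a predicted GReX value per sample. The sketch below shows that general TWAS idea only; it is an assumption-laden illustration, not TIGAR's implementation, which may additionally handle allele flips, centering, or missing data. The function name `predict_grex` and the example variants are hypothetical.

```python
def predict_grex(weights, dosages, n_samples):
    """Weighted sum of dosages for one gene.

    weights: dict mapping (chrom, pos, ref, alt) -> ES for the target gene
    dosages: dict mapping (chrom, pos, ref, alt) -> list of per-sample dosages
    n_samples: number of samples in each dosage list
    """
    grex = [0.0] * n_samples
    for variant, effect_size in weights.items():
        sample_dosages = dosages.get(variant)
        if sample_dosages is None:
            continue  # variant absent from the genotype data; skipped in this sketch
        for i, dosage in enumerate(sample_dosages):
            grex[i] += effect_size * dosage
    return grex


# Hypothetical weights and dosages for two variants and three samples.
weights = {("1", 100, "C", "T"): 0.2, ("1", 150, "A", "G"): -0.5}
dosages = {
    ("1", 100, "C", "T"): [0.0, 1.0, 2.0],
    ("1", 150, "A", "G"): [2.0, 0.0, 1.0],
}
grex = predict_grex(weights, dosages, n_samples=3)
print(grex)
```

Keying both tables on CHROM, POS, REF, and ALT mirrors the first columns of the weight file, so a variant only contributes when the alleles match exactly.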

Example Usage

  • Train gene expression imputation model

Train DPR imputation model

./TIGAR_Model_Train.sh --model DPR \
--Gene_Exp ${Gene_Exp_train_file} --train_sample ${train_sample_path} \
--chr 1 --train_dir ${train_dir} \
--geno_train vcf --FT DS \
--out ${out_prefix}

Train Elastic-Net imputation model

./TIGAR_Model_Train.sh --model elastic_net \
--Gene_Exp ${Gene_Exp_train_file} --train_sample ${train_sample_path} \
--chr 1 --train_dir ${train_dir} \
--geno_train vcf --FT DS \
--out ${out_prefix}
  • Predict GReX
./TIGAR_Model_Pred.sh --chr 1 \
--train_result_path ${train_result_path} \
--train_info_path ${train_info_path} \
--genofile_dir ${genofile_dir} \
--genofile_type vcf --Format GT \
--out ${out_prefix}
  • TWAS

Using individual-level GWAS data. Take the output *_GReX_prediction.txt from gene expression prediction as the input for --Gene_EXP here.

./TIGAR_TWAS.sh --asso 1 \
--Gene_EXP ${Gene_Exp_prediction_file} --PED ${PED} --Asso_Info ${asso_Info} \
--out ${out_prefix}

Using summary-level GWAS data. Take the output *_training_param.txt from imputation model training as the input Weight file here. Only the first five columns of the file given to --Gene_anno are used as the gene annotation, so the same gene expression file can be passed as input for --Gene_anno.

./TIGAR_TWAS.sh --asso 2 \
--Gene_anno ${Gene_anno_file} --Zscore ${Zscore} --Weight ${Weight} \
--Covar ${Ref_Covariance_file} --chr 22 \
--out ${out_prefix}
  • Generate reference covariance files
./TIGAR_Covar.sh --block ${block_annotation} \
--geno_path ${geno_path} --geno vcf \
--chr 22 --Format GT \
--out ${out_prefix}
