greenelab / tad_pathways_pipeline

Pipeline to implement a "TAD_Pathways" analysis. Discover candidate genes based on association signals in TADs.

License: MIT License

Languages: Shell 13.33%, Python 51.24%, R 35.43%
Topics: gwas, candidate-genes, tad, analysis, methodology, workflow, pathways, tads

tad_pathways_pipeline's Introduction

TAD_Pathways

Leveraging TADs to identify candidate genes at GWAS signals

Gregory P. Way, Casey S. Greene, and Struan F.A. Grant - 2017

Summary

This repository contains data and instructions to implement a "TAD_Pathways" analysis for over 300 different trait/disease GWAS or for custom SNP lists.

TAD_Pathways uses the principles of topologically associating domains (TADs) to define where an association signal (typically a GWAS signal) is most likely to affect gene function. We use TAD boundaries as defined by Dixon et al. 2012 and hg19 GENCODE genes to identify which genes may be implicated. We then perform an overrepresentation pathway analysis to identify pathways that are significantly enriched in the input TAD-defined gene set.
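A one-sided hypergeometric test is a common way to score overrepresentation. It asks whether a pathway contains more genes from the input set than chance would predict. The sketch below illustrates that test only and is not the pipeline's own code. The function name `hypergeom_enrichment_p` is hypothetical, you must choose the background gene universe yourself, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided overrepresentation p-value, P(X >= k), for a hypergeometric X.

    N: number of genes in the background universe
    K: background genes annotated to the pathway
    n: genes in the input (e.g. TAD-defined) gene set
    k: input genes annotated to the pathway
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probability mass of drawing k or more pathway genes into the input set.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy counts: 20 background genes, 5 in the pathway, 4 input genes, 3 of them in the pathway.
print(hypergeom_enrichment_p(k=3, n=4, K=5, N=20))  # 155 / 4845, about 0.032
```

In practice the test is run once per pathway, and the resulting p-values are then adjusted for the number of pathways tested.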

For more specific details about our method, refer to our short report at the European Journal of Human Genetics.

We also present a 6-minute video introducing the method and discussing the experimental validation at EJHG-tube.

Setup

First, clone the repository and navigate into the top directory:

git clone git@github.com:greenelab/tad_pathways_pipeline.git
cd tad_pathways_pipeline

Before you begin, download the necessary TAD-based index files and GWAS curation files and set up the Python environment:

bash initialize.sh

# Using conda version 4.4.11
conda activate tad_pathways

Now a TAD_Pathways analysis can proceed. Follow one of the GWAS example pipelines to work from an existing GWAS, or the custom pipeline example to see how to run TAD_Pathways on user-curated SNPs.

Examples

We provide three example TAD_Pathways analysis pipelines. To run each analysis:

conda activate tad_pathways

# Example using Bone Mineral Density GWAS
bash example_pipeline_bmd.sh

# Example using Type 2 Diabetes GWAS
bash example_pipeline_t2d.sh

# Example using custom input SNPs
bash example_pipeline_custom.sh

General Usage

There are two ways to implement a TAD_Pathways analysis:

  1. GWAS
  2. Custom

GWAS

To perform a TAD_Pathways analysis on publicly available GWAS results, simply browse the data/gwas_catalog/ directory to select a valid GWAS file. These files contain a curation of all significant SNPs mapped to specific traits as distributed by the NHGRI-EBI GWAS Catalog.

Each file in this directory is a tab-separated text file of genome-wide significant SNPs, their genomic locations, their reported nearest genes, and the associated PubMed IDs. For complete information on how these files were constructed, refer to https://github.com/greenelab/tad_pathways.
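To inspect one of these files outside the pipeline, a few lines of Python can load it into typed records. This is a sketch under assumptions. The column names in `COLUMNS` (`snp`, `chrom`, `position`, `gene`, `pubmed`) are hypothetical placeholders, so check the header of the file you select and adjust the mapping. The names `GwasSnp` and `read_gwas_file` are also illustrative and do not come from the pipeline.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class GwasSnp(NamedTuple):
    rsid: str
    chrom: str
    position: int
    reported_gene: str
    pubmed_id: str


# Hypothetical header names: replace them with the actual column names in the file.
COLUMNS = {
    "rsid": "snp",
    "chrom": "chrom",
    "position": "position",
    "reported_gene": "gene",
    "pubmed_id": "pubmed",
}


def read_gwas_file(handle: TextIO) -> List[GwasSnp]:
    """Parse a tab-separated GWAS catalog file into typed records."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        GwasSnp(
            rsid=row[COLUMNS["rsid"]],
            chrom=row[COLUMNS["chrom"]],
            position=int(row[COLUMNS["position"]]),
            reported_gene=row[COLUMNS["reported_gene"]],
            pubmed_id=row[COLUMNS["pubmed_id"]],
        )
        for row in reader
    ]


# Toy in-memory input that uses the placeholder header and made-up values.
toy = "snp\tchrom\tposition\tgene\tpubmed\nrs12345\tchr1\t150\tGENE_A\t0\n"
print(read_gwas_file(io.StringIO(toy)))
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_gwas_file`.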

Each GWAS has three associated files: one in data/gwas_catalog/, one in data/gwas_tad_snps/, and one in data/gwas_tad_genes/. All three are required for a TAD_Pathways analysis. See the GWAS example pipelines (example_pipeline_bmd.sh and example_pipeline_t2d.sh) for how to run the necessary scripts.

Custom

To perform a TAD_Pathways analysis on a list of custom SNPs, create a comma-separated text file. The first row should contain group names, and subsequent rows should list the rs numbers of interest. The file can have any number of columns (groups), and the columns can differ in length.

For example, custom_example.csv:

Group 1,Group 2
rs12345,rs67891
rs19876,rs54321
...,...
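Because groups can have different numbers of SNPs, shorter columns leave empty cells that a reader has to skip. The pipeline itself reads this file with scripts/build_snp_list.R. The Python sketch below only illustrates the column-wise layout, and the function name `read_custom_snps` is hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_custom_snps(handle: TextIO) -> Dict[str, List[str]]:
    """Map each group name (header row) to its rs numbers (column values).

    Columns may differ in length, so empty cells are skipped.
    """
    reader = csv.reader(handle)
    groups = [name.strip() for name in next(reader)]
    snps: Dict[str, List[str]] = {group: [] for group in groups}
    for row in reader:
        # zip stops at the shorter of header and row, so ragged rows are safe.
        for group, cell in zip(groups, row):
            cell = cell.strip()
            if cell:
                snps[group].append(cell)
    return snps


example = "Group 1,Group 2\nrs12345,rs67891\nrs19876,\n"
print(read_custom_snps(io.StringIO(example)))
# {'Group 1': ['rs12345', 'rs19876'], 'Group 2': ['rs67891']}
```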

Then, perform the following steps:

conda activate tad_pathways

# Map custom SNPs to genomic locations
Rscript --vanilla scripts/build_snp_list.R \
        --snp_file "custom_example.csv" \
        --output_file "mapped_results.tsv"

# Build TAD based genelists for each group
python scripts/build_custom_TAD_genelist.py \
       --snp_data_file "mapped_results.tsv" \
       --output_file "custom_tad_genelist.tsv"

These steps produce group-specific text files listing all genes in TADs that harbor an input SNP. See example_pipeline_custom.sh for more details.
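Conceptually, this step assigns each mapped SNP to the TAD that contains it and collects every gene annotated to that TAD. The Python sketch below shows the idea using a binary search over sorted TAD start positions. It does not reproduce the logic of scripts/build_custom_TAD_genelist.py. The input dictionaries, the function name `tad_gene_lists`, the toy coordinates, and the `GENE_*` symbols are all hypothetical. The sketch treats TADs as closed, non-overlapping intervals, and their coordinate convention must match the SNP positions.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory inputs (not the pipeline's file formats):
#   tads:  chromosome -> [(start, end, tad_id), ...] sorted by start, non-overlapping
#   genes: tad_id -> set of gene symbols located in that TAD
#   snps:  group -> [(chromosome, position), ...] for each mapped input SNP
Tads = Dict[str, List[Tuple[int, int, int]]]
Genes = Dict[int, Set[str]]
Snps = Dict[str, List[Tuple[str, int]]]


def tad_gene_lists(tads: Tads, genes: Genes, snps: Snps) -> Dict[str, Set[str]]:
    """Return, per group, the union of genes in every TAD that contains an input SNP."""
    # Precompute sorted start coordinates once per chromosome for binary search.
    starts = {chrom: [start for start, _, _ in ivals] for chrom, ivals in tads.items()}
    result: Dict[str, Set[str]] = {}
    for group, positions in snps.items():
        collected: Set[str] = set()
        for chrom, pos in positions:
            ivals = tads.get(chrom, [])
            # Index of the last TAD starting at or before pos (-1 if there is none).
            i = bisect_right(starts.get(chrom, []), pos) - 1
            if i >= 0:
                start, end, tad_id = ivals[i]
                if start <= pos <= end:  # SNPs in gaps between TADs are skipped
                    collected |= genes.get(tad_id, set())
        result[group] = collected
    return result


toy_tads = {"chr1": [(100, 200, 1), (300, 400, 2)]}
toy_genes = {1: {"GENE_A", "GENE_B"}, 2: {"GENE_C"}}
toy_snps = {"Group 1": [("chr1", 150)], "Group 2": [("chr1", 350), ("chr1", 250), ("chr2", 10)]}
print(tad_gene_lists(toy_tads, toy_genes, toy_snps))
```

Binary search keeps each lookup logarithmic in the number of TADs on a chromosome, which matters when thousands of SNPs are mapped against genome-wide TAD boundaries.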

Contact

For questions and bug reports, please file a GitHub issue.

For all other questions, contact Casey Greene at [email protected] or Struan Grant at [email protected].

tad_pathways_pipeline's People

Contributors: gwaybio

tad_pathways_pipeline's Issues

checkpoint in "build_snp_list.R"

Hi gwaygenomics!
I found that the checkpoint used in "build_snp_list.R" is "2016-02-25", but in "install.R" the R libraries are installed under "2017-05-22". As a result, I ran into problems because some libraries could not be loaded. Should it be "2017-05-22" in "build_snp_list.R"?

Thanks!

Transition to Notebook Style Analyses

Now that the WebGestalt step can be automated, the pipeline will benefit from a transition to notebook-style analyses.

Currently, there are two ways to interact with a TAD_Pathways analysis:

  1. Input Trait/GWAS
  2. Custom SNP list input

For the first option, all of the analyses can be precompiled and presented as individual notebooks with analysis results and visualizations.

For the second option, the user can interact with an example notebook to input their own custom derived SNP list.

chr missing annotations

Some genes are missing chromosome annotations in the *_gene_evidence_summary.tsv file (the chromosome column reads only "chr"):

gene      evidence  TAD ID  chromosome  TAD Start   TAD End     UCSC
ZCCHC12   mut       3028    chr         117555972   118355972   chr:117555972-118355972
CD99L2    mut       3055    chr         149929342   150249342   chr:149929342-150249342
FOXO4     mut       2986    chr         70203275    70923275    chr:70203275-70923275
OGT       tad       2986    chr         70203275    70923275    chr:70203275-70923275
ZIC3      mut       3045    chr         136172334   137132334   chr:136172334-137132334
POF1B     mut       2997    chr         84353344    84833344    chr:84353344-84833344
ZXDA      mut       2975    chr         56783275    58582012    chr:56783275-58582012
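Validating the chromosome column before downstream use is one quick way to catch rows like these. The sketch below makes several assumptions. It expects the summary file to be tab-separated with the header shown in the report (`gene`, `chromosome`, ...). It treats complete labels as the hg19-style `chr1` to `chr22`, `chrX`, `chrY` and `chrM`. The function name `genes_missing_chrom` is hypothetical, and the `GENE_A` row in the toy input is a made-up placeholder.

```python
import csv
import io
import re
from typing import List, TextIO

# Complete hg19-style chromosome labels: chr1-chr22, chrX, chrY, chrM.
CHROM_RE = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def genes_missing_chrom(handle: TextIO) -> List[str]:
    """Return gene symbols whose 'chromosome' value is not a complete label."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["gene"] for row in reader if not CHROM_RE.fullmatch(row["chromosome"])]


toy = (
    "gene\tevidence\tTAD ID\tchromosome\tTAD Start\tTAD End\tUCSC\n"
    "ZCCHC12\tmut\t3028\tchr\t117555972\t118355972\tchr:117555972-118355972\n"
    "GENE_A\ttad\t1\tchr1\t100\t200\tchr1:100-200\n"
)
print(genes_missing_chrom(io.StringIO(toy)))  # ['ZCCHC12']
```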
