tbrunetti / gp3

GWAS Pre-Processing Pipeline

License: MIT License

R 39.70% Python 59.80% Shell 0.49%
gwas-pipeline gwas quality-metrics quality-assurance quality-control hardy-weinberg-equilibrium heterozygosity linkage-disequilibrium pca 1000genomes

gp3's Introduction

GP3

GWAS Pre-Processing Pipeline

Overview and Purpose


An automated pipeline that pre-processes GWAS data to determine which samples to remove before the data are passed to imputation and association-testing pipelines. It should be run after an initial round of QC/filtering has been performed (i.e. after removing SNPs and samples that fail due to poor SNP/sample quality from the idats).

Software Requirements


The following are the minimum software requirements:

--R libraries that need to be installed manually--
The following R libraries, including their dependencies, must be installed and functional:

--Software Requirements that can be installed automatically--
The following Python libraries are required, but the pipeline can install them automatically if pip is available:

User Generated File Requirements


There are two files that are minimally required in order to run the pipeline:

  • Input PLINK file either in .bed or .ped format
  • Populated sample_sheet_template.xlsx
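A quick pre-flight check can catch missing inputs before a run. The sketch below is not part of the pipeline: `missing_inputs` is a hypothetical helper, and it assumes the standard PLINK convention that a .bed file is accompanied by .bim and .fam files with the same prefix, and a .ped file by a .map file. Whether the pipeline performs a similar check itself is not stated here.

```python
from pathlib import Path

# Companion files that PLINK expects next to each primary file
# (standard PLINK fileset conventions, not something specific to GP3).
PLINK_COMPANIONS = {
    ".bed": (".bim", ".fam"),  # binary fileset
    ".ped": (".map",),         # text fileset
}


def missing_inputs(plink_file, sample_sheet):
    """Return the required input paths that do not exist (hypothetical helper)."""
    plink_path = Path(plink_file)
    suffix = plink_path.suffix.lower()
    if suffix not in PLINK_COMPANIONS:
        raise ValueError(f"expected a .bed or .ped file, got: {plink_path.name}")

    # The PLINK file itself, the populated sample sheet, then the companions.
    required = [plink_path, Path(sample_sheet)]
    required += [plink_path.with_suffix(ext) for ext in PLINK_COMPANIONS[suffix]]
    return [str(path) for path in required if not path.exists()]
```

An empty list means every expected file was found; otherwise the returned paths show what still needs to be put in place.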

Installation of virtual environment, chunkypipes, and pipeline


Please click here for detailed instructions on setting up a virtual environment on shared systems or installing on systems with root privileges.

ALREADY INSTALLED CHUNKYPIPES AND PIPELINE? Click here for quick start.

Output Files


If you navigate to your output directory, you should see a new directory matching the project name you specified at the time of the run. Inside it are new directories named after the ethnic groups you specified in your sample_sheet.xlsx, as well as a set of PDFs. These are the PDFs promised in the diagram above. If you navigate into one of the ethnic group directories, you will find the PLINK files that were generated at each step of the pipeline.

Additionally, these are the notable final files of interest if the --TGP flag is specified:

  1. <ethnic group name>_all_samples_to_remove_from_original.txt
  2. <ethnic group name>_all_steps_completed_TGP_final with each of the following suffixes: .bed, .bim, .fam, .kin, .kin0, .gds
  3. <ethnic group name>_all_steps_completed_TGP_final_GENESIS.Rdata
  4. <ethnic group name>_all_steps_completed_TGP_final_phenoGENESIS.txt
  5. <ethnic group name>_TGP_PCA_plots.pdf

If the --TGP flag is not specified, these are the final output file names:

  1. <ethnic group name>_all_samples_to_remove_from_original.txt
  2. <ethnic group name>_all_steps_completed_final with each of the following suffixes: .bed, .bim, .fam, .kin, .kin0, .gds
  3. <ethnic group name>_all_steps_completed_TGP_final_GENESIS.Rdata
  4. <ethnic group name>_all_steps_completed_TGP_final_phenoGENESIS.txt
  5. <ethnic group name>_all_steps_completed_final_GENESIS_sample_key_file.txt
  6. <ethnic group name>_individual_PCA_plots.pdf
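The naming scheme in the two lists above can be sketched as a small function, which may help when scripting checks on a finished run. This is only an illustration: `expected_outputs` is a hypothetical helper, not part of the pipeline, and it simply reproduces the file names exactly as they are listed above for a given ethnic group name.

```python
# Suffixes attached to the final PLINK/GDS file stem, as listed above.
SUFFIXES = (".bed", ".bim", ".fam", ".kin", ".kin0", ".gds")


def expected_outputs(group, tgp):
    """Return the final file names listed above for one ethnic group.

    `group` is the ethnic group name from the sample sheet and `tgp` says
    whether the run used the --TGP flag. Hypothetical helper for illustration.
    """
    if tgp:
        stem = f"{group}_all_steps_completed_TGP_final"
    else:
        stem = f"{group}_all_steps_completed_final"

    names = [f"{group}_all_samples_to_remove_from_original.txt"]
    names += [stem + suffix for suffix in SUFFIXES]

    # These two names contain "TGP_final" in both lists above, so they are
    # copied verbatim rather than derived from `stem`.
    names += [
        f"{group}_all_steps_completed_TGP_final_GENESIS.Rdata",
        f"{group}_all_steps_completed_TGP_final_phenoGENESIS.txt",
    ]

    if tgp:
        names.append(f"{group}_TGP_PCA_plots.pdf")
    else:
        names.append(f"{group}_all_steps_completed_final_GENESIS_sample_key_file.txt")
        names.append(f"{group}_individual_PCA_plots.pdf")
    return names
```

Comparing this list with the contents of an ethnic group's output directory is one way to confirm that a run reached the final step.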

Questions?


For more information, please visit the Wiki or contact me (tbrunetti); I would be happy to address any issues.
