msesia / knockoffgwas

A flexible tool for the multi-resolution localization of causal variants across the genome, accounting for population structure.

Home Page: https://msesia.github.io/knockoffgwas

KnockoffGWAS

A powerful and versatile statistical method for the analysis of genome-wide association data with population structure. This method localizes causal variants while controlling the false discovery rate, and is valid even if the samples have diverse ancestries and familial relatedness.

Accompanying paper:

False discovery rate control in genome-wide association studies with population structure
M. Sesia, S. Bates, E. Candès, J. Marchini, C. Sabatti
Proceedings of the National Academy of Sciences (2021) https://www.pnas.org/content/118/40/e2105841118

For more information, visit: https://msesia.github.io/knockoffgwas.

For an earlier version of this method restricted to homogeneous populations, see also KnockoffZoom.

Overview

The goal of KnockoffGWAS is to identify causal variants for complex traits effectively and precisely through genome-wide fine-mapping, accounting for linkage disequilibrium and controlling the false discovery rate. The results leverage the genetic models used for phasing and are equally valid for quantitative and binary traits. The main innovation of KnockoffGWAS is its support for the analysis of diverse populations, with different ancestries and possibly close familial relatedness. Furthermore, KnockoffGWAS includes a highly efficient standalone C++ program for generating genetic knockoffs for large data sets, which makes it easier to apply than KnockoffZoom.

The code contained in this repository is designed to allow the application of KnockoffGWAS to large datasets, such as the UK Biobank. Some of the code is provided in the form of Bash and R scripts, while the core algorithms for Monte Carlo knockoff sampling are implemented in C++.

The KnockoffGWAS methodology is divided into different modules, each corresponding to a separate Bash script contained in the directory knockoffgwas/.

Dependencies

Recommended OS: Linux. macOS is not officially supported but should be compatible.

The following software should be available from your user path:

The following R (version 4.0.2) packages are required:

The above version numbers correspond to the configuration on which this software was tested. Newer versions are likely to be compatible, but have not been tested.
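Whether the required programs are reachable from the user path can be checked before running anything. The following is a minimal sketch, not part of the repository: check_tools is a hypothetical helper, and the tool names are passed in by the caller rather than taken from an official list.

```shell
# Hypothetical helper (not shipped with KnockoffGWAS): report which of the
# named programs cannot be found on the user's PATH.
check_tools() {
  status=0
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Example call. Only make and Rscript are named here because this README
# mentions make and R; add the other dependencies to the argument list.
# check_tools make Rscript || echo "install the missing tools first"
```

The function prints one line per missing program and returns a non-zero status if any is missing, so it can gate a longer script.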

Installation

Clone this repository on your system and install any missing dependencies. Estimated installation time (dependencies): 5-15 minutes. Compile the C++ program for knockoff generation by entering the directory snpknock2 and running make.
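The build step can be wrapped in a small function. This is a sketch under two assumptions: that snpknock2 sits at the top level of the cloned repository, and that the clone URL matches the repository name shown on this page. build_snpknock2 and the MAKE override are illustrative names, not part of the project.

```shell
# Hypothetical wrapper (not shipped with KnockoffGWAS): compile the knockoff
# generator inside an already cloned copy of the repository.
# Usage: build_snpknock2 /path/to/knockoffgwas
build_snpknock2() {
  repo_dir=$1
  # Assumption: snpknock2 is a top-level directory of the repository.
  if [ ! -d "$repo_dir/snpknock2" ]; then
    echo "snpknock2 directory not found in $repo_dir" >&2
    return 1
  fi
  # MAKE may be overridden (for example MAKE=gmake); it defaults to make.
  "${MAKE:-make}" -C "$repo_dir/snpknock2"
}

# Example (the URL is assumed from the repository name msesia/knockoffgwas):
# git clone https://github.com/msesia/knockoffgwas.git
# build_snpknock2 knockoffgwas
```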

Toy dataset and tutorial

A toy dataset containing 1000 artificial samples typed at 2000 loci (divided between chromosomes 21 and 22) is offered as an example to test KnockoffGWAS. To run the example, simply execute the script analyze.sh.

./analyze.sh

This script will also verify whether required R packages are available and install them otherwise.

The analysis should take less than 5 minutes on a personal computer. The results can be visualized interactively with the script visualize.sh, which will launch a Shiny app in your browser. Some additional R packages are required by the visualization tool, and will be automatically installed if not found.

./visualize.sh

The expected results for the analysis of this toy dataset are provided in the directory results/ and can be visualized by running the script visualize.sh before running analyze.sh. Note that the script analyze.sh will overwrite the default results.
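Since analyze.sh overwrites the shipped results, it may be worth copying them aside first. A minimal sketch, assuming the results live in results/ relative to the current directory; backup_results and the results_expected destination are hypothetical names chosen for illustration.

```shell
# Hypothetical helper (not shipped with KnockoffGWAS): copy the results
# directory aside so the expected output survives a run of analyze.sh.
# Usage: backup_results [results_dir] [backup_dir]
backup_results() {
  src=${1:-results}
  dest=${2:-results_expected}
  if [ ! -d "$src" ]; then
    echo "no directory named $src" >&2
    return 1
  fi
  # Refuse to clobber an earlier backup.
  if [ -e "$dest" ]; then
    echo "$dest already exists; not overwriting" >&2
    return 1
  fi
  cp -R "$src" "$dest"
}

# Example:
# backup_results && ./analyze.sh
```

The copy can later be compared with the freshly generated results, for instance with diff -r.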

See https://msesia.github.io/knockoffgwas/tutorial.html for a more detailed tutorial.

Large-scale applications

KnockoffGWAS is computationally efficient and we have successfully applied it to the analysis of the genetic data in the UK Biobank. For more information, visit https://msesia.github.io/knockoffgwas/ukbiobank.html. The analysis of large datasets cannot be carried out on a personal computer. The computational resources required for the analysis of the UK Biobank data are summarized in the accompanying paper.

The modular nature of our method allows the code contained in each of the 4 main scripts to be easily deployed on a computing cluster for large-scale applications. This task will require some additional user effort compared to the toy example, but the scripts for each module are documented and quite intuitive.

Authors

Contributors

License

This software is distributed under the GPLv3 license.

Further references

Read more about:

knockoffgwas's Issues

Problem Compiling Program

Hi,

I'm trying to compile your program, but I'm running into some problems. When I run make, I get the following error:
In file included from src/utils.cpp:4:0:
src/utils.h:27:42: fatal error: boost/integer/integer_log2.hpp: No such file or directory
#include <boost/integer/integer_log2.hpp>
^
compilation terminated.
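
The message says the compiler could not find the Boost header included by src/utils.h, which usually means the Boost development headers are missing or installed outside the default include path. A minimal sketch for checking this follows; find_boost_header is a hypothetical helper, and the two default directories are assumptions about a typical Linux layout.

```shell
# Hypothetical helper (not part of the repository): look for the header
# named in the error message under a list of include directories.
# Usage: find_boost_header [include_dir ...]
find_boost_header() {
  header=boost/integer/integer_log2.hpp
  # Assumed defaults for a typical Linux system when no directory is given.
  [ "$#" -gt 0 ] || set -- /usr/include /usr/local/include
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      echo "$dir/$header"
      return 0
    fi
  done
  echo "$header not found in: $*" >&2
  return 1
}

# Example:
# find_boost_header || echo "install the Boost development headers"
```

If the header is not found, installing Boost through the system package manager (for example the libboost-dev package on Debian or Ubuntu, assuming such a system) is the usual remedy.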

Description of input data files and output results

Hi professor,

I have a whole-genome dataset and would like to try out knockoffgwas. Is there a description of the input and output files of the package? For example, the meaning of each column and which publicly available resources were used. Thank you!

James

How many IBD segments is too many?

Hello,

I simulated 2k samples and 50k SNPs, for which RaPID returned around 6000 IBD segments.
On this data, snpknock2 seems basically stuck at Generating related knockoffs (waited ~30 min and the progress bar did not move at all). After decreasing IBD segments to 20, the knockoffs were generated in ~20 min or so.

  • I wonder how many IBD segments is too many? According to the RaPID paper, millions of IBD segments seem to be common for UKB data, so 6k shouldn't be that many? Should I have just waited longer?
  • Also, I wonder whether multithreading (via the n_threads option) works beyond importing BGEN data?
