GithubHelp home page GithubHelp logo

phelex's Introduction

PheLEx : (Phenotype Latent variable Extraction of disease misdiagnosis)

An R package for extracting differentially misclassified samples from GWAS phenotypes to improve statistical power of association analysis and identify novel disease-associated loci.

Useful applications include investigating GWAS phenotypes with lower statistical power or GWAS studies that fail to produce any results using traditional methodologies (i.e. Linear Mixed Model).

For more details, please refer to our paper Identifying misclassified samples in GWAS phenotypes using PheLEx (in preparation)

This repository includes four methods to extract misclassified samples from GWAS phenotypes and demo dataset to present a demonstration on how the method works. For this tutorial we only include documentation for PheLEx, the recommended method for extracting misclassifications.

Installation

You may install phelex using one of the following options. Pre-requisites for phelex are: R packages modeest, truncdist, MASS and stats. Please ensure these are installed before installing phelex.

  1. Run devtools function install_github("phelex","afrahshafquat") in R/RStudio Console.
  2. Download the git repository using git clone http://github.com/afrahshafquat/phelex and then in R/RStudio console install phelex using R devtools function install("./phelex") or install.packages("./phelex", dependencies=TRUE).

PheLEx Model and Pipeline

model

Basic Usage

Step 1: Perform GWAS using your favourite program (e.g. PLINK, GEMMA, lrgpr).

Step 2: If GWAS produces statistically significant SNPs (according to Bonferroni-corrected p-value threshold), only provide those as input to PheLEx. Alternatively, you may use a reasonable p-value threshold or other information statistics to filter SNPs.

Step 3: You may use PhenotypeSimulator or GEMMA to produce a relatedness/kinship matrix. Other software that do the same should be fine as well. Please ensure that the matrix is positive-definite

Step 4: Use PheLEx to extract misclassified samples. We provide the code below as an example to extract misdiagnosis in cases from GWAS phenotypes. (Please note that there are several parameters in the method that can be customized according to the kind of analysis being pursued. The following is only to be considered as one of many different possibilties)

library(phelex)

# Load dataset
x = read.matrix('genotypes.txt')  # Genotypes should be in 0,1,2 format AND **filtered**
y = read.table('phenotype.txt')  # Phenotypes vector
A = as.matrix(read.table('kinship_matrix.txt'))  # Kinship/Genetic relatedness matrix


# Extract misclassified samples
phelex.results = phelex(x = x,
                        y = y,
                        A = A,
                        alpha.prior = c(10, 1), # Beta prior for true-positive rate
                        iterations = 1e5)  # Total number of iterations for method to run
                        
# Misclassification probabilities estimated in cases                      
misclassification.pr.cases = estimate_misclassification_probability(misclassified.samples = phelex.results$misclassified.cases)  

# Corrected phenotype computed. With this command, only a fraction of cases will be switched to controls.
corrected_phenotype = get_phenotype(misclassified.p.case = misclassification.pr.cases, y=y)  

Step 5: Perform GWAS again using corrected phenotype.

Optional parameters

  • alpha.prior and lambda.prior: Beta Priors for true positive and false-positive rates respectively. Examples:

    • alpha.prior or lambda.prior = c(1,1) would give a more uniform Beta distribution with all misclassification rates in cases or controls being equally likely sampled.
    • alpha.prior= c(10, 1) recommends that most true cases stay cases and are not switched to controls. This would extract overdiagnosis in cases, rather than discern misclassification in controls.
    • lambda.prior= c(1, 4) recommends that true controls may be observed as cases with rates usually between (0-0.25).
  • beta.prior.params: Prior on effect sizes of SNPs. The prior is normally-distributed by default (alt. uniform prior may be specified). You may want to change these parameters depending on prior knowledge about effect sizes.

For documentation on other parameters, please refer to documentation provided.

Common issues

  • Relatedness matrix: this method doesn't support non-postive-definite relatedness matrices, so please make sure that the matrix used is positive-definite.

Alternative methods

PheLEx encloses three methods other than PheLEx:

  1. phelex_mm: PheLEx without mixed model
  2. rekaya: Rekaya's method
  3. phelex_mh: PheLEx with Gibbs sampling instead of Adaptive Metropolis-Hastings within Gibbs

Please refer to the manuscript and/or documentation for details on each method.

phelex's People

Contributors

afrahshafquat avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar

phelex's Issues

Error in x %*% betas

Thanks for creating this library I think it will super useful for me! I am trying to use the library but I'm running to an error:

> phelex.results = phelex(x = x,
+                         y = y,
+                         A = A,
+                         alpha.prior = c(10, 1), # Beta prior for true-positive rate
+                         iterations = 1e5)
Error in x %*% betas : requires numeric/complex matrix/vector arguments
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Any ideas on where it could have gone wrong?

Thanks,
Jody

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.