fboehm / qtl2pleio


Testing pleiotropy vs. separate QTL in multiparental populations

License: Other

R 70.28% C++ 23.31% C 2.66% TeX 3.21% Dockerfile 0.55%

quantitative-trait quantitative-genetics multiparental-populations

qtl2pleio's Introduction

qtl2pleio

Overview

qtl2pleio is a software package for use with the R statistical computing environment. qtl2pleio is freely available for download and use. I share it under the MIT license. The user will also want to download and install the qtl2 R package.

Click here to explore qtl2pleio within a live Rstudio session in “the cloud”.

Contributor guidelines

We eagerly welcome contributions to qtl2pleio. All pull requests will be considered. Feature requests and bug reports may be filed as GitHub issues. All contributors must abide by the code of conduct.

Technical support

For technical support, please open a GitHub issue. If you’re just getting started with qtl2pleio, please examine the vignettes on the package’s website. You can also email [email protected] for assistance.

Goals

The goal of qtl2pleio is, for a pair of traits that show evidence for a QTL in a common region, to distinguish between pleiotropy (the null hypothesis, that they are affected by a common QTL) and the alternative that they are affected by separate QTL. It extends the likelihood ratio test of Jiang and Zeng (1995) to multiparental populations, such as Diversity Outbred mice, and uses multivariate polygenic random effects to account for population structure. qtl2pleio data structures are those used in the rqtl/qtl2 package.

Installation

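To install the released version of qtl2pleio from CRAN, use install.packages(). (To install the development version from GitHub instead, use install_github() from the devtools package: devtools::install_github("fboehm/qtl2pleio").)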

install.packages("qtl2pleio")

You will also need the R/qtl2 package, which we use below.

install.packages("qtl2")

Example

Below, we walk through an example analysis with Diversity Outbred mouse data. We need a number of preliminary steps before we can perform our test of pleiotropy vs. separate QTL. Many procedures rely on the R package qtl2. We first load the qtl2 and qtl2pleio packages, along with ggplot2 for plotting.

library(qtl2)
library(qtl2pleio)
library(ggplot2)

Reading data from `qtl2data` repository on GitHub

We’ll consider the DOex data in the qtl2data repository. We’ll download the DOex.zip file before calculating founder allele dosages.

file <- paste0("https://raw.githubusercontent.com/rqtl/",
               "qtl2data/master/DOex/DOex.zip")
DOex <- read_cross2(file)

probs <- calc_genoprob(DOex)

Let’s calculate the founder allele dosages from the 36-state genotype probabilities.

pr <- genoprob_to_alleleprob(probs)

We now have an allele probabilities object stored in pr.

names(pr)
#> [1] "2" "3" "X"
dim(pr$`2`)
#> [1] 261   8 127

We see that pr is a list of 3 three-dimensional arrays - one array for each of 3 chromosomes.
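
To make the phrase "founder allele dosages" concrete, here is a toy sketch in base R. With 8 founders there are 36 genotypes (8 homozygous plus 28 heterozygous), and each one contributes its probability to the alleles it carries. The example below shrinks this to two hypothetical founders, A and B; the object names are made up for illustration, and it is only meant to convey the idea, not to reproduce the code inside genoprob_to_alleleprob.

```r
# Toy example: two founders (A, B), so three genotypes AA, AB, BB.
# Rows are individuals; columns are genotype probabilities at one marker.
geno_prob <- matrix(c(0.90, 0.10, 0.00,
                      0.20, 0.60, 0.20,
                      0.00, 0.00, 1.00),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("mouse", 1:3), c("AA", "AB", "BB")))

# Number of copies of each allele carried by each genotype.
copies <- matrix(c(2, 0,   # AA carries two copies of A
                   1, 1,   # AB carries one copy of each allele
                   0, 2),  # BB carries two copies of B
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("AA", "AB", "BB"), c("A", "B")))

# Allele probability = expected number of copies, divided by 2.
allele_prob <- geno_prob %*% copies / 2
allele_prob
```

Each row of allele_prob sums to 1, just as each individual's 8 founder allele probabilities do at a given marker in pr.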

Kinship calculations

For our statistical model, we need a kinship matrix. We get one with the calc_kinship function in the rqtl/qtl2 package.

kinship <- calc_kinship(probs = pr, type = "loco")
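
The "loco" (leave one chromosome out) idea can be sketched with toy data in base R. In the sketch below, similarity between two individuals is taken to be the average, over markers, of the summed products of their allele probabilities, and the kinship matrix for a chromosome uses only markers on the other chromosomes. All names here (make_chr, toy_pr, kinship_sum, loco_kinship) are hypothetical, and the sketch makes no claim about how calc_kinship is implemented or scaled.

```r
set.seed(1)
n_ind <- 4
n_allele <- 2

# Hypothetical allele probabilities: one (individuals x alleles x markers)
# array per chromosome, mimicking the layout of pr.
make_chr <- function(n_marker) {
  p <- runif(n_ind * n_marker)   # probability of allele A
  arr <- array(NA_real_, dim = c(n_ind, n_allele, n_marker))
  arr[, 1, ] <- p
  arr[, 2, ] <- 1 - p
  arr
}
toy_pr <- list(`1` = make_chr(5), `2` = make_chr(3), `3` = make_chr(4))

# Similarity summed over markers: entry (i, j) adds sum_a p_ia * p_ja per marker.
kinship_sum <- function(arr) {
  k <- matrix(0, dim(arr)[1], dim(arr)[1])
  for (m in seq_len(dim(arr)[3])) k <- k + tcrossprod(arr[, , m])
  k
}

# Leave one chromosome out: average over markers on all *other* chromosomes.
loco_kinship <- lapply(names(toy_pr), function(chr) {
  others <- toy_pr[names(toy_pr) != chr]
  n_marker <- sum(vapply(others, function(a) dim(a)[3], integer(1)))
  Reduce(`+`, lapply(others, kinship_sum)) / n_marker
})
names(loco_kinship) <- names(toy_pr)
```

The result is a list with one matrix per chromosome, which is why the example later passes kinship[[2]] when analyzing the second chromosome in the list.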

Statistical model specification

We use the multivariate linear mixed effects model:

vec(Y) = X vec(B) + vec(G) + vec(E)

where Y contains phenotypes (one column per trait), X contains founder allele probabilities and covariates, and B contains founder allele effects. G contains the polygenic random effects, while E contains the random errors. We provide more details in the vignette.
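
For readers who want the distributional assumptions spelled out, the model can be written as below. This is a sketch based on my reading of the cited paper, not text from the README: the symbols V_g and V_e (2 by 2 genetic and error covariance matrices), K (the kinship matrix), and I_n (the n by n identity, for n subjects) are notation introduced here, and G and E are assumed independent.

```latex
\operatorname{vec}(Y) = X \operatorname{vec}(B) + \operatorname{vec}(G) + \operatorname{vec}(E),
\qquad
\operatorname{vec}(G) \sim N(0,\; V_g \otimes K),
\qquad
\operatorname{vec}(E) \sim N(0,\; V_e \otimes I_n)
```

With two traits, n subjects, and 8 founders, vec(Y) has length 2n, X is a 2n by 16 block-diagonal matrix, and vec(B) has length 16.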

Simulating phenotypes with `qtl2pleio::sim1`

The function to simulate phenotypes in qtl2pleio is sim1.

# set up the design matrix, X
pp <- pr[[2]] #we'll work with Chr 3's genotype data

#Next, we prepare a design matrix X
X <- gemma2::stagger_mats(pp[ , , 50], pp[ , , 50])

# assemble B matrix of allele effects
B <- matrix(data = c(-1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1), nrow = 8, ncol = 2, byrow = FALSE)
# set the seed to ensure reproducibility
# (note: R evaluates 2018-01-30 as arithmetic, 2018 - 1 - 30, so the seed is 1987)
set.seed(2018-01-30)
Sig <- calc_Sigma(Vg = diag(2), Ve = diag(2), kinship = kinship[[2]])
# call to sim1
Ypre <- sim1(X = X, B = B, Sigma = Sig)
Y <- matrix(Ypre, nrow = 261, ncol = 2, byrow = FALSE)
rownames(Y) <- rownames(pp)
colnames(Y) <- c("tr1", "tr2")
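
The simulation steps above can be mimicked on a small scale with base R alone. The sketch below builds a block-diagonal design matrix by hand, forms the covariance Vg ⊗ K + Ve ⊗ I, and draws one multivariate normal vector through a Cholesky factor. It assumes that this is what stagger_mats, calc_Sigma, and sim1 do in spirit; the object names and the tiny dimensions are hypothetical, and it is not a substitute for the package functions.

```r
set.seed(1)
n <- 5            # subjects (261 in the example above)
n_founder <- 2    # founder alleles (8 in the example above)

# Hypothetical allele probabilities at the simulated QTL.
p <- runif(n)
probs_qtl <- cbind(A = p, B = 1 - p)

# Block-diagonal design matrix: one block per trait.
zero <- matrix(0, n, n_founder)
X_toy <- rbind(cbind(probs_qtl, zero),
               cbind(zero, probs_qtl))

# Allele effects, one column per trait.
B_toy <- matrix(c(-1, 1, -1, 1), nrow = n_founder, ncol = 2)

# Covariance of vec(G) + vec(E): Vg (x) K + Ve (x) I_n, with Vg = Ve = I_2.
K_toy <- diag(n)  # toy kinship: unrelated subjects
Sigma_toy <- diag(2) %x% K_toy + diag(2) %x% diag(n)

# Draw vec(Y) ~ N(X vec(B), Sigma) using the Cholesky factor of Sigma.
vec_Y <- X_toy %*% as.vector(B_toy) + t(chol(Sigma_toy)) %*% rnorm(2 * n)
Y_toy <- matrix(vec_Y, nrow = n, ncol = 2,
                dimnames = list(NULL, c("tr1", "tr2")))
```

Reshaping the length-2n vector column-wise, as in the last line, puts trait 1 in the first column and trait 2 in the second, matching the matrix(Ypre, ...) call above.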

Let’s perform univariate QTL mapping for each of the two traits in the Y matrix.

s1 <- scan1(genoprobs = pr, pheno = Y, kinship = kinship)

Here is a plot of the results.

And here are the observed QTL peaks with LOD > 8.

find_peaks(s1, map = DOex$pmap, threshold=8)
#>   lodindex lodcolumn chr      pos       lod
#> 1        1       tr1   3 82.77806 20.703383
#> 2        2       tr2   3 82.77806 14.417924
#> 3        2       tr2   X 48.10040  8.231551

Perform two-dimensional scan as first step in pleiotropy vs. separate QTL hypothesis test

We now have the inputs that we need to do a pleiotropy vs. separate QTL test. We have the founder allele dosages for one chromosome, i.e., Chr 3, in the R object pp, the matrix of two trait measurements in Y, and a LOCO-derived kinship matrix, kinship[[2]].

out <- suppressMessages(scan_pvl(probs = pp,
                pheno = Y,
                kinship = kinship[[2]], # 2nd entry in kinship list is Chr 3
                start_snp = 38,
                n_snp = 25
                ))
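
A two-dimensional scan considers every ordered pair of markers in the scan window: one marker for the first trait and one for the second. The base R sketch below enumerates such a grid for four hypothetical markers; it illustrates the bookkeeping only, and does not fit any models.

```r
markers <- paste0("m", 1:4)   # hypothetical marker names in the scan window

# Every ordered pair (marker for trait 1, marker for trait 2).
grid <- expand.grid(marker1 = markers, marker2 = markers,
                    stringsAsFactors = FALSE)

# Pairs on the diagonal correspond to pleiotropy (one shared QTL position).
grid$pleiotropy <- grid$marker1 == grid$marker2

nrow(grid)             # number of marker pairs: 4^2
sum(grid$pleiotropy)   # number of pairs on the diagonal
```

With n_snp = 25, as in the call above, the same grid would have 25^2 = 625 marker pairs, 25 of them on the diagonal.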

Create a profile LOD plot to visualize results of two-dimensional scan

To visualize results from our two-dimensional scan, we calculate profile LOD for each trait. The code below uses the pipe operator (%>%), made available here by loading dplyr, and the R package ggplot2 to plot profile LODs over the scan region.

library(dplyr)
out %>%
  calc_profile_lods() %>%
  add_pmap(pmap = DOex$pmap$`3`) %>%
  ggplot() + geom_line(aes(x = marker_position, y = profile_lod, colour = trait))
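
To show what a profile LOD is, here is a base R sketch that works on a small matrix of hypothetical log10 likelihoods from a two-dimensional scan. It follows the definition as I understand it from the cited paper: for each position of one trait's QTL, maximize over the other trait's position, then subtract the maximum log10 likelihood under pleiotropy. The numbers and object names are made up, and the sketch does not reproduce the internals of calc_profile_lods.

```r
# Hypothetical log10 likelihoods: rows index the marker for trait 1,
# columns index the marker for trait 2.
ll <- matrix(c(-10.0, -9.5, -9.8,
                -9.2, -9.0, -9.4,
                -9.9, -9.3, -9.6),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("m", 1:3), paste0("m", 1:3)))

# Best fit under pleiotropy: both traits share one marker (the diagonal).
ll_pleio_max <- max(diag(ll))

profile_tr1 <- apply(ll, 1, max) - ll_pleio_max  # maximize over trait 2's marker
profile_tr2 <- apply(ll, 2, max) - ll_pleio_max  # maximize over trait 1's marker
pleio_trace <- diag(ll) - ll_pleio_max           # both traits at the same marker
```

Because the reference point is the best pleiotropy fit, values can be negative, and the pleiotropy trace peaks at exactly zero.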

Calculate the likelihood ratio test statistic for pleiotropy vs. separate QTL

We use the function calc_lrt_tib to calculate the likelihood ratio test statistic value for the specified traits and specified genomic region.

(lrt <- calc_lrt_tib(out))
#> [1] 0
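
The statistic compares the best fit anywhere on the two-dimensional grid with the best fit on its diagonal. The base R sketch below computes that difference for a hypothetical matrix of log10 likelihoods. The values and names are made up, and I am assuming the log10 scale here rather than quoting the calc_lrt_tib source.

```r
# Hypothetical log10 likelihoods: rows index the marker for trait 1,
# columns index the marker for trait 2.
ll_toy <- matrix(c(-10.0, -9.5, -9.8,
                    -9.2, -9.0, -8.7,
                    -9.9, -9.3, -9.6),
                 nrow = 3, byrow = TRUE)

# Best fit with separate QTL allowed, minus best fit under pleiotropy.
lrt_toy <- max(ll_toy) - max(diag(ll_toy))
lrt_toy
```

The statistic is zero whenever the overall maximum falls on the diagonal, which is the situation in the output above: the phenotypes were simulated with both traits sharing one QTL.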

Bootstrap analysis to get p-values

Before we call boot_pvl, we need to identify the index (on the chromosome under study) of the marker that maximizes the likelihood under the pleiotropy constraint. To do this, we use the qtl2pleio function find_pleio_peak_tib.

(pleio_index <- find_pleio_peak_tib(out, start_snp = 38))
#> log10lik13 
#>         50

set.seed(2018-11-25) # set for reproducibility (evaluated as 2018 - 11 - 25, i.e., seed 1982)
b_out <- suppressMessages(boot_pvl(probs = pp,
         pheno = Y,
         pleio_peak_index = pleio_index,
         kinship = kinship[[2]], # 2nd element in kinship list is Chr 3
         nboot = 10,
         start_snp = 38,
         n_snp = 25
         ))

(pvalue <- mean(b_out >= lrt))
#> [1] 1
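
The p-value is simply the proportion of bootstrap test statistics that are at least as large as the observed one. A base R sketch with made-up numbers (boot_stats and observed are hypothetical, not output from boot_pvl):

```r
# Hypothetical bootstrap test statistics and observed statistic.
boot_stats <- c(0.0, 0.4, 1.2, 0.1, 2.5, 0.0, 0.7, 0.3, 0.0, 1.9)
observed <- 1.0

# Proportion of bootstrap statistics at least as large as the observed one.
p_toy <- mean(boot_stats >= observed)
p_toy
```

When the observed statistic is 0, as above, every bootstrap statistic satisfies the comparison, so the p-value is 1. Note also that nboot = 10 is far too few bootstrap samples for a real analysis; it keeps the example fast.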

Citation information

citation("qtl2pleio")
#> 
#> To cite qtl2pleio in publications use:
#> 
#>   Boehm FJ, Chesler EJ, Yandell BS, Broman KW (2019) Testing pleiotropy
#>   vs. separate QTL in multiparental populations G3
#>   https://www.g3journal.org/content/9/7/2317
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{Boehm2019testing,
#>     title = {Testing pleiotropy vs. separate QTL in multiparental populations},
#>     author = {Frederick J. Boehm and Elissa J. Chesler and Brian S. Yandell and Karl W. Broman},
#>     journal = {G3},
#>     year = {2019},
#>     volume = {9},
#>     issue = {7},
#>     url = {https://www.g3journal.org/content/9/7/2317},
#>     eprint = {https://www.g3journal.org/content/ggg/9/7/2317.full.pdf},
#>   }


qtl2pleio's Issues

Enable use of multiple cores

I need to look at R package "parallel" and Karl Broman's rqtl/qtl2 package for examples of using parallel

Use base 10 log instead of natural base

scan_pvl uses natural base logarithms. Also, need to revise column name in output of calc_profile_lods to clarify that logarithm base should be 10.

Issue initially raised by @HongHe0123 in a separate discussion

fix citations in Recla vignette

First, add rqtl2 2019 paper. Also, add a bib file and use pandoc citations! Add a "References" section at the end.

tests fail in v 0.1.1

Due to restructuring of scan_pvl output, I need to revise tests for scan_pvl. As of v 0.1.1, the output of scan_pvl is now a data frame with (n_snp)^d rows, where the last column is log likelihood values.

Reword README and vignette sections

The README.md needs more detailed, accurate explanations. Think about explaining each step to a novice R user. Also consider showing the results of calculations, rather than assuming that the user knows to examine them. For some large results, kinship matrices, etc, show dim or length only?

Check computing time for boot_pvl relative to scan_pvl

We anticipate that boot_pvl, when called with only one bootstrap sample, would take about as long as scan_pvl (really, slightly longer).

How to extract from addcovar input a matrix with linearly independent columns?

I need to look to see how others have addressed this... maybe even an R package has a function to handle this already.

The problem is that we need to think about whether, after subsetting to common subjects and removing subjects with missing phenotypes or covariates, the resulting covariates matrix's columns are linearly independent. If the matrix has full rank, then we're ok... but if the rank is less than the number of columns in the covariates matrix, how to proceed? For example, we can't merely drop an arbitrary column, since a column could be all zeros. But if we rule out that case, then maybe we can just drop one column when the rank is only one less than the number of columns. I need to review my matrix algebra: if 4 vectors form a matrix that has rank 3, is it true that any 3 of them (again, assuming none is the zero vector) together have rank 3? No! We also need to consider the possibility that two columns are "parallel". In a more general setting, we need to check the rank of subsets of columns, since, for example, three vectors could all be coplanar while no two are parallel. I am sure that others have written code to solve this. Can I find their code?
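
One possible approach, sketched here as a suggestion rather than as what qtl2pleio does, is to let base R's pivoted QR decomposition pick the columns. qr() reports the numerical rank and a pivot vector in which columns found to be (near) linearly dependent on earlier ones are moved to the end, so the first rank entries of the pivot index a linearly independent subset. The covariate matrix below is made up for illustration.

```r
# Hypothetical covariate matrix with a parallel column and an all-zero column.
covar <- cbind(sex      = c(0, 1, 0, 1, 1),
               batch    = c(1, 1, 0, 0, 1),
               sex_copy = c(0, 2, 0, 2, 2),  # parallel to "sex"
               empty    = 0)                 # all zeros

qr_fit <- qr(covar)

# Indices of a linearly independent subset of columns, in original order.
keep <- sort(qr_fit$pivot[seq_len(qr_fit$rank)])
covar_full_rank <- covar[, keep, drop = FALSE]
colnames(covar_full_rank)
```

This favors earlier columns over later ones, so the column order of the input decides which of two parallel columns survives.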

Equation rendering in README contains misplaced asterisks

Not sure yet how to fix this.

vec(Y) = Xve**c(B) + vec(G) + vec(E)

Some of the "e" letters are bolded when they shouldn't be (in the above equation)

Finish writing vignette for one-QTL scans

Right now, I omitted the call to perms function in the vignette. I need to finish writing the vignette!

How to "Create a profile LOD plot to visualize results of two-dimensional scan"?

The R code for "Create a profile LOD plot to visualize results of two-dimensional scan" is missing at webpage https://github.com/fboehm/qtl2pleio. Can I know how you created the figure? Thanks!

Fix find_pleio_peak_tib function

The argument's name is tib - use a more informative name.

Fix @details section for scan_pvl()

Since the updates to scan_pvl - especially the covariance estimation when kinship = NULL - I need to revise details section

Consider moving recla vignette to inst/ directory

With recla vignette in the vignettes directory, R CMD check takes a long time, because it must download files to run the vignette.

Update README.Rmd

After making changes to the code, I need to revise the R code in the README.Rmd to ensure that it uses currently available functions. Specifically, tidy_scan_pvl is no longer available, yet, I think, it's used in the README.Rmd. I probably should remove the plot_pvl call, too. I might consider revising plot_pvl to take the updated data structures outputted by calc_profile_lods.

README: add text to explain every section

Section headers are insufficient to explain what code is doing. Add text for each section

Add option to do two-dimensional scan with interaction covariates (ie, interact with founder allele probs)

We can probably assume that both dimensions of the scan would have these interaction covariates. By looking at rqtl/qtl2 code, especially that for scan1(), this can be addressed.

Decide which data to include in package

I can't write files (or even download to a tmp directory) per CRAN policy. Change / remove sim400. I also need to change examples, tests, vignettes to align with this requirement.

README: suppress warnings and messages when loading packages

Update README

remove text about rstudio session in the cloud.
add text and hyperlinks about multivariate, one-QTL scans

fix vignettes

vignettes("qtl2pleio")

returns message that no vignettes were found. Fix this!

Installation problem

Sorry for bothering you again. I got an error again when installing both the new version and the old version (install_github("fboehm/qtl2pleio", ref = "b7ce5bdba3b5a120a58af0f0337b72ac30cfed1a",force = TRUE)). Both give similar error messages, so I currently cannot use qtl2pleio. Thanks for the help.

devtools::install_github("fboehm/qtl2pleio")
Downloading GitHub repo fboehm/qtl2pleio@master
√ checking for file 'C:\Users\60001296\AppData\Local\Temp\RtmpkvD45F\remotes1cd8448c04c1e\fboehm-qtl2pleio-0966d84/DESCRIPTION' (1.3s)

preparing 'qtl2pleio': (1.4s)
√ checking DESCRIPTION meta-information ...
cleaning src
checking for LF line-endings in source and make files and shell scripts (676ms)
checking for empty or unneeded directories
building 'qtl2pleio_1.1.0.tar.gz'

Installing package into ‘C:/Users/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
ERROR: failed to lock directory 'C:/Users/60001296/Documents/R/win-library/3.5' for modifying
Try removing 'C:/Users/60001296/Documents/R/win-library/3.5/00LOCK-qtl2pleio'
In R CMD INSTALL
Error: Failed to install 'qtl2pleio' from GitHub:
(converted from warning) installation of package ‘C:/Users/AppData/Local/Temp/RtmpkvD45F/file1cd8468252a38/qtl2pleio_1.1.0.tar.gz’ had non-zero exit status
In addition: Warning messages:
1: In base_download_noheaders(url, path, quiet, headers, method) :
R (< 3.6.0) cannot send HTTP headers with the wininet download method. This download will likely fail. Please choose a different download method, via the download.file.method option. The libcurl method is best, if available, and the wget and curl methods work as well, if the corresponding external tool is available. See ?download.file
2: In untar2(tarfile, files, list, exdir) :
skipping pax global extended headers
3: In untar2(tarfile, files, list, exdir) :
skipping pax global extended headers

Document C++ functions

Maybe I can use "tags" @author Fred boehm, etc. and then get Roxygen2 to document things?? I need to try this!

Fix wording in d-variate, d-QTL scan vignette

In the vignette, I talk about n_cores, the old name for what is now cores.
I say that I use n_cores = 1 to satisfy CRAN when, in fact, I buildignore the vignette, so that CRAN never sees it.

Need print methods for S3 objects that are intermediate outputs from qtl2 and qtl2pleio

in JOSS review, Vince Carey suggested this. Think about how to summarize (for printing purposes) the intermediate objects that I create in the vignettes.

openjournals/joss-reviews#1435

add bayes factor calcs

https://www.genetics.org/content/166/2/1025.short

Need unit tests for proper handling of checking on identical ordering of subject names

In scan_pvl, I have stopifnot() to check that subject names in the various inputs are identical. I need to write unit tests to ensure that these checks work as intended.

Not able to install the package qtl2pleio

Could you please help me with the installation? Thanks. I used devtools::install_github("fboehm/qtl2pleio") to install the package. But have an error as follows:

C:/RBuildTools/3.5/mingw_32/bin/g++ -I"C:/PROGRA~1/R/R-35~1.3/include" -DNDEBUG -I"C:/Users//Documents/R/win-library/3.5/Rcpp/include" -I"C:/Users//Documents/R/win-library/3.5/RcppEigen/include" -O2 -Wall -mtune=generic -c test-example.cpp -o test-example.o
test-example.cpp:12:22: fatal error: testthat.h: No such file or directory
#include <testthat.h>
^
compilation terminated.
make: *** [C:/PROGRA~1/R/R-35~1.3/etc/i386/Makeconf:215: test-example.o] Error 1
ERROR: compilation failed for package 'qtl2pleio'

removing 'C:/Users//Documents/R/win-library/3.5/qtl2pleio'
In R CMD INSTALL
Error in i.p(...) :
(converted from warning) installation of package ‘C:/Users//AppData/Local/Temp/RtmpoX3waK/file5389835017109/qtl2pleio_1.0.7.tar.gz’ had non-zero exit status
In addition: Warning messages:
1: In untar2(tarfile, files, list, exdir) :
skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir) :
skipping pax global extended headers

Use rstudio/pins R package when downloading files

Need to try the pins R package on rstudio's github site

Explain how I used feedback when I next submit to CRAN

i.e., include in the cran comments file the feedback that I received, alongside my responses and the revisions that I made because of it.

Describe results for scan1 outputs

Need to elaborate text about scan1 results.

Add a vignette to explain profile LOD plots

Basically, I want a detailed explanation of profile LOD plots and how they're calculated. Think about Alan's question about negative LOD values...

update examples and tests after code changes for version 1.1.0

I updated the R code for version 1.1.0. However, I still need to ensure that the package's examples, vignettes, and tests work as intended. Some examples may fail due to changes in the code base.

Vignette code fails

in version 0.1.1, vignette code doesn't successfully run. This is due to a major restructuring of scan_pvl. I need to revise the tidy and plot functions to enable it to run successfully

clean up scan_pvl and boot_pvl code

I think that I'm using a lot of redundant code in these two functions. Make a third function that captures these redundancies, then call that third function from both scan_pvl and boot_pvl. adhere to the DRY (don't repeat yourself) principle.

Need scan function option for haley Knott regression, ie, without genetic random effect

Add code to check id names across different input objects and use only those subjects present in all objects

Instead of the current setup, in which the user must ensure that inputs to scan_pvl - pheno, probs, kinship, and covariates - all have the same subject ids, we want, instead, to build in a step early in the function scan_pvl that determines the intersection of the sets of subject ids and uses only those subject ids that are present in all four inputs.
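
A minimal sketch of that step in base R, assuming every input carries subject ids as row names (and, for kinship, column names too). The toy objects below are stand-ins with the argument names from the paragraph above. This is one way the step could look, not the code in scan_pvl.

```r
set.seed(1)
# Toy inputs with partly overlapping subject ids.
pheno <- matrix(rnorm(8), nrow = 4,
                dimnames = list(c("id1", "id2", "id3", "id4"), c("tr1", "tr2")))
probs <- array(0.5, dim = c(3, 2, 2),
               dimnames = list(c("id2", "id3", "id5"), c("A", "B"), c("m1", "m2")))
kin_ids <- c("id3", "id2", "id4", "id6")
kinship <- matrix(0.1, 4, 4, dimnames = list(kin_ids, kin_ids))
diag(kinship) <- 1
addcovar <- matrix(c(0, 1, 1), ncol = 1,
                   dimnames = list(c("id1", "id2", "id3"), "sex"))

# Subject ids present in all four inputs (order follows the first input).
common <- Reduce(intersect, list(rownames(pheno), rownames(probs),
                                 rownames(kinship), rownames(addcovar)))

# Subset and reorder every input to the common ids.
pheno_sub    <- pheno[common, , drop = FALSE]
probs_sub    <- probs[common, , , drop = FALSE]
kinship_sub  <- kinship[common, common, drop = FALSE]
addcovar_sub <- addcovar[common, , drop = FALSE]
```

Indexing by name both subsets and reorders, so the four outputs end up with identical subject ordering, which is what the stopifnot() checks mentioned in another issue require.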

Explain 2d scan in README.Rmd

Right now, we have only a statement like "Perform 2d scan". Explain to a new user.

CRAN and LTO issue

CRAN emailed me to say that I need to fix qtl2pleio.

Here is a discussion:

r-lib/testthat#1230

I may actually not use any C++ unit tests, so this may not be a big deal for me to remove testthat stuff that expects C++ unit tests.