kbroman / qtl

R/qtl: A QTL mapping environment

License: GNU General Public License v3.0

R 55.88% CMake 0.74% C++ 5.06% Shell 0.17% Ruby 0.30% C 37.63% Batchfile 0.12% TeX 0.11%

qtl's Introduction

R/qtl: A QTL mapping environment

Authors: Karl W Broman and Hao Wu, with ideas from Gary Churchill and Śaunak Sen and contributions from Danny Arends, Robert Corty, Timothée Flutre, Ritsert Jansen, Pjotr Prins, Lars Rönnegård, Rohan Shah, Laura Shannon, Quoc Tran, Aaron Wolen, and Brian Yandell

R/qtl is an extensible, interactive environment for mapping quantitative trait loci (QTL) in experimental crosses. It is implemented as an add-on package for the freely available and widely used statistical language/software R. The development of this software as an add-on to R allows us to take advantage of the basic mathematical and statistical functions, and powerful graphics capabilities, that are provided with R. Further, the user will benefit by the seamless integration of the QTL mapping software into a general statistical analysis program. Our goal is to make complex QTL mapping methods widely accessible and allow users to focus on modeling rather than computing.

A key component of computational methods for QTL mapping is the hidden Markov model (HMM) technology for dealing with missing genotype data. We have implemented the main HMM algorithms, with allowance for the presence of genotyping errors, for backcrosses, intercrosses, and phase-known four-way crosses.

The current version of R/qtl includes facilities for estimating genetic maps, identifying genotyping errors, and performing single-QTL genome scans and two-QTL, two-dimensional genome scans, by interval mapping (with the EM algorithm), Haley-Knott regression, and multiple imputation. All of this may be done in the presence of covariates (such as sex, age or treatment). One may also fit higher-order QTL models by multiple imputation and Haley-Knott regression.

License

The R/qtl package is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, version 3, as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

A copy of the GNU General Public License, version 3, is available at https://www.r-project.org/Licenses/GPL-3

qtl's Issues

Getting an error in mqmaugment related to the X chromosome

testaugmentation.R gives an error. It seems that if the cross input to mqmaugment contains the X chromosome, you get a warning message, but then the number of rows in the genotype and phenotype data become offset.

> data(listeria)
> tmp <- mqmaugment(listeria)
INFO: VALGRIND MEMORY DEBUG BARRIERE TRIGGERED
Warning message:
In mqmaugment(listeria) :
  MQM not yet available for the X chromosome; omitting chr X
> nind(tmp)
Error in nind(tmp) :
  Different numbers of individuals in genotypes and phenotypes.

If I remove the X chromosome, I don't have this problem.

> listeria_noX <- listeria["-X",]
> tmp <- mqmaugment(listeria_noX)
INFO: VALGRIND MEMORY DEBUG BARRIERE TRIGGERED
> nind(tmp)
[1] 783

effectscan and effectplot give inconsistent results

Likely bug: effectscan() and effectplot() giving inconsistent results, as reported in an R/qtl discussion post.

read.cross for MapQTL format - locus file can't have split lines

I got an email from a user having trouble reading in data in MapQTL format. It turned out that the locus file had split lines, like this:

P486393_4HS    <nnxnp> {-0}    
nn    np  nn  np  nn  np  nn  np  np  nn
TP511441_4  <nnxnp> {-0}    
nn    np  --  np  nn  nn  nn  nn  np  np

We should be able to handle this, since the number of loci and number of individuals are provided in the file.
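
One way to tolerate split lines is to read the locus data as a stream of whitespace-separated tokens instead of line by line. The following is only a sketch, not the read.cross() code: `parse_loci` is a hypothetical helper, and it assumes each locus record consists of a name, a segregation type, and a phase, followed by exactly `n_ind` genotype codes.

```r
# Hypothetical helper (not part of R/qtl): parse locus records as a token
# stream, so that line breaks inside a record do not matter.
# Assumes each record is: name, segregation type, phase, then n_ind genotypes.
parse_loci <- function(lines, n_loci, n_ind) {
    tokens <- scan(text = lines, what = character(), quiet = TRUE)
    per_locus <- 3 + n_ind
    if (length(tokens) != n_loci * per_locus)
        stop("Expected ", n_loci * per_locus, " tokens but found ",
             length(tokens), call. = FALSE)
    # one row per locus: 3 header tokens followed by the genotypes
    rec <- matrix(tokens, nrow = n_loci, byrow = TRUE)
    geno <- rec[, -(1:3), drop = FALSE]
    rownames(geno) <- rec[, 1]
    list(seg = rec[, 2], phase = rec[, 3], geno = geno)
}

lines <- c("P486393_4HS    <nnxnp> {-0}",
           "nn    np  nn  np  nn  np  nn  np  np  nn",
           "TP511441_4  <nnxnp> {-0}",
           "nn    np  --  np  nn  nn  nn  nn  np  np")
loci <- parse_loci(lines, n_loci = 2, n_ind = 10)
dim(loci$geno)
```

Because the counts come from the file header, a mismatch between the expected and observed number of tokens can be reported as an error rather than silently misaligning the data.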

Documentation for sim.cross with 4way seems incomplete

Here is what I get with the code from CRAN:

> library(qtl)
> map <- sim.map(include.x=FALSE, sex.sp=TRUE)
> model <- rbind(c(1,45,1), c(5,20,-0.5))
> cross <- sim.cross(map, model, n.ind=100, type="4way")
Error in sim.cross.4way(map, model, n.ind, error.prob, missing.prob, partial.missing.prob,  :
  Model must be a matrix with 5 columns (chr, pos and effects).

Should there really be 5 columns, or should the "5" in the code be replaced by "3" to be consistent with the manual (obtained via ?sim.cross)?

stepwiseqtl requires covar argument to be a data frame

stepwisecovar seems to take the covariate names from names(covar). If covar is a matrix and not a data frame, this returns NULL, and so no covariates get included in the analysis.

We should convert to a data frame in this case:

data(hyper)
hyper <- calc.genoprob(hyper)
x <- matrix(rnorm(nind(hyper)*2), ncol=2)
# no covariates will be included:
z <- stepwiseqtl(hyper, covar=x, method="hk", max.qtl=2)

x <- as.data.frame(x)
# now the covariates will be included:
z <- stepwiseqtl(hyper, covar=x, method="hk", max.qtl=2)
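
The underlying behavior can be seen in base R alone. This is a sketch of the kind of conversion that could be done internally; `as_covar_df` is a hypothetical helper, not an R/qtl function.

```r
# names() of a matrix is NULL, which is how the covariate names get lost
x <- matrix(rnorm(10), ncol = 2)
is.null(names(x))     # TRUE

# Hypothetical helper (not part of R/qtl): coerce covariates to a data frame,
# which gives unnamed matrix columns the default names V1, V2, ...
as_covar_df <- function(covar) {
    if (is.null(covar)) return(NULL)
    if (!is.data.frame(covar)) covar <- as.data.frame(covar)
    covar
}
names(as_covar_df(x))   # "V1" "V2"
```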

have lodint and bayesint give more informative errors

If you give a chromosome that is not within the scanone object, lodint() and bayesint() will give an error message like

Error in lodint(out, "c4") : Chromosome misspecified.

Better to say something like "Chromosome c4 not found."
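
A minimal sketch of such a check, assuming the available chromosome names can be compared as character strings; `check_chr` is a hypothetical helper and not the actual lodint() or bayesint() code.

```r
# Hypothetical check: name the chromosome(s) that could not be found
check_chr <- function(chr, available) {
    missing_chr <- setdiff(as.character(chr), as.character(available))
    if (length(missing_chr) > 0)
        stop("Chromosome ", paste(missing_chr, collapse = ", "),
             " not found.", call. = FALSE)
    invisible(TRUE)
}

msg <- tryCatch(check_chr("c4", c(1:19, "X")),
                error = function(e) conditionMessage(e))
msg   # "Chromosome c4 not found."
```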

Additive effect heatmap

I would like to plot a heatmap of the additive effects, similar to the output in GenStat (attached). It's useful to see differences in effects between environments and traits. Would you have any suggestions? I think it would be a good addition to the chart package.

add format="mapqtl" to read.cross()

Hello,
some colleagues granted me access to already-published data they analyzed with JoinMap and MapQTL. As I prefer to work with free software, I started looking at R/qtl. I was wondering whether there is some code in read.cross() that can directly load data from files in JoinMap/MapQTL formats. If not, I could maybe try to write an R function for this purpose (I say "try" because not all formats are tabular, so they are much harder to parse in R compared to, say, Python). Moreover, I can already say that I won't be able to write a parser that distinguishes all the cases handled by JoinMap/MapQTL. In that case, would you still be interested in integrating it into R/qtl? Otherwise, I'll write a quick-and-dirty parser for my own needs.
Best,
Tim

scanone perms with 4-way cross giving all 0's

I have an example of 4-way cross data where scanone gives reasonable results, but scanone perms with perm.Xsp=TRUE gives all 0's on the X chromosome. If I permute the subjects' phenotypes, scanone gives 0's on X chromosome. So something seems messed up, with treatment of X.

create a small panel

My population contains 1500 BIL lines, and I have genome coverage of 7000 SNPs.

I want to create a small panel of lines from the BIL bins that covers the whole genome of the wild species, instead of using all 1500 lines.

Do you have any idea how I can use R for this?

plot.scanone() treat Inf as NA

If there are Inf values in scanone() output, plot.scanone() won't work. Revise to treat non-finite values like NAs.
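
As a sketch of the proposed behavior (not the actual plot.scanone() code), non-finite values can be replaced by NA before computing axis limits; `nonfinite_to_na` is a hypothetical helper.

```r
# Hypothetical helper: set non-finite values (Inf, -Inf, NaN) to NA
nonfinite_to_na <- function(lod) {
    lod[!is.finite(lod)] <- NA
    lod
}

lod <- c(0.5, 2.1, Inf, 1.3, NaN, -Inf)
clean <- nonfinite_to_na(lod)
range(clean, na.rm = TRUE)   # finite limits that can be used for ylim
```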

stepwiseqtl can't handle missing phenotypes + covariates

If there are missing phenotypes, stepwiseqtl() seems to subset the phenotypes but not the covariates so you get an error like this:

Error in checkcovar(cross, pheno.col, addcovar, intcovar, perm.strata,  :                     
   Number of rows in additive covariates is incorrect
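
The fix presumably amounts to applying the same row subset to the covariates as to the phenotypes. A base-R sketch with made-up data (the variable names are illustrative, not taken from stepwiseqtl()):

```r
# Made-up phenotype with missing values, plus additive covariates
pheno <- c(10.2, NA, 9.8, 11.5, NA)
addcovar <- data.frame(sex = c(0, 1, 0, 1, 1), age = c(5, 6, 5, 7, 6))

# Drop individuals with missing phenotypes from both objects together,
# so that the row counts stay in step
keep <- !is.na(pheno)
pheno_sub <- pheno[keep]
addcovar_sub <- addcovar[keep, , drop = FALSE]
nrow(addcovar_sub) == length(pheno_sub)
```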

refineqtl can give min(diff(a)) warnings

It was reported in the R/qtl discussion group that refineqtl() can give a bunch of warnings about min(diff(a)), like this:

In min(diff(a)) : no non-missing arguments to min; returning Inf

I suspect this is in the case of a chromosome with a single marker, in which case diff(a) would be length 0. Here's an example:

data(fake.bc)
fake.bc <- fake.bc[c(2,5),]
fake.bc <- drop.markers(fake.bc, markernames(fake.bc, chr=2)[-1])
fake.bc <- calc.genoprob(fake.bc)
qtl <- makeqtl(fake.bc, chr=c(2,5), pos=c(0, 10), what="prob")
rqtl <- refineqtl(fake.bc, qtl=qtl, method="hk")
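
The warning itself is easy to reproduce in base R, independent of R/qtl. The guard below is a hypothetical sketch, not the actual refineqtl() code.

```r
a <- 12.5          # a single marker position
length(diff(a))    # 0, so min(diff(a)) would warn and return Inf

# Hypothetical guard: return NA when there are fewer than two positions
min_spacing <- function(pos) {
    if (length(pos) < 2) return(NA_real_)
    min(diff(pos))
}
min_spacing(a)               # NA
min_spacing(c(0, 2.5, 10))   # 2.5
```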

reduce2grid not working with scantwo permutations

Consider this example:

data(hyper)
hyper <- calc.genoprob(hyper, step=2.5)
hyper <- reduce2grid(hyper)
out <- scantwo(hyper, method="hk", assumeCondIndep=TRUE, n.perm=2)
## Doing permutation in batch mode ...
## Error in `[<-`(`*tmp*`, , colnames(gen), value = c(2, 2, 1, 2, 2, 1, 2,  :
##   subscript out of bounds

scantwopermhk works if you use assumeCondIndep=TRUE but not if assumeCondIndep=FALSE.

scantwo without permutations works fine.

read.cross may reorder chromosomes

If a CSV file contains chromosomes 1-19, "X", and "un", the chromosomes will get reordered as sorted character strings. read.cross() should stick with the order that's in the data file.
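
One way to keep the file order, sketched in base R (this is not the read.cross() code), is to build the chromosome factor with levels taken in order of first appearance:

```r
# Chromosome IDs for markers, in the order they appear in the file
chr <- c("1", "1", "2", "10", "X", "un")

# Levels in order of first appearance, rather than sorted as strings
chr_f <- factor(chr, levels = unique(chr))
levels(chr_f)                   # "1" "2" "10" "X" "un"

# Marker indices grouped by chromosome, still in file order
split(seq_along(chr), chr_f)
```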

How to change x-axis tick labels(chr labels) size?

Hi,

cex.axis does not seem to work when plotting a LOD curve with plot(). Is there any way to change the size of the chr labels?

Best wishes,
Kun

read.cross with 4-way cross may mess up X chr

If the genotypes are all 1-4, it seems like read.cross() will first treat the data as a standard intercross and so omit het genotypes from the X chromosome in males, even if you indicate crosstype="4way".

plot.pxg gives error if one genotype group is missing

Here's an example:

data(hyper)
hyper <- fill.geno(hyper)
class(hyper)[1] <- "f2"
plot.pxg(hyper, markernames(hyper)[1])

Problem with write.cross with mapqtl

John Lovell pointed out a problem with write.cross with format="mapqtl":

https://groups.google.com/d/msg/rqtl-disc/A7POu6_kktc/lL6PmJi1DAAJ

The additive effects of some markers are NA in the bcsft population

Hello,
I use the CIM algorithm for QTL localization, and my population type is BC4F3, but the additive effects of some markers are NA. I can't find the reason. Could you help explain it? Thank you!

Points and lines in plot.scanone

Hi Karl,

I ended up changing plot.scanone in a very small way. Even though it is a small change, I thought it would be useful to others.

When the SNP data is quite dense (as in the genome-wide scan below, where we have ~70,000 SNPs), it is often useful to show the genome-wide association signal (the LOD scores) with points instead of lines, so that you can distinguish when an association peak is driven by multiple SNPs in the same region, or by a single SNP. This is typically how things are visualized in human GWAS, as you surely know already.

In my version of plot.scanone, I modified the call to function "lines" like so:

lines(x,y,lwd = lwd[1],lty = lty[1],col = col[1],type = "b",pch = 20,cex = cex)

and I added a "cex" input argument to plot.scanone. (I suppose the default of cex could be 0 so that points are not shown by default, but perhaps there is a better way to achieve this.)

Of course, I'm sure you could do a much better job making sure that this would work smoothly with the other options.

Let me know if this sounds reasonable.

Cheers,
Peter

scantwopermhk ignores pheno.col argument

When using scantwopermhk, the first phenotype column is always used regardless of user specification.

scanone gives misleading error message when phenotype is not numeric

scanone() gives a misleading error message when the phenotype is not numeric. If you indicate a single phenotype column, it will give the same error message no matter which one you choose:

Error in checkcovar(cross, pheno.col, addcovar, intcovar, perm.strata,  :         
   Following phenotypes are not numeric: Column 1

summary and plot for comparegeno output

As in R/qtl2, make summary and plot functions for comparegeno() output.

Get rid of d.f. warnings

summary.scanone (and perhaps other functions) give warnings about degrees of freedom, mostly related to scanone permutations when you're comparing results with and without an interactive covariate.

Let's just drop those; I basically always ignore them, as they're not very trustworthy.

Can you add some cross-pollinator (CP) population examples to the tutorials?

Hi,
kbroman, I work on QTL mapping projects in woody trees and animals, and I recently migrated from MapQTL to R/qtl, but I cannot find any QTL mapping example for an F1 (CP) population in the R/qtl tutorials. Could you add some?

Best wishes.

cim should give error for 4-way cross

Analysis with cim() for 4-way cross is just wrong. cim() should not proceed, but should give an error.

In read.cross, use data.table::fread rather than read.csv

data.table::fread (see the data.table package) is way faster than read.csv. We should use it in read.cross.

Directed mqmplot.multitrait?

Hi Karl,
Would it be possible to develop a plot for a multi-trait directed plot?

pull.argmaxgeno issue (4-way, include.pos.info =T)

This is a bug report... it's not a big deal since one can easily get this information in other ways, but I figure I should report it.

data(fake.4way)
fake.4way<-argmax.geno(fake.4way)
test<-pull.argmaxgeno(fake.4way, include.pos.info=TRUE, rotate=TRUE)

Returns an erroneous mapping position for markers - likely because it turns the 2-row map matrix of a 4-way into a numeric vector.
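
The suspected mechanism can be illustrated in base R with made-up positions: coercing a 2-row matrix to a numeric vector interleaves the two rows, so the values no longer line up with the markers.

```r
# Made-up map for one chromosome of a 4-way cross:
# two rows (one per sex-specific map), one column per marker
map <- rbind(c(0, 10, 25), c(0, 14, 31))
colnames(map) <- c("m1", "m2", "m3")

as.numeric(map)   # column-major order: 0 0 10 14 25 31
map[1, ]          # positions from the first row only, one per marker
```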

Make estimate.map=FALSE the default in read.cross

I would never recommend estimate.map=TRUE in read.cross(), so we should change the default to FALSE.

Don't give error due to lack of class "map"

summary.map() issues an error if the input doesn't have class "cross" or "map". This means you can't use the function with a plain list, even if it conforms perfectly fine.

Best just to make this a warning, as for plotMap().

Look for other functions like this.
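
A sketch of the proposed change, using a hypothetical `check_map_class` helper rather than the actual summary.map() code: the class check issues a warning and lets the computation continue.

```r
# Hypothetical check: warn, rather than stop, when the class is missing
check_map_class <- function(map) {
    if (!inherits(map, c("cross", "map")))
        warning("Input should have class \"cross\" or \"map\".", call. = FALSE)
    invisible(map)
}

# A plain list that otherwise looks like a map
plain <- list("1" = c(m1 = 0, m2 = 10))
w <- tryCatch(check_map_class(plain),
              warning = function(w) conditionMessage(w))
w
```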

Check that the nb of loci is consistent when loading "MapQTL" files

On the devel branch, in the file read.cross.mq.R at line 197, could you add the following lines?

    if(locus.id > nb.loci + 1){
        msg <- paste("there seems to be more loci (", locus.id-1,
                             ") than indicated in the header (", nb.loci, ")")
        stop(msg, call.=FALSE)
    }

A similar check is already in place for the number of traits (line 662 of the same file).

Sorry for forgetting this in the first place.

explain format arg in read.cross

The meanings of the different format options in read.cross() are explained in the Details section, but nowhere else. We might as well explain them directly where the argument is described.

Better error message from plotLodProfile() with null model

When plotLodProfile() is called with a null model, the error message is as if there were a model but no lod profile information:

Error in plotLodProfile(out) :
  You must first run refineqtl, using keeplodprofile=TRUE

It'd be better to say "null QTL model; no profiles to plot."

read.cross with crosstype="bcsft" doesn't allow sep=";"

In read.cross with crosstype="bcsft" and sep=";" we get the error:

 Error in convert2bcsft(cross, BC.gen, F.gen, estimate.map = estimate.map,  :
   unused argument (sep = ";")

malloc error in test_qtl.R

I'm getting a malloc error ("pointer being freed was not allocated") in test_qtl.R.

This bit of code gives the error:

data(hyper)
hyper <- fill.geno(hyper)
#Mess up the markers by shifting
temp <- shiftmap(hyper, offset=10^7)
out.temp <- mqmscan(temp,verb=TRUE,off.end=10)

LOD scores from scanone(...)

Hi Karl,

Looking at the code in scanone.R, it appears that the output from scanone(...) readjusts the LOD scores so that they are in base 10 (not in base e). Is this correct? It might be useful to include this detail in the documentation for scanone, and for other functions where this might be a question. Thanks,

Peter
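
For reference, a LOD score is by convention a base-10 log likelihood ratio, so a difference in natural-log likelihoods is converted by dividing by log(10). The numbers below are made up for illustration and say nothing about how scanone.R organizes the calculation.

```r
# Made-up natural-log likelihoods under the QTL model and the null model
loglik_alt  <- -120.4
loglik_null <- -127.3

# LOD = log10(L_alt / L_null) = (lnL_alt - lnL_null) / ln(10)
lod <- (loglik_alt - loglik_null) / log(10)
lod
```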

Documentation for read.cross is incomplete

Not sure if this is the best way to report an issue (in my case, a small one) with the qtl library.

I noticed that the description of the 'csvs' format is incomplete. In particular, no description of the first row in the genotype file is given. While this information could probably be inferred from the description of the 'csv' format, it is not completely clear based on the current description.

More valgrind errors

See http://www.stats.ox.ac.uk/pub/bdr/memtests/valgrind/qtl-Ex.Rout.

Examples:

`subset.scantwo`

==9415== Conditional jump or move depends on uninitialised value(s)
==9415==    at 0xFF5848E: summary_scantwo (packages/tests-vg/qtl/src/summary_scantwo.c:224)
==9415==    by 0xFF588E3: R_summary_scantwo (packages/tests-vg/qtl/src/summary_scantwo.c:97)

`mqmaugment`

==9415== 1,600 bytes in 4 blocks are definitely lost in loss record 174 of 4,075
==9415==    by 0xFF3373E: calloc_init (packages/tests-vg/qtl/src/mqmdatatypes.cpp:111)
==9415==    by 0xFF337C4: newivector (packages/tests-vg/qtl/src/mqmdatatypes.cpp:123)
==9415==    by 0xFF30C9F: mqmaugment (packages/tests-vg/qtl/src/mqmaugment.cpp:288)
==9415==    by 0xFF32746: mqmaugmentfull (packages/tests-vg/qtl/src/mqmaugment.cpp:150)
==9415==    by 0xFF331A6: R_mqmaugment (packages/tests-vg/qtl/src/mqmaugment.cpp:687)

mqmscan should give warning about X chromosome

mqmscan() seems to skip over the X chromosome completely, without saying anything. It should give a warning, and this should be mentioned in the documentation.
If a cross consists of only the X chromosome, mqmscan() gives an error, as the output will be NULL and you can't set the class for NULL.
See this question at the R/qtl google group.
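
The error for an X-only cross follows from base R refusing to set attributes on NULL. A sketch of a guard, where `finish_scan` and its warning text are hypothetical and not the mqmscan() code:

```r
# Setting a class on NULL is an error in R
out <- NULL
res <- tryCatch({ class(out) <- c("scanone", "data.frame"); out },
                error = function(e) conditionMessage(e))
is.character(res)   # TRUE: an error message was captured

# Hypothetical guard: warn and return NULL instead of failing
finish_scan <- function(out) {
    if (is.null(out)) {
        warning("No autosomes to scan; returning NULL.", call. = FALSE)
        return(NULL)
    }
    class(out) <- c("scanone", "data.frame")
    out
}
empty <- suppressWarnings(finish_scan(NULL))
```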

Allow custom colors in geno.image()

Add a col argument to geno.image() to enable custom colors.

R/Qtl error

I am new to R. When I try to read the CSV file (Data.zip) for QTL analysis, I get the error below.

Data <- read.cross("csv",file="Data.csv",genotypes=c("AA","BB","AB"),na.strings=c("","N"))
Error in read.cross.csv(dir, file, na.strings, genotypes, estimate.map, :
You must include at least one phenotype (e.g., an index). There was this value in the first column of the second row 'Age (Days)' where was supposed to be nothing.
In addition: Warning message:
In read.cross.csv(dir, file, na.strings, genotypes, estimate.map, :
Including "" in na.strings will cause problems; omitted.

Here is my sample data

How to keep full chromosome name when my chromosome prefix is "chr" or "Chr"?

Hi, Kbroman,
I found that read.cross always deletes the chromosome prefix when it is "chr" or "Chr". For example, read.cross returns 01 02 03 ... when my true chromosome names are "Chr01" "Chr02" "Chr03" ....
Is there any way to keep the chromosome names intact when the prefix is "chr" or "Chr"?

Cheers,
Kun.

Download link points to empty file

The download link on https://rqtl.org/download/:

https://rqtl.org/download/qtl_1.46-2.tar.gz

returns an empty (size 0 bytes) file.

read.cross with crosstype="4way" needs to force sex-specific maps

> mycross_1 = read.cross(format=c("csv"),
+                        file="geneticMap_RupP_x_Ny84.csv",
+                        na.strings=c("-", "NA"),
+                        genotypes=c(NULL),
+                        estimate.map = FALSE,
+                        crosstype="4way")
 --Read the following data:
         245  individuals
         849  markers
         5  phenotypes
 --Cross type: 4way

> mycross <- calc.genoprob(mycross_1, step=0, off.end=0, error.prob=0.0001,
+                          map.function=c("kosambi"),
+                          stepwidth=c("fixed"))
Error in map[1, ] : incorrect number of dimensions

Constraint in forking due to large file

Hi Karl,

I wanted to point out a problem with your git repo that crops up only in some cases when I try to fork the qtl repo. This is the gist of the problem:

remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: 349a55f65e2dfccd30a01bd4d044366e
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File contrib/bin/d.txt is 120.19 MB; this exceeds GitHub's file size limit of 100.00 MB

I found a simple solution using bfg. The command java -jar bfg.jar --strip-blobs-bigger-than 95M qtl.git detected a single file, d.txt (or, more correctly, "blob"), of size ~120 MB.

Removing this large blob will facilitate forking and maybe resolve other potential issues. If you'd like I can give you the exact sequence of commands I took to remove this blob from the forked repo.

Peter

write.cross.mq.loc forbids too many cases

For instance, the current version of write.cross.mq.loc (for the MapQTL/JoinMap format) assumes that, if a marker has 3 different genotypes, then they have to be 1,10,4 or 2,3,9. However, they could also be 2,3,4.

This case just happened to me when I loaded a marker with genotypes 7 and 8, but many missing data. After imputation, the marker has genotypes 2,3,4 but not 1, thus causing an error in write.cross.mq.loc.

I will fork the repo, (try to) fix the bug, and make a pull request.

Getting an error in fitqtl

Hello,
I get an error when I use fitqtl with scan<-scanone(d,pheno.col=i,model="binary");

The output is:
Permutation 1
Error in solve.default(t(Z) %*% Z, t(Z) %*% X) :
Lapack routine dgesv: system is exactly singular: U[3,3] = 0
Calls: fitqtl -> fitqtlengine -> solve -> solve.default
In addition: There were 50 or more warnings (use warnings() to see the first 50)

mqmaugment uses rand() rather than R's RNG functions

The R CMD check results on CRAN are showing the following:

checking compiled code ... NOTE
File ‘qtl/libs/qtl.so’:
Found ‘rand’, possibly from ‘rand’ (C)
Object: ‘mqmaugment.o’

Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor the C RNG.

In mqmaugment.cpp, we need to replace rand() calls with unif_rand() and include calls to GetRNGstate() and PutRNGstate().
