famuvie / breedr Goto Github PK


Statistical methods for forest genetic resources analysts

Home Page: http://famuvie.github.io/breedR/

License: GNU General Public License v3.0

R 99.97% Makefile 0.03%

breedr's Issues

Interface for setting the convergence criterion in REML

Let the user optionally specify the convergence criterion (i.e. the numerical tolerance) to be passed to (ai)remlf90.

summary(): include computation time

Collect and report timing statistics.
Of interest is:

time running underlying fortran programs (e.g. (ai)remlf90)
time running breedR (preparation of model and parsing results)
total time

report automatically total time in summary()
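
A minimal sketch of how the three timings could be collected. The stage functions `prepare_model()`, `run_fortran()` and `parse_results()` are hypothetical stand-ins, not breedR functions, and the `Sys.sleep()` call only simulates the external program.

```r
# Evaluate an expression and return its value with the elapsed wall-clock time
timed <- function(expr) {
  t0 <- proc.time()[["elapsed"]]
  value <- expr                      # the lazy argument is evaluated here
  list(value = value, elapsed = proc.time()[["elapsed"]] - t0)
}

# Hypothetical stand-ins for the real stages
prepare_model <- function() list(ready = TRUE)
run_fortran   <- function(model) { Sys.sleep(0.05); "raw output" }
parse_results <- function(raw) toupper(raw)

prep  <- timed(prepare_model())
fit   <- timed(run_fortran(prep$value))
parse <- timed(parse_results(fit$value))

# Time in the Fortran program, time in R code, and their sum
timing <- c(fortran = fit$elapsed,
            breedR  = prep$elapsed + parse$elapsed)
timing[["total"]] <- sum(timing)
print(timing)
```

Storing such a vector in the result object would let summary() print the total without re-running anything.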

BLUP standard errors (or PEV)

We need to compute the standard error for the BLUP estimates.
(AI)REMLF90 does not provide the SE.
Some versions seem to allow an
OPTION sol se [1]
but it is not implemented in the version we downloaded.

[1] http://nce.ads.uga.edu/~shogo/html/research/Readme.aireml

breedR does not admit NAs in observed traits

It crashes badly.

Admitting NAs in the response would be useful for prediction.

AIREMLF90 output

In Linux (and Mac?) the output of AIREMLF90 gives S.D. for R and G, and the wrapper looks for this string in the output to grab the result.

However, under Windows, the program outputs SE for R and G, which is technically more correct but makes the wrapper fail.
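
One way to make the parsing robust is to match both labels with a single pattern. The sketch below is only illustrative: the "S.D. for" and "SE for" labels come from the description above, but the surrounding output lines and the assumption that the value sits on the line right after the label are invented.

```r
# Hypothetical excerpts of program output; the real layout may differ
out_linux   <- c("Genetic variance(s) for effect 1", "  2.31",
                 "S.D. for G", "  0.41",
                 "Residual variance(s)", "  5.07",
                 "S.D. for R", "  0.38")
out_windows <- sub("S.D. for", "SE for", out_linux, fixed = TRUE)

# Match either label, followed by R or G
is_se_label <- function(x) {
  grepl("^\\s*(S\\.D\\.|SE) for [RG]\\s*$", x, perl = TRUE)
}

# Assumption: the value is on the line right after the label
grab_se <- function(out) {
  idx <- which(is_se_label(out))
  setNames(as.numeric(out[idx + 1]), sub(".* for ", "", trimws(out[idx])))
}

grab_se(out_linux)
grab_se(out_windows)
```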

Extractor function for breeding values

Right now, we can only access breeding values by scraping the internals of a resulting object with res$genetic$fit.
An extractor function is needed to avoid this.

Specific knot locations in splines model

Currently the user can specify a number of (inner) knots to be evenly distributed in the field.
There is no reason to prevent the user from specifying a custom set of knots.
This could be added as an optional argument to the splines model, and would be easily accounted for in the splines constructor.

install only the binaries corresponding to the system architecture

Currently, breedR installs the 32bit version of remlf90 and airemlf90 compiled for windows, linux and Mac.

Ideally, the installation process should not download the versions for other systems.
Moreover, 64bit should be available when applicable (see #45).

Provide a "metagene" method for "coordinates<-"

Needed by inheritance from the sp::coordinates generic,
but not really used. It would be good to have for completeness, though.

example('breedR', 'breedR') not working in Windows

remlf90 doesn't work in the example.
An inspection of the process output reveals that the path of the temporary directory holding the parameter, data and pedigree files is too long for AIREMLF90: the file paths get truncated at 41 characters.

Bug: check for symmetry gets confused by names

In the constructor of group effects effect_group() we check for the symmetry of the initial covariance matrix for the group using isSymmetric.matrix().
If the argument var.ini is specified with names, the function will return FALSE unless identical column and row names are defined.
We need to perform this check using unname().
Verify whether we use the function isSymmetric() somewhere else.
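
The behaviour is easy to reproduce in base R. In this sketch the matrix values and the names are made up for illustration:

```r
var.ini <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
rownames(var.ini) <- c("direct", "competition")  # names on one dimension only

# The dimnames of the matrix and of its transpose differ
isSymmetric(var.ini)

# Dropping the names compares the numeric values only
isSymmetric(unname(var.ini))
```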

logLik() does not retrieve the number of parameters of the model nor the rank

... and issues an annoying warning each time it is called either directly or through summary().
This does not have any practical consequence, so we should consider removing the warning.

AIC is retrieved directly from (AI)REML's output, so I actually don't need the number of parameters, although it is not hard to retrieve.

For BIC, however, the rank is also needed (it takes into account the restrictions introduced by the REML procedure). I am not sure whether the rank in the output from (AI)REML is the right one.

write metagene-class doc

Write a self-contained help doc for the metagene class and interface functions.
Make everything belong to the same family of functions.

Error msg. for pedigrees with same codes for founders and offspring

The error message of an incorrectly specified pedigree is very general:

library(breedR)
build_pedigree(1:3,
               data = data.frame(self = 1:6,
                                 sire = rep(1:3, 2),
                                 dam  = rep(4:6, 2)))

Error: all(check_pedigree(pedx)) is not TRUE
In addition: Warning messages:
1: In pedigree::orderPed(pedx) :
  Be carefull, there are loops in the pedigree, individuals involved in the loop are indicated with a -1

2: In build_pedigree(1:3, data = data.frame(self = 1:6, sire = rep(1:3,  :
  The pedigree has been recoded. Check attr(ped, 'map').

The user has to consult the help for check_pedigree() or build_pedigree() to learn that codes for parents and offspring must be different.
This situation could be checked for and yield a more informative error message.

On behalf of K. Vander.
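
A sketch of one possible pre-check that would catch the example above, where every individual shares its code with one of its parents. The function `check_self_parent()` is hypothetical, not part of breedR, and it only covers this particular kind of misspecification.

```r
# Stop with an informative message if an individual is coded as its own parent
check_self_parent <- function(self, sire, dam) {
  bad <- which(self == sire | self == dam)   # which() drops NA comparisons
  if (length(bad) > 0) {
    stop(sprintf("Individuals coded as their own parent: %s. %s",
                 paste(self[bad], collapse = ", "),
                 "Codes for parents and offspring must be different."),
         call. = FALSE)
  }
  invisible(TRUE)
}

# The pedigree from the example: offspring reuse the codes of their parents
msg <- tryCatch(check_self_parent(self = 1:6,
                                  sire = rep(1:3, 2),
                                  dam  = rep(4:6, 2)),
                error = conditionMessage)
print(msg)
```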

summary(): include n_iterations

Report the number of REML iterations in the summary of a breedR result object.
Feature requested by CB

Error parsing results in models with many effects

The file of solutions from (ai)remlf90 is formatted with fixed width, leaving only four spaces between the column of trait and the column of effect. Thus, if for one trait one has more than 999 effects the two columns become one.

Although it is not very common to have more than 999 effects, it is not that rare to use a splines component with, say, $30 \times 30$ knots (which gives $32^2 = 1024$ effects).

This is a weakness of the format, and we will ask for a solution upstream.
But it is convenient to write a workaround for that.
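
A sketch of a possible workaround: read the fields by column position instead of splitting on whitespace. The layout used here (4-character trait and effect fields, followed by level and solution) is an assumption made for illustration. The real widths have to be checked against an actual solutions file.

```r
# Hypothetical solutions lines written with fixed widths
lines <- c(sprintf("%4d%4d%10d%20.8f", 1L,  999L, 1L, 0.25),
           sprintf("%4d%4d%10d%20.8f", 1L, 1024L, 1L, 0.50))

# Splitting on whitespace finds one field less where trait and effect merge
lengths(strsplit(trimws(lines), "\\s+"))

# Reading by column position recovers all four fields
parse_fixed <- function(x) {
  data.frame(trait    = as.integer(substr(x,  1,  4)),
             effect   = as.integer(substr(x,  5,  8)),
             level    = as.integer(substr(x,  9, 18)),
             solution = as.numeric(substr(x, 19, 38)))
}
parsed <- parse_fixed(lines)
print(parsed)
```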

Use 64bit progsf90 when possible

Currently, breedR ships the 32bit versions of remlf90 and airemlf90.
This puts a tight limit on the available memory, even in systems with enough physical memory.

Testing issues

no problem with breedR

library(breedR)

Multitrait support

Enable compare.plots() to compare variograms under the same scale

As a workaround, it is possible to extract the data from the variograms and plot them manually under the same scale, as follows:

library(ggplot2)

coord <- expand.grid(1:50, 1:50)
z     <- rnorm(nrow(coord))  # one simulated value per grid location

vg1 <- variogram(coord = coord, z = z)
vg2 <- variogram(coord = coord, z = 2*z)

isotropic_variograms <- rbind(cbind(model = "M1", vg1[["isotropic"]]),
                              cbind(model = "M2", vg2[["isotropic"]]))

# Then, you can plot the 2 variograms together under the same scale
ggplot(isotropic_variograms, aes(distance, variogram)) +
  geom_point() +
  geom_line() +
  stat_smooth(se = FALSE, method = 'auto') + 
  facet_wrap(~ model)

need for updating devtools in Windows?

Is that step needed for windows installs?

Provide automatically a spatial autocorrelation test on residuals

e.g. Moran's I
On behalf of Bruno FADY, PhD, Directeur de recherches, INRA.
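
For reference, Moran's I can be computed in a few lines of base R. This is a minimal sketch, assuming observations on a regular grid with unit spacing and binary rook-neighbour weights. The function name `morans_i()` and the simulated "residuals" are illustrative, and no significance test is included.

```r
# Moran's I with binary weights: w_ij = 1 if i and j are at distance 1
morans_i <- function(x, coord) {
  d <- as.matrix(dist(coord))
  w <- (abs(d - 1) < 1e-8) * 1
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

coord <- expand.grid(row = 1:10, col = 1:10)
set.seed(1)
noise <- rnorm(nrow(coord))              # no spatial structure
trend <- coord$row + coord$col + noise   # a spatial trend left in the residuals

morans_i(noise, coord)
morans_i(trend, coord)
```

Values near zero suggest no autocorrelation, while values close to 1 indicate that neighbouring residuals are similar.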

Perspective plot of sample variograms of residuals

Write a function to produce this kind of plot (see Gilmour et al. 1997).
They are useful for identifying sources of spatial variation, and for model diagnostics.

Ability to specify correlated random effects

In particular, for the generic model.
This is very important for fitting GxE/multisite models.

Make dependencies explicit

Move package dependencies to Imports, and make these dependencies explicit in the code by using pkg::func().

parallelise grid search in AR models

try to perform some automatic detection of number of cores with parallel::detectCores()
use foreach() with doParallel backend for the grid search
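
A sketch of the idea using only the base `parallel` package, so `parLapply()` replaces the `foreach()`/doParallel combination proposed above. The function `toy_loglik()` is a hypothetical stand-in for fitting one AR model and returning its log-likelihood, and the grid values are illustrative.

```r
library(parallel)

# Hypothetical stand-in for one model fit: peaks at rho = (0.8, 0.8)
toy_loglik <- function(rho) {
  -((rho[["rho_r"]] - 0.8)^2 + (rho[["rho_c"]] - 0.8)^2)
}

rho_values <- c(-0.8, -0.2, 0.2, 0.8)
grid <- expand.grid(rho_r = rho_values, rho_c = rho_values)
jobs <- lapply(seq_len(nrow(grid)), function(i) unlist(grid[i, ]))

# detectCores() may return NA; leave one core free and cap at 2 for the demo
n_detected <- detectCores()
n_cores <- if (is.na(n_detected)) 1L else max(1L, min(2L, n_detected - 1L))

cl <- makeCluster(n_cores)
grid$loglik <- unlist(parLapply(cl, jobs, toy_loglik))
stopCluster(cl)

best <- grid[which.max(grid$loglik), ]
print(best)
```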

Use theme_bw() by default in breedR plots

Report p-values for the estimates of variance components

On behalf of Bruno FADY, PhD, Directeur de recherches, INRA.

AR model Error: non-conformable matrix dimensions in x %*% y

VS reports this error when doing

variogram(selmod)

where selmod is a spatial autoregressive model.

remlf90: Let the user specify initial residual and random effects' variances

So far, the user can specify the initial variance for the additive genetic effect, and for the spatial effects, as they are explicit arguments in the remlf90 call.

However, the residual initial variance is hardcoded at 10, while the initial variance for any other random effect in the argument 'random' is hardcoded at 1.
These values must be customizable.

Provide methods to assess the precision and significance of variance components

Following from #31, provide for example an anova() method to perform a Likelihood Ratio Test to compare two nested breedR objects.
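
The test itself reduces to a few lines once the two log-likelihoods are available. This is a sketch, not a proposed anova() method: the function `lrt()` and the log-likelihood values are hypothetical. The `boundary` argument halves the p-value, a common adjustment when a single variance component is tested against zero, on the boundary of the parameter space.

```r
# Likelihood ratio test between two nested models
lrt <- function(loglik_null, loglik_alt, df = 1, boundary = FALSE) {
  stat <- 2 * (loglik_alt - loglik_null)
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  if (boundary) p <- p / 2
  c(statistic = stat, df = df, p.value = p)
}

# Made-up log-likelihoods of a model without and with one extra random effect
lrt(loglik_null = -100, loglik_alt = -98, df = 1)
lrt(loglik_null = -100, loglik_alt = -98, df = 1, boundary = TRUE)
```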

Implement interface for OPTIONS

Include a new argument options (or perhaps control.reml) of the function remlf90() that passes directly the content to the OPTIONS of the (AI)REML programs.
This allows the user to manually take advantage of some options such as maxrounds.

Provide an appropriate R2 coefficient of determination

Apart from likelihood, AIC and BIC, check http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00261.x/abstract

breedR fails when using splines with different number of knots in rows and columns

At some point in build.splines.model() it tries to build a data.frame with the row and column knots, and fails due to the different length of the vectors.
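
The failure can be reproduced outside breedR, since data.frame() requires columns of equal length. In this sketch the knot vectors are made up. Keeping them in a list is one possible alternative, not necessarily the fix adopted in the package.

```r
row_knots <- seq(0, 10, length.out = 5)
col_knots <- seq(0, 20, length.out = 8)

# Columns of different lengths make data.frame() fail
res <- try(data.frame(rows = row_knots, cols = col_knots), silent = TRUE)
inherits(res, "try-error")

# A list holds vectors of different lengths without complaint
knots <- list(rows = row_knots, cols = col_knots)
lengths(knots)
```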

pedigree spec: allow for character or factor codifications

Currently only numerical codes are allowed.
F. Lefevre requests allowing for character/factor codes as well.

Heritability and its SE

Take advantage of the ability to compute SE of functions of variances in the newest version of AIREMLF90.
Check the option se_covar_function in the online documentation
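
For comparison, an approximate SE can also be obtained in R with the delta method, given the estimated variances and their sampling covariance matrix. This is a different technique from whatever se_covar_function does internally. The function `h2_delta()` and all numbers below are made up for illustration.

```r
# Heritability h2 = sigma2_a / (sigma2_a + sigma2_e) and its delta-method SE
# V: sampling covariance matrix of (sigma2_a, sigma2_e)
h2_delta <- function(sigma2_a, sigma2_e, V) {
  total <- sigma2_a + sigma2_e
  h2    <- sigma2_a / total
  grad  <- c(sigma2_e, -sigma2_a) / total^2   # partial derivatives of h2
  se    <- sqrt(drop(t(grad) %*% V %*% grad))
  c(h2 = h2, se = se)
}

V <- matrix(c(0.40, -0.05,
              -0.05, 0.20), nrow = 2)   # made-up covariance matrix
h2_delta(sigma2_a = 2, sigma2_e = 6, V = V)
```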

use viridis as default sequential colormap

viridis has all the good properties that matter

Error with unsorted pedigrees with unknown parents coded with 0

Unknown parents can be coded either with NA or 0.
However, if 0 is used and the pedigree needs to be recoded (either because the parents follow their offspring or the codes are not consecutive), the recoding routine in build.pedigree() fails.

The problem is that the 0 is not mapped to any other code nor NA, and the list of codes for self, sire and dam end up with different lengths, which fires an error when trying to put them into a data.frame.
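
A sketch of one way to avoid the length mismatch: turn the zeros into NA before recoding, so that unknown parents stay NA and every column keeps its length. The example pedigree and the match()-based recoding are illustrative and do not reproduce the actual routine in breedR.

```r
# Non-consecutive codes, unknown parents coded with 0
ped <- data.frame(self = c(10, 20, 30),
                  sire = c(0, 10, 10),
                  dam  = c(0, 0, 20))

ped[ped == 0] <- NA                    # unknown parents become NA
codes <- sort(unique(unlist(ped)))     # sort() drops the NA
recoded <- as.data.frame(lapply(ped, match, table = codes))

print(recoded)
```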

Excessive memory consumption building AR model

When running the model listed below, the system ran out of memory.

res.ar1 <- remlf90(fixed = phe_X ~ poch,
                 # random = ~ block,
                   genetic = list(model = 'add_animal',
                                  pedigree = data[, c('self','dad','mum')],
                                  id = 'self'),
                   spatial = list(model = 'AR',
                                  coord = data[, c('x','y')] ,
                                  rho = c(.8,.8)), data = data,  method='em')

I found the solution:
It is important to use the 64 bit version of R in order to use all the system memory (more than 4 GB).
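
A quick way to check which build of R is running, using only base R:

```r
# 8-byte pointers indicate a 64-bit build of R, 4-byte pointers a 32-bit one
bits <- 8 * .Machine$sizeof.pointer
if (bits < 64) {
  warning("Running a 32-bit build of R: addressable memory is limited.")
}
bits
R.version$arch
```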

Help page for extraction functions

Create a help page for ?extractor or ?extraction where all the extraction functions such as fixef(), ranef(), model.matrix(), fitted(), residuals() and so on are listed and briefly explained, with links to their respective help pages.

Furthermore, include all of them within the same family extraction so that in each particular help page there are direct links to all the rest.

Reporting inner or full knots

In the splines model, the user specifies the number of inner knots by rows and columns.
However, in the summary() of results, breedR reports the total number of knots (i.e. inner + 6 in each dimension).
This is confusing as long as it is not explicit.
Should breedR ask for and show the number of inner knots, to avoid confusion?
Moreover, would it be more natural to deal with the number of splines (inner + 2) in the base rather than with the number of knots?
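
The relation between the three numbers can be checked with the `splines` package that ships with R. This sketch assumes cubic B-splines with three extra knots on each side of the inner ones, which is consistent with the "inner + 6" count above. It does not call breedR's own constructor.

```r
library(splines)

n_inner <- 5
inner   <- seq_len(n_inner)                 # evenly spaced inner knots
knots   <- c(min(inner) - 3:1, inner, max(inner) + 1:3)
x       <- seq(min(inner), max(inner), length.out = 20)

B <- splineDesign(knots, x, ord = 4)        # cubic B-spline basis

c(inner = n_inner, knots = length(knots), bases = ncol(B))
```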

Variogram scale

VS requests to have some degree of control over the scale of the variogram.
Useful, for example, to compare the variograms from two different models under the same scale.

Include some 'generic' model

where a user can specify a custom incidence and covariance matrix

Include initial values in summary output

print.summary.remlf90() should always report which initial values were used for the maximization algorithm.

Spatial variance components from different models are not comparable

Currently, the variance component parameters from different spatial models are not comparable.
See, for example, the globulus demo, which fits the same dataset using blocks, splines and autoregressive spatial models.
There are small changes in the genetic and residual variances, but the spatial variance components are 2.7, 21.3 and .5 respectively.

These values are meaningless and not comparable between them.
Furthermore, they don't sum up to the same value of phenotypic variance.

Error msg when response is not found

On behalf of Sebastien Lecourieux.
When the name of the response variable is not found in data, the resulting error message is very misleading.
This case should be properly checked and reported.

competition model w/o pec requires initial variance

Here is a MRE:

ped = data.frame(self = 3:11, dad = 1, mum = 2)
coord = expand.grid(x = 1:3, y = 1:3)
vi  <- diag(1, 2)
dat = data.frame(z = rnorm(9), coord, ped)

## This one works ok and returns a list with 
# $pec
# $pec$present
# [1] FALSE
# 
# $pec$var.ini
# [1] 1
breedR::check_genetic(model = 'competition',
                      pedigree = ped,
                      coordinates = coord,
                      id = 'self',
                      data = dat)

## When we specify initial variances it asks for the initial
## value of the pec variance, even if pec = FALSE by default
breedR::check_genetic(model = 'competition',
                      pedigree = ped,
                      coordinates = coord,
                      id = 'self',
                      var.ini = vi,
                      data = dat)
# Error in breedR::check_genetic(model = "competition", pedigree = ped,  : 
#   var.ini must be specified for pec as well, in the competition specification.
# e.g. pec = list(present = TRUE, var.ini = 1)

The expected behaviour would be to return pec = list(present = FALSE) in both cases.

Thanks to S. Lecourieux for reporting.

Automatically switch to 'em' method for certain models

I know that models that require several "fake" random effects to trick progsf90 (e.g. competition or splines) cause airemlf90 not to converge.
It's better to automatically switch to 'em' in those known cases, to make things easier for the user.

update sim.spatial.metagene

The function currently uses INLA, which is no longer a dependency (for the moment).
Better use breedR's simulation framework.
