famuvie / breedr

Statistical methods for forest genetic resources analysts

Home Page: http://famuvie.github.io/breedR/

License: GNU General Public License v3.0

R 99.97% Makefile 0.03%

Contributors: famuvie, slecourieux

breedr's Issues

summary(): include computation time

Collect and report timing statistics.
Of interest is:

  • time running underlying fortran programs (e.g. (ai)remlf90)
  • time running breedR (preparation of model and parsing results)
  • total time

Report the total time automatically in summary().
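
A minimal sketch of how the three timings could be collected. The helper time_step() is hypothetical (not part of breedR), and Sys.sleep() only stands in for the real work:

```r
# Hypothetical helper (not part of breedR): elapsed seconds taken by `expr`.
time_step <- function(expr) {
  system.time(expr)[["elapsed"]]
}

# Sys.sleep() stands in for the real steps.
t_prep    <- time_step(Sys.sleep(0.05))  # preparing the model files
t_fortran <- time_step(Sys.sleep(0.10))  # running (ai)remlf90
t_parse   <- time_step(Sys.sleep(0.05))  # parsing the results

# The three figures of interest listed above.
timing <- c(fortran = t_fortran,
            breedR  = t_prep + t_parse,
            total   = t_prep + t_fortran + t_parse)
print(round(timing, 2))
```

Storing such a vector in the fitted object would let summary() print it without re-running anything.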

AIREMLF90 output

In Linux (and Mac?) the output of AIREMLF90 gives S.D. for R and G, and the wrapper looks for this string in the output to grab the result.

However, under Windows, the program outputs SE for R and G, which is technically more correct but makes the wrapper fail.
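
One possible fix is to have the wrapper accept either label. The sketch below assumes the labels appear as "S.D. for R" and "SE for R"; the sample lines are hypothetical stand-ins, not real AIREMLF90 output, and find_se_line() is not a breedR function:

```r
# Hypothetical output lines; the real AIREMLF90 output may be laid out differently.
out_linux   <- c("some header", "S.D. for R", "S.D. for G")
out_windows <- c("some header", "SE for R", "SE for G")

# Accept either label when locating the line for a given matrix ("R" or "G").
find_se_line <- function(output, matrix_name) {
  pattern <- paste0("(S\\.D\\.|SE) for ", matrix_name, "\\b")
  grep(pattern, output, perl = TRUE)
}

find_se_line(out_linux, "R")
find_se_line(out_windows, "G")
```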

Extractor function for breeding values

Right now, we can only access breeding values by scraping the internals of a resulting object with res$genetic$fit.
An extractor function is needed to avoid this.

Specific knot locations in splines model

Currently the user can specify a number of (inner) knots to be evenly distributed in the field.
There is no reason to prevent the user from specifying a custom set of knots.
It can be added as an optional argument to the splines model, and would be easily accounted for in the splines constructor.

example('breedR', 'breedR') not working in Windows

remlf90 doesn't work in the example.
An inspection of the process output reveals that the path of the temporary directory holding the parameter, data and pedigree files is too long for AIREMLF90: paths get truncated at 41 characters.
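
A minimal sketch of a pre-flight check for this limit. The helper path_fits() and the file name "parameters" are hypothetical, chosen only for illustration:

```r
# Limit reported in this issue: paths get truncated at 41 characters.
max_path <- 41L

# Hypothetical helper (not part of breedR): does the full path fit?
path_fits <- function(dir, file) {
  nchar(file.path(dir, file)) <= max_path
}

path_fits(tempdir(), "parameters")  # depends on the platform's temp directory
path_fits("C:/tmp", "parameters")   # a short working directory fits
```

If the check fails, the files could be written to a shorter working directory instead of tempdir().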

Bug: check for symmetry gets confused by names

In the constructor of group effects effect_group() we check for the symmetry of the initial covariance matrix for the group using isSymmetric.matrix().
If the argument var.ini is specified with names, the function will return FALSE unless identical column and row names are defined.
We need to perform this check using unname().
Verify whether we use the function isSymmetric() somewhere else.
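
The behaviour can be reproduced in base R, outside breedR. The matrix below is an arbitrary example with row names only:

```r
# A numerically symmetric matrix with row names but no column names.
var.ini <- matrix(c(1, 0.5, 0.5, 2), nrow = 2,
                  dimnames = list(c("a", "b"), NULL))

isSymmetric(var.ini)          # FALSE: row and column names differ
isSymmetric(unname(var.ini))  # TRUE: only the values are compared
```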

logLik() does not retrieve the number of parameters of the model nor the rank

... and issues an annoying warning each time it is called either directly or through summary().
This does not have any practical consequence, so we should consider removing the warning.

AIC is retrieved directly from (AI)REML's output, so I actually don't need the number of parameters, although it is not hard to retrieve.

For BIC, however, the rank is also needed (which takes into account the restrictions introduced by the REML procedure). I am not sure whether the rank in the output from (AI)REML is the right one.

write metagene-class doc

Write a self-contained help doc for the metagene class and interface functions.
Make everything belong to the same family of functions.

Error msg. for pedigrees with same codes for founders and offspring

The error message of an incorrectly specified pedigree is very general:

library(breedR)
build_pedigree(1:3,
               data = data.frame(self = 1:6,
                                 sire = rep(1:3, 2),
                                 dam  = rep(4:6, 2)))
Error: all(check_pedigree(pedx)) is not TRUE
In addition: Warning messages:
1: In pedigree::orderPed(pedx) :
  Be carefull, there are loops in the pedigree, individuals involved in the loop are indicated with a -1

2: In build_pedigree(1:3, data = data.frame(self = 1:6, sire = rep(1:3,  :
  The pedigree has been recoded. Check attr(ped, 'map').

The user has to consult the help for check_pedigree() or build_pedigree() to learn that codes for parents and offspring must be different.
This situation could be checked for and yield a more informative error message.

On behalf of K. Vander.
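
A sketch of what such a check could look like. It makes a narrower assumption than the help page: it only flags individuals that share a code with one of their own parents, which is what happens in the example above. check_self_parent() is hypothetical and not part of breedR:

```r
# Hypothetical pre-check (not part of breedR).
check_self_parent <- function(ped) {
  # which() drops NA comparisons, so unknown parents coded as NA are ignored.
  bad <- which(ped$self == ped$sire | ped$self == ped$dam)
  if (length(bad) > 0) {
    stop("Individuals listed as their own parent: ",
         paste(ped$self[bad], collapse = ", "),
         ". Codes for parents and offspring must be different.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Same pedigree as in the example above.
ped <- data.frame(self = 1:6, sire = rep(1:3, 2), dam = rep(4:6, 2))
msg <- tryCatch(check_self_parent(ped), error = conditionMessage)
print(msg)
```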

Error parsing results in models with many effects

The file of solutions from (ai)remlf90 is formatted with fixed width, leaving only four spaces between the column of trait and the column of effect. Thus, if for one trait one has more than 999 effects the two columns become one.

Although it is not very common to have more than 999 effects, it is not that rare to use a splines component with, say, $30 \times 30$ knots (which gives $32^2 = 1024$ effects).

This is a weakness of the format, and we will ask for a solution upstream.
But it is convenient to write a workaround for that.

Use 64bit progsf90 when possible

Currently, breedR ships 32-bit versions of remlf90 and airemlf90.
This puts a tight limit on memory availability, even in systems with enough physical memory.

Enable compare.plots() to compare variograms under the same scale

As a workaround, it is possible to extract the data from the variograms and plot them manually under the same scale, as follows:

library(ggplot2)

coord <- expand.grid(1:50, 1:50)
z     <- rnorm(nrow(coord))  # one simulated value per location

vg1 <- variogram(coord = coord, z = z)
vg2 <- variogram(coord = coord, z = 2*z)

isotropic_variograms <- rbind(cbind(model = "M1", vg1[["isotropic"]]),
                              cbind(model = "M2", vg2[["isotropic"]]))

# Then, you can plot the 2 variograms together under the same scale
ggplot(isotropic_variograms, aes(distance, variogram)) +
  geom_point() +
  geom_line() +
  stat_smooth(se = FALSE, method = 'auto') + 
  facet_wrap(~ model)

Make dependencies explicit

Move package dependencies to Imports, and make these dependencies explicit in the code by using pkg::func().

parallelise grid search in AR models

  • try to perform some automatic detection of number of cores with parallel::detectCores()
  • use foreach() with doParallel backend for the grid search
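
A dependency-free sketch of the idea. It swaps foreach()/doParallel for parallel::mclapply(), which ships with R, and uses a hypothetical fit_one() as a stand-in for fitting one AR model and returning its log-likelihood:

```r
library(parallel)

# Grid of candidate autoregressive parameters (rows, columns).
grid <- expand.grid(rho_r = c(-0.8, -0.2, 0.2, 0.8),
                    rho_c = c(-0.8, -0.2, 0.2, 0.8))

# Hypothetical stand-in for fitting one model and returning its log-likelihood.
fit_one <- function(i) {
  rho <- unlist(grid[i, ])
  -sum((rho - c(0.2, 0.8))^2)
}

# detectCores() may return NA, and forking is unavailable on Windows.
n_cores <- detectCores()
if (is.na(n_cores) || .Platform$OS.type == "windows") n_cores <- 1L
n_cores <- max(1L, n_cores - 1L)  # leave one core free

loglik <- unlist(mclapply(seq_len(nrow(grid)), fit_one, mc.cores = n_cores))
best   <- grid[which.max(loglik), ]
print(best)
```

With foreach() the loop body would be the same; only the backend registration and the loop construct change.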

remlf90: Let the user specify initial residual and random effects' variances

So far, the user can specify the initial variance for the additive genetic effect, and for the spatial effects, as they are explicit arguments in the remlf90 call.

However, the residual initial variance is hardcoded at 10, while the initial variance for any other random effect in the argument 'random' is hardcoded at 1.
These values must be customizable.

Implement interface for OPTIONS

Include a new argument options (or perhaps control.reml) of the function remlf90() that passes directly the content to the OPTIONS of the (AI)REML programs.
This allows the user to manually take advantage of some options such as maxrounds.
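
A sketch of how a named list could be turned into parameter-file lines, assuming each option is written as a line of the form "OPTION name value". The helper format_options() is hypothetical and not part of breedR:

```r
# Hypothetical helper (not part of breedR): one "OPTION name value" line
# per element of a named list.
format_options <- function(options) {
  if (length(options) == 0) return(character(0))
  paste("OPTION", names(options),
        vapply(options, as.character, character(1)))
}

format_options(list(maxrounds = 100))
```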

Heritability and its SE

Take advantage of the ability to compute SE of functions of variances in the newest version of AIREMLF90.
Check the option se_covar_function in the online documentation

Error with unsorted pedigrees with unknown parents coded with 0

Unknown parents can be coded either with NA or 0.
However, if 0 is used and the pedigree needs to be recoded (either because the parents follow their offspring or the codes are not consecutive), the recoding routine in build.pedigree() fails.

The problem is that the 0 is not mapped to any other code nor to NA, so the lists of codes for self, sire and dam end up with different lengths, which triggers an error when trying to put them into a data.frame.
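
A minimal sketch of one way around it: turn 0 into NA before recoding, so that every column keeps its length. This is an illustration in base R, not the actual recoding routine, and it does not reorder parents before offspring:

```r
# Unsorted pedigree with unknown parents coded as 0.
ped <- data.frame(self = c(12, 10, 11),
                  sire = c(10, 0, 0),
                  dam  = c(11, 0, 0))

# Treat 0 as NA, so that unknown parents are never looked up in the map.
parents <- c("sire", "dam")
ped[parents] <- lapply(ped[parents], function(x) replace(x, x == 0, NA))

# Map original codes to consecutive ones; sort() drops NA and
# match() returns NA for NA input, so all columns keep their length.
codes   <- sort(unique(unlist(ped)))
recoded <- as.data.frame(lapply(ped, match, table = codes))
print(recoded)
```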

Excessive memory consumption building AR model

Running the model listed below, the system ran out of memory.

res.ar1 <- remlf90(fixed = phe_X ~ poch,
                 # random = ~ block,
                   genetic = list(model = 'add_animal',
                                  pedigree = data[, c('self','dad','mum')],
                                  id = 'self'),
                   spatial = list(model = 'AR',
                                  coord = data[, c('x','y')] ,
                                  rho = c(.8,.8)), data = data,  method='em')

I found the solution:
It is important to use the 64-bit version of R in order to use all system memory (more than 4 GB).
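
A quick way to check which build of R is running, using base R only:

```r
# A pointer size of 8 bytes indicates a 64-bit build of R.
is_64bit <- .Machine$sizeof.pointer == 8
if (!is_64bit) {
  warning("Running 32-bit R: memory is limited regardless of physical RAM.")
}
R.version$arch  # architecture string of the running R build
```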

Help page for extraction functions

Create a help page for ?extractor or ?extraction where all the extraction functions such as fixef(), ranef(), model.matrix(), fitted(), residuals() and so on are listed and briefly explained, with links to their respective help pages.

Furthermore, include all of them within the same family extraction so that in each particular help page there are direct links to all the rest.

Reporting inner or full knots

In the splines model, the user specifies the number of inner knots by rows and columns.
However, in the summary() of results, breedR reports the total number of knots (i.e. inner + 6 in each dimension).
This is confusing as long as it is not explicit.
Should breedR ask for and show the number of inner knots to avoid confusion?
Moreover, would it be more natural to deal with the number of splines (inner + 2) in the base rather than with the number of knots?
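
The relation between the three counts, as described above, can be written out per dimension. knot_counts() is a hypothetical helper used only for illustration:

```r
# Per dimension: total knots = inner + 6, splines in the base = inner + 2.
knot_counts <- function(inner) {
  c(inner = inner, total_knots = inner + 6, splines = inner + 2)
}

per_dim   <- knot_counts(30)
n_effects <- unname(per_dim["splines"])^2  # 30 x 30 inner knots on a 2D grid
print(per_dim)
print(n_effects)
```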

Variogram scale

VS requests to have some degree of control over the scale of the variogram.
Useful, for example, to compare the variograms from two different models under the same scale.

Spatial variance components from different models are not comparable

Currently, the variance component parameters from different spatial models are not comparable.
See, for example, the globulus demo, which fits the same dataset using blocks, splines and autoregressive spatial models.
There are small changes in the genetic and residual variances, but the spatial variance components are 2.7, 21.3 and .5 respectively.

These values are meaningless and not comparable between them.
Furthermore, they don't sum up to the same value of phenotypic variance.

Error msg when response is not found

On behalf of Sebastien Lecourieux.
When the name of the response variable is not found in data, the resulting error message is very misleading.
This case should be properly checked and reported.
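
A sketch of such a check, assuming a two-sided formula. check_response() is hypothetical and not part of breedR:

```r
# Hypothetical check (not part of breedR): is the response present in data?
check_response <- function(fixed, data) {
  resp <- all.vars(fixed[[2]])  # variables on the left-hand side
  missing_vars <- setdiff(resp, names(data))
  if (length(missing_vars) > 0) {
    stop("Response variable not found in data: ",
         paste(missing_vars, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

dat <- data.frame(height = c(1.2, 3.4), block = c("a", "b"))
check_response(height ~ block, dat)  # passes silently

# A misspelled response yields a specific message.
msg <- tryCatch(check_response(hieght ~ block, dat), error = conditionMessage)
print(msg)
```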

competition model w/o pec requires initial variance

Here is a MRE:

ped = data.frame(self = 3:11, dad = 1, mum = 2)
coord = expand.grid(x = 1:3, y = 1:3)
vi  <- diag(1, 2)
dat = data.frame(z = rnorm(9), coord, ped)

## This one works ok and returns a list with 
# $pec
# $pec$present
# [1] FALSE
# 
# $pec$var.ini
# [1] 1
breedR::check_genetic(model = 'competition',
                      pedigree = ped,
                      coordinates = coord,
                      id = 'self',
                      data = dat)

## When we specify initial variances it asks for the initial
## value of the pec variance, even if pec = FALSE by default
breedR::check_genetic(model = 'competition',
                      pedigree = ped,
                      coordinates = coord,
                      id = 'self',
                      var.ini = vi,
                      data = dat)
# Error in breedR::check_genetic(model = "competition", pedigree = ped,  : 
#   var.ini must be specified for pec as well, in the competition specification.
# e.g. pec = list(present = TRUE, var.ini = 1)

The expected behaviour would be to return pec = list(present = FALSE) in both cases.

Thanks to S. Lecourieux for reporting.

Automatically switch to 'em' method for certain models

I know that models that require several "fake" random effects to trick progsf90 (e.g. competition or splines) cause airemlf90 not to converge.
It's better to automatically switch to 'em' in those known cases, to make things easier for the user.

update sim.spatial.metagene

The function currently uses INLA, which is no longer a dependency (for the moment).
Better use breedR's simulation framework.
