padrinodb / ipmr



Flexibly implement Integral Projection Models (IPMs)

Home Page: https://padrinoDB.github.io/ipmr/

License: Other

R 99.62% C++ 0.36% TeX 0.02%

demography integral-projection-models

ipmr's Introduction

ipmr

ipmr is a package for implementing Integral Projection Models (IPMs) in R. It relies heavily on the mathematical syntax of the models, and does not try to abstract over the process of fitting vital rates. It is now relatively stable, though tweaks may be made. Below is a brief overview of how ipmr classifies different model types followed by examples of how to implement those types in this framework.

Installation

ipmr is available on CRAN and can be installed with the following snippet:

install.packages("ipmr")

You can also install the development version from GitHub with the snippet below:

if(!require('remotes', quietly = TRUE)) {
  install.packages("remotes")
}

remotes::install_github("padrinoDB/ipmr", build_vignettes = TRUE)

Package scope

ipmr is intended to assist with IPM implementation and analysis. It is important to note that this package will not help with the process of fitting regression models for vital rates at all! That is a sufficiently different (and vast) topic that we decided it was not within the scope of this project. ipmr will only help you turn those regression models into an IPM without shooting yourself in the foot. Furthermore, most of the documentation assumes a basic knowledge of IPM theory and construction. For those who are totally new to IPMs, it is strongly recommended to read a theoretical overview of the models first. Some favorites are Easterling et al. 2000, Merow et al. 2013, and Rees et al. 2014. The Introduction to ipmr vignette also contains a brief overview of IPM theory as well as a far more detailed introduction to this package. Thus, everything that follows assumes you have a basic understanding of IPM theory, have parameterized vital rate models, and are now ready to begin implementing your IPM.

Below is a brief overview of the package and some examples of how to implement models with it. A more thorough introduction is available here.

Model classes

Once all parameters are estimated, the first step of defining a model in ipmr is to initialize the model using init_ipm(). This function has five arguments: sim_gen, di_dd, det_stoch, kern_param, and uses_age. We will ignore uses_age for now, because age-size models are less common and have their own vignette.

The combination of these arguments defines the type of projection model, and makes sure that the machinery for subsequent analyses works correctly. The possible entries for each argument are as follows:

sim_gen: "simple"/"general"
- A. simple: This describes an IPM with a single continuous state variable and no discrete stages.
- B. general: This describes an IPM with more than one continuous state variable, one or more discrete stages, or both. Basically, anything other than an IPM with a single continuous state variable.
di_dd: "di"/"dd"
- A. di: This is used to denote a density-independent IPM.
- B. dd: This is used to denote a density-dependent IPM.
det_stoch: "det"/"stoch"
- A. det: This is used to denote a deterministic IPM. If det_stoch = "det", kern_param must be left as NULL.
- B. stoch: This is used to denote a stochastic IPM. If det_stoch = "stoch", kern_param must be specified. The two possibilities for kern_param are described next.
kern_param: "kern"/"param" (Complete definitions found in Metcalf et al. 2015)
- A. kern: This describes an IPM with discretely varying parameters such that their values are known before the model is specified. This is usually the case with models that estimate fixed and/or random year/site effects and for which defining a multivariate joint distribution to sample parameters from is not desirable/needed. These models can be a bit more computationally efficient than the param alternative because all kernels can be constructed before the iteration procedure begins, as opposed to requiring reconstruction for every single iteration.
- B. param: This describes an IPM with parameters that are re-sampled from some distribution at each iteration of the model. This could be a multivariate normal defined by covarying slopes and intercepts, or distributions of environmental variables that change from time to time. All that is required is that the parameters for the distribution are specified and that the function that generates the parameters at each iteration returns named lists that correspond to the parameter names in the model. Examples of this are available in the Introduction and General IPM vignettes.

The following model classes are available in ipmr (the development status of each group follows its name):

Simple, density independent models: Completed and ready

"simple_di_det"
"simple_di_stoch_kern"
"simple_di_stoch_param"

Simple, density dependent models: Completed, likely stable

"simple_dd_det"
"simple_dd_stoch_kern"
"simple_dd_stoch_param"

General, density independent models: Completed and ready

"general_di_det"
"general_di_stoch_kern"
"general_di_stoch_param"

General, density dependent models: Completed, likely stable

"general_dd_det"
"general_dd_stoch_kern"
"general_dd_stoch_param"
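A minimal sketch of how the four init_ipm() arguments combine into the class names listed above. `model_class()` is a hypothetical helper written only for illustration; it is not part of ipmr, and `init_ipm()` may validate its arguments differently.

```r
# Hypothetical helper, not part of ipmr: combines init_ipm()-style arguments
# into a model-class string and enforces the kern_param rule described above.
model_class <- function(sim_gen, di_dd, det_stoch, kern_param = NULL) {
  sim_gen   <- match.arg(sim_gen, c("simple", "general"))
  di_dd     <- match.arg(di_dd, c("di", "dd"))
  det_stoch <- match.arg(det_stoch, c("det", "stoch"))

  if (det_stoch == "det") {
    if (!is.null(kern_param)) stop("kern_param must be NULL for deterministic models")
    return(paste(sim_gen, di_dd, det_stoch, sep = "_"))
  }

  if (is.null(kern_param)) stop("stochastic models need kern_param = 'kern' or 'param'")
  kern_param <- match.arg(kern_param, c("kern", "param"))
  paste(sim_gen, di_dd, det_stoch, kern_param, sep = "_")
}

model_class("simple", "di", "det")              # "simple_di_det"
model_class("general", "dd", "stoch", "param")  # "general_dd_stoch_param"
```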

Simple density-independent deterministic, simple kernel-resampled stochastic, and simple parameter resampled stochastic models (simple_di_det, simple_di_stoch_kern, simple_di_stoch_param) are described in detail here. The general_* versions of these are described here. Density dependent versions are completed for simple and general models, and are probably stable, but have not been tested enough to be certain. A very brief, though incomplete introduction is available here. Below is an example implementing a simple_di_det IPM.

Quick example of a simple, deterministic IPM

Here is a simple model implemented with ipmr. This is a hypothetical plant species where plants can survive and grow (P(z′,z)), and reproduce sexually (F(z′,z)). We’ll use 4 regressions: survival (s(z)), growth (G(z′,z), f_g), probability of reproducing (r_r(z)), and number of seeds produced conditional on flowering (r_s(z)). New recruits will be generated with a Gaussian distribution (f_r_d), which requires calculating the mean and standard deviation of new recruits from the data. For simplicity, we’ll assume there’s no maternal effect on recruit size. First, we’ll write out the functional forms for each component of the model:

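$$n(z', t+1) = \int_L^U K(z', z)\, n(z, t)\, dz \tag{1}$$
$$K(z', z) = P(z', z) + F(z', z) \tag{2}$$
$$P(z', z) = s(z)\, G(z', z) \tag{3}$$
$$\operatorname{logit}(s(z)) = \alpha_s + \beta_s z \tag{4}$$
$$G(z', z) = f_g(z', \mu_g, \sigma_g) \tag{5}$$
$$\mu_g = \alpha_g + \beta_g z \tag{6}$$
$$F(z', z) = r_r(z)\, r_s(z)\, r_d(z') \tag{7}$$
$$\operatorname{logit}(r_r(z)) = \alpha_{r_r} + \beta_{r_r} z \tag{8}$$
$$\log(r_s(z)) = \alpha_{r_s} + \beta_{r_s} z \tag{9}$$
$$r_d(z') = f_{r_d}(z', \mu_{r_d}, \sigma_{r_d}) \tag{10}$$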

Equation 1 describes how all the vital rates act on the initial trait distribution to produce a new one at t + 1. Equations 3-6 describe how existing individuals can survive, and if they survive, grow or shrink. Equations 7-10 describe how existing individuals create new individuals. In order to implement this, we usually fit regression models to our data. The following set of generalized linear models correspond to the functional forms described above:

Survival (s(z) / s): a generalized linear model w/ a logit link.
- Example model formula: glm(surv ~ size_1, data = my_surv_data, family = binomial())
Growth (G(z′,z) / g): a linear model with a Normal error distribution.
- Example model formula: lm(size_2 ~ size_1, data = my_grow_data)
Pr(flowering) (r_r(z) / r_r): a generalized linear model w/ a logit link.
- Example model formula: glm(flower ~ size_1, data = my_repro_data, family = binomial())
Seed production (r_s(z) / r_s): a generalized linear model w/ log link.
- Example model formula: glm(seeds ~ size_1, data = my_flower_data, family = poisson())
Recruit size distribution (r_d(z′) / r_d): a normal distribution with parameters mu_fd (mean) and sd_fd (standard deviation).
- Example computations:
  - mu_fd = mean(recruit_data$size_next)
  - sd_fd = sd(recruit_data$size_next)
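The code below fills the data_list with made-up numbers. As a hedged sketch of the extraction step itself, the following simulates data (all data, column names, and object names such as `my_data`, `recruit_data`, and `fitted_data_list` are hypothetical), fits the four regressions listed above, and pulls the coefficients into a named list. Growth is fit only to survivors and seed counts only to flowering plants, matching the conditional definitions of those vital rates.

```r
# Simulated data only; the "true" values mirror the made-up numbers used below.
set.seed(2024)
n <- 500
my_data <- data.frame(size_1 = runif(n, 0, 50))
my_data$surv   <- rbinom(n, 1, plogis(-2.2 + 0.25 * my_data$size_1))
my_data$size_2 <- rnorm(n, 0.2 + 0.99 * my_data$size_1, 0.7)
my_data$flower <- rbinom(n, 1, plogis(0.003 + 0.015 * my_data$size_1))
my_data$seeds  <- rpois(n, exp(0.45 + 0.075 * my_data$size_1))
recruit_data   <- data.frame(size_next = rnorm(100, 2, 0.3))

# Growth uses survivors only; seed counts use flowering plants only.
my_surv_mod      <- glm(surv ~ size_1, data = my_data, family = binomial())
my_grow_mod      <- lm(size_2 ~ size_1, data = subset(my_data, surv == 1))
my_pr_flower_mod <- glm(flower ~ size_1, data = my_data, family = binomial())
my_seed_mod      <- glm(seeds ~ size_1, data = subset(my_data, flower == 1),
                        family = poisson())

# [[1]] / [[2]] drop the coefficient names, leaving plain numbers.
fitted_data_list <- list(
  s_int     = coef(my_surv_mod)[[1]],
  s_slope   = coef(my_surv_mod)[[2]],
  g_int     = coef(my_grow_mod)[[1]],
  g_slope   = coef(my_grow_mod)[[2]],
  sd_g      = sd(resid(my_grow_mod)),
  r_r_int   = coef(my_pr_flower_mod)[[1]],
  r_r_slope = coef(my_pr_flower_mod)[[2]],
  r_s_int   = coef(my_seed_mod)[[1]],
  r_s_slope = coef(my_seed_mod)[[2]],
  mu_fd     = mean(recruit_data$size_next),
  sd_fd     = sd(recruit_data$size_next)
)

str(fitted_data_list)
```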

The example below assumes we’ve already fit our vital rate models from the raw data. In this example, the numbers are made up, but code that extracts the values you need from a real regression model is provided in comments.

# Load ipmr and get the parameter values. The data_list argument for define_kernel
# should hold every regression parameter and every constant used in the model.

library(ipmr)

my_data_list <- list(s_int     = -2.2,  # coefficients(my_surv_mod)[1]
                     s_slope   = 0.25,  # coefficients(my_surv_mod)[2]
                     g_int     = 0.2,   # coefficients(my_grow_mod)[1]
                     g_slope   = 0.99,  # coefficients(my_grow_mod)[2]
                     sd_g      = 0.7,   # sd(resid(my_grow_mod))
                     r_r_int   = 0.003, # coefficients(my_pr_flower_mod)[1]
                     r_r_slope = 0.015, # coefficients(my_pr_flower_mod)[2]
                     r_s_int   = 0.45,  # coefficients(my_seed_mod)[1]
                     r_s_slope = 0.075, # coefficients(my_seed_mod)[2]
                     mu_fd     = 2,     # mean(recruit_data$size_next)
                     sd_fd     = 0.3)   # sd(recruit_data$size_next)

my_simple_ipm <- init_ipm(sim_gen   = "simple",
                          di_dd     = "di",
                          det_stoch = "det")


my_simple_ipm <- define_kernel(
  
  proto_ipm = my_simple_ipm,
    
  # Name of the kernel
  
  name      = "P_simple",
  
  # The type of transition it describes (e.g. continuous - continuous, discrete - continuous).
  # These must be specified for all kernels!
  
  family    = "CC",
  
  # The formula for the kernel. We don't need to tack on the "z'/z"s here.
  
  formula   = s * g,
  
  # A named set of expressions for the vital rates it includes.
  # Note the use of user-specified functions here. Additionally, each
  # state variable has a stateVariable_1 and stateVariable_2 (here, dbh_1 and
  # dbh_2), corresponding to z and z' in the equations above. We don't need to
  # define these variables ourselves; we just reference them correctly based on
  # the way we've set up our model on paper.
  
  # Perform the inverse logit transformation to get survival probabilities
  # from your model. plogis from the "stats" package does this for us. 

  s         = plogis(s_int + s_slope * dbh_1), 
  
  # The growth model requires an expression for the mean as a function of dbh.
  # The SD is a constant, so we don't need to define it in the ... expressions,
  # just in the data_list.
  
  g         = dnorm(dbh_2, mu_g, sd_g),
  mu_g      = g_int + g_slope * dbh_1,
  
  
  # Specify the constant parameters in the model in the data_list. 
  
  data_list = my_data_list,
  states    = list(c('dbh')),
  
  # If you want to correct for eviction, set evict_cor = TRUE and specify an
  # evict_fun. ipmr provides truncated_distributions() to help. This function
  # takes 2 arguments - the type of distribution, and the name of the parameter/
  # vital rate that it acts on.
  
  evict_cor = TRUE,
  evict_fun = truncated_distributions(fun    = 'norm',
                                      target = 'g')
  ) 

my_simple_ipm <- define_kernel(
  proto_ipm = my_simple_ipm,
  name      = 'F_simple',
  formula   = r_r * r_s * r_d,
  family    = 'CC',
  
  # Inverse logit transformation for flowering probability
  # (because we used a logistic regression)
  
  r_r       = plogis(r_r_int + r_r_slope * dbh_1),
  
  # Exponential function for seed production
  # (because we used a Poisson)
  
  r_s       = exp(r_s_int + r_s_slope * dbh_1),
  
  # The recruit size distribution has no maternal effect for size,
  # so mu_fd and sd_fd are constants. These get passed in the 
  # data_list
  
  r_d       = dnorm(dbh_2, mu_fd, sd_fd),
  data_list = my_data_list,
  states    = list(c('dbh')),
  
  # Again, we'll correct for eviction in new recruits by
  # truncating the normal distribution.
  
  evict_cor = TRUE,
  evict_fun = truncated_distributions(fun    = 'norm',
                                      target = 'r_d')
) 

# Next, we have to define the implementation details for the model. 
# We need to tell ipmr how each kernel is integrated, what state
# it starts on (i.e. z from above), and what state
# it ends on (i.e. z' above). In simple_* models, state_start and state_end will 
# always be the same, because we only have a single continuous state variable. 
# General_* models will be more complicated.

my_simple_ipm <- define_impl(
  proto_ipm = my_simple_ipm,
  make_impl_args_list(
    kernel_names = c("P_simple", "F_simple"),
    int_rule     = rep("midpoint", 2),
    state_start  = rep("dbh", 2),
    state_end    = rep("dbh", 2)
  )
) 

my_simple_ipm <- define_domains(
  proto_ipm = my_simple_ipm,
  dbh = c(0, # the first entry is the lower bound of the domain.
          50, # the second entry is the upper bound of the domain.
          100 # third entry is the number of meshpoints for the domain.
  ) 
) 

# Next, we define the initial state of the population. We must do this because
# ipmr computes everything through simulation, and simulations require a 
# population state.

my_simple_ipm <- define_pop_state(
  proto_ipm = my_simple_ipm,
  n_dbh     = runif(100)
)

my_simple_ipm <- make_ipm(proto_ipm  = my_simple_ipm,
                          iterations = 200)


lambda_ipmr <- lambda(my_simple_ipm)
w_ipmr      <- right_ev(my_simple_ipm, iterations = 200)
v_ipmr      <- left_ev(my_simple_ipm, iterations = 200)

# make_ipm_report works on either proto_ipms or made ipm objects

make_ipm_report(my_simple_ipm, 
                render_output = TRUE, 
                title         = "my_simple_ipm_report")

make_ipm_report() generates an R Markdown file containing LaTeX equations and the parameter values used to implement the IPM. This may be useful for publications/appendices, or for sending an IPM to the PADRINO project for archiving.
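As a cross-check that does not depend on ipmr, equations 1-10 can be discretized by hand with the midpoint rule and the dominant eigenvalue taken directly. This is a sketch under stated assumptions: the eviction correction is modelled as rescaling each normal density by its probability mass inside [L, U], and `K_mat`, `lambda_hand`, and the other names are illustrative. Because ipmr's `lambda()` is computed by iterating the population, the two values are expected to be close but need not be identical.

```r
# Parameter values copied from my_data_list above (made-up numbers).
pars <- list(s_int = -2.2, s_slope = 0.25, g_int = 0.2, g_slope = 0.99,
             sd_g = 0.7, r_r_int = 0.003, r_r_slope = 0.015,
             r_s_int = 0.45, r_s_slope = 0.075, mu_fd = 2, sd_fd = 0.3)

# Midpoint rule on [L, U] with n cells, mirroring define_domains(dbh = c(0, 50, 100)).
L <- 0; U <- 50; n <- 100
h <- (U - L) / n
z <- L + (seq_len(n) - 0.5) * h  # meshpoints at cell midpoints

# Assumed eviction correction: rescale the density by its mass inside [L, U].
trunc_dnorm <- function(x, mean, sd) {
  dnorm(x, mean, sd) / (pnorm(U, mean, sd) - pnorm(L, mean, sd))
}

s    <- plogis(pars$s_int + pars$s_slope * z)   # equation 4
mu_g <- pars$g_int + pars$g_slope * z           # equation 6
# Equation 5: rows index z', columns index z.
G    <- outer(z, mu_g, function(zp, m) trunc_dnorm(zp, m, pars$sd_g))

r_r <- plogis(pars$r_r_int + pars$r_r_slope * z)  # equation 8
r_s <- exp(pars$r_s_int + pars$r_s_slope * z)     # equation 9
r_d <- trunc_dnorm(z, pars$mu_fd, pars$sd_fd)     # equation 10

# Multiplying by h turns the integral in equation 1 into a matrix product.
P_mat <- sweep(G, 2, s, "*") * h    # equation 3
F_mat <- outer(r_d, r_r * r_s) * h  # equation 7
K_mat <- P_mat + F_mat              # equation 2

ev          <- eigen(K_mat)
lambda_hand <- Re(ev$values[1])            # dominant eigenvalue
w_hand      <- abs(Re(ev$vectors[, 1]))
w_hand      <- w_hand / sum(w_hand)        # stable size distribution

lambda_hand
```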

More complicated models

Examples of more complicated models are included in the vignettes, accessible using either browseVignettes('ipmr') or by visiting the Articles tab on the project's webpage. Please file all bug reports in the Issues tab of this repository or contact me via email with a reproducible example.

Code of Conduct

We welcome contributions from other developers. Please note that the ipmr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

ipmr's People

Contributors

Stargazers

Watchers

Forkers

davan690 farcego aariq hadley

ipmr's Issues

Remove simple/general distinction

I think this decision could be hidden from the user, since it appears to be implemented only for computational efficiency.

Under the hood, the simple approach could be used after checking user inputs that:

all kernels have only continuous states
only one continuous state variable is used
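The two conditions above could be checked with something as small as the following. `could_be_simple()` is hypothetical and looks only at kernel families (e.g. "CC") and state names; ipmr's real internals may need more information than this.

```r
# Hypothetical check, not ipmr code: could these kernels use the "simple" machinery?
could_be_simple <- function(families, states) {
  all_cc    <- all(families == "CC")                 # only continuous -> continuous kernels
  one_state <- length(unique(unlist(states))) == 1L  # a single continuous state variable
  all_cc && one_state
}

could_be_simple(c("CC", "CC"), list("dbh", "dbh"))           # TRUE
could_be_simple(c("CC", "CD", "DC"), list("ht", "ht", "ht"))  # FALSE: discrete stage
could_be_simple(c("CC", "CC"), list("ht", "dbh"))            # FALSE: two state variables
```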

Minor inconsistency in output of lambda()

A very minor inconsistency, but the output of lambda() on a deterministic IPM is a named vector and the output on a stochastic IPM is unnamed.

lambda(ipm_det)

#    lambda 
# 0.9790214 

lambda(ipm_det_ff) %>% names()
# [1] "lambda"

lambda(ipm_stoch)
# log(lambda) is returned by default for stochastic models. Set 'log = FALSE' for lambda on linear scale.
# [1] -0.02078642

lambda(ipm_stoch_ff) %>% names()
# log(lambda) is returned by default for stochastic models. Set 'log = FALSE' for lambda on linear scale.
# NULL

generalize format_mega_mat to use hierarchical syntax

should be able to do the following:

all_megas <- format_mega_matrix(general_ipm, c(P_j_site, F_Site, P_ja_site, P_a_site))

Don't forget about .make_mega_vec!!!!

`make_ipm()` hangs when `kernel_seq` is specified

I'm iterating a general stochastic kernel sampled IPM and it hangs when I specify kernel_seq. The vital rate models are GAMs with a random effect of year and in the IPM predict() is called with newdata = data.frame(year = yr) and par_set_indices = list(yr = 2000:2009). I tracked the hanging down to this line of .make_usr_seq():

https://github.com/levisc8/ipmr/blob/c40603ed76a53f58b7af09b2109823c9d826cf38/R/internal-make_ipm.R#L935

I don't understand why, but this call to grepl() is incredibly slow for my IPM. I don't quite understand what this line is doing or why it is so slow for my IPM but not noticeable for the one in the vignette. I assume there is a way to make this more efficient?

error messages

More of a long term thing, but want to start failing faster and more clearly when initial conditions are misspecified. For example, mis-specified hier_effs often lead to e.g. error in eval_tidy: p_i_2008 not found messages.

Will want to write some functions that can quickly inspect modified names in hier_effs code and make sure they're actually present in the parameter/vital rates/etc

can also be a problem w/ define_pop_state(wt = c(0, 20, 200)) (e.g. forgetting to prepend n_ to wt)
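A rough sketch of the kind of pre-flight check described above. It uses base R's `all.vars()` to collect every symbol in the vital rate expressions and reports those that are not in `data_list`, not another vital rate, and not a `<state>_1`/`<state>_2` variable. `find_missing_names()` is hypothetical; it does not expand hier_effs names such as p_i_2008 and ignores anything that would be found in the global environment.

```r
# Hypothetical pre-flight check, not part of ipmr.
find_missing_names <- function(vr_exprs, data_list, states) {
  used  <- unique(unlist(lapply(vr_exprs, all.vars)))  # symbols only, not function names
  known <- c(names(data_list),
             names(vr_exprs),                            # vital rates may use each other
             paste0(states, "_1"), paste0(states, "_2")) # e.g. dbh_1 (z) and dbh_2 (z')
  setdiff(used, known)
}

vr_exprs <- list(s    = quote(plogis(s_int + s_slope * dbh_1)),
                 g    = quote(dnorm(dbh_2, mu_g, sd_g)),
                 mu_g = quote(g_int + g_slope * dbh_1))

# sd_g is deliberately missing from this data_list
find_missing_names(vr_exprs,
                   data_list = list(s_int = -2.2, s_slope = 0.25,
                                    g_int = 0.2, g_slope = 0.99),
                   states = "dbh")
# "sd_g"

# The same idea flags population-state names missing the n_ prefix.
pop_names <- c("wt", "n_ht")
pop_names[!startsWith(pop_names, "n_")]
# "wt"
```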

interactive model builder?

Something like swirl, but for IPMs?

Could define kernels using name, formula, vr_exprs, data_list, and eviction. optionally add hierarchical effects as well.

Probably only useful for teaching and for simple IPMs

env_seq gets unlisted recursively

This is maybe a weird edge case, but I have vital rates models that are GAMs that take a data frame with matrix columns. Because the proto_ipm$env_seq gets unlisted recursively (the default for unlist()) the output doesn't look like the input. For example, if my sample_env() function outputs something like this for a single iteration:

sample_env(data_full %>% select(spei_history), 1)
$spei_history
          [,1]       [,2]      [,3]      [,4]     [,5]      [,6]     [,7]      [,8]      [,9]    [,10]     [,11]
[1,] -1.331478 -0.8566401 -1.000386 -1.313495 -1.63653 -2.024821 -2.25552 -1.780731 -1.232011 0.163645 0.9614435
        [,12]     [,13]      [,14]     [,15]       [,16]     [,17]     [,18]     [,19]     [,20]     [,21]
[1,] 1.440836 0.6301332 -0.2844098 -0.252915 -0.08826031 0.5890239 0.2877022 0.9476817 0.8315037 0.9577722
        [,22]   [,23]   [,24]    [,25]    [,26]    [,27]    [,28]     [,29]      [,30]      [,31]     [,32]
[1,] 1.908109 1.75287 2.26437 1.412641 2.083237 1.194112 1.146075 -0.286554 -0.1933357 -0.1651673 0.2588008
         [,33]     [,34]      [,35]     [,36]     [,37]
[1,] 0.1826453 0.2680286 -0.8562538 -1.315719 -1.640294

Then the ipm$env_seq looks something like:

> ipm_dlnm_ff$env_seq 
  t spei_history1 spei_history2 spei_history3 spei_history4 spei_history5 spei_history6  ... spei_history37
1 1     0.2103302     0.3759758     0.8460177     0.7169548     0.9293107   -0.03144605  ... -0.08102309
2 2     0.9858378     0.5718160    -0.7310837    -1.3254078    -1.3687049   -1.63126382  ... -2.08027721
3 3    -1.2166994    -1.2700957     0.8460177     0.7169548     0.9293107   -0.03144605  ... -0.08102309

I've convinced myself it's not causing any problems in the actual IPM, the matrix just gets unlisted by this line I think:

https://github.com/levisc8/ipmr/blob/053fb52210e325209f64056c9a3a82b98312d7a2/R/internal-make_ipm.R#L544

Looks like it would probably break a bunch of stuff if this was changed to unlist(recursive=FALSE), but I thought I'd let you know. I got worried for a second when I realized that the input and output didn't quite match.
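The flattening described above can be reproduced in base R alone: `unlist()` drops a matrix's dimensions and names its elements by appending an index to the list name. The values and their column order survive, which is consistent with the conclusion that the IPM itself is unaffected. `env_draw` is a hypothetical, shortened stand-in for one draw from `sample_env()`.

```r
# A 1 x 5 matrix standing in for the 37-column spei_history matrix above.
env_draw <- list(spei_history = matrix(c(-1.33, -0.86, -1.00, -1.31, -1.64), nrow = 1))

flat <- unlist(env_draw)  # recursive = TRUE is the default
names(flat)
# "spei_history1" "spei_history2" "spei_history3" "spei_history4" "spei_history5"
dim(flat)
# NULL: the matrix structure is gone, but values and order are kept

# The original matrix can be rebuilt from the flattened values when needed.
rebuilt <- matrix(flat, nrow = 1)
identical(unname(rebuilt), env_draw$spei_history)  # TRUE
```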

plot for general ipms

Still need plot methods for general IPMs. These will probably require a mega_mat argument, mostly because I do not feel like working out how to reasonably present multiple kernels of varying dimensions on a canvas without adding a ggplot2 dependency.

Require user to specify the differential of the state variable in formula

With general IPMs, users are told to add the 'dx' to the formula, but it isn't checked. If not provided, the model silently continues and may produce incorrect outputs.

time lagged models

Still not implemented. I suspect the underlying code/notation will/should look like the age X size example in #5. This interface will also require adding a pronoun to access the iterator in .iterate_model.

n_z_t_1 = F %*% n_z_[t-1] + P %*% n_z_t

Think carefully about how to have the user specify initial conditions here - needs to remain intuitive and really shouldn't require much additional code to specify the model.
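A hedged sketch of the recurrence in base R, using made-up 3 x 3 matrices for P and F (named `P_mat` and `F_mat` to avoid masking R's `F`). It makes one consequence for the interface question concrete: the lag needs two initial population vectors, n(t - 1) and n(t). It also checks the standard equivalence with a first-order model on the stacked vector (n(t), n(t - 1)), whose block matrix could be reused for lambda and eigenvector calculations.

```r
# Made-up 3 x 3 kernels; F_mat puts recruits into the smallest size class.
P_mat <- matrix(c(0.2, 0.1, 0.0,
                  0.3, 0.4, 0.1,
                  0.0, 0.3, 0.7), nrow = 3, byrow = TRUE)
F_mat <- matrix(0, 3, 3)
F_mat[1, ] <- c(0, 0.5, 1.5)

n_prev <- c(10, 5, 2)  # n(z, t - 1): a lagged model needs two initial states
n_curr <- c(8, 6, 3)   # n(z, t)

iterate_lagged <- function(P_mat, F_mat, n_prev, n_curr, iterations) {
  for (i in seq_len(iterations)) {
    n_next <- P_mat %*% n_curr + F_mat %*% n_prev  # n(t + 1) = P n(t) + F n(t - 1)
    n_prev <- n_curr
    n_curr <- n_next
  }
  as.vector(n_curr)
}

# First-order form: A maps (n(t), n(t - 1)) to (n(t + 1), n(t)).
A <- rbind(cbind(P_mat, F_mat),
           cbind(diag(3), matrix(0, 3, 3)))
stacked <- c(n_curr, n_prev)
for (i in 1:10) stacked <- A %*% stacked

isTRUE(all.equal(iterate_lagged(P_mat, F_mat, n_prev, n_curr, 10),
                 as.vector(stacked[1:3])))  # TRUE
```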

Questions about General IPM vignette

https://github.com/levisc8/ipmr/blob/ad143c10e68bd40a2cac605bbd0f599314a1329d/vignettes/general-ipms.Rmd#L47

I don't see f_G or f_{r_d} in the previous list of equations. Do you mean G(z, z') and f_d(z') ?

I'm also not sure I understand why germination is part of the go_discrete kernel. Shouldn't that be part of leave_discrete for a seed bank? The current formulation seems more like a seedling bank to me.

n-dimensional integrations in models

@wpetry had mentioned this before, and came up today in discussion w/ Tomos. Currently, the par_set_indices method won't work, because each level of sex will be treated as a separate population when population size/lambda is computed.

Likely requires an additional argument to init_ipm() (e.g. init_ipm(..., two_sex = TRUE)), and then altering .init_iteration and .pop_size to be generic on that. Almost certainly overlooking a critical component though.

state_start/state_end to define_kernel()

Mentioned by @Aariq in levisc8/ipmr_esa#3 - I think this would make a more intuitive UI. The rationale is that state_start/state_end are part of the symbolic kernel definition anyway (i.e. they're the "z',z" in P(z',z) = ...), so they should be attached to define_kernel(), rather than a separate function.

This probably means dropping define_impl() and moving int_rule to define_domains(), or vice versa (depending on which function name feels more intuitive).

define_kernel(..., state_start = "z", state_end = "x") %>% 
define_domains(x = c(L_x, U_x, n_x), z = c(L_z, U_z, n_z))

age X size models

Still not implemented. To work with current hier_effs code, will require evaluation of bracketed code after hierarchical effects are generated but before they are substituted into each expression. Additionally, the 0 age will have to be specified separately from the other expressions (can't have negative age). Seems a bit clunky to me - consider alternative implementations.

n_size_age_t_1 = K_[age-1] %*% n_size_[age-1]_t
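A minimal sketch of the age X size recurrence with made-up 2 x 2 kernels, showing why age 0 needs its own expression: it is filled only by reproduction summed over every parental age, while each later age a + 1 comes from P_a applied to age a. `step_age_size()`, `P_list`, and `F_list` are hypothetical, and this sketch simply lets individuals of the maximum age die rather than accumulating them in a plus group.

```r
n_size  <- 2
max_age <- 3
# P_list[[a + 1]] is P_a (age a -> a + 1); F_list[[a + 1]] is F_a (made-up values).
P_list <- rep(list(matrix(c(0.3, 0.1, 0.2, 0.5), n_size, n_size)), max_age)
F_list <- lapply(0:max_age, function(a) matrix(c(0, 0, 0.4 * a, 0), n_size, n_size))

step_age_size <- function(n_by_age, P_list, F_list) {
  max_age <- length(n_by_age) - 1
  new <- vector("list", max_age + 1)
  # Age 0: reproduction only, summed over all parental ages (no "age -1" term).
  new[[1]] <- Reduce(`+`, Map(function(F_a, n_a) F_a %*% n_a, F_list, n_by_age))
  # Ages 1..max_age: n_a(t + 1) = P_{a - 1} n_{a - 1}(t); age max_age does not survive.
  for (a in seq_len(max_age)) new[[a + 1]] <- P_list[[a]] %*% n_by_age[[a]]
  lapply(new, as.vector)
}

n0 <- list(c(10, 0), c(0, 0), c(0, 0), c(0, 0))  # 10 newborns in size class 1
n1 <- step_age_size(n0, P_list, F_list)
n1[[2]]  # 3 1 (age 1)
n2 <- step_age_size(n1, P_list, F_list)
n2[[1]]  # 0.4 0.0 (age 0, produced by age-1 individuals)
```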

split discrete_extrema

Need two versions, 1 equivalent to each of these:

Option 1, call it "staggered" or something?

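# Columns in bottom_seq: add each column's missing mass to the first row.
x[1, bottom_seq] <- x[1, bottom_seq] + (1 - colSums(x[, bottom_seq, drop = FALSE]))
# Columns in top_seq: add each column's missing mass to the last row.
x[n_x, top_seq] <- x[n_x, top_seq] + (1 - colSums(x[, top_seq, drop = FALSE]))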

Option 2, call it "cumul" or something

x[1, ] <- x[1, ] + p<dist_fun>(smallest_size, ...)
x[n_x, ] <- x[n_x, ] + p<dist_fun>(largest_size,...)

Need both for Padrino unfortunately
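Both options can be compared on a toy discretized growth kernel (made-up parameters; `staggered` and `cumul` are illustrative names). This sketch uses `pnorm()` in place of `p<dist_fun>` and assumes the last row in option 2 should receive the probability mass above the largest size, i.e. the upper tail. Option 1 forces every column to sum to exactly 1, while option 2 leaves the midpoint-rule error in place.

```r
# Toy growth kernel on [L, U] (made-up parameters): rows are z', columns are z.
L <- 0; U <- 10; n_x <- 50
h  <- (U - L) / n_x
z  <- L + (seq_len(n_x) - 0.5) * h
mu <- 0.5 + 0.95 * z
sd_g <- 1
x  <- outer(z, mu, function(zp, m) dnorm(zp, m, sd_g)) * h

bottom_seq <- seq_len(n_x / 2)
top_seq    <- setdiff(seq_len(n_x), bottom_seq)

# Option 1 ("staggered"): missing mass goes to the first row for small z and to
# the last row for large z, computed from the unmodified kernel x.
staggered <- x
staggered[1, bottom_seq] <- staggered[1, bottom_seq] + (1 - colSums(x[, bottom_seq, drop = FALSE]))
staggered[n_x, top_seq]  <- staggered[n_x, top_seq]  + (1 - colSums(x[, top_seq, drop = FALSE]))

# Option 2 ("cumul"): mass below L goes to the first row, mass above U
# (assumed upper tail) goes to the last row, for every column.
cumul <- x
cumul[1, ]   <- cumul[1, ]   + pnorm(L, mu, sd_g)
cumul[n_x, ] <- cumul[n_x, ] + pnorm(U, mu, sd_g, lower.tail = FALSE)

range(colSums(staggered))  # 1 up to floating point
range(colSums(cumul))      # close to 1; the gap is the midpoint-rule error
```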

stochasticity within a level of parameter set

Given multiple parameter sets, allow for stochastic models to be stochastic within a level of one or more (e.g. stochastic across years within sites).

Interface:

par_set_list <- list(yr = 2005:2012, site = c("A", "B", "C"))

define_kernel(..., uses_par_sets = TRUE, par_set_indices = par_set_list) %>%
define_impl(..., stoch_within = "site")

^^ Should produce 3 separate stochastic models for sites A,B,C using years 2005:2012 at each one.

NB: Not sure if this should go within define_impl or somewhere else - maybe define_pop_state() or make_ipm()
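A hypothetical sketch of what stoch_within = "site" might produce: one independent year sequence per site, built from kernel names of the form `P_<yr>_<site>`. The naming scheme and `kernel_seqs` are assumptions for illustration, not ipmr's internal representation.

```r
# Hypothetical: one year sequence per site, so each site is its own stochastic model.
par_set_list <- list(yr = 2005:2012, site = c("A", "B", "C"))
n_iterations <- 100

set.seed(42)
kernel_seqs <- lapply(setNames(par_set_list$site, par_set_list$site), function(site) {
  yrs <- sample(par_set_list$yr, n_iterations, replace = TRUE)  # years vary within a site
  paste0("P_", yrs, "_", site)                                  # e.g. "P_2007_B"
})

lengths(kernel_seqs)  # 100 kernel names for each of A, B, C
head(kernel_seqs$B)
```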

Specifying constants

From exercise 1 from the tutorial at evodemos:
When defining a kernel that contains a constant, such as seed production multiplied by a germination probability, giving the vital rate expression the same name as the data_list entry it refers to (g_0 = g_0) produces a cryptic error:
Error in (function (...) : promise already under evaluation: recursive default argument reference or earlier problems?
Setting g_0 = 0.02465088, or g_0 as g_1, in the kernel does work correctly, and so does not specifying g_0 at all.

dat_list <- c(s_pars, g_pars, 
              repr_pars, seed_pars,
              f_d_pars,
              list(g_0 = 0.02465088,
                   g_1 = 0.01433191,
                   g_2 = 0.0005732764))

define_kernel(
    name= "F",
    family = "CC",
    formula = F_p*F_n*F_d*g_0*d_area,
    F_p = plogis(alpha_fp+beta_fp*area_1),
    F_n = plogis(alpha_fn+beta_fn*area_1),
    F_d = dnorm(area_2,fd_mu,fd_sd),
    g_0 = g_0,
    data_list=dat_list,
    states= list(c("area")),
    evict_cor = T,
    evict_fun= truncated_distributions("norm","F_d")
    )
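For context, the same "promise already under evaluation" message can be reproduced in base R with a name whose value is defined in terms of itself. This is an analogy only; it is an assumption that ipmr's evaluation of `g_0 = g_0` fails in a comparable way, and `self_ref()` and `dat_list` are illustrative names.

```r
# Base-R analogy (not ipmr internals): a default argument defined in terms of
# itself triggers the "promise already under evaluation" error.
self_ref <- function(g_0 = g_0) g_0
msg <- tryCatch(self_ref(), error = function(e) conditionMessage(e))
msg

# A plain reference, looked up in a data list, has nothing to recurse into.
dat_list <- list(g_0 = 0.02465088)
eval(quote(g_0), envir = dat_list)  # 0.02465088
```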

Model iteration simple_di_det/stoch_kern

User expressions instead of right_mult hardcoded currently

plot simple ipm

Thanks @levisc8 for developing ipmr! I'll be using it in a course in Brazil later this month.
I noted that the plot() function for a simple ipm (a simple_di_det_ipm) shows the plot for the F kernel only (I had a P and an F kernel in the ipm). The plot title is also "F". Does it take the last subkernel generated? How can I generate a plot of both kernels combined?

NA warning in define_kernel() can't be silenced

I'm using some models in data_list that happen to have NAs in the model object. This triggers a warning from define_kernel(). It doesn't seem like there's a way to silence them, but I'm wondering if use_vr_model() could have that effect? Like, you're acknowledging that it's a model object, and not numeric by using use_vr_model() so maybe it should (optionally?) silence warnings about NAs.

Probably low priority, but I dug into this just to make sure I wasn't missing something important in my model and figured I'd share.

library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-36. For overview type 'help("mgcv-package")'.
library(ipmr)
#> It looks like you've installed the development vesrion of ipmr from the repository 'levisc8/ipmr'.
#> ipmr will soon be moving to the padrinoDB Github organization.
#> In the future, you will need to install the development version from there.
#> remotes::install_github('padrinoDB/ipmr')
# big GAMs (mgcv::bam()) have the data slot set to NA
m1 <- bam(mpg ~ s(wt), data = mtcars)
m1$data
#> [1] NA

# using these in define_kernel() results in warning.
# use_vr_model() might be a way to silence these warnings?
data_list1 <- list(m = m1)
data_list2 <- list(m = use_vr_model(m1))

#just wrapped the relevant piece of `define_kernel()` for example purposes
# lines 110–122 in define_kernel.R
foo <- function(data_list) {
lapply(
  data_list,
  function(x) {
    x <- ipmr:::.protect_model(x)
    
    na_test <- suppressWarnings(any(is.na(x)))
    
    if(na_test) {
      warning("'data_list' in 'define_kernel()' contains NAs. Is this correct?",
              call. = FALSE)
    }
    
    return(x)
  })
}

foo(data_list1)
#> Warning: 'data_list' in 'define_kernel()' contains NAs. Is this correct?
#> $m
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> mpg ~ s(wt)
#> 
#> Estimated degrees of freedom:
#> 2.4  total = 3.4 
#> 
#> fREML score: 76.6982
foo(data_list2)
#> Warning: 'data_list' in 'define_kernel()' contains NAs. Is this correct?
#> $m
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> mpg ~ s(wt)
#> 
#> Estimated degrees of freedom:
#> 2.4  total = 3.4 
#> 
#> fREML score: 76.6982

Created on 2022-02-25 by the reprex package (v2.0.1)

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS  10.16                
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2022-02-25                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version    date       lib source                       
#>  P backports     1.2.1      2020-12-09 [?] CRAN (R 4.0.2)               
#>  P cli           3.0.1      2021-07-17 [?] CRAN (R 4.0.2)               
#>  P crayon        1.4.1      2021-02-08 [?] CRAN (R 4.0.2)               
#>  P digest        0.6.27     2020-10-24 [?] CRAN (R 4.0.2)               
#>  P ellipsis      0.3.2      2021-04-29 [?] CRAN (R 4.0.2)               
#>  P evaluate      0.14       2019-05-28 [?] CRAN (R 4.0.1)               
#>  P fansi         0.5.0      2021-05-25 [?] CRAN (R 4.0.2)               
#>  P fastmap       1.1.0      2021-01-25 [?] CRAN (R 4.0.2)               
#>  P fs            1.5.0      2020-07-31 [?] CRAN (R 4.0.2)               
#>  P glue          1.4.2      2020-08-27 [?] CRAN (R 4.0.2)               
#>  P highr         0.9        2021-04-16 [?] CRAN (R 4.0.2)               
#>  P htmltools     0.5.2      2021-08-25 [?] CRAN (R 4.0.2)               
#>  P ipmr        * 0.0.4.9001 2022-02-24 [?] Github (levisc8/ipmr@b83717c)
#>  P knitr         1.36       2021-09-29 [?] CRAN (R 4.0.2)               
#>  P lattice       0.20-44    2021-05-02 [3] CRAN (R 4.0.2)               
#>  P lifecycle     1.0.0      2021-02-15 [?] CRAN (R 4.0.2)               
#>  P magrittr      2.0.1      2020-11-17 [?] CRAN (R 4.0.2)               
#>  P Matrix        1.3-4      2021-06-01 [3] CRAN (R 4.0.2)               
#>  P mgcv        * 1.8-36     2021-06-01 [3] CRAN (R 4.0.2)               
#>  P nlme        * 3.1-152    2021-02-04 [3] CRAN (R 4.0.2)               
#>  P pillar        1.6.2      2021-07-29 [?] CRAN (R 4.0.2)               
#>  P pkgconfig     2.0.3      2019-09-22 [?] CRAN (R 4.0.2)               
#>  P purrr         0.3.4      2020-04-17 [?] CRAN (R 4.0.2)               
#>  P Rcpp          1.0.7      2021-07-07 [?] CRAN (R 4.0.2)               
#>  P reprex        2.0.1      2021-08-05 [?] CRAN (R 4.0.2)               
#>  P rlang         0.4.11     2021-04-30 [?] CRAN (R 4.0.2)               
#>  P rmarkdown     2.10       2021-08-06 [?] CRAN (R 4.0.2)               
#>  P rstudioapi    0.13       2020-11-12 [?] CRAN (R 4.0.2)               
#>    sessioninfo   1.1.1      2018-11-05 [3] CRAN (R 4.0.0)               
#>  P stringi       1.7.4      2021-08-25 [?] CRAN (R 4.0.2)               
#>  P stringr       1.4.0      2019-02-10 [?] CRAN (R 4.0.2)               
#>    styler        1.5.1      2021-07-13 [3] CRAN (R 4.0.2)               
#>  P tibble        3.1.4      2021-08-25 [?] CRAN (R 4.0.2)               
#>  P utf8          1.2.2      2021-07-24 [?] CRAN (R 4.0.2)               
#>  P vctrs         0.3.8      2021-04-29 [?] CRAN (R 4.0.2)               
#>  P withr         2.4.2      2021-04-18 [?] CRAN (R 4.0.2)               
#>  P xfun          0.25       2021-08-06 [?] CRAN (R 4.0.2)               
#>  P yaml          2.2.1      2020-02-01 [?] CRAN (R 4.0.2)               
#> 
#> [1] /Users/scottericr/Documents/GitHub/heliconia-ipm/renv/library/R-4.0/x86_64-apple-darwin17.0
#> [2] /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpEJbNZM/renv-system-library
#> [3] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
#> 
#>  P ── Loaded and on-disk path mismatch.

Release ipmr 0.0.5

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version()

ipmr CRAN submission

Update url
Update startup message
Low hanging fruit feature additions

stoch_param models for different levels of discrete variation

See email from Sanne (5.16.22) regarding this. This is related to #26, in that multiple stochastic simulations need to run.

I haven't yet tracked down the precise source of this, but it seems parameter set indices aren't currently getting updated in the names of population state, so that is a good place to start.

in `lambda()` on stochastic IPMs, `log = TRUE` has no effect for `type_lambda = 'last'` or `'all'`

When running lambda() on a stochastic IPM, the log argument has no effect on the lambdas returned for "last" or "all". To be consistent, it seems like it should have an effect. If you have an idea for the desired behavior (e.g. log all type_lambda options for stochastic models by default, or log only when type_lambda = "stochastic"), I could make a small PR with a fix.
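A minimal sketch of what a consistent `log` argument could look like, written as a standalone helper that works on a vector of per-iteration lambdas. `summarise_lambdas()` and its `burn_in` argument are hypothetical, not part of ipmr. The stochastic summary is taken as the mean of the log-lambdas, which is the usual definition of the stochastic growth rate, and `log = FALSE` back-transforms every option with `exp()`.

```r
# Hypothetical helper (not ipmr's lambda()): applies `log` the same way for
# every type_lambda option.
summarise_lambdas <- function(lambdas,
                              type_lambda = c("all", "last", "stochastic"),
                              log = FALSE,
                              burn_in = 0) {
  type_lambda <- match.arg(type_lambda)
  # Work on the log scale throughout. base::log is written out to make it
  # clear that the logical `log` argument is not being called.
  log_l <- base::log(lambdas)
  out <- switch(
    type_lambda,
    all        = log_l,
    last       = log_l[length(log_l)],
    # stochastic growth rate: mean of the log-lambdas after the burn-in
    stochastic = mean(if (burn_in > 0) log_l[-seq_len(burn_in)] else log_l)
  )
  # Back-transform unless the log scale was requested
  if (log) out else exp(out)
}

lambdas <- c(1.10, 0.90, 1.20, 1.05)
summarise_lambdas(lambdas, "last")
summarise_lambdas(lambdas, "last", log = TRUE)
summarise_lambdas(lambdas, "stochastic", burn_in = 1)
```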

`define_env_state()` doesn't work properly with `predict()` method for vital rate function.

If I use predict() to define a vital rate function and include an environmental covariate in newdata, I get an error in model.frame.default().

A contrived example:

Starting with the example IPM from this section of the intro vignette, I fit a model with survival as a function of (randomly generated) precip.

library(ipmr)       # provides the iceplant_ex data set
library(tidyverse)  # provides %>% and tibble::add_column()
data("iceplant_ex")
iceplant_ex <- iceplant_ex %>% add_column(precip = rgamma(nrow(.), shape = 1000, rate = 2))
surv_mod <- glm(survival ~ size + precip, data = iceplant_ex, family = binomial())

I added surv_mod to constant_params and replaced s_lin_p with

    s_lin_p = predict(surv_mod, newdata = list(size = surf_area_1, precip = precip)),

Running make_ipm() on this modified example gives the error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : variable lengths differ (found for 'precip')

Traceback:
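The error comes from `model.frame()`: `predict()` passes `newdata` to it, and a plain `list()` is not recycled, so a length-1 `precip` (presumably one sampled environmental value) next to a full-length `surf_area_1` vector fails the length check. A minimal sketch of the failure, using simulated data in place of `iceplant_ex` (the data, mesh and `precip` value are all hypothetical), together with the usual base R workaround: `data.frame()` recycles length-1 columns. Whether the `data.frame()` form also behaves inside `define_kernel()` is an assumption that has not been tested against `make_ipm()` here.

```r
set.seed(1)
# Simulated stand-in for iceplant_ex with a precip covariate
dat <- data.frame(
  size   = rnorm(200, mean = 5, sd = 1),
  precip = rgamma(200, shape = 1000, rate = 2)
)
dat$survival <- rbinom(200, size = 1, prob = plogis(-2 + 0.5 * dat$size))
surv_mod <- glm(survival ~ size + precip, data = dat, family = binomial())

surf_area_1 <- seq(2, 8, length.out = 100)  # stand-in for the mesh
precip      <- 500                          # one sampled environmental value

# list(): lengths 100 and 1 reach model.frame() unchanged, which errors
list_err <- tryCatch(
  predict(surv_mod, newdata = list(size = surf_area_1, precip = precip)),
  error = function(e) conditionMessage(e)
)
print(list_err)

# data.frame(): precip is recycled to length 100 before predict() sees it
s_lin_p <- predict(surv_mod,
                   newdata = data.frame(size = surf_area_1, precip = precip))
length(s_lin_p)
```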

max_age API

Per DZC, age-size models with different functional forms for age < max_age and age = max_age won't work as currently implemented. This could become a commonly desired feature for Animalia.
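One way to write down what this feature would need, assuming `max_age` = M is a terminal class that individuals remain in: the kernels for age M (P_M and F_M below) are allowed a different functional form from the kernels for ages a < M. This is a sketch of the target model, not a description of how ipmr currently builds age_x_size models.

```latex
\begin{aligned}
n_0(z', t+1) &= \sum_{a=0}^{M} \int_L^U F_a(z', z)\, n_a(z, t)\, dz \\
n_a(z', t+1) &= \int_L^U P_{a-1}(z', z)\, n_{a-1}(z, t)\, dz, \qquad a = 1, \dots, M-1 \\
n_M(z', t+1) &= \int_L^U \left[ P_{M-1}(z', z)\, n_{M-1}(z, t) + P_M(z', z)\, n_M(z, t) \right] dz
\end{aligned}
```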

Differing mesh sizes for adults and juveniles

Hi,

I am trying to create an IPM for kauri (Agathis australis) as part of my PhD and I have split the trees up into adults (Diameter over 2.5cm) and saplings/seedlings (Diameter under 2.5 cm). The measurement variable is height for the saplings, and DBH for the adults. I am currently trying to build an IPM with 10,000 classes for the adults, and 1,000 for the saplings/seedlings, as the sapling/seedling stage only represents a small portion of the life cycle of kauri. However, when I try this, the lambda is too high to be feasible and the fecundity kernel is not correct, as it creates juveniles/seedlings in class 1-10 (as expected), but also class 100-110, 200-210 etc. Is there any way that I can go about fixing this, or correcting for this? (e.g. adjusting the fecundity kernel in R to change these values to 0 after the model is built?)

Thank you for creating this package; it is great! I am quite new to coding and population modelling and found it very user-friendly!

Kind Regards,

Toby Elliott

`right_ev()` does not scale population state correctly (colSums !=1)

https://github.com/levisc8/ipmr/blob/6dd0e718eb47960cfc6d66a865a4cb86a9e38fa2/R/generics.R#L2041

This doesn't result in columns summing to 1 like it is intended to. For example:

x <- matrix(c(rnorm(5), rnorm(5, 10), rnorm(5, 100)), ncol = 3)
out1 <- x/colSums(x)

out2 <- x
for(i in 1:ncol(x)) {
  out2[,i] <- x[,i]/sum(x[,i])
}
#test
colSums(out1) == 1
#> [1] FALSE FALSE FALSE

colSums(out2) == 1
#> [1] TRUE TRUE TRUE

This results in wonky-looking state distributions when using right_ev(), at least for simple_di_stoch_param models. Haven't checked other IPM types.
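`x / colSums(x)` recycles the length-`ncol(x)` vector down the matrix in column-major order, so most entries end up divided by a different column's sum. A sketch of two base R ways to divide each column by its own sum, checked with `all.equal()` rather than `==` so the result does not depend on exact floating-point equality:

```r
set.seed(2)
x <- matrix(c(rnorm(5), rnorm(5, 10), rnorm(5, 100)), ncol = 3)

# Wrong: the 3 column sums are recycled element by element down the
# 15 entries, so entry [i, j] is not divided by colSums(x)[j]
out1 <- x / colSums(x)

# Right: divide every column j by its own sum
out_sweep <- sweep(x, 2, colSums(x), FUN = "/")
out_t     <- t(t(x) / colSums(x))   # same result via a double transpose

all.equal(unname(colSums(out_sweep)), rep(1, 3))
all.equal(out_sweep, out_t)
```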

age within stage models

Better thought of as an extension of age_x_size (I think). Basically need a function called something like define_age()/define_age_within_stage() with an interface like:

define_age(age = list(yearling  = c(0, 0),
                      small     = c(1, 4),
                      large     = c(5, 10),
                      greybeard = c(max_age, max_age)))

This info then gets folded into .init_iteration.
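A sketch of the bookkeeping such a function might do: expanding the named age ranges into a lookup from each age in `0:max_age` to its stage, which is presumably the kind of information `.init_iteration` would need. `age_to_stage()` and the value `max_age = 11` are hypothetical.

```r
# Hypothetical helper: expand named age ranges into a per-age stage lookup.
age_to_stage <- function(stages, max_age) {
  ages   <- 0:max_age
  lookup <- rep(NA_character_, length(ages))
  for (nm in names(stages)) {
    rng <- stages[[nm]]
    lookup[ages >= rng[1] & ages <= rng[2]] <- nm
  }
  # Every age must belong to some stage
  if (anyNA(lookup)) {
    stop("Ages not assigned to any stage: ",
         paste(ages[is.na(lookup)], collapse = ", "))
  }
  names(lookup) <- ages
  lookup
}

max_age <- 11
stages <- list(yearling  = c(0, 0),
               small     = c(1, 4),
               large     = c(5, 10),
               greybeard = c(max_age, max_age))
age_to_stage(stages, max_age)
```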

`all.equal()` is too strict for checking for convergence

I think all.equal() is probably not the right choice for checking convergence in is_conv_to_asymptotic(). Consider these last two stochastic (log) lambdas:

end  <- -0.002483481
start <- -0.002507642

They should be within a tolerance of 1e-4 (0.0001), right? is_conv_to_asymptotic() uses all.equal() to determine this, and all.equal() says they are not within tolerance, because it compares the relative difference whenever the target's magnitude is larger than the tolerance.

end <- -0.002483481
start <- -0.002507642
format(end-start, scientific = FALSE)
#> [1] "0.000024161"
format(1e-4, scientific = FALSE)
#> [1] "0.0001"

abs(end - start) < 1e-4
#> [1] TRUE
dplyr::near(end, start, tol = 1e-4)
#> [1] TRUE
all.equal(end, start, tolerance = 1e-4)
#> [1] "Mean relative difference: 0.009728683"
all.equal(end, start, tolerance = 1e-2)
#> [1] TRUE

Created on 2022-02-28 by the reprex package (v2.0.1)
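The 0.009728683 in the reprex is `abs(end - start) / abs(end)`: with `scale = NULL` and a target whose mean absolute value exceeds `tolerance`, `all.equal()` divides by the target's magnitude. A sketch of two ways to get an absolute comparison instead; `within_abs_tol()` is a hypothetical helper, while `scale = 1` is an existing `all.equal()` argument.

```r
end   <- -0.002483481
start <- -0.002507642

# What all.equal() reports by default: difference relative to abs(target)
rel_diff <- abs(end - start) / abs(end)
rel_diff

# scale = 1 makes all.equal() compare the absolute difference to tolerance
all.equal(end, start, tolerance = 1e-4, scale = 1)

# Hypothetical absolute-tolerance check for is_conv_to_asymptotic()
within_abs_tol <- function(x, y, tol = 1e-4) all(abs(x - y) < tol)
within_abs_tol(end, start)
```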

log(mean(lambdas)) ≠ mean(log(lambdas))

When calculating stochastic lambda, the outputs of these two lines are not the same and it seems like they should be.

lambda(ipm, type_lambda = "stochastic", log = TRUE)
log(lambda(ipm, type_lambda = "stochastic", log = FALSE))

This is because the mean of the log-lambdas is not the same as the log of the mean of the lambdas.

I think maybe .thin_stoch_lambda() should be something like:

.thin_stoch_lambda <- function(lambdas, burn_ind, log) {

  if(length(burn_ind) > 0) {
    out <- mean(lambdas[-c(burn_ind)])
  } else {
    out <- mean(lambdas)
  }
  if(log) out <- log(out)
  return(out)
}
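The two quantities differ because log is concave (Jensen's inequality), so `mean(log(lambdas))` is never larger than `log(mean(lambdas))`. The stochastic growth rate is conventionally defined on the log scale, as the long-run mean of the log-lambdas, so an alternative to the proposal above is to keep that mean and back-transform it with `exp()` when `log = FALSE`; that gives the geometric mean and makes the two calls in the issue agree. A small numeric sketch with made-up lambdas:

```r
lambdas <- c(0.80, 1.10, 1.30, 0.95, 1.05)  # hypothetical per-iteration lambdas

log_lambda_s <- mean(log(lambdas))   # stochastic growth rate, log scale
lambda_s     <- exp(log_lambda_s)    # back-transformed: geometric mean
log_of_mean  <- log(mean(lambdas))   # log of the arithmetic mean instead

c(log_lambda_s = log_lambda_s, log_of_mean = log_of_mean)

# Taking log() of the back-transformed value recovers the log-scale value
all.equal(log(lambda_s), log_lambda_s)
```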

hex logo

You need a hex logo for ipmr! Stickers to give out at conferences, something for users to put in slides and on posters, etc. (I am not volunteering myself to make one, to be clear)

periodic models

The current interface assumes all kernels operate on the same time scale without intermediate states. Need an interface for sequential kernels acting on intermediate states. I think it should look something like this:

init_ipm(..., is_periodic = TRUE) %>%
  ... %>%
  define_impl(
    state_start = c('size', 'size_1_star', 'size_2_star'),
    state_end   = c('size_1_star', 'size_2_star', 'size')
  ) %>%
  define_pop_state(
    ...,
    period_seq = c('size', 'size_1_star', 'size_2_star', 'size')
  )
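Under this interface, one period would apply the sequential kernels in turn: `size` to `size_1_star`, then to `size_2_star`, then back to `size`. A discretized sketch with random placeholder matrices (the mesh sizes and kernel values are hypothetical), showing that stepping through the intermediate states gives the same result as multiplying by the product kernel, and that the intermediate meshes need not match the size mesh:

```r
set.seed(3)
n_size  <- 4  # mesh points for size
n_star1 <- 3  # mesh points for size_1_star
n_star2 <- 5  # mesh points for size_2_star

# Placeholder kernels: rows = destination state, columns = starting state
K1 <- matrix(runif(n_star1 * n_size),  nrow = n_star1, ncol = n_size)   # size -> size_1_star
K2 <- matrix(runif(n_star2 * n_star1), nrow = n_star2, ncol = n_star1)  # size_1_star -> size_2_star
K3 <- matrix(runif(n_size * n_star2),  nrow = n_size,  ncol = n_star2)  # size_2_star -> size

n_t <- rep(1, n_size)

# Step through the intermediate states one kernel at a time
n_star1_t <- K1 %*% n_t
n_star2_t <- K2 %*% n_star1_t
n_t1_seq  <- K3 %*% n_star2_t

# Equivalent single-period kernel: later kernels multiply on the left
K_period  <- K3 %*% K2 %*% K1
n_t1_prod <- K_period %*% n_t
```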

Check for convergence in stochastic lambda?

Discussed in #40

Originally posted by Aariq October 6, 2021
Right now it seems like it doesn't make sense to use conv_plot() or is_conv_to_asymptotic() on stochastic IPMs since both functions use the lambdas for each transition year. I'm not super familiar with the literature on this, but it makes intuitive sense to me that you'd want to check for convergence in the stochastic lambda. Couldn't you do this by checking for convergence after calculating a cumulative mean of log-lambda?
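A sketch of that idea with simulated log-lambdas (the distribution, `window` and `tol` are arbitrary choices): compute the running mean of log-lambda and check whether it has stopped moving over the last `window` iterations. `is_stoch_conv()` is a hypothetical helper, not an ipmr function.

```r
set.seed(4)
log_lambdas <- rnorm(5000, mean = -0.002, sd = 0.05)  # simulated, not from an IPM

# Running (cumulative) mean of log-lambda after each iteration
cum_mean <- cumsum(log_lambdas) / seq_along(log_lambdas)

# Hypothetical check: has the running mean moved less than `tol`
# over the last `window` iterations?
is_stoch_conv <- function(cum_mean, window = 100, tol = 1e-4) {
  n <- length(cum_mean)
  if (n <= window) return(FALSE)  # too few iterations to judge
  abs(cum_mean[n] - cum_mean[n - window]) < tol
}

is_stoch_conv(cum_mean)
# plot(cum_mean, type = "l") shows the same information visually
```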

changing title in conv_plot()

When plotting an IPM with parameter set(s), conv_plot() shows the same title for every index value. This makes it difficult to trace a problem or non-convergence back to a specific index level.

stochastic left/right evs

@SanneE1 mentioned that the documentation states stochastic versions return xyz, but they aren't actually implemented. So... probably should implement these.

API for certain density dependent IPMs is pretty rubbish

See example at end here.

The divide-by-200 thing in N = (d_z * sum(n_z_t * B_z / 200)) ^ 0.67 is going to be very confusing to anyone not familiar with how each expression in ... gets evaluated over z_1/z_2. Need to think about some way of recognizing this situation and dealing with it automatically.
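The factor of 200 presumably equals the number of mesh points. If each expression is evaluated on the crossed `z_1`/`z_2` grid, every `z_1` value appears once per `z_2` value, so a `sum()` over the long vectors counts each term `n_mesh` times. A sketch of that arithmetic, assuming a 200-point mesh laid out like `expand.grid(z_1, z_2)` (the mesh, `n_z_t` and `B_z` below are made up):

```r
n_mesh <- 200
z     <- seq(0.5, 10, length.out = n_mesh)  # hypothetical mesh midpoints
d_z   <- z[2] - z[1]
n_z_t <- dnorm(z, mean = 5, sd = 1)         # hypothetical population state
B_z   <- z^2                                # hypothetical per-individual biomass

# Crossed grid: z_1 varies fastest, so each z_1 value is repeated n_mesh times
grid   <- expand.grid(z_1 = z, z_2 = z)
n_long <- rep(n_z_t, times = n_mesh)
B_long <- grid$z_1^2

N_mesh  <- (d_z * sum(n_z_t * B_z)) ^ 0.67               # intended quantity
N_long  <- (d_z * sum(n_long * B_long / n_mesh)) ^ 0.67  # the "/ 200" version
N_wrong <- (d_z * sum(n_long * B_long)) ^ 0.67           # no correction

c(N_mesh = N_mesh, N_long = N_long, N_wrong = N_wrong)
```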

as.matrix() method for sub-kernels

I would expect as.matrix(ipm$sub_kernels$P) to print a matrix, but it prints the same output as ipm$sub_kernels$P. It might be nice to have an as.matrix() method that strips the class() back to just "matrix" "array".
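A minimal sketch of such a method, assuming the sub-kernels carry one extra S3 class in front of `"matrix" "array"`. The class name `ipmr_matrix` below is an assumption, so check `class(ipm$sub_kernels$P)` for the real one. `unclass()` drops the class attribute but keeps `dim` (and any `dimnames`), so the result is an ordinary matrix.

```r
# Mock sub-kernel standing in for ipm$sub_kernels$P
P <- structure(matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2),
               class = c("ipmr_matrix", "matrix", "array"))

# Hypothetical method: strip the extra class, keep dim and dimnames
as.matrix.ipmr_matrix <- function(x, ...) {
  unclass(x)
}

P_plain <- as.matrix(P)
class(P_plain)
```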

harmonize returned values for `lambda`

I find it confusing that lambda returns λ when type_lambda = 'all' and type_lambda = 'last', but returns log(λ) for type_lambda = 'stochastic'. I suggest adding a logical argument log that allows the user to toggle between these scales. Given the function name lambda, I think it would be most natural to set log = FALSE as the default.

`make_ipm()` errors when `sample_env()` draws only one param

If you take the "modeling the environment directly" example from the introduction vignette and edit it so that only temp is a covariate, make_ipm() gives the following error:

#> rlang::last_trace()
<error/purrr_error_bad_element_type>
Element 2 of `.x` must be a vector, not a function
Backtrace:
    █
 1. ├─`%>%`(...)
 2. ├─ipmr::make_ipm(...)
 3. ├─ipmr:::make_ipm.simple_di_stoch_param(...)
 4. │ └─ipmr:::.bind_all_constants(...)
 5. │   ├─.flatten_to_depth(env_state, 1) %>% .[!duplicated(names(.))]
 6. │   └─ipmr:::.flatten_to_depth(env_state, 1)
 7. │     └─purrr::flatten(to_flatten)
 8. └─purrr:::stop_bad_element_type(...)
 9.   └─purrr:::stop_bad_type(...)

All I did was remove precip from the formulae in define_kernel() so they are:

    g_mu    = g_int + g_slope * surf_area_1 + g_temp * temp,
    s_lin_p = s_int + s_slope * surf_area_1 + s_temp * temp,

And change env_states to:

env_states <- data.frame(temp = rnorm(10, mean = 8.9, sd = 1.2))
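The traceback does not pin down the cause, but one common R pitfall fits a failure that appears only with a single covariate: row-indexing a one-column data frame drops it to a bare, unnamed vector unless `drop = FALSE` is given. Whether `sample_env()` or `.bind_all_constants()` actually index `env_states` this way is an assumption. A quick demonstration:

```r
env_states <- data.frame(temp = rnorm(10, mean = 8.9, sd = 1.2))

row_dropped <- env_states[3, ]                # drop = TRUE by default
row_kept    <- env_states[3, , drop = FALSE]  # still a one-column data.frame

str(as.list(row_dropped))  # unnamed list: the name "temp" is gone
str(as.list(row_kept))     # list(temp = ...)
```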

allow define_env_state to take expressions

Required for Padrino, and probably desirable for interactive use. Multivariate distributions would still require a function wrapper, but samplers for non-joint distributions don't.

define_env_state(precip = rgamma(1, pars))

AND

prec_sampler <- function(pars) list(precip = rgamma(1, pars))
define_env_state(env_vals = prec_sampler(env_pars))

should produce identical results.
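A base R sketch of how both forms could be handled: capture the arguments unevaluated with `substitute()`, evaluate them each time a draw is needed, and splice the result when a single argument already returns a named list (the function-wrapper form). `capture_env()` is hypothetical, not ipmr's implementation, and the `rgamma()` calls spell out `shape` and `rate` from an assumed `env_pars` list.

```r
# Hypothetical: returns a function that re-evaluates the captured
# expressions every time it is called.
capture_env <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated, keeps arg names
  env   <- parent.frame()                      # where the expressions live
  function() {
    vals <- lapply(exprs, eval, envir = env)
    # Function-wrapper form: one argument that already returns a named list
    if (length(vals) == 1 && is.list(vals[[1]])) vals[[1]] else vals
  }
}

env_pars     <- list(shape = 1000, rate = 2)
prec_sampler <- function(pars) {
  list(precip = rgamma(1, shape = pars$shape, rate = pars$rate))
}

draw_expr <- capture_env(precip = rgamma(1, shape = env_pars$shape, rate = env_pars$rate))
draw_fun  <- capture_env(env_vals = prec_sampler(env_pars))

set.seed(5); a <- draw_expr()
set.seed(5); b <- draw_fun()
identical(a, b)
```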

CC->CC transition for general models fails when n_mesh_p is not the same for each kernel

It looks like, for at least one model in Padrino, evaluating the expressions produces output of the wrong length, which breaks .fun_to_iter_mat. This is because the two domains (n_seedling = 137, n_ramet = 500) aren't properly crossed. This is not an issue when n_seedling = n_ramet.

Need to implement crossed versions of these domains for models where there are multiple continuous states, and then work out how to replace these internally, as the user-facing API shouldn't change.
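A sketch of the difference, with hypothetical seedling and ramet meshes of the sizes quoted above and a made-up transition density. Without crossing, R recycles the shorter vector (137 does not divide 500) and the output has the wrong length; crossing with `expand.grid()` evaluates every (seedling, ramet) pair, which can then be reshaped into a 500 x 137 iteration matrix.

```r
n_seedling <- 137
n_ramet    <- 500
z_seed  <- seq(0.1, 2,  length.out = n_seedling)  # hypothetical seedling mesh
z_ramet <- seq(1,   50, length.out = n_ramet)     # hypothetical ramet mesh

# Uncrossed: mean = 2 * z_seed is recycled along z_ramet, so the output has
# length 500 instead of 137 * 500 (any recycling warning is suppressed)
uncrossed <- suppressWarnings(dnorm(z_ramet, mean = 2 * z_seed, sd = 1))

# Crossed: one row per (seedling, ramet) pair; z_seed varies fastest
grid    <- expand.grid(z_seed = z_seed, z_ramet = z_ramet)
crossed <- dnorm(grid$z_ramet, mean = 2 * grid$z_seed, sd = 1)

# Reshape to an iteration matrix with destination (ramet) sizes in rows
K <- t(matrix(crossed, nrow = n_seedling, ncol = n_ramet))
dim(K)
```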

bump test coverage before submission

Will need a bit more feedback before writing tests for edge cases, but a considerable number of inputs aren't as well tested as they should be (if at all).

https://codecov.io/gh/levisc8/ipmr/tree/master/R

Speed up tests as well. I think the general_di_stoch_param tests are particularly redundant.

remove purrr Imports

Pretty much everything using purrr can be re-written to use lapply, Map, Filter, Reduce, and list(ipmr:::.flatten_to_depth(x, 1L)) .

Possibly move to Suggests, as some examples/vignette code may be more expressive if using it than base.
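A sketch of base R counterparts for the purrr verbs most likely to be involved. Which purrr functions ipmr actually calls is not listed here, so the pairings are an assumption about what would need replacing.

```r
x <- list(a = 1:3, b = 4:6)

squared <- lapply(x, function(v) v^2)            # purrr::map()
sums    <- Map(function(u, v) u + v, x$a, x$b)   # purrr::map2()
evens   <- Filter(function(v) v %% 2 == 0, 1:6)  # purrr::keep()
total   <- Reduce(`+`, 1:4)                      # purrr::reduce()

# purrr::flatten(): remove exactly one level of list nesting
nested <- list(list(p = 1, q = 2), list(r = 3))
flat   <- unlist(nested, recursive = FALSE)
names(flat)
```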

integration rules

New options should be "cdf" and "b2b".

It may make sense to re-examine how expressions are evaluated internally now. I think pretty much all vital rates will now have to become call2(<int_rule_fun>, !!! vr_expr) where <int_rule_fun> sets up the meshes and modifies vr_expr appropriately.

e.g. "cdf" needs to change Norm(size_2, mu_g, sd_g) to pnorm(size_2_u, mu_g, sd_g) - pnorm(size_2_l, mu_g, sd_g)
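A sketch of the "cdf" rewrite for a single starting size, compared with the midpoint rule (the density at each mesh midpoint times the bin width). The domain, mesh size and growth parameters are hypothetical. Because the `pnorm()` differences telescope, the cdf probabilities sum to exactly the probability of landing inside [L, U].

```r
L <- 0; U <- 10; n_mesh <- 50
bounds   <- seq(L, U, length.out = n_mesh + 1)
h        <- bounds[2] - bounds[1]                      # bin width
size_2   <- (bounds[-1] + bounds[-(n_mesh + 1)]) / 2   # midpoints
size_2_l <- bounds[-(n_mesh + 1)]                      # lower bin edges
size_2_u <- bounds[-1]                                 # upper bin edges

size_1 <- 5
mu_g   <- 0.8 + 0.9 * size_1  # hypothetical growth mean
sd_g   <- 0.5                 # hypothetical growth sd

# Midpoint rule: density at the midpoint times the bin width
g_midpoint <- dnorm(size_2, mu_g, sd_g) * h
# "cdf" rule: probability mass falling in each bin
g_cdf <- pnorm(size_2_u, mu_g, sd_g) - pnorm(size_2_l, mu_g, sd_g)

c(midpoint = sum(g_midpoint), cdf = sum(g_cdf))
```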

uninformative error when iteration matrix is all NAs

I'm not sure what I did to cause this error when running make_ipm() yet, but I tracked it to this line of code with debug().
https://github.com/levisc8/ipmr/blob/17a3db21391be33d793402592e66c7833342e097/R/internal-make_ipm.R#L1301
mat is all NAs so any(mat < 0) returns NA and the if() errors with:

Error in if (any(mat < 0)) { : missing value where TRUE/FALSE needed

I'm guessing my all-NA iteration matrix shouldn't have even existed and it should have errored earlier. If I figure out what I did wrong, I'll reply here.
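A sketch of a guard that could run before the `any(mat < 0)` check so that an all-NA matrix fails with an informative message. `check_iter_mat()` and its wording are hypothetical, and the root cause of the NAs in this report is still unknown.

```r
mat <- matrix(NA_real_, nrow = 3, ncol = 3)

# Reproduces the reported failure: any(mat < 0) is NA, and if (NA) errors
orig_err <- tryCatch(if (any(mat < 0)) "negative",
                     error = function(e) conditionMessage(e))
orig_err

# Hypothetical check with informative messages
check_iter_mat <- function(mat, name = "iteration matrix") {
  if (all(is.na(mat))) {
    stop("The ", name, " contains only NA values; check the vital rate ",
         "expressions and parameter values that produced it.", call. = FALSE)
  }
  if (anyNA(mat)) {
    warning("The ", name, " contains some NA values.", call. = FALSE)
  }
  if (any(mat < 0, na.rm = TRUE)) {
    warning("The ", name, " contains negative values.", call. = FALSE)
  }
  invisible(TRUE)
}

new_err <- tryCatch(check_iter_mat(mat), error = function(e) conditionMessage(e))
new_err
```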