Trait-Driven Models of Ecology and Evolution :evergreen_tree:

Home Page: https://traitecoevo.github.io/plant

Makefile 0.06% R 30.06% C++ 68.15% CSS 0.02% TeX 1.71%

ecology evolution demography plant-physiology c-plus-plus r trait dynamic simulation science-research

plant's Introduction

plant: A package for modelling forest trait ecology and evolution

The plant package for R is an extensible framework for modelling size- and trait-structured demography, ecology and evolution in simulated forests. At its core, plant is an individual-based model where plant physiology and demography are mediated by traits. Individual plants from multiple species can be grown in isolation, in patches of competing plants or in metapopulations under a disturbance regime. These dynamics can be integrated into metapopulation-level estimates of invasion fitness and vegetation structure. Accessed from R, the core routines in plant are written in C++. The package provides for alternative physiology models and for capturing trade-offs among parameters. A detailed test suite is provided to ensure correct behaviour of the code.

Citation

Falster DS, FitzJohn RG, Brännström Å, Dieckmann U, Westoby M (2016) plant: A package for modelling forest trait ecology & evolution. Methods in Ecology and Evolution 7: 136-146. doi: 10.1111/2041-210X.12525

Documentation

An overview of the plant package is given by the above publication. Further background on the default FF16 growth model is available in Falster et al 2011 (10.1111/j.1365-2745.2010.01735.x) and Falster et al 2017 (10.1101/083451).

plant comes with a lot of documentation, available at https://traitecoevo.github.io/plant/. Initial versions of some of the material there were also included as supplementary material with the publication about plant, which can be accessed here.

Package structure

plant is a complex package, using C++14 behind the scenes for speed, exposed to R as R6 classes (via the Rcpp and RcppR6 packages). In this blog post, Rich FitzJohn and I describe the key technologies used to build the plant package.

If you are interested in developing plant you should read the Developer Notes.

Installation

Requirements

You must be using R 4.1.0 or newer. At this stage the package is not on CRAN. Your options for installing are described below.
Installation requires a C++14-compatible compiler (OSX >= 10.10/Yosemite satisfies this, as do standard Linux Ubuntu 12.04 and 14.04). On Windows machines you will need to install Rtools. When I tried this in RStudio, the program automagically sensed the absence of a compiler and asked if I wanted to install Rtools. Click Yes!

Option 1, using remotes::install_github

The plant package can be installed direct from github using the remotes package:

remotes::install_github("traitecoevo/plant", dependencies=TRUE)

To install a specific (older) release, choose the version number you want from https://github.com/traitecoevo/plant/releases, e.g.

remotes::install_github("traitecoevo/plant@v1.0.0", dependencies=TRUE)

with "v1.0.0" replaced by the appropriate version number. Note that the latest version of plant resides on the develop branch, which is released sporadically. plant follows semantic versioning, meaning that a change in major version indicates a potential break in backward compatibility.

Option 2, building from source

If familiar with git you might find it easiest to build plant directly from the source code. This is most useful if developing new models or strategies, or to contribute new features.

First, clone the plant repository

git clone https://github.com/traitecoevo/plant

Open an R session in the folder, then to install dependencies run

devtools::install_deps()

Then to compile the project

devtools::install()

devtools::load_all()

Usage

Here are some example publications using plant:

Falster DS, FitzJohn RG, Brännström Å, Dieckmann U, Westoby M (2016) plant: A package for modelling forest trait ecology & evolution. Methods in Ecology and Evolution 7: 136-146. DOI: 10.1111/2041-210X.12525 code: github
Falster DS, Duursma RA, FitzJohn RG (2018) How functional traits influence plant growth and shade tolerance across the life cycle. Proceedings of the National Academy of Sciences 115: E6789–E6798. DOI: 10.1073/pnas.1714044115 code: github
Falster DS, Kunstler GK, FitzJohn RG, Westoby M (2021) Emergent shapes of trait-based competition functions from resource-based models: a Gaussian is not normal in plant communities. The American Naturalist 198: 256–267. DOI: 10.1086/714868 code: github

plant's Issues

Translation for densities

I think we resolved this, but I don't immediately see code that tests it. Can you (@dfalster) point me at the docs, so we can get this one nailed down while sorting out the comparison?

Disturbance regime within Parameters should be simplified

Perhaps don't keep the whole Disturbance object in the Parameters vector, but just the mean disturbance interval. Then each Patch will create one, given the parameters.

Seed rain should be part of the `Parameters`

Set up should be similar to disturbance regime. This will simplify the setting of seed rain, and help give a more unified interface with the stochastic model.

The motivation here is simplifying the comparison with the reference model. The counter motivation is just that we'll be solving for zero seed rain in running the EBT, which makes it seem better suited to being an argument. But I don't think that's a good reason for having it separate.

Better generation of templated types.

At the moment, I have things like new(PatchCohortTop, ...) to generate a new object of type Patch<CohortTop>. What would be better is if there was a function, say patch, that we could do patch(CohortTop) (or better Patch(CohortTop), but I'm not sure if the generating object will allow this easily?).

The object CohortTop is of class C++Class.

If there was some way of doing something like this:

RcppExport SEXP patch(SEXP Individual, ...) {
  SEXP ret = R_NilValue;
  if (Individual.type == Plant) { // pseudocode: dispatch on the requested type
    Patch<Plant> obj(...);
    ret = Rcpp::wrap(obj);
  } else if (...) { // more valid types
  } else {
    ::Rf_error("Cannot construct valid patch");
  }
  return ret;
}

Check initial conditions documentation

There is some uncertainty in the equations about whether $\pi_0$ should appear in the initial condition for $n$.

Public members of Strategy and Parameters

Currently, there is a control member of Strategy, and all members of Parameters that are public. This probably ties things a bit too much to the current implementation and should be reviewed.

ode_values is not sufficient to reinitialise Species<CohortTop>

We also need to track CohortTop::pr_patch_survival_at_birth per cohort, and the ode variables for Species::seed (the boundary condition).

Once this is fixed, some of the tests will simplify as recreating the initial conditions reliably is tricky.

Simple driver for a single patch

It would be nice to remove the ode driving code from Patch. We should be able to drive this easily enough from R, I think? Or make a small wrapper class that takes care of the details.

Comparison with old version

We have the comparison with the individual growth model, but I'd like to get some comparisons with the previous EBT itself.

Previously, we wrote some small programs that would generate a series of outputs; run the new version of the model with the same parameters and see how we compare.

This is going to be tricky because the models will not line up particularly well; I'd expect fairly large quantitative differences. So rather than the random parameters that we picked before, we might want to select parameters that generate qualitative differences in model output.

This is further complicated by the issues around what we do if there is a difference. Bug finding through two separate versions of the model will be time consuming. At the same time, knowing that we have this correct is important.

Consistency of iterator typedefs (and others)

There are a number of iterator typedefs, but two basic styles: those for the container-type classes (using e.g. species_iterator) and the ode classes (using ode_iter). Probably a good idea to harmonise these at some point.

Ode statistics and control parameters are ignored

Currently, ODE statistics (count and failed_steps) are ignored; once they become bad enough, they should trigger a failure, to stop us getting tied up in impossible grinding calculations.

Similarly, the control parameters (step_size_min, step_size_max, no_steps_max) are never set or used.

How long to run EBT for?

Need to port over (and/or adjust) the logic from faster-traitdiversity.

Original version was based on cumulative probability of patch survival, I believe.

Cohort refinement code only works for one species

Needs expanding to work for more than one species.

Improve calculation of Ode error control

Daniel had a nice weighting scheme for the ODE stepper. Document the previous scheme, and work out how to include weights in our ODE stepping in a fairly general way.

Turn into R package

This will completely remove the issues around paths and loading, which is starting to become a little tedious.

This probably only requires a NAMESPACE and DESCRIPTION file and we're done, so should be easy. Some of the helper code would move into R/, but some of that is test-specific. Some care might be needed for the original R implementation of the model though. It might be best to shift that into the tests directory rather than actually include it in the exported parts of the package.

We'll need to look at how Rcpp.package.skeleton() suggests sorting out the linking, includes and module loading.

Mutant fitness needs calculating

This is the fitness of a trivial number of individuals; so introduce at a level where they do not affect the light environment.

Practically, the way to do this is to just exclude these strategies from the calculation of light environment, I think.

Patch initialisation unclear

The logic around how the initial seed variables are computed is still unclear in Patch (and to a degree in Species, too). This should be simple, so go through and confirm we're doing the correct thing, document it and simplify what is fairly opaque at the moment.

SeedRain for the stochastic model?

In theory this is quite simple -- if the seed rain is a constant process, unaffected by the model itself, then it is representable by a vector of rates (r1, r2, ..., rn) for n strategies; over a period of time dt, the expected number of seeds that arrive is then Poisson distributed with mean ri * dt for the ith species. We can just take a draw of that many seeds and add them to the seed input from dispersal.

Seeds produced could end up in that returned rate, perhaps?
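The Poisson draw described above could look roughly like the sketch below. The function name `draw_seed_arrivals` and the use of a standard-library generator are assumptions for illustration only; inside the package the draw would presumably come from R's own random number stream instead.

```cpp
#include <random>
#include <vector>

// Hypothetical helper (not part of the existing code): for each strategy i
// with constant arrival rate rates[i], draw the number of seeds arriving
// over an interval of length dt from a Poisson distribution with mean
// rates[i] * dt.
template <class Generator>
std::vector<int> draw_seed_arrivals(const std::vector<double>& rates,
                                    double dt, Generator& gen) {
  std::vector<int> n_seeds;
  n_seeds.reserve(rates.size());
  for (double r : rates) {
    const double mean = r * dt;
    if (mean > 0.0) {
      std::poisson_distribution<int> pois(mean);
      n_seeds.push_back(pois(gen));
    } else {
      // std::poisson_distribution requires a strictly positive mean,
      // so a zero rate is handled explicitly.
      n_seeds.push_back(0);
    }
  }
  return n_seeds;
}
```

The returned counts would then be added to the seed input from dispersal for each species.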

Move schedule building code into tree

Currently in scripts/build_schedule-fun.R.

This is just a reminder for me before I start relying on that code elsewhere...

Disturbance should use R's Weibull distribution functions

At the moment, things are computed manually, but it would be slightly faster and less error-prone to do so via Rmath's Weibull functions. In particular, that will allow better sampling of probabilities of disturbance for the stochastic model if that turns out to be useful.

This primarily affects the methods Disturbance::set_parameters_post_hook and Disturbance::survival0.

Error handling via exceptions.

Apparently, Rf_error is discouraged as it can cause leaks by skipping destructors. Using proper C++ exceptions would be a better long-term solution.

The big stumbling block is getting sensible strings built, as previously I was using printf style composition. There is a boost library that supports this, but the alternative is substantially more cluttered code for a minor realised gain.

The effort/payoff relationship here will change if we ever start responding to exceptions within compiled code rather than just bailing back to R.
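As a rough idea of how the string building might be kept uncluttered without boost, here is a minimal sketch using a string stream and variadic templates. The names `build_message` and `fail` are hypothetical, and this assumes a compiler with C++11 support or later.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Base case for the recursion below: nothing left to append.
inline void append_all(std::ostringstream&) {}

// Stream the first argument, then recurse on the rest.
template <class T, class... Rest>
void append_all(std::ostringstream& out, const T& x, const Rest&... rest) {
  out << x;
  append_all(out, rest...);
}

// Hypothetical helper: compose a message from any streamable pieces.
template <class... Args>
std::string build_message(const Args&... args) {
  std::ostringstream out;
  append_all(out, args...);
  return out.str();
}

// Hypothetical helper: throw a C++ exception carrying the composed
// message, so destructors run as the stack unwinds.
template <class... Args>
[[noreturn]] void fail(const Args&... args) {
  throw std::runtime_error(build_message(args...));
}
```

A call such as `fail("Cannot construct valid patch for ", n, " species")` would then replace a printf-style error call, with the exception converted to an R error at the boundary.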

Different error scales grind EBT to halt

The seed production for the largest cohort is on a different scale to the rest of the population, and causes the ODE error checking to fail; it looks like there is no time step small enough to get the errors to the correct order.

First, confirm that this suspicion is correct.

If so, we could try and work in a log-basis for the seed production. Alternatively, we could discard some of the error checking here, or check only on relative/absolute error. Check to see what the original implementation did as a reference.

Bounds checking on input parameters

Currently it is possible to create a Strategy with impossible parameters -- things like negative heights. There should be some bounds checking; at least ensuring positiveness for most values. Should be easy to fix:

new(Strategy, list(lma=-1)) # fails
new(Strategy, list(hmat=-1)) # surprisingly does not fail
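A minimal sketch of the kind of check that could run when parameters are set. `ParameterList` and `check_all_positive` are hypothetical stand-ins, not the actual Strategy interface, and a real version would need a per-parameter rule for any value that is allowed to be zero or negative.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the named parameter list passed to Strategy.
typedef std::map<std::string, double> ParameterList;

// Throws std::invalid_argument if any parameter is not strictly positive.
// The test is written as !(x > 0) so that NaN is rejected as well.
inline void check_all_positive(const ParameterList& pars) {
  for (ParameterList::const_iterator it = pars.begin();
       it != pars.end(); ++it) {
    if (!(it->second > 0.0)) {
      throw std::invalid_argument("Parameter '" + it->first +
                                  "' must be positive");
    }
  }
}
```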

Trapezium integration calculation slow due to copying

The trapezium calculation takes a lot of time (surprising amount) due to allocation/freeing of vectors. Try to do this without the temporary space (perhaps via iterators?).
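One way this might look with iterators, as a sketch only: the running sum is accumulated in a single pass, so no temporary vectors of differences or midpoints are allocated. The function name and signature are assumptions, not the existing interface.

```cpp
// Trapezium rule over the points (x, y), where x is the range
// [x_first, x_last) and y_first points at the matching y values.
// Accumulates in a single pass without allocating any temporaries.
template <class ForwardItX, class ForwardItY>
double trapezium(ForwardItX x_first, ForwardItX x_last, ForwardItY y_first) {
  if (x_first == x_last) {
    return 0.0;  // empty range: nothing to integrate
  }
  double total = 0.0;
  double x0 = *x_first;
  double y0 = *y_first;
  for (++x_first, ++y_first; x_first != x_last; ++x_first, ++y_first) {
    const double x1 = *x_first;
    const double y1 = *y_first;
    total += 0.5 * (x1 - x0) * (y1 + y0);
    x0 = x1;
    y0 = y1;
  }
  return total;
}
```

For example, with x = (0, 1, 3) and y = (0, 2, 2) the two panels contribute 1 and 4, giving 5.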

Multi-species SeedRain issues

There are a couple of related issues with the seed rain that will cause problems when there are multiple species.

Firstly, the environment does not expose a way to update the current index for the seed rain when iterating over the different species.

Secondly, the Patch does not carry out this iteration!

As a result, at the moment everything has the seed rain from the first species only.

Presentation for scientific meeting on Nov 6

Should include some visualisations, perhaps from current talk

Outline machine learning challenges

Encapsulation (or lack thereof) in Environment members

Both light_environment and time are accessed/set via get/set methods, but no checking or anything interesting happens here. Is this really the best way to do this? Are the set methods only interesting/useful from R? Is anything actually being encapsulated, or is this being a bit of a pseudo-class?

Store/replay exact times of calculation during ODE

Rather than use the adaptive ODE stepper, we need to be able to store/replay the times at which calculations are carried out. This will be essential for testing mutant fitness in the EBT. I don't think a similar requirement holds for the individual-based model.

Prepare documentation for TREE package

Want to get documentation sorted out for the following, names in brackets responsible for first draft:

scientific directions (Daniel)
physiological model (copy from paper and adjust: height rather than leaf mass) (Daniel)
EBT math (Daniel)
evolutionary algorithms (Daniel)
code design (Rich)
machine learning challenges (both)

Get Felix a demo script

Loading cannot please both R CMD check and devtools?

In R/zzz.R, the module is loaded. If done via loadRcppModules(), R CMD check is happy, but devtools cannot generate the documentation. If done via loadModule("tree", TRUE) within .onLoad(), R CMD check cannot run set_sane_gsl_error_handling within .onLoad().

This is related to issues between the devtools and Rcpp packages and may be resolved at some point:

http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2012-March/003601.html
r-lib/devtools#61
r-lib/devtools#193
r-lib/devtools#253

Expose control parameters in ODE solver

Currently control parameters are not exposed or set (they can't be set at present!).

These include the step controls in OdeControl (especially eps_abs and eps_rel), parameters step_size_min, step_size_max and no_steps_max within Solver.

In contrast with the simpler classes, like Integrator, these probably don't all want to pile in with the constructor.

Lookup set_parameter can change state even on failure

If parameters in a Lookup-derived object are set, but some are invalid, parameters will be set down the list until the first invalid parameter. Ideally if parameter setting failed, the object would be unchanged.

This only affects access from R, and is not a huge priority because the "correct" thing to do here is not likely to be relied on in any of our code.
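If this is ever worth fixing, one option is to apply the updates to a copy and only commit once everything has validated. The sketch below uses a hypothetical `LookupSketch` class with two made-up default values; it is not the real Lookup interface.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a Lookup-derived object: a fixed set of named
// parameters that can be updated from a name -> value map.
class LookupSketch {
public:
  LookupSketch() {
    values_["lma"] = 0.2;    // illustrative defaults only
    values_["hmat"] = 15.0;
  }

  // Either every entry in `updates` is applied, or the object is unchanged.
  void set_parameters(const std::map<std::string, double>& updates) {
    std::map<std::string, double> candidate(values_);  // work on a copy
    for (const auto& kv : updates) {
      auto pos = candidate.find(kv.first);
      if (pos == candidate.end()) {
        throw std::invalid_argument("Unknown parameter: " + kv.first);
      }
      pos->second = kv.second;
    }
    values_.swap(candidate);  // commit; swapping does not throw
  }

  double get(const std::string& name) const { return values_.at(name); }

private:
  std::map<std::string, double> values_;
};
```

The cost is one copy of the parameter map per call, which should be negligible for something only reached from R.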

Change light environment calculations in Falster-traitdiversity for comparison with TREE

Still differences in results when comparing output from TREE to Falster-traitdiversity. These differences could arise from

differences in implementation, e.g. use of cohort tops vs cohort centres
differences in numerical issues, e.g. integration techniques and accuracies
unidentified bugs in either version.

Rich is keen to track down the error to ensure against bugs.

To aid comparison, Daniel suggests modifying Falster-traitdiversity to do calculation of light environment using cohort tops rather than cohort centres. This would involve writing a new version of function EBT_Base::calculate_env_at_height using cohort tops, which are already implemented.

At that point the two implementations should be identical, except for various numerical issues.

Description of "leaf area above" for cohorts

Check that the approach is correct and add justification to the EBT description document.

ode_values is not sufficient to reinitialise Species<CohortDiscrete>

Similar to #40, but simpler, we also need to keep track of how many individuals are in each discrete cohort. This is the simpler case, so probably should get fixed first, and allow thinking about a general interface.

Add sapwood turnover

Requires a size dimension for the diameter of the plant. It should be relatively straightforward to extend to a second size dimension, provided we have a single starting state.

Below are some notes from a conversation with Åke Brännström about the second dimension (link to email):

I believe that a geometric increase in complexity in a transport
equation only occurs when (a) you have a diffusive term or (b) the
initial data is not one-dimensional. The latter is an interesting
observation. Given any n-dimensional transport equation and any
1-dimensional curve as initial data, the EBT method would almost
certainly be able to solve the equation in only n times the number of
operations needed for a genuine 1-dimensional case. This is not the
case for a finite difference method or an upwind scheme and may be a
important feature of the EBT method. It may be, however, that with
1-dimensional initial data, the n-dimensional transport equation can
be rewritten to a system of PDEs, each with one non-temporal variable.
I will have to think more about whether this is possible.

one can directly write down a transport
equation for the density n(ms, mh, t) and this equation can be solved
without a geometric increase in complexity provided that the initial
data is one-dimensional and the birth state at most one-dimensional.

Unclear how `Species::initialise` should treat private member `seed`

Something like this might be desirable on clear()

template <class Individual>
void Species<Individual>::initialise() {
  Individual seed_new(strategy.get());
  seed = seed_new;
}

This is primarily to guarantee that the seed has the same state after a clear. However, this is only really a problem for Species<CohortTop> at the moment.

Dispersal survival in EBT

In contrast to Daniel's implementation, the survival during germination is not yet applied to the seed output. I suspect we'll sort this out during the fitness calculations, but this is something to be aware of.

I think I'm actually in favour of never using Pi_0 within the CohortTop, because it can't know that number, as it's a Patch-specific quantity. We just need to make sure that we deal with it later. So during post-processing make sure that calculation happens, and recognise that seed output recorded in CohortTop is pre dispersal.

Don't reinitialise integrator for each plant, each step.

I'm not sure if this is a bottleneck, so check first!

In Plant::compute_assimilation, a new integrator is created for every step. It could be faster (but perhaps more memory intensive) to store the integrator within the Plant object. Probably better would be to provide a partly-cooked object that needs the functor object that we do create already.

If we do that, then Plant::compute_vars_phys needs to change signature to take the integrator object and pass it through to compute_assimilation.

Alternatively, the environment could probably hold the integrator fairly easily? But that feels a bit dirty.

Remove cohorts once density drops too low

Say 10^-8 or something. Then we can fit more cohorts in at the beginning where they are important.

Would want to be able to turn this off, to make sure it doesn't affect anything.
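A sketch of the pruning step, assuming cohorts live in a vector and carry a density. `CohortSketch` and `prune_cohorts` are hypothetical names, and a non-positive threshold is used here as the "off" switch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal cohort: just the fields needed for this sketch.
struct CohortSketch {
  double height;
  double density;
};

// Remove cohorts whose density has dropped below `threshold`, keeping the
// order of the survivors. A threshold <= 0 disables pruning entirely.
// Returns the number of cohorts removed.
inline std::size_t prune_cohorts(std::vector<CohortSketch>& cohorts,
                                 double threshold) {
  if (threshold <= 0.0) {
    return 0;
  }
  const std::size_t n_before = cohorts.size();
  cohorts.erase(
      std::remove_if(cohorts.begin(), cohorts.end(),
                     [threshold](const CohortSketch& c) {
                       return c.density < threshold;
                     }),
      cohorts.end());
  return n_before - cohorts.size();
}
```

In the real model the ODE state would presumably need resizing after a removal, which this sketch ignores.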

Default cohort introduction times

The original implementation had default introduction times set up so that there was a higher density of introductions early on in patch age. This was set up so that the initial distribution of cohort times mimics the final distribution to some degree.

The function was:

void EBT_Base::set_default_cohort_introduction_times(double end, double small_step, double large_step, double multiplier)
{
        cohort_intro_default.clear();
        double dt=0, time=0;
        while(time <= end )
        {
                cohort_intro_default.push_back(time);
                dt = pow(2.0, floor(log2(time*multiplier)));
                time += max(min(dt, large_step), small_step);
        }
        cohort_intro_default.push_back(time+dt);        
}

So the step size increases linearly with time, but in discrete jumps induced by the pow(2.0, floor(log2(time * multiplier))) line. The steps are bounded to lie in [small_step, large_step]. Here is the distribution graphically with the default parameter values (running out until steps become equally spaced):

end <- 30
small.step <- 1e-5
large.step <- 2.0
multiplier <- 0.2
tt <- numeric(0)
dt <- time <- 0
while (time <= end) {
  tt <- c(tt, time)
  dt <- 2^floor(log2(time * multiplier))
  time <- time + max(min(dt, large.step), small.step)
}

plot(tt[-length(tt)], diff(tt), log="xy", xlab="Time", ylab="Step size")
curve(x * multiplier, add=TRUE, col="red")
curve(2^floor(log2(x * multiplier)), add=TRUE, col="blue", lty=2, n=1001)
abline(h=c(small.step, large.step), col="grey")

I presume there is a good reason for the stepped distribution (presumably so that each step size is double the previous one, which relates to how subsequent refinements will work).

So: should we keep this approach, or just have the linear relationship (i.e. the red line, switching to the grey lines where appropriate)? This is really easy to change on the fly, but I'd like to document the reason for the more complicated approach if we stick with it.
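For comparison, the simpler linear alternative (the red line, clamped to the grey lines) might be written as below. This is a sketch with a hypothetical function name; unlike the original it returns the times rather than storing them, and it omits the extra final time that the original pushes after the loop.

```cpp
#include <algorithm>
#include <vector>

// Sketch of the linear alternative: the step is time * multiplier,
// clamped to lie in [small_step, large_step], with no power-of-two
// rounding.
inline std::vector<double> cohort_introduction_times_linear(
    double end, double small_step, double large_step, double multiplier) {
  std::vector<double> times;
  double time = 0.0;
  while (time <= end) {
    times.push_back(time);
    const double dt = time * multiplier;
    time += std::max(std::min(dt, large_step), small_step);
  }
  return times;
}
```

Called with the default values used in the plot above (end 30, small step 1e-5, large step 2, multiplier 0.2), this gives steps that grow smoothly from the lower bound to the upper bound.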

Vectorisation of C++ functions

For ease of calling from R, it would be nice if some of the C++ functions were vectorised. There is already some of this in the r_ interface, I think. This could also help performance by avoiding a lot of R-to-C++ call overhead.

One example is the functions Disturbance::survival_probability and Disturbance::density, which can be sensibly vectorised with respect to time.

This adds a certain amount of boilerplate to the files though, so probably best not to race around and do this everywhere. Better might be a templating solution, so that we could write

    .method("density",         util::vectorize<double>(&model::Disturbance::density))

or something.
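A rough sketch of what the wrapping part of such a helper could look like. Everything here is an assumption: the signature differs from the `util::vectorize<double>(...)` spelling above, `DisturbanceSketch` is a placeholder with a made-up formula, and the resulting function object is not something that can be handed to `.method(...)` as is.

```cpp
#include <cstddef>
#include <vector>

namespace util {

// Wraps a const member function `double T::f(double) const` so that it
// can be applied elementwise to a vector of inputs.
template <class T>
class Vectorized {
public:
  typedef double (T::*Method)(double) const;
  explicit Vectorized(Method method) : method_(method) {}

  std::vector<double> operator()(const T& obj,
                                 const std::vector<double>& x) const {
    std::vector<double> ret;
    ret.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      ret.push_back((obj.*method_)(x[i]));
    }
    return ret;
  }

private:
  Method method_;
};

// Deduces T from the member function pointer.
template <class T>
Vectorized<T> vectorize(double (T::*method)(double) const) {
  return Vectorized<T>(method);
}

}  // namespace util

// Placeholder class for illustration; the formula is NOT the real
// Disturbance::density.
struct DisturbanceSketch {
  double scale;
  double density(double time) const { return scale * time; }
};
```

The boilerplate then lives in one place, and each vectorised method costs a single line where it is registered.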

Move key size variable from leaf mass to height

If the key size dimension is height, the model becomes a lot easier to think about (especially with respect to the height at maturation trait).

This change would affect a lot of Plant and related subclasses, and probably require a fair bit of work. In particular, the leaf mass at birth becomes the height at birth, which is slightly harder to find because it's not naturally bounded (bounded at the lower limit, but not at the other, aside from Strategy::hmat). However, if we can move easily enough from total mass to height (or something similar), that would make a reasonable bound.

Last event of CohortSchedule should be treated differently?

There is an issue here -- the last time that we run for needs to introduce a cohort, but we don't actually want to do that. So I might need to be more creative about how this works.

Consistency of control parameters

Control parameters are set differently for different objects; at initialisation for Integrator and FindRoot, via a set function for AdaptiveSpline and not at all for ode::Solver. Needs tidying up.

Inconsistent use of methods "reset" and "clear"

Still quite a few inconsistent methods -- especially clear (more common on container-like objects) and reset. However, things like Patch are often a container, and sometimes something more.

This is a particular problem for Patch, which has an initialise method and a clear method that do similar things.

Keep Patch::initialise the way that it is, but change Patch::clear to be Patch::reset and have it clear out all species, clear the environment (and recompute) and clear out the ode solver. Then set_seed_rain will trigger reset, as will EBT::set_cohort_schedule.

I think I'll get rid of all clear methods except for things that are clearing out container-like things -- but I'm not sure that we have any of those!

How to compute light environment with CohortTop

With CohortTop, we don't yet know how to compute the light environment. This is the relevant equation eq:light in doc/EBT.md:

$E(z,a) = \exp \left(-c_{ext} \sum_{i=1}^{N} \int_{0}^{\infty} \phi(m) \, L(z, h(m)) \, n(x_i, m, a) \, dm \right)$

EBT and "patch_area"

There is no concept of patch_area in the EBT, but it still appears in the Patch::canopy_openness calculation. There are two solutions for getting rid of this:

Specialise the canopy_openness method for types of CohortTop
Set parameters->patch_area = 1 within the EBT.

Both have weaknesses; the first means that if we implement CohortMean we'll have to do the same there. The second means we'll have to watch out for anything that might set patch_area in the future (though this should be OK because parameters are immutable once the EBT is created at the moment).

Also, of some concern, none of the tests picked up that this is still unresolved. The main effect is simply to rescale parameters->c_ext, though, which is never fatal.

Provenance of `c_d0`

In the original EBT version, the Strategy parameter c_d0 was set as

0.01/exp(-c_d1*608.0);

(c_d1 is 0.0065)

Rather than the value of 0.52 given in the paper, this evaluates to 0.520393415085166. However, the 608 here is the same as the "default" value for rho, so should this actually change with rho?
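If c_d0 is meant to track rho, the expression above could be wrapped as a function of rho. This is only a sketch: the name `c_d0_from_rho` is hypothetical, and reading the 0.01 as a baseline value fixed at the default rho is my interpretation of the expression, not something the original code states.

```cpp
#include <cmath>

// Hypothetical helper: recompute c_d0 from rho so that
// c_d0 * exp(-c_d1 * rho) stays equal to `baseline`.
// With c_d1 = 0.0065 and rho = 608 this reproduces the original value.
inline double c_d0_from_rho(double c_d1, double rho, double baseline = 0.01) {
  return baseline / std::exp(-c_d1 * rho);
}
```

With the defaults, `c_d0_from_rho(0.0065, 608.0)` gives the 0.5204 figure quoted above, and any other rho would shift c_d0 accordingly.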