canmod / macpan2

Rebuilding https://github.com/mac-theobio/McMasterPandemic/

Home Page: https://canmod.github.io/macpan2/

License: GNU General Public License v3.0

R 60.20% C++ 38.71% Makefile 0.26% TeX 0.63% Rez 0.01% MATLAB 0.19%

compartmental-models epidemiology forecasting mixed-effects model-fitting optimization simulation-modeling simulation

macpan2's Issues

Modify the square bracket function so that arbitrary sub-blocks of the matrix can be extracted

There are three arguments:

m -- a matrix being accessed
i -- the row index
j -- the column index

Currently i and j can only take 1-by-1 matrices. Please modify so that they can take arbitrary matrices of indices and apply the following algorithm (or a more efficient equivalent).

Flatten i and j by stacking columns on top of each other
Return the sub-matrix of m containing the rows identified by i and columns identified by j
The order of the rows and columns in this matrix should be consistent with the ordering provided by i and j
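A minimal sketch of the algorithm above, assuming a hypothetical column-major `Mat` type standing in for the engine's matrix type and zero-based indices stored as doubles. Because storage is column-major, "stacking columns on top of each other" is simply reading the index matrix in storage order.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical column-major dense matrix, used only for this sketch.
struct Mat {
  std::size_t rows, cols;
  std::vector<double> v;  // column-major values
  Mat(std::size_t r, std::size_t c) : rows(r), cols(c), v(r * c, 0.0) {}
  double& at(std::size_t i, std::size_t j) { return v[i + j * rows]; }
  double at(std::size_t i, std::size_t j) const { return v[i + j * rows]; }
};

// Flatten an index matrix by stacking its columns (its storage order).
// Indices are assumed to be zero-based and non-negative.
std::vector<std::size_t> flatten_indices(const Mat& idx) {
  std::vector<std::size_t> out;
  out.reserve(idx.v.size());
  for (double x : idx.v) {
    if (x < 0) throw std::out_of_range("negative index");
    out.push_back(static_cast<std::size_t>(x));
  }
  return out;
}

// m[i, j]: the rows listed in i and the columns listed in j, in the order given.
Mat square_bracket(const Mat& m, const Mat& i, const Mat& j) {
  const std::vector<std::size_t> rows = flatten_indices(i);
  const std::vector<std::size_t> cols = flatten_indices(j);
  Mat out(rows.size(), cols.size());
  for (std::size_t b = 0; b < cols.size(); ++b) {
    for (std::size_t a = 0; a < rows.size(); ++a) {
      if (rows[a] >= m.rows || cols[b] >= m.cols) throw std::out_of_range("index out of range");
      out.at(a, b) = m.at(rows[a], cols[b]);
    }
  }
  return out;
}
```

For example, with i holding rows 2 and 0 and j holding column 1, the result is a 2-by-1 matrix whose first element comes from row 2, preserving the order given in i.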

Quickstart version 1 -- hello world, code only

Debugging vignette

Write a vignette illustrating the options(error = recover) strategy and why it is the best approach with the object-oriented architecture. It should illustrate how and why the anonymous functions that immediately follow calls to lapply in the stack are often of particular interest.

Model updating functionality

Problem -- synchronization v speed trade-off

We always store constructor arguments as fields. We always try to store information that is derived from the arguments in methods, so that if the argument fields are updated the derivations will be in sync with the new arguments. Sometimes these derived methods are problematically slow (e.g. computing the ad_fun, which really cannot be recomputed each time we need a simulation without severely compromising the performance gains from TMB). On the other hand, we do not want to store this type of derived information as fields, because then it gets out of sync if the arguments are updated.

Solution

Create a Synchronize class that is a direct parent for classes that contain a single derived information method.
Instances of these child classes can be composed with the focal class containing the problematic derived information method.
Classes inheriting from Synchronize also contain a field that caches the results of the derived information method.
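The package itself is R, so the following is only a language-neutral sketch of the caching idea in C++, not the proposed Synchronize class. The names `Synchronized`, `set_args`, and `get` are hypothetical. The key design choice is that every argument update clears the cache, so the slow derivation runs at most once per set of arguments and can never go out of sync.

```cpp
#include <functional>
#include <optional>
#include <utility>

// Hypothetical sketch: a derived value is cached and recomputed only after
// the arguments it depends on have changed.
template <typename Args, typename Derived>
class Synchronized {
 public:
  Synchronized(Args args, std::function<Derived(const Args&)> derive)
      : args_(std::move(args)), derive_(std::move(derive)) {}

  // Updating the arguments invalidates the cache, keeping it in sync.
  void set_args(Args args) {
    args_ = std::move(args);
    cache_.reset();
  }
  const Args& args() const { return args_; }

  // Compute the (possibly slow) derived value on first use, then reuse it.
  const Derived& get() {
    if (!cache_) {
      cache_ = derive_(args_);
      ++computations_;
    }
    return *cache_;
  }
  int computations() const { return computations_; }

 private:
  Args args_;
  std::function<Derived(const Args&)> derive_;
  std::optional<Derived> cache_;
  int computations_ = 0;
};
```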

document `oor` package some more?

It would be helpful to say a little bit more about oor in the README file; I tried to install locally and failed, then figured out that remotes::install_github() would work automatically.

Ability to write model definition files from a model that resulted from a product

Gamma Kernel

User Story

I want to generate a gamma convolution kernel on the c++ side so that I can optimize its shape parameters.

Signature

gamma_kernel(length, proportion, mean, cv)

Arguments

length -- Number of time-steps in the kernel
proportion -- Height parameter for the kernel
mean -- Location parameter for the kernel
cv -- Spread parameter for the kernel

Development Notes

This should already be done in macpan1, where it is the only kernel available. The arguments map as follows: proportion = c_prop, mean = c_mean, cv = c_cv.

See here for a description and here for the implementation.

User Notes

The following expression,

reports = convolution(I, gamma_kernel(14, c_prop, 0.1, 0.25))

would allow a user to fit the c_prop parameter used to control the under-reporting fraction.
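A hedged sketch of gamma_kernel(length, proportion, mean, cv). It is not a port of the macpan1 implementation: it assumes shape = 1/cv^2 and scale = mean * cv^2 (which give the requested mean and coefficient of variation), evaluates the gamma density at the mid-point of each time step, normalizes the weights to sum to one, and scales them by proportion. The discretization rule is an assumption and should be checked against macpan1.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of gamma_kernel(length, proportion, mean, cv).
std::vector<double> gamma_kernel(int length, double proportion, double mean, double cv) {
  if (length < 1 || mean <= 0 || cv <= 0) throw std::invalid_argument("bad kernel parameters");
  const double shape = 1.0 / (cv * cv);
  const double scale = mean * cv * cv;
  // Unnormalized log density at step mid-points; the log scale prevents
  // very peaked kernels from underflowing to all zeros.
  std::vector<double> logw(length);
  for (int k = 0; k < length; ++k) {
    const double x = k + 0.5;
    logw[k] = (shape - 1.0) * std::log(x) - x / scale;
  }
  const double mx = *std::max_element(logw.begin(), logw.end());
  std::vector<double> kernel(length);
  double total = 0.0;
  for (int k = 0; k < length; ++k) {
    kernel[k] = std::exp(logw[k] - mx);
    total += kernel[k];
  }
  // Normalize to sum to one, then scale by the height parameter.
  for (double& w : kernel) w *= proportion / total;
  return kernel;
}
```

With this construction the kernel always sums to proportion, which is what lets c_prop act as the under-reporting fraction.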

Flows product

Create a flows_explicit method in Model that always returns the flows with the optional columns (e.g. from_partition)

Colon operator and sequence function

Sequence

R-side symbol -- seq
Three arguments
- from -- first integer in the sequence
- length -- length of the sequence
- by -- difference between adjacent elements in the sequence
Example:
- Input: seq(0, 4, 6)
- Output: c(0, 6, 12, 18)

Colon

R-side symbol -- :
Two arguments
- from -- first integer in the sequence
- to -- last integer in the sequence
Example:
- Input: 5:7
- Output: c(5, 6, 7)
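A minimal sketch of both functions, using hypothetical names `seq` and `colon` for the engine-side implementations. Counting down when from > to mirrors what R's `:` does (e.g. 3:1 gives 3, 2, 1); whether the engine should support that is an assumption.

```cpp
#include <stdexcept>
#include <vector>

// seq(from, length, by): `length` integers starting at `from`, spaced by `by`.
std::vector<int> seq(int from, int length, int by) {
  if (length < 0) throw std::invalid_argument("length must be non-negative");
  std::vector<int> out(length);
  for (int k = 0; k < length; ++k) out[k] = from + k * by;
  return out;
}

// from:to, inclusive of both ends; counts down when from > to.
std::vector<int> colon(int from, int to) {
  const int step = (from <= to) ? 1 : -1;
  std::vector<int> out;
  for (int x = from; x != to + step; x += step) out.push_back(x);
  return out;
}
```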

Add transmission matrix expressions to StandardExpr

clean up starter models directory to include only actual starter models

The SI_products directory for example should be somewhere else, because if you try to create a compartmental model from this directory it will fail. But any directory in starter_models should be a valid compartmental model.

Update: perhaps what we mean by an 'actual starter model' is one with a README.md file that contains a yaml header? This would mean that other starter models are OK to be in the directory, but they will not show up with show_models() or in https://canmod.github.io/macpan2/articles/example_models.html and so will be unlikely to be discovered.

Update how coverage is saved in github actions

Tests for simulation blocks, r_par_id, and other macros

Finalizing the model definitions vignette

possible inconsistency in terminology for flow spec

in the "Flow between Compartments" section of vignettes/model_definitions.Rmd, the columns of the flows data frame are initially described as

from | to | rate_component | component_type

but then in the bulleted list, the last two columns are named

 flow | flow_type

moreover, in the SIR example, the last two column names in flows.csv are

flow_component | component_type

i'm not sure whether this is a true inconsistency or whether it's just that i'm not yet familiar enough with macpan2, but i thought i'd flag it as it was unclear to me as a macpan2 novice.

Managing Sparsity

Background

We want to avoid sparse matrices for now, but we do have genuinely sparse matrices -- rate and flow matrices -- that will slow things down if expressed as a dense matrix.

Can we do a few things now that will soften this issue?

We are already planning to get the rates in triplet form -- from, to, rate. This raises the possibility of addressing any use cases directly from the triplet form without explicitly forming a matrix -- dense or sparse.

One use case is to just compute the inflow and outflow vectors, which could be solved by a tapply-like function on the triplet form.

However, if we want to take the dominant eigenvalue of the next-generation matrix, we need (I think) to take a matrix inverse, and this will presumably benefit from genuinely sparse methods.

The main concerns with moving to sparse matrices are that it would increase our testing burden and possibly the complexity of some functions that require handling sparse and dense cases differently. But if we make special functions that take triplet form input and then do specialized targeted tasks that 'need' sparse methods (eg nextgen matrices), we could isolate this complexity.

Functions

`groupSums`

Arguments:

column vector, x, of values to sum
column vector, z, the same length as x containing indices into the return vector, y

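#include <algorithm>
#include <vector>
std::vector<double> groupSums(const std::vector<double>& x, const std::vector<int>& z) {
  // y has one slot per group, i.e. max(z) + 1 (z holds zero-based indices)
  std::vector<double> y(z.empty() ? 0 : *std::max_element(z.begin(), z.end()) + 1, 0.0);
  for (std::size_t i = 0; i < x.size(); i++) {
    y[z[i]] += x[i];  // add each value to the total for its group
  }
  return y;
}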
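To illustrate the inflow/outflow use case working directly on the triplet form, here is a sketch assuming a hypothetical `Triplets` struct in which entry k moves an absolute flow (rate already multiplied by the state) from compartment from[k] to compartment to[k], with zero-based indices. Outflow is a group sum by `from` and inflow is a group sum by `to`, so no matrix, dense or sparse, is ever formed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical triplet form: flow k moves `flow[k]` from from[k] to to[k].
struct Triplets {
  std::vector<int> from, to;
  std::vector<double> flow;
};

// Per-compartment totals computed with two group sums over the triplets.
void flow_totals(const Triplets& t, int n_states,
                 std::vector<double>& inflow, std::vector<double>& outflow) {
  if (t.from.size() != t.flow.size() || t.to.size() != t.flow.size())
    throw std::invalid_argument("triplet columns must have equal length");
  inflow.assign(n_states, 0.0);
  outflow.assign(n_states, 0.0);
  for (std::size_t k = 0; k < t.flow.size(); ++k) {
    if (t.from[k] < 0 || t.from[k] >= n_states || t.to[k] < 0 || t.to[k] >= n_states)
      throw std::out_of_range("compartment index out of range");
    outflow[t.from[k]] += t.flow[k];  // group sum by `from`
    inflow[t.to[k]] += t.flow[k];     // group sum by `to`
  }
}
```

For an SIR model with flows S to I of 2.0 and I to R of 0.5, the outflows are (2.0, 0.5, 0) and the inflows are (0, 2.0, 0.5).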

Quickstart guide -- draft descriptive text

Fix tests so that they all have at least dummy objective functions

Make sure to address the off-by-one indexing issue

Script for SV-E-IH-R model

Write a script that directly calls TMBModel and TMBSimulator
Have clear comments about what the model inputs mean
Have the TMBModel object be as close as possible to what we believe now will be generated by the model files translator
Create another version of the script that includes a time-varying parameter

Vignette on simulations involving randomness and rounding

Assign function

User Story

I want to be able to break apart a matrix into smaller pieces (e.g. unpack a state vector into scalar states; state -> S,I,R), so that I can update the matrix/vector using linear algebra but still have convenient access to the components as variables in and of themselves.

Signature

assign(m, x1, x2, ...)

Arguments

m -- A matrix with values
x1, x2, ... -- Matrices that will have their values modified in-place by the values in m

Behaviour

In column-major order, loop over the elements in m and assign them to the elements in the x matrices, also in column-major order. Stop either when all elements in m have been assigned or when all elements in the x matrices have been filled, whichever comes first.

Return Value

A single one-by-one matrix with a zero in it -- this is a stand-in for NULL.
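A minimal sketch of this behaviour and return value, assuming a hypothetical column-major `Mat` struct and passing the x matrices as pointers so they can be modified in place; the x matrices are filled in argument order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical column-major dense matrix, used only for this sketch.
struct Mat {
  std::size_t rows, cols;
  std::vector<double> v;  // column-major values
};

// Copy the values of m, in column-major order, into the elements of the x
// matrices (also column-major), stopping when either side runs out.
// Returns a 1-by-1 zero matrix as a stand-in for NULL.
Mat assign(const Mat& m, std::vector<Mat*> xs) {
  std::size_t k = 0;  // next element of m to assign
  for (Mat* x : xs) {
    for (double& target : x->v) {
      if (k == m.v.size()) return Mat{1, 1, {0.0}};
      target = m.v[k++];
    }
  }
  return Mat{1, 1, {0.0}};
}
```

Unpacking a 3-by-1 state vector into 1-by-1 matrices S, I, and R with this sketch leaves S, I, and R holding the first, second, and third state values.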

The null matrix is in the last position of the 'mats' list. Therefore, on the R side we need to pass one more matrix than the user provides. For sanity we should always pass this null matrix even if the assign function is not used in the model.

This is a difference between mats (on the C++ side) and valid_vars (on the R side): the former has one more element than the latter, and this additional element is the null matrix. To compute its zero-based index into mats, we compute length(valid_vars) on the R side. Therefore, when the assign function is used, expr_output_id = length(valid_vars).

Parse derivations.json and flows.csv

The derivations.json file can be used to generate what we will call 'user-defined expressions' and flows.csv can be used to generate state-updating expressions.

We need the following three methods, each of which takes a model definition and returns all user-defined expressions that should be passed to one argument of TMBModel:

the before argument in TMBModel
the during argument in TMBModel
the after argument in TMBModel

We also need a method to generate the state-updating expressions, which should be appended at the end of the during list generated in the parsing of the user-defined expressions.

Tasks

flows.csv
derivations.json

Concatenation of String objects not finished

Steps

library(testthat); library(macpan2)
x = macpan2:::StringUndottedVector("S", "E", "I", "R")
y = macpan2:::StringUndottedVector("D")
z = macpan2:::StringUndottedVector("S", "E", "I", "R", "D")
expect_identical(c(x, y), z) # failing now

The problem seems to be an inconsistent and incomplete implementation of the value_combiner methods.

pkgdown

C++ rbind_lag and rbind_time validity checking

I think these are currently causing occasional segfaults.

Modify the specs and variable names related to time lags and list variables

Remove lists from the spec
Remove time lags from the spec, at least as an atomic concept
Remove expr_output_count because without lists this will always be 1
Describe how the engine makes both the (1) matrix-valued arguments themselves and (2) the indices to these arguments both available to a developer of a new function
Modify the names of argument value list and argument index list in the C++ code so that they are more descriptive of this idea

Vignette on how to add engine functions

time varying parameter example in a vignette somewhere

GH pages tab that points to example models?

Requested by @bbolker in the more general issue #37

Product interface

Editable lists in model definitions

We had issues with model editing in 'macpan 1.5'. Here we develop the requirements for addressing these issues in macpan2.

R CMD Check Github action can't find an oor function

https://github.com/canmod/macpan2/actions/runs/3672579440/jobs/6208884098

This makes no sense to me, because the oor versions on GitHub and drat both export the method. Sigh.

pkgdown site

One-sided flows without needing to specify a placeholder state

New functions rbind_lag and rbind_time

Objectives

Remove extract_lag, extract_time, select_lag, select_time
Replace with rbind_lag and rbind_time

Specs rbind_lag

Arguments
- m -- a matrix with saved history
- i -- a column vector of integers
Behaviour
- For each value of i, access the value of the matrix, m, that many time steps in the past
- If each value of m has the same number of columns then create a single matrix by stacking the rows on top of each other in the order provided by i -- and return the result
- If not all values of m have the same number of columns then return an error
- If the simulation history is not saved for the first argument, then throw an error unless the user only asks for the current matrix
- At each iteration the vector of lags will determine a set of time-steps -- but only time-steps between 1 and T inclusive are valid, and all others should be thrown away

Specs rbind_time

Same as for rbind_lag, but now the indices in i refer to absolute time steps instead of numbers of time steps in the past.
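A sketch of both specs, assuming a hypothetical `Mat` stored row-major (so stacking rows is a plain append) and a saved history in which history[0] is time step 1 and the last element is the current step. Invalid steps are dropped and mismatched column counts raise an error, as specified; the check for whether history was saved at all is omitted here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dense matrix, stored row-major for easy row stacking.
struct Mat {
  std::size_t rows = 0, cols = 0;
  std::vector<double> v;  // row-major values
};

// Stack the saved matrices at the given absolute (1-based) time steps.
// Steps outside 1..history.size() are thrown away.
Mat rbind_time(const std::vector<Mat>& history, const std::vector<int>& steps) {
  Mat out;
  bool first = true;
  for (int s : steps) {
    if (s < 1 || s > static_cast<int>(history.size())) continue;
    const Mat& h = history[s - 1];
    if (first) {
      out.cols = h.cols;
      first = false;
    } else if (h.cols != out.cols) {
      throw std::runtime_error("inconsistent number of columns");
    }
    out.v.insert(out.v.end(), h.v.begin(), h.v.end());
    out.rows += h.rows;
  }
  return out;
}

// Lags are converted to absolute steps relative to the current (last saved) step.
Mat rbind_lag(const std::vector<Mat>& history, const std::vector<int>& lags) {
  const int current = static_cast<int>(history.size());
  std::vector<int> steps;
  for (int lag : lags) steps.push_back(current - lag);
  return rbind_time(history, steps);
}
```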

Example User

Notes to the future

We might want to extend this to cbind_lag, cbind_time, flatten_lag, and flatten_time, but for now rbind_lag and rbind_time are fine.

Settings products

Make sure anything with RecycleInPlace C++ code doesn't segfault

Objective function in c++

Implement the spec on the C++ side (@guanwg)
Add the normal and Poisson distribution functions to the function list (@guanwg)

Pass through existing R-side functionality now, adding validity checks

This doesn't need to catch everything. The definition of done will be to make one pass through all of the R code.

Convolution

Use-case example

kernel = gamma_kernel(9, 0.1, 0.25, 0.4)  ## pre-simulation loop
reports = convolution(foi * S, kernel)  ## every simulation loop

Specs for convolution

convolution(m, kernel):

m -- matrix with saved history
kernel -- column vector of length less than the number of iterations

At iteration, t, do the following.

If t < length(kernel) return zero, otherwise continue.
For each element of m, get a vector with the history of this element over each of the preceding length(kernel) time steps (including the current time step).
For each element of m, take the inner product between this history vector and the kernel.
Return a new matrix the same shape as m but with the inner products.
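A sketch of these steps, assuming a hypothetical `Mat` type whose history is a non-empty vector with the current value last, and assuming that kernel[0] weights the current step, kernel[1] the previous step, and so on. The spec does not pin down the kernel orientation, so that part is a guess.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dense matrix; storage order does not matter for elementwise work.
struct Mat {
  std::size_t rows = 0, cols = 0;
  std::vector<double> v;
};

// history[0..t-1] holds m at iterations 1..t; history.back() is the current value.
Mat convolution(const std::vector<Mat>& history, const std::vector<double>& kernel) {
  if (history.empty()) throw std::invalid_argument("no saved history");
  const Mat& current = history.back();
  Mat out{current.rows, current.cols, std::vector<double>(current.v.size(), 0.0)};
  const std::size_t t = history.size();
  if (t < kernel.size()) return out;  // not enough history yet: return zeros
  for (std::size_t e = 0; e < current.v.size(); ++e) {
    double inner = 0.0;  // inner product of this element's history with the kernel
    for (std::size_t k = 0; k < kernel.size(); ++k)
      inner += kernel[k] * history[t - 1 - k].v[e];
    out.v[e] = inner;
  }
  return out;
}
```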

Clean up validity messaging

This is a technical task to deal with a limitation of the current machinery for communicating to the user when there is a validity problem. TODO: add more detail

Quickstart2 for a structured model

Using sir_vax

Derivations products

Non-portable makefile warning from R package check

Here is the failing step.

https://github.com/canmod/macpan2/actions/runs/3626152432/jobs/6114886769#step:5:188

Here is the offending file.

https://github.com/canmod/macpan2/blob/main/Makefile

Not sure why.

github pages tweaks

add link to GH repo (I think there's a standard way to do this?)
#40

C++ developer utility function to repeat dimensions of 1 over an n by m matrix

As a C++ developer who gets a matrix with one row or one column,
I want a function to repeat the values in those rows and columns n and m times,
so that I can use it in for loops over rows and columns with n by m matrices.

For example, consider the following function.

case MP2_NORMAL_DENSITY:
  rows = r[0].rows();
  cols = r[0].cols();
  m = matrix<Type>::Zero(rows, cols);
  for (int i=0; i<rows; i++) {
      for (int j=0; j<cols; j++) {
          m.coeffRef(i,j) = -dnorm(r[0].coeff(i,j), r[1].coeff(i,j), r[2].coeff(i,j), 1);
      }
  }
  return m;

Here we have three arguments: observed, r[0]; expected, r[1]; and standard deviation, r[2]. It would be nice to be able to do this:

case MP2_NORMAL_DENSITY:
  rows = r[0].rows();
  cols = r[0].cols();
  r[1] = RecycleInPlace(r[1], rows, cols);
  r[2] = RecycleInPlace(r[2], rows, cols);
  m = matrix<Type>::Zero(rows, cols);
  for (int i=0; i<rows; i++) {
      for (int j=0; j<cols; j++) {
          m.coeffRef(i,j) = -dnorm(r[0].coeff(i,j), r[1].coeff(i,j), r[2].coeff(i,j), 1);
      }
  }
  return m;
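A sketch of the recycling behaviour in the user story, written as a hypothetical `recycle` function that returns a new matrix rather than the in-place `RecycleInPlace` itself, on a hypothetical column-major `Mat`. Any dimension of 1 is repeated to the requested size; any other mismatch is an error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical column-major dense matrix, used only for this sketch.
struct Mat {
  std::size_t rows = 0, cols = 0;
  std::vector<double> v;  // column-major values
  double coeff(std::size_t i, std::size_t j) const { return v[i + j * rows]; }
};

// Expand a matrix with a single row and/or a single column to rows-by-cols.
Mat recycle(const Mat& x, std::size_t rows, std::size_t cols) {
  if (x.rows == rows && x.cols == cols) return x;
  const bool row_ok = (x.rows == rows || x.rows == 1);
  const bool col_ok = (x.cols == cols || x.cols == 1);
  if (!row_ok || !col_ok) throw std::invalid_argument("cannot recycle to the requested shape");
  Mat out{rows, cols, std::vector<double>(rows * cols)};
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      out.v[i + j * rows] = x.coeff(x.rows == 1 ? 0 : i, x.cols == 1 ? 0 : j);
  return out;
}
```

With this helper, a scalar standard deviation in r[2] becomes a rows-by-cols matrix, so the double loop over coeff(i, j) works unchanged.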

Simplify spec

We do not need

DATA_IVECTOR(mats_save_hist) -- because we have (r|c)bind_(lag|time) we have a more flexible way of specifying what history should be saved
DATA_IVECTOR(expr_output_count) -- because we do not have lists of matrices anymore, the output count is always 1

calibrate + forecast vignette

Right now I don't think it's possible for a non-developer to figure out how to use macpan2 for calibration and forecasting. I understand that's not the focus of Irena and Mike's efforts right now (or of the upcoming workshop), but it would be good to have a short document that shows how to do this - maybe for the SIR model, for some simulated data and then for some real SIR-ish data (e.g. take an example or two from the fitode package?)

Move macpan.cpp over to src

Maybe through a make rule so that C++ developers can treat the current location as a development environment, just as we do now in macpan1.

New function: triplets_to_matrix

New function called triplets_to_matrix(x, v, m) with the following arguments.

x -- A matrix
v -- A column vector of values to add to elements of x
m -- A three-column matrix of integers; each row gives a row index into x, a column index into x, and a row index into v

Here is pseudo-code for the function.

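// 1. Fill x with all zeros.
x.setZero();

// 2. Add elements in v to elements of x, according to the indices in m.
//    Row k of m holds (row index into x, column index into x, row index into v).
for (int k = 0; k < m.rows(); k++) {
  x(m(k, 0), m(k, 1)) += v(m(k, 2), 0);
}

// 3. Return x.
return x;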
