A research compendium (data, code, manuscript) for a project on detecting critical slowing down in measles dynamics.

License: MIT License

R 2.49% Shell 0.03% Makefile 0.03% Dockerfile 0.01% Rich Text Format 0.01% HTML 76.23% HiveQL 21.21%

aero critical-slowing-down early-warning-signals measles niger seir pomp

measles-ews

Data and code for a project aimed at detecting critical slowing down in measles dynamics from four cities in Niger.

NOTE: The code in this repository relies on pomp version 1.18. You may get errors if using other versions. To download this specific version, use devtools:

install.packages("devtools")
devtools::install_version("pomp", version = "1.18", repos = "http://cran.us.r-project.org")

Overview

The main goal of this project is to detect critical slowing down (CSD) in real disease dynamics. To do so, we use a model-based approach, which allows us to simulate approaches to transcritical bifurcations. The model, however, is empirically based because we use SEIR models fit to time series data from four cities in Niger. Thus, this work represents a step toward an empirical test of CSD, though still with a lot of help from the model. We use early warning signals (EWS) to detect critical slowing down.

There are five main steps in the analysis.

1. Fit mechanistic SEIR models to weekly case count data from four cities in Niger (requires High Performance Computing).
2. Estimate uncertainty around maximum likelihood parameters via parametric bootstrap (requires High Performance Computing).
3. Simulate replicate emergence and elimination time series using fitted model parameters (does not require High Performance Computing, but is sped up by using multiple cores on the Drake Lab high memory machine).
4. Calculate early warning signals (EWS) in null and test intervals of the replicate simulation time series (does not require High Performance Computing, but is sped up by using multiple cores on the Drake Lab high memory machine).
5. Compute Area Under the Curve values for each EWS using the distribution of EWS calculated from the replicate simulations (no High Performance Computing!).
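
The last step can be illustrated with a rank-based AUC calculation. This is a minimal sketch, not the code used in the analyze-*-grid-sims.R scripts: the helper name `calc_auc` and the simulated EWS values are assumptions made for illustration only.

```r
# Hypothetical helper: AUC from EWS values in the null and test intervals.
# Uses the rank-sum (Mann-Whitney) identity; ties receive average ranks.
calc_auc <- function(null_vals, test_vals) {
  n_null <- length(null_vals)
  n_test <- length(test_vals)
  ranks <- rank(c(test_vals, null_vals))
  u_stat <- sum(ranks[seq_len(n_test)]) - n_test * (n_test + 1) / 2
  u_stat / (n_test * n_null)
}

set.seed(1)
null_ews <- rnorm(500, mean = 0)  # made-up EWS values from the null interval
test_ews <- rnorm(500, mean = 1)  # made-up EWS values from the test interval
auc <- calc_auc(null_ews, test_ews)
```

An AUC near 0.5 means the EWS does not separate the two intervals. Values near 1 mean the statistic tends to be larger in the test interval, and values near 0 mean it tends to be smaller.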

Notebooks

Many of my ideas and work in progress beyond that presented in the manuscript file and the description in this ReadMe can be found in a set of R Markdown Notebooks in the measles-ews/docs/notebooks/ subdirectory. Likewise, progress during the model fitting stage -- different model fitting strategies, approaches for estimating parameter uncertainty -- can be found in the physical TREDENNICK Lab Notebook (#00074).

Directory and file information

aux/: Contains R scripts that are no longer in use, either because they were dead-ends or because they were revamped for production.
code/: Contains R scripts necessary to reproduce all analyses, from model fitting to simulation to calculation of early warning signals. I describe the key scripts below; other scripts are mainly for plotting/checking intermediate results.
- analyze-elimination-grid-sims.R: Calculates early warning signals in two windows from the replicate elimination simulations and the area under the curve statistics.
- analyze-emergence-grid-sims.R: Calculates early warning signals in two windows from the replicate emergence simulations and the area under the curve statistics.
- analyze-mvw-elimination-grid-sims.R: Calculates early warning signals in moving windows from the replicate elimination simulations and the area under the curve statistics. Only presented in supplementary material.
- analyze-mvw-emergence-grid-sims.R: Calculates early warning signals in moving windows from the replicate emergence simulations and the area under the curve statistics. Only presented in supplementary material.
- boot-mif-job*.sh: Bash scripts (* = 1,2,3,4,5) for running maximization by iterated filtering (MIF) on the Olympus High Performance Computing cluster. The scripts run instances of the bootstrap-fit-mif.R script in parallel, in 1000 core batches.
- bootstrap-fit-mif.R: Fits an SEIR model to weekly case count data using MIF as implemented with the pomp::mif() function. In this case, the "data" are stochastic realizations from the fitted model -- thus, model fitting is for a parametric bootstrap. This script is designed to be run on a High Performance Computing cluster only. See boot-mif-job*.sh description above.
- define-continuous-measles-pomp.R: This script generates pomp models to be used for model fitting. One pomp model is generated for each city and saved as measles-pomp-object-*.RDS, where * is the name of the focal city. These files are stored in the code directory for easy access.
- estimate-transmission-state.R: This script performs a plain vanilla particle filter at the fitted MLE parameters to generate one-step-ahead predictions for calculating a generalized R² for model fit. It was also initially designed to estimate R_E over time when transmission rate was allowed to take a random walk. However, further analyses showed that the highest likelihood models do not include a random walk transmission rate. In other words, we found no evidence of a directional temporal trend in transmission rate.
- fetch-clean-data.R: This script reads in the raw case count and demographic data to make clean data structures for use in the pomp models and elsewhere. Clean data reside in the data/clean-data/ subdirectory.
- find-beta-random-walk-intensity.R: Runs the pomp MIF2 algorithm starting at the MLEs but lets the transmission rate take a random walk. The idea is to test for a directional trend in transmission. The models suggest there is no directional trend.
- global-search-mif.R: This is the key script for model fitting. It implements the pomp MIF2 algorithm across a grid of starting values for model parameters. This script is designed to be run on a High Performance Computing cluster only. Depends on fetch-clean-data.R and define-continuous-measles-pomp.R.
- initial-mif-job*.sh: Bash scripts (* = 1,2,3,4,5) for running maximization by iterated filtering (MIF) on the Olympus High Performance Computing cluster. The scripts run instances of the global-search-mif.R script in parallel, in 1000 core batches.
- make-pomp-filtering-function.R: Defines a function for generating a pomp model for particle filtering.
- make-pomp-simulating-function.R: Defines a function for generating a pomp model for making long run simulations. The function is called to make simulators of emergence and elimination.
- setup-mif-outputs.R: A utility script that generates empty CSV files for storage of model parameters from MIF on the Olympus HPC.
- simulate-bootstrap-series.R: Simulates replicate stochastic realizations of 11 years of weekly case counts from the MLE parameter sets for each city. These series are used for fitting parameters again to perform a parametric bootstrap and estimate uncertainty around parameter values.
- simulate-elimination-grid.R: Simulates replicate elimination scenarios at different speeds of vaccination uptake.
- simulate-emergence-grid.R: Simulates replicate emergence scenarios at different levels of susceptible depletion.
data/: Contains raw and cleaned versions of case count and demographic data. See ReadMes therein and the supplementary material on demographic data. Note that the raw data are provided to Project AERO by Matthew Ferrari. All subsequent use should be approved by M. Ferrari, J. Drake, and P. Rohani. There is also a subdirectory of spatial data for plotting the location of the four focal cities in this analysis.
docs/: Contains the manuscript and supplementary information documents. These are .Rmd files that are knitted to PDF. This directory also contains subdirectories for project notebooks (notebooks/) and the original project protocol (protocol/). Other subdirectories are made on the fly when knitting the Rmd to PDF.
figures/: This directory is now deprecated. It contains figures for the manuscript and SI that I am keeping here for posterity, but all manuscript and SI figures are now made on the fly in the Rmd docs.
results/: Contains CSV and RDS files of analysis results. These are either intermediary and called by scripts for further analysis or are used in manuscript/SI docs to make figures/tables.
simulations/: Contains RDS files that store simulation results.

Contact

Andrew Tredennick ([email protected])

measles-ews's Issues

Time variables and units

In an effort to get everything in the units of yr^-1, the following things need to happen:

Make a new time column in the observation data frame that is decimal_date(date)
Set up the covar data frame to have a similar time column as decimal date, but do so on daily time steps to match the eventual model simulation time step
Redo the generation of the B-spline seasonal bases to have a period of 1 yr over the daily time steps in the covar data frame
Store observation and covar data frames separately, or nested

See recent King work for a good example.
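
The first two items might look like the following sketch. It is an illustration under stated assumptions, not the project's code: `to_decimal_year` is a hypothetical base-R stand-in for `decimal_date()`, and the observation dates are made up.

```r
# Hypothetical helper: convert Dates to decimal years using base R only.
to_decimal_year <- function(d) {
  d <- as.Date(d)
  yr <- as.integer(format(d, "%Y"))
  year_start <- as.Date(paste0(yr, "-01-01"))
  year_end <- as.Date(paste0(yr + 1, "-01-01"))
  yr + as.numeric(d - year_start) / as.numeric(year_end - year_start)
}

# Made-up weekly observation dates, for illustration only
obs <- data.frame(
  date = seq(as.Date("1995-01-01"), by = "week", length.out = 10)
)
obs$time <- to_decimal_year(obs$date)

# Covariate table on daily steps spanning the observation period
covar <- data.frame(
  date = seq(min(obs$date), max(obs$date), by = "day")
)
covar$time <- to_decimal_year(covar$date)
```

With both tables keyed on a `time` column in years, the seasonal basis could then be evaluated on `covar$time` with a period of 1.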

Streamline particle filtering code

Need to streamline the estimate-transmission-state.R code, perhaps by breaking into one script that has a function for defining the pomp model and one for looping over cities for actual filtering.

Makefile

Develop a Makefile that documents (and reproduces) the workflow for the project.

EWS backward looking

@e3bo, can you take a look at my script here: https://github.com/project-aero/measles-ews/blob/master/code/calculate-ews.R

I have set the bandwidth to 35 weeks and backward_looking = TRUE, but values for the stats start to show up at the 10th observation. I thought I wouldn't see anything until the 36th observation, e.g., once there are 35 data points behind the focal data point to calculate the EWS. Thanks for your thoughts.
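
For reference, the expectation described above corresponds to a strict trailing window. The sketch below implements only that expectation in base R. It does not reproduce how the EWS functions called in calculate-ews.R compute their windows, and the helper name `trailing_var` and the simulated counts are assumptions.

```r
# Hypothetical helper: variance over a strict trailing window.
# Returns NA until `bandwidth` points precede the focal point.
trailing_var <- function(x, bandwidth) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i > bandwidth) {
      out[i] <- var(x[(i - bandwidth):(i - 1)])
    }
  }
  out
}

set.seed(2)
x <- rpois(60, lambda = 5)             # made-up weekly counts
v <- trailing_var(x, bandwidth = 35)
first_value <- which(!is.na(v))[1]     # 36 under this strict definition
```

A scheme that accepts partially filled windows would return values earlier. Whether that explains the behavior seen in the script is not established here.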

Add demographic data description to SI material

Need to add section on the demographic data interpolation to the SI material.

Re-do emergence and elimination simulations

There was an error (now fixed, dae9b94) in the make-pomp-simulator-function.R code: I didn't reset initial values for exposed and infected classes. This doesn't make much of a difference for the elimination simulations because they burn in from initial conditions, but it does impact the emergence simulations.
For emergence simulations, initial exposed and infected should be set to zero. The function now allows this. Also, simulations now start from the middle of the first year so that dynamics start in the trough of seasonal transmission dynamics. Again, not much of an impact for the elimination simulations, but this better represents the scenario for re-emergence after a big outbreak.

For consistency, need to re-run:

simulate-emergence-grid.R (DATE COMPLETED: 2019-zz-zz)
simulate-elimination-grid.R (DATE COMPLETED: 2019-zz-zz)
analyze-emergence-grid-sims.R (DATE COMPLETED: 2019-zz-zz)
analyze-elimination-grid-sims.R (DATE COMPLETED: 2019-zz-zz)

single filtering failure

There is a persistent filtering failure, which I've traced to the conditional likelihood for the first observation. The problem is that cases gets zeroed out at each observation, meaning cases = 0 for all particles when evaluating the first observation. It's also just better practice to estimate states right before the first observation. So, that's what needs to be done.

I will set up the pomp models to start 1 week before the first observation, ensuring that cases != 0.

Alter AUC figures

Need to make AUC figures more intuitive. Pair each (emergence and elimination) with example time series and EWS calculation to make it clear what goes into the figure. Then separate emergence and elimination.

EWS on data

Calculate the suite of EWS on the actual data for each city. Use bandwidth of ~30.

Check R_E calculation for different versions of SEIR model

R_E is different in the SEIR model depending on the vital dynamics (demography) included. Check with Eamon to make sure these are all correct in the 2 different versions of the pomp model: the fitting model and the simulating model.

Add section on SEIR simulator.

Need to add section on how the SEIR simulator model is a little different than the SEIR fitting model, i.e., includes deaths etc.

Convert vacc speeds to weeks until transition

beta random walk

Need to get the random walk right for transmission in the post-MLE pomp process model. Currently, it is just being perturbed, which is not a random walk because there is no dependence on the previous value.
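
The distinction can be sketched outside of pomp as follows. This is an illustration only: the baseline rate `beta0`, the intensity `sigma`, and the log-scale formulation are assumptions, and the real fix would live inside the pomp process model.

```r
set.seed(3)
n_steps <- 200
sigma <- 0.05   # assumed random walk intensity
beta0 <- 400    # assumed baseline transmission rate

# Independent perturbation: each value scatters around the same baseline,
# so there is no dependence on the previous value.
beta_perturbed <- beta0 * exp(rnorm(n_steps, mean = 0, sd = sigma))

# Random walk on the log scale: increments accumulate, so each value
# depends on the previous one.
beta_walk <- beta0 * exp(cumsum(rnorm(n_steps, mean = 0, sd = sigma)))
```

Working on the log scale keeps the transmission rate positive in both cases.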

autocorrelation for elimination

Need to add an analysis that implements Eamon's approach for quantifying autocorrelation as outlined in the Distance to epidemic threshold paper. It should serve as the EWS for elimination, not lag-1 autocorrelation as currently presented.

Snippet of code here:

get_fit <- function(y, tstep = 1/52, est_K = FALSE, cutoff = .06) {
  x <- (seq_along(y) - 1) * tstep
  start <- list()
  im <- match(TRUE, abs(y) < cutoff)
  xs <- x[1:(im - 1)]
  ys <- y[1:(im - 1)]
  start$gamma <- try(unname(coef(lm(log(I(abs(ys))) ~ xs))["xs"]))
  if (!inherits(start$gamma, "try-error")){
    spec <- spectrum(y, plot = FALSE, na.action = na.exclude)
    start$omega <- spec$freq[which.max(spec$spec)] / tstep
    start$a <- 0
    fit_osc <- try(minpack.lm::nlsLM(
      y~sqrt(1 + a^2) * exp(x * gamma) * sin(x * omega + atan2(1, a)),
      start = start, na.action = na.exclude,
      control = minpack.lm::nls.lm.control(maxiter = 1000)))
    if (est_K) {
      fit_decay <- try(minpack.lm::nlsLM(y~K * exp(x * gamma),
                                         start = list(gamma = start$gamma, K = y[1]), na.action = na.exclude,
                                         control = minpack.lm::nls.lm.control(maxiter = 1000)))
    } else {
      K <- y[1]
      fit_decay <- try(minpack.lm::nlsLM(y~K * exp(x * gamma),
                                         start = list(gamma = start$gamma),na.action = na.exclude,
                                         control = minpack.lm::nls.lm.control(maxiter = 1000)))
    }
    if (inherits(fit_osc , "try-error")) {
      e_osc <- Inf
    } else {
      e_osc <- fit_osc$m$resid()
    }
    if (inherits(fit_decay, "try-error")) {
      e_decay <- Inf
    } else {
      e_decay <- fit_decay$m$resid()
    }
    # Residual sum of squares, used below in place of a negative log-likelihood
    nll <- function(resids) {
      sum(resids ^ 2)
    }
    aic <- c(constant = nll(y), fit_decay = nll(e_decay) + 2 * (1 + est_K),
             fit_osc = nll(e_osc) + 2 * 3)
    fits <- list(constant = "constant_y=0", fit_decay = fit_decay, fit_osc = fit_osc)
    
    coefests <- try(coef(fits[[which.min(aic)]])[c("omega", "gamma", "a")])
    if (inherits(coefests, "try-error")){
      coefests <- c(NA, NA, NA)
    }
    names(coefests) <- c("omega", "gamma", "a")
    c(list(coef = coefests), fits)
  } else {
    # Initial decay-rate estimate failed: return NAs in the same shape as above
    c(list(coef = c("omega" = NA, "gamma" = NA, "a" = NA)),
      list(constant = "constant_y=0", fit_decay = NA, fit_osc = NA))
  }
}

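library(dplyr)  # provides %>%, filter(), and pull()

# `datafile` must point to the RDS file of cleaned case counts
cases <- readRDS(datafile)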
cases <- cases %>%
  filter(year > 1994) %>%
  filter(region == "Maradi (City)") %>%
  pull(cases)

y <- acf(cases, lag.max = length(cases)-30, plot = TRUE)
acf_fits <- get_fit(y = as.numeric(y[[1]]))
g <- coef(acf_fits$fit_osc)["gamma"]
w <- coef(acf_fits$fit_osc)["omega"]
distance_to_threshold <- sqrt((g)^2 + (w)^2)

Lead times

As a way to bump up the impact of the paper, look at lead times of different EWS prior to outbreaks. Recipe is:

1. Use AUC values from the fixed window analysis to determine the EWS value that serves as a threshold between non-emerging and emerging.
2. Conduct moving window EWS analysis on replicate sims.
3. Calculate the length of the window in which the EWS is above the threshold.
4. Repeat step 3 for many replicates to get a distribution of lead times for each EWS.
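
The last two steps of the recipe could be sketched as below. This is one possible reading, not settled methodology: it assumes the lead time is the length of the final unbroken run above the threshold at the end of each series, and the helper name `lead_time`, the threshold, and the simulated EWS series are all made up.

```r
# Hypothetical helper: number of consecutive time steps, counted back from
# the end of the series, for which the EWS stays above the threshold.
lead_time <- function(ews, threshold) {
  above <- !is.na(ews) & ews > threshold
  runs <- rle(above)
  n_runs <- length(runs$values)
  if (n_runs > 0 && runs$values[n_runs]) runs$lengths[n_runs] else 0L
}

set.seed(4)
# Made-up moving-window EWS series for 100 replicates (trend plus noise)
reps <- replicate(
  100,
  seq(0, 1, length.out = 52) + rnorm(52, sd = 0.1),
  simplify = FALSE
)
lead_times <- sapply(reps, lead_time, threshold = 0.5)
summary(lead_times)
```

Counting only the final run avoids crediting early, transient threshold crossings, but other definitions (for example, time since the first crossing) are equally plausible.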

EWS ~ R_E correlations

Take another look at the analysis of temporal correlations between EWS calculated over a moving window and the effective reproduction number. Perhaps only show for high performing EWS. Visualize as a scatter plot.
