cdcgov / multisignal-epi-inference

Python package for statistical inference and forecast of epi models using multiple signals

Home Page: https://cdcgov.github.io/multisignal-epi-inference/

Python 98.03% Makefile 0.57% R 0.19% Dockerfile 0.45% Shell 0.76%
epidemiology infectious-disease-models package renewal-process

multisignal-epi-inference's Introduction

Multisignal Renewal Project

⚠️ This is a work in progress ⚠️

Overview

The Multisignal Renewal Project aims to develop a modeling framework that leverages multiple data sources to enhance CDC's epidemiological modeling capabilities. The project's goal is twofold: (a) create a Python library that provides a flexible renewal modeling framework and (b) develop a pipeline that leverages this framework to estimate epidemiological parameters from multiple data sources and produce forecasts. The library and pipeline are located in the model/ and pipeline/ directories of the GitHub repository, respectively.

Resources

General Disclaimer

This repository was created for use by CDC programs to collaborate on public health related projects in support of the CDC mission. GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.

Public Domain Standard Notice

This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this repository will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.

License Standard Notice

This repository is licensed under ASL v2 or later.

The source code in this repository is free: you can redistribute it and/or modify it under the terms of the Apache Software License version 2, or (at your option) any later version.

The source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache Software License for more details.

You should have received a copy of the Apache Software License along with this program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html

The source code forked from other open source projects will inherit its license.

Privacy Standard Notice

This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/other/privacy.html.

Contributing Standard Notice

Anyone is encouraged to contribute to the repository by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.

All comments, messages, pull requests, and other submissions received through CDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.

Records Management Standard Notice

This repository is not a source of government records but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.

Additional Standard Notices

Please refer to CDC's Template Repository for more information about contributing to this repository, public domain notices and disclaimers, and code of conduct.

multisignal-epi-inference's People

Contributors

afg6k7h4fhy2, brandomr, cshelley, damonbayer, dependabot[bot], dylanhmorris, gvegayon, natemcintosh, samuelbrand1, sbidari

Forkers

brandomr ciemss

multisignal-epi-inference's Issues

Example NumPyro models

As suggested by @seabbs, here is an issue for posting and discussing other NumPyro-based projects and what we can learn from them.

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/92
author: @dylanhmorris

Ramsey

exojax

Replicate Model 2 from cdcgov/wastewater-informed-covid-forecasting

Goal

Have a working Python library that implements the most basic version of the wastewater model. By most basic, we mean (a) hospitalizations only, and (b) single geographical unit (no pooled model).

Context

See #32.

Required features

  • The model (implemented either as a function or class) should be able to read hospitalization data simulated from the WW model.
  • It should fit the model using the No-U-Turn Sampler (NUTS).
  • It should return the estimates.
  • It should make predictions.

Specifications

  • Create (or bring in from WW) data for testing.
  • Write a Python script under model/src implementing the model (like this)
  • Document the model.
  • Write a Jupyter notebook walking through the process of read/fit/predict (vignette).
  • Have a test that compares the estimates of this Python module with the WW R package.

Out of scope

Features beyond the basic model.

Related documents

  • See #18 for math equations.
  • See the Stan implementation in the WW R package here

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/44
author: @gvegayoncdc

Set GHA for checking code coverage

Goal

We plan to use code coverage for this project. We need to set up the GHA CI for automatically checking code coverage.

Context

See #22.

Required features

See specs

Specifications

  • Set up the GHA YAML file under .github/workflows that runs code coverage for PRs and pushes to main.
  • Add a badge to README.md indicating the status of the workflow.

Out of scope

NA

Related documents

NA

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/27
author: @gvegayoncdc

Get Azure login GitHub action working

Goal

Create a new GitHub Action that logs into Azure and prints out something simple to show that it is working.

Context

Getting Azure GitHub Actions working for this project will significantly improve our ability to automate testing on Azure. This issue is the first step towards that goal. See the Azure/login action.

@jkislin is helping us obtain the new credentials and add them to the repo.

Required features

  • Create a new workflow file that logs into Azure
  • Add the necessary secrets to this repo's actions secrets

Specifications

  • Create the new workflow file, ideally as small and simple as possible
  • Receive new secrets
  • Put new secrets in the repo's actions secrets.

Out of scope

  • Anything beyond just getting logged in

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/60
author: @natemcintosh

Alternative to Jupyter notebooks

Goal

Have a lighter alternative to Jupyter notebooks that shows the raw Python code without too much around it (i.e., large JSON files).

Context

Small changes in the code embedded in Jupyter notebooks can lead to big changes in the notebook files, inflating the number of changes tracked by Git.

Required features

TBA

Specifications

TBA

Out of scope

TBA

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/88
author: @gvegayoncdc

Correctness unit tests for pyrenew.process

Goal

Unit tests to verify correctness of pyrenew stochastic processes.

Context

Currently, the only unit tests for stochastic processes verify that they can be instantiated and sampled. We would also like to verify that they correctly implement what they say they implement (e.g. a Normal random walk is actually a Normal random walk; an AR(1) process is actually an AR(1) process).

Required features / Specifications

  • Tests that assess whether samples produced by the sample() method are approximately distributed according to the target distribution
  • Tests that check the log_prob() values produced when using the processes for inference.
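
The first kind of test can be sketched with the standard library alone. This is only an illustration of the idea, not pyrenew code: `simulate_normal_random_walk` is a hypothetical stand-in for a process's sample() method, and the tolerance is an arbitrary choice. The check looks at the increments of the path and compares their mean and standard deviation with the target values.

```python
import random
import statistics


def simulate_normal_random_walk(n_steps, step_sd, init=0.0, seed=0):
    """Hypothetical stand-in for a process's sample() method."""
    rng = random.Random(seed)
    path = [init]
    for _ in range(n_steps):
        # Each step adds an independent Normal(0, step_sd) increment.
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


def check_random_walk_steps(path, step_sd, tol=0.05):
    """Return True if the increments look like Normal(0, step_sd) draws."""
    steps = [b - a for a, b in zip(path, path[1:])]
    mean_ok = abs(statistics.fmean(steps)) < tol
    sd_ok = abs(statistics.stdev(steps) - step_sd) < tol
    return mean_ok and sd_ok


walk = simulate_normal_random_walk(n_steps=20_000, step_sd=0.5, seed=62)
assert check_random_walk_steps(walk, step_sd=0.5)
```

With 20,000 steps the sampling error of both summaries is far below the tolerance, so the check should not be sensitive to the seed. A real test would call the pyrenew process instead of the stand-in.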

Out of scope

  • Other unit tests
  • Improved documentation for processes

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/61
author: @dylanhmorris

Refactor sample functions to use named arguments and **kwargs

Goal

Following #53, we have decided to use named arguments and **kwargs for sample functions in both RandomVariable and Model.

Required features

  • Metaclasses should reflect the change.
  • So this should also be reflected in all instances.

Specifications

  • Update metaclass.py.
  • Update latent
  • Update observed
  • Update model
  • Update deterministic

Also

  • Update docstrings to reflect the change.
  • Update the tests and docs.

Example:

    def sample(
        self,
        random_variables: dict,
        constants: dict = None,
    ) -> tuple:
        """Samples infections given Rt

        Parameters
        ----------
        random_variables : dict
            A dictionary containing an observed `Rt` sequence passed to
            `sample_infections_rt()`. It can also contain `infections` and `I0`,
            both passed to `obs` in `numpyro.sample()`.
        constants : dict
            Possible dictionary of constants.

        Returns
        -------
        InfectionsSample
            Named tuple with "infections".
        """
        I0 = npro.sample(
            name="I0",
            fn=self.I0_dist,
            obs=random_variables.get(self.I0_varname, None),
        )
        # ... (the rest of the method is not shown in the issue)

Instead:

    def sample(
        self,
        I0_prior=None,
        **kwargs,
    ) -> tuple:
        """Samples infections given Rt

        Parameters
        ----------
        I0_prior : optional
            Observed value for `I0`, passed to `obs` in `numpyro.sample()`.
        **kwargs : dict
            Additional keyword arguments; not used in the lines shown here.

        Returns
        -------
        InfectionsSample
            Named tuple with "infections".
        """
        I0 = npro.sample(
            name="I0",
            fn=self.I0_dist,
            obs=I0_prior,
        )
        # ... (the rest of the method is not shown in the issue)
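
The calling pattern can be shown without numpyro. The sketch below is a toy, not the pyrenew API: `Infections` and `Model` are hypothetical classes, and the growth rule is deliberately trivial. It shows a model forwarding all of its keyword arguments to a component, which picks out the named arguments it needs and ignores the rest through `**kwargs`.

```python
class Infections:
    """Hypothetical component standing in for a RandomVariable."""

    def sample(self, Rt, I0=None, **kwargs):
        # Named arguments are used directly. Any other keyword a caller
        # forwards (for example `n_timepoints`) is absorbed by **kwargs.
        I0 = 1.0 if I0 is None else I0
        infections = [I0]
        for r in Rt:
            # Toy rule: each value is the previous one times Rt.
            infections.append(infections[-1] * r)
        return infections


class Model:
    """Hypothetical model that owns one component."""

    def __init__(self, infections):
        self.infections = infections

    def sample(self, **kwargs):
        # Forward everything; each component takes what it needs.
        return self.infections.sample(**kwargs)


model = Model(Infections())
result = model.sample(Rt=[2.0, 0.5], I0=10.0, n_timepoints=2)
```

One consequence of this design is that a misspelled keyword is silently ignored rather than rejected, so tests for each component become more important.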

Observation model for genome concentrations

Goal

We want to have a single source of truth for the observation model for the viral genome concentrations in wastewater in each site and lab. The observed concentrations are a function of the true underlying concentration and the site-lab level scale factor and observation error standard deviation. We want to be able to use this to immediately code up the math decided upon here.
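
As a rough sketch of how the three ingredients could fit together, the code below assumes a normal observation model on the log scale, with the scale factor multiplying the true concentration. That distributional choice and every name in the code are assumptions made for illustration; the issue does not fix them.

```python
import math
from statistics import NormalDist


def log_concentration_loglik(observed, true_conc, scale_factor, obs_sd):
    """Log-likelihood of observed concentrations on the log scale.

    Assumes log(observed) ~ Normal(log(scale_factor * true_conc), obs_sd),
    with one scale factor and one standard deviation per site-lab pair.
    """
    total = 0.0
    for obs, truth in zip(observed, true_conc):
        dist = NormalDist(mu=math.log(scale_factor * truth), sigma=obs_sd)
        total += math.log(dist.pdf(math.log(obs)))
    return total


true_conc = [10.0, 20.0, 40.0]
observed = [2.0 * c for c in true_conc]  # illustrative data only
loglik = log_concentration_loglik(observed, true_conc, scale_factor=2.0, obs_sd=0.5)
```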

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description of the observation model of wastewater genome concentrations in each site-lab
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing
  • time-varying observation error standard deviation

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/81
author: @kaitejohnson

Day-of-week effect

Goal

We want to have a single source of truth for the mathematical formulation of the day of week effect on the hospital admissions (and/or ED visits). We want to be able to use this to immediately code up the math decided upon here.
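
One simple way to express such an effect is a multiplier per weekday applied to the expected daily admissions. The sketch below assumes that form, and further assumes that the seven multipliers sum to 7 so weekly totals are roughly preserved. The function name and the example values are hypothetical.

```python
def apply_day_of_week_effect(expected, weekday_effect, first_weekday=0):
    """Scale daily expected admissions by a weekday multiplier.

    `weekday_effect` holds seven non-negative multipliers, assumed here to
    sum to 7. `first_weekday` is the weekday index of `expected[0]`.
    """
    if len(weekday_effect) != 7:
        raise ValueError("weekday_effect must have exactly 7 entries")
    return [
        value * weekday_effect[(first_weekday + day) % 7]
        for day, value in enumerate(expected)
    ]


effect = [1.2, 1.1, 1.0, 1.0, 1.0, 0.9, 0.8]  # illustrative values only
adjusted = apply_day_of_week_effect([100.0] * 14, effect)
```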

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description of the day of week effect applied to the hospital admissions and/or ED visit signals
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/75
author: @kaitejohnson

Class representing observed data

Goal

The module lacks a standardized way to process/represent observed data. We should have a class (implemented as an independent module) for processing epi data to facilitate the implementation of the package.

Context

This becomes clear as soon as we start working with time series data.

Required features

  • A class that reads in a dataset.
  • Possibly store the data using either polars.DataFrame or a named tuple of jax.numpy.Array.
  • The getter functions should raise exceptions when the user tries to pull some epi data.
  • The class should have a function (possibly a static method) to process the desired data.
  • Possibly implement a metaclass so each model could have their way to address data needs.
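
A minimal sketch of such a class is shown below. It stores each signal as a plain list rather than a polars.DataFrame or jax.numpy.Array so that it runs with the standard library alone. The class name, method names, and the reading of the getter requirement (raise when the requested signal was never loaded) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObservedData:
    """Hypothetical container for observed epi signals."""

    signals: dict = field(default_factory=dict)

    @staticmethod
    def process(rows, signal):
        """Extract one signal from row dictionaries, sorted by date."""
        ordered = sorted(rows, key=lambda row: row["date"])
        return [float(row[signal]) for row in ordered]

    @classmethod
    def from_rows(cls, rows, signal_names):
        """Read a dataset given as a list of row dictionaries."""
        return cls({name: cls.process(rows, name) for name in signal_names})

    def get(self, signal):
        """Return a signal, raising if it was never loaded."""
        if signal not in self.signals:
            raise KeyError(f"signal {signal!r} is not available")
        return self.signals[signal]


rows = [
    {"date": "2024-01-02", "admissions": 5},
    {"date": "2024-01-01", "admissions": 3},
]
data = ObservedData.from_rows(rows, ["admissions"])
```

Dates are assumed to be ISO 8601 strings, which sort correctly as text.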

Specifications

TBD

Out of scope

  • None noted

Related documents

  • ADR on CDCent side (link)

Re-implementation of pyrenew's BasicRenewalModel using the proposed modularization

Goal

Using the discussed programming patterns, reimplement pyrenew's BasicRenewalModel.

Context

This will serve as a concrete example of the proposed programming patterns ADR.

Required features

  • Should feature the basic model.
  • Have an independent observation process.
  • Have an example using the linkage module for n units.

Specifications

  • Port basic renewal to observations.
  • Add a test.
  • Add a reproducible example under docs.

Generation interval pmf

Goal

We want to have a single source of truth for the mathematical formulation of the generation interval probability mass function for flu and COVID. We want to be able to use this to immediately code up the math decided upon here.
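
A common way to obtain a probability mass function from a continuous interval distribution is to difference its CDF at whole days and renormalize. The sketch below does that for a lognormal distribution, chosen only because its CDF is available in the standard library. The distribution family, the parameter values, and the function name are assumptions, not estimates for flu or COVID.

```python
import math
from statistics import NormalDist


def discretize_lognormal(meanlog, sdlog, max_days):
    """Daily pmf over lags 1..max_days, renormalized to sum to 1."""
    normal = NormalDist(mu=meanlog, sigma=sdlog)

    def cdf(x):
        # Lognormal CDF expressed through the normal CDF of log(x).
        return normal.cdf(math.log(x)) if x > 0 else 0.0

    # Mass for lag d is the probability of the interval (d - 1, d].
    masses = [cdf(day) - cdf(day - 1) for day in range(1, max_days + 1)]
    total = sum(masses)
    return [mass / total for mass in masses]


gi_pmf = discretize_lognormal(meanlog=1.0, sdlog=0.5, max_days=14)
```

Truncating at `max_days` and renormalizing guarantees a proper pmf, at the cost of slightly inflating the early lags.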

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • #43
  • sources for the underlying GI estimates for both diseases
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/72
author: @kaitejohnson

Infection feedback in R(t)

Goal

We want to have a single source of truth for the mathematical formulation of the infection feedback mechanism in the state-level $\mathcal{R}(t)$. We want to be able to use this to immediately code up the math decided upon here.
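
To make the idea concrete, the sketch below simulates a renewal process in which recent incidence damps the reproduction number. It assumes an exponential damping form, R(t) = R^u(t) * exp(-gamma * sum_tau I(t - tau) g(tau)), and reuses the generation interval as the feedback weights. Both choices, and all names, are assumptions for illustration rather than the formulation the issue will settle on.

```python
import math


def renewal_with_feedback(rt_unadjusted, gen_int_pmf, feedback_strength,
                          initial_infections):
    """Simulate infections where past incidence damps R(t).

    gen_int_pmf[0] is the weight for a lag of one day.
    """
    infections = list(initial_infections)
    rt_adjusted = []
    for rt_u in rt_unadjusted:
        # Most recent infection first, matched against increasing lags.
        recent = infections[::-1][: len(gen_int_pmf)]
        force = sum(i * g for i, g in zip(recent, gen_int_pmf))
        rt = rt_u * math.exp(-feedback_strength * force)
        rt_adjusted.append(rt)
        infections.append(rt * force)
    return rt_adjusted, infections[len(initial_infections):]


rt_adj, new_infections = renewal_with_feedback(
    rt_unadjusted=[2.0, 2.0],
    gen_int_pmf=[1.0],
    feedback_strength=0.0,
    initial_infections=[10.0],
)
```

With `feedback_strength=0.0` the adjusted and unadjusted values coincide, which gives a convenient baseline for a test.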

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical notation for the infection feedback mechanism to transform the unadjusted $\mathcal{R}^u(t)$ and associated language
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing
  • Other $\mathcal{R}(t)$ models (e.g. something that regresses towards 1). This is labeled as a "nice-to-have" and should be fleshed out in a separate ticket

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/70
author: @kaitejohnson

Create a Dockerfile for running the model code, locally or in Azure

Goal

Create a Dockerfile for the model folder. It should set up all the necessary dependencies, and install model as a package. Have make commands for easily performing the necessary docker commands, which can often be verbose.

Context

To run this model in Azure Batch, we will need to be able to hand Azure a Docker (or podman) image. There are possibly other ways to hand our model code to Azure, but this is likely the easiest way to ensure that it has the necessary dependencies, and it is the way we are already most familiar with.

Once there is a Dockerfile with corresponding make commands, integration with Azure will be much easier as the interface for running the model will be mostly defined already.

Required features

  • A working Dockerfile that installs everything necessary to run
  • New make commands for the primary docker tasks

Specifications

These are mostly suggestions, not strict specifications

  • Base the Dockerfile off of either:
    • This SO answer which seems fairly up to date, if verbose.
    • This solution from the poetry docs. This is much smaller and therefore easier to understand, but is perhaps less optimized.
  • A make build command for building the container
  • A make run command for running the container locally with any necessary input/output folders bind mounted. Should probably depend on the make build step.
  • A make launch command for building and launching the shell inside the container. This is very useful for debugging purposes.

Out of scope

  • Connecting with Azure

Related documents

Missing documentation

Goal

Fill the gaps. There's a lot of undocumented code. We need to make sure that it has, at a minimum: (a) a title, (b) a brief description, (c) Parameters, and (d) Returns (tag).

Context

[Short paragraph describing how the issue arose and constraints imposed by the existing code architecture]

Specifications

Here is the list of currently undocumented functions/modules:

  • TBD

Out of scope

  • Add mathematical equations/definitions. That is needed but won't be approached here.

Related documents

Write a notebook demonstrating pyrenew

Goal

Have a start-to-finish demo of pyrenew showcasing the library's main features.

Context

During porting from CDCent we realized the original demo notebook featured files not available in the repo.

Required features

  • Jupyter notebook with the code.
  • It should show an example simulating a renewal model using one of the classes included in the library.

Specifications

  • The notebook can be built automatically when the site builds (see #1)
  • It should include simulated data for a 'model fitting' example.

Out of scope

None

Related documents

See original code on CDCent.

cc @dylanhmorris

Prioritization of model features

We need to identify: (a) what these are specifically and (b) how we prioritize them. These have been mentioned a few times, but we need to ensure we have a clear path for development. Here is a list of features that have been mentioned before:

  1. Granularity: "[I]ntegrate other more geographically granular data signals even in the absence of a state or national signal at all" (https://github.com/cdcent/cfa-multisignal-renewal/discussions/2#discussioncomment-8317531).

  2. Multisignal fusion: Add arbitrary signals (vs hard-code them) using the same mechanism as in the WW model (https://github.com/cdcent/cfa-multisignal-renewal/discussions/2#discussioncomment-8318600).

  3. Arbitrary hierarchy: "[C]hange the hierarchical structure really easily depending on the availability of signals and how one wants to set up the hierarchy" (https://github.com/cdcent/cfa-multisignal-renewal/discussions/2#discussioncomment-8318848)

  4. Rt DGP: AR and Gaussian process for Rt (https://github.com/cdcent/cfa-stf-team-materials/discussions/21#discussion-6051731)

  5. Reporting delay DGP: Hazard model (https://github.com/cdcent/cfa-stf-team-materials/discussions/21#discussioncomment-8082682)

  6. Rt granularity: Daily instead of weekly Rt (https://github.com/cdcent/cfa-stf-team-materials/discussions/21#discussioncomment-8082682)

For each one of these, we need to define the following:

  • Priority
    • Must have
    • Nice to have
  • Complexity
    • High (refactor and redesign existing code)
    • Medium (add a new class or extend a function)
    • Low (tweak a few lines of code)
  • When
    • 2024-06-14 Working model prototype
    • Other?

More information

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/42

Model right-truncation

Goal

We want to have a single source of truth for the mathematical formulation for a minimal ability to model right truncation, e.g. assuming reporting delays follow a parametric distribution. We want to be able to use this to immediately code up the math decided upon here.
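
The core calculation can be sketched as follows: for each reference date, the expected count reported so far is the true count times the probability that the reporting delay is no longer than the time elapsed. The delay pmf is taken as given here (it could be a discretized parametric distribution), and the function name and numbers are hypothetical.

```python
def expected_reported(true_counts, delay_pmf):
    """Expected counts reported so far for each reference date.

    `true_counts[-1]` is the most recent date. `delay_pmf[d]` is the
    assumed probability that an event is reported d days after it occurs.
    """
    n_dates = len(true_counts)
    reported = []
    for t, count in enumerate(true_counts):
        days_elapsed = n_dates - 1 - t
        # Cumulative probability of having been reported by now.
        prob_reported = min(sum(delay_pmf[: days_elapsed + 1]), 1.0)
        reported.append(count * prob_reported)
    return reported


reported = expected_reported([100.0, 100.0, 100.0], [0.5, 0.3, 0.2])
```

In this example the oldest date is fully reported while the most recent date shows only half of its eventual count, which is the right-truncation pattern the model needs to account for.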

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description for the right-truncation model implementation
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing
  • Joint inference of delay distribution (should be scoped in a separate issue)
  • Ability to combine parametric delay with hazards based delay (should be scoped in a separate issue)

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/78
author: @kaitejohnson

Standardize admissions nomenclature: use only "hospital admissions", not "hospitalizations"

Goal

In the current codebase, both "hospitalizations" and "hospital admissions" are used to refer to incident hospital admissions. The former is ambiguous and can be interpreted as prevalence (number of currently hospitalized individuals) as opposed to incidence (number of new admissions).

"Hospital admissions" is less ambiguous, and so we should standardize on that.

Time-varying IHR

Goal

We want to have a single source of truth for the mathematical formulation of the time-varying IHR. This allows for a time-varying ratio of hospital admissions to genomes shed in wastewater (since we are starting by not allowing that quantity to vary). This could be due to new variants emerging with differences in severity, or to differences in the age distributions of infections. We want to be able to use this to immediately code up the math decided upon here.

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description of the time-varying IHR
  • source for intercept of IHR estimate
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/74
author: @kaitejohnson

Port notebooks under model/docs to Quarto

Goal

Jupyter notebooks are great, but they do not work well with Git. We want to replace them (for the moment) with Quarto documents.

Context

See discussion on alternatives to Jupyter: #22

Required features

Currently, we have two Jupyter notebooks under model/docs/: getting-started and pyrenew_demo. Both need to be ported to Quarto.

Specifications

  • Port getting-started.
  • Port pyrenew_demo.
  • getting-started has a gfm output saved under docs.
  • pyrenew_demo has a gfm output saved under docs.
  • Check whether there's a pre-commit for un-updated md documents. If it exists, add it.

Out of scope

  • Create the rst files for the website. That will be done after #1 is completed.

First pass on documentation

Goal

Give a first pass on the documentation, ensuring all the basics are there.

Context

As agreed by the team, we want to have good documentation from the beginning of the project.

Required features

  1. Each Python function under model/src/pyrenew should:
    • Have a docstring featuring (1) a description, (2) parameters, and (3) outputs, following numpy's style.
    • Have type hints (link).

    For the latter, here is an example of a well type-hinted function:

        import jax.numpy as jnp
        from pyrenew.metaclass import RandomVariable

        def myfun(
            x: str,
            y: jnp.ndarray,
            z: RandomVariable,
        ) -> None:
            """Example function with type hints."""

  2. In the case of the demos under model/docs, these should also be checked.

  3. __init__.py files should have a description of the module/submodule (currently missing).

Specifications

The following files under model/src/pyrenew should be checked:

@AFg6K7h4fhy2:

  • Top-level directory.
  • latent/
  • model/
  • observation/

@cshelley

  • process
  • Quarto files under model/docs.

Create a single PR per checkbox. PRs should use #16 as a baseline (so either wait until it is merged or use it as a baseline when creating the branch).

Wastewater genome concentration generative model

Goal

We want to have a single source of truth for the mathematical formulation of the generative model of viral genome concentration in wastewater. This should be a convolution of infections per capita and a normalized shedding kinetics distribution, scaled by the average genomes shed per infection. We want to be able to use this to immediately code up the math decided upon here.
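
Written out, the description above could take the following form. The notation is an assumption made for illustration and does not come from the issue: $C(t)$ is the expected genome concentration, $I(t)$ the incident infections, $n$ the catchment population, $s(\tau)$ the normalized shedding kinetics, and $G$ the average number of genomes shed per infection. Any conversion to concentration units, such as dividing by wastewater volume per person, is left out.

```latex
C(t) = G \sum_{\tau = 0}^{\tau_{\max}} s(\tau) \, \frac{I(t - \tau)}{n},
\qquad \sum_{\tau = 0}^{\tau_{\max}} s(\tau) = 1
```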

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description of the genome concentration generative function
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing
  • Time-varying number of genomes shed per infection
  • Expected variability in the genome concentration as a function of the number of contributing infections

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/80
author: @kaitejohnson

Porting weekly model from CDCent

Goal

Port model from CDCent

Context

This model was identified during branch pruning of the CDCent side.

Specifications

  • Code matches Model and RandomVariable metaclasses
  • Code is properly documented using numpy's style.
  • Code is properly tested.

Port AR test from CDCent

Goal

Port existing test from CDCent to CDCgov (https://github.com/cdcent/cfa-multisignal-renewal/blob/27fe934f35deaf4b01573b407b7caddee2c4f57f/model/src/test/test_ar_process.py#L26-L40):

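# Imports assumed from the names used in the test below.
import jax.numpy as jnp
import numpyro
from numpy.testing import assert_almost_equal
from pyrenew.process import ARProcess


def test_ar_samples_correctly_distributed():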
    """
    Check that AR processes have correctly-
    distributed steps.
    """
    ar_mean = 5
    noise_sd = jnp.array([0.5])
    ar_inits = jnp.array([50.0])
    ar1 = ARProcess(ar_mean, jnp.array([0.75]), noise_sd)
    with numpyro.handlers.seed(rng_seed=62):
        # check it regresses to mean
        # when started away from it
        long_ts = ar1.sample(5000, inits=ar_inits)
        assert_almost_equal(long_ts[0], ar_inits)
        assert jnp.abs(long_ts[-1] - ar_mean) < 4 * noise_sd

Context

NA

Required features

NA

Specifications

  • Include the test under tests/

Create simplest, running, Azure Batch kick-off script

Goal

Create a single python script to configure and run the model in Azure Batch.

Context

This will eventually be the primary script used for production runs. For now though, it should just get something super basic up and running in Azure Batch.

Required features

  • Use a certain pre-made pool in Azure
  • Run a few models in parallel, e.g. multiple states, multiple diseases
  • Run a post production task, that is dependent on all the model runs completing.
  • Make sure all output is saved to blob storage
  • Be able to kick it off as either a poetry script or a make command. We want to be able to run this from the command line locally, as well as in an automated fashion from CI (at a later point in time).

Specifications

  • Have good command line help and documentation
  • Read in a single primary configuration file
  • Start a single main job
  • For each model to run, kick off a task, giving it a properly formed:
    • Docker command, including bind mounts
    • Config file to run on
  • Create a post production task that depends on all model runs finishing. Should have properly formed:
    • Docker command, including bind mounts
    • Config file to run on

Related documents

  • None that I can think of

Hospital admissions generative model

Goal

We want to have a single source of truth for the mathematical formulation of the hospital admissions generative model from incident infections. This should be a convolution of infections and the infection to hospital admissions delay distribution scaled by the IHR. These are generated at the state-level in this first iteration, but should allow for modularity of sub-state (e.g. facility level) signals. We want to be able to use this to immediately code up the math decided upon here.
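
The convolution described above can be sketched in a few lines. The function and argument names are hypothetical, the IHR is treated as a constant, and the example values are illustrative only.

```python
def expected_hospital_admissions(infections, delay_pmf, ihr):
    """Convolve infections with the delay pmf and scale by the IHR.

    `delay_pmf[d]` is the assumed probability that an admission occurs
    d days after infection. Infections before the start of the series
    are treated as zero.
    """
    expected = []
    for t in range(len(infections)):
        total = 0.0
        for d, prob in enumerate(delay_pmf):
            if t - d >= 0:
                total += infections[t - d] * prob
        expected.append(ihr * total)
    return expected


admissions = expected_hospital_admissions(
    infections=[1000.0, 0.0, 0.0, 0.0],
    delay_pmf=[0.0, 0.5, 0.5],
    ihr=0.01,
)
```

A single burst of 1000 infections on day 0 yields about 5 expected admissions on each of days 1 and 2, matching the delay weights scaled by the IHR.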

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical notation for hospital admissions generative model
  • source for infection to hospital admissions delay distribution
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing
  • Adjustments to IHR, day of week effects, and observation models are separate issues

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/73
author: @kaitejohnson

Standardize language

Goal

Make sure there's standardized language to refer to the different model components across modules.

Context

Required features

  • Clear language that can easily map to the documentation (theory).
  • More verbose variable names when possible.
  • Should have an overall standard, e.g., making the distinction between stochastic, deterministic, latent, etc.
  • Make sure to address correctness as in #39 (comment)

Specifications

TBD: List of all files under model/

Related documents

See #39 (comment)

Observation model for hospital signals

Goal

We want to have a single source of truth for the observation models for hospital admissions and ED visits. This should allow for modularity of the observation model (e.g. negative binomial or Poisson). We want to be able to use this to immediately code up the math decided upon here.
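
One way to picture the modularity mentioned above is a log-likelihood that takes the observation distribution as an argument. The sketch below assumes the negative binomial is parameterized by its mean and a concentration parameter; that parameterization and all names are assumptions for illustration.

```python
import math


def poisson_logpmf(observed, mean):
    """Log probability of a count under a Poisson with the given mean."""
    return observed * math.log(mean) - mean - math.lgamma(observed + 1)


def negative_binomial_logpmf(observed, mean, concentration):
    """Log probability under a negative binomial with mean and concentration.

    Variance is mean + mean**2 / concentration, so a large concentration
    approaches the Poisson case.
    """
    return (
        math.lgamma(observed + concentration)
        - math.lgamma(concentration)
        - math.lgamma(observed + 1)
        + concentration * math.log(concentration / (concentration + mean))
        + observed * math.log(mean / (concentration + mean))
    )


def log_likelihood(observations, expected, logpmf, **params):
    """Sum a pluggable observation log-pmf over paired observed/expected values."""
    return sum(logpmf(y, mu, **params) for y, mu in zip(observations, expected))
```

Swapping the observation model then means passing `negative_binomial_logpmf` with a `concentration` keyword instead of `poisson_logpmf`, with no change to the code that produces the expected values.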

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description of the observation model for hospital admissions and ED visit signals
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/77
author: @kaitejohnson

Wastewater site infections per capita

Goal

We want to have a single source of truth for the generation of site-level infections per capita. Wastewater site infections per capita are estimated via a site-level $\mathcal{R}^u_i(t)$ that is hierarchically linked to the unadjusted state-level $\mathcal{R}^u(t)$. The infection feedback mechanism should be implemented at the site-level. Initialization of the site-level infections should be described, as is documented in the wastewater model documentation. Sites are sub-state populations (wastewater catchment areas can vary in size from covering multiple counties to as small as single facilities). We want to be able to use this to immediately code up the math decided upon here.

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description of the generation of the wastewater site level infections from the site-level $\mathcal{R}^u_i(t)$
  • mathematical description of the site-level $\mathcal{R}^u_i(t)$ from the state level $\mathcal{R}^u(t)$ via a mean-preserving AR(1) process
  • initialization of site-level infections per capita
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring
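
The hierarchical link between the site-level and state-level reproduction numbers could be sketched as follows. This assumes the AR(1) process acts on log-scale deviations from the state-level value and starts from a deviation of zero, so the deviations are centered on the state-level trajectory; those details and the names used are assumptions, not the agreed specification.

```python
import math
import random


def site_level_rt(state_rt, autocorrelation, innovation_sd, rng):
    """Draw one site's R(t) as log-scale AR(1) deviations around the state R(t).

    Assumes log R_i(t) = log R(t) + delta_i(t), with
    delta_i(t) = autocorrelation * delta_i(t - 1) + Normal(0, innovation_sd)
    and delta_i starting at zero before the first time point.
    """
    deviation = 0.0
    site_rt = []
    for rt in state_rt:
        deviation = autocorrelation * deviation + rng.gauss(0.0, innovation_sd)
        site_rt.append(math.exp(math.log(rt) + deviation))
    return site_rt
```

With `innovation_sd` set to zero the site-level trajectory collapses onto the state-level one, which is a convenient sanity check for any implementation.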

Out of scope

  • This does not include code, just specification of the feature in math/writing

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/79
author: @kaitejohnson

More flexible Rt class (possibly meta)

Goal

Given that we now have a latent module for latent (but potentially observable) variables, I propose creating an Rt class there.

To represent Rt varying in time according to a particular stochastic process on a particular transformed scale, one would instantiate the Rt class (or a subclass of it), using stochastic processes from pyrenew.processes, transformations from pyrenew.transform, etc.
-- #26 (comment) @dylanhmorris
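
To make the proposal concrete, here is a self-contained sketch of the idea: a class that combines a stochastic process on a transformed scale with the inverse transformation back to the Rt scale. `RtSketch` and `random_walk` are hypothetical names written for this example; they are not the pyrenew API and do not use pyrenew.processes or pyrenew.transform.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List


def random_walk(n_steps, init, step_sd, rng):
    """Simple Gaussian random walk, used here as the stochastic process."""
    values = [init]
    for _ in range(n_steps - 1):
        values.append(values[-1] + rng.gauss(0.0, step_sd))
    return values


@dataclass
class RtSketch:
    """Hypothetical Rt class: a process on a transformed scale plus its inverse."""

    process: Callable[[int, random.Random], List[float]]
    inverse_transform: Callable[[float], float]

    def sample(self, n_steps, seed=0):
        rng = random.Random(seed)
        # Draw on the transformed scale, then map back to the Rt scale.
        return [self.inverse_transform(x) for x in self.process(n_steps, rng)]
```

For instance, `RtSketch(process=lambda n, rng: random_walk(n, math.log(1.2), 0.05, rng), inverse_transform=math.exp)` would represent Rt following a random walk on the log scale, which keeps every sampled value positive.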

Context

Re-implementation of pyrenew.

Required features

See goals

Specifications

  • Create the desired class under processes/
  • Have a test.
  • Have it properly documented, including an example in getting started or other doc under docs

ED visits generative model

Goal

We want to have a single source of truth for the generative model of the state-level ED visits, with modularity for sub-state (e.g. facility level) ED visits in mind. This might be identical to the hospital admissions generative model but with an estimate of the $P(ED_{visit}|I)$ and an estimate of the infection to ED visit delay distribution. We want to be able to use this to immediately code up the math decided upon here.
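
If the ED visits model does end up mirroring the hospital admissions model, one possible way to write it is shown below. The symbols $I(t)$ for incident infections, $d_{ED}(\tau)$ for the infection to ED visit delay distribution, and $\tau_{max}$ for its maximum delay are notation assumed for this sketch, not notation fixed by the project.

```latex
\mathbb{E}[ED(t)] = P(ED_{visit} \mid I) \sum_{\tau = 0}^{\tau_{max}} I(t - \tau) \, d_{ED}(\tau)
```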

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description of the ED visit generative model
  • estimate of prior on $P(ED_{visit}|I)$
  • estimate of infection to ED visit delay distribution or proposed method to obtain this delay distribution
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/76
author: @kaitejohnson

Add `typos` to actions for this repo

Goal

Add the typos action to this repo.

Context

I used it locally and found a few typos in pipeline. Would be good to have this done automatically to keep things spelled correctly.

Required features

  • Add typos action

Specifications

  • Add typos action

Out of scope

  • Anything beyond adding this action

Related documents

Add code coverage CI to pipeline code

Goal

Add new CI for code coverage tracking of the pipeline code.

Context

Code coverage CI was recently set up for the model code in the repo. This would essentially copy that setup, and make whatever modifications are necessary for the pipeline code.

Required features

  • Have pytest track code coverage
  • Display code coverage the same way it is displayed for the model code.

Specifications

  • Copy the CI code for the model CI as closely as possible

Out of scope

  • Anything outside of this CI

Related documents

CI for rendering quarto files

Goal

Set up a GitHub actions CI that takes the documents under model/docs and renders them automatically.

Context

NA

Required features

  • List all files under model/docs that end in the qmd extension.
  • Try to render them.
  • Fail if it doesn't work.
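
The three steps above could be sketched as a small script that a workflow step calls. It assumes the `quarto` CLI is on the PATH of the CI runner; the function names are made up for this example, and the render command is a parameter so the logic can be exercised without Quarto installed.

```python
import subprocess
from pathlib import Path


def find_qmd_files(docs_dir):
    """Return every .qmd file under docs_dir, sorted for a stable order."""
    return sorted(Path(docs_dir).rglob("*.qmd"))


def render_all(files, command=("quarto", "render")):
    """Run the render command on each file; raise on the first failure.

    check=True makes subprocess.run raise CalledProcessError on a nonzero
    exit code, which is what would fail the CI job.
    """
    for path in files:
        subprocess.run([*command, str(path)], check=True)
```

A workflow step could then run something like `render_all(find_qmd_files("model/docs"))`, and any document that fails to render would stop the job with a nonzero exit status.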

Specifications

  • workflow yaml.
  • badge on README.md

Renewal process

Goal

We want to have a single source of truth for the mathematical formulation of the infection generation process via a renewal process. We want to be able to use this to immediately code up the math decided upon here.
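
For reference, a deterministic renewal process in its usual discrete-time form can be sketched as below. It assumes daily time steps, a generation interval given as a probability mass function starting at a lag of one day, and seed infections supplied directly; how the seed infections are actually initialized is exactly what this issue asks to specify, so that part is a placeholder.

```python
def renewal_infections(initial_infections, rt, generation_interval_pmf):
    """Deterministic renewal process: I(t) = R(t) * sum_tau I(t - tau) * g(tau).

    initial_infections: seed infections before the first modeled day, oldest first.
    rt: reproduction number for each modeled day.
    generation_interval_pmf: g(tau) for tau = 1, 2, ...; should sum to 1.
    """
    infections = list(initial_infections)
    n_seed = len(infections)
    for r in rt:
        # Most recent infection pairs with g(1), the one before with g(2), etc.
        recent = reversed(infections[-len(generation_interval_pmf):])
        force = sum(i * g for i, g in zip(recent, generation_interval_pmf))
        infections.append(r * force)
    return infections[n_seed:]
```

With a one-day generation interval and a constant reproduction number of 2, ten seed infections double each day to 20, 40, and 80.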

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical notation for the infection generation process via a deterministic renewal process
  • initialization of the infection generation process described adequately
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing
  • This assumes a deterministic renewal process as is implemented in the current wastewater model. Stochastic implementation should be fleshed out and discussed in a separate issue.

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/71
author: @kaitejohnson

Remove #! from non-executable files

I see that a number of the .py files start with #!/usr/bin/env/python, as in:

https://github.com/CDCgov/multisignal-epi-inference/blob/main/model/src/pyrenew/basic.py#L1

These are only needed when you will be executing the file directly, because their purpose is to tell the shell which interpreter to run. I.e., so you can do ./foo.py rather than python foo.py. However, if the python code is just a module that isn't intended to be run but just imported, then you don't need this.

In addition, this is actually the wrong line, because what's after the #! needs to be something executable, but this isn't usually a valid path, because /usr/bin/env is a program, not a directory. Rather, you want /usr/bin/env python, which runs the env command to run python.
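
To find the affected files, a small helper along these lines could be used. It is only a sketch: the function name is made up, and it treats "has no execute permission bit set" as a proxy for "not intended to be run directly".

```python
import stat
from pathlib import Path


def non_executable_files_with_shebang(root):
    """List .py files under root that start with '#!' but lack any execute bit."""
    flagged = []
    for path in sorted(Path(root).rglob("*.py")):
        with path.open("rb") as handle:
            has_shebang = handle.read(2) == b"#!"
        executable_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        is_executable = bool(path.stat().st_mode & executable_bits)
        if has_shebang and not is_executable:
            flagged.append(path)
    return flagged
```

Any file it reports is a candidate for having its first line removed; files that really are scripts should instead keep a first line of `#!/usr/bin/env python` and be marked executable.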

State-level unadjusted R(t)

Goal

We want to have a single source of truth for the mathematical formulation of the state-level unadjusted $\mathcal{R}^u(t)$ model. We want to be able to use this to immediately code up the math decided upon here.

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical notation for $\mathcal{R}^u(t)$ alongside associated language
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/64
author: @kaitejohnson

Copula implementation for US forecast

Goal

We want to have a single source of truth for the "bottom-up" copula implementation to generate a national forecast from the sum of the state-level forecasts. This has been discussed extensively in the cfa-forecasttools repo. We want to be able to use this to immediately code up the math decided upon here.
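
The issue does not say which copula is intended, so the following is only one possible reading: a Gaussian copula with a single common correlation between states, applied by reordering each state's forecast draws before summing. The function name, the equicorrelation structure, and the `rho` parameter are all assumptions made for illustration.

```python
import math
import random


def copula_sum(state_samples, rho, seed=0):
    """Sum state-level forecast draws after coupling them with a Gaussian copula.

    state_samples: list of equal-length lists, one list of draws per state.
    rho: assumed common pairwise correlation in [0, 1] (equicorrelation).
    """
    rng = random.Random(seed)
    n_draws = len(state_samples[0])
    # A shared factor plus idiosyncratic noise gives equicorrelated normals.
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    national = [0.0] * n_draws
    for samples in state_samples:
        latent = [
            math.sqrt(rho) * shared[d] + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for d in range(n_draws)
        ]
        # The rank of each latent draw decides which marginal quantile it receives.
        order = sorted(range(n_draws), key=latent.__getitem__)
        ordered_samples = sorted(samples)
        for rank, draw_index in enumerate(order):
            national[draw_index] += ordered_samples[rank]
    return national
```

Each state's marginal forecast distribution is left unchanged, since its draws are only reordered; `rho` controls how strongly high draws in one state line up with high draws in the others, which widens or narrows the distribution of the national sum.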

Context

This was discussed in the 2024-02-27 meeting (see minutes) and agreed upon in the ADR record under model_features.md. This specific feature should be the same as what's implemented in the existing wastewater model. Completion of this task will allow us to implement the feature more quickly and should facilitate creation of a model design document.

Required features

  • mathematical description of the copula implementation to produce a national forecast from state level forecasts
  • add the feature to the model design document
  • Include an independent test that checks TBD.
  • Include documentation in the code using docstring

Out of scope

  • This does not include code, just specification of the feature in math/writing

Related documents

Ref: https://github.com/cdcent/cfa-multisignal-renewal/issues/82
author: @kaitejohnson

Repo discoverability

Goal

The structure of this repository and goals of this project should be understandable to a new team member and, ideally, the public.

Context

As a new team member, I am finding it difficult to grok the repo / project. A few sentences in the readme and installation instructions (for both contributors and users) would go a long way.

Some questions I had that could be addressed:

  • What is the purpose of this project and who is the intended audience?
  • What does it mean that it is an internal forecasting model?
  • Does this repo contain "a model" as is stated in the readme? Or is it a library for creating models? Or a specific implementation of a model?
  • According to the readme, the project has two main folders - model and pipeline. What's in those folders and how do they relate?
  • What is PyRenew? How do I install it (current installation instructions point to an archived GitHub repo)? Why isn't it mentioned in the root readme?
  • The pipeline directory is blank. What is it going to be used for?
