covid-public's Introduction

Arena AI - Covid19:

How to use these models:

This repo contains Python code for three COVID-19 infection forecasting models, each operating at the state level for states within the U.S.: SIR, curve fitting (IHME), and a "Phase Space" model. Training data is available on S3 and is updated daily. The Attributions section cites the external public data sources that we use to create the training dataset, along with external references for model methodology.

If you find this useful, please let us know by emailing [email protected]. To file an issue, use GitHub Issues. We welcome your feedback!

More info on our work at: covid.arena-ai.com

Models:

  1. SIR
import numpy as np
from arenacovid.models import sir

# Set total population
N = 1e6
DAYS_TO_PREDICT = 180

# Initial Conditions
S = np.zeros(DAYS_TO_PREDICT)
I = np.zeros(DAYS_TO_PREDICT)
I[0] = 1
S[0] = N - I[0]

S_t, I_t = sir.simulate(S, I, N, lam=0.3, gamma=0.1)
  2. Phase Space
from arenacovid.models import phase_space

# cumulative cases time series for a single State
y = data.set_index('date')['cumulative_cases']
m = phase_space.PhaseFitter(tau=2, b_default=-.05).fit(y)

....
  3. Curve fitting
from arenacovid.models import curve_fitting

# Fit multiple states + countries at the same time
m = curve_fitting.HierarchicalCurveFitter(mu_lower_bound=mu_lower_bound, mu_upper_bound=mu_upper_bound)
m.fit(data["new_deaths_per_million"].values, data["group_id"].values, data["t"].values)
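
For readers who want to see what a call like the SIR example above computes, here is a minimal, self-contained sketch of a discrete-time SIR update. It is an assumption about the general technique, not the actual implementation of sir.simulate: the helper name simulate_sir, the one-day forward-Euler step, and the derived R_t series are all illustrative. It assumes lam is the contact (transmission) rate and gamma the recovery rate, as in the standard SIR formulation.

```python
import numpy as np

def simulate_sir(S, I, N, lam, gamma):
    """Forward-Euler SIR with a one-day step (hypothetical helper, not arenacovid's)."""
    S = np.asarray(S, dtype=float).copy()
    I = np.asarray(I, dtype=float).copy()
    for t in range(1, len(S)):
        new_infections = lam * S[t - 1] * I[t - 1] / N  # S -> I
        recoveries = gamma * I[t - 1]                   # I -> R
        S[t] = S[t - 1] - new_infections
        I[t] = I[t - 1] + new_infections - recoveries
    return S, I

# Same setup as the README example
N = 1e6
DAYS_TO_PREDICT = 180
S = np.zeros(DAYS_TO_PREDICT)
I = np.zeros(DAYS_TO_PREDICT)
I[0] = 1
S[0] = N - I[0]

S_t, I_t = simulate_sir(S, I, N, lam=0.3, gamma=0.1)
R_t = N - S_t - I_t  # removed compartment, by conservation of population
```

Under this formulation the basic reproduction number is lam / gamma, which is 3 for the values shown.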

Training Data in S3:

Daily Death Time Series:

  • s3://arena-covid-public/covid_data/death_time_series_us
  • s3://arena-covid-public/covid_data/death_time_series_combined

Normalized death data for each U.S. state, as well as normalized death data for the U.S. plus international countries. Cumulative death rates are quoted per million of population.
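
As a small illustration of the "per million" normalization, the sketch below uses a toy table. The column names and numbers are hypothetical and are not the schema or contents of the S3 datasets.

```python
import pandas as pd

# Toy data; column names are illustrative assumptions, not the S3 schema.
toy = pd.DataFrame({
    "state": ["A", "A", "B", "B"],
    "cumulative_deaths": [10, 30, 5, 20],
    "population": [2_000_000, 2_000_000, 500_000, 500_000],
})

# Deaths per million residents = deaths * 1e6 / population
toy["cumulative_deaths_per_million"] = (
    toy["cumulative_deaths"] * 1e6 / toy["population"]
)
```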

Daily Death + Case Series:

  • s3://arena-covid-public/covid_data/nyt_cases

A processed, state-aggregated view of the case and death data reported by The New York Times.

Reading data

import pyarrow.parquet as pq
import s3fs
fs = s3fs.S3FileSystem(anon=True)
s3_uri = 's3://arena-covid-public/covid_data/death_time_series_combined'
df = pq.ParquetDataset(s3_uri, filesystem=fs).read().to_pandas()

Attributions:

  1. NY Times data: https://github.com/nytimes/covid-19-data
  2. UW Model: http://www.healthdata.org/covid
  3. Ferguson & Imperial College SIR model: https://spiral.imperial.ac.uk:8443/bitstream/10044/1/77482/5/Imperial%20College%20COVID19%20NPI%20modelling%2016-03-2020.pdf
  4. COVID-tracking project: https://covidtracking.com/
  5. State and County level ICU bed and 60+ population data: https://khn.org/news/as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds/#lookup
  6. Government actions data: https://www.kff.org/health-costs/issue-brief/state-data-and-policy-actions-to-address-coronavirus/
  7. Government actions data: https://en.wikipedia.org/wiki/U.S._state_and_local_government_response_to_the_2020_coronavirus_pandemic
  8. Government actions data (crowdsourced by Rex Douglass): https://github.com/rexdouglass/TIGR
  9. State and city populations: https://data.nber.org/data/census-intercensal-county-population.html

covid-public's People

Contributors

schrobot, rspeare

covid-public's Issues

Surface level review

Hey all,

Very cool work here - definitely useful to have a dashboard that compares multiple models to one another. The phase space model is a very interesting way of looking at things! I actually started to play a bit with visualization of this type the other night.

A quick breakdown of some concerns with the models.

SIR:

You're using a Kermack-McKendrick model, the simplest form of SIR model out there. It feels like more of a toy model to be illustrative than anything else, but I'll note at the least that:

  1. Latency is a critical factor in modeling this disease, as it drastically changes how control measures work. I'd extend at the very least to an SEIR model, which also requires pushing the R0 up (as your serial interval increases). Current estimates put R0 closer to 5.7 (https://wwwnc.cdc.gov/eid/article/26/7/20-0282_article?deliveryName=USCDC_333-DM25287)
  2. Recovery time is different than infectious time in the current environment. People isolating when they have symptoms changes these dynamics.
  3. Asymptomatic/presymptomatic/prodromal spread is a huge characteristic of COVID. Even if these cases don't contribute directly to deaths, they make control substantially harder - even in countries with robust contact tracing, long chains of asymptomatic cases could result in a new deadly outbreak weeks after the last confirmed case in a region.
  4. Implementing at least one kind of intervention into the model seems like a bare minimum to get this model operating in a similar class to the others. Something that allows changing the contact rate and recovery time at specific dates would be great.
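
The SEIR extension and the date-based intervention suggested in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the repo: the function name simulate_seir, the forward-Euler one-day step, and every parameter value (contact rates, a 5-day latent period, the day-40 intervention) are hypothetical choices made only to show the mechanics.

```python
import numpy as np

def simulate_seir(N, beta_by_day, gamma_by_day, sigma, e0=1.0):
    """Forward-Euler SEIR with a one-day step (hypothetical helper).

    beta_by_day[t] is the contact rate and gamma_by_day[t] the removal rate
    applied on day t, so either can change on a chosen date.
    sigma is the rate of leaving the latent (exposed) compartment.
    """
    n_days = len(beta_by_day)
    S, E, I, R = (np.zeros(n_days) for _ in range(4))
    E[0] = e0
    S[0] = N - e0
    for t in range(1, n_days):
        exposures = beta_by_day[t - 1] * S[t - 1] * I[t - 1] / N  # S -> E
        onsets = sigma * E[t - 1]                                 # E -> I
        removals = gamma_by_day[t - 1] * I[t - 1]                 # I -> R
        S[t] = S[t - 1] - exposures
        E[t] = E[t - 1] + exposures - onsets
        I[t] = I[t - 1] + onsets - removals
        R[t] = R[t - 1] + removals
    return S, E, I, R

n_days, N = 180, 1e6

# Baseline: constant rates (illustrative values only)
beta_base = np.full(n_days, 0.6)
gamma_base = np.full(n_days, 0.2)

# Intervention on day 40: lower contact rate, shorter infectious period
beta_int = beta_base.copy()
gamma_int = gamma_base.copy()
beta_int[40:] = 0.15
gamma_int[40:] = 0.25

S_b, E_b, I_b, R_b = simulate_seir(N, beta_base, gamma_base, sigma=0.2)
S_m, E_m, I_m, R_m = simulate_seir(N, beta_int, gamma_int, sigma=0.2)
```

Passing the rates as day-indexed arrays keeps the update loop unchanged while allowing any number of dated policy changes, including a later relaxation.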

Curve fitting (both phase space and IHME derived)

Both of these models are predicated on the idea that deceleration continues, i.e. that the measures eventually bring R0 below 1 and are never relaxed at any point in the future. Carl Bergstrom has several pretty good breakdowns of the weaknesses of these types of models:

https://twitter.com/CT_Bergstrom/status/1245618003711946753
https://twitter.com/CT_Bergstrom/status/1247645708741566465
https://twitter.com/CT_Bergstrom/status/1246957709682806785

TL;DR: while case counts and death counts may be decelerating, there is no reason to assume that the deceleration will continue (or that acceleration will not resume) unless herd immunity is reached.

In China we've already seen cases where relaxation of lockdown results in a new acceleration in cases - the danger in using these models as a 'peak' forecaster is that they assume that the mitigation measures completely eradicate the epidemic or that those mitigation measures are never relaxed.

While purely statistical models are useful (and should definitely be developed in combination with the mechanistic ones), we need to be careful that assumptions made in the statistical models don't violate the underlying mechanics or otherwise limit the possible outcomes in unrealistic ways. Imports (and travel policy) make a huge difference here. This can be seen directly with Connecticut and New Jersey. An area that has been social distancing only mildly but has had no recent imports (and thus looks like it is controlling growth well) could follow a drastically different trajectory after a few imported cases or a single super-spreader.

Curve Fitting Prediction

Hello! Thanks for providing these models. I was wondering if you could provide an example of the predict function? When inputting any range of times into the future, I get the following error:

ValueError: operands could not be broadcast together with shapes (1,) (23,) (1,) (157,) (23,) (23,)

I've input [0,23] initial data points for fitting, and [0,180] t values for prediction. I expect I am misunderstanding how the predict function ingests its inputs, and would simply appreciate a full example! Thank you!
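
For context on the error itself, the sketch below shows how NumPy raises this kind of ValueError in general, using the two lengths that appear in the message (23 and 157). It does not call or diagnose the library's predict function; the array names are hypothetical stand-ins.

```python
import numpy as np

t_pred = np.arange(23, 180)       # 157 future time points
per_observation = np.ones(23)     # stand-in for an array tied to the 23 fitted points
per_group_param = np.ones(1)      # stand-in for a single fitted parameter

# Lengths 23 and 157 differ and neither is 1, so elementwise ops cannot broadcast.
try:
    per_observation * t_pred
    message = None
except ValueError as err:
    message = str(err)

# A length-1 array broadcasts against any length, so this works.
curve = per_group_param * t_pred
```

If something similar is happening inside predict, one plausible cause is that arrays sized to the fitting window are being combined with the prediction time grid, but that is only a guess without the full traceback.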
