reichlab / covid19-forecast-hub
Projections of COVID-19, in standardized format
Home Page: https://covid19forecasthub.org
License: Other
I suggest we reorganize the data so that forecasts are separate from truth, e.g.
data-raw/forecasts
data-raw/truth
data-processed/forecasts
data-processed/truth
The subdirectory structure within the forecasts/ subdirectories would be the same as it is now.
Also, perhaps we should include nytimes "gold-standard" data in addition to the JHU data.
I have already pushed my code, but it should only be run, and the results committed, once the April 12 forecasts are available. For now I am using the April 9 forecasts to test the code, but these will not actually be used.
Some additional validations
forecast_date and target_end_date which are not part of Zoltar) can we require these?
01 and 95? I would prefer the former, so we are doing it specifically for accepted FIPS.
forecast_date and target_end_date correctly? I will explain more below.
location/target tuple) - we know from Katie's code that the forecast_date column is the same for the entire file (based on filename)
I'm working on this.
Currently most of the code is in the code/ directory and recently organized into subdirectories. As a general principle, I suggest we move code closer to the data it is used on. For example, I suggest we move raw data processing scripts to the data-raw/ folder.
The code/ directory could still be used for functions (rather than scripts) that are used in multiple scripts.
It needs point estimates to be in this format:
2020-04-12,1 day ahead cum death,2020-04-13,31,Nebraska,point,NA,6
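As a rough illustration of the format above, the example row can be parsed and sanity-checked as sketched below. The column names are an assumption inferred from the example row and from the required fields named elsewhere on this page (forecast_date, target_end_date); the source does not confirm them, and the helpers parse_row and check_day_ahead_row are hypothetical.

```python
import csv
import io
import re
from datetime import date, timedelta

# Assumed column order, inferred from the example row; not confirmed by the source.
ASSUMED_COLUMNS = [
    "forecast_date", "target", "target_end_date", "location",
    "location_name", "type", "quantile", "value",
]


def parse_row(line):
    """Split one CSV line into a dict keyed by the assumed column names."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(ASSUMED_COLUMNS):
        raise ValueError(f"expected {len(ASSUMED_COLUMNS)} fields, got {len(values)}")
    return dict(zip(ASSUMED_COLUMNS, values))


def check_day_ahead_row(row):
    """Return True if an 'N day ahead' target ends N days after forecast_date."""
    match = re.match(r"(\d+) day ahead ", row["target"])
    if match is None:
        raise ValueError(f"not a day-ahead target: {row['target']!r}")
    horizon = int(match.group(1))
    start = date.fromisoformat(row["forecast_date"])
    end = date.fromisoformat(row["target_end_date"])
    return end - start == timedelta(days=horizon)


row = parse_row("2020-04-12,1 day ahead cum death,2020-04-13,31,Nebraska,point,NA,6")
print(row["type"], row["quantile"], check_day_ahead_row(row))
```

For a point estimate, the type field is "point" and the quantile field is NA, as in the example row.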
This newly added file does not have the required fields forecast_date and target_end_date.
For consistency,
should use "day" rather than "days".
Pull request incoming.
In addition to the missing fields in #66, the newest MOBS processed data file has a filename that is non-standard (for me) and is causing me issues with reading processed data for the shiny data-processing app.
Although I could update the data reading script, I think the real issue is that we don't seem to have a standardized filename for processed data. I was assuming "-" was a reserved character such that the files are named
YYYY-MM-DD-team-model.csv
Can we set this as a standard?
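A minimal sketch of how such a naming standard could be checked, assuming "-" is reserved so that neither the team nor the model name may contain it. The function name parse_forecast_filename and the non-conforming example filename are hypothetical, not part of the repository.

```python
import re
from datetime import datetime

# "-" is treated as reserved, so team and model names may not contain it.
FILENAME_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})-([^-]+)-([^-]+)\.csv")


def parse_forecast_filename(filename):
    """Return (date, team, model) for a conforming filename, else None."""
    match = FILENAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    date_text, team, model = match.groups()
    try:
        forecast_date = datetime.strptime(date_text, "%Y-%m-%d").date()
    except ValueError:
        return None  # digits in the right places but not a real calendar date
    return forecast_date, team, model


print(parse_forecast_filename("2020-04-13-CU-60contact.csv"))
print(parse_forecast_filename("2020_04_13_team_model.csv"))  # hypothetical non-standard name
```

Under this pattern a name with an extra "-" in the team or model part is rejected, which is the point of reserving the character.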
The following files need the required fields forecast_date and target_end_date:
data-processed/CU-60contact/2020-04-13-CU-60contact.csv
data-processed/CU-70contact/2020-04-13-CU-70contact.csv
data-processed/CU-80contact/2020-04-13-CU-80contact.csv
data-processed/CU-nointerv/2020-04-13-CU-nointerv.csv
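A check for the two required fields could look roughly like the sketch below. It only inspects the header row; the helper name missing_required_fields and the sample file contents are hypothetical stand-ins, not actual files from the repository.

```python
import csv
import io

REQUIRED_FIELDS = ("forecast_date", "target_end_date")


def missing_required_fields(csv_file):
    """Return the required fields absent from the header row of an open CSV file."""
    header = next(csv.reader(csv_file), [])
    return [name for name in REQUIRED_FIELDS if name not in header]


# Hypothetical file contents standing in for processed forecast files.
incomplete = io.StringIO("target,location,type,quantile,value\n")
complete = io.StringIO("forecast_date,target,target_end_date,location,type,quantile,value\n")

print(missing_required_fields(incomplete))
print(missing_required_fields(complete))
```

For a real file, the same function can be called on a handle opened with open(path, newline="").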
Could we use the drop-down menu that we have previously used for "season" to toggle between different forecast targets, e.g. incident deaths, cumulative deaths, hospitalizations, etc.? How much customization would this take?
What are the next-phase targets that we want to include? We should probably phase these in slowly, to reduce the strain of creating checks, visualizations, and ensembles for new targets. Candidates are:
There are two competing priorities here:
(1) Record all (or nearly all - do we really want to store every update, even if daily?) forecasts made by teams, as they make them [useful for "tracker"-like sites that want all versions and real-time updates].
(2) Record forecasts made by teams that are available at a specific time, and use them to build an ensemble. Realistically, for the foreseeable future we might just want to update the ensemble once a week [useful for standardizing our ensemble].
Here is one proposal for how to do this:
data-processed directory contains all (or nearly all) forecasts from each team, with no restrictions on when these forecasts are submitted.
fcast_date below.
epiweek(fcast_date) and when it means epiweek(fcast_date)+1. For example, we say that if weekday(fcast_date) is Thursday, Friday or Saturday, then "1 wk ahead" means epiweek(fcast_date)+1 and otherwise epiweek(fcast_date). (I don't feel that strongly about where the threshold is for switching over. Could be Tuesday, could be Thursday.)
end_date, so files submitted with fcast_date of 2020-04-23 (Thursday of EW 17) or 2020-04-27 (Monday of EW 18) would both have a "1 wk ahead" forecast with end_date of 2020-05-02 (Saturday of EW 18).
@jarad I see up to 41 day ahead targets in the recent 2020-04-26 CU files, but only max_n=9 for the day_ahead targets in the app.
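Returning to the epiweek proposal above, the weekday threshold rule for "1 wk ahead" can be sketched as follows. This assumes epiweeks run Sunday through Saturday (consistent with the EW 17 and EW 18 dates given in the proposal) and uses the Thursday threshold from the example, which the proposal itself says could move. The function name one_wk_ahead_end_date is hypothetical.

```python
from datetime import date, timedelta

# date.weekday() numbering: Monday=0 ... Sunday=6.
SATURDAY = 5
LATE_WEEK = {3, 4, 5}  # Thursday, Friday, Saturday (threshold from the example)


def one_wk_ahead_end_date(fcast_date):
    """Saturday that ends the epiweek targeted by a "1 wk ahead" forecast."""
    # Assumption: epiweeks run Sunday through Saturday, so step forward to the
    # Saturday that closes the epiweek containing fcast_date.
    days_to_saturday = (SATURDAY - fcast_date.weekday()) % 7
    end_of_current_week = fcast_date + timedelta(days=days_to_saturday)
    if fcast_date.weekday() in LATE_WEEK:
        # Late-week forecasts target the following epiweek.
        return end_of_current_week + timedelta(days=7)
    return end_of_current_week


print(one_wk_ahead_end_date(date(2020, 4, 23)))  # Thursday of EW 17
print(one_wk_ahead_end_date(date(2020, 4, 27)))  # Monday of EW 18
```

Both calls give 2020-05-02, matching the end_date in the proposal. Moving the threshold to Tuesday would only require changing LATE_WEEK.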
Ex. Guam
All team forecasts should be in a subdirectory of data-raw/, but these files
https://github.com/reichlab/covid19-forecast-hub/blob/master/data-raw/2020-04-19-Imperial-ensemble1.csv
https://github.com/reichlab/covid19-forecast-hub/blob/master/data-raw/2020-04-19-Imperial-ensemble2.csv
are directly in the data-raw/ folder.
I would create a pull request, but these files differ from the files in the data-raw/Imperial subdirectory, so I'm not sure which versions should be preserved.
@youyanggu could you add a metadata file to the YYG-ParamSearch Model?
See https://github.com/reichlab/covid19-forecast-hub/blob/master/data-processed/UMass-ExpertCrowd/metadata-UMass-ExpertCrowd.txt for an example. We need the metadata in order to visualize the model. Thanks!
This file includes row names in the first column, which is non-standard.
Also, the file does not include target_end_date.
In processed data, the location is "US", but the location_name can be "US", "United States", or . Specifically, it is "United States" in UTexas data and in Imperial data.
Add a subsection enumerating teams / sources of forecasts we are planning to include + links to their repositories or websites
As of 2020-04-26 there are global files, with country-level forecasts from LANL, e.g.
https://covid-19.bsvgateway.org/forecast/global/files/2020-04-26/confirmed/2020-04-26_confirmed_quantiles_global_website.csv
These files should be included in the raw data download script for LANL.
Georgia, Indiana, Alabama, Arkansas, and Iowa to name a few examples
Write a script that does some plausibility checks for cleaned data, eg: