reichlab / covid19-forecast-hub

Projections of COVID-19, in standardized format

Home Page: https://covid19forecasthub.org

License: Other

Languages: R 2.40%, Shell 0.21%, JavaScript 3.08%, Python 1.97%, HTML 9.29%, Vue 0.66%, CSS 0.74%, TypeScript 0.41%, Dockerfile 0.01%, Jupyter Notebook 81.10%, SCSS 0.12%, Makefile 0.02%

Topics: covid19, forecasts, covid-19, forecast-data, covid-data, github-pages, visualization, analytics

covid19-forecast-hub's Issues

Separate forecasts from truth

I suggest we reorganize the data so that forecasts are separate from truth, e.g.

data-raw/forecasts
data-raw/truth
data-processed/forecasts
data-processed/truth

The subdirectory structure within the forecasts/ subdirectories would be the same as it is now.
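
A minimal migration sketch, assuming the layout above (in a git repository, git mv would preserve history better than a plain rename):

    # Hypothetical one-off migration: nest existing team subdirectories under
    # data-processed/forecasts/, leaving room for a sibling truth/ directory.
    # The same pattern would apply to data-raw/.
    from pathlib import Path

    root = Path("data-processed")
    forecasts = root / "forecasts"
    forecasts.mkdir(exist_ok=True)

    for team_dir in sorted(root.iterdir()):
        # Skip the new target directories themselves.
        if team_dir.is_dir() and team_dir.name not in ("forecasts", "truth"):
            team_dir.rename(forecasts / team_dir.name)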

Also, perhaps we should include the NYTimes "gold-standard" data in addition to the JHU data.

add additional validations

Some additional validations to consider (a sketch of the first two checks follows this list):

  • ensure that we are checking for all required column names as required by the repo (right now we are requiring forecast_date and target_end_date, which are not part of Zoltar). Can we require these?
  • are we validating the FIPS locations against the specific set of accepted codes, or just any string of a number between 01 and 95? I would prefer the former, so that we validate specifically against accepted FIPS codes.
  • can we institute a more complex check to ensure that people are aligning forecast_date and target_end_date correctly? I will explain more below.
  • require point estimates (exactly one point estimate per location/target tuple). We know from Katie's code that the forecast_date column is the same for the entire file (based on the filename).
  • update https://github.com/reichlab/covid19-forecast-hub/wiki/Validation-Checks
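
A minimal sketch of the column and FIPS checks with pandas; the required column set and the FIPS list here are assumptions for illustration, since the accepted values would live in the repo's own configuration:

    # Sketch only: REQUIRED_COLUMNS and VALID_FIPS are assumptions,
    # not the repo's actual validation config.
    import pandas as pd

    REQUIRED_COLUMNS = {
        "forecast_date", "target", "target_end_date",
        "location", "type", "quantile", "value",
    }
    VALID_FIPS = {"US", "01", "02", "04", "05", "06"}  # truncated for illustration

    def validate_forecast_file(path):
        df = pd.read_csv(path, dtype={"location": str})
        errors = []
        missing = REQUIRED_COLUMNS - set(df.columns)
        if missing:
            errors.append(f"missing required columns: {sorted(missing)}")
        if "location" in df.columns:
            bad = set(df["location"]) - VALID_FIPS
            if bad:
                errors.append(f"locations not in the accepted FIPS set: {sorted(bad)}")
        if {"type", "location", "target"} <= set(df.columns):
            # Flag duplicated point estimates; pairs with zero point rows would
            # need a separate cross-check against all location/target pairs.
            counts = df[df["type"] == "point"].groupby(["location", "target"]).size()
            if (counts > 1).any():
                errors.append("duplicate point estimates for some location/target pairs")
        return errors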

Move processing scripts to data-raw/ folders

Currently most of the code is in the code/ directory, which was recently organized into subdirectories. As a general principle, I suggest we move code closer to the data it is used on. For example, I suggest we move raw data processing scripts to the data-raw/ folder.

The code/ directory could still be used for functions (rather than scripts) that are used in multiple scripts.

Standardize processed data filenames

In addition to the missing fields in #66, the newest MOBS processed data file has a filename that is non-standard (for me) and is causing issues with reading processed data for the Shiny data-processing app.

Although I could update the data-reading script, I think the real issue is that we don't have a standardized filename convention for processed data. I was assuming "-" was a reserved character, such that the files are named

YYYY-MM-DD-team-model.csv

Can we set this as a standard?
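
If so, the check is easy to write; a sketch assuming "-" is reserved, so team and model names cannot contain it:

    import re

    # Proposed standard: YYYY-MM-DD-team-model.csv, with "-" reserved.
    FILENAME_RE = re.compile(r"^\d{4}-\d{2}-\d{2}-(?P<team>[^-]+)-(?P<model>[^-]+)\.csv$")

    def parse_processed_filename(name):
        m = FILENAME_RE.match(name)
        if m is None:
            raise ValueError(f"non-standard processed-data filename: {name}")
        return name[:10], m.group("team"), m.group("model")

    # parse_processed_filename("2020-04-13-CU-60contact.csv")
    #   -> ("2020-04-13", "CU", "60contact")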

Add forecast_date, target_end_date to 2020-04-13 CU data-processed/ files

The following files are missing the required fields forecast_date and target_end_date (a sketch of a one-off fix follows the list):

data-processed/CU-60contact/2020-04-13-CU-60contact.csv
data-processed/CU-70contact/2020-04-13-CU-70contact.csv
data-processed/CU-80contact/2020-04-13-CU-80contact.csv
data-processed/CU-nointerv/2020-04-13-CU-nointerv.csv
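
A hypothetical pandas fix, deriving forecast_date from the filename and target_end_date for week-ahead targets; 2020-04-13 is a Monday, so "1 wk ahead" ends on the coming Saturday, and day-ahead targets would need an analogous rule:

    from datetime import date, timedelta
    import pandas as pd

    def fix_file(path):
        # Filename convention: YYYY-MM-DD-team-model.csv
        forecast_date = date.fromisoformat(path.split("/")[-1][:10])
        df = pd.read_csv(path)
        df["forecast_date"] = forecast_date.isoformat()
        # "N wk ahead ..." targets end on the Saturday closing epiweek N;
        # for a Monday forecast date that is 5 + 7*(N-1) days later.
        weeks = df["target"].str.extract(r"^(\d+) wk ahead")[0].astype(float)
        df["target_end_date"] = [
            (forecast_date + timedelta(days=5 + 7 * (int(w) - 1))).isoformat()
            if pd.notna(w) else None
            for w in weeks
        ]
        df.to_csv(path, index=False)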

update target list

What are the next-phase targets that we want to include? We should likely phase these in slowly, to reduce the strain of creating checks, visualizations, and ensembles for new targets. Candidates are:

  • incident hospitalization demand by week/day?
  • ICU bed demand by week/day?
    ...

migrate to a clearer structure for what forecasts are made when

There are two competing priorities here:
(1) record all (or nearly all - do we really want to store every update, even if daily?) forecasts made by teams, as they make them [useful for "tracker"-like sites that want all versions and real-time updates]
(2) record forecasts made by teams that are available at a specific time, and use them to build an ensemble. Realistically, for the foreseeable future we might just want to update the ensemble once a week. [useful for standardizing our ensemble]

Here is one proposal for how to do this (a sketch of the week-assignment rule follows the list):

  • we have the data-processed directory contain all (or nearly all) forecasts from each team. no restrictions on when these forecasts are submitted.
  • each file is marked with the date the forecast was made. This would relax our current restriction that these YYYY-MM-DD dates refer only to Mondays. I'm going to refer to this date in the filename as fcast_date below.
  • we set really clear guidelines for when "1 wk ahead" means epiweek(fcast_date) and when it means epiweek(fcast_date)+1. for example, we say that if weekday(fcast_date) is Thursday, Friday or Saturday, then "1 wk ahead" means epiweek(fcast_date)+1 and otherwise epiweek(fcast_date). (I don't feel that strongly about where the threshold is for switching over. Could be Tuesday, could be Thursday.)
  • to reinforce this and avoid inadvertent errors in assignment of targets to days/weeks, we could also accept a new column name in the files that would be end_date, so files submitted with fcast_date of 2020-04-23 (thursday of EW 17) or 2020-04-27 (Monday of EW 18) would both have a "1 wk ahead" forecast with end_date of 2020-05-02 (Saturday of EW 18).
  • on Mondays at a fixed time (6pm ET?) we run an ensemble script that finds all available forecasts from a team made since the preceding Thursday (i.e. 4 days prior) and takes the most recent forecast to include in the ensemble.
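
A sketch of the proposed week-assignment rule with the Thursday threshold, assuming MMWR epiweeks that run Sunday through Saturday:

    from datetime import date, timedelta

    def one_wk_ahead_end_date(fcast_date: date) -> date:
        """Saturday that closes the '1 wk ahead' target under the proposal."""
        # Saturday closing the current epiweek (Monday=0 ... Sunday=6).
        end = fcast_date + timedelta(days=(5 - fcast_date.weekday()) % 7)
        # Thu (3), Fri (4), Sat (5): "1 wk ahead" rolls to the next epiweek.
        if fcast_date.weekday() in (3, 4, 5):
            end += timedelta(days=7)
        return end

    # Matches the example above:
    # one_wk_ahead_end_date(date(2020, 4, 23))  # Thu of EW17 -> 2020-05-02
    # one_wk_ahead_end_date(date(2020, 4, 27))  # Mon of EW18 -> 2020-05-02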

Remove Imperial ensemble forecast files from data-raw/ folder

All team forecasts should be in a subdirectory of data-raw/, but these files

https://github.com/reichlab/covid19-forecast-hub/blob/master/data-raw/2020-04-19-Imperial-ensemble1.csv
https://github.com/reichlab/covid19-forecast-hub/blob/master/data-raw/2020-04-19-Imperial-ensemble2.csv

are directly in the data-raw/ folder.

I would create a pull request, but these files differ from the files in the data-raw/Imperial subdirectory, so I'm not sure which versions should be preserved.
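
A quick comparison sketch that could help decide; the subdirectory filenames here are assumed to mirror the top-level ones:

    import pandas as pd

    for name in ("2020-04-19-Imperial-ensemble1.csv",
                 "2020-04-19-Imperial-ensemble2.csv"):
        top = pd.read_csv(f"data-raw/{name}")
        sub = pd.read_csv(f"data-raw/Imperial/{name}")
        if top.equals(sub):
            print(name, "identical")
        else:
            print(name, "differs")
            if list(top.columns) == list(sub.columns):
                # Rows present in only one of the two versions.
                print(pd.concat([top, sub]).drop_duplicates(keep=False).head())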

Add list of teams to ReadMe?

Add a subsection enumerating the teams / sources of forecasts we are planning to include, plus links to their repositories or websites.

Write plausibility checks

Write a script that does some plausibility checks for cleaned data (see the sketch after this list), e.g.:

  • no quantile crossing
  • quantiles for cumulative deaths greater than or equal to those for incident deaths
  • quantiles for cumulative deaths non-decreasing over time
  • cumulative week-ahead and corresponding day-ahead forecasts coincide

Maybe related to #13?
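
A sketch of the first and third checks, assuming the long quantile format (columns target, location, type, quantile, value) and simplifying the over-time check to the median:

    import pandas as pd

    def check_no_quantile_crossing(df):
        """Within each target/location, values must not decrease as the quantile level rises."""
        q = df[df["type"] == "quantile"]
        bad = []
        for (target, location), g in q.groupby(["target", "location"]):
            if not g.sort_values("quantile")["value"].is_monotonic_increasing:
                bad.append((target, location))
        return bad

    def check_cumulative_non_decreasing(df):
        """Cumulative-death forecasts should not decrease with horizon (simplified to the median)."""
        cum = df[df["target"].str.contains("cum death") & (df["quantile"] == 0.5)]
        horizon = cum["target"].str.extract(r"^(\d+)")[0].astype(int)
        ordered = cum.assign(h=horizon).sort_values("h")
        return ordered.groupby("location")["value"].apply(
            lambda v: bool(v.is_monotonic_increasing)
        )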
