There are two competing priorities here: (1) record all (or nearly all - do we rea

migrate to a clearer structure for what forecasts are made when (covid19-forecast-hub, CLOSED)

reichlab commented on July 19, 2024

migrate to a clearer structure for what forecasts are made when

from covid19-forecast-hub.

Comments (6)

jbracher commented on July 19, 2024

A few remarks/questions:

Would it be helpful to include the date when a forecast was made as a variable in the csv file? This is redundant information as it is already contained in the file name, but when I worked with the old FluSight data I found it a bit painful to always read that in from the file name via strsplit etc.
The end_date variable also applies to day-ahead forecasts and then just refers to the date, right? I'm in favour of this as it also makes it much easier to spot misaligned forecasts (as for Stephen Lauer's question on CU forecasts).
Any reason why the threshold would not be Monday, i.e. in line with the collection of forecasts for the ensemble? This way, if people adapt to the deadline and submit their forecast for a given epiweek at the latest possible time point (using most info) we get a fresh forecast for the ensemble.
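
To illustrate the file-name point above, here is a minimal Python sketch of reading the forecast date from a file name without string splitting. It assumes only that the file name begins with YYYY-MM-DD; the helper name `forecast_date_from_filename` and the example path are hypothetical, not part of the repository.

```python
import re
from datetime import date
from pathlib import Path

# Assumption: the file name starts with YYYY-MM-DD; whatever follows
# (team, model, extension) is ignored by this sketch.
_DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def forecast_date_from_filename(path: str) -> date:
    """Return the date encoded at the start of a forecast file name."""
    name = Path(path).name
    match = _DATE_PREFIX.match(name)
    if match is None:
        raise ValueError(f"no YYYY-MM-DD prefix in file name: {name!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible dates such as 2020-02-31
    return date(year, month, day)


if __name__ == "__main__":
    # Hypothetical path, used only to exercise the helper
    print(forecast_date_from_filename("data-processed/SomeTeam/2020-04-27-SomeTeam.csv"))
```

An explicit date column in the csv would make this step unnecessary, which is the trade-off raised in the first question.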

yuorme commented on July 19, 2024

Speaking from a tracking perspective, I think it's important to keep all versions of the data in as granular a way as possible.

You guys are the experts here but I think it could be worthwhile to try ensembling a few recent versions of the same model instead of only the most recent version in a given week.

We'd be happy to contribute to that effort using the raw data available in this repo. To that end, I think keeping all copies of raw data that are made available to you would be helpful for our efforts.

Hopefully, through the work that we're doing on covid-projections.com, modelers can come to better understand how the changes they are making are affecting the outcomes.

NutchaW commented on July 19, 2024

We need to automate forecast file processing more than we currently do if we want to do this daily. Even then, we will need someone to deal with inconsistent formatting in forecast files.
The ensemble can be run more often than once a week as long as, at the time we want to run it, we have enough processed forecast files from models with overlapping quantiles, targets, and locations. One caveat is that this can get confusing: inconsistent submissions mean the ensemble members will differ depending on which models we have when the ensemble is run.
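
As a rough illustration of the overlap requirement, the sketch below finds the (location, target, quantile) combinations that every available model provides. The in-memory row representation and the function name `common_rows` are assumptions made for this example and do not describe the hub's actual processing code.

```python
from typing import Dict, Iterable, Set, Tuple

# One forecast row reduced to the keys that must overlap:
# (location, target, quantile). This layout is an assumption of the sketch.
Row = Tuple[str, str, float]


def common_rows(forecasts: Dict[str, Iterable[Row]]) -> Set[Row]:
    """Return the rows that are present in every model's forecast."""
    row_sets = [set(rows) for rows in forecasts.values()]
    if not row_sets:
        return set()
    return set.intersection(*row_sets)


if __name__ == "__main__":
    available = {
        "model_a": [("US", "1 wk ahead", 0.5), ("US", "1 wk ahead", 0.975)],
        "model_b": [("US", "1 wk ahead", 0.5)],
    }
    # Only the median is shared by both hypothetical models
    print(common_rows(available))
```

Because the result depends on which models are present in `forecasts`, running this at different times can yield different ensemble inputs, which is the caveat noted above. Quantiles parsed from text may need rounding before they compare equal as floats.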

nickreich commented on July 19, 2024

Updating my recommendations below:

we have the data-processed directory contain all (or nearly all) forecasts from each team. no restrictions on when these forecasts are submitted.
we have in place a set of automatic checks from Travis CI to ensure data are passing some basic plausibility checks and formatting guidelines.
each file is marked with the date the forecast was made. This would slightly change our current restriction that these YYYY-MM-DD's only refer to Mondays. I'm going to refer to this date in the filename as fcast_date below.
we define that "1 wk ahead" means epiweek(fcast_date) for forecasts with a fcast_date of Sunday or Monday; otherwise it is epiweek(fcast_date)+1. This means that any forecast submitted Tuesday through Saturday would be making a "1 wk ahead" forecast for the tallies on the following week's Saturday (i.e. 7-11 days ahead).
to reinforce this and avoid inadvertent errors in assignment of targets to days/weeks, we will accept a new optional column in the files called end_date. Files submitted with a fcast_date of 2020-04-23 (Thursday of EW 17) or 2020-04-27 (Monday of EW 18) would both have a "1 wk ahead" forecast with an end_date of 2020-05-02 (Saturday of EW 18). This end_date column could also be used for "day-ahead" targets. (Adding these columns will increase the number of checks we need to do on the files...)
we also allow for an optional fcast_date column in the files. This would be redundant with the fcast_date in the filename, but still can facilitate data handling.
on Mondays at a fixed time (6pm ET) we run an ensemble script that finds all available forecasts from a team made since the preceding Thursday (i.e. 4 days prior) and takes the most recent forecast to include in the ensemble. (Interesting idea from @yuorme to include a few of the most recent forecasts, almost like a smoothing effect. I think as a first pass we leave this out, but perhaps come back to it once the systems are smoothed out.) For now, I think we should run just the one ensemble each week. We need to make sure we are doing this right, and CDC is only going to update once a week for now, so there isn't a need to do it more often.
we update the "COVID forecast hub" visualizations on Monday as well, at the same time as the ensemble, but not throughout the week as new forecasts roll in.
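
The "1 wk ahead" rule in the list above can be sketched in Python as follows. This assumes epiweeks run Sunday through Saturday, which is consistent with the EW 17 and EW 18 examples given; the function names are hypothetical and this is not the hub's own code.

```python
from datetime import date, timedelta


def saturday_of_epiweek(d: date) -> date:
    """Saturday ending the Sunday-to-Saturday epiweek that contains d."""
    # date.weekday(): Monday is 0 ... Sunday is 6
    days_since_sunday = (d.weekday() + 1) % 7  # Sunday -> 0, Saturday -> 6
    return d + timedelta(days=6 - days_since_sunday)


def one_wk_ahead_end_date(fcast_date: date) -> date:
    """end_date of the '1 wk ahead' target under the proposed rule."""
    end = saturday_of_epiweek(fcast_date)
    if fcast_date.weekday() in (6, 0):  # Sunday or Monday: same epiweek
        return end
    return end + timedelta(days=7)  # Tuesday to Saturday: following epiweek


if __name__ == "__main__":
    # The two fcast_date examples from the proposal
    for d in (date(2020, 4, 23), date(2020, 4, 27)):
        print(d, "->", one_wk_ahead_end_date(d))
```

A file check for the optional end_date column could compare each stated value against the output of `one_wk_ahead_end_date` for the fcast_date in the file name.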

NutchaW commented on July 19, 2024

I should also note that if end_date is in the forecast file, I will need to convert the fcast_date in the file name to an end_date, to avoid having to read in all the files in order to run the ensemble.
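
One way to work from file-name dates alone, without opening any file, is sketched below for the Monday selection rule proposed earlier in the thread (most recent forecast per team since the preceding Thursday). The input format, a list of (team, fcast_date) pairs already extracted from file names, and the name `latest_in_window` are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def latest_in_window(
    files: Iterable[Tuple[str, date]],
    ensemble_monday: date,
    window_days: int = 4,  # Monday minus 4 days is the preceding Thursday
) -> Dict[str, date]:
    """Map each team to its most recent fcast_date inside the window."""
    earliest = ensemble_monday - timedelta(days=window_days)
    latest: Dict[str, date] = {}
    for team, fcast_date in files:
        if not (earliest <= fcast_date <= ensemble_monday):
            continue  # outside the Thursday-to-Monday window
        if team not in latest or fcast_date > latest[team]:
            latest[team] = fcast_date
    return latest


if __name__ == "__main__":
    # Hypothetical teams and dates
    submitted = [
        ("team_a", date(2020, 4, 23)),
        ("team_a", date(2020, 4, 26)),
        ("team_b", date(2020, 4, 22)),
        ("team_c", date(2020, 4, 27)),
    ]
    print(latest_in_window(submitted, ensemble_monday=date(2020, 4, 27)))
```

Only the selected files then need to be read, and each selected fcast_date can be converted to an end_date in a separate step.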

jbracher commented on July 19, 2024

Should we remove the "old" processed files where the date in the filename is timezero rather than forecast_date? Or should they be kept for documentation purposes, maybe in some subfolder?
