
Comments (6)

jbracher commented on July 19, 2024

A few remarks/questions:

  • Would it be helpful to include the date when a forecast was made as a variable in the CSV file? This is redundant information, as it is already contained in the file name, but when I worked with the old FluSight data I found it a bit painful to always read it in from the file name via strsplit etc.
  • The end_date variable also applies to day-ahead forecasts and then just refers to the date, right? I'm in favour of this as it also makes it much easier to spot misaligned forecasts (as for Stephen Lauer's question on CU forecasts).
  • Any reason why the threshold would not be Monday, i.e. in line with the collection of forecasts for the ensemble? This way, if people adapt to the deadline and submit their forecast for a given epiweek at the latest possible time point (using most info) we get a fresh forecast for the ensemble.
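
A minimal sketch of the file-name parsing mentioned in the first point, in Python rather than R's strsplit. The file-name pattern `<YYYY-MM-DD>-<team>-<model>.csv` and the function name `forecast_date_from_filename` are assumptions for illustration, not something confirmed in this thread.

```python
import re
from datetime import date, datetime
from pathlib import Path

# Assumed (hypothetical) file-name pattern: "<YYYY-MM-DD>-<team>-<model>.csv".
_DATE_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})-")


def forecast_date_from_filename(path: str) -> date:
    """Return the date encoded at the start of a forecast file name."""
    name = Path(path).name
    match = _DATE_PREFIX.match(name)
    if match is None:
        raise ValueError(f"no leading YYYY-MM-DD date in {name!r}")
    # strptime also rejects impossible dates such as 2020-13-40.
    return datetime.strptime(match.group(1), "%Y-%m-%d").date()
```

An explicit forecast date column in the CSV would make this step unnecessary, at the cost of a consistency check between the column and the file name.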

from covid19-forecast-hub.

yuorme commented on July 19, 2024

Speaking from a tracking perspective, I think it's important to keep all versions of the data in as granular a form as possible.

You guys are the experts here but I think it could be worthwhile to try ensembling a few recent versions of the same model instead of only the most recent version in a given week.

We'd be happy to contribute to that effort using the raw data available in this repo. To that end, I think keeping all copies of raw data that are made available to you would be helpful for our efforts.

Hopefully, through the work that we're doing on covid-projections.com, modelers can come to better understand how the changes they are making are affecting the outcomes.


NutchaW commented on July 19, 2024
  • We need to automate the processing of forecast files more than we currently do if we want to do this daily. Even then, we will need someone to deal with inconsistent formatting in forecast files.
  • The ensemble can be run more often than once a week, as long as we have enough processed forecast files from models with overlapping quantiles, targets, and locations at the time we want to run it. One caveat is that this can get confusing: inconsistent submissions would make the ensemble members differ depending on which models are available when the ensemble is run.
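
To illustrate the overlap requirement in the second point, here is a rough Python sketch that finds the (target, location, quantile) combinations shared by every available model. The in-memory representation and the name `common_keys` are assumptions for illustration; this is not the hub's actual ensemble code.

```python
from typing import Dict, Set, Tuple

# One forecast row is identified by (target, location, quantile).
Key = Tuple[str, str, float]


def common_keys(forecasts: Dict[str, Set[Key]]) -> Set[Key]:
    """Return the combinations that every model provides.

    `forecasts` maps a model name to the set of keys found in its file.
    Quantiles read from CSV may need rounding before comparison.
    """
    key_sets = list(forecasts.values())
    if not key_sets:
        return set()
    return set.intersection(*key_sets)
```

Running this on whichever models happen to be available shows the caveat directly: adding or dropping one model can change the shared set, and with it the ensemble members.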


nickreich commented on July 19, 2024

Updating my recommendations below:

  • we have the data-processed directory contain all (or nearly all) forecasts from each team. no restrictions on when these forecasts are submitted.
  • we have in place a set of automatic checks from Travis CI to ensure data are passing some basic plausibility checks and formatting guidelines.
  • each file is marked with the date the forecast was made. This would slightly change our current restriction that these YYYY-MM-DDs only refer to Mondays. I'm going to refer to this date in the filename as fcast_date below.
  • we define that "1 wk ahead" means epiweek(fcast_date) for forecasts with a fcast_date of Sunday or Monday; otherwise it is epiweek(fcast_date)+1. This means that any forecast submitted Tuesday through Saturday would be making a "1 week ahead" forecast for the tallies on the following Saturday (i.e. 7-11 days ahead).
  • to reinforce this and avoid inadvertent errors in the assignment of targets to days/weeks, we will accept a new optional column in the files called end_date. Files submitted with a fcast_date of 2020-04-23 (Thursday of EW 17) or 2020-04-27 (Monday of EW 18) would both have a "1 wk ahead" forecast with an end_date of 2020-05-02 (Saturday of EW 18). This end_date column could also be used for "day-ahead" targets. (Adding these columns will increase the number of checks we need to do on the files...)
  • we also allow for an optional fcast_date column in the files. This would be redundant with the fcast_date in the filename, but still can facilitate data handling.
  • on Mondays at a fixed time (6pm ET) we run an ensemble script that finds all available forecasts from a team made since the preceding Thursday (i.e. 4 days prior) and takes the most recent forecast to include in the ensemble. (Interesting idea from @yuorme to include a few of the most recent forecasts, almost like a smoothing effect. I think as a first pass we leave this out, but perhaps come back to it once the systems are smoothed out.) For now, I think we should run just the one ensemble each week. We need to make sure we are doing this right, and CDC is only going to update once a week for now, so there isn't a need to do it more often.
  • we update the "COVID forecast hub" visualizations on Monday as well, at the same time as the ensemble, but not throughout the week as new forecasts roll in.
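
A small sketch of the "1 wk ahead" rule above, assuming epiweeks run Sunday through Saturday (consistent with the EW 17 / EW 18 example). The function name `target_end_date` and the extension to N weeks ahead are my assumptions; only the 1-week case is defined in the list above.

```python
from datetime import date, timedelta


def target_end_date(fcast_date: date, weeks_ahead: int = 1) -> date:
    """Return the Saturday that an "N wk ahead" forecast refers to."""
    if weeks_ahead < 1:
        raise ValueError("weeks_ahead must be at least 1")
    # Python's weekday() has Monday=0 ... Sunday=6; shift so Sunday=0.
    days_since_sunday = (fcast_date.weekday() + 1) % 7
    saturday_this_epiweek = fcast_date + timedelta(days=6 - days_since_sunday)
    # Sunday/Monday forecasts: "1 wk ahead" is the current epiweek.
    # Tuesday through Saturday: it is the following epiweek.
    offset = weeks_ahead - 1 if days_since_sunday <= 1 else weeks_ahead
    return saturday_this_epiweek + timedelta(weeks=offset)


print(target_end_date(date(2020, 4, 23)))  # Thursday of EW 17 -> 2020-05-02
print(target_end_date(date(2020, 4, 27)))  # Monday of EW 18   -> 2020-05-02
```

The same function could back an automated check that a submitted end_date column agrees with the fcast_date in the file name.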


NutchaW commented on July 19, 2024

I should also note that if end_date is in the forecast file, I will need to convert the fcast_date in the file name to end_date to avoid having to read in all the files in order to run the ensemble.
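
As a rough illustration of working from file-name dates alone, the Monday ensemble rule proposed earlier in this thread (take each team's most recent forecast made since the preceding Thursday) could be sketched as follows. The input format, the name `latest_in_window`, and the date-only comparison (which ignores the 6pm ET cutoff) are assumptions.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def latest_in_window(
    submissions: Iterable[Tuple[str, date]],
    ensemble_monday: date,
    window_days: int = 4,
) -> Dict[str, date]:
    """Map each team to its most recent fcast_date within the window.

    `submissions` holds (team, fcast_date) pairs taken from file names.
    The window runs from `window_days` before the ensemble Monday
    (the preceding Thursday by default) through that Monday.
    """
    earliest = ensemble_monday - timedelta(days=window_days)
    latest: Dict[str, date] = {}
    for team, fcast_date in submissions:
        if earliest <= fcast_date <= ensemble_monday:
            if team not in latest or fcast_date > latest[team]:
                latest[team] = fcast_date
    return latest
```

Teams with no forecast in the window are simply absent from the result, so only the selected files would need to be read when the ensemble runs.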


jbracher commented on July 19, 2024

Should we remove the "old" processed files where the date in the filename is timezero rather than forecast_date? Or should they be kept for documentation purposes, maybe in some subfolder?

