Comments (6)
A few remarks/questions:
- Would it be helpful to include the date when a forecast was made as a variable in the csv file? This is redundant information as it is already contained in the file name, but when I worked with the old FluSight data I found it a bit painful to always read that in from the file name via `strsplit` etc.
- The `end_date` variable also applies to day-ahead forecasts and then just refers to the data, right? I'm in favour of this as it also makes it much easier to spot misaligned forecasts (as for Stephen Lauer's question on CU forecasts).
- Any reason why the threshold would not be Monday, i.e. in line with the collection of forecasts for the ensemble? This way, if people adapt to the deadline and submit their forecast for a given epiweek at the latest possible time point (using most info) we get a fresh forecast for the ensemble.
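For reference, reading the forecast date out of the file name could look roughly like the sketch below. It assumes that file names start with a `YYYY-MM-DD` prefix; the function name `forecast_date_from_filename` and the example path are hypothetical, not part of the hub's code.

```python
import re
from datetime import date
from pathlib import Path

# Assumption: the file name starts with the forecast date as YYYY-MM-DD.
_DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def forecast_date_from_filename(path: str) -> date:
    """Return the date encoded at the start of a forecast file name."""
    name = Path(path).name
    match = _DATE_PREFIX.match(name)
    if match is None:
        raise ValueError(f"no YYYY-MM-DD prefix in file name: {name!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() raises ValueError for impossible dates such as 2020-02-30.
    return date(year, month, day)


# Hypothetical path, for illustration only.
parsed = forecast_date_from_filename("data-processed/TeamA-ModelX/2020-04-27-TeamA-ModelX.csv")
```

An explicit date column in the csv would make this step unnecessary, at the cost of one more consistency check between the column and the file name.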
from covid19-forecast-hub.
Speaking from a tracking perspective, I think it's important to keep all versions of the data in as granular a way as possible.
You guys are the experts here but I think it could be worthwhile to try ensembling a few recent versions of the same model instead of only the most recent version in a given week.
We'd be happy to contribute to that effort using the raw data available in this repo. To that end, I think keeping all copies of raw data that are made available to you would be helpful for our efforts.
Hopefully, through the work that we're doing on covid-projections.com, modelers can come to better understand how the changes they are making are affecting the outcomes.
from covid19-forecast-hub.
- We need to automate the forecast file processing more than we currently do if we want to do this daily. Nonetheless, we will need someone to deal with inconsistent formatting in forecast files.
- The ensemble can be run more often than once a week, as long as we have enough processed forecast files from models with overlapping quantiles, targets, and locations at the time we want to run it. One caveat is that this can get confusing: inconsistent submissions will result in the ensemble members differing depending on which models we have when the ensemble is run.
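To illustrate what "overlapping quantiles, targets, locations" could mean in practice, here is a minimal sketch. It assumes each model's forecast has already been reduced to a set of `(location, target, quantile)` tuples; the helper `common_keys` and the model names are hypothetical and this is not the hub's actual ensemble code.

```python
from typing import Dict, Set, Tuple

# (location, target, quantile); quantiles are kept as strings so that
# "0.025" from two files compares equal without floating-point issues.
Key = Tuple[str, str, str]


def common_keys(forecasts: Dict[str, Set[Key]]) -> Set[Key]:
    """Return the keys that every model in `forecasts` provides."""
    if not forecasts:
        return set()
    return set.intersection(*forecasts.values())


# Hypothetical models and values, for illustration only.
available = {
    "model_a": {("US", "1 wk ahead", "0.5"), ("US", "1 wk ahead", "0.975")},
    "model_b": {("US", "1 wk ahead", "0.5"), ("NY", "1 wk ahead", "0.5")},
}
shared = common_keys(available)
```

The result depends on which models are present when the function is called, which is exactly the caveat raised above: a late or malformed submission changes the set of shared keys.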
from covid19-forecast-hub.
Updating my recommendations below:
- We have the `data-processed` directory contain all (or nearly all) forecasts from each team, with no restrictions on when these forecasts are submitted.
- We have in place a set of automatic checks from Travis CI to ensure data are passing some basic plausibility checks and formatting guidelines.
- Each file is marked with the date the forecast was made. This would change a bit our current restriction that these `YYYY-MM-DD`s only refer to Mondays. I'm going to refer to this date in the filename as `fcast_date` below.
- We define that "1 wk ahead" means `epiweek(fcast_date)` for forecasts with a `fcast_date` of Sunday or Monday; otherwise it is `epiweek(fcast_date)+1`. This means that any forecast submitted Tuesday through Saturday would be making a "1 wk ahead" forecast for the tallies on the following Saturday (i.e. 7-11 days ahead).
- To reinforce this and avoid inadvertent errors in assignment of targets to days/weeks, we will accept a new optional column in the files, `end_date`. Files submitted with a `fcast_date` of `2020-04-23` (Thursday of EW 17) or `2020-04-27` (Monday of EW 18) would both have a "1 wk ahead" forecast with an `end_date` of `2020-05-02` (Saturday of EW 18). This `end_date` column could also be used for "day-ahead" targets. (Adding these columns will increase the number of checks we need to do on the files...)
- We also allow for an optional `fcast_date` column in the files. This would be redundant with the `fcast_date` in the filename, but can still facilitate data handling.
- On Mondays at a fixed time (6pm ET) we run an ensemble script that finds all available forecasts from a team made since the preceding Thursday (i.e. 4 days prior) and takes the most recent forecast to include in the ensemble. (Interesting idea from @yuorme to include a few of the most recent forecasts, almost like a smoothing effect. I think as a first pass we leave this out, but perhaps come back to it once the systems are smoothed out.) For now, I think we should run just the one ensemble each week. We need to make sure we are doing this right, and CDC is only going to update once a week for now, so there isn't a need to do it more often.
- We update the "COVID forecast hub" visualizations on Monday as well, at the same time as the ensemble, but not throughout the week as new forecasts roll in.
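The Monday selection rule above could be sketched as follows. This is only an illustration under stated assumptions: forecasts are represented by their `fcast_date` alone, the Thursday boundary is treated as inclusive, the 6pm ET cutoff is ignored, and `select_for_ensemble` and the team names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def select_for_ensemble(
    forecast_dates: Dict[str, List[date]],
    run_date: date,
    window_days: int = 4,
) -> Dict[str, date]:
    """Pick, per team, the most recent forecast made within the window.

    The window runs from `run_date - window_days` (the preceding Thursday
    when `run_date` is a Monday) through `run_date`, both inclusive.
    Teams with no forecast in the window are left out.
    """
    earliest = run_date - timedelta(days=window_days)
    selected: Dict[str, date] = {}
    for team, dates in forecast_dates.items():
        eligible = [d for d in dates if earliest <= d <= run_date]
        if eligible:
            selected[team] = max(eligible)
    return selected


# Hypothetical teams; the run date is Monday 2020-04-27.
submissions = {
    "team_a": [date(2020, 4, 20), date(2020, 4, 23), date(2020, 4, 26)],
    "team_b": [date(2020, 4, 22)],  # Wednesday: outside the window
    "team_c": [date(2020, 4, 23)],  # Thursday: inside the window
}
chosen = select_for_ensemble(submissions, run_date=date(2020, 4, 27))
```

Including a few recent forecasts per team, as suggested by @yuorme, would amount to returning the sorted tail of `eligible` instead of only `max(eligible)`.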
from covid19-forecast-hub.
I should also note that if `end_date` is in the forecast file, I will need to convert the `fcast_date` in the file name to `end_date` to avoid having to read in all the files in order to run the ensemble.
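A conversion from `fcast_date` to `end_date` following the "1 wk ahead" rule proposed above might look like this. It is a sketch under assumptions: epiweeks run Sunday through Saturday, the rule extends to "N wk ahead" by adding whole weeks, and the function name `end_date_for` is hypothetical.

```python
from datetime import date, timedelta


def end_date_for(fcast_date: date, weeks_ahead: int = 1) -> date:
    """Return the Saturday that an "N wk ahead" forecast refers to.

    Sunday and Monday forecasts target the Saturday of their own epiweek;
    Tuesday through Saturday forecasts target the Saturday of the next one.
    """
    if weeks_ahead < 1:
        raise ValueError("weeks_ahead must be at least 1")
    # date.weekday() has Monday=0 ... Sunday=6; shift so that Sunday=0.
    days_since_sunday = (fcast_date.weekday() + 1) % 7
    saturday_of_epiweek = fcast_date + timedelta(days=6 - days_since_sunday)
    if days_since_sunday > 1:  # Tuesday through Saturday
        saturday_of_epiweek += timedelta(days=7)
    return saturday_of_epiweek + timedelta(weeks=weeks_ahead - 1)


# The two dates from the example in the recommendations.
thursday_ew17 = end_date_for(date(2020, 4, 23))
monday_ew18 = end_date_for(date(2020, 4, 27))
```

Both example dates map to `2020-05-02`, matching the example given in the recommendations, so the date in the file name is enough to determine `end_date` without opening the file.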
from covid19-forecast-hub.
Should we remove the "old" processed files where the date in the filename is timezero rather than forecast_date? Or should they be kept for documentation purposes, maybe in some subfolder?
from covid19-forecast-hub.