German and Polish COVID-19 Forecast Hub

Home Page: https://kitmetricslab.github.io/forecasthub/

License: Other


A collaborative forecasting project

A description in German is available here.

Note: This project is now largely synchronized with the European COVID-19 Forecast Hub (website and GitHub repository), which is run by the European Centre for Disease Prevention and Control (ECDC) and the London School of Hygiene and Tropical Medicine. Further development now takes place mainly in that repository. All forecasts except regional-level forecasts are also shown in the European Forecast Hub.

Website: https://kitmetricslab.github.io/forecasthub/

Preprint: https://www.medrxiv.org/content/10.1101/2020.12.24.20248826v2

Old version of visualization incl. evaluation scores: https://jobrac.shinyapps.io/app_forecasts_de/

The new visualization, built by the Signale Team at RKI, lives in a separate repository: https://github.com/KITmetricslab/forecasthub

Study protocol: https://osf.io/cy937/registrations

Reference: Bracher J, Wolffram D, Deuschel J, Görgen K, Ketterer J, Gneiting T, Schienle M (2020): The German and Polish COVID-19 Forecast Hub. https://github.com/KITmetricslab/covid19-forecast-hub-de.

Web tool to visualize submission files: https://jobrac.shinyapps.io/app_check_submission/

Web tool to explore forecast evaluations (still in development): https://jobrac.shinyapps.io/app_evaluation/

Contact: [email protected]

Purpose

This repository assembles forecasts of cumulative and incident COVID-19 deaths and cases in Germany and Poland in a standardized format. The repository is run by members of the Chair of Econometrics and Statistics at Karlsruhe Institute of Technology and the Computational Statistics Group at Heidelberg Institute for Theoretical Studies, see below.

An interactive visualization and additional information on our project can be found on our website here.

We are running a pre-registered evaluation study covering the months of October through March to assess the performance of different forecasting methods. You can find the protocol here.

The effort parallels the US COVID-19 Forecast Hub run by the UMass-Amherst Influenza Forecasting Center of Excellence based at the Reich Lab. We are in close exchange with the Reich Lab team and follow the general structure and data format defined there; see this wiki entry for more details. We also re-use software provided by the Reich Lab (see below).

If you are generating forecasts of COVID-19 cases, hospitalizations or deaths in Germany and would like to contribute to this repository, do not hesitate to get in touch.

Forecast targets

Deaths

We collect 1 through 4 week ahead forecasts of incident and cumulative deaths by reporting date in Germany and Poland (national level), the German states (Bundesländer) and Polish voivodeships, with a special focus on the short horizons of 1 and 2 weeks ahead. This wiki entry contains details on the definition of the targets. There is no obligation to submit forecasts for all suggested targets; it is up to teams to decide what they feel comfortable forecasting.

Our definition of targets parallels the principles outlined here for the US COVID-19 Forecast Hub.

Up to 14 December we treated the ECDC data available here and here, in processed form, as our ground truth for the national-level death forecasts. As of 19 December, we use data we process directly from the Robert Koch Institute and the Polish Ministry of Health, see below. These agree with the ECDC data up to 14 December.

Cases

We collect 1 through 4 week ahead forecasts of incident and cumulative confirmed cases by reporting date in Germany and Poland (national level), German states (Bundesländer) and Polish voivodeships, see the wiki entry. The respective truth data from RKI and the Polish Ministry of Health can be found here and here.

Contents of the repository

The main contents of the repository are currently the following (see also this wiki page):

  • data-processed: forecasts in a standardized format
  • data-truth: truth data from JHU and ECDC in a standardized format
  • data-raw: the forecast files as provided by various teams on their respective websites
  • The interactive visualization, which has been implemented by members of the Signale Team at RKI, is maintained in a separate repository.

Guide to submission

For new teams we recommend direct submission to the European COVID-19 Forecast Hub unless they produce regional-level forecasts; they should consult the (slightly different) instructions there.

Submission for actively contributing teams is based on pull requests. Our wiki contains a detailed guide to submission. Forecasts should be updated in a weekly rhythm. If possible, new forecasts should be uploaded on Mondays; upload until Tuesday, 3pm Berlin/Warsaw time, is acceptable. Note that we also accept additional updates on other days of the week (not more than one per day), but will not include these in visualizations or ensembles (if no new forecast was provided on a Monday we will, however, use forecasts from the preceding Sunday, Saturday or Friday).

We moreover actively collect forecasts from a number of public repositories in accordance with the respective license terms and after having contacted the respective authors.

We strongly encourage teams to visually inspect their final forecasts prior to submission. We created a Shiny app to help you in this process.

We try to provide direct support to new teams to help overcome technical difficulties; do not hesitate to get in touch.

Data format

We store point and quantile forecasts in a long format, including information on forecast dates and location, see this wiki entry for details. This format is largely identical to the one outlined for the US Hub here.
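
To make the long format concrete, here is a hypothetical snippet with the required columns; all numbers are invented purely for illustration:

```python
import csv
import io

# Hypothetical forecast rows in the hub's long format.
# Values are invented for illustration only.
EXAMPLE = """\
forecast_date,target,target_end_date,location,type,quantile,value
2020-10-05,1 wk ahead cum death,2020-10-10,GM,point,NA,9600
2020-10-05,1 wk ahead cum death,2020-10-10,GM,quantile,0.5,9600
2020-10-05,1 wk ahead cum death,2020-10-10,GM,quantile,0.975,9750
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE)))
print(len(rows), sorted(rows[0]))
```

Point forecasts carry `type == "point"` with an empty quantile, while quantile forecasts carry one row per quantile level.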

Data license and reuse

The forecasts assembled in this repository have been created by various independent teams, most of which provided a license with their forecasts. These licenses can be found in the respective subfolders of data-processed. Parts of the processing, analysis and validation code have been taken or adapted from the US COVID-19 Forecast Hub, where they were provided under an MIT license. All code contained in this repository is likewise under the MIT license. If you want to re-use materials from this repository, please get in touch with us.

Truth data

Data on observed numbers of deaths and several other quantities are compiled here and come from the following sources:

  • European Centre for Disease Prevention and Control. This used to be our preferred source for national-level counts, but ECDC switched to weekly reporting intervals on 14 Dec 2020.
  • Polish Ministry of Health. We pull these data from this Google Sheet run by Michal Rogalski. This is our preferred source for Polish voivodeship-level counts. The data are coherent with the national-level data from ECDC up to 14 Dec. To align with the ECDC time scale we have shifted them by one day, see here.
  • Robert Koch Institut. Note that these data are subject to some processing steps (see here) and are in part based on manual data extraction performed by IHME. This is our preferred source for German Bundesland level counts. The data are coherent with the national level data from ECDC up to 14 Dec.
  • Johns Hopkins University. These data are used by a number of teams generating forecasts. Currently (August 2020) the agreement with ECDC is good, but in the past there have been larger discrepancies. This is the main data source for the European COVID-19 Forecast Hub.
  • DIVI Intensivregister. These data are currently not yet used for forecasts, but we may extend our activities in this direction.

Details can be found in the respective README files in the subfolders of data-truth.

Teams generating forecasts

Currently we assemble forecasts from the following teams (truth data source used and forecast reuse license in brackets). Note that not all teams use the same ground truth data:

Forecast evaluation and ensemble building

One of the goals of this forecast hub is to combine the available forecasts into an ensemble prediction; see here for a description of the current unweighted ensemble approach. Note that we only started generating weekly ensemble forecasts on 17 August 2020; ensemble forecasts from earlier weeks have been generated retrospectively to assess performance. As the ensemble is only a simple average of other models, this should not affect the behaviour of the ensemble forecasts. The commit dates of all forecasts can be found here. Starting from 2020-09-21, our main ensemble is the median rather than the mean ensemble, as the former showed better performance in evaluations.
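
Conceptually, the median ensemble combines member forecasts quantile level by quantile level. A minimal sketch with invented numbers (not the hub's actual ensemble code):

```python
from statistics import median

# Invented member forecasts for one location/target:
# model name -> {quantile level: predicted value}.
member_forecasts = {
    "modelA": {0.25: 90.0, 0.5: 100.0, 0.75: 110.0},
    "modelB": {0.25: 95.0, 0.5: 105.0, 0.75: 120.0},
    "modelC": {0.25: 80.0, 0.5: 98.0,  0.75: 115.0},
}

def median_ensemble(forecasts):
    """Combine member forecasts quantile level by quantile level."""
    levels = sorted(next(iter(forecasts.values())))
    return {q: median(f[q] for f in forecasts.values()) for q in levels}

print(median_ensemble(member_forecasts))
# {0.25: 90.0, 0.5: 100.0, 0.75: 115.0}
```

Replacing `median` with `statistics.mean` yields the mean ensemble used before 2020-09-21.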

At a later stage we intend to generate more data-driven ensembles, which requires evaluating different forecasts, both those submitted by teams and those generated using different ensembling techniques. We want to emphasize, however, that this is not a competition, but a collaborative effort. The forecast evaluation method which will be applied is described in this preprint.

Forecast hub team

The following persons have contributed to this repository, either by assembling forecasts or by conceptual work in the background (in alphabetical order):

Related efforts

Scientific papers and preprints

Members of our group have contributed to the following papers and preprints on collaborative COVID-19 forecasting:

Acknowledgements

The Forecast Hub project is part of the SIMCARD Information & Data Science Pilot Project funded by the Helmholtz Association. We moreover wish to acknowledge the Alexander von Humboldt Foundation, whose support facilitated early interactions and collaboration with the Reich Lab and the US COVID-19 Forecast Hub.

The content of this site is solely the responsibility of the authors and does not necessarily represent the official views of KIT, HITS, the Humboldt Foundation or the Helmholtz Association.

Contributors

actions-user, arodloff, cfong32, deankarlen, dwolffram, emaerthin, frostxtj, hoddlehh, holgerman, jadeusc, jak-ket, jbracher, jhmeinke, jlittek, konsti-g, maa989, macrad, michaellli, neele-itwm, nikosbosse, rgmcasus, scc-usc, seabbs, stefanheyder, storaged


covid19-forecast-hub-de's Issues

Renaming DIVI files

DIVI-anzahl_meldebereiche.csv -> truth_DIVI-reporting_areas_Germany.csv
DIVI-anzahl_standorte.csv -> truth_DIVI-sites_Germany.csv
DIVI-betten_belegt.csv -> truth_DIVI-beds_occupied_Germany.csv
DIVI-betten_frei.csv -> truth_DIVI-beds_free_Germany.csv
DIVI-faelle_covid_aktuell.csv -> truth_DIVI-cases_covid_current_Germany.csv
DIVI-faelle_covid_aktuell_beatmet.csv -> truth_DIVI-cases_covid_current_ventilated_Germany.csv
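
The renaming above could be scripted; a sketch in Python (the target folder is an assumption, adapt as needed):

```python
import os

# Old file name -> new file name, as listed in this issue.
RENAMES = {
    "DIVI-anzahl_meldebereiche.csv": "truth_DIVI-reporting_areas_Germany.csv",
    "DIVI-anzahl_standorte.csv": "truth_DIVI-sites_Germany.csv",
    "DIVI-betten_belegt.csv": "truth_DIVI-beds_occupied_Germany.csv",
    "DIVI-betten_frei.csv": "truth_DIVI-beds_free_Germany.csv",
    "DIVI-faelle_covid_aktuell.csv": "truth_DIVI-cases_covid_current_Germany.csv",
    "DIVI-faelle_covid_aktuell_beatmet.csv": "truth_DIVI-cases_covid_current_ventilated_Germany.csv",
}

def rename_divi_files(folder):
    """Apply the renaming map to all DIVI files present in folder."""
    for old, new in RENAMES.items():
        src = os.path.join(folder, old)
        if os.path.exists(src):  # skip files that are absent
            os.rename(src, os.path.join(folder, new))
```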

Re-name DIVI truth files

  • Move the two COVID-related DIVI truth files directly into "DIVI" and rename them as follows:
    • truth_DIVI-cases_covid_current_Germany.csv -> truth_DIVI-Current ICU_Germany.csv
    • truth_DIVI-cases_covid_current_ventilated_Germany.csv -> truth_DIVI-Current Ventilated_Germany.csv
  • Move the remaining DIVI files to a subfolder called "DIVI/others"

Allow for death forecasts for Poland

Allow for files with -Poland- instead of -Germany- (only for death forecasts, not ICU). For these files allow for the following FIPS codes:
PL: Poland
PL72: Dolnośląskie Province, Poland
PL73: Kujawsko-Pomorskie Province, Poland
PL74: Łódzkie Province, Poland
PL75: Lubelskie Province, Poland
PL76: Lubuskie Province, Poland
PL77: Małopolskie Province, Poland
PL78: Mazowieckie Province, Poland
PL79: Opolskie Province, Poland
PL80: Podkarpackie Province, Poland
PL81: Podlaskie Province, Poland
PL82: Pomorskie Province, Poland
PL83: Śląskie Province, Poland
PL84: Świętokrzyskie Province, Poland
PL85: Warmińsko-Mazurskie Province, Poland
PL86: Wielkopolskie Province, Poland
PL87: Zachodniopomorskie Province, Poland

If possible, ensure that -Germany- files contain only German data and -Poland- files only Polish data.

Manual checks of data-processed against data-raw

To be sure we got the processing right, we should write small manual checks comparing some data points in the raw and processed data. These should be written by a different person than the author of the original extraction file. Please add a comment if you have written such a check for one team.

Fix plot_current_forecasts.R

Travis doesn't execute the file due to the following error (Line 26):
Error in names(cols_models) <- models :
'names' attribute [10] must be the same length as the vector [9]

Executing the file manually leads to a "broken" image

Allow for case forecasts.

Current Validation checks

Each pull request to our repository triggers an automated validation of the formatting requirements for forecast files in the data-processed folder. These are implemented in the script test-formatting.py. We use a somewhat shortened form of the procedure implemented by the US Forecast Hub; see here for their documentation. You can also run these checks locally or apply a similar set of checks in R, see here. Specifically, the following checks are performed:

Checks applied to all files

  • validates file name

    • checks that the format of the file name is <date>-<team>-<model><possibly -ICU or -case>.csv
    • checks that the <team> part of the file name is the same as the name of the containing folder
  • validates header

    • checks for required columns: location, target, type, quantile, value, forecast_date, target_end_date
  • validates csv rows at the row level:

    • checks each row has same number of columns as header

    • validates forecast_date and target_end_date are dates of format YYYY-MM-DD

    • validates "__ day ahead" or "__ week ahead" increments in target are integers

    • validates values of quantile are int/float and from these 23 values:

      [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99]
    • checks that value is an int or float

  • validates quantiles and values (i.e., at the prediction level):

    • checks that entries in value are non-decreasing as quantiles increase
    • checks that elements in quantile are unique (per target)
  • validates quantiles as a group:

    • there must be exactly one point prediction for each location/target pair
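
A rough sketch of a few of these row-level checks in Python (this is not the actual test-formatting.py; the file-name pattern below assumes team and model names contain no hyphens):

```python
import re
from datetime import datetime

# The 23 allowed quantile levels.
ALLOWED_QUANTILES = [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4,
                     0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9,
                     0.95, 0.975, 0.99]

# <date>-<team>-<model> with an optional -ICU or -case suffix.
FILENAME_RE = re.compile(r"^\d{4}-\d{2}-\d{2}-[\w.]+-[\w.]+(-ICU|-case)?\.csv$")

def valid_date(s):
    """Check that s is a date of format YYYY-MM-DD."""
    try:
        datetime.strptime(s, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def quantiles_consistent(pairs):
    """pairs: (quantile, value) tuples for one location/target.
    Quantiles must be allowed and unique; values non-decreasing."""
    qs = [q for q, _ in sorted(pairs)]
    vs = [v for _, v in sorted(pairs)]
    return (all(q in ALLOWED_QUANTILES for q in qs)
            and len(set(qs)) == len(qs)
            and all(a <= b for a, b in zip(vs, vs[1:])))
```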

Checks applied to death forecast files

The following checks are only applied to death forecast files, i.e. those without an -ICU or -case tag in the file name.

  • validates that target is one of the following:
    paste(-1:130, "day ahead inc death")
    paste(-1:130, "day ahead cum death")
    paste(-1:20,  "wk ahead inc death")
    paste(-1:20,  "wk ahead cum death")
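
The R paste() calls above enumerate the admissible target strings; the equivalent in Python:

```python
def death_targets():
    """Enumerate the valid death targets (Python equivalent of the
    R paste() calls above)."""
    targets = []
    for h in range(-1, 131):  # -1:130 day-ahead horizons
        targets.append(f"{h} day ahead inc death")
        targets.append(f"{h} day ahead cum death")
    for h in range(-1, 21):   # -1:20 week-ahead horizons
        targets.append(f"{h} wk ahead inc death")
        targets.append(f"{h} wk ahead cum death")
    return targets

print(len(death_targets()))  # 308 = 2 * 132 + 2 * 22
```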

Checks applied to case forecast files

The following checks are only applied to case forecast files, i.e. those containing the -case tag at the end of the file name.

  • validates that target is one of the following:
    paste(-1:130, "day ahead inc case")
    paste(-1:130, "day ahead cum case")
    paste(-1:20, "wk ahead inc case")
    paste(-1:20, "wk ahead cum case")

Checks applied to ICU forecast files

The following checks are only applied to ICU forecast files, i.e. those containing the -ICU tag at the end of the file name.

  • validates that target is one of the following:
    paste(-1:130, "day ahead curr ICU")
    paste(-1:130, "day ahead curr ventilated")
    paste(-1:20, "wk ahead curr ICU")
    paste(-1:20, "wk ahead curr ventilated")

Checks of location variable

The allowed entries for the variable location depend on the country indicated in the file name. For files containing -Germany- the following locations are allowed:
["GM", "GM01", "GM02", "GM03", "GM04", "GM05", "GM06", "GM07", "GM08", "GM09", "GM10", "GM11", "GM12", "GM13", "GM14", "GM15", "GM16"]
The FIPS code - "Bundesland" mapping can be found here and on Wikipedia.

For files containing -Poland- the following locations are allowed:
["PL", "PL72", "PL73", "PL74", "PL75", "PL76", "PL77", "PL78", "PL79", "PL80", "PL81", "PL82", "PL83", "PL84", "PL85", "PL86", "PL87"]
The FIPS code - "Voivodeship" mapping can be found here and on Wikipedia.
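
A sketch of this check in Python, generating the two code lists programmatically (a sketch, not the hub's validation code):

```python
# Allowed FIPS codes per country, as listed above.
GERMANY = ["GM"] + [f"GM{i:02d}" for i in range(1, 17)]   # GM, GM01..GM16
POLAND = ["PL"] + [f"PL{i}" for i in range(72, 88)]       # PL, PL72..PL87

def allowed_locations(filename):
    """Pick the allowed location codes from the country in the file name."""
    if "-Germany" in filename:
        return GERMANY
    if "-Poland" in filename:
        return POLAND
    raise ValueError("file name contains neither -Germany nor -Poland")
```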

Metadata checks

  • validates metadata (In progress)
    • proper yaml format
    • includes: team_name, team_abbr, model_name, model_abbr, methods
    • methods is under 200 characters
    • forecast_startdate is a date
    • this_model_is_an_ensemble and this_model_is_unconditional are boolean
    • model_name needs to be distinct from any already existing model_name
    • model_abbr needs to be distinct from any already existing model_abbr
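
A sketch of these metadata checks in Python, operating on an already-parsed metadata mapping (not the hub's validation code; the function name is hypothetical):

```python
REQUIRED_KEYS = {"team_name", "team_abbr", "model_name", "model_abbr", "methods"}

def check_metadata(meta, existing_names=(), existing_abbrs=()):
    """Validate an already-parsed metadata mapping against the rules above.
    Returns a list of problems; an empty list means the metadata is valid."""
    problems = []
    missing = REQUIRED_KEYS - meta.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if len(str(meta.get("methods", ""))) >= 200:
        problems.append("methods must be under 200 characters")
    for key in ("this_model_is_an_ensemble", "this_model_is_unconditional"):
        if key in meta and not isinstance(meta[key], bool):
            problems.append(f"{key} must be boolean")
    if meta.get("model_name") in existing_names:
        problems.append("model_name already taken")
    if meta.get("model_abbr") in existing_abbrs:
        problems.append("model_abbr already taken")
    return problems
```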

Shiny App: Old Data is not plotted for Imperial and MIT

For Imperial-Ensemble1, Imperial-Ensemble2, and MIT-CovidAnalytics-DELPHI, only the one-week ahead forecast is plotted, even when "Show past values assumed by models where available" is ticked.
forecasts_to_plot.csv contains all relevant data, so this should be a code issue.

Add source of truth data (RKI, JHU) to file names to avoid confusion

Re-name to truth_RKI-Cumulative Deaths_Germany.csv and truth_JHU-Cumulative Deaths_Germany.csv and adapt the codes that generate the files. Keep truth files in the existing subfolders corresponding to the different sources (this is done in a similar way in the US, e.g. truth_nytimes-Incident Cases.csv)

Automatic comparison of ECDC and RKI data

Add a Python script to check that the national-level numbers (location == "GM") from the ECDC and RKI data (for inc death, cum death, inc case, cum case) coincide for the last 7 days. If this is not the case, issue a warning in a way that we will actually notice, possibly by sending an email?

There is already one difference between the two data sources which I think is due to a reporting problem in the ECDC data, so we cannot compare the entire data set.
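
A sketch of such a comparison, assuming both sources have been reduced to date-to-count mappings for location "GM" (the numbers below are invented):

```python
def compare_last_days(ecdc, rki, n=7, tolerance=0):
    """Compare the last n shared dates of two {date: count} series and
    return the dates on which they disagree. A sketch; the real script
    would read the GM rows from the two truth CSVs."""
    shared = sorted(set(ecdc) & set(rki))[-n:]
    return [d for d in shared if abs(ecdc[d] - rki[d]) > tolerance]

# Invented example counts for three days.
ecdc = {"2020-08-01": 9150, "2020-08-02": 9154, "2020-08-03": 9161}
rki  = {"2020-08-01": 9150, "2020-08-02": 9155, "2020-08-03": 9161}
print(compare_last_days(ecdc, rki))  # ['2020-08-02']
```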

Adapt check to detect "week" instead of "wk"

If teams write e.g. "1 week ahead cum death" instead of "1 wk ahead cum death", the error from test_formatting.py reads:
"non-integer number of weeks ahead in 'wk ahead' target: '3 week ahead cum death'. row=['2020-06-19', '3 week ahead cum death', '2020-07-18', 'GM', 'Germany', 'quantile', '0.45', '9308.4111882054']"
Can we adapt this so that the message says the problem is the use of "week" instead of "wk"?
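
A sketch of a pre-check producing the clearer message suggested here (hypothetical function, not part of test_formatting.py):

```python
import re

def check_target_unit(target):
    """Return a clearer message when a team writes 'week' instead of 'wk';
    return None if the target uses the correct unit."""
    if re.search(r"\d+\s+week ahead", target):
        return f"please use 'wk' instead of 'week' in target: '{target}'"
    return None

print(check_target_unit("3 week ahead cum death"))
```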

Adapt checks to allow for ICU forecasts

Current Validation checks

Each pull request to our repository triggers an automated validation of the formatting requirements for forecast files in the data-processed folder. These are implemented in the script test-formatting.py. We use a somewhat shortened form of the procedure implemented by the US Forecast Hub; see here for their documentation. You can also run these checks locally or apply a similar set of checks in R, see here. Specifically, the following checks are performed:

Checks applied to all files

  • validates file name

    • checks that the format of the file name is <date>-<team>-<model><possibly -ICU>.csv
    • checks that the <team> part of the file name is the same as the name of the containing folder
  • validates header

    • checks for required columns: location, target, type, quantile, value, forecast_date, target_end_date
  • validates csv rows at the row level:

    • checks each row has same number of columns as header

    • requires location to be one of the following FIPS codes:

      ["GM", "GM01", "GM02", "GM03", "GM04", "GM05", "GM06", "GM07", "GM08", "GM09", "GM10", "GM11", "GM12", "GM13", "GM14", 
      "GM15", "GM16"]

      The FIPS code - "Bundesland" mapping can be found here.

    • validates forecast_date and target_end_date are dates of format YYYY-MM-DD

    • validates "__ day ahead" or "__ week ahead" increments in target are integers

    • validates values of quantile are int/float and from these 23 values:

      [0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99]
    • checks that value is an int or float

  • validates quantiles and values (i.e., at the prediction level):

    • checks that entries in value are non-decreasing as quantiles increase
    • checks that elements in quantile are unique (per target)
  • validates quantiles as a group:

    • there must be exactly one point prediction for each location/target pair

Checks applied to death forecast files

The following checks are only applied to death forecast files, i.e. without the -ICU tag in the file name.

  • validates that target is one of the following:
    paste(-1:130, "day ahead inc death")
    paste(-1:130, "day ahead cum death")
    paste(-1:20,  "wk ahead inc death")
    paste(-1:20,  "wk ahead cum death")

Checks applied to ICU forecast files

The following checks are only applied to ICU forecast files, i.e. those containing the -ICU tag at the end of the file name.

  • validates that target is one of the following:
    paste(-1:130, "day ahead curr ICU")
    paste(-1:130, "day ahead curr ventilated")
    paste(-1:20, "wk ahead curr ICU")
    paste(-1:20, "wk ahead curr ventilated")

Metadata checks

  • validates metadata (In progress)
    • proper yaml format
    • includes: team_name, team_abbr, model_name, model_abbr, methods
    • methods is under 200 characters
    • forecast_startdate is a date
    • this_model_is_an_ensemble and this_model_is_unconditional are boolean
    • model_name needs to be distinct from any already existing model_name
    • model_abbr needs to be distinct from any already existing model_abbr
