This project forked from open-covid-19/data

Daily time-series data for all countries & state/province data for 30+ countries

Home Page: https://open-covid-19.github.io/explorer

Open COVID-19 Dataset

This repository contains datasets of daily time-series data related to COVID-19, including state/province data for over 30 countries.

Explore the data

A simple visualization tool was built to explore the Open COVID-19 datasets: the Open COVID-19 Explorer. Other projects built on the data:

- If you want to see interactive charts with a unique UX, don't miss what @Mahks built using the Open COVID-19 dataset.
- You can also check out the great work of @quixote79, a MapBox-powered interactive map site.
- Experience clean, clear graphs with smooth animations thanks to the work of @jmullo.
- Become an armchair epidemiologist with the COVID-19 timeline simulation tool built by @LeviticusMB.
- Whether you want an interactive map, compare stats or look at charts, @saadmas has you covered with a COVID-19 Daily Tracking site.
- Compare per-million data at Omnimodel thanks to @OmarJay1.

If you are using this data, feel free to open an issue and let us know so we can give you a call-out here.

Use the data

The data is available as CSV and JSON files, which are published on GitHub Pages so they can be served directly to JavaScript applications without needing a proxy to set the correct CORS and content-type headers. data.csv has a version with all historical data, and another version with only the latest daily data. All other datasets have either historical or latest data, but not both. The datasets available from this project are:

| Dataset | CSV URL | JSON URL |
| --- | --- | --- |
| Data | Latest, Historical | Latest, Historical |
| Metadata | Latest | Latest |
| Minimal | Historical | Historical |
| Weather | Historical | Historical |
| Mobility | Historical | Historical |
| Response | Historical | Historical |
| Forecast | Latest | Latest |
| Categories | Historical | Historical |

You should use the files linked above instead of fetching anything in the output subfolder through the raw GitHub server, since the files under the output subfolder are subject to change in incompatible ways with no prior notice.

You can find several examples in the examples subfolder with code showcasing how to load and analyze the data for several programming environments. If you want the short version, here are a few snippets to get started.

Google Colab

You can use Google Colab if you want to run your analysis without having to install anything on your computer. Simply go to this URL: https://colab.research.google.com/github/open-covid-19/data.

R

If you prefer R, then this is all you need to do to load the historical data:

data <- read.csv("https://open-covid-19.github.io/data/data.csv")

Python

In Python, you need to have the package pandas installed to get started:

import pandas
data = pandas.read_csv("https://open-covid-19.github.io/data/data.csv")

jQuery

Loading the JSON file using jQuery can be done directly from the published URL. This code snippet loads all historical data into the data variable:

$.getJSON("https://open-covid-19.github.io/data/data.json", data => { ... });

PowerShell

You can also use PowerShell to get the latest data for a country directly from the command line. For example, to query the latest data for Australia:

Invoke-WebRequest 'https://open-covid-19.github.io/data/data_latest.csv' | ConvertFrom-Csv | `
    where Key -eq 'AU' | select Date,CountryName,Confirmed,Deaths
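
The same lookup can be sketched in Python using only the standard library. This is an illustration under stated assumptions: the helper name select_rows is hypothetical, the sample rows follow the documented column names but the Australian numbers are placeholders rather than real figures, and the commented-out download reuses the data_latest.csv URL from the PowerShell example.

```python
import csv
import io


def select_rows(csv_text, key, columns):
    """Return the requested columns for every row whose Key matches."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{c: row[c] for c in columns} for row in reader if row["Key"] == key]


# Placeholder sample in the documented column layout (AU numbers are not real data).
SAMPLE = (
    "Date,Key,CountryCode,CountryName,RegionCode,RegionName,Confirmed,Deaths\n"
    "2020-04-01,AU,AU,Australia,,,100,1\n"
    "2020-04-01,CN_HB,CN,China,HB,Hubei,67800,3139\n"
)

rows = select_rows(SAMPLE, "AU", ["Date", "CountryName", "Confirmed", "Deaths"])
print(rows)

# To run against the published file instead of the sample:
# from urllib.request import urlopen
# url = "https://open-covid-19.github.io/data/data_latest.csv"
# csv_text = urlopen(url).read().decode("utf-8")
# print(select_rows(csv_text, "AU", ["Date", "CountryName", "Confirmed", "Deaths"]))
```

Values come back as strings because the csv module does not infer types; convert them explicitly if you need numbers.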

Understand the data

Data

Make sure that you are using the URL linked in the table above and not the raw GitHub file, since the latter is subject to change at any moment. The columns of data.csv are:

| Name | Description | Example |
| --- | --- | --- |
| Date* | ISO 8601 date (YYYY-MM-DD) of the datapoint | 2020-03-21 |
| Key | CountryCode if country-level data, otherwise ${CountryCode}_${RegionCode} | CN_HB |
| CountryCode | ISO 3166-1 code of the country | CN |
| CountryName | American English name of the country, subject to change | China |
| RegionCode | (Optional) ISO 3166-2 or NUTS 2 code of the region | HB |
| RegionName | (Optional) American English name of the region, subject to change | Hubei |
| Confirmed** | Total number of cases confirmed after positive test | 67800 |
| Deaths** | Total number of deaths from a positive COVID-19 case | 3139 |
| Latitude | Floating point representing the geographic coordinate | 30.9756 |
| Longitude | Floating point representing the geographic coordinate | 112.2707 |
| Population | Total count of humans living in the region | 58500000 |

* Date used is reporting date, which generally lags a day from the actual date and is subject to timezone adjustments. Whenever possible, dates consistent with the ECDC daily reports are used.

** Missing values will be represented as nulls, whereas zeroes are used when a true value of zero is reported. For example, US states where deaths are not being reported have null values.

The CountryName and RegionName values are subject to change. You may use them for labels in your application, but you should not assume that they will remain the same in future updates. Instead, use CountryCode and RegionCode to perform joins with other data sources or for filtering within your application.
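
As a minimal sketch of working with these columns, the helpers below split a Key into its country and region codes and keep missing counts distinct from true zeroes. Both function names are hypothetical, and the code assumes that a null is serialized as an empty cell in the CSV files, which the text above does not state explicitly.

```python
def split_key(key):
    """Split a Key such as 'CN_HB' into (country_code, region_code).

    Country-level keys such as 'CN' yield a region_code of None.
    """
    country, _, region = key.partition("_")
    return country, (region or None)


def parse_count(value):
    """Convert a CSV cell to int, keeping missing values distinct from zero.

    Assumption: a null is serialized as an empty cell in the CSV files.
    """
    if value is None or value.strip() == "":
        return None
    return int(float(value))


print(split_key("CN_HB"))   # region-level key
print(split_key("CN"))      # country-level key
print(parse_count(""))      # missing value stays None
print(parse_count("0"))     # a reported zero stays 0
```

Filtering and joining on the codes returned by split_key, rather than on CountryName or RegionName, keeps an application stable when the display names change.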

Metadata

Non-temporal data related to countries and regions. The columns of metadata.csv are:

| Name | Description | Example |
| --- | --- | --- |
| Key | CountryCode if country-level data, otherwise ${CountryCode}_${RegionCode} | US_CA |
| CountryCode | ISO 3166-1 code of the country | CN |
| CountryName | American English name of the country, subject to change | China |
| RegionCode | (Optional) ISO 3166-2 or NUTS 2 code of the region | HB |
| RegionName | (Optional) American English name of the region, subject to change | Hubei |
| Latitude | Floating point representing the geographic coordinate | 30.9756 |
| Longitude | Floating point representing the geographic coordinate | 112.2707 |
| Population | Total count of humans living in the region | 58500000 |

Minimal

There is a data_minimal.csv with a subset of the columns from data.csv but otherwise identical information:

| Name | Description | Example |
| --- | --- | --- |
| Date* | ISO 8601 date (YYYY-MM-DD) of the datapoint | 2020-03-30 |
| Key | CountryCode if country-level data, otherwise ${CountryCode}_${RegionCode} | US_CA |
| Confirmed** | Total number of cases confirmed after positive test | 6447 |
| Deaths** | Total number of deaths from a positive COVID-19 case | 133 |

* Date used is adjusted reporting date. ECDC reporting date generally lags a day from the actual date. Time zone is used to adjust the date such that it matches local reports.

** Missing values will be represented as nulls, whereas zeroes are used when a true value of zero is reported. For example, US states where deaths are not being reported have null values.
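
Because data_minimal.csv omits Population, per-capita figures require a join with metadata.csv on Key. The sketch below shows one way to do that with the standard library. The function name cases_per_million is hypothetical, the metadata sample is cut down to the two columns the join needs, and the sample values reuse the Hubei examples from the column tables.

```python
import csv
import io


def cases_per_million(minimal_csv, metadata_csv):
    """Map (Date, Key) to confirmed cases per million inhabitants."""
    population = {}
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        if row["Population"]:
            population[row["Key"]] = float(row["Population"])

    result = {}
    for row in csv.DictReader(io.StringIO(minimal_csv)):
        key = row["Key"]
        # Skip rows with a missing count or no known population.
        if row["Confirmed"] == "" or key not in population:
            continue
        result[(row["Date"], key)] = float(row["Confirmed"]) / population[key] * 1e6
    return result


# Sample values reuse the Hubei examples from the column tables.
METADATA = "Key,Population\nCN_HB,58500000\n"
MINIMAL = "Date,Key,Confirmed,Deaths\n2020-03-21,CN_HB,67800,3139\n"

rates = cases_per_million(MINIMAL, METADATA)
print(round(rates[("2020-03-21", "CN_HB")], 1))
```

Rows with a missing Confirmed value are skipped rather than treated as zero, in line with the note on nulls above.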

Weather

Daily weather information from the nearest station reported by NOAA. The columns of weather.csv are:

| Name | Description | Example |
| --- | --- | --- |
| Key | CountryCode if country-level data, otherwise ${CountryCode}_${RegionCode} | US_MI |
| Date | ISO 8601 date (YYYY-MM-DD) of the datapoint | 2020-03-30 |
| Station | Identifier for the weather station | USC00206080 |
| Distance | [kilometers] Distance between the location coordinates and the weather station | 28.693 |
| MinimumTemperature* | [celsius] Recorded hourly minimum temperature | 1.7 |
| MaximumTemperature* | [celsius] Recorded hourly maximum temperature | 19.4 |
| Rainfall* | [millimeters] Rainfall during the entire day | 51.0 |
| Snowfall* | [millimeters] Snowfall during the entire day | 0.0 |

* Missing values will be represented as nulls, whereas zeroes are used when a true value of zero is reported.

Mobility

Google's Mobility Reports are presented in CSV form as mobility.csv with the following columns:

| Name | Description | Example |
| --- | --- | --- |
| Date | ISO 8601 date (YYYY-MM-DD) of the datapoint | 2020-03-25 |
| Key | CountryCode if country-level data, otherwise ${CountryCode}_${RegionCode} | US_CA |
| TransitStations | Percentage change in visits to transit station locations | -15 |
| RetailAndRecreation | Percentage change in visits to retail and recreation locations | -15 |
| GroceryAndPharmacy | Percentage change in visits to grocery and pharmacy locations | -15 |
| Parks | Percentage change in visits to park locations | -15 |
| Residential | Percentage change in visits to residential locations | -15 |
| Workplaces | Percentage change in visits to workplace locations | -15 |

Response

Summary of a government's response, including a stringency index, collected from the University of Oxford:

| Name | Description | Example |
| --- | --- | --- |
| Date | ISO 8601 date (YYYY-MM-DD) of the datapoint | 2020-03-25 |
| Key | CountryCode if country-level data, otherwise ${CountryCode}_${RegionCode} | US_CA |
| SchoolClosing | [0-3] Schools are closed | 2 |
| WorkplaceClosing | [0-3] Workplaces are closed | 2 |
| CancelPublicEvents | [0-3] Public events have been cancelled | 2 |
| RestrictionsOnGatherings | [0-3] Gatherings of non-household members are restricted | 2 |
| PublicTransportClosing | [0-3] Public transport is not operational | 0 |
| StayAtHomeRequirements | [0-3] Self-quarantine at home is mandated for everyone | 0 |
| RestrictionsOnInternalMovement | [0-3] Travel within country is restricted | 1 |
| InternationalTravelControls | [0-3] International travel is restricted | 3 |
| IncomeSupport | [USD] Value of fiscal stimuli, including spending or tax cuts | 20449287023 |
| DebtRelief | [0-3] Debt/contract relief for households | 0 |
| FiscalMeasures | [USD] Value of fiscal stimuli, including spending or tax cuts | 20449287023 |
| InternationalSupport | [USD] Giving international support to other countries | -0.75 |
| PublicInformationCampaigns | [0-2] Government has launched public information campaigns | 1 |
| TestingPolicy | [0-3] Country-wide COVID-19 testing policy | 1 |
| ContactTracing | [0-2] Country-wide contact tracing policy | 1 |
| EmergencyInvestmentInHealthCare | [USD] Emergency funding allocated to healthcare | 500000 |
| InvestmentInVaccines | [USD] Emergency funding allocated to vaccine research | 100000 |
| StringencyIndex | [0-100] Overall stringency index | 71.43 |

For more information about each field and how the overall stringency index is computed, see the Oxford COVID-19 government response tracker.

Note: Keys which correspond to a region-level datapoint always have the same value as the country-level datapoint, since the tracked government measures are at the country level.

Forecasting

There is also a short-term forecast dataset available in the output folder as data_forecast.csv, which has the following columns:

| Name | Description | Example |
| --- | --- | --- |
| ForecastDate | ISO 8601 date (YYYY-MM-DD) of last known datapoint | 2020-03-21 |
| Date* | ISO 8601 date (YYYY-MM-DD) of the datapoint | 2020-03-25 |
| Key | CountryCode if country-level data, otherwise ${CountryCode}_${RegionCode} | US_CA |
| Estimated** | Total number of cases estimated from forecasting model | 66804.567 |
| Confirmed | Total number of cases confirmed after positive test | 67800 |

* Date used is adjusted reporting date. ECDC reporting date generally lags a day from the actual date. Time zone is used to adjust the date such that it matches local reports.

** An estimate is also provided for dates before the forecast date, which corresponds to the output of the fitted model; this is the a priori estimate. True forecast values are those with a Date later than ForecastDate; these are the a posteriori estimates. Another way to distinguish between the two is to check whether a given row has a value for both Confirmed and Estimated (a priori) or a null Confirmed value (a posteriori).
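
The two ways of telling fitted rows from true forecast rows can be sketched as follows. The function names are hypothetical, the sample rows contain placeholder values rather than real forecasts, and the second check assumes that a null Confirmed value arrives as an empty string or None after parsing.

```python
def is_forecast_by_date(row):
    """Return True when the row is a true forecast (Date after ForecastDate).

    ISO 8601 dates in YYYY-MM-DD form sort chronologically as plain strings,
    so a string comparison is sufficient here.
    """
    return row["Date"] > row["ForecastDate"]


def is_forecast_by_null(row):
    """Alternative check: true forecast rows have no Confirmed value.

    Assumption: a null arrives as an empty string or None after parsing.
    """
    return row.get("Confirmed") in ("", None)


# Placeholder rows in the documented column layout (values are illustrative).
fitted = {"ForecastDate": "2020-03-21", "Date": "2020-03-20",
          "Key": "US_CA", "Estimated": "990.5", "Confirmed": "1000"}
forecast = {"ForecastDate": "2020-03-21", "Date": "2020-03-25",
            "Key": "US_CA", "Estimated": "1500.2", "Confirmed": ""}

for row in (fitted, forecast):
    print(row["Date"], is_forecast_by_date(row), is_forecast_by_null(row))
```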

Active cases and categories

Another dataset available is data_categories.csv, which has the following columns:

| Name | Description | Example |
| --- | --- | --- |
| Date* | ISO 8601 date (YYYY-MM-DD) of the datapoint | 2020-03-27 |
| Key | CountryCode if country-level data, otherwise ${CountryCode}_${RegionCode} | US_CA |
| NewCases | Number of reported new cases from previous day | 186 |
| NewDeaths | Number of reported new deaths from previous day | 0 |
| NewMild** | Number of estimated new mild cases from previous day | 148 |
| NewSevere** | Number of estimated new severe cases from previous day | 27 |
| NewCritical** | Number of estimated new critical cases from previous day | 9 |
| CurrentlyMild** | Number of estimated mild active cases at this date | 819 |
| CurrentlySevere** | Number of estimated severe active cases at this date | 190 |
| CurrentlyCritical** | Number of estimated critical active cases at this date | 66 |

* Date used is adjusted reporting date. ECDC reporting date generally lags a day from the actual date. Time zone is used to adjust the date such that it matches local reports.

** See the category estimation notebook for a more thorough explanation of what each category represents and how the estimation is done.
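
One possible use of this dataset is to approximate the number of active cases by adding the three Currently* columns. This is only a sketch: it assumes the mild, severe and critical categories are mutually exclusive and together cover all active cases, which should be confirmed against the category estimation notebook. The function name estimated_active is hypothetical, and the sample row reuses the example values from the table above.

```python
def estimated_active(row):
    """Sum the three Currently* estimates, or return None if any is missing."""
    fields = ("CurrentlyMild", "CurrentlySevere", "CurrentlyCritical")
    values = [row.get(name) for name in fields]
    if any(v in ("", None) for v in values):
        return None  # keep missing data distinct from a true zero
    return sum(int(float(v)) for v in values)


# Example values taken from the column table.
row = {"CurrentlyMild": "819", "CurrentlySevere": "190", "CurrentlyCritical": "66"}
print(estimated_active(row))  # 1075
```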

Notes about the data

For countries where both country-level and region-level data are available, the entry which has a null value for the RegionCode and RegionName columns indicates country-level aggregation. Please note that the country-level data and the region-level data sometimes come from different sources, so the sum of all region-level values may not exactly equal the reported country-level value. See the data loading tutorial for more information.

FR: Region-level confirmed cases for France only include positive results of tests being sent to a subset of all laboratories; therefore, the sum of all confirmed cases across regions is significantly lower than the country totals.

PT: Regions reported by Portugal are broken down at the NUTS-2 level, not the usual ISO 3166-2 code reported by most other countries.
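
To see how far regional totals drift from the country-level figure, the difference can be computed per country and date. The sketch below assumes rows have already been parsed into dictionaries keyed by the data.csv column names and that nulls appear as empty strings or None. The function name region_gap and the "XX" country with its numbers are placeholders, not real data.

```python
def region_gap(rows, country_code, date, field="Confirmed"):
    """Return the country-level value minus the sum of region-level values.

    Returns None when the country-level row or all regional values are missing.
    """
    country_value = None
    region_total = 0
    seen_region = False
    for row in rows:
        if row["CountryCode"] != country_code or row["Date"] != date:
            continue
        if row[field] in ("", None):
            continue  # missing value, not a reported zero
        value = int(float(row[field]))
        if row["RegionCode"] in ("", None):
            country_value = value  # null RegionCode marks country-level data
        else:
            region_total += value
            seen_region = True
    if country_value is None or not seen_region:
        return None
    return country_value - region_total


# Placeholder rows for a fictional country "XX" (numbers are illustrative).
ROWS = [
    {"Date": "2020-04-01", "CountryCode": "XX", "RegionCode": "", "Confirmed": "100"},
    {"Date": "2020-04-01", "CountryCode": "XX", "RegionCode": "A", "Confirmed": "40"},
    {"Date": "2020-04-01", "CountryCode": "XX", "RegionCode": "B", "Confirmed": "50"},
    {"Date": "2020-04-01", "CountryCode": "XX", "RegionCode": "C", "Confirmed": ""},
]

print(region_gap(ROWS, "XX", "2020-04-01"))
```

A large positive gap, as described for France above, suggests the regional source covers only part of the reported cases.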

Backwards compatibility

Please note that the following datasets are maintained only to preserve backwards compatibility, but shouldn't be used in any new projects:

Contribute

The data from this repository has become increasingly reliant on Wikipedia sources. If you spot an error in the data, or there's a country you would like to include, the best way to contribute to this project is by helping maintain the data on the relevant Wikipedia article. Not only can that data be parsed automatically by this project, but it will also help inform millions of others that receive their information from Wikipedia. See the section below for a direct link to what Wikipedia data is being parsed by this project.

Sources of data

All data in this repository is retrieved automatically. When possible, data is retrieved directly from the relevant authorities, like a country's ministry of health.

| Data | Source |
| --- | --- |
| Metadata | Wikipedia |
| Weather | NOAA |
| Mobility data | https://github.com/pastelsky/covid-19-mobility-tracker |
| Government response data | Oxford COVID-19 government response tracker |
| Country-level data | Daily reports from the ECDC portal |
| Argentina | Wikipedia |
| Australia | https://covid-19-au.github.io |
| Bolivia | Wikipedia |
| Brazil | https://github.com/elhenrico/covid19-Brazil-timeseries |
| Canada | Department of Health Canada |
| Chile | Wikipedia |
| China | DXY COVID-19 dataset |
| Colombia | Colombia's Ministry of Health |
| France | https://github.com/cedricguadalupe/FRANCE-COVID-19 |
| Germany | https://github.com/jgehrcke/covid-19-germany-gae |
| India | Wikipedia |
| Indonesia | https://catchmeup.id/covid-19 |
| Italy | Italy's Department of Civil Protection |
| Japan | https://github.com/swsoyee/2019-ncov-japan |
| Malaysia | Wikipedia |
| Mexico | https://github.com/carranco-sga/Mexico-COVID-19 |
| Norway | COVID19 EU Data |
| Pakistan | Wikipedia |
| Peru | Wikipedia |
| Poland | COVID19 EU Data |
| Portugal | https://github.com/dssg-pt/covid19pt-data |
| Russia | Wikipedia |
| South Korea | Wikipedia |
| Spain | Datadista COVID-19 dataset |
| Sweden | COVID19 EU Data |
| Switzerland | OpenZH data |
| United Kingdom | https://github.com/tomwhite/covid-19-uk-data |
| USA | COVID Tracking Project |

The data is automatically scraped and parsed using the scripts found in the input folder. This is done daily, and as part of the processing some additional columns are added, like region-level coordinates.

Before updating the outputs, data is spot-checked using various data sources including data from local authorities like Italy's ministry of health and the reports from WHO.

Why another dataset?

This dataset is heavily inspired by the dataset maintained by Johns Hopkins University. Unfortunately, that dataset has intermittently experienced maintenance issues and a lot of applications depend on this critical data being available in a timely manner. Further, the true sources of data for that dataset are still unclear.

Update the data

To update the contents of the output folder, first install the dependencies:

# Install Ghostscript
apt-get install -y ghostscript
# Install Python dependencies
pip install -r requirements.txt

Then run the following script to update all datasets:

sh input/update_data.sh
