The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data is pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.
More details are available here, and a csv format of the package dataset is available here.
As this is an ongoing situation, frequent changes in the data format may occur. Please visit the package news to get updates about those changes.
Install the CRAN version:
install.packages("coronavirus")
Install the GitHub version (refreshed on a daily basis):
# install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")
While the coronavirus CRAN version is updated every month or two, the GitHub (Dev) version is updated on a daily basis. The update_dataset function closes this gap by refreshing the installed version with the most recent data available on the GitHub version:
library(coronavirus)
update_dataset()
Note: you must restart the R session for the updates to become available.
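To judge whether a refresh is worthwhile, one option is to check how old the newest record in the installed dataset is. The sketch below is an illustration only: `data_age_days` is a hypothetical helper, not part of the package, and it runs on a small made-up data frame that shares only the `date` and `cases` columns with the real dataset.

```r
# Hypothetical helper (not part of the coronavirus package): number of days
# between the newest record in `df` and a reference date.
data_age_days <- function(df, today = Sys.Date()) {
  as.numeric(difftime(as.Date(today), max(as.Date(df$date)), units = "days"))
}

# Made-up data with the same `date` column as the coronavirus dataset
toy <- data.frame(
  date  = as.Date(c("2021-05-23", "2021-05-24", "2021-05-25")),
  cases = c(10, 12, 9)
)

# Newest record is 2021-05-25, so on 2021-05-28 the data is 3 days old
age <- data_age_days(toy, today = as.Date("2021-05-28"))
age
#> [1] 3

# With the package installed, the same check could guard the update call:
# library(coronavirus); data("coronavirus")
# if (data_age_days(coronavirus) > 1) update_dataset()
```

The one-day threshold in the commented lines is an arbitrary choice for illustration.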
Alternatively, you can pull the data using the Covid19R project data standard format with the refresh_coronavirus_jhu function:
covid19_df <- refresh_coronavirus_jhu()
head(covid19_df)
#> date location location_type location_code location_code_type data_type value lat long
#> 1 2021-05-03 Afghanistan country AF iso_3166_2 deaths_new 5 33.93911 67.709953
#> 2 2020-08-13 Afghanistan country AF iso_3166_2 cases_new 86 33.93911 67.709953
#> 3 2020-10-24 Afghanistan country AF iso_3166_2 recovered_new 13 33.93911 67.709953
#> 4 2021-03-14 Afghanistan country AF iso_3166_2 deaths_new 3 33.93911 67.709953
#> 5 2020-02-28 Afghanistan country AF iso_3166_2 cases_new 0 33.93911 67.709953
#> 6 2020-07-19 Afghanistan country AF iso_3166_2 deaths_new 17 33.93911 67.709953
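The Covid19R format is long: each row holds one `value` for one `data_type`. When a one-row-per-date layout is easier to work with, the table can be reshaped to wide. The sketch below uses base R's `reshape` on a made-up data frame that mimics a few of the columns shown above; the numbers are invented, and in practice `covid19_df` would take the place of `toy`.

```r
# Made-up rows mimicking the Covid19R long format (values are invented)
toy <- data.frame(
  date      = as.Date(c("2021-05-01", "2021-05-01", "2021-05-02", "2021-05-02")),
  location  = "Afghanistan",
  data_type = c("cases_new", "deaths_new", "cases_new", "deaths_new"),
  value     = c(86, 5, 90, 3)
)

# One row per date/location, one column per data_type
wide <- reshape(toy,
                idvar     = c("date", "location"),
                timevar   = "data_type",
                direction = "wide")

# reshape() names the new columns value.<data_type>
names(wide)
#> [1] "date"             "location"         "value.cases_new"  "value.deaths_new"
```

The same result could be obtained with tidyr's pivot_wider, which the later examples in this document use.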
A supporting dashboard is available here
data("coronavirus")
This coronavirus dataset has the following fields:
- date - The date of the summary
- province - The province or state, when applicable
- country - The country or region name
- lat - Latitude point
- long - Longitude point
- type - The type of case (e.g., confirmed, death, recovered)
- cases - The number of daily cases (corresponding to the case type)
head(coronavirus)
#> date province country lat long type cases
#> 1 2020-01-22 Afghanistan 33.93911 67.709953 confirmed 0
#> 2 2020-01-22 Albania 41.15330 20.168300 confirmed 0
#> 3 2020-01-22 Algeria 28.03390 1.659600 confirmed 0
#> 4 2020-01-22 Andorra 42.50630 1.521800 confirmed 0
#> 5 2020-01-22 Angola -11.20270 17.873900 confirmed 0
#> 6 2020-01-22 Antigua and Barbuda 17.06080 -61.796400 confirmed 0
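Because the cases field holds daily counts rather than running totals, cumulative figures have to be built with a cumulative sum within each country and type. The following is a minimal sketch in base R on made-up data with the same column names; "Exampleland" and all numbers are invented for illustration.

```r
# Made-up daily counts in the same long layout as the coronavirus dataset
toy <- data.frame(
  date    = rep(as.Date("2020-01-22") + 0:2, times = 2),
  country = "Exampleland",
  type    = rep(c("confirmed", "death"), each = 3),
  cases   = c(2, 3, 5, 0, 1, 1)
)

# Sort by date within each type so the running total follows time order
toy <- toy[order(toy$type, toy$date), ]

# Running total of daily cases within each country/type group
toy$cumulative <- ave(toy$cases, toy$country, toy$type, FUN = cumsum)

toy$cumulative
#> [1]  2  5 10  0  1  2
```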
Summary of the total confirmed cases by country (top 20):
library(dplyr)
summary_df <- coronavirus %>%
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)
summary_df %>% head(20)
#> # A tibble: 20 x 2
#> country total_cases
#> <chr> <dbl>
#> 1 US 33166418
#> 2 India 27157795
#> 3 Brazil 16194209
#> 4 France 5670486
#> 5 Turkey 5203385
#> 6 Russia 4960174
#> 7 United Kingdom 4483177
#> 8 Italy 4197892
#> 9 Germany 3662568
#> 10 Spain 3652879
#> 11 Argentina 3586736
#> 12 Colombia 3270614
#> 13 Poland 2867187
#> 14 Iran 2855396
#> 15 Mexico 2399790
#> 16 Ukraine 2244084
#> 17 Peru 1932255
#> 18 Indonesia 1786187
#> 19 Czechia 1658778
#> 20 Netherlands 1658587
Summary of new cases during the past 24 hours by country and type (as of 2021-05-25):
library(tidyr)
coronavirus %>%
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)
#> # A tibble: 192 x 4
#> # Groups: country [192]
#> country confirmed death recovered
#> <chr> <dbl> <dbl> <dbl>
#> 1 India 208921 4157 295955
#> 2 Brazil 73453 2173 41347
#> 3 Argentina 24601 576 24477
#> 4 US 22756 621 0
#> 5 Colombia 21181 459 17183
#> 6 Iran 11873 208 14676
#> 7 Turkey 9375 175 11192
#> 8 Nepal 8387 169 6404
#> 9 Russia 7762 385 8579
#> 10 Malaysia 7289 60 3789
#> 11 Peru 6966 417 11883
#> 12 Sweden 6034 30 0
#> 13 Bolivia 5696 159 3160
#> 14 Spain 5359 90 0
#> 15 Indonesia 5060 172 3795
#> 16 Iraq 4938 27 4279
#> 17 Chile 4160 37 5394
#> 18 Uruguay 3971 51 2988
#> 19 Philippines 3966 36 4646
#> 20 Japan 3918 106 5270
#> 21 Canada 3700 38 7389
#> 22 Thailand 3226 26 0
#> 23 Paraguay 3223 117 2215
#> 24 Italy 3220 166 11348
#> 25 France 3155 221 837
#> 26 Switzerland 2770 7 0
#> 27 Bahrain 2766 18 1535
#> 28 Ukraine 2730 257 17667
#> 29 Sri Lanka 2728 26 1228
#> 30 Pakistan 2724 65 4686
#> 31 Germany 2578 272 14190
#> 32 Netherlands 2497 13 48
#> 33 Mexico 2483 265 1814
#> 34 United Kingdom 2417 15 8
#> 35 Greece 2402 50 0
#> 36 Costa Rica 2370 28 684
#> 37 Kazakhstan 1860 4 2850
#> 38 Bangladesh 1675 40 1279
#> 39 United Arab Emirates 1672 4 1630
#> 40 Kuwait 1408 5 1158
#> # … with 152 more rows
Plotting the total cases by type worldwide:
library(plotly)
coronavirus %>%
  group_by(type, date) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(date) %>%
  mutate(active = confirmed - death - recovered) %>%
  mutate(active_total = cumsum(active),
         recovered_total = cumsum(recovered),
         death_total = cumsum(death)) %>%
  plot_ly(x = ~ date,
          y = ~ active_total,
          name = 'Active',
          fillcolor = '#1f77b4',
          type = 'scatter',
          mode = 'none',
          stackgroup = 'one') %>%
  add_trace(y = ~ death_total,
            name = "Death",
            fillcolor = '#E41317') %>%
  add_trace(y = ~recovered_total,
            name = 'Recovered',
            fillcolor = 'forestgreen') %>%
  layout(title = "Distribution of Covid19 Cases Worldwide",
         legend = list(x = 0.1, y = 0.9),
         yaxis = list(title = "Number of Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))
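The plot above derives active cases as daily confirmed minus daily deaths and recoveries, then accumulates them over time. The arithmetic can be checked on a small made-up table; the numbers below are invented and serve only to show that the cumulative sum of the daily differences equals the difference of the cumulative totals.

```r
# Made-up worldwide daily counts, one row per date (wide layout)
daily <- data.frame(
  date      = as.Date("2020-03-01") + 0:3,
  confirmed = c(10, 20, 15, 5),
  death     = c(0, 1, 2, 1),
  recovered = c(0, 2, 5, 10)
)

# Daily net change in active cases; negative when recoveries and deaths
# outnumber new confirmed cases on that day
daily$active <- daily$confirmed - daily$death - daily$recovered

# Running number of active cases
daily$active_total <- cumsum(daily$active)

daily$active_total
#> [1] 10 27 35 29

# Same result from the cumulative totals of each type
alt <- cumsum(daily$confirmed) - cumsum(daily$death) - cumsum(daily$recovered)
```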
Plot the distribution of confirmed cases by country with a treemap plot:
conf_df <- coronavirus %>%
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup()
plot_ly(data = conf_df,
        type = "treemap",
        values = ~total_cases,
        labels = ~ country,
        parents = ~parents,
        domain = list(column = 0),
        name = "Confirmed",
        textinfo = "label+value+percent parent")
The raw data is pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following resources:
- World Health Organization (WHO): https://www.who.int/
- DXY.cn. Pneumonia. 2020.
https://ncov.dxy.cn/ncovh5/view/pneumonia.
- BNO News:
https://bnonews.com/index.php/2020/04/the-latest-coronavirus-cases/
- National Health Commission of the People’s Republic of China (NHC):
http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
- China CDC (CCDC):
http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
- Hong Kong Department of Health:
https://www.chp.gov.hk/en/features/102465.html
- Macau Government: https://www.ssm.gov.mo/portal/
- Taiwan CDC:
https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
- US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
- Government of Canada:
https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection/symptoms.html
- Australia Government Department of Health: https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert
- European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases
- Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19
- Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus
- 1Point3Acres: https://coronavirus.1point3acres.com/en
- WorldoMeters: https://www.worldometers.info/coronavirus/
- COVID Tracking Project: https://covidtracking.com/data. (US Testing and Hospitalization Data. We use the maximum reported value from “Currently” and “Cumulative” Hospitalized for our hospitalization number reported for each state.)
- French Government: https://dashboard.covid19.data.gouv.fr/
- COVID Live (Australia): https://covidlive.com.au/
- Washington State Department of Health: https://www.doh.wa.gov/Emergencies/COVID19
- Maryland Department of Health: https://coronavirus.maryland.gov/
- New York State Department of Health: https://health.data.ny.gov/Health/New-York-State-Statewide-COVID-19-Testing/xdss-u53e/data
- NYC Department of Health and Mental Hygiene: https://www1.nyc.gov/site/doh/covid/covid-19-data.page and https://github.com/nychealth/coronavirus-data
- Florida Department of Health Dashboard: https://services1.arcgis.com/CY1LXxl9zlJeBuRZ/arcgis/rest/services/Florida_COVID19_Cases/FeatureServer/0 and https://fdoh.maps.arcgis.com/apps/opsdashboard/index.html#/8d0de33f260d444c852a615dc7837c86
- Palestine (West Bank and Gaza): https://corona.ps/details
- Israel: https://govextra.gov.il/ministry-of-health/corona/corona-virus/
- Colorado: https://covid19.colorado.gov/data