This project forked from ramikrispin/coronavirus

The coronavirus dataset

Home Page: https://ramikrispin.github.io/coronavirus/

License: Other

coronavirus

The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data is pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.

More details are available here, and a CSV version of the package dataset is available here.

Source: Centers for Disease Control and Prevention’s Public Health Image Library

Important Note

As this is an ongoing situation, frequent changes in the data format may occur. Please visit the package news to get updates about those changes.

Installation

Install the CRAN version:

install.packages("coronavirus")

Install the GitHub version (refreshed on a daily basis):

# install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")

Data refresh

While the coronavirus CRAN version is updated every month or two, the GitHub (dev) version is updated on a daily basis. The update_dataset function closes this gap by updating the installed package with the most recent data available in the GitHub version:

library(coronavirus)
update_dataset()

Note: you must restart the R session for the updates to become available.

Alternatively, you can pull the data using the Covid19R project data standard format with the refresh_coronavirus_jhu function:

covid19_df <- refresh_coronavirus_jhu()
head(covid19_df)
#>         date    location location_type location_code location_code_type     data_type value      lat      long
#> 1 2021-05-03 Afghanistan       country            AF         iso_3166_2    deaths_new     5 33.93911 67.709953
#> 2 2020-08-13 Afghanistan       country            AF         iso_3166_2     cases_new    86 33.93911 67.709953
#> 3 2020-10-24 Afghanistan       country            AF         iso_3166_2 recovered_new    13 33.93911 67.709953
#> 4 2021-03-14 Afghanistan       country            AF         iso_3166_2    deaths_new     3 33.93911 67.709953
#> 5 2020-02-28 Afghanistan       country            AF         iso_3166_2     cases_new     0 33.93911 67.709953
#> 6 2020-07-19 Afghanistan       country            AF         iso_3166_2    deaths_new    17 33.93911 67.709953
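
Because this format is long (one row per date, location, and data_type), totals per measure come from grouping on data_type. The following is a minimal sketch in base R, not part of the package: covid19_toy is a hypothetical data frame that copies three columns plus the date from the six rows printed above, so the totals cover only those six rows.

```r
# Hypothetical toy data: the six rows shown by head(covid19_df) above,
# reduced to the columns needed for the aggregation.
covid19_toy <- data.frame(
  date = as.Date(c("2021-05-03", "2020-08-13", "2020-10-24",
                   "2021-03-14", "2020-02-28", "2020-07-19")),
  location = "Afghanistan",
  data_type = c("deaths_new", "cases_new", "recovered_new",
                "deaths_new", "cases_new", "deaths_new"),
  value = c(5, 86, 13, 3, 0, 17),
  stringsAsFactors = FALSE
)

# Sum the daily values for each location and data_type.
totals <- aggregate(value ~ location + data_type, data = covid19_toy, FUN = sum)
print(totals)
```

On the full output of refresh_coronavirus_jhu the same call should work unchanged, assuming the column names match the ones printed above.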

Dashboard

A supporting dashboard is available here

Usage

data("coronavirus")

This coronavirus dataset has the following fields:

  • date - The date of the summary
  • province - The province or state, when applicable
  • country - The country or region name
  • lat - Latitude point
  • long - Longitude point
  • type - The type of case (e.g., confirmed, death)
  • cases - The number of daily cases (corresponding to the case type)

head(coronavirus)
#>         date province             country       lat       long      type cases
#> 1 2020-01-22                  Afghanistan  33.93911  67.709953 confirmed     0
#> 2 2020-01-22                      Albania  41.15330  20.168300 confirmed     0
#> 3 2020-01-22                      Algeria  28.03390   1.659600 confirmed     0
#> 4 2020-01-22                      Andorra  42.50630   1.521800 confirmed     0
#> 5 2020-01-22                       Angola -11.20270  17.873900 confirmed     0
#> 6 2020-01-22          Antigua and Barbuda  17.06080 -61.796400 confirmed     0
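
Since the cases field holds daily counts rather than running totals, a cumulative series has to be built with a cumulative sum per country and type. A minimal sketch in base R follows; the toy data frame, its country names "A" and "B", and its values are made up for illustration and only mirror the column names of the dataset.

```r
# Hypothetical toy data with the same column names as the coronavirus dataset.
toy <- data.frame(
  date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03",
                   "2020-03-01", "2020-03-02", "2020-03-03")),
  country = c("A", "A", "A", "B", "B", "B"),
  type = "confirmed",
  cases = c(1, 2, 4, 0, 3, 3),
  stringsAsFactors = FALSE
)

# Sort by date within each group so the running total follows time order.
toy <- toy[order(toy$country, toy$type, toy$date), ]

# ave() applies cumsum separately within each country/type group.
toy$cumulative <- ave(toy$cases, toy$country, toy$type, FUN = cumsum)
print(toy)
```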

Summary of the total confirmed cases by country (top 20):

library(dplyr)

summary_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20) 
#> # A tibble: 20 x 2
#>    country        total_cases
#>    <chr>                <dbl>
#>  1 US                33166418
#>  2 India             27157795
#>  3 Brazil            16194209
#>  4 France             5670486
#>  5 Turkey             5203385
#>  6 Russia             4960174
#>  7 United Kingdom     4483177
#>  8 Italy              4197892
#>  9 Germany            3662568
#> 10 Spain              3652879
#> 11 Argentina          3586736
#> 12 Colombia           3270614
#> 13 Poland             2867187
#> 14 Iran               2855396
#> 15 Mexico             2399790
#> 16 Ukraine            2244084
#> 17 Peru               1932255
#> 18 Indonesia          1786187
#> 19 Czechia            1658778
#> 20 Netherlands        1658587

Summary of new cases during the past 24 hours by country and type (as of 2021-05-25):

library(tidyr)

coronavirus %>% 
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)
#> # A tibble: 192 x 4
#> # Groups:   country [192]
#>    country              confirmed death recovered
#>    <chr>                    <dbl> <dbl>     <dbl>
#>  1 India                   208921  4157    295955
#>  2 Brazil                   73453  2173     41347
#>  3 Argentina                24601   576     24477
#>  4 US                       22756   621         0
#>  5 Colombia                 21181   459     17183
#>  6 Iran                     11873   208     14676
#>  7 Turkey                    9375   175     11192
#>  8 Nepal                     8387   169      6404
#>  9 Russia                    7762   385      8579
#> 10 Malaysia                  7289    60      3789
#> 11 Peru                      6966   417     11883
#> 12 Sweden                    6034    30         0
#> 13 Bolivia                   5696   159      3160
#> 14 Spain                     5359    90         0
#> 15 Indonesia                 5060   172      3795
#> 16 Iraq                      4938    27      4279
#> 17 Chile                     4160    37      5394
#> 18 Uruguay                   3971    51      2988
#> 19 Philippines               3966    36      4646
#> 20 Japan                     3918   106      5270
#> 21 Canada                    3700    38      7389
#> 22 Thailand                  3226    26         0
#> 23 Paraguay                  3223   117      2215
#> 24 Italy                     3220   166     11348
#> 25 France                    3155   221       837
#> 26 Switzerland               2770     7         0
#> 27 Bahrain                   2766    18      1535
#> 28 Ukraine                   2730   257     17667
#> 29 Sri Lanka                 2728    26      1228
#> 30 Pakistan                  2724    65      4686
#> 31 Germany                   2578   272     14190
#> 32 Netherlands               2497    13        48
#> 33 Mexico                    2483   265      1814
#> 34 United Kingdom            2417    15         8
#> 35 Greece                    2402    50         0
#> 36 Costa Rica                2370    28       684
#> 37 Kazakhstan                1860     4      2850
#> 38 Bangladesh                1675    40      1279
#> 39 United Arab Emirates      1672     4      1630
#> 40 Kuwait                    1408     5      1158
#> # … with 152 more rows
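
The reshaping step above turns one row per country and type into one row per country with a column per type. As a rough illustration of that long-to-wide step without tidyr, here is a base R sketch; the latest data frame is hypothetical and holds only the India and Brazil values from the output above.

```r
# Hypothetical long-format input: values copied from the first two rows
# of the summary printed above.
latest <- data.frame(
  country = rep(c("India", "Brazil"), each = 3),
  type = rep(c("confirmed", "death", "recovered"), times = 2),
  cases = c(208921, 4157, 295955, 73453, 2173, 41347),
  stringsAsFactors = FALSE
)

# xtabs() sums cases for every country/type pair and returns a
# country-by-type table, which is then converted to a data frame.
wide <- as.data.frame.matrix(xtabs(cases ~ country + type, data = latest))

# Order by confirmed cases, largest first (like arrange(-confirmed)).
wide <- wide[order(-wide$confirmed), ]
print(wide)
```

Unlike pivot_wider, this variant keeps the country names as row names rather than as a column.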

Plotting the total cases by type worldwide:

library(plotly)

coronavirus %>% 
  group_by(type, date) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(date) %>%
  mutate(active = confirmed - death - recovered) %>%
  mutate(active_total = cumsum(active),
         recovered_total = cumsum(recovered),
         death_total = cumsum(death)) %>%
  plot_ly(x = ~ date,
          y = ~ active_total,
          name = 'Active',
          fillcolor = '#1f77b4',
          type = 'scatter',
          mode = 'none',
          stackgroup = 'one') %>%
  add_trace(y = ~ death_total,
            name = "Death",
            fillcolor = '#E41317') %>%
  add_trace(y = ~ recovered_total,
            name = 'Recovered',
            fillcolor = 'forestgreen') %>%
  layout(title = "Distribution of Covid19 Cases Worldwide",
         legend = list(x = 0.1, y = 0.9),
         yaxis = list(title = "Number of Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))
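
The arithmetic behind the three stacked series can be checked on a small example. The sketch below uses base R and a hypothetical daily data frame with made-up numbers; it repeats the active = confirmed - death - recovered step and the cumulative sums from the pipeline above.

```r
# Hypothetical worldwide daily counts for three days (made-up values).
daily <- data.frame(
  date = as.Date("2020-03-01") + 0:2,
  confirmed = c(10, 20, 30),
  death = c(0, 1, 2),
  recovered = c(0, 5, 10)
)

# Daily change in active cases: new confirmed minus new deaths and recoveries.
daily$active <- daily$confirmed - daily$death - daily$recovered

# Running totals, as plotted in the stacked area chart.
daily$active_total <- cumsum(daily$active)
daily$recovered_total <- cumsum(daily$recovered)
daily$death_total <- cumsum(daily$death)
print(daily)
```

With this definition the three running totals add up to the cumulative confirmed count on every date, which is why stacking them gives a chart whose upper edge is the total number of confirmed cases.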

Plot the distribution of confirmed cases by country with a treemap plot:

conf_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup() 
  
plot_ly(data = conf_df,
        type = "treemap",
        values = ~ total_cases,
        labels = ~ country,
        parents = ~ parents,
        domain = list(column = 0),
        name = "Confirmed",
        textinfo = "label+value+percent parent")

Data Sources

The raw data is pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following resources:
