ajg279 / covdata

This project forked from kjhealy/covdata


COVID-related data from a variety of sources, packaged for use in R

Home Page: http://kjhealy.github.io/covdata

License: Other

R 100.00%


covdata


About the package

covdata is a data package for R that collects and bundles datasets related to the COVID-19 pandemic from a variety of sources. The data are current as of Wednesday, March 10, 2021. Beyond conversion to tibbles and reshaping into narrow (tidy) form, the data have been only minimally processed relative to the original sources. Occasionally some additional variables (mostly ISO country codes) have been added to facilitate comparison across the datasets or integration with other sources.
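To illustrate what "narrow" (tidy) form means here, the sketch below takes two of the weekly Afghanistan rows from covnat_weekly shown further down, with cases and deaths in separate columns, and stacks them into one row per date and measure, with the value in a count column. That long layout mirrors the measure and count columns visible in the covus output; whether a given table follows it exactly should be checked against its documentation. Base R is used so the example runs without extra packages; tidyr::pivot_longer() is the usual tidyverse equivalent.

```r
# Two weekly rows for Afghanistan in "wide" form: one column per measure
wide <- data.frame(
  date   = as.Date(c("2020-02-24", "2020-03-02")),
  cname  = "Afghanistan",
  cases  = c(1, 3),
  deaths = c(0, 0),
  stringsAsFactors = FALSE
)

# Stack the measure columns into "narrow" form: one row per date and measure
measures <- c("cases", "deaths")
long <- do.call(rbind, lapply(measures, function(m) {
  data.frame(
    date    = wide$date,
    cname   = wide$cname,
    measure = m,
    count   = wide[[m]],
    stringsAsFactors = FALSE
  )
}))

# Sort by date, then by measure, and reset the row names
long <- long[order(long$date, long$measure), ]
rownames(long) <- NULL
print(long)
```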

covdata provides the following:

COVID-19-specific case and mortality data

All-cause mortality and excess mortality data

Mobility and activity data

  • Data from Apple on relative trends in mobility in cities and countries since mid-January of 2020, based on usage of their Maps application.
  • Data from Google on relative trends in mobility were previously included with this package but are now available in the separate covmobility package.

Caveat Emptor

The data are provided as-is. More information about collection methods, scope, limits, and possible sources of error in the data can be found in the documentation provided by their respective sources. Consult those sources directly, and see the vignettes in the package. The collection and reporting of case and mortality data by national governments has both technical and political dimensions. It is shaped by, among other things, states' varying capacity to test, track, and measure events in a timely fashion; the differing definitions, criteria, and methods states use to register cases and deaths; and the role of politics in how that capacity is exercised and in whether unflattering news gets reported. Researchers should familiarize themselves with these issues before making strong claims based on these data.

Installation

There are two ways to install the covdata package.

Install direct from GitHub

You can install covdata from GitHub with:

# install.packages("remotes")  # run this first if remotes is not installed
remotes::install_github("kjhealy/covdata@main")

Installation using drat

Using install_github() works fine, but it would be nicer to be able to type install.packages("covdata") or update.packages(oldPkgs = "covdata") in the ordinary way. Dirk Eddelbuettel's drat package makes this possible: it provides a convenient way to make R aware of package repositories other than CRAN.

First, install drat:

if (!require("drat")) {
    install.packages("drat")
    library("drat")
}

Then use drat to tell R about the repository where covdata is hosted:

drat::addRepo("kjhealy")

You can now install covdata in the usual way:

install.packages("covdata")

To ensure that the covdata repository is always available, you can add the following line to your .Rprofile or .Rprofile.site file:

drat::addRepo("kjhealy")

With that in place you'll be able to run install.packages("covdata") or update.packages(oldPkgs = "covdata") and have everything work as you'd expect.
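If drat is later removed, a bare drat::addRepo() line in .Rprofile will raise an error at the start of every R session. A more defensive variant, sketched here as one possible approach rather than the package's documented setup, registers the repository only when drat is installed.

```r
# Possible ~/.Rprofile entry (an assumption, not part of the covdata docs):
# register the kjhealy drat repository at startup, but only when drat is
# installed, so that a missing drat package does not raise an error
if (requireNamespace("drat", quietly = TRUE)) {
  drat::addRepo("kjhealy")
}
```

After restarting R, getOption("repos") should include the kjhealy drat repository alongside CRAN.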

Note that my drat repository only contains data packages that are not on CRAN, so you will never be in danger of grabbing the wrong version of any other package.

Loading the Data

library(tidyverse) # Optional but strongly recommended
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
#> ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
#> ✓ tibble  3.0.6     ✓ dplyr   1.0.3
#> ✓ tidyr   1.1.2     ✓ stringr 1.4.0
#> ✓ readr   1.4.0     ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter()  masks stats::filter()
#> x purrr::is_null() masks testthat::is_null()
#> x dplyr::lag()     masks stats::lag()
#> x dplyr::matches() masks tidyr::matches(), testthat::matches()
library(covdata)
#> 
#> Attaching package: 'covdata'
#> The following object is masked from 'package:datasets':
#> 
#>     uspop
#> The following object is masked from 'package:kjhutils':
#> 
#>     %nin%

covnat_weekly
#> # A tibble: 11,426 x 11
#>    date       year_week cname iso3     pop cases deaths cu_cases cu_deaths
#>    <date>     <chr>     <chr> <chr>  <dbl> <dbl>  <dbl>    <dbl>     <dbl>
#>  1 2019-12-30 2020-01   Afgh… AFG   3.89e7     0      0        0         0
#>  2 2020-01-06 2020-02   Afgh… AFG   3.89e7     0      0        0         0
#>  3 2020-01-13 2020-03   Afgh… AFG   3.89e7     0      0        0         0
#>  4 2020-01-20 2020-04   Afgh… AFG   3.89e7     0      0        0         0
#>  5 2020-01-27 2020-05   Afgh… AFG   3.89e7     0      0        0         0
#>  6 2020-02-03 2020-06   Afgh… AFG   3.89e7     0      0        0         0
#>  7 2020-02-10 2020-07   Afgh… AFG   3.89e7     0      0        0         0
#>  8 2020-02-17 2020-08   Afgh… AFG   3.89e7     0      0        0         0
#>  9 2020-02-24 2020-09   Afgh… AFG   3.89e7     1      0        1         0
#> 10 2020-03-02 2020-10   Afgh… AFG   3.89e7     3      0        4         0
#> # … with 11,416 more rows, and 2 more variables: r14_cases <dbl>,
#> #   r14_deaths <dbl>
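Judging from the ten rows printed above, cu_cases appears to be the running total of the weekly cases column within each country. The sketch below checks that on the printed Afghanistan values; the relationship is inferred from this output, not taken from the package documentation.

```r
# Weekly new cases and the cu_cases column for the ten Afghanistan rows above
cases    <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 3)
cu_cases <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 4)

# If cu_cases is a cumulative count, it should equal the running sum of cases
all.equal(cumsum(cases), cu_cases)
#> [1] TRUE
```

On the full table the same check would have to be done per country (for example after group_by(iso3)), and any missing values in cases would propagate through cumsum().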
apple_mobility %>%
  filter(region == "New York City", transportation_type == "walking")
#> # A tibble: 415 x 8
#>    geo_type region transportation_… alternative_name sub_region country
#>    <chr>    <chr>  <chr>            <chr>            <chr>      <chr>  
#>  1 city     New Y… walking          NYC              New York   United…
#>  2 city     New Y… walking          NYC              New York   United…
#>  3 city     New Y… walking          NYC              New York   United…
#>  4 city     New Y… walking          NYC              New York   United…
#>  5 city     New Y… walking          NYC              New York   United…
#>  6 city     New Y… walking          NYC              New York   United…
#>  7 city     New Y… walking          NYC              New York   United…
#>  8 city     New Y… walking          NYC              New York   United…
#>  9 city     New Y… walking          NYC              New York   United…
#> 10 city     New Y… walking          NYC              New York   United…
#> # … with 405 more rows, and 2 more variables: date <date>, score <dbl>
covus %>% 
  filter(measure == "positive", 
         date == "2020-04-27", 
         state == "NJ")
#> # A tibble: 1 x 7
#>   date       state fips  data_quality_grade measure   count measure_label 
#>   <date>     <chr> <chr> <lgl>              <chr>     <dbl> <chr>         
#> 1 2020-04-27 NJ    34    NA                 positive 111188 Positive Tests
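The date == "2020-04-27" condition works because R's comparison operators for Date objects convert a character operand with as.Date() before comparing. A small base R illustration, using hypothetical dates around the one in the example:

```r
# Hypothetical dates around the one used in the covus example
dates <- as.Date(c("2020-04-26", "2020-04-27", "2020-04-28"))

# The character string is converted to a Date before the comparison
dates == "2020-04-27"
#> [1] FALSE  TRUE FALSE

# Range conditions work the same way, e.g. everything from April 27 onward
dates[dates >= "2020-04-27"]
```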
nytcovcounty %>%
  mutate(uniq_name = paste(county, state)) %>% # Can't use FIPS because of how the NYT bundled cities
  group_by(uniq_name) %>%
  mutate(days_elapsed = as.numeric(date - min(date))) %>% # plain numeric days rather than a difftime
  ggplot(aes(x = days_elapsed, y = cases, group = uniq_name)) + 
  geom_line(size = 0.25, color = "gray20") + 
  scale_y_log10(labels = scales::label_number_si()) + 
  guides(color = "none") + 
  facet_wrap(~ state, ncol = 5) + 
  labs(title = "COVID-19 Cumulative Recorded Cases by US County",
       subtitle = paste("New York is bundled into a single area in this data.\nData as of", format(max(nytcovcounty$date), "%A, %B %e, %Y")),
       x = "Days since first case", y = "Count of Cases (log 10 scale)", 
       caption = "Data: The New York Times | Graph: @kjhealy") + 
  theme_minimal()
#> Warning: Transformation introduced infinite values in continuous y-axis

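[Figure: COVID-19 Cumulative Recorded Cases by US County, one panel per state; x = days since first case, y = count of cases (log 10 scale)]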
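The warning about infinite values comes from the log scale: scale_y_log10() takes log10() of each count, and log10(0) is -Inf, so zero counts cannot be placed at a meaningful position on the axis. A minimal base R illustration with hypothetical counts:

```r
# Hypothetical cumulative counts that include a zero
counts <- c(0, 1, 10, 100)

# On a log10 scale, a zero count becomes -Inf
log10(counts)

# Keeping only positive counts gives finite values; in the plotting pipeline
# above, the analogous step would be filter(cases > 0) before ggplot()
log10(counts[counts > 0])
```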

Documentation and Summary Codebook

To learn more about the available datasets, consult the vignettes or, equivalently, the package website. For a codebook-style summary of the variables in each table, see the Codebook vignette.

Citing the covdata package

To cite the package, use the following:

citation("covdata")
#> 
#> To cite the package `covdata` in publications use:
#> 
#>   Kieran Healy. 2020. covdata: COVID-19 Case and Mortality Time Series.
#>   R package version 0.5.2, <http://kjhealy.github.io/covdata>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {covdata: COVID-19 Case and Mortality Time Series},
#>     author = {Kieran Healy},
#>     year = {2020},
#>     note = {R package version 0.5.2},
#>     url = {http://kjhealy.github.io/covdata},
#>   }

Please be sure to also cite the specific data sources, as described in the documentation for each dataset.

Mask icon in hex logo by Freepik.

Contributors

kjhealy, mattspence
