covid19datahub / covid19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution

Home Page: https://covid19datahub.io

License: GNU General Public License v3.0

R 99.34% TeX 0.35% CSS 0.03% JavaScript 0.28%
covid-data covid19-data coronavirus covid-19 2019-ncov r

covid19's Introduction


The repository is actively maintained and will be through the whole of 2023 thanks to the support of the R Consortium.

This repository aggregates COVID-19 data at a fine-grained spatial resolution from several sources and makes them available as ready-to-use CSV files at https://covid19datahub.io

Variable Description
confirmed Cumulative number of confirmed cases
deaths Cumulative number of deaths
recovered Cumulative number of patients released from hospitals or reported recovered
tests Cumulative number of tests
vaccines Cumulative number of total doses administered
people_vaccinated Cumulative number of people who received at least one vaccine dose
people_fully_vaccinated Cumulative number of people who received all doses prescribed by the vaccination protocol
hosp Number of hospitalized patients on date
icu Number of hospitalized patients in intensive therapy on date
vent Number of patients requiring invasive ventilation on date
population Total population

The dataset also includes policy measures from Oxford's government response tracker, and a set of keys to match the data with Google and Apple mobility reports, with the Hydromet dataset, and with spatial databases such as Eurostat for Europe or GADM worldwide.

Administrative divisions

The data are provided at three levels of granularity:

  • level 1: national-level data (e.g., countries)
  • level 2: sub-national data (e.g., regions/states)
  • level 3: lower-level data (e.g., municipalities/counties)
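In the published CSVs, the granularity of each row is encoded in the administrative_area_level column, so a given level can also be selected by plain filtering. A minimal offline sketch with made-up rows (only the column names follow the published schema; the values are invented):

```r
# Toy rows mimicking the Data Hub schema: one row per area and date.
# administrative_area_level encodes the granularity (1, 2 or 3).
x <- data.frame(
  id                        = c("ITA", "ITA, Lombardia", "ITA, Lombardia, Milano"),
  administrative_area_level = c(1L, 2L, 3L),
  confirmed                 = c(1000L, 600L, 300L)
)

# Keep only the sub-national rows (level 2)
regions <- x[x$administrative_area_level == 2L, ]
nrow(regions)  # 1
```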

Download the data

All the data are available to download at the download centre.

How it works

COVID-19 Data Hub is developed around two concepts:

  • data sources
  • countries

To extract the data for one country, different data sources may be required. For this reason, the code in the R folder is organized in two main types of files:

  • files representing a data source (prefix ds_)
  • files representing a country (prefix iso_)

The ds_ files implement a wrapper that pulls the data from a provider and imports them into an R data.frame with standardized column names. The iso_ files take care of merging all the data sources needed for one country and of mapping the identifiers used by the provider to the ids listed in the CSV files. Finally, the function covid19 takes care of downloading the data for all countries at all levels.

The code runs continuously on a dedicated Linux server that crunches the data from the providers. In principle, one can use the function covid19 from the repository to generate the same data we provide at the download centre. However, this takes between 1 and 2 hours, so downloading the pre-computed files is typically more convenient.
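The ds_/iso_ split can be illustrated with a toy wrapper. This is only a sketch of the pattern, not an actual file from the R folder: the function name, provider URL, and provider column names below are all hypothetical.

```r
# Hypothetical ds_ wrapper: take provider data and standardize the column names.
# In a real ds_ file, the input would come from the provider's URL, e.g.
# x <- read.csv("https://example.org/provider.csv")   # hypothetical URL
ds_toy_provider <- function(x) {
  data.frame(
    date      = as.Date(x$datum),  # provider-specific names -> standard names
    confirmed = x$faelle,
    deaths    = x$tote
  )
}

raw <- data.frame(datum = "2020-04-01", faelle = 10L, tote = 1L)
std <- ds_toy_provider(raw)
names(std)  # "date" "confirmed" "deaths"
```

An iso_ file would then merge several such standardized frames for one country and map the provider identifiers to the ids used in the CSV files.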

Contribute

If you find some issues with the data, please report a bug.

Academic publications

The first version of the project is described in "COVID-19 Data Hub", Journal of Open Source Software, 2020. The implementation details and the latest version of the data are described in "A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution", Scientific Data, Nature, 2022. You can browse the publications that use COVID-19 Data Hub here and here. Please cite our paper(s) when using COVID-19 Data Hub.

Cite as

We have invested a lot of time and effort in creating COVID-19 Data Hub, please cite the following when using it:

Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.

A BibTeX entry for LaTeX users is:

@Article{guidotti2020,
    title = {COVID-19 Data Hub},
    year = {2020},
    doi = {10.21105/joss.02376},
    author = {Emanuele Guidotti and David Ardia},
    journal = {Journal of Open Source Software},
    volume = {5},
    number = {51},
    pages = {2376}
}

The implementation details and the latest version of the data are described in:

Guidotti, E., (2022), "A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution", Sci Data 9, 112, doi: 10.1038/s41597-022-01245-1

A BibTeX entry for LaTeX users is:

@Article{guidotti2022,
    title = {A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution},
    year = {2022},
    doi = {10.1038/s41597-022-01245-1},
    author = {Emanuele Guidotti},
    journal = {Scientific Data},
    volume = {9},
    number = {1},
    pages = {112}
}

Terms of use

By using COVID-19 Data Hub, you agree to our terms of use.

Supported by

R Consortium IVADO HEC Montréal Hack Zurich Università degli Studi di Milano

covid19's People

Contributors

angelinakhatiwada, ardiad, dankelley, elsaburren, estellad, federicolg, guilhermegfv, jonekeat, m3it, martinbenes1996, montemurropaolo, muhammedseehan-commits, rijinbaby, robertrosca, sim55649, xiangyunhuang, yuanzhouir


covid19's Issues

Mirai Solutions using covid19datahub

Dear Emanuele

Following the CovidR contest and some problems we have had with the data we are currently using, we have decided to switch to the Covid19datahub project for our dashboard.
https://mirai-solutions.ch/gallery/covid19/

On the top left we will replace the "Data Source" text with your info. A screenshot from the feature branch attached.

In the ReadMe file we have quoted you and David Ardia and again referenced your website.
miraisolutions/Covid19#112

Let me know if this seems sufficient as a quote.
I believe we can go live on the master today.

We are also happy to be mentioned in your "usage" page!
https://covid19datahub.io/articles/usage.html

I would like to thank you for the great project you have put in place and for all the extraordinary efforts you and your team have made (!!!) .

Best regards

Guido Maggio

County level death data

County-level death data for the United States seem to have been changed to zero for the majority of counties.

Implemented France level 1,2,3 with new scheme

ds_opencovid_fr <- function(level=1, cache=cache){
  
  # Montemurro Paolo 11 05 2020
  
  # Libraries
  library(dplyr) #You can import different libraries!
  
  # Download data
  url <- "https://raw.githubusercontent.com/opencovid19-fr/data/master/dist/chiffres-cles.csv"
  x   <- read.csv(url, cache=cache) #To test it, remove cache from here.
  
  # Formatting columns
  x$date       <- as.Date(x$date)
  x$tests      <- x$depistes
  x$confirmed  <- x$cas_confirmes
  x$deaths     <- x$deces
  x$recovered  <- x$gueris
  x$hosp       <- x$hospitalises 
  x$icu        <- x$reanimation  
  
  x <- x[c("date","tests","confirmed","deaths","recovered","hosp","icu","granularite","maille_code","maille_nom")] #Not needed, but cleaner
  
  # Keeping only relevant level
  if(level==1){ x <- x[x$granularite=="pays", ] }
  if(level==2){ x <- x[x$granularite=="region" | x$granularite=="collectivite-outremer", ] }
  if(level==3){ x <- x[x$granularite=="departement", ] }
  
  # Cleaning
  x <- x %>% distinct(date,maille_code, .keep_all = TRUE) #Keep the first observation, more reliable
  
  # Done!
  return(x)
  
  # Don't forget to check your data!!!
  
}


Coordinates for Denmark and France are wrong

Hi! Just wanted to notify that the lat/long coordinates for Denmark and France as delivered with the country timelines are off. Maybe this is due to their overseas territories. If you run a geo algorithm searching for the center point of a country polygon, you end up with a point in the middle of nowhere if you forget to delete the overseas polygons first. So the point for Denmark is somewhere in the north Atlantic and the point for France is in western Africa.

France could be 46 lat 3 long
Denmark could be 56 lat 9 long
(EPSG:3857 Web Mercator)

CovsirPhy (Python package for COVID-19 analysis) will use COVID-19 Data Hub

Dear @emanuele-guidotti and COVID-19 Data Hub team,
Thank you for creating this dataset and Python interface.

We are developing a Python package to analyse COVID-19 data with SIR-like models (and an open dataset for Japan).
CovsirPhy: https://github.com/lisphilar/covid19-sir

Currently, we are using different datasets for analysis (JHU dataset, OxCGRT dataset, population values). However, I'd like to switch to COVID-19 Data Hub and use Python interface covid19dh as a dependency of CovsirPhy in the next version.

I found comments on this package in your paper.
When we use covid19dh as a dependency to download the dataset, is it enough to cite the paper as follows?

Guidotti, E., Ardia, D., (2020). COVID-19 Data Hub, Working paper, https://www.researchgate.net/publication/340771808_COVID-19_Data_Hub

Should we show the citation lists (stdout of covid19dh.covid19(country=None, verbose=True)) when the users download this dataset?

I'm looking forward to collaborating with you!
Best Regards,
Lisphilar

France series

How come the number of confirmed cases in France on 2020-04-21 is 156921, while on the next day, 2020-04-22, it decreases to 154715?
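Drops like this in a cumulative series typically come from retrospective corrections by the data provider. They can be flagged by differencing the series; a sketch using the two figures from this report (the first value is purely illustrative):

```r
# Cumulative confirmed cases; the last two values are from this report,
# the first is illustrative. A negative daily difference signals a revision.
confirmed <- c(155000, 156921, 154715)
daily <- diff(confirmed)
daily             # 1921 -2206
which(daily < 0)  # 2 -> the last observation is below the previous day's
```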

Thanks

Update Issue

The update at level 2 is not there even though the source data are current. This is turning out to be a recurrent and frustrating issue. Sorry, but I just needed to say it, because we are very reliant on your data - we have built our surveillance platform on it.

Wrongly formatted key_numeric in US data

Hi,
The "key_numeric" data is incorrect in the admin level 3 US data: it is formatted as an integer instead of a character and is losing digits. These are FIPS codes and should be 5 digits at the county level. For example, Autauga County in Alabama is FIPS code "01001", but it is entered in the covid19 data hub as 1001. I'm not familiar with other countries, but similar problems may exist in other geographies as well.
Fixing this would make life easier...thanks for the great product!
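Until the column type is fixed upstream, a user-side workaround is to zero-pad the integers back to five-character FIPS strings (a sketch; the Autauga County code is taken from the report above, the second code is an added example):

```r
# FIPS county codes are 5-character strings; reading them as integers
# silently drops the leading zeros.
key_numeric <- c(1001L, 36061L)      # Autauga County, AL; New York County, NY
fips <- sprintf("%05d", key_numeric) # pad back to 5 digits
fips  # "01001" "36061"
```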

Data for Austria is outdated

Hi!
I just wanted to report that the data for positive tests in Austria in the admin level 1 file has not been updated for three days. Where do you source the data for Austria? JHU has different numbers for the positives, at least in the time series.

JHU:
10-27: 86102
10-28: 89496
10-29: 93949

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

Covid-19:
2020-10-27: 91386
2020-10-28: 94891
2020-10-29: 94891
2020-10-30: 94891

https://storage.covid19datahub.io/data-1.csv

The official data provider for Austria would be the federal health agency AGES: https://covid19-dashboard.ages.at/dashboard.html and the URL of the relevant data CSV is https://covid19-dashboard.ages.at/data/CovidFaelle_Timeline.csv (filter for "Österreich" in the column "Bundesland" and take the latest data from the column "AnzahlFaelle").

I know that there seems to be a problem with data transfer from Austria to ECDC. Do you source it from there?

All the best from Vienna!
gh

Policy measures in US states don't seem correct

Thanks for putting this package together!

Looking at Georgia I notice that all policy variables have increased over time without any decrease. Georgia has relaxed a number of restrictions and this relaxation does not seem to be accounted for.

Could you please shine some light on this?

No Updates?

The level-3 data set has not been updated since July 2. This is the second substantial delay in the past 10 days. Will updates be more regular in the future? My needs require the freshest data possible. Thank you, Ken

why no pop for United States?

I think there was a non-NA pop for the United States before, but now it seems to be gone. I wonder if that's a problem with the upstream data, or a result of the name change (which I think was "US" until a few days ago, but I might be remembering back to the days when I used my own code to download the Johns Hopkins data).

library(COVID19)
d <- covid19()
for (country in c("France","United States", "Canada")) {
    ds <- d[d$country==country,]
    cat(ds$country[1], ds$pop[1],"\n")
}

yields

France 66987244
United States NA
Canada 37058856

I cannot find data for Kansas City, MO

Hello,

I cannot find the rows for Kansas City, MO in the data of administrative level 3. Could you please help me point out where it is? How did you deal with Kansas City and the counties that the city overlaps?

FYI. In the github repo of NYTimes, it says "Four counties (Cass, Clay, Jackson and Platte) overlap the municipality of Kansas City, Mo. The cases and deaths that we show for these four counties are only for the portions exclusive of Kansas City. Cases and deaths for Kansas City are reported as their own line."

Thanks.

Zheng

missing populations

Hi, I'm new and have much to learn. First, thank you for making this very cool package!

Second, I am wondering why there is no population data for countries with id ERI, GPC and MSZ.

Thank you again!

Error in UK data on 01/06/2020

The deaths data is incorrect for the UK between 24-May and 01-Jun. On 01-Jun, a historical correction of 445 was introduced to add deaths tested by commercial labs (called "pillar 2"). In the official government figures this correction was retrospectively applied to previous reporting dates from 24-May to 31-May. Your data has applied the entire 445 correction to 01-Jun, increasing the correct number that day of 111 to an artificially high 556. I assume this will have happened because you just take the latest announced cumulative total and so the detail of the correction was missed.

I suggest refreshing your UK data based on the official DHSC numbers available here:

That will result in the following updates to your UK cumulative deaths:

  • 23-May: 36675 (Correct)
  • 24-May: 36793 > 37116
  • 25-May: 36914 > 37237
  • 26-May: 37048 > 37373
  • 27-May: 37460 > 37807
  • 28-May: 37837 > 38220
  • 29-May: 38161 > 38593
  • 30-May: 38376 > 38819
  • 31-May: 38489 > 38934
  • 01-Jun: 39045 (Correct)
  • 02-Jun: 39369 (Correct)
  • 03-Jun: 39728 (Correct)
  • 04-Jun: 39904 (Correct)

A similar correction was applied of ~4000 deaths on 29-Apr that your data does correctly incorporate in the same way as the UK government retrospectively applied it. Both corrections should have the same treatment.
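The arithmetic of the proposed fix can be checked directly: adding the 445-death correction to the reported 31-May value reproduces the corrected figure in the list above (a verification sketch, not the package's code):

```r
# Reported cumulative UK deaths on 31-May and the pillar-2 correction of 445
reported   <- 38489
correction <- 445
reported + correction  # 38934, the corrected 31-May value listed above
```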

Many thanks - happy to offer any additional clarification required.

coding of OxCGRT policy measures

Is this intentional or perhaps I'm missing something? The geographic flags from OxCGRT aren't included (reasonable simplification, in my opinion), but that means that, e.g. Italy's schools are listed as closed on 23 February (true for Lombardy, presumably), rather than 4 March. Perhaps worth including the acaps dataset as an alternative? (https://www.acaps.org/projects/covid19/data)

possibly hours-old problem with COVID19 'deaths' column

First, I apologize that this issue is quite long. You can basically see my problem by looking at the code and output blocks at the bottom. I think there may be a problem with COVID19 that did not exist yesterday.

I'm wondering whether something has changed very recently with COVID19, in the deaths column. Below is some code that shows unexpected results. I am not sure whether this is a difficulty in how subset is working, how [ is working, or perhaps in the deaths column itself. I am not familiar with working with tibbles, having started using R long before they were invented, so maybe both of my methods for extracting data are faulty?

NOTE: I am not querying by ISO codes for country names, because I simply don't know all the names, whereas I do know the actual names. Also, I'm doing this for nearly 200 countries, and I fear that calling covid19() that many times will be slow.

My confusion points are

  • why do [ and subset give different results?
  • why does subset give incorrect results (i.e., the max per country is identical to the max for the world)?
  • how can [ work so differently for different countries?

As a clue, I am pretty sure the results I am getting this morning are different from those I got yesterday; the previous results were not giving 0 deaths in countries where I know for sure there have been deaths.

The R code

library(COVID19)
d <- covid19(end=Sys.Date()-1)
cat("World:\n    ", max(d$deaths), "deaths\n")
for (country in c("Australia", "Canada", "United Kingdom", "United States")) {
    cat(country, ":\n", sep="")
    sub1 <- subset(d, d$country == country)
    cat("    method 1 reveals ", max(sub1$deaths), "deaths\n")
    sub2 <- d[d$country == country, ]
    cat("    method 2 reveals ", max(sub2$deaths), "deaths\n")
}

gives output

World:
     56259 deaths
Australia:
    method 1 reveals  56259 deaths
    method 2 reveals  0 deaths
Canada:
    method 1 reveals  56259 deaths
    method 2 reveals  0 deaths
United Kingdom:
    method 1 reveals  56259 deaths
    method 2 reveals  21092 deaths
United States:
    method 1 reveals  56259 deaths
    method 2 reveals  56259 deaths

City and country names not showing up

Hi - Thank you for all your work. This is a remarkable contribution!!!

I noticed that in an early version of the R package, covid19, the city and state names were visible from the ID variable. Now it's outputting the underlying codes without the place names. Is that a bug? If not, is there a crosswalk to link the codes with the place names?

require("COVID19")

us.city <- covid19("USA", level = 3)
us.city.list <- sort(unique(us.city$id))

us.city.list[1:20]
[1] "0007cb93" "00261c81" "004a8ee7" "0051e968" "006b65bd"
[6] "00738b9f" "0083c472" "008b8a54" "00a1a685" "00b3d68a"
[11] "00b948a7" "00cc6d45" "00cebd4e" "00fc7fbd" "010cd779"
[16] "010e0772" "013e158a" "0141ae45" "0163ccb2" "0171bcbd"

No more updates?

First of all: Thank you very much for this fantastic project. I just wanted to know whether there are issues with the data updates or whether the update cycle has been extended to once a week instead of once a day. The last update on admin level 1 was on 2020-06-24, the last update on US admin level 3 on 2020-06-23. https://covid19datahub.io/articles/data.html

GBR administrative area level 2 lost at level 3

The level 2 region data is lost at level 3; I'm pretty sure this wasn't the case before.

Before, you could download the level 3 GBR data, filter by level 2 for England, and then get all the level 3 regions in England. Now if you pull the level 3 data, the administrative_area_level_2 column is empty, so there's no way to select a level 2 area and then filter level 3 by that selection.

Missing data

Dear maintainers, data from Brazil and several countries in South America are not available.

Kind regards

> covid19(country = "BRA")
# A tibble: 0 x 35
# Groups:   id [0]
# … with 35 variables: id <chr>, date <date>, tests <int>, confirmed <int>, recovered <int>, deaths <int>,
#   hosp <int>, vent <int>, icu <int>, population <int>, school_closing <int>, workplace_closing <int>,
#   cancel_events <int>, gatherings_restrictions <int>, transport_closing <int>,
#   stay_home_restrictions <int>, internal_movement_restrictions <int>,
#   international_movement_restrictions <int>, information_campaigns <int>, testing_policy <int>,
#   contact_tracing <int>, stringency_index <dbl>, iso_alpha_3 <chr>, iso_alpha_2 <chr>,
#   iso_numeric <int>, currency <chr>, administrative_area_level <int>, administrative_area_level_1 <chr>,
#   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>, latitude <dbl>, longitude <dbl>,
#   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>

Level 2 Updates

Level 2 data have not been updated since August 2. The NYTimes source data are updated to 8/4 at levels 2 and 3 as of noon on 8/5, as I write this.

Getting an error while installing the repository

Install COVID19

remotes::install_github("covid19datahub/COVID19")
Downloading GitHub repo covid19datahub/COVID19@master
Error in utils::download.file(url, path, method = method, quiet = quiet, :
cannot open URL 'https://api.github.com/repos/covid19datahub/COVID19/tarball/master'

Install COVID19

remotes::install_github("covid19datahub/COVID19")
Downloading GitHub repo covid19datahub/COVID19@master
covid19datahub-COVID19-31935e9/man/figures/apple-touch-icon.png: truncated gzip input
tar.exe: Error exit delayed from previous errors.
Error: Failed to install 'COVID19' from GitHub:
Does not appear to be an R package (no DESCRIPTION)
In addition: Warning messages:
1: In utils::untar(tarfile, ...) :
‘tar.exe -xf "C:\Users\choti\AppData\Local\Temp\Rtmpg1s3Jr\file2bbc2489976.tar.gz" -C "C:/Users/choti/AppData/Local/Temp/Rtmpg1s3Jr/remotes2bbc5fc55a4b"’ returned error code 1
2: In system(cmd, intern = TRUE) :
running command 'tar.exe -tf "C:\Users\choti\AppData\Local\Temp\Rtmpg1s3Jr\file2bbc2489976.tar.gz"' had status 1

problem with today's data

Many thanks for this package.

I'm wondering whether I'm missing something, as illustrated with the R script and output given below, run using updated COVID19 as updated a few minutes ago.

Note the most recent value of confirmed, for example.

I can work around this issue, by ignoring today's data if they disagree badly with the data on the day before, but I am pointing this out in case it reveals a problem that you might want to look at. (Or, perhaps, is there a way provided by COVID19 to skip not-yet-complete data?)

R script

library(COVID19)
old <- world("country")
new <- covid19()
for (country in c("Canada", "United States")) {
    cat("#", country, "\n")
    print(tail(old[old$country == country, ], 3))
    print(tail(new[new$country == country, ], 3))
}

Output


R version 4.0.0 alpha (2020-04-01 r78130)
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(COVID19)
> old <- world("country")
> new <- covid19()
> for (country in c("Canada", "United States")) {
+     cat("#", country, "\n")
+     print(tail(old[old$country == country, ], 3))
+     print(tail(new[new$country == country, ], 3))
+ }
# Canada 
# A tibble: 3 x 21
# Groups:   id [1]
  id    date       deaths confirmed tests recovered  hosp   icu  vent country
  <chr> <date>      <dbl>     <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl> <chr>  
1 CAN   2020-04-13    779     25674     0    107480     0     0     0 Canada 
2 CAN   2020-04-14    899     27029     0    116822     0     0     0 Canada 
3 CAN   2020-04-15      0         8     0      8210     0     0     0 Canada 
# … with 11 more variables: state <lgl>, city <lgl>, lat <dbl>, lng <dbl>,
#   pop <int>, pop_14 <dbl>, pop_15_64 <dbl>, pop_65 <dbl>, pop_age <dbl>,
#   pop_density <dbl>, pop_death_rate <dbl>
# A tibble: 3 x 21
# Groups:   id [1]
  id    date       deaths confirmed tests recovered  hosp   icu  vent country
  <chr> <date>      <dbl>     <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl> <chr>  
1 CAN   2020-04-13    779     25674     0    107480     0     0     0 Canada 
2 CAN   2020-04-14    899     27029     0    116822     0     0     0 Canada 
3 CAN   2020-04-15      0         8     0      8210     0     0     0 Canada 
# … with 11 more variables: state <lgl>, city <lgl>, lat <dbl>, lng <dbl>,
#   pop <int>, pop_14 <dbl>, pop_15_64 <dbl>, pop_65 <dbl>, pop_age <dbl>,
#   pop_density <dbl>, pop_death_rate <dbl>
# United States 
# A tibble: 3 x 21
# Groups:   id [1]
  id    date       deaths confirmed tests recovered  hosp   icu  vent country
  <chr> <date>      <dbl>     <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl> <chr>  
1 USA   2020-04-13  23468    578978     0         0     0     0     0 United…
2 USA   2020-04-14  25770    605948     0         0     0     0     0 United…
3 USA   2020-04-15      0         0     0         0     0     0     0 United…
# … with 11 more variables: state <lgl>, city <lgl>, lat <dbl>, lng <dbl>,
#   pop <int>, pop_14 <dbl>, pop_15_64 <dbl>, pop_65 <dbl>, pop_age <dbl>,
#   pop_density <dbl>, pop_death_rate <dbl>
# A tibble: 3 x 21
# Groups:   id [1]
  id    date       deaths confirmed tests recovered  hosp   icu  vent country
  <chr> <date>      <dbl>     <dbl> <dbl>     <dbl> <dbl> <dbl> <dbl> <chr>  
1 USA   2020-04-13  23468    578978     0         0     0     0     0 United…
2 USA   2020-04-14  25770    605948     0         0     0     0     0 United…
3 USA   2020-04-15      0         0     0         0     0     0     0 United…
# … with 11 more variables: state <lgl>, city <lgl>, lat <dbl>, lng <dbl>,
#   pop <int>, pop_14 <dbl>, pop_15_64 <dbl>, pop_65 <dbl>, pop_age <dbl>,
#   pop_density <dbl>, pop_death_rate <dbl>

US City Data, recent data missing, and NC County Missing

  1. City Level Data in the US is not current- latest date is June 12, 2020.
  2. For US North Carolina city data, "Alamance" (the first county alphabetically) is missing. In reality, there are 100 counties in the state; even though there are 100 county entities in the data, one of them is "Out of NC".

Reproducible code below:
data3 <- subset(covid19("US", level=3),state=="North Carolina")
table(data3$city) #first county alphabetically is missing, should be "Alamance"
max(data3$date) #returns "2020-06-12"

Austria series

Hi
Could you please let me know why Austria's confirmed cases are lower than the published ones? Also, the recovered cases are higher than the confirmed cases.
Thanks

Partially missing level 2 data for Germany

Hi, first of all, thanks for putting together this awesome tool.

I am trying to run an analysis using time-series data for the confirmed cases in the German Länder. I am using the Python API, but I also double-checked with the R API and I am getting the same:

covid_germany, _ = covid19(['Germany'], level=2, verbose=False)
print(covid_germany.administrative_area_level_2.unique())
['Bayern' 'Schleswig-Holstein' 'Nordrhein-Westfalen' 'Baden-Württemberg'
 'Bremen' 'Hamburg' 'Hessen' 'Rheinland-Pfalz' 'Niedersachsen']

So basically, I can only fetch data for 9 out of the 16 Länder. The missing regions are: ['Saarland', 'Berlin', 'Sachsen-Anhalt', 'Thüringen', 'Brandenburg', 'Sachsen', 'Mecklenburg-Vorpommern']


The source, RKI, seems to report for all Länder, so where is this data getting lost?

JOSS review

Hi there.

Thank you for the submission - this is a great resource! This is my review as part of openjournals/joss-reviews#2376. Please can you address the following comments:

  • tests / continuous integration

I can see the tests directory but I'm not entirely clear on how to run these. I don't code in R, so this might be why. Regardless, could you please add some documentation for running these tests (or make it clearer where this documentation is, if it already exists). Could you also add some continuous integration for your test suite - Travis or similar would be great.

  • references

Please could you include DOIs for the references where you can.

  • minor paper comments

Although the paper is very well written and the summary is nice, the actual software description/purpose isn't included until the final paragraph, which is on the second page of the paper. In terms of readability/impact, maybe you could introduce the Data Hub earlier on? Feel free to ignore this comment if you wish. Secondly, and this might be intentional, there are a few extra full stops after the first mention of Excel!
