wpgp / wopr

An R package and Shiny application to provide API access to the WorldPop Open Population Repository (WOPR)

Home Page: https://apps.worldpop.org/woprVision

License: GNU General Public License v3.0


wopr: An R package to query the WorldPop Open Population Repository


WorldPop, University of Southampton

10 June 2021

Introduction

wopr is an R package that provides API access to the WorldPop Open Population Repository. This gives users the ability to:

  1. Download WorldPop population data sets directly from the R console,
  2. Submit spatial queries (points or polygons) to the WorldPop server to retrieve population estimates within user-defined geographic areas,
  3. Get estimates of population sizes for specific demographic groups (i.e. age and sex),
  4. Get probabilistic estimates of uncertainty for all population estimates, and
  5. Run the woprVision web application locally from the R console.

Code for the wopr package is openly available on GitHub: https://github.com/wpgp/wopr.

Installation

First, start a new R session. Then, install the wopr R package from WorldPop on GitHub:

devtools::install_github('wpgp/wopr')
library(wopr)

You may be prompted to update some of your existing R packages. This is not required unless the wopr installation fails. You can skip the check for package updates by adding the argument upgrade='never' to devtools::install_github(). If needed, you can update individual packages that may be responsible for wopr installation errors using install.packages('package_name'). Alternatively, you can use devtools::install_github('wpgp/wopr', upgrade='ask') to be prompted to update the packages that wopr depends on. In RStudio, you can also update all of your R packages by clicking “Tools > Check for Package Updates”.

Note: When updating multiple packages, it may be necessary to restart your R session before each installation to ensure that packages being updated are not currently loaded in your R environment.

Usage

Demo code is provided in demo/wopr_demo.R that follows the examples in this README.

You can list vignettes that are available using: vignette(package='wopr')

The woprVision web application is an interactive web map that allows you to query population estimates from the WorldPop Open Population Repository. See the vignette for woprVision with: vignette('woprVision', package='wopr')

If you are interested in developing your own front-end applications that query the WOPR API, please read the vignette that describes the API backend for developers: vignette('woprAPI', package='wopr')

woprVision

woprVision is an R Shiny application that allows you to browse an interactive map to get population estimates for specific locations and demographic groups. woprVision is available on the web at https://apps.worldpop.org/woprVision. You can also run woprVision locally from your R console using:

wopr::woprVision()

We suggest installing Michael Harper’s fix to the leaflet.extras draw toolbar:

devtools::install_github("dr-harper/leaflet.extras")

This is not required, but it fixes a bug that prevents the draw toolbar from being removed from the map when it is inactive.

Data Download

One way to access data from WOPR is to simply download the files directly to your computer from the R console. This can be done with three easy steps:

# Retrieve the WOPR data catalogue
catalogue <- getCatalogue()

# Select files from the catalogue by subsetting the data frame
selection <- subset(catalogue,
                    country == 'NGA' &
                      category == 'Population' & 
                      version == 'v1.2')

# Download selected files
downloadData(selection)

Note: 'NGA' refers to Nigeria. WOPR uses three-letter ISO 3166-1 alpha-3 codes to abbreviate country names.

By default, downloadData() will not download files larger than 100 MB unless you change the maxsize argument (see ?downloadData). With the default settings, a folder named ./wopr is created in your R working directory for downloaded files. A spreadsheet listing all WOPR files currently saved to your hard drive can be found in ./wopr/wopr_catalogue.csv. To list the files that have been downloaded to your working directory from within the R console, use list.files('wopr', recursive=TRUE). On repeated calls to downloadData(), a previously downloaded file will be overwritten if it does not match the server copy (based on an md5 checksum comparison). This keeps your local copy of every file up to date.
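The md5 comparison described above can be sketched in base R. This is an illustration of the idea, not wopr's internal code: it assumes the server supplies an md5 hash for each file, and needs_update() is a hypothetical helper name.

```r
# Hypothetical helper (not part of wopr): decide whether a local file should
# be re-downloaded by comparing its md5 checksum with the server's hash.
needs_update <- function(path, server_hash) {
  # A missing local file always needs downloading
  if (!file.exists(path)) return(TRUE)
  # tools::md5sum() returns a named character vector; drop the name
  local_hash <- unname(tools::md5sum(path))
  # Re-download when the local and server hashes differ
  !identical(tolower(local_hash), tolower(server_hash))
}

# Usage with a temporary file standing in for a downloaded raster
tmp <- tempfile(fileext = ".tif")
writeLines("example contents", tmp)
server_hash <- unname(tools::md5sum(tmp))
needs_update(tmp, server_hash)  # FALSE: local copy matches
writeLines("changed contents", tmp)
needs_update(tmp, server_hash)  # TRUE: local copy differs
```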

You can download the entire WOPR data catalogue using: downloadData(getCatalogue(), maxsize=Inf). Note: Some files in the WOPR data catalogue are very large (e.g. 140 GB), so please ensure that you have enough disk space. If disk space is limited, you can restrict the maximum file size that you would like to download using the maxsize argument (default = 100 MB).

Spatial Query

Population estimates can also be obtained from WOPR using spatial queries (geographic points or polygons) for user-defined geographic area(s) and demographic group(s).

Spatial queries must be submitted using objects of class sf. You can explore this functionality using example data from Nigeria that are included with the wopr package. Plot the example data using:

data(wopr_points)
plot(wopr_points, pch=16)

data(wopr_polys)
plot(wopr_polys)

Note: ESRI shapefiles (and other file types) can be read into R as sf objects using:

sf_feature <- sf::st_read('shapefile.shp')

To submit a spatial query, you must first identify which WOPR databases support spatial queries:

getCatalogue(spatial_query=T)

This will return a data.frame:

country version
NGA v1.2
NGA v1.1
COD v1.0

These results indicate that there are currently two WOPR databases for Nigeria (NGA) that support spatial queries and one database for the Democratic Republic of the Congo (COD).

Query total population at a single point

To get the total population for a single point location from the NGA v1.2 population estimates use:

N <- getPop(feature=wopr_points[1,], 
            country='NGA', 
            version='v1.2')

Notice that the population estimate is returned as a vector of samples from the Bayesian posterior distribution:

print(N)
hist(N)

This can be summarized using:

summaryPop(N, confidence=0.95, tails=2, abovethresh=1e5, belowthresh=5e4)

The confidence argument controls the width of the confidence intervals. The tails argument controls whether the confidence intervals are calculated as one-tailed or two-tailed probabilities. If confidence=0.95 and tails=2, then there is a 95% probability that the true population falls between the lower and upper confidence limits, given the model structure and the data used to fit the model. If confidence=0.95 and tails=1, then there is a 95% probability that the true population exceeds the lower confidence limit and a 95% probability that the true population is less than the upper confidence limit.

The abovethresh argument defines the threshold used to calculate the probability that the population will exceed this threshold. For example, if abovethresh=1e5, then the abovethresh result from summaryPop() is the probability that the population exceeds 100,000 people. The belowthresh argument is similar except it will return the probability that the population is less than this threshold.
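To make these arguments concrete, the sketch below computes comparable statistics directly from a vector of posterior samples in base R. It assumes that the intervals are sample quantiles and that the threshold results are the proportions of samples above or below each threshold; summarize_samples() is a hypothetical name, and this is not summaryPop()'s own implementation.

```r
# Hypothetical re-implementation for illustration (not wopr's summaryPop()).
summarize_samples <- function(N, confidence = 0.95, tails = 2,
                              abovethresh = NULL, belowthresh = NULL) {
  # Two-tailed: split the excluded probability between both tails.
  # One-tailed: each limit excludes the full (1 - confidence) on its side.
  alpha <- if (tails == 2) (1 - confidence) / 2 else 1 - confidence
  out <- c(mean   = mean(N),
           median = stats::median(N),
           lower  = stats::quantile(N, alpha, names = FALSE),
           upper  = stats::quantile(N, 1 - alpha, names = FALSE))
  # Proportion of posterior samples above or below each threshold
  if (!is.null(abovethresh)) out["abovethresh"] <- mean(N > abovethresh)
  if (!is.null(belowthresh)) out["belowthresh"] <- mean(N < belowthresh)
  out
}

# Usage with simulated draws standing in for the output of getPop()
set.seed(42)
N <- rpois(10000, lambda = 8e4)
summarize_samples(N, confidence = 0.95, tails = 2,
                  abovethresh = 1e5, belowthresh = 5e4)
```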

Query total population within a single polygon

Querying WOPR with a single polygon works exactly the same way as a point-based query:

N <- getPop(feature=wopr_polys[1,], 
            country='NGA', 
            version='v1.2')

summaryPop(N, confidence=0.95, tails=2, abovethresh=1e2, belowthresh=50)

Query population for specific demographic groups

To query population estimates for specific demographic groups, you can use the agesex_select argument (see ?getPop). This argument accepts a character vector of age-sex groups. 'f0' represents females less than one year old; 'f1' represents females from age one to four; 'f5' represents females from five to nine; 'f10' represents females from 10 to 14; and so on. 'm0' represents males less than one, etc.
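Because the codes follow a simple prefix-plus-age pattern, a vector for agesex_select can also be built programmatically. The helper below is a hypothetical convenience function, not part of wopr; it only pairs each sex prefix with each age-group lower bound.

```r
# Hypothetical helper (not part of wopr): build agesex_select codes by
# pairing each sex prefix with each age-group lower bound.
agesex_codes <- function(sexes = c("f", "m"), ages = c(0, 1)) {
  paste0(rep(sexes, each = length(ages)), ages)
}

agesex_codes()                                    # "f0" "f1" "m0" "m1"
agesex_codes(sexes = "f", ages = c(0, 1, 5, 10))  # "f0" "f1" "f5" "f10"
```

The first call reproduces the under-five selection used in the next example.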

Query the population of children under the age of five within a single polygon:

N <- getPop(feature=wopr_polys[1,], 
            country='NGA', 
            version='v1.2',
            agesex_select=c('f0','f1','m0','m1'))

summaryPop(N, confidence=0.95, tails=2, abovethresh=10, belowthresh=1)

If the agesex_select argument is not included, getPop() will return estimates of the total population (as above).

Query multiple point or polygon features

We can query multiple point or polygon features using the woprize() function:

N_table <- woprize(features=wopr_polys, 
                   country='NGA', 
                   version='1.2',
                   agesex_select=c('m0','m1','f0','f1'),
                   confidence=0.95,
                   tails=2,
                   abovethresh=2e4,
                   belowthresh=1e4
                   )

You can save these results in a number of ways:

# save results as shapefile
sf::st_write(N_table, 'example_shapefile.shp')

# save results as csv
write.csv(sf::st_drop_geometry(N_table), file='example_spreadsheet.csv', row.names=F)

# save image of mapped results (print() is needed for the map to be drawn when this code is run with source())
jpeg('example_map.jpg')
print(tmap::tm_shape(N_table) + tmap::tm_fill('mean', palette='Reds', legend.reverse=T))
dev.off()

Contributing

The WorldPop Open Population Repository (WOPR) was developed by the WorldPop Research Group within the Department of Geography and Environmental Science at the University of Southampton. Funding was provided by the Bill and Melinda Gates Foundation and the United Kingdom Foreign, Commonwealth & Development Office (OPP1182408, OPP1182425, INV-002697). Professor Andy Tatem provides oversight of the WorldPop Research Group. The wopr R package was developed by Doug Leasure. Maksym Bondarenko and Niko Ves developed the API backend server. Edith Darin added multi-lingual functionality to the Shiny app and the French translation. Natalia Tejedor Garavito proofread the Spanish translation. Gianluca Boo created the WOPR logo. Population data have been contributed to WOPR by many different researchers within the WorldPop Research Group.

Suggested Citation

Leasure DR, Bondarenko M, Darin E, Tatem AJ. 2021. wopr: An R package to query the WorldPop Open Population Repository, version 1.3.3. WorldPop, University of Southampton. doi: 10.5258/SOTON/WP00716. https://github.com/wpgp/wopr

License

GNU General Public License v3.0 (GNU GPLv3)

wopr's Issues

cell id optional in API

Let's try to make cellid an optional attribute to return in API requests to the /polytotal and /polyagesex endpoints. This could really speed up transfer time for large requests.

woprize is giving me an error

I am trying to get population estimates for areas within Guinea.

And I get an error message back, plus a couple of warnings.

The warnings either refer to an unknown column 'geometry' or 'geom', depending on whether I've renamed geometry to geom or not.

I presume I'm just doing something wrong.

G <- woprize(features = focidg, country = 'GIN', version = '1.0', confidence = 0.95)
Submitting 4 feature(s) to:
  https://api.worldpop.org/v1/wopr/polytotal
Checking status of 36 tasks:
  Error in results[[j]] : subscript out of bounds
In addition: Warning messages:
1: Unknown or uninitialised column: `geometry`.
2: Unknown or uninitialised column: `geometry`.

local query of agesex populations

Hi,

I was trying to query only a single age-sex group using a local SQL database. In getPopSql() here: https://github.com/wpgp/wopr/blob/master/R/getPopSql.R#L73 the code collapses to a vector when agesex_select has a length of 1 (e.g. 'm0'). If you add the option drop = FALSE to the data frame that will preserve the 2 dimensional structure that apply expects. Happy to submit a quick pull request with the change, if you'd like.
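The proposed fix can be illustrated in base R. The data frame below is a made-up stand-in for the query result inside getPopSql(), with one row per posterior sample and one column per age-sex group; it is not the package's actual data.

```r
# Toy data frame of posterior samples (values made up for illustration):
# one row per sample, one column per age-sex group
samples <- data.frame(m0 = c(1, 2, 3), f0 = c(4, 5, 6))
agesex_select <- "m0"

# Default indexing drops a single selected column to a plain vector,
# which has no dim attribute, so apply(..., 1, sum) would fail
collapsed <- samples[, agesex_select]
is.null(dim(collapsed))  # TRUE

# drop = FALSE keeps the two-dimensional structure that apply() expects
kept <- samples[, agesex_select, drop = FALSE]
totals <- apply(kept, 1, sum)  # total population per posterior sample
```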

Catalogue doesn't have list of all countries for which dataset is available

The catalogue lists only 51 countries for which data is available. The population dataset is available only for eight countries. Can this be updated, please? Could you please help me out if I'm missing something?

I know other datasets are available (for example, here's a list of datasets for India).

> catalouge = getCatalogue() %>% as_tibble()

> catalouge %>% filter(category == "Population") %>% distinct(country)
# A tibble: 8 x 1
  country
  <chr>  
1 BFA    
2 COD    
3 GHA    
4 MOZ    
5 NGA    
6 SLE    
7 SSD    
8 ZMB


> catalouge %>% distinct(country)
# A tibble: 51 x 1
   country
   <chr>  
 1 AGO    
 2 BDI    
 3 BEN    
 4 BFA    
 5 BWA    
 6 CAF    
 7 CIV    
 8 CMR    
 9 COD    
10 COG    
# … with 41 more rows

Hash check

Take a look at the last two files in the API return from http://wopr.worldpop.org/api/v1.0/data

They both have the same hash: f11531ca5cdacb40846182bee54778d6.

The md5sum for the file "NGA_population_v1_1_uncertainty.tif" is different than the hash provided in the API return.
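A quick way to spot this kind of problem is to flag every catalogue entry whose hash is shared with another entry. The sketch below assumes the API return has been read into a data frame with columns named file and hash; those column names and the first two file names are hypothetical. The local checksum of a downloaded file can be computed with tools::md5sum() for comparison.

```r
# Hypothetical extract of the /data API return; the column names `file`
# and `hash` and the first two file names are assumptions for illustration
catalogue <- data.frame(
  file = c("file_a.tif", "file_b.tif", "NGA_population_v1_1_uncertainty.tif"),
  hash = c("00000000000000000000000000000000",
           "f11531ca5cdacb40846182bee54778d6",
           "f11531ca5cdacb40846182bee54778d6"),
  stringsAsFactors = FALSE)

# Flag every row whose hash appears more than once (not only the repeats)
shared <- catalogue$hash %in% catalogue$hash[duplicated(catalogue$hash)]
catalogue[shared, "file"]
```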

Unsettled pixels

We don't make model predictions for pixels that are mapped as unsettled so those cellids are not in the SQL database. When I submit a polygon that includes no settled pixels, it returns this error: "No User Description for this type of error: AttributeError".

A more informative message might be: "Polygon unsettled"

API for private data

We need to prevent non-public data from being accessed via API. Should we block these requests completely or should we make a key to provide restricted access to all files for a given non-public release (e.g. /COD/Population/v1.0/*)?

Sync worldpop drive

We need to sync the following folder on the WorldPop drive to the gridFree server:
Z:\Projects\WP517763_GRID3_DataRelease\DataFinal\*

This will ensure that the data catalogue is kept up to date for API download requests and that the correct version number is assigned to SQL databases being queried by API requests for specific locations.

Currently API requests to the following endpoints are returning results from NGA v1.1 when I request results from NGA v1.2:
https://api.worldpop.org/v1/grid3/stats
http://10.19.100.66/v1/grid3/stats
http://10.19.100.66/v1/grid3/popag

FYI - There is no SQL database for v1.2 yet. I am working on that now.

OperationalError: GIN v1.0

I am getting intermittent errors when clicking the "Submit" button for GIN v1.0:

simpleWarning: No User Description for this type of error: OperationalError

The error only occurs for some locations, and it sometimes returns the population estimate, but other times it doesn't.

WOPR Title Page

The title at wopr.worldpop.org should be "WorldPop Open Population Repository" rather than "WorldPop Open Data Repository"

WOPR not WODR :)

Tiles not visible

The population tiles are not showing for any countries in woprVision. I have tested from Brave Browser and Firefox on Ubuntu (20.04) workstation and from my phone (Brave browser).

API response: unsettled area

When a polygon is queried that contains no settled area, the API currently returns:
status=finished, error=true, error_message=No User Description for this type of error: ValueError.

It needs to return:
status=finished, error=true, error_message=No settlements mapped within this area.

It may also be beneficial to add an attribute "error_code" so that the front end doesn't need to parse the actual error_message to know how to process different results.
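A minimal sketch of how a client could branch on such an error_code instead of parsing error_message. The code values "unsettled" and "missing_params" are hypothetical; this issue does not define them.

```r
# Sketch of client-side handling keyed on a machine-readable error_code.
# The code values ("unsettled", "missing_params") are hypothetical.
handle_result <- function(result) {
  if (!isTRUE(result$error)) return("ok")
  code <- if (is.null(result$error_code)) "unknown" else result$error_code
  switch(code,
         unsettled      = "No settlements mapped within this area.",
         missing_params = "Required query parameters are missing.",
         # Fall back to the raw message for codes the client doesn't know
         paste("Unexpected error:", result$error_message))
}

handle_result(list(status = "finished", error = TRUE,
                   error_code = "unsettled", error_message = "ValueError"))
```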

API response: parameter help

An empty query to:
https://api.worldpop.org/v1/wopr/pointtotal

Returns this result:
{
"error": true,
"message": "Required parameters [ iso3], [ ver], [ lat], [ lon] are missing or empty",
"example_api_url": "api.worldpop.org/v1/services/sample?dataset={dataset_name}&year={year}&lat={lat}&lon={lon}"
}

The "example_api_url" is formatted for the WorldPop Global endpoint, not the WOPR endpoint.
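For comparison, a WOPR-style example URL can be assembled from the four parameters named in the error message. build_query_url() is a hypothetical helper, and the coordinate values are borrowed from the point request shown further down this page; whether the endpoint also expects other parameters (such as a key) is not established here.

```r
# Hypothetical helper: build a GET URL from named query parameters,
# percent-encoding each value
build_query_url <- function(endpoint, params) {
  values <- vapply(params, function(v)
    utils::URLencode(as.character(v), reserved = TRUE), character(1))
  paste0(endpoint, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

url <- build_query_url("https://api.worldpop.org/v1/wopr/pointtotal",
                       list(iso3 = "NGA", ver = "1.2",
                            lat = 6.22337, lon = 4.771174))
url
# "https://api.worldpop.org/v1/wopr/pointtotal?iso3=NGA&ver=1.2&lat=6.22337&lon=4.771174"
```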

Dependency geojsonlint retired

Hi,
I'm trying to install wopr, but this fails because it takes a dependency on geojsonlint, which has been retired from CRAN.
Would it be possible to migrate the particular functionality to a different package?
Thank you.

problem with getPop

Hi all.

I'm having a problem with the wopr::getPop() function. It throws an error about the geometry class, even though my input has the required class. Please see the reprex below.

ps. Congrats on such a fantastic project and set of packages!

reprex

library(sf)
library(geobr)
library(wopr)

# download area of interest
my_area <- geobr::read_state(code_state = 'DF')

class( st_geometry(my_area) )
> [1] "sfc_MULTIPOLYGON" "sfc" 

# get population estimate
pop <- wopr::getPop(feature = my_area,
                    country = 'BRA')

> Error in endpoint(features = feature, agesex = length(agesex_select) <  : 
                      Input feature geometries must be of class "sfc_POLYGON", "sfc_MULTIPOLYGON", "sfc_POINT", or "sfc_MULTIPOINT"

Additional layers: population displacement, count of households, etc.

The dev API now includes the functionality to specify the column from the SQLite database ("Nhat" table) that you would like to query ("Pop" is default). We can store estimated parameters other than total population in these additional columns (vector of posterior predictions or single numbers). We have immediate applications in SSD to add counts of people displaced from/to each pixel, and also for GHA to add counts of housing units.

COD database debugging

@bondarenkom When I try to use wopr::getPop() to query the COD database using dev_server/v1/wopr/polytotal , I get this error:

$status
[1] "finished"

$error
[1] TRUE

$error_message
[1] "Internal File Error."

$taskid
[1] "1fef3886-3817-550c-991d-11587a62558c"

$startTime
[1] "2019-12-05 21:52:34.275152"

$endTime
[1] "2019-12-05 21:52:34.523026"

$executionTime
[1] 0

Any ideas? Can you tell me what "Internal File Error" means so maybe it will help me figure out the issue on my end?

I am using the same geojson that works with this link:

http://10.19.100.66/v1/wopr/polytotal?iso3=COD&ver=1.0&&geojson={%22type%22:%22FeatureCollection%22,%22features%22:[{%22type%22:%22Feature%22,%22properties%22:{},%22geometry%22:{%22type%22:%22Polygon%22,%22coordinates%22:[[[16.94091796875,-4.127285323245357],[17.5341796875,-4.127285323245357],[17.5341796875,-3.5572827265412794],[16.94091796875,-3.5572827265412794],[16.94091796875,-4.127285323245357]]]}}]}
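The geojson parameter in the link above is a FeatureCollection holding one rectangular polygon. The sketch below rebuilds that structure as a string and percent-encodes it; bbox_geojson() is a hypothetical helper, and a JSON library such as jsonlite would be more robust than hand-built strings.

```r
# Hypothetical helper: GeoJSON FeatureCollection with one bounding-box polygon
bbox_geojson <- function(xmin, ymin, xmax, ymax) {
  # Closed ring: the first and last coordinates are identical
  ring <- rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                c(xmin, ymax), c(xmin, ymin))
  coords <- paste0("[", ring[, 1], ",", ring[, 2], "]", collapse = ",")
  paste0('{"type":"FeatureCollection","features":[{"type":"Feature",',
         '"properties":{},"geometry":{"type":"Polygon","coordinates":[[',
         coords, ']]}}]}')
}

gj <- bbox_geojson(16.94091796875, -4.127285323245357,
                   17.5341796875, -3.5572827265412794)
# Percent-encode for use as the geojson query parameter
encoded <- utils::URLencode(gj, reserved = TRUE)
```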

point-based api request

I get an error when I submit a point request to dev_server/v1/grid3/sample .

Here is the request:
$iso3
[1] "NGA"
$ver
[1] 1.2
$lat
[1] 6.22337
$lon
[1] 4.771174
$key
[1] "..."

Here is the error when I try to retrieve the result from dev_server/v1/tasks:
$status
[1] "finished"
$error
[1] TRUE
$error_message
[1] "No User Description for this type of error: ValueError"
$taskid
[1] "d59007a0-e6e6-525d-a120-a6a905c115a3"
$startTime
[1] "2019-11-19 19:00:20.811556"
$endTime
[1] "2019-11-19 19:00:21.044907"
$executionTime
[1] 1

thinned results

@bondarenkom The NGA v1.2 requests are quite a bit slower because v1.2 has 10,000 posterior samples compared to NGA v1.1 which has 1,000 samples. A user may want to have a thinned sample of the posteriors to speed up the calculation. (the cell id switch may help this issue too)

We might want to consider adding a "thin" argument that can be 0 to 1. If thin=1, then all 10,000 posterior samples are returned. If thin=0.5, then only 5,000 posterior samples are returned. Also, it would need to be thinned in a particular way. For example, if thin=0.5, we would want to keep every other sample (e.g. rather than the first 5,000 samples).

How hard would this be?
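The proposed behaviour (keep every other sample when thin=0.5, rather than the first half) amounts to taking posterior samples at a fixed stride of 1/thin. A minimal sketch in R, where thin_samples() is a hypothetical name and not an existing wopr or API function:

```r
# Hypothetical sketch of the proposed `thin` argument: keep evenly spaced
# posterior samples rather than the first ones
thin_samples <- function(samples, thin = 1) {
  stopifnot(thin > 0, thin <= 1)
  # Step through the samples at a stride of 1/thin (2 when thin = 0.5)
  idx <- floor(seq(1, length(samples), by = 1 / thin))
  samples[idx]
}

posterior <- seq_len(10000)            # stand-in for 10,000 posterior draws
length(thin_samples(posterior, 1))     # 10000
length(thin_samples(posterior, 0.5))   # 5000, i.e. samples 1, 3, 5, ...
```

A stride keeps the retained draws spread across the whole chain; when 1/thin is not a whole number, floor() gives a near-even spacing.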
