GithubHelp home page GithubHelp logo

davesteps / ppmify Goto Github PK

View Code? Open in Web Editor NEW

This project forked from goldingn/ppmify

0.0 2.0 0.0 272 KB

convert your Poisson regression into a Poisson point process model

License: Other

R 100.00%

ppmify's Introduction

ppmify

convert your Poisson regression into a Poisson point process model

Build Status codecov.io cran version

package under construction - not ready to use just yet!

construction

ppmify (i.e. ppm-ify, pronounced p-p-m-if-eye? never mind...) is a micropackage to help set up Poisson point process models (PPMs) for point data. PPMs can be fitted using standard software for Poisson regression, after a little bit of fiddling with the data (adding integration points as pseudo-observations and calculating integration weights). This process isn't too difficult to do manually, but can be a bit boring, so why bother?

The function ppmify() takes the data needed for fitting a PPM, does this fiddling for you, and then returns a dataframe-like object that you can use in your favourite Poisson modelling software. That could be GLM, a generalised boosted model (AKA boosted regression tree), elasticnet regression, Gaussian process regression or whatever. At some point in the future it should be able to set up slightly more complex integration methods, PPMs with known spatial reporting biases, PPMs with attraction/repulsion between points, and possibly even multi-species PPMs - watch this space!

However, this isn't an end-to end solution. Once ppmify() has handed over the processed data, it's up to the user to fit the Poisson model in the right way. We'll provide some guidance here, but would advise users to familiarise themselves with PPMs and this modelling approach first.

Installation

You can install ppmify directly from GitHub using the devtools package:

devtools::install_github('goldingn/ppmify')
library(ppmify)

(If you're a windows user, you may need to install RTools first to use this devtools function)

ppmifying things

Here's an example of a Boosted regression tree (generalised boosted model) PPM species distribution model fitted to a presence-only species distribution dataset:

First we load an example dataset contained in the dismo R package (install this first if it's not already installed):

bradypus <- read.csv(paste0(system.file(package="dismo"),
                            "/ex/bradypus.csv"))[, -1]
covariates <- stack(list.files(paste0(system.file(package="dismo"), '/ex'),
                               pattern='grd', full.names=TRUE))

plot(covariates[[1]])
points(bradypus, pch = 16, cex = 0.4)

Next, we run ppmify on the point data with quadrature on a regular grid, specifying the area of interest (one of the covariates), the stack of covariates to use, and the density of integration points we want (in points per square km):

ppm <- ppmify(bradypus,
              area = covariates[[1]],
              covariates = covariates,
              density = 1 / 100,
              method = 'grid')

Next, we can load the gbm R package and fit a PPM BRT. The key points to remember are: use a poisson likelihood and use the weights column as a log-offset:

# fit a BRT PPM SDM, OMG.
library(gbm)
m <- gbm(points ~ offset(log(weights)) +
           bio1 +
           bio5 +
           bio6 +
           bio7 +
           bio8,
         data = ppm,
         n.trees = 10000,
         cv.folds = 5,
         distribution = 'poisson')

Next we pick the optimal number of trees (a gbm step to prevent overfitting) and make predictions using the raster-friendly predict function in the raster package. When predicting we need to remember to: provide a weights argument (so that raster::predict can do it's thing) and predict on the response scale to get the expected number of points per unit area:

# optimal number of trees
trees <- gbm.perf(m, plot.it = FALSE, method = 'cv')

# predict to a raster
p <- predict(covariates, m,
             type = 'response',
             const = data.frame(weights = 1),
             n.trees = trees)

# and plot
plot(p)

(actually gbm refuses to apply the offset, even though we provide it, so for this type of model the const argument isn't necessary - normally it is though)

ppmify() works in square kilometres, so the units in p are the expected number of points per square kilometre. We can instead calculate the expected number of points per cell by multipling by the area of each cell (these are lat-longs, so the cell area increases further from the equator):

p_cell <- area(p) * p
plot(p_cell)
points(bradypus, pch = 16, cex = 0.4)

We can then calculate the expected number of points predicted by the model and compare it to the number it was fitted to

cellStats(p_cell, sum)
## [1] 118.1604
nrow(bradypus)
## [1] 116

Pretty cool.

ppmify's People

Contributors

goldingn avatar

Watchers

James Cloos avatar David Stephens avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.