enmSdmPredImport

Testing methods for measuring variable importance in species distribution models

License: GNU General Public License v3.0


The enmSdmPredImport package for the R Statistical Environment enables the creation of virtual species for assessing the inferential power of species distribution models. The package was used in Smith, A.B. and Santos, M.J. 2020. Testing the ability of species distribution models to infer variable importance. Ecography 43:1801-1813. That manuscript relies on this package plus the scripts in the GitHub repository https://github.com/adamlilith/enmSdmPredImport_scenarios; those scripts provide the best illustration of how to use this package.

Use

The package has three main functions which are called in sequence:

predImportMakeData: Creates a set of data scenarios based on simulated landscapes (environmental layers) with user-defined properties and a species with a known relationship to the environment. We refer to a particular combination of landscape and species as a "data scenario." Data scenarios can be iterated to simulate randomness in the placement of the species' presences, absences, and background sites used to train and evaluate a distribution model (a "data iteration"). Output is a series of files for a given data scenario, one per data iteration, used by the subsequent functions.

predImportTrainModels: Trains species distribution model algorithms on each data scenario. Supported algorithms include BIOCLIM, BRTs, GAMs, GLMs, MaxEnt (and MaxNet), and RFs, plus an "omniscient" model identical to the one used to simulate the species' original probability of presence. Output is a set of files, one per combination of algorithm, data iteration, and type of analysis ("multivariate" models with all variables, "reduced" models with each variable dropped in turn, and "univariate" models with one variable at a time). Note that Smith and Santos used only the "multivariate" models.

predImportEval: Assesses the predictive and inferential accuracy of each combination of data scenario iteration, model algorithm, and model type (multivariate, reduced, or univariate). Output is a series of files, one per data scenario, each containing a data frame with evaluation statistics.

Additionally, the loadEvals function is a utility that loads and compiles all evaluation files for a given data scenario into a single data frame for further analysis.

All other functions in the package are called by the three main functions, but can also be used on their own. Of particular interest may be the genesis function, which creates square and circular landscapes with different patterns of environmental variables. See the help pages for these and the other functions for details.
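The overall workflow can be sketched as follows. This is a skeleton only: the argument lists are elided (`...`) because they depend on the landscape, species, and algorithms you choose — consult each function's help page for the actual parameters.

```r
library(enmSdmPredImport)

# 1. Simulate the landscape and species, then write one file
#    per data iteration for this data scenario
predImportMakeData(...)       # see ?predImportMakeData

# 2. Train the omniscient model and the SDM algorithms
#    (BIOCLIM, BRT, GAM, GLM, MaxEnt/MaxNet, RF) on each iteration
predImportTrainModels(...)    # see ?predImportTrainModels

# 3. Evaluate predictive and inferential accuracy of each model
predImportEval(...)           # see ?predImportEval

# 4. Compile the evaluation files into a single data frame
evals <- loadEvals(...)       # see ?loadEvals
```

The scripts in the enmSdmPredImport_scenarios repository show these calls with fully worked argument lists.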

Installation

To install this package and its dependencies, you will need the CRAN packages sp, raster, gbm, dismo, and rJava. You will also need to install dependencies from GitHub:

# install.packages('remotes') # only if you don't have "remotes" installed already!
remotes::install_github('adamlilith/omnibus', dependencies=TRUE)
remotes::install_github('adamlilith/statisfactory', dependencies=TRUE)
remotes::install_github('adamlilith/legendary', dependencies=TRUE)
remotes::install_github('adamlilith/enmSdm', dependencies=TRUE)
remotes::install_github('adamlilith/enmSdmPredImport', dependencies=TRUE)

Note that if you have issues installing any of these packages (e.g., because you are using an earlier/later version of R), you can download the ZIP file of the package from the zipTarFiles directory in each repository and then install it manually in R.
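A manual install from a downloaded file uses base R's install.packages with repos = NULL (the path below is a placeholder for wherever you saved the file):

```r
# install from a local ZIP (Windows binary) or .tar.gz (source) file
# instead of a repository
install.packages('C:/downloads/enmSdmPredImport.zip', repos = NULL)
```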

Use of the MaxEnt model algorithm requires copying the maxent.jar file into the "java" directory of your dismo installation. You can obtain a copy of this file from https://biodiversityinformatics.amnh.org/open_source/maxent/.

Citation

This package was first used in Smith, A.B. and Santos, M.J. 2020. Testing the ability of species distribution models to infer variable importance. Ecography 43:1801-1813.

Abstract: Models of species’ distributions and niches are frequently used to infer the importance of range‐ and niche‐defining variables. However, the degree to which these models can reliably identify important variables and quantify their influence remains unknown. Here we use a series of simulations to explore how well models can 1) discriminate between variables with different influence and 2) calibrate the magnitude of influence relative to an ‘omniscient’ model. To quantify variable importance, we trained generalized additive models (GAMs), Maxent and boosted regression trees (BRTs) on simulated data and tested their sensitivity to permutations in each predictor. Importance was inferred by calculating the correlation between permuted and unpermuted predictions, and by comparing predictive accuracy of permuted and unpermuted predictions using AUC and the continuous Boyce index. In scenarios with one influential and one uninfluential variable, models failed to discriminate reliably between variables when training occurrences were < 8–64, prevalence was > 0.5, spatial extent was small, environmental data had coarse resolution and spatial autocorrelation was low, or when pairwise correlation between environmental variables was |r| > 0.7. When two variables influenced the distribution equally, importance was underestimated when species had narrow or intermediate niche breadth. Interactions between variables in how they shaped the niche did not affect inferences about their importance. When variables acted unequally, the effect of the stronger variable was overestimated. GAMs and Maxent discriminated between variables more reliably than BRTs, but no algorithm was consistently well‐calibrated vis‐à‐vis the omniscient model. Algorithm‐specific measures of importance like Maxent's change‐in‐gain metric were less robust than the permutation test. Overall, high predictive accuracy did not connote robust inferential capacity. 
As a result, requirements for reliably measuring variable importance are likely more stringent than for creating models with high predictive accuracy.
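The permutation test described in the abstract can be illustrated with a toy example in base R (this is a conceptual sketch, not the package's implementation): a predictor's importance is gauged by how much model predictions change when that predictor's values are shuffled.

```r
set.seed(1)

# toy data: x1 drives presence, x2 is pure noise
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
pres <- rbinom(n, 1, plogis(2 * x1))
dat <- data.frame(pres, x1, x2)

mod <- glm(pres ~ x1 + x2, family = binomial, data = dat)
pred <- predict(mod, type = 'response')

# importance = 1 - correlation between unpermuted and permuted predictions
permImport <- function(var) {
    shuffled <- dat
    shuffled[[var]] <- sample(shuffled[[var]])
    predPerm <- predict(mod, newdata = shuffled, type = 'response')
    1 - cor(pred, predPerm)
}

permImport('x1')  # large: predictions degrade when x1 is shuffled
permImport('x2')  # near zero: x2 carries almost no information
```

Higher values indicate that shuffling the variable disrupts the model's predictions more, i.e., the variable is more important.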

Last updated 2020-12-02 Adam B. Smith

