cjcarlson / embarcadero
Species distribution models with Bayesian additive regression trees
Running partial() on a model of class rbart fails, with error message:
Error in pdbart(model, xind = x.vars, levs = lev, pl = FALSE) :
x.train must be a matrix, data.frame, formula, fitted bart model, or dbartsSampler
Can confirm that pdbart() doesn't recognize an rbart object as a fitted BART model, and the documentation for rbart_vi doesn't mention partials, so this is on some level a dbarts thing.
known issues
haha
avian cbot showed that bart.auc doesn't work even though var selection one does
My initial attempt to install failed because the package velox was removed from CRAN and is only available from the archive.
Fitted an RI model with keepTrees = TRUE, saved it, then loaded it in a fresh R session and used it for prediction: the output is value = 0.5 for every grid cell. I suspect, based on our chat, that the trees aren't being retained in the saved R object.
Sometimes, 'dbarts' entirely drops a variable from the model without even including it in the variable splits with "0" splits - both can happen in the same model, if dimensionality is high. This causes issues with varimp() and then some downstream issues.
I started writing a solution for varimp() but it doesn't work yet, and a lot of things are broken downstream because of it.
Thanks for the great package and vignette. I'm a bit puzzled that the input data for bart (including species occurrences and predictor variables) are a data frame (occ.df), but then for predict you need a RasterStack of the predictor variables. Any chance of allowing 'climate' to be a data frame just like occ.df?
Hiya,
just partook in a workshop organised by the IBS and am pretty stoked on your package here. Wanting to run an example as simple as possible during the workshop, I opted for a model with only two environmental covariates to my binary outcome of species-presence/absence.
Unfortunately, that caused a few issues in particular with the variable.step function which seems to accommodate exclusively models with three or more covariates. I propose a check to be implemented in this function that alerts users to this fact.
Cheers,
Erik
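A check along the lines proposed above could look like the following sketch. `check_min_covariates` is a hypothetical helper, not an embarcadero function, and the minimum of three covariates is taken from the report above rather than from the package source.

```r
# Hypothetical guard: stop early when too few covariates are supplied.
check_min_covariates <- function(x.data, min.vars = 3) {
  n.vars <- ncol(as.data.frame(x.data))
  if (n.vars < min.vars) {
    stop(sprintf("variable.step needs at least %d covariates, but %d supplied.",
                 min.vars, n.vars),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Two covariates trigger the error; three pass silently.
two.vars <- data.frame(a = 1:3, b = 4:6)
three.vars <- data.frame(a = 1:3, b = 4:6, c = 7:9)
result.two <- try(check_min_covariates(two.vars), silent = TRUE)
result.three <- check_min_covariates(three.vars)
```

Calling such a guard at the top of the stepwise routine would give users a readable message instead of a failure further downstream.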
Hi,
Just encountered an error when running the varimp.diag function on my data, so I decided to test it with the vignette and was able to reproduce the error.
When you run the line
varimp.diag(occ.df[,xnames], occ.df[,'Observed'], iter=50, quiet=TRUE)
it throws up the error
Error in select(., -dropnames) : unused argument (-dropnames)
Trying to figure out what was going on, I ran lines 31-33 from the varimp.diag code
quietly(model.0 <- bart.flex(x.data = x.data, y.data = y.data,
ri.data = ri.data,
n.trees = 200))
It turns out that
dput(unlist(attr(model.0$fit$data@x,"drop")) )
returns
c(x1 = FALSE, x2 = FALSE, x3 = FALSE, x4 = FALSE, x5 = FALSE, x6 = FALSE, x7 = FALSE, x8 = FALSE)
which is why the next line (#35)
dropnames <- colnames(x.data)[!(colnames(x.data) %in% names(which(unlist(attr(model.0$fit$data@x,"drop"))==FALSE)))]
doesn't assign anything to dropnames, and then the function can't find the object.
Can't figure out anything beyond that, but then I tried variable.step, which uses similar code, and it works! I hope this helps to find the issue!
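The behaviour of the dropnames line quoted above can be reproduced with base R alone. This is only a sketch: the drop flags are copied from the dput() output in this report, while `x.data` is made-up data with the same column names, and the base-R subsetting at the end is one possible way to drop columns, not the package's own code.

```r
# Made-up predictors with the same names as in the report (x1 ... x8).
x.data <- as.data.frame(matrix(seq_len(16), nrow = 2,
                               dimnames = list(NULL, paste0("x", 1:8))))

# Drop flags as returned by the dput() call above: nothing was dropped.
drop_flags <- setNames(rep(FALSE, 8), paste0("x", 1:8))

# Same logic as the quoted line: names that are NOT among the kept variables.
kept <- names(which(drop_flags == FALSE))
dropnames <- colnames(x.data)[!(colnames(x.data) %in% kept)]

# dropnames is an empty character vector here, not a missing object.
length(dropnames)

# Base-R subsetting copes with an empty dropnames and needs no select().
x.kept <- x.data[, setdiff(colnames(x.data), dropnames), drop = FALSE]
```

With every flag FALSE, the assignment does run and leaves a zero-length vector, so any column-removal step has to tolerate an empty set of names.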
- Session info ------------------------------------------------------------------------------------------------
setting value
version R version 4.0.2 (2020-06-22)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate Spanish_Spain.1252
ctype Spanish_Spain.1252
tz Europe/Paris
date 2020-08-21
Hi Colin!
Just dropping a quick note about an error when running bart.step in which the "select" function is not properly defined as coming from either the raster or dplyr libraries:
library(conflicted)
sdm <- bart.step(y.data=dat[,"occurrence"], x.data=dat[,xnames], full=TRUE, quiet=TRUE)
Error: [conflicted] select found in 2 packages.
Either pick the one you want with ::
conflict_prefer()
Responding to closed pull request from @DidDrog11 - would be nice to allow y.data in non-vector format. Maybe worth a broader think about how other classes of data are handled e.g., tibbles.
bart.step() throws an error:
unable to find an inherited method for function ‘select’ for signature ‘"data.frame"’
I think this is due to the call to select() in the function varimp.diag() (line 29 in my copy-paste of the code from the terminal), which doesn't specify the dplyr namespace.
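Qualifying the call with its namespace removes the ambiguity between raster and dplyr. The sketch below uses made-up data and a made-up `dropnames` value; it is not the code from varimp.diag(), and the base-R fallback is only there so the example runs where dplyr is not installed.

```r
# Made-up predictors and a made-up set of names to drop.
x.data <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9)
dropnames <- "x2"

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Explicit namespace: works no matter which package masks select().
  x.kept <- dplyr::select(x.data, -dplyr::all_of(dropnames))
} else {
  # Fallback without dplyr.
  x.kept <- x.data[, setdiff(colnames(x.data), dropnames), drop = FALSE]
}

colnames(x.kept)
```

Wrapping the names in all_of() also makes it explicit that `dropnames` is a character vector held in a variable rather than a column name.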
I am having some issues with the predict function that I'm not sure are a bug or me doing something wrong.
I ran a model on a cluster, saved the result of bart.step() as an RDS file, and then opened it locally. Everything seems good; this is how the loaded object looks:
> class(calbor.sdm)
[1] "bart"
> summary(calbor.sdm)
Call: bart all.cov[, step.model] all.cov[, "pres"] TRUE
Predictor list:
bati chla_var logchla_lag3 sal sst sst_grad
Area under the receiver-operator curve
AUC = 0.8937647
Recommended threshold (maximizes true skill statistic)
Cutoff = 0.519118
TSS = 0.6287537
Resulting type I error rate: 0.16072
Resulting type II error rate: 0.2105263
which I don't hate, and the plot looks like this
However, when I try to predict using
CB_prediction <- embarcadero::predict2.bart(object = calbor.sdm2, # make sure I'm getting embarcadero's predict
                                            x.layers = predictors_original,
                                            quantiles = c(0.025, 0.975),
                                            # splitby = 20, # doesn't work with or without this
                                            quiet = FALSE)
I get a stack of rasters with all cells == 0.5, see:
> CB_prediction
class : RasterStack
dimensions : 266, 242, 64372, 3 (nrow, ncol, ncell, nlayers)
resolution : 0.4986111, 0.4986111 (x, y)
extent : -64.84722, 55.81667, -52.825, 79.80556 (xmin, xmax, ymin, ymax)
crs : NA
names : layer.1, layer.2, layer.3
min values : 0.5, 0.5, 0.5
max values : 0.5, 0.5, 0.5
I've tried tweaking the options of the predict function, but nothing seemed to work.
The cluster (where I ran the model) works with R version 3.6, while I'm running in my laptop Windows x64 (where I'm predicting and plotting) R version 4.1.1.
Thanks for the help!
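One possible explanation, assuming the model was fit through dbarts: its documentation notes that the trees are only written out with the object if the sampler state has been "touched" before saving. A minimal sketch of that step follows. `save_bart_with_trees` is a hypothetical helper, not part of embarcadero or dbarts, and the mock object in the usage lines only stands in for a real fitted model.

```r
# Hypothetical helper: touch the sampler state, then serialize the model.
save_bart_with_trees <- function(model, path) {
  # Accessing the state asks dbarts to store the trees as R objects,
  # so that saveRDS() can include them in the file.
  invisible(model$fit$state)
  saveRDS(model, path)
  invisible(path)
}

# Mock object for illustration only; a real call would pass the bart fit.
mock_model <- list(fit = list(state = "trees would live here"))
rds_path <- tempfile(fileext = ".rds")
save_bart_with_trees(mock_model, rds_path)
reloaded <- readRDS(rds_path)
```

If this is the cause, the touch has to happen in the session that fitted the model (here, on the cluster), before the RDS file is written.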
Question more than an issue, but I wanted to know how the algorithm handles spatial covariates. With INLA, for example, you would use SPDE to create a spatial smoother. Does BART have a way of incorporating that kind of spatial covariate to deal with spatial autocorrelation?
life is hell. need to do dark blue/dark red, on the same plot
Is there a way to install/use embarcadero without needing the velox dependency? I can't seem to install it with either method suggested in the readme getting errors saying "Rterm.exe - Entry Point Not Found". Looking at the issues in the velox github it looks like it isn't maintained and isn't updated to R v4.X
I tried a fresh install of R (v4.0.2) and had a different error but still can't install velox. The error looks similar to one you mention in your possible Mac fixes but I have no idea how to translate that into a fix for Windows.
Error: package or namespace load failed for 'velox' in .doLoadActions(where, attach):
error in load action .__A__.1 for package velox: Rcpp::loadModule(module = "BOOSTGEOM", what = TRUE, env = ns, : Unable to load module "BOOSTGEOM": cannot allocate vector of length 1739883848
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] ps_1.3.4 fansi_0.4.1 prettyunits_1.1.1 rprojroot_1.3-2 withr_2.2.0
[6] crayon_1.3.4 assertthat_0.2.1 R6_2.4.1 backports_1.1.8 cli_2.0.2
[11] curl_4.3 remotes_2.1.1 rstudioapi_0.11 callr_3.4.3 tools_4.0.2
[16] glue_1.4.1 yaml_2.2.1 compiler_4.0.2 processx_3.4.3 pkgbuild_1.0.8
tmean being wrong for plague probably - then, issue gonna be a Thing
When the predictors include categorical variables, dbarts::bart includes them but embarcadero removes them. This appears to be because embarcadero removes variables based on unlist(attr(model$fit$data@x, "drop")), where the categorical variables are actually split and renamed to reflect their categories. This leads to an error in varimp (which fails for models that include categorical predictors) and to categorical variables being automatically excluded a priori by variable.step and bart.step, with a message unfairly blaming dbarts. Here is some reproducible code:
# generate some data as in ?bart examples:
f <- function(x) {
10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
10 * x[,4] + 5 * x[,5]
}
set.seed(99)
sigma <- 1.0
n <- 100
x <- matrix(runif(n * 10), n, 10)
Ey <- f(x)
y <- rnorm(n, Ey, sigma)
# make 'y' binary:
y <- ifelse(y > mean(y), 1, 0)
# make one of the x variables categorical:
x <- data.frame(x)
x[,1] <- ifelse(x[,1] > mean(x[,1]), "high", "low")
head(x)
# fit a bart model:
set.seed(99)
bartFit <- bart(x, y, keeptrees = TRUE)
summary(bartFit) # notice 10 variables (i.e. including the categorical one) in predictor list
bartFit$fit$data
unlist(attr(bartFit$fit$data@x, "drop")) # notice X1 (categorical variable) named here as X11 and X12 (one for each category)
# X11 X12 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 52 48 0 0 0 0 0 0 0 0 0
# attempt to compute variable importance with 'embarcadero':
varimp(bartFit) # Error in data.frame(names, varimps) : arguments imply differing number of rows: 9, 10
# but the variable importance info is there, including for the categorical variable (though it's also renamed here):
rel_imp <- bartFit$varcount / rowSums(bartFit$varcount)
colnames(rel_imp)
# [1] "X1.low" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10"
# attempt to simplify the model with 'embarcadero':
variable.step(x, y) # X1 (categorical variable) said to be dropped by 'dbarts', but it wasn't really -- it was dropped by 'embarcadero' when expecting unlist(attr(bartFit$fit$data@x, "drop")) to have the original variables' names
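One way to reconcile the expanded names with the original predictors is a lookup table built from the training data. The sketch below is not embarcadero code: `map_to_original` is a hypothetical helper, and the two naming patterns it covers (column name plus level index, as in X11 and X12, and column name plus "." plus level, as in X1.low) are assumptions read off the output shown above.

```r
# Hypothetical helper: map expanded dummy-variable names back to the
# original predictor names.
map_to_original <- function(expanded, x) {
  lookup <- character(0)
  for (col in colnames(x)) {
    if (is.numeric(x[[col]])) {
      # Numeric predictors keep their own name.
      lookup[col] <- col
    } else {
      # Categorical predictors: register both assumed naming patterns.
      lev <- levels(factor(x[[col]]))
      lookup[paste0(col, seq_along(lev))] <- col
      lookup[paste0(col, ".", lev)] <- col
    }
  }
  unname(lookup[expanded])
}

# Small made-up data frame with one categorical predictor.
x.small <- data.frame(X1 = c("high", "low", "low"),
                      X2 = c(0.1, 0.2, 0.3),
                      X10 = c(1, 2, 3))
mapped <- map_to_original(c("X11", "X12", "X2", "X10", "X1.low"), x.small)
```

Names that match neither pattern come back as NA, which makes unexpected renaming visible instead of silently dropping a predictor.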
Super quick question! (hope that's ok)
I'm going to have to run several models and their corresponding predictions in a loop. I'd like to plot a binary map for each, but I haven't found a way to obtain a cutoff value from the object returned by the bart function. It appears in the console when running summary(model), but I haven't found it anywhere inside the model object. Any idea where to find it, or if it's even possible? I don't want to have to do each one by hand, printing the summary info on screen and typing the cutoff value myself, and I'm sure it must be tucked away inside the huge list somewhere, but I just haven't found it.
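If the cutoff is not stored in the object, it can be recomputed from fitted probabilities and observations. The sketch below is a plain base-R search for the threshold that maximizes the true skill statistic (sensitivity + specificity - 1); `tss_cutoff` is a hypothetical helper and may not match how summary() arrives at its value. For a dbarts fit with a binary response, the fitted probabilities could presumably be taken as colMeans(pnorm(model$yhat.train)), which is an assumption here.

```r
# Hypothetical helper: cutoff that maximizes TSS = sensitivity + specificity - 1.
tss_cutoff <- function(prob, obs) {
  cuts <- sort(unique(prob))
  tss <- vapply(cuts, function(cut) {
    pred <- prob >= cut
    sens <- sum(pred & obs == 1) / sum(obs == 1)
    spec <- sum(!pred & obs == 0) / sum(obs == 0)
    sens + spec - 1
  }, numeric(1))
  best <- which.max(tss)  # first maximum if there are ties
  list(cutoff = cuts[best], tss = tss[best])
}

# Made-up fitted probabilities and observations.
obs  <- c(0, 0, 0, 1, 1, 1, 1)
prob <- c(0.1, 0.2, 0.6, 0.5, 0.7, 0.8, 0.9)
res <- tss_cutoff(prob, obs)

# Binary prediction using the recovered cutoff.
binary <- as.integer(prob >= res$cutoff)
```

Inside a loop over models, the returned cutoff can be applied directly to each prediction raster to produce the binary map.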
I've just downloaded R v4.0.5 and attempted to install embarcadero, but I get a warning message saying that the package isn't available for this version of R. Is this a known issue?
Cheers
Harry
I am starting with the embarcadero package as I would like to compare bayesian-SDMs with standard SDMs. However, I really need to include a categorical raster map (geology in my case) to model a plant species. It is very likely confined to special substrates and this would be important to consider.
Yet, the bart function does not accept a factorial parameter. Any workarounds known?
Create option in partial() to limit xlim to 100% or 90% range of training data
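The limits themselves are easy to derive from the training data. The sketch below is an assumption about how such an option might compute them, not existing partial() behaviour; `training_xlim` and its `coverage` argument are hypothetical names.

```r
# Hypothetical helper: x-axis limits covering the central `coverage`
# fraction of a training covariate (1 = full range, 0.9 = central 90%).
training_xlim <- function(x, coverage = 1) {
  tail.prob <- (1 - coverage) / 2
  unname(quantile(x, probs = c(tail.prob, 1 - tail.prob), na.rm = TRUE))
}

# Made-up covariate values.
x.train <- 0:100
xlim.full <- training_xlim(x.train, coverage = 1)
xlim.90 <- training_xlim(x.train, coverage = 0.9)
```

The resulting pair can be passed on as xlim when drawing each partial dependence panel.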
The plot.mcmc() function works fine for the example provided, but if the raster input is a masked raster layer (e.g. South America) rather than a simulated square raster layer, the output of plot.mcmc() is meaningless.
It probably has something to do with the matrix conversion at the beginning of the function.
Here is a reproducible example of the problem:
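library(dismo)
library(dplyr) # for %>%, select() and bind_rows()
library(dbarts) # for bart()
library(embarcadero) # for plot.mcmc()
file <- paste0(system.file(package="dismo"), "/ex/bradypus.csv")
file
bradypus <- read.table(file, header=TRUE, sep=",") %>% dplyr::select(-species)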
bradypus$presence <- 1
files <- list.files(path=paste(system.file(package="dismo"), '/ex',
sep=''), pattern='grd', full.names=TRUE )
mask <- raster(files[1])
set.seed(1963)
bg <- randomPoints(mask, 500)
bg <- as.data.frame(bg)
colnames(bg) <- c("lon", "lat")
bg$presence <- 0
abspres <- bind_rows(bradypus, bg)
path <- file.path(system.file(package="dismo"), 'ex')
files <- list.files(path, pattern='grd$', full.names=TRUE )
files
predictors <- stack(files)
predictors
xnames <- names(predictors)
plot(predictors[[1]])
occ <- SpatialPoints(abspres[,c('lon','lat')])
occ.df <- cbind(abspres,
raster::extract(predictors, occ))
occ.df <- occ.df[,-c(1:3)]
### The actual example
sdm <- bart(y.train=occ.df[,'presence'],
x.train=occ.df[,xnames],
keeptrees = TRUE)
plot.mcmc(sdm, predictors, iter=50)
Can't find k even if you pass it k? Needs to be adjusted. Could be a cores issue? Unclear