boyanangelov / sdmbench
Benchmarking Species Distribution Models
License: MIT License
from JOSS review
I noticed that in get_benchmarking_data() you use rgbif::occ_search. You might consider rgbif::occ_data instead: it has the same user interface (parameters), but it only spends time getting and parsing the occurrence data and drops the other data that you don't want.
Also note that occ_search/occ_data use the GBIF occurrence API, which is limited to 200K results for any one query. So if you know or think you might need more than that, you should use the download API via rgbif::occ_download and friends.
Also note that we now have an interface to the GBIF maps API via rgbif::map_fetch, which is a super fast way to get a raster map of occurrences. You don't get the actual occurrence data, just a quick summary of it; see https://www.gbif.org/developer/maps. I don't know whether this is appropriate for your use case(s), but it's worth knowing about.
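A minimal sketch of the suggested swap, assuming the call inside get_benchmarking_data() passes a species name and a record limit. The wrapper name fetch_occurrences and the hasCoordinate filter are illustrative assumptions, not taken from sdmbench:

```r
# Hypothetical helper (not part of sdmbench): fetch occurrence records
# with rgbif::occ_data(), which accepts the same query parameters as
# rgbif::occ_search() but returns only the metadata and occurrence data.
fetch_occurrences <- function(species, limit = 1200) {
  res <- rgbif::occ_data(
    scientificName = species,
    limit = limit,
    hasCoordinate = TRUE  # assumption: only georeferenced records are wanted
  )
  res$data  # the occurrence records
}

# Usage (needs the rgbif package and network access):
# occ <- fetch_occurrences("Loxodonta africana", limit = 1200)
```

Because the parameters are the same, the change should amount to replacing the function name in the existing call.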
Something I always look for when getting started with a new package is a package-level helpfile, so I can do ?sdmbench and see some general information about the package, how to get started, and a roadmap to the available functions.
sdmbench doesn't have one of these yet, but it would be great if it did. You can create one with roxygen by documenting a NULL in a script in the R/ folder, like spocc does here.
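As a rough sketch of what that could look like: the file name R/sdmbench-package.R and the description text are placeholders, and the listed functions are simply some of those exported by the package.

```r
# Hypothetical file: R/sdmbench-package.R
# roxygen2 turns the comment block attached to NULL into the help page
# that is shown for ?sdmbench.

#' sdmbench: Benchmarking Species Distribution Models
#'
#' Placeholder text: a short overview of what the package does and how
#' to get started.
#'
#' @section Main functions:
#' \itemize{
#'   \item \code{\link{get_benchmarking_data}}
#'   \item \code{\link{partition_data}}
#'   \item \code{\link{benchmark_sdm}}
#' }
#'
#' @docType package
#' @name sdmbench
NULL
```

After re-running devtools::document(), ?sdmbench should open this page.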
Hi Boyan, I'm Nick, one of the reviewers of sdmbench for JOSS. I'm starting my code review now, so will post issues as they come up, then I'll summarise them in the review thread later.
I just went to install the package following the readme instructions:
devtools::install_github("boyanangelov/sdmbench")
and it failed to install with the error:
ERROR: this R is version 3.4.3, package 'sdmbench' requires R >= 3.4.4
which is because there's a dependency on a very recent version of R in DESCRIPTION.
Is that dependency on a super-recent R version really necessary?
If not, it might be worth removing this entry altogether.
If it is necessary, it would be worth mentioning that in the installation information.
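If the requirement does turn out to be necessary, the installation instructions could include a guard along these lines. This is only a sketch: the 3.4.4 threshold is copied from the error message above, and the variable names are illustrative.

```r
# Check the running R version before attempting the install.
required <- "3.4.4"  # taken from the error message; adjust as needed
meets_requirement <- getRversion() >= required

if (!meets_requirement) {
  message(
    "sdmbench requires R >= ", required,
    "; this is R ", as.character(getRversion())
  )
}

# When the check passes, install as described in the README:
# devtools::install_github("boyanangelov/sdmbench")
```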
R CMD check flagged some exported functions that are not documented:
‘customPredictFunGBM’ ‘customPredictFunKSVM’ ‘customPredictFunLogreg’ ‘customPredictFunMultinom’ ‘customPredictFunNB’ ‘customPredictFunXGB’
Perhaps these were exported by mistake?
After #4 was resolved I continued trying the examples in the README, and ran into a problem:
benchmarking_data <- get_benchmarking_data("Loxodonta africana", limit = 1200, climate_resolution = 10)
bmr <- benchmark_sdm(benchmarking_data$df_data,
                     learners = learners,
                     dataset_type = "block",
                     sample = FALSE)
#> Error in benchmark_sdm(benchmarking_data$df_data, learners = learners, :
#> Assertion on 'blocking' failed: Must have length 11199, but has length 0.
Session info -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.1 Patched (2018-08-12 r75119)
system x86_64, darwin15.6.0
ui X11
language (EN)
collate en_US.UTF-8
tz US/Pacific
date 2018-09-24
Packages ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
package * version date source
.py 0.0.16 <NA> local
abind 1.4-5 2016-07-21 cran (@1.4-5)
assertthat 0.2.0 2017-04-11 CRAN (R 3.5.0)
backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
base * 3.5.1 2018-08-13 local
BBmisc 1.11 2017-03-10 cran (@1.11)
bindr 0.1.1 2018-03-13 CRAN (R 3.5.0)
bindrcpp 0.2.2 2018-03-29 CRAN (R 3.5.0)
broom 0.5.0 2018-07-17 CRAN (R 3.5.1)
checkmate 1.8.5 2017-10-24 CRAN (R 3.5.0)
class 7.3-14 2015-08-30 CRAN (R 3.5.1)
colorspace 1.3-2 2016-12-14 CRAN (R 3.5.0)
compiler 3.5.1 2018-08-13 local
crayon 1.3.4 2017-09-16 CRAN (R 3.5.0)
crul 0.6.0 2018-07-10 cran (@0.6.0)
curl 3.2 2018-03-28 CRAN (R 3.5.0)
CVST 0.2-2 2018-05-26 cran (@0.2-2)
data.table 1.11.4 2018-05-27 CRAN (R 3.5.0)
datasets * 3.5.1 2018-08-13 local
ddalpha 1.3.4 2018-06-23 cran (@1.3.4)
DEoptimR 1.0-8 2016-11-19 CRAN (R 3.5.0)
devtools 1.13.6 2018-06-27 CRAN (R 3.5.0)
digest 0.6.17 2018-09-12 CRAN (R 3.5.1)
dimRed 0.1.0 2017-05-04 cran (@0.1.0)
dismo 1.1-4 2017-01-09 CRAN (R 3.5.0)
docopt 0.6 2018-08-03 CRAN (R 3.5.1)
dplyr 0.7.6 2018-06-29 CRAN (R 3.5.0)
DRR 0.0.3 2018-01-06 cran (@0.0.3)
fastmatch 1.1-0 2017-01-28 CRAN (R 3.5.0)
geoaxe 0.1.0 2016-02-19 CRAN (R 3.5.0)
geometry 0.3-6 2015-09-09 cran (@0.3-6)
ggplot2 3.0.0 2018-07-03 CRAN (R 3.5.0)
glue 1.3.0 2018-07-17 CRAN (R 3.5.1)
gower 0.1.2 2017-02-23 cran (@0.1.2)
graphics * 3.5.1 2018-08-13 local
grDevices * 3.5.1 2018-08-13 local
grid 3.5.1 2018-08-13 local
gtable 0.2.0 2016-02-26 CRAN (R 3.5.0)
httpcode 0.2.0 2016-11-14 cran (@0.2.0)
httr 1.3.1 2017-08-20 CRAN (R 3.5.0)
ipred 0.9-7 2018-08-14 cran (@0.9-7)
jsonlite 1.5 2017-06-01 CRAN (R 3.5.0)
kernlab 0.9-27 2018-08-10 CRAN (R 3.5.0)
lattice 0.20-35 2017-03-25 CRAN (R 3.5.1)
lava 1.6.3 2018-08-10 cran (@1.6.3)
lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0)
lubridate 1.7.4 2018-04-11 CRAN (R 3.5.0)
magic 1.5-9 2018-09-17 CRAN (R 3.5.1)
magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
MASS 7.3-50 2018-04-30 CRAN (R 3.5.1)
Matrix 1.2-14 2018-04-13 CRAN (R 3.5.1)
memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
methods * 3.5.1 2018-08-13 local
mlr 2.13 2018-08-28 CRAN (R 3.5.0)
munsell 0.5.0 2018-06-12 CRAN (R 3.5.0)
nlme 3.1-137 2018-04-07 CRAN (R 3.5.1)
nnet 7.3-12 2016-02-02 CRAN (R 3.5.1)
oai 0.2.2.9315 2018-05-31 local (ropensci/oai@NA)
parallel 3.5.1 2018-08-13 local
parallelMap 1.3 2015-06-10 cran (@1.3)
ParamHelpers 1.11 2018-06-25 cran (@1.11)
pillar 1.3.0 2018-07-14 CRAN (R 3.5.0)
pkgconfig 2.0.2 2018-08-16 CRAN (R 3.5.1)
pls 2.7-0 2018-08-21 CRAN (R 3.5.1)
plyr 1.8.4 2016-06-08 CRAN (R 3.5.0)
prodlim 2018.04.18 2018-04-18 cran (@2018.04)
purrr 0.2.5 2018-05-29 CRAN (R 3.5.0)
qlcMatrix 0.9.7 2018-04-20 CRAN (R 3.5.0)
R6 2.2.2 2017-06-17 CRAN (R 3.5.0)
raster 2.6-7 2017-11-13 CRAN (R 3.5.0)
Rcpp 0.12.18 2018-07-23 CRAN (R 3.5.1)
RcppRoll 0.3.0 2018-06-05 cran (@0.3.0)
recipes 0.1.3 2018-06-16 cran (@0.1.3)
rgbif 1.0.2.9421 2018-09-24 local (ropensci/rgbif@43cf71c)
rgdal 1.3-4 2018-08-03 CRAN (R 3.5.0)
rgeos 0.3-28 2018-06-08 CRAN (R 3.5.0)
rlang 0.2.2 2018-08-16 CRAN (R 3.5.0)
robustbase 0.93-2 2018-07-27 CRAN (R 3.5.0)
rpart 4.1-13 2018-02-23 CRAN (R 3.5.1)
rtichoke 0.2.1 <NA> local
scales 1.0.0 2018-08-09 CRAN (R 3.5.1)
scrubr 0.1.3.9321 2018-05-08 local (ropensci/scrubr@NA)
sdmbench * 0.1.2 2018-09-24 local (boyanangelov/sdmbench@cb00187)
sfsmisc 1.1-2 2018-03-05 cran (@1.1-2)
slam 0.1-43 2018-04-23 CRAN (R 3.5.0)
sp 1.3-1 2018-06-05 CRAN (R 3.5.0)
sparsesvd 0.1-4 2018-02-15 CRAN (R 3.5.0)
splines 3.5.1 2018-08-13 local
stats * 3.5.1 2018-08-13 local
stringi 1.2.4 2018-07-20 CRAN (R 3.5.0)
stringr 1.3.1 2018-05-10 CRAN (R 3.5.0)
survival 2.42-6 2018-07-13 CRAN (R 3.5.1)
tibble 1.4.2 2018-01-22 CRAN (R 3.5.0)
tidyr 0.8.1 2018-05-18 CRAN (R 3.5.0)
tidyselect 0.2.4 2018-02-26 CRAN (R 3.5.0)
timeDate 3043.102 2018-02-21 cran (@3043.10)
tools 3.5.1 2018-08-13 local
triebeard 0.3.0 2016-08-04 CRAN (R 3.5.0)
urltools 1.7.1 2018-08-03 CRAN (R 3.5.1)
utils * 3.5.1 2018-08-13 local
whisker 0.3-2 2013-04-28 CRAN (R 3.5.0)
withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
xml2 1.2.0 2018-01-24 CRAN (R 3.5.0)
devtools::spell_check() flags a few spelling mistakes in the documentation, e.g.:
WORD | FOUND IN |
---|---|
acessed | benchmark_sdm.Rd:20 |
inbalanced | benchmark_sdm.Rd:17 |
leanring | train_dl.Rd:10 |
occurence | benchmark_sdm.Rd:11, customPredictFun.Rd:12, evaluate_dl.Rd:12 |
occurences | partition_data.Rd:12 |
partitionined | partition_data.Rd:19 |
vlaue | get_benchmarking_data.Rd:18 |
I used the goodpractice package to run some automated code checks of sdmbench. Below are some super minor style/robustness issues it flagged that you could change if you wanted to.
On two lines (here and here) you used = for assignment, instead of <-, which you used everywhere else.
On quite a few lines of the shiny server file there are more than 80 characters per line, which makes it a bit difficult to read (particularly here). It would be worth reformatting that code to have shorter lines.
On this line of the shiny server, you used the code pattern 1:length(x), but seq_along(x) is (very mildly) preferable in general: when x has length 0 (e.g. an empty list), 1:length(x) returns c(1L, 0L), whereas seq_along(x) returns integer(0).
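A small self-contained illustration of that last point (the variable names are made up for the example):

```r
x <- list()  # an empty input

# 1:length(x) evaluates to 1:0, which counts down from 1 to 0.
idx_colon <- 1:length(x)
# seq_along(x) yields an empty index vector for an empty input.
idx_safe <- seq_along(x)

# Count how many times a loop body would run with each index vector.
n_colon <- 0
for (i in idx_colon) n_colon <- n_colon + 1

n_safe <- 0
for (i in idx_safe) n_safe <- n_safe + 1

print(idx_colon)  # 1 0
print(n_colon)    # 2: the loop body runs twice on an empty list
print(n_safe)     # 0: the loop body is skipped
```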
That's all it found though, and these are very minor, which is great! I run those checks on packages quite regularly, and the reports are rarely as short as that.
from JOSS review:
I couldn't install while also building the vignette
devtools::install_github("boyanangelov/sdmbench", build_vignettes=TRUE, force = TRUE)
* checking for file ‘/private/var/folders/fc/n7g_vrvn0sx_st0p8lxb3ts40000gn/T/Rtmph6H98S/devtools35453b08bf41/boyanangelov-sdmbench-f34ed13/DESCRIPTION’ ... OK
* preparing ‘sdmbench’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
sh: /usr/local/bin/virtualenv: /usr/local/opt/python/bin/python3.6: bad interpreter: No such file or directory
Quitting from lines 12-18 (sdmbench_vignette.Rmd)
Error: processing vignette 'sdmbench_vignette.Rmd' failed with diagnostics:
Error 126 occurred creating virtualenv at ~/.virtualenvs/r-tensorflow
Execution halted
Installation failed: Command failed (1)
I guess this is a python virtualenv problem, but I'm not sure how to solve it. The reason I bring it up is that the vignette at https://boyanangelov.com/materials/sdmbench_vignette.html has examples that aren't in line with the version of the package at v0.1.2, e.g.,
benchmarking_data <- get_benchmarking_data("Ornithorhynchus anatinus", limit = 1200, bioclim_resolution = 10)
#> Error in get_benchmarking_data("Ornithorhynchus anatinus", limit = 1200, :
#> unused argument (bioclim_resolution = 10)
Another example: I had to change benchmarking_data$raster_data$bioclim_data$bio1 to benchmarking_data$raster_data$climate_variables$bio1 to get the first plot example to work.
The following functions are mentioned in the package-level helpfile (note the links don't work), so will probably be used regularly, but they have only very minimal (or no) examples:
run_sdmbench()
get_benchmarking_data()
partition_data()
benchmark_sdm()
get_best_model_results()
plot_sdm_map()
It would be helpful to have more involved examples, with comments, showing the effects of the different arguments, and how you might use them in different situations. Scott's spocc package is again a good example of this: the major functions all have multiple examples.
Your examples seem to have very long lines too, so it's hard to read them. I'd recommend sticking to an 80-characters-per-line limit for these too.