funecology / fundiversity
📦 R package to compute functional diversity indices efficiently
Home Page: https://funecology.github.io/fundiversity/
License: GNU General Public License v3.0
For now, the vignettes on CRAN and those listed by vignette(package = "fundiversity") or browseVignettes(package = "fundiversity") are ordered alphabetically:
This may not be a desirable order as it would make more sense to have the introduction vignette first, then the parallelization and other vignettes.
Maybe we should rename the files to "fundiversity", "fundiversity-2", etc., as is done by future?
My preferred order would be:
Maybe we should also rename the introduction vignette and parallelization vignette to have more explicit names.
I have not tested anything yet.
Would need to:
Downside:
Using the info currently in the wiki
I'm getting a lot of questions from confused users regarding the use of fundiversity with datasets at the individual level, because we keep referring to a "site-species" matrix and a "species-trait" matrix.
Even though we've written it in the paper and in the introduction vignette, it seems that we need to specify it elsewhere with a well chosen example (maybe in another dedicated vignette?).
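A minimal sketch of what such an example could look like, assuming each row of the trait matrix is an individual rather than a species, and each column of the site matrix is an individual. The individual names, trait names, and values are made up for illustration.

```r
# Hypothetical individual-level data: 6 individuals measured for 2 traits.
# Rows of the trait matrix are individuals, not species.
ind_traits <- matrix(
  c(1.0, 2.0, 1.5, 3.0, 2.5, 4.0,
    10, 12, 11, 15, 14, 18),
  ncol = 2,
  dimnames = list(paste0("ind", 1:6), c("height", "leaf_area"))
)

# Site-by-individual matrix: 1 if the individual was sampled in the site.
site_ind <- matrix(
  c(1, 1, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("site_a", "site_b"), rownames(ind_traits))
)

# Column names of the site matrix must match row names of the trait matrix.
stopifnot(identical(colnames(site_ind), rownames(ind_traits)))

if (requireNamespace("fundiversity", quietly = TRUE)) {
  # Individuals simply take the place of "species" in both arguments.
  print(fundiversity::fd_fdis(ind_traits, site_ind))
}
```

The only change from the species-level workflow is what the rows and columns stand for; the function calls stay the same.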
As ecologists may be interested in the intersection of different convex hulls, fundiversity could provide a wrapper around geometry::intersectn() that outputs the volume of the intersection between two convex hulls.
The input of this wrapper would be similar to fd_fric() (site-species matrix, trait matrix) and the output would be a distance matrix or rather a tidy data.frame with the first two columns giving the ids of the two considered sites, and the third column giving the volume of the intersection.
Computationally this could be quite intensive but should parallelize without issue (see #13).
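A rough sketch of such a wrapper, under the assumption that geometry::intersectn() returns the hull of the intersection in its ch element with a vol field, as described in its documentation. The function name hull_intersections() and the output column names are hypothetical, and edge cases (non-overlapping hulls, too few points) are not handled.

```r
# Hypothetical wrapper: pairwise intersection volumes between site hulls.
hull_intersections <- function(traits, sp_com) {
  stopifnot(requireNamespace("geometry", quietly = TRUE))
  site_pairs <- utils::combn(rownames(sp_com), 2)
  volumes <- apply(site_pairs, 2, function(p) {
    # Keep only the species present in each of the two sites
    pts1 <- traits[colnames(sp_com)[sp_com[p[1], ] > 0], , drop = FALSE]
    pts2 <- traits[colnames(sp_com)[sp_com[p[2], ] > 0], , drop = FALSE]
    geometry::intersectn(pts1, pts2)$ch$vol
  })
  data.frame(first_site = site_pairs[1, ], second_site = site_pairs[2, ],
             volume = volumes)
}

# Two overlapping unit squares in a 2-trait space (made-up data).
traits <- rbind(
  cbind(c(0, 1, 1, 0), c(0, 0, 1, 1)),
  cbind(c(0.5, 1.5, 1.5, 0.5), c(0.5, 0.5, 1.5, 1.5))
)
rownames(traits) <- paste0("sp", 1:8)
sp_com <- matrix(c(rep(1, 4), rep(0, 4),
                   rep(0, 4), rep(1, 4)),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), rownames(traits)))

if (requireNamespace("geometry", quietly = TRUE)) {
  print(hull_intersections(traits, sp_com))
}
```

Since each pair of sites is independent, the apply() over pairs is the natural place to parallelize.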
geometry::convhulln(), used in fd_fric(), has some limitations regarding data size. To avoid wasting computing time, we should probably issue a warning for large datasets (many species × many traits) saying that the computation may fail in this case.
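A minimal sketch of such a guard. The function name and both thresholds are illustrative assumptions, not documented limits of geometry::convhulln() or Qhull.

```r
# Hypothetical guard; max_points and max_dims are arbitrary placeholders.
warn_if_large <- function(traits, max_points = 1e4, max_dims = 8) {
  too_large <- nrow(traits) > max_points || ncol(traits) > max_dims
  if (too_large) {
    warning(
      "Large trait dataset (", nrow(traits), " species x ", ncol(traits),
      " traits): the convex hull computation may be slow or fail.",
      call. = FALSE
    )
  }
  invisible(too_large)
}

small <- matrix(0, nrow = 10, ncol = 3)
warn_if_large(small)  # no warning for a small dataset
```

Sensible thresholds would have to come from benchmarks rather than from this sketch.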
Since R 4.1.3, we can run tests without packages from Suggests by setting the _R_CHECK_DEPENDS_ONLY_TESTS_ env var to true.
Related issue: #4
Prepare for release:
devtools::build_readme()
urlchecker::url_check()
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
cran-comments.md
Submit to CRAN:
usethis::use_version('patch')
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
Related to #27
Even before envisioning a manuscript, add a CITATION file that mentions the version of the package used so that people can start using it before any scientific publication.
We should add a link to the published paper in the CITATION file, as well as in the description.
Currently, if the user has memoise on their computer, they will automatically get the memoised version of fd_chull() and fd_chull_intersect(), without any way to opt out.
I think it would be good to offer the option to opt out, if the user wishes to do so for whatever reason.
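One way this opt-out could work is through a global option checked when the functions are set up. This is only a sketch: the option name "fundiversity.memoise" and the helper names are assumptions made for illustration.

```r
# The option name below is an assumption used for illustration only.
use_memoise <- function() {
  isTRUE(getOption("fundiversity.memoise", TRUE)) &&
    requireNamespace("memoise", quietly = TRUE)
}

# Return a memoised version of `f` only when the user has not opted out.
maybe_memoise <- function(f) {
  if (use_memoise()) memoise::memoise(f) else f
}

slow_square <- function(x) x^2

options(fundiversity.memoise = FALSE)  # the user opts out
plain <- maybe_memoise(slow_square)
identical(plain, slow_square)          # the original function is returned
```

Defaulting the option to TRUE keeps the current behaviour for users who do nothing.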
Probably discard them but should we display a warning before doing so?
When a value is missing, should we discard the whole row or just ignore this value? What are the implications on the resulting indices (beyond the very practical fact that few datasets will be complete for all traits and species)?
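To make the "discard the whole row" option concrete, here is a small sketch in base R. The helper name and the trait values are made up; whether dropping rows is the right choice for the indices is exactly the open question above.

```r
# Made-up trait matrix: sp3 lacks trait t1 and sp4 lacks trait t2.
traits <- matrix(
  c(1.2, 3.4, NA, 2.2,
    0.5, 0.7, 0.9, NA),
  ncol = 2,
  dimnames = list(paste0("sp", 1:4), c("t1", "t2"))
)

# Hypothetical helper: drop species with any missing trait, with a warning.
drop_incomplete <- function(traits) {
  keep <- stats::complete.cases(traits)
  if (!all(keep)) {
    warning(
      sum(!keep), " species with missing trait values were removed: ",
      paste(rownames(traits)[!keep], collapse = ", "),
      call. = FALSE
    )
  }
  traits[keep, , drop = FALSE]
}

complete_traits <- drop_incomplete(traits)
rownames(complete_traits)
```

Here half of the species are lost even though only two values are missing, which illustrates the cost of row-wise removal.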
Just a reminder to do this so that the package is archived.
FDiv is computed on the vertices of the convex hull formed by the data points, not on the entirety of the points.
This means we have to compute the convex hull even for FDiv.
This might cause quite a performance hit, so it might be interesting to think about caching the result of convhulln() to avoid unnecessary re-computation when getting both FRic and FDiv. This is likely the role of verts.txt in FD.
Here is an issue that aims to be a list of specific errors due to Qhull to better manage #38 in the future.
I obtained the following error in Qhull:
Error: Received error code 5 from qhull. Qhull error:
QH6271 qhull precision error (qh_check_dupridge): wide merge (1122660224434 times wider) due to duplicate ridge with nearly coincident points (0.027) between f102531 and f102515, merge dist 0.093, while processing p2261
- Ignore error with option 'Q12'
- To be fixed in a later version of Qhull
- Vertex distance 0.027 is greater than 100 times maximum distance 8.3e-14
Please report to [email protected] with steps to reproduce and all output
ERRONEOUS FACET:- f102531
- flags: top new seen mergehorizon dupridge mergeridge2
- normal: 0.05806 0.1744 -0.5788 0.7752 -0.174
- offset: 0.1009689
- vertices: p2261(v1312) p2266(v1046) p2272(v799) p2262(v289) p2265(v160)
- neighboring facets: f12145 f12159 f102533 f102515 f102532
- ridges:
- r74938 tested
vertices: p2266(v1046) p2272(v799) p2262(v289) p2265(v160)
between f102531 and f12145- r97773
vertices: p2261(v13
This happened when running the inst/new_benchmark.R script with the following parameters: 50 sites, 5 traits, 200 species, with the code at commit c9d85d8.
This is the second person to contact me because of the non-continuous trait error message:
Lines 65 to 68 in 0a58ab6
Maybe we should point people to multivariate analyses to get back continuous traits, for example by adding a line such as "If you want to use non-numeric traits with fundiversity, you have to transform them to obtain numerical traits beforehand (e.g., through a PCoA or similar techniques)".
It would probably be too specific, but it would at least point users to ways to overcome the issue themselves.
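For reference, the transformation mentioned in that message could look like the sketch below. It assumes Gower dissimilarity via cluster::daisy() followed by a PCoA via stats::cmdscale(); the trait table is made up, and the number of axes kept (2) is an arbitrary choice.

```r
# Made-up trait table mixing a continuous and a categorical trait.
mixed_traits <- data.frame(
  mass = c(10, 12, 30, 35, 50, 8),
  diet = factor(c("seeds", "seeds", "insects", "insects", "fruit", "fruit")),
  row.names = paste0("sp", 1:6)
)

if (requireNamespace("cluster", quietly = TRUE)) {
  # Gower dissimilarity handles mixed variable types
  gower <- cluster::daisy(mixed_traits, metric = "gower")
  # PCoA: the species coordinates are numeric and can serve as traits
  pcoa_axes <- stats::cmdscale(gower, k = 2)
  print(pcoa_axes)
}
```

The resulting axes can then be passed as the trait matrix in place of the original table.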
Most packages that compute FRic propose a version where it is standardized by the regional maximum. We could propose this in fd_fric() with a stand = TRUE argument.
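The idea behind the standardization can be sketched in two dimensions with base R only. Note that this uses grDevices::chull() and the shoelace formula as a 2D stand-in for geometry::convhulln(); the helper name and the data are made up.

```r
# Area of the 2D convex hull via the shoelace formula.
hull_area <- function(pts) {
  h <- pts[grDevices::chull(pts), , drop = FALSE]
  x <- h[, 1]
  y <- h[, 2]
  nxt <- c(seq_along(x)[-1], 1)  # index of the next hull vertex
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Regional pool: four corners of a unit square plus its centre.
traits <- cbind(t1 = c(0, 1, 1, 0, 0.5), t2 = c(0, 0, 1, 1, 0.5))
rownames(traits) <- paste0("sp", 1:5)

sp_com <- matrix(c(1, 1, 1, 0, 0,
                   1, 1, 1, 1, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), rownames(traits)))

# Raw FRic per site, then divided by the FRic of the whole species pool
fric <- apply(sp_com, 1, function(ab) hull_area(traits[ab > 0, , drop = FALSE]))
fric_std <- fric / hull_area(traits)
fric_std
```

Standardized values lie between 0 and 1, with 1 meaning that the site fills the whole regional trait space.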
For now, the title of the package is "Easy Computation of Alpha Functional Diversity Indices". However, given that we include fd_fric_intersect() and could include other beta-diversity indices in the future, we could drop the word "Alpha" altogether.
I'm opening the issue to remind myself to do so and to check that we got rid of "Alpha" across CITATION files (and Zenodo, etc.).
We officially recommended not to use memoise and future at the same time in the fundiversity manuscript.
However, there may be ways to get both.
I will collect here possibilities to work with both:
A Stack Overflow question on using plumber, future, and memoise together: https://stackoverflow.com/q/70805314
future repo: HenrikBengtsson/future#506
A Stack Overflow answer (related to future) suggesting to use the R.cache file caching system instead: https://stackoverflow.com/a/48102804
memoise repo: r-lib/memoise#29
It seems for now that parallelizing while using memoise is not straightforward.
Maybe we should be extra-careful and add a warning when loading the package with memoisation that it shouldn't be used with parallelization.
We should also add this in:
When no site-species matrix is provided, the site is called "s1". Should it be called something else?
Also, the row name is then "s1"; it should probably be set to NULL to avoid confusion:
fundiversity::fd_fric(fundiversity::traits_birds)
#> site FRic
#> s1 s1 230967.7
Created on 2020-12-11 by the reprex package (v0.3.0)
I'm pretty sure that the call to apply() in fd_raoq() could be simplified through matrix algebra:
https://github.com/Bisaloo/fundiversity/blob/1e175ba88531e08242ab9c69b9e470b4cee9e759/R/fd_raoq.R#L60-L62
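To illustrate the idea, here is a sketch that compares a per-site apply() with a matrix formulation. It assumes Rao's quadratic entropy defined as Q = sum over all i and j of p_i p_j d_ij, with p the relative abundances and d the trait dissimilarities; it does not reproduce the code at the linked lines, and the data are simulated.

```r
set.seed(1)
traits <- matrix(rnorm(12), nrow = 4,
                 dimnames = list(paste0("sp", 1:4), NULL))
d <- as.matrix(dist(traits))  # species-by-species dissimilarities

sp_com <- matrix(c(2, 1, 0, 1,
                   0, 3, 3, 0,
                   1, 1, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("s", 1:3), rownames(traits)))
p <- sp_com / rowSums(sp_com)  # relative abundances per site

# One site at a time
q_apply <- apply(p, 1, function(w) sum(outer(w, w) * d))

# All sites at once: the diagonal of p %*% d %*% t(p), computed without
# building the full site-by-site matrix
q_matrix <- rowSums((p %*% d) * p)

all.equal(q_apply, q_matrix)
```

The rowSums() form avoids both the explicit loop and the site-by-site matrix that diag(p %*% d %*% t(p)) would allocate.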
A naive implementation to compute FRic is to compute the convex hull for each row separately.
However, this problem can be simplified since each row is a subset of the entire species list. The various convex hulls are not computed on completely independent points but on a subset / the union / etc. of points for which we previously computed the convex hull.
As we are reimplementing indices, we should probably check that the computations are numerically correct by comparing them with other packages.
From the performance vignette (#8), we can see that some indices can take a while to compute with big matrices.
One interesting (but certainly expensive development-wise) feature would be to allow for automatic parallel computation of functional diversity indices across sites by splitting the site-species matrix into chunks.
The implementation could use the future package.
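The chunking step could be sketched as follows in base R. The helper names are hypothetical, species richness stands in for an actual functional diversity index, and the sequential lapply() marks the spot where a parallel map such as future.apply::future_lapply() could be swapped in.

```r
# Hypothetical helper: split the rows (sites) into n_chunks sub-matrices.
split_sites <- function(sp_com, n_chunks) {
  idx <- seq_len(nrow(sp_com))
  groups <- ceiling(idx * n_chunks / length(idx))
  lapply(split(idx, groups), function(i) sp_com[i, , drop = FALSE])
}

# Stand-in for a per-site index (here simply species richness).
per_site_index <- function(chunk) {
  data.frame(site = rownames(chunk), richness = rowSums(chunk > 0),
             row.names = NULL)
}

set.seed(123)
sp_com <- matrix(rbinom(50, 1, 0.5), nrow = 10,
                 dimnames = list(paste0("site", 1:10), paste0("sp", 1:5)))

chunks <- split_sites(sp_com, n_chunks = 3)
# Sequential here; a parallel lapply() replacement would go on this line
res <- do.call(rbind, lapply(chunks, per_site_index))
rownames(res) <- NULL
res
```

Because sites are independent, the chunks can be processed in any order and simply bound back together.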
When using huge site-species matrices, it is sometimes more memory efficient to use sparse matrices.
We could implement the computation of indices with sparse matrices through the Matrix package (a recommended package shipped with standard R installations).
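As a small illustration, a sparse site-species matrix can be multiplied with a dense trait matrix much like a base matrix. This is a sketch on simulated data, and it only shows the matrix product, not a full index computation.

```r
set.seed(42)
sp_com <- matrix(rbinom(40, 1, 0.2), nrow = 8,
                 dimnames = list(paste0("site", 1:8), paste0("sp", 1:5)))
traits <- matrix(rnorm(10), nrow = 5,
                 dimnames = list(paste0("sp", 1:5), c("t1", "t2")))

dense_result <- sp_com %*% traits

if (requireNamespace("Matrix", quietly = TRUE)) {
  # Only non-zero entries are stored in the sparse representation
  sp_sparse <- Matrix::Matrix(sp_com, sparse = TRUE)
  sparse_result <- as.matrix(sp_sparse %*% traits)
  print(all.equal(sparse_result, dense_result, check.attributes = FALSE))
}
```

The memory benefit only shows with large matrices where most site-species combinations are empty.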
I don't know how this bug slipped through the cracks, but here it is:
fundiversity::fd_fdiv(fundiversity::traits_plants, fundiversity::site_sp_plants)
#> Error in traits[names(sub_site), , drop = FALSE]: subscript out of bounds
sum(fundiversity::site_sp_plants[10,])
#> [1] 0
fundiversity::fd_fdiv(fundiversity::traits_plants,
fundiversity::site_sp_plants[1:9,])
#> site FDiv
#> 1 elev_250 0.6341541
#> 2 elev_500 0.6543063
#> 3 elev_1000 0.7111319
#> 4 elev_1500 0.7546447
#> 5 elev_2000 0.7437969
#> 6 elev_2500 0.7620128
#> 7 elev_3000 0.6939043
#> 8 elev_3500 0.6414894
#> 9 elev_3750 0.5879492
Created on 2021-08-03 by the reprex package (v2.0.0)
This means that the individual fd_fdiv() computation should check early on whether all abundances are 0.
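A minimal sketch of such an early check, with a hypothetical helper name and made-up data:

```r
# Hypothetical helper: flag sites where every abundance is zero.
has_no_species <- function(sp_com) {
  rowSums(sp_com) == 0
}

sp_com <- matrix(c(1, 2, 0,
                   0, 0, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("site_a", "site_b"), paste0("sp", 1:3)))

empty <- has_no_species(sp_com)
names(empty)[empty]  # sites for which the index cannot be computed
```

Sites flagged this way could get an NA value directly instead of reaching the subsetting step that currently fails.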
We should probably test the behavior of all functions when using non-quantitative traits, so that the functions warn the user instead of silently computing values.
See for example:
data("aviurba", package = "ade4")
fundiversity::fd_fric(aviurba$traits)
#> Warning in storage.mode(p) <- "double": NAs introduced by coercion
#> Error in geometry::convhulln(traits, "FA"): The first argument should not contain any NAs
Created on 2021-08-05 by the reprex package (v2.0.0)
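An explicit check up front would give a clearer message than the coercion warning above. The sketch below uses a hypothetical helper name and error wording.

```r
# Hypothetical input check: stop early when any trait is not numeric.
check_numeric_traits <- function(traits) {
  is_num <- if (is.data.frame(traits)) {
    vapply(traits, is.numeric, logical(1))
  } else {
    is.numeric(traits)
  }
  if (!all(is_num)) {
    stop("Non-continuous trait data found: all traits must be numeric.",
         call. = FALSE)
  }
  invisible(TRUE)
}

numeric_traits <- matrix(1:6, ncol = 2)
check_numeric_traits(numeric_traits)

mixed_traits <- data.frame(size = c(1.5, 2.0), diet = factor(c("a", "b")))
msg <- tryCatch(check_numeric_traits(mixed_traits),
                error = function(e) conditionMessage(e))
msg
```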
We should cite the dataset used as example in the package as well as document it:
Nowak, Larissa et al. (2019), Data from: Projecting consequences of global warming for the functional diversity of fleshy-fruited plants and frugivorous birds along a tropical elevational gradient, Dryad, Dataset, https://doi.org/10.5061/dryad.c0n737b
Just dropping it here :)
It would be really cool to have a hex logo!
We should brainstorm on how best to represent functional diversity. Possibly two convex hulls of points with their intersection highlighted?
I often get emailed about non-continuous trait data that fundiversity doesn't handle.
I wonder if we should write a full vignette to show the general workflow, and include this in the error message for non-continuous trait data.
The workflow would go as follows (with a worked-through example):
I don't think it would be necessary to add any feature to fundiversity to deal with this case, as it's covered extensively by other tools (especially mFD), but maybe having a long-form documentation could be helpful to point users to.
95e5823 fixes the case where geometry::convhulln() returns an error if there are fewer points than dimensions.
We should also handle the case where the dimensionality is artificially reduced because several points have the same coordinates.
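A simple way to catch the duplicated-coordinates case is to count distinct points before calling the convex hull routine. This is only a sketch with a hypothetical helper name: it checks the number of distinct points, not whether they are affinely independent.

```r
# A convex hull in d dimensions needs at least d + 1 distinct points.
has_enough_points <- function(traits) {
  nrow(unique(traits)) > ncol(traits)
}

# Three species in two dimensions, but two share the same coordinates.
dup_traits <- rbind(sp1 = c(0, 0), sp2 = c(1, 1), sp3 = c(1, 1))
has_enough_points(dup_traits)

ok_traits <- rbind(sp1 = c(0, 0), sp2 = c(1, 1), sp3 = c(1, 0))
has_enough_points(ok_traits)
```

Points that are distinct but collinear (or coplanar) would still pass this test, so it is a cheap first filter rather than a complete one.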
Sometimes, when running thousands of FRic computations with fd_fric(), qhull errors out because of edge cases. This cancels the computation of all simulations, and there is currently no option for the user to proceed anyway or at least to output an NA volume. Some of these errors are real errors, while some are warnings from quickhull that are transformed into errors when transferred to R.
The solution would be to check with the geometry maintainers how to pass warnings and errors to R. But it's probably not that easy.
The error message from qhull indicates that the Pp option can solve some of these issues.
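In the meantime, a user-side workaround could wrap the hull computation in tryCatch() and return NA on failure. The sketch below uses a hypothetical safe_volume() helper; the hull function is passed as an argument so that a failure can be simulated without triggering a real Qhull error.

```r
# Hypothetical wrapper: return NA instead of stopping when the hull fails.
safe_volume <- function(traits,
                        hull_volume = function(x) {
                          geometry::convhulln(x, "FA")$vol
                        }) {
  tryCatch(
    hull_volume(traits),
    error = function(e) {
      warning("Convex hull failed, returning NA: ", conditionMessage(e),
              call. = FALSE)
      NA_real_
    }
  )
}

pts <- matrix(runif(20), ncol = 2)

# Simulated failure standing in for a Qhull error
failing_hull <- function(x) stop("simulated Qhull failure")
vol <- safe_volume(pts, hull_volume = failing_hull)
vol
```

This turns every error into an NA, including real ones, so the warning message should be kept to let users inspect what went wrong.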
In order to show the interest of having another package to compute functional diversity indices, we should also make a vignette that shows a performance comparison between fundiversity and related packages.
Reprex
library("fundiversity")
fd_fdis(traits_birds, as.data.frame(site_sp_birds))
#> Error in sp_com %*% traits: requires numeric/complex matrix/vector arguments
Created on 2023-01-30 with reprex v2.0.2
The solution would be to wrap sp_com in fd_fdis() into as.matrix() before performing matrix multiplication.
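The behaviour and the proposed fix can be reproduced with base R alone, on a small made-up example:

```r
sp_com_df <- data.frame(sp1 = c(1, 0), sp2 = c(2, 1),
                        row.names = c("s1", "s2"))
traits <- matrix(c(1, 2, 3, 4), nrow = 2,
                 dimnames = list(c("sp1", "sp2"), c("t1", "t2")))

# A data.frame cannot be used directly in a matrix product
failed <- tryCatch(sp_com_df %*% traits, error = function(e) e)
inherits(failed, "error")

# Converting first makes the product work and keeps the dimnames
fixed <- as.matrix(sp_com_df) %*% traits
fixed
```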
I ran into an issue using fd_fdis() when my site x species matrix doesn't have row names. For my purposes, I cannot force my site x species matrices to have row names as they are contained within a simulation matrix (class simmat). I expect this could be an issue for future users as well.
An easy fix would be to either force "site" in the data frame returned by fd_fdis() to be 1:nrow:
data.frame(site = 1:nrow(sp_com), FDis = fdis_site, row.names = NULL)
or use an if statement:
if(is.null(rownames(sp_com))) { rownames(sp_com) <- 1:nrow(sp_com) }
data.frame(site = rownames(sp_com), FDis = fdis_site, row.names = NULL)
We use this example in the introduction vignette when subsetting the number of species.
However, it returns NA because there are not enough species present for FRic to be computed.
library("fundiversity")
fd_fric(traits_birds, site_sp_birds[, 1:5])
#> Differing number of species between trait dataset and site-species matrix
#> Taking subset of species
#> site FRic
#> 1 elev_250 NA
#> 2 elev_500 NA
#> 3 elev_1000 NA
#> 4 elev_1500 NA
#> 5 elev_2000 NA
#> 6 elev_2500 NA
#> 7 elev_3000 NA
#> 8 elev_3500 NA
Created on 2022-10-21 with reprex v2.0.2
In my opinion, we should have a warning to make sure the user is aware of the reason why the FRic values are NA, and also change the example in the introduction vignette.
Like with the following:
library("fundiversity")
fd_fric(traits_birds, site_sp_birds[, 1:60])
#> Differing number of species between trait dataset and site-species matrix
#> Taking subset of species
#> site FRic
#> 1 elev_250 18963.31311
#> 2 elev_500 18963.31311
#> 3 elev_1000 38586.75398
#> 4 elev_1500 38114.26828
#> 5 elev_2000 5888.93690
#> 6 elev_2500 5256.70628
#> 7 elev_3000 2710.81803
#> 8 elev_3500 88.11684
Created on 2022-10-21 with reprex v2.0.2
Reprex:
library(fundiversity)
data(traits_birds)
rownames(traits_birds) <- NULL
fd_fdiv(traits_birds)
#> site FDiv
#> 1 s1 NaN
Created on 2022-08-05 by the reprex package (v2.0.1.9000)
Row names should not be mandatory when sp_com is not provided.