theoreticalecology / s-jsdm Goto Github PK

63.0 7.0 14.0 46.69 MB

Scalable joint species distribution modeling

Home Page: https://theoreticalecology.github.io/s-jSDM/

License: GNU General Public License v3.0

R 65.21% Python 23.21% Jupyter Notebook 11.59%

species-distribution-modelling species-interactions machine-learning deep-learning gpu-acceleration

s-jsdm's Issues

Error in py_call_impl(callable, dots$args, dots$keywords) : can't convert np.ndarray of type numpy.object_

Dear Max,

I am sorry to bother you with a new issue when using sjSDM function with device = "gpu"...

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

Detailed traceback: 
  File "C:\Program Files\R\R-3.6.2\library\sjSDM\python\sjSDM_py\model_new.py", line 207, in fit
    dataLoader = self._get_DataLoader(X, Y, SP, RE, batch_size, True, parallel)
  File "C:\Program Files\R\R-3.6.2\library\sjSDM\python\sjSDM_py\model_new.py", line 164, in _get_DataLoader
    torch.tensor(Y, dtype=torch.float32, device=torch.device('cpu')))
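
The traceback points at the conversion of Y to a float32 tensor, which fails when the array handed over has object dtype. One way this can arise is a response table that holds non-numeric cells (strings, missing values); whether that was the cause here is not stated in the report. A minimal sketch of a pre-flight check, where `find_non_numeric` is a hypothetical helper and not part of sjSDM:

```python
from numbers import Real


def find_non_numeric(rows):
    """Return (row, column, type name) for every cell that is not a real number.

    Hypothetical helper, not part of sjSDM: it only illustrates how to locate
    cells that would force an object-typed array.
    """
    bad = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # bool is a subclass of int, so True/False count as numeric here
            if not isinstance(value, Real):
                bad.append((i, j, type(value).__name__))
    return bad


# Made-up response table: one string and one missing value slipped in
Y = [
    [1, 0, 1],
    [0, "1", 0],
    [1, None, 1],
]

for row, col, kind in find_non_numeric(Y):
    print(f"row {row}, column {col}: {kind}")
```

Running a check like this on the R side (or right after import) before calling sjSDM narrows the search to the offending cells.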

column sums for importance?

would be nice to implement column sum for print(imp)
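
A rough sketch of what the column sums could look like, assuming the importance table has one row per species and one column per component. The layout, the column names and the numbers are made up for illustration; they are not the actual structure of the `imp` object.

```python
def column_sums(table):
    """Sum each column of a dict-of-rows table, keeping column order."""
    totals = {}
    for row in table.values():
        for name, value in row.items():
            totals[name] = totals.get(name, 0.0) + value
    return totals


# Made-up importance values: one row per species, one column per component
imp = {
    "sp1": {"env": 0.5, "spatial": 0.25, "biotic": 0.25},
    "sp2": {"env": 0.25, "spatial": 0.5, "biotic": 0.25},
}

totals = column_sums(imp)
print(totals)
```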

Improve documentation

Vignettes should answer/cover (derived from user question):

how to evaluate model fit?
which learning rate should I choose?
what about model selection?

Anything to add? @florianhartig

Model_sjSDM object has no attribute 'set_weights'

I get the following error :

pred = predict(model, test_X)
  # Error in py_get_attr_impl(x, name, silent) : 
  # AttributeError: 'Model_sjSDM' object has no attribute 'set_weights'

Indeed model has an attribute weights, but not an attribute set_weights.

nn.Sequential to DNN()

It should be possible to pass neural network objects (torch.nn.Sequential(...)) directly to the DNN() config, e.g. to build custom NNs such as CNNs, or to hand pre-trained NNs to sjSDM.

install issues

Hi Max, as I said, on my new system, it first didn't work at all (conda not found). I installed Anaconda with python 3.7

I now re-installed reticulate, and now it finds the python system (so, do we maybe have to increase the minimum version for reticulate? Unfortunately, not sure which version I had before).

However, now I get

PackagesNotFoundError: The following packages are not available from current channels:

  - torch
  - torchvision

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

I have a call now, will try to solve this later, just to let you know.

Python install

Hi, Max, I just removed the link that didn't work in fc6fb9b

Does the rest of the pip stuff work (e.g. pip install sjSDM_py), or was this changed now that the code is in the package?

multiple gpu when running 'sjSDM'

Hi,
We figured out that there's no argument 'n_gpu' in the function 'sjSDM', but only in 'sjSDM_cv'. Is it possible to use multiple gpus to run 'sjSDM' function at all? If so, is it implemented yet in 'sjSDM' function?
Thanks a lot!

Regularization - model evaluation

We should at least implement a CV function
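
For reference, the core of such a CV function is the fold assignment. A minimal sketch, assuming sites are the unit that gets held out; `k_fold_indices` is a hypothetical name and this is not the package's implementation:

```python
import random


def k_fold_indices(n_sites, k, seed=42):
    """Split site indices 0..n_sites-1 into k shuffled, near-equal folds."""
    indices = list(range(n_sites))
    random.Random(seed).shuffle(indices)
    # Striding spreads the remainder so fold sizes differ by at most one
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(n_sites=100, k=3)
for i, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(100) if j not in held_out]
    print(f"fold {i}: {len(train_idx)} train sites, {len(test_idx)} test sites")
```

Each fold would then be used once as test data while the model is fitted on the remaining sites, and the held-out likelihood is averaged per regularization setting.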

linear() doesn't accept formula as object

It doesn't seem possible to pass a formula stored in an object to linear() when building the env argument for sjSDM():

set.seed(42)
# simulate data
community <- simulate_SDM(sites = 100, species = 10, env = 3)
Env <- community$env_weights

Env <- as.data.frame(Env)

# make formula
form1 <- as.formula(~V1+V2+V3)
form1

Env.lin1 <- linear(data = Env, formula = form1) # this throws an error: (Error: object of type 'symbol' is not subsettable)

Env.lin2 <- linear(data = Env, formula = ~V1+V2+V3) # this is OK

AIC function

I have implemented a logLik function in 80c906b ... question is if we should also implement an AIC ... I would tend towards not, because of the problem of counting df.
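
For context, the formula itself is trivial once a log-likelihood and a parameter count are available; the open question raised above is what to plug in as the parameter count. A sketch with made-up numbers (`aic` is a hypothetical helper, not a package function):

```python
def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 * logLik."""
    return 2 * n_params - 2 * log_lik


# Made-up numbers for illustration only
print(aic(log_lik=-1234.5, n_params=40))
```

With regularization on the coefficients and the covariance, the nominal number of parameters overstates the effective degrees of freedom, which is the df-counting problem mentioned above.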

Migrating sjSDM code to AWS

Hello,

I'm trying to run an R script that uses 'sjSDM' to do model training on AWS SageMaker. I'm trying to run the code in a Docker container, but the installation procedure fails to install PyTorch and all the other sjSDM dependencies. I'm trying to install sjSDM and its dependencies in a Dockerfile using RUN R -e "remotes::install_github('https://github.com/TheoreticalEcology/s-jSDM', subdir = 'sjSDM', dep=FALSE)" and RUN R -e "sjSDM::install_sjSDM(version = 'gpu')".

I wanted to point out this issue for anyone who tries to migrate sjSDM code to AWS, but I would also like to solve this. Thanks.

p-values on env components

As discussed. If faster, I would calculate the hessian per species, as one can assume that env estimates will be approximately independent across species.
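
One way to write down the per-species shortcut, in notation introduced here for illustration only (S species, P environmental coefficients per species, \ell the log-likelihood, \beta_j the coefficient vector of species j):

```latex
\[
H \approx \operatorname{blockdiag}(H_1, \dots, H_S), \qquad
H_j = -\frac{\partial^2 \ell}{\partial \beta_j \, \partial \beta_j^{\top}}, \qquad
\widehat{\operatorname{se}}(\hat\beta_{jk}) = \sqrt{\left[ H_j^{-1} \right]_{kk}}
\]
```

Under that independence assumption, the work drops from inverting one SP x SP matrix to inverting S matrices of size P x P, at the price of ignoring the cross-species terms.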

Importance plots with missing component (env, BI, space)

If there is no spatial component, the importance plot produces a barplot instead of the ternary plot. I wonder if we should just keep the ternary plot, and set space (or any other component) to zero if it is absent?
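
A sketch of the "set the absent component to zero" option. The component names and the dict input are assumptions for illustration; the point is only that the three shares still sum to one, so the model lands on an edge of the ternary plot instead of switching plot type.

```python
COMPONENTS = ("env", "space", "biotic")


def ternary_shares(importance):
    """Return shares for all three components; absent ones are set to zero."""
    raw = [float(importance.get(name, 0.0)) for name in COMPONENTS]
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one component must be non-zero")
    return {name: value / total for name, value in zip(COMPONENTS, raw)}


# Model without a spatial component: the point lies on the env-biotic edge
shares = ternary_shares({"env": 3.0, "biotic": 1.0})
print(shares)
```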

Clustered heat map for species associations

Should we have a plot like this https://pypi.org/project/sjSDM-py/

sjSDM::install_sjSDM(version = "cpu") seems to want pytorch

> sjSDM::install_sjSDM(version = "cpu")

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - torchvision
  - torch

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.



Installation failed... Try to install manually PyTorch (install instructions: https://github.com/TheoreticalEcology/s-jSDM
If the installation still fails, please report the following error on https://github.com/TheoreticalEcology/s-jSDM/issues
one or more Python packages failed to install [error code 1]

multiprocessing on the GPU

just stumbled across (for CV and tuning): https://discuss.pytorch.org/t/training-parallel-multiple-models/35238/4
for sometime in the future:

real multiprocessing instead of multiprocessing via R slaves (memory, efficiency, etc.)
native, in python

sjSDM_cv() Error in unserialize(node$con) : error reading from connection

I get a weird error running sjSDM_cv(). I'm using R 4.0.0. My students are running R 3.6.3 and are able to run the test code, and also to run it on their dataset.

So I'm wondering if it's an R 4 thing.

# sjSDM_cv()
# simulate sparse community:
com = simulate_SDM(env = 5L, species = 25L, sites = 100L, sparse = 0.5)

# tune regularization:
tune_results = sjSDM_cv(Y = com$response,
                        env = com$env_weights, 
                        tune = "random", # random steps in tune-parameter space
                        CV = 3L, # 3-fold cross validation
                        tune_steps = 25L,
                        alpha_cov = seq(0, 1, 0.1),
                        alpha_coef = seq(0, 1, 0.1),
                        lambda_cov = seq(0, 0.1, 0.001), 
                        lambda_coef = seq(0, 0.1, 0.001),
                        n_cores = 2L, # small models can be also run in parallel on the GPU
                        iter = 2L # we can pass arguments to sjSDM via ...
                        )

Error in unserialize(node$con) : error reading from connection
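
For readers unfamiliar with tune = "random": the comment in the call describes random steps in the tuning-parameter space. A sketch of that idea using the same grids as the call above; this is an illustration of random search, not the package's implementation, and `seq` and `random_tuning_steps` are hypothetical helpers.

```python
import random


def seq(start, stop, step):
    """Inclusive sequence, mimicking R's seq(start, stop, step)."""
    n = round((stop - start) / step)
    # Rounding hides floating-point noise such as 0.30000000000000004
    return [round(start + i * step, 10) for i in range(n + 1)]


def random_tuning_steps(grid, n_steps, seed=0):
    """Draw n_steps random combinations, one value per tuning parameter."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_steps)
    ]


grid = {
    "alpha_cov": seq(0, 1, 0.1),
    "alpha_coef": seq(0, 1, 0.1),
    "lambda_cov": seq(0, 0.1, 0.001),
    "lambda_coef": seq(0, 0.1, 0.001),
}

steps = random_tuning_steps(grid, n_steps=25)
print(steps[0])
```

Each of the 25 combinations would then be evaluated with the 3-fold cross-validation, which is where the parallel workers (and the unserialize error above) come into play.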

NumPy array is not writeable, and PyTorch does not support non-writeable tensors

Dear colleagues,

I have the following issue when attempting to run s-jSDM with the R package:

..\torch\csrc\utils\tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program.

I can't figure out what the problem is here or how to resolve it.
Do you have any idea?

Best wishes,

François

install to google colab / kaggle

Hey, I wanna install sjSDM to google colab. But I get this error message. I don't know if it's easy to fix but better to check with you.
Error: Failed to install 'sjSDM' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpJEcT8x/file436a79464/sjSDM_0.1.3.9000.tar.gz’ had non-zero exit status
Traceback:

devtools::install_github("https://github.com/TheoreticalEcology/s-jSDM",
. subdir = "sjSDM", auth_token = "xxxxxxx")
pkgbuild::with_build_tools({
. ellipsis::check_dots_used(action = getOption("devtools.ellipsis_action",
. rlang::warn))
. {
. remotes <- lapply(repo, github_remote, ref = ref, subdir = subdir,
. auth_token = auth_token, host = host)
. install_remotes(remotes, auth_token = auth_token, host = host,
. dependencies = dependencies, upgrade = upgrade, force = force,
. quiet = quiet, build = build, build_opts = build_opts,
. build_manual = build_manual, build_vignettes = build_vignettes,
. repos = repos, type = type, ...)
. }
. }, required = FALSE)
install_remotes(remotes, auth_token = auth_token, host = host,
. dependencies = dependencies, upgrade = upgrade, force = force,
. quiet = quiet, build = build, build_opts = build_opts, build_manual = build_manual,
. build_vignettes = build_vignettes, repos = repos, type = type,
. ...)
tryCatch(res[[i]] <- install_remote(remotes[[i]], ...), error = function(e) {
. stop(remote_install_error(remotes[[i]], e))
. })
tryCatchList(expr, classes, parentenv, handlers)
tryCatchOne(expr, names, parentenv, handlers[[1L]])
value[3L]

readme suggestions

change to simulate_SDM(sites = 100, species = 50, env = 5)
or change to matrix(rnorm(800), 100, 2)

also, i find ternary diagrams easier to read if the three elements are on the axis (e.g. environment at the top vertex, biotic bottom left, spatial bottom right)

on.load() checks

I think we check for pytorch, but not for python / conda, right? I would add such a check.

As said, maybe best to get a general diagnostics function, which checks the system for requirements, and provides a comprehensive error message, together with the note to post this in GitHub in case the problem persists?
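
A sketch of what the Python side of such a diagnostics function could look like, using only the standard library. `check_requirements` and `format_report` are hypothetical names, and the lists passed in at the bottom (the torch module, the conda executable) are assumptions taken from this discussion rather than the package's actual requirement list.

```python
import importlib.util
import shutil


def check_requirements(modules, executables=()):
    """Report which Python modules and command-line tools can be found."""
    report = {}
    for name in modules:
        # find_spec returns None for a missing top-level module
        # and does not import the module itself
        report[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        report[exe] = shutil.which(exe) is not None
    return report


def format_report(report):
    """Turn the report into a message a user can paste into an issue."""
    lines = [f"{'ok     ' if found else 'MISSING'} {name}"
             for name, found in report.items()]
    if not all(report.values()):
        lines.append("Some requirements are missing; please include this "
                     "report if you open an issue.")
    return "\n".join(lines)


print(format_report(check_requirements(["torch"], executables=["conda"])))
```

The same report could be produced on load and again at the top of the model-fitting function, so that a missing dependency never surfaces as an unrelated error.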

Install "private" conda version?

Hi Max, just a follow-up to #23 - Now at least it works from a clean (= conda-free) computer. One thing that I am wondering - what happens if a user already has a conda version on their computer? At the moment, you are trying to use it, right?

Wouldn't it be safer to always install a dedicated "private" miniconda version for sjSDM?

test can't run in 0.1.8

hi there,
After I updated to 0.1.8, running the test model throws an error.
What is going wrong on my Mac?

Importance, R^2 and p-values for single env predictors.

Hi,

maybe I just haven't found it, but is there a way to see the importance, R^2 and p-value of a single environmental predictor?
e.g.
model <- sjSDM(Y = Occ, env = linear(data = Env, formula = ~X1+X2+X3), spatial = linear(data = SP, formula = ~0+X1:X2), se = TRUE, family=binomial("probit"), sampling = 100L, device = 'gpu' )

Where can I see the contribution of X3 in the model? If I read your outputs correctly, all predictors from the env argument are summed into A in the anova() output and under env in the importance() output?!

Best regards,
Julian

Error in py_call_impl: not enough values to unpack

I have the following error with sjSDM function,

Error in py_call_impl(callable, dots$args, dots$keywords) : ValueError: not enough values to unpack (expected 2, got 1)

Any idea on what could be the cause?

Better error message for missing pytorch installation?

Without installing PyTorch, I got this error message when running sjSDM:

Error in sjSDM(X = com$env_weights, Y = com$response, iter = 10L) : 
  object 'fa' not found

I assume that is because of the missing pytorch install. Given that we can anticipate that a user would forget to do this, maybe provide a better error message?

move troubleshooting help to missing_installation?

The large troubleshooting section in the sjSDM help is a bit distracting. Maybe move it to missing_installation, and throw an error message that points to ?missing_installation

error: ModuleNotFoundError: No module named 'pyro'

Dear Max,

I have the following error when installing the latest version of the package:

.onLoad failed in loadNamespace() for 'sjSDM', details: call: py_module_import(module, convert = convert) error: ModuleNotFoundError: No module named 'pyro'

However, pyro has been installed on my Windows system with Anaconda.
Maybe there is a path to change somewhere to allow proper installation.

Best wishes,

François

Should some functions be better internal

There are a number of functions, such as

is_sjSDM_py_available

for which I wondered if they should be set as internal in Roxygen

summary how to display the covariance/correlation matrix

We discussed it a few days ago but I only remembered it AFTER the latest PR. So it will go into the next PR:

Show the full matrix instead of the L
Question: covariance or correlation matrix?
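
To make the two options concrete, a small sketch that goes from a factor L to the full covariance matrix and on to the correlation matrix. It assumes Sigma = L L^T; whether the package adds a further diagonal term to the covariance is not stated here, and the numbers are made up.

```python
import math


def covariance_from_factor(L):
    """Compute Sigma = L @ L^T for a factor L given as a list of rows."""
    n = len(L)
    return [[sum(a * b for a, b in zip(L[i], L[j])) for j in range(n)]
            for i in range(n)]


def correlation_from_covariance(sigma):
    """Rescale a covariance matrix so that its diagonal becomes one."""
    n = len(sigma)
    sd = [math.sqrt(sigma[i][i]) for i in range(n)]
    return [[sigma[i][j] / (sd[i] * sd[j]) for j in range(n)]
            for i in range(n)]


# Made-up 2 x 2 factor for two species
L = [[2.0, 0.0],
     [1.0, 1.0]]

sigma = covariance_from_factor(L)
corr = correlation_from_covariance(sigma)
print(sigma)
print(corr)
```

The correlation matrix is scale-free and usually easier to read in a summary, while the covariance matrix keeps the per-species variances.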

Phylogeny

I got a question from a user about how to include a phylogenetic distance matrix. At some point we have to finally tackle this problem.
At the moment I can think of two options:
a) phylogenetic distance matrix as a kind of species-species "prior" on the env weights
b) treat phylogenetic eigenvectors as traits and fit a fourth-corner-model (as they do in the gllvm pkg: see )

Species / site / predictor names

Systematically support names in outputs / plots?

Error message in sjSDM if pytorch not available

Hi, I just tried this out, if you run

com = simulate_SDM(env = 3L, species = 5L, sites = 100L)
model = sjSDM(Y = com$response,env = com$env_weights, iter = 10L)

without pytorch (luckily, I can do this, as I still haven't updated), I get

 Error in reticulate::py_is_null_xptr(fa) : object 'fa' not found
3. reticulate::py_is_null_xptr(fa) at utils.R#84
2. check_module() at sjSDM.R#58
1. sjSDM(Y = com$response, env = com$env_weights, iter = 10L)

whereas a good error message would say "pytorch not installed". I would just do the startup check also in sjSDM to check if the requirements are there.

Register importance and possibly other functions as S3 classes

Just running through Pedro's example after having run RF before, I noted that loading the randomForest package first creates a problem:

> imp = importance(model)
Error in UseMethod("importance") : 
  no applicable method for 'importance' applied to an object of class "c('sjSDM', 'linear')"

because randomForest defines importance as an S3 generic. Because of this, I think it would be safer to register all reasonably general-sounding functions as S3 methods, or else use names such as sjSDM_importance (but I would prefer the former)

TypeError: type torch.cuda.FloatTensor not available

With the latest version, I have the new following error :

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: type torch.cuda.FloatTensor not available. Torch not compiled with CUDA enabled.

Detailed traceback: 
  File "C:\Program Files\R\R-3.6.2\library\sjSDM\python\sjSDM_py\model_new.py", line 171, in build
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
  File "C:\ProgramData\Anaconda3\envs\r-reticulate\lib\site-packages\torch\__init__.py", line 206, in set_default_tensor_type
    _C._set_default_tensor_type(t)

Does the new version require reinstalling Torch or Cuda?

Memory problems for importance() with large covariances

Question from a user (redacted for conciseness and privacy):

... we have been working on analyzing an absolutely enormous XXX dataset with s-jSDM.

Good news: given enough processors and memory, s-jSDM does handle datasets working in the tens of thousands of species pretty well.

However, I have run into a subsequent memory problem when attempting to parse the importance from the model output. I’ve looked at the code for the function and I’m pretty sure it stems from the matrix multiplication expression involving the species covariance matrix (unsurprising, given its size).

So I was wondering: have either of you run any tests on resource requirements for the importance function to see how they scale with the number of species?
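
As a back-of-the-envelope check (not a measurement of importance() itself): a dense species-by-species matrix grows quadratically with the number of species, and every temporary product of the same shape adds another copy. The sketch assumes 4 bytes per value (float32); `dense_matrix_bytes` is a hypothetical helper.

```python
def dense_matrix_bytes(n_species, bytes_per_value=4):
    """Memory for one dense n_species x n_species matrix."""
    return n_species * n_species * bytes_per_value


for n in (1_000, 10_000, 50_000):
    gib = dense_matrix_bytes(n) / 2**30
    print(f"{n:>6} species: {gib:.2f} GiB per float32 copy")
```

With float64 the figures double, so at tens of thousands of species a handful of intermediate matrices is enough to exhaust typical workstation memory.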

sjSDM_model - hide or push?

At the moment, sjSDM_model is only / mostly? used internally to build the model. I wonder - is it distracting to have this open, and should we rather hide it? If we're not hiding it, I would add it a bit more prominently to the help and link it to other functions.

Line breaks in help

Any idea why I get these line breaks in the help files?

DNN support

Implementation of functional DNN api (same style as in rstudio-keras)

installation problems

I'm having trouble installing on macOS, which is weird because i previously installed without problem (and then it stopped working). I removed both conda env folders (r-sjSDM and sjSDM_env) before installing, and i only have miniconda2.

conda create --name sjSDM_env python=3.7
conda activate sjSDM_env
conda install pytorch torchvision cpuonly -c pytorch # cpu
devtools::install_github("https://github.com/TheoreticalEcology/s-jSDM", subdir = "sjSDM", build_vignettes = TRUE, build_manual = TRUE)

library(sjSDM)
install_sjSDM(version = "cpu", conda_python_version = "3.7")

I get this error:

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - torchvision
  - torch

Current channels:

  - https://conda.anaconda.org/conda-forge/osx-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Installation failed... Try to install manually PyTorch (install instructions: https://github.com/TheoreticalEcology/s-jSDM
If the installation still fails, please report the following error on https://github.com/TheoreticalEcology/s-jSDM/issues
one or more Python packages failed to install [error code 1]

Traits - fourth corner model?

We could provide the option to include traits following the fourth corner model from Brown et al., 2015 (The GLLVM model does it this way).

Setting a multivariate penalty (prior) on the environmental predictors would be another option (I think Hmsc does it this way), but I think the former would be preferable because any type of penalty would interfere with the p-values.

Various issues / questions about the installation

For me,

install_sjSDM()
install_sjSDM(method = "auto")

produces error

Error in install_sjSDM() : object 'package' not found

Also, https://github.com/TheoreticalEcology/s-jSDM#install-instructions doesn't seem to be up to date

install diagnostic

Hi Max, I wonder if we should merge install diagnostic with installation_help. Seems to me logical to have both functions together.

Also, possibly, I wonder if check_dependencies or so would be a better name for the function?

Space

What shall we do about spatial predictors?

a) No api changes but provide an example with additional predictors in the env matrix
b) provide an extra argument in sjSDM for spatial predictors

dependency installation issue in 0.1.8 - missing madgrad

I reinstalled s-jSDM this morning to get the importance update. Loading the package gave this readout:

── Attaching sjSDM ──────────────────────────────────────────────────── 0.1.8 ──
✔ torch
✔ torch_optimizer
✔ pyro
✖ madgrad

Torch or other dependencies not found:
1. Use install_sjSDM() to install Pytorch and conda automatically
2. Installation trouble shooting guide: ?installation_help
3. If 1) and 2) did not help, please create an issue on https://github.com/TheoreticalEcology/s-jSDM/issues (see ?install_diagnostic)

I tried install_sjSDM() with version = "cpu" which successfully added madgrad, but removed pyro. I also tried version = c("cpu","gpu") and version = "gpu" but that didn't change anything. install_sjSDM() says that all requirements are satisfied, including pyro, but the package still won't load successfully.

CPU dtype="float64" error

A user encountered overflow problems and the use of doubles should help here, but:

> com = simulate_SDM(env = 3L, species = 5L, sites = 100L)
> ## fit model:
> model = sjSDM(Y = com$response,env = com$env_weights, iter = 2L, dtype = "float64") 
Iter: 0/2   0%|          | [00:00, ?it/s]
 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  RuntimeError: expected scalar type Float but found Double
Timing stopped at: 0.018 0 0.019

Install error sh: line 1: 79672 Killed: 9

Some users get error messages such as the following during install

sh: line 1: 79672 Killed: 9               R_TESTS= '/Library/Frameworks/R.framework/Resources/bin/R' --no-save --no-restore --no-echo 2>&1 < '/var/folders/m_/zb7c8p_13k59p3zrpw84c4hm0000gq/T//Rtmpc0CxFQ/file13725537e4ec1'
ERROR: loading failed
* removing ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library/sjSDM’
Warning in install.packages :
  installation of package ‘/Users/pedro/Desktop/sjSDM_0.0.6.9000.tar.gz’ had non-zero exit status
