noaa-nwfsc / zoid

Trinomial mixture models in Stan, for fitting to compositional data with 0s

Home Page: https://noaa-nwfsc.github.io/zoid/

License: Other

R 15.28% C++ 80.40% Stan 4.26% CSS 0.05%

mixture-models r-package stan cran nwfsc-cb

zoid's Introduction

ZOID

zoid implements zero-and-one inflated Dirichlet regression (also known as trinomial mixture models) in a Bayesian framework, using Stan.

You can view the pkgdown package information here: https://noaa-nwfsc.github.io/zoid/

You can install the development version of the package with:

remotes::install_github("noaa-nwfsc/zoid")

NOAA Disclaimer

This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project code is provided on an ‘as is’ basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.

U.S. Department of Commerce | National Oceanic and Atmospheric Administration | NOAA Fisheries

zoid's People

Contributors

Stargazers

Watchers

Forkers

eriqande simeond

zoid's Issues

initialization failed for metabarcoding diet data

I'd like to use zoid to model diet data from DNA metabarcoding (data_matrix) across predator sizes (measured as crab carapace width, in mm; CW_mm in design_matrix). But I am having trouble getting past this initialization error:

fit_3_prey <- fit_zoid(formula = y ~ CW_mm,
                       design_matrix = design_matrix,
                       data_matrix = as.matrix(data_matrix) / 1000,
                       overdispersion = TRUE,
                       chains = 1,   # just for testing
                       iter = 500)   # just for testing

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or re parameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

In my starting data set, I have 58 individuals and 61 different prey species; I removed rare taxa that occurred in fewer than three individuals.

I tried aggregating species by family (58 individuals x 57 classes), class (58 individuals x 26 classes), and phylum (58 individuals x 26 phyla) to reduce the number of "0" values in the matrix and the spread between the minimum and maximum read values, but I still end up with the same error.

The last time this happened I was able to divide the data matrix by 100 to make the error go away, but that doesn't seem to be working this time.

This is the distribution of size, in case it's helpful!

issues with random effect + question output get_pars()

Heya,
thanks for your nice package.
I tried to fit a model with a random effect, but I got an error.
I also tried the example you provided, which is weird, because in both cases (my data or your example) every row has a value.

set.seed(123)
y <- matrix(runif(99,1,4), ncol=3)
design <- data.frame("fac" = sample(letters[1:5], size=nrow(y), replace=TRUE))
design$fac <- as.factor(design$fac)
fit<-fit_zoid(formula=~(1|fac),design_matrix=design,data_matrix=y,chains=1,iter=100)
Error in na.fail.default(list(`1 | fac` = c(NA, NA, NA, NA, NA, NA, NA,  : 
  missing values in object
In addition: Warning message:
In Ops.factor(1, fac) : ‘|’ not meaningful for factors

While I am here, I also wanted to ask: in the $betas output of get_pars(), how can I tell which of my variables each row refers to?
For example, if I have Y = cbind(var1, var2, var3) and I look at $betas from get_pars(), does m = 1 correspond to var1, var2, or var3? How can we map each value of m to a variable (e.g. which variable is used as the reference)?

Thank you!

Add random effects

Tweak format to use lmer/glmer syntax and parse to Stan

trouble with installation

I'm excited to try the new release with some data that are grouped by site habitat, but I keep getting a non-zero exit status warning when I use devtools to install zoid from github. It looks like there might be an error in the code for model_dirichregmod? Or maybe I need to try updating another package / R permissions on my desktop first? I'm in R version 4.1.3 (2022-03-10). Thanks for the help!

> devtools::install_github("noaa-nwfsc/zoid")

The downloaded source packages are in
	‘C:\Users\mfisher5\AppData\Local\Temp\RtmpOErnoU\downloaded_packages’
-- R CMD build ------------------------------------------------------------------------------------------------------------------------
v  checking for file 'C:\Users\mfisher5\AppData\Local\Temp\RtmpOErnoU\remotes2ae820826225\noaa-nwfsc-zoid-c82b045/DESCRIPTION' ...
-  preparing 'zoid': (714ms)
v  checking DESCRIPTION meta-information ... 
-  cleaning src
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  building 'zoid_1.3.1.tar.gz'
   Warning: file 'zoid/configure' did not have execute permissions: corrected
   
* installing *source* package 'zoid' ...
** using staged installation
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
 error in 'model_dirichregmod' at line 7, column 2
  -------------------------------------------------
     5:   int N_covar; // number of covariates in design matrix X
     6:   matrix[N_samples, N_covar] design_X;
     7:   array[N_bins,N_bins-1] int prod_idx;
         ^
     8:   int overdisp; // whether or not to include overdispersion term
  -------------------------------------------------

PARSER EXPECTED: <one of the following:
  a variable declaration, beginning with type,
      (int, real, vector, row_vector, matrix, unit_vector,
       simplex, ordered, positive_ordered,
       corr_matrix, cov_matrix,
       cholesky_corr, cholesky_cov
  or '}' to close variable declarations>
Error in rstan::stanc(file_name, allow_undefined = TRUE, obfuscate_model_name = FALSE,  : 
  failed to parse Stan model 'dirichregmod' due to the above error.
Calls: <Anonymous> -> sapply -> lapply -> FUN -> <Anonymous>
Execution halted
ERROR: configuration failed for package 'zoid'
* removing 'C:/Users/mfisher5/Documents/R/R-4.1.3/library/zoid'
* restoring previous 'C:/Users/mfisher5/Documents/R/R-4.1.3/library/zoid'
Warning message:
In i.p(...) :
  installation of package ‘C:/Users/mfisher5/AppData/Local/Temp/RtmpOErnoU/file2ae8f3c3fa4/zoid_1.3.1.tar.gz’ had non-zero exit status
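
One possible reading of the log above: the declaration on line 7 uses the `array[...] int` syntax, which was introduced in Stan 2.26, and the parser's list of expected types does not include `array`. That would be consistent with an older rstan being installed. The following is a minimal sketch for checking this, assuming rstan version numbers track the Stan version they bundle; the helper name `supports_array_syntax` is hypothetical and not part of zoid or rstan.

```r
# Hypothetical helper: TRUE if a given rstan version string is new enough to
# parse the `array[...] int` declaration syntax (introduced in Stan 2.26).
supports_array_syntax <- function(version) {
  package_version(version) >= package_version("2.26.0")
}

# Check the locally installed rstan, if there is one (does not load rstan).
if (nzchar(system.file(package = "rstan"))) {
  installed <- as.character(utils::packageVersion("rstan"))
  message("rstan ", installed, ": array syntax supported = ",
          supports_array_syntax(installed))
} else {
  message("rstan is not installed")
}
```

If the check reports FALSE, updating rstan and StanHeaders before reinstalling zoid may be worth trying.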

Rejecting initial value when sampling

In some cases, there's a warning from Stan thrown:

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.

and sampling proceeds normally. In other cases, the initial sampling doesn't produce valid estimates and initialization fails:

Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error : Initialization failed."
error occurred during calling the sampler; sampling not done

The issue in these cases is often that sample sizes (e.g. the number of reads for genetic data) are very large and/or there are many 0s. The solution here is to manually specify the prior standard deviation of the beta parameters using the prior_sd argument, e.g.

fit <- fit_zoid(formula = y ~ 1, ..., prior_sd = 1)

Sometimes several alternative values of prior_sd may need to be explored.
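
Exploring several prior_sd values can be scripted. The sketch below is one way to do it, not something zoid provides: `try_prior_sds`, its `ok` predicate, and the candidate values are all hypothetical. A stub stands in for the real model fit so the example runs without Stan.

```r
# Hypothetical helper: try candidate prior SDs in order and return the first
# result that `ok()` accepts. `fit_fun` takes a single prior_sd value.
try_prior_sds <- function(fit_fun, prior_sds = c(2, 1, 0.5, 0.1),
                          ok = function(fit) !is.null(fit)) {
  for (s in prior_sds) {
    # Treat an R error during fitting as a failed attempt.
    fit <- tryCatch(fit_fun(s), error = function(e) NULL)
    if (!is.null(fit) && ok(fit)) {
      return(list(prior_sd = s, fit = fit))
    }
  }
  stop("no candidate prior_sd produced a usable fit")
}

# Stand-in for a real model fit: it "fails" for any prior_sd above 0.5.
stub_fit <- function(s) {
  if (s > 0.5) stop("Initialization failed.")
  list(prior_sd = s)
}
res <- try_prior_sds(stub_fit)
print(res$prior_sd)

# With zoid, fit_fun might look like this (not run; design_matrix and
# data_matrix are assumed to exist already):
# fit_fun <- function(s) zoid::fit_zoid(formula = y ~ 1,
#                                       design_matrix = design_matrix,
#                                       data_matrix = data_matrix,
#                                       prior_sd = s, chains = 1, iter = 500)
```

Because a failed initialization may be reported as a printed message rather than an R error, the `ok` predicate is left for the caller to fill in with whatever check on the returned object makes sense.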

Hierarchical version?

This might be too hard to code in a general way, but might we/you make zoid hierarchical?

For example, I have a dataset with 4 discrete temperature treatments (10, 13, 15, and 18 degrees C), with many observations in each; each observation is a set of proportions divided among 4 possible categories. If I treat temperature as a continuous predictor, I'm pretty sure zoid (and, for that matter, a linear model or anything similar) will have the pseudoreplication problem: each observation is treated as independent, when really what I should be doing is asking whether the mean of each treatment is responding to the predictor variable (here, temperature).

If zoid could handle the syntax of stan_glm (or similar), we could do something like formula = ~ treatment + (1 | group), and make it work for a more general class of problems. Just an idea.
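
To make the idea concrete, here is a rough sketch of what the design data and an lmer-style formula could look like for the temperature example. The column names, the number of observations per treatment, and the exact formula are illustrative assumptions; whether zoid accepts this formula depends on the installed version.

```r
# Illustrative design for the experiment described above: four temperature
# treatments with several observations each (n_per_treatment is made up).
n_per_treatment <- 5
temps <- c(10, 13, 15, 18)

design <- data.frame(
  temp_c = rep(temps, each = n_per_treatment)  # continuous predictor
)
design$treatment <- factor(design$temp_c)      # grouping factor with 4 levels

# lmer-style formula: fixed slope for temperature plus a random intercept
# per treatment. Building a formula object does not evaluate it.
f <- ~ temp_c + (1 | treatment)

str(design)
print(f)
```

The random intercept lets observations within a treatment share a treatment-level deviation, which is the usual way to address the pseudoreplication concern raised above.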
