Trinomial mixture models in Stan, for fitting to compositional data with 0s

Home Page: https://noaa-nwfsc.github.io/zoid/

License: Other

zoid's Introduction

ZOID

zoid implements zero-and-one inflated Dirichlet regression (also known as trinomial mixture models) in a Bayesian framework, using Stan.

You can view the pkgdown package information here: https://noaa-nwfsc.github.io/zoid/

You can install the development version of the package with:

remotes::install_github("noaa-nwfsc/zoid")

NOAA Disclaimer

This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project code is provided on an ‘as is’ basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.

U.S. Department of Commerce | National Oceanic and Atmospheric Administration | NOAA Fisheries

zoid's People

Contributors

andrjohns, eeholmes, ericward-noaa, oleshelton, willsatterthwaite

Forkers

eriqande simeond

zoid's Issues

initialization failed for metabarcoding diet data

I'd like to use zoid to model diet data from DNA metabarcoding (data_matrix) across predator sizes (measured as crab carapace width, in mm; CW_mm in design_matrix). But I am having trouble getting past this initialization error:

fit_3_prey <- fit_zoid(formula = y ~ CW_mm,
                       design_matrix = design_matrix,
                       data_matrix = as.matrix(data_matrix)/1000,
                       overdispersion = TRUE,
                       chains = 1,   # just for testing
                       iter = 500)   # just for testing
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or re parameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

In my starting data set, I have 58 individuals and 61 different prey species; I removed rare taxa that occurred in fewer than three individuals.
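The rare-taxa filter described above can be sketched in base R as follows. This is only an illustration: the `reads` matrix, its dimension names, and the helper `filter_rare_taxa()` are made up for the example and are not part of zoid.

```r
# Hypothetical read-count matrix: rows = individuals, columns = prey taxa.
reads <- matrix(
  c(120,  0, 0, 30,
      0, 15, 0, 45,
     60,  0, 0, 10,
     80,  0, 5,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("crab", 1:4),
                  c("taxonA", "taxonB", "taxonC", "taxonD"))
)

# Keep only taxa detected (reads > 0) in at least `min_individuals` individuals.
filter_rare_taxa <- function(x, min_individuals = 3) {
  n_detected <- colSums(x > 0)
  x[, n_detected >= min_individuals, drop = FALSE]
}

filtered <- filter_rare_taxa(reads, min_individuals = 3)

# Fraction of zero entries remaining after filtering.
zero_fraction <- mean(filtered == 0)
```

Checking `zero_fraction` before and after filtering gives a quick sense of how sparse the matrix still is.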

To reduce the number of "0" values in the matrix, and the spread between the minimum and maximum read values, I tried aggregating species by family (58 individuals x 57 classes), class (58 individuals x 26 classes), and phylum (58 individuals x 26 phyla), but I still end up with the same error.
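For reference, aggregating columns to a higher taxonomic rank might look like the sketch below. The species-to-family lookup `family_of` and the helper `aggregate_taxa()` are hypothetical; only base R's `rowsum()` is used.

```r
# Hypothetical read counts: rows = individuals, columns = species.
reads <- matrix(
  c(10,  0, 5, 0,
     0, 20, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("crab1", "crab2"), c("sp1", "sp2", "sp3", "sp4"))
)

# Hypothetical lookup from species name to family name.
family_of <- c(sp1 = "famX", sp2 = "famX", sp3 = "famY", sp4 = "famY")

# rowsum() sums rows by group, so transpose to sum columns, then transpose back.
aggregate_taxa <- function(x, groups) {
  t(rowsum(t(x), group = groups[colnames(x)]))
}

by_family <- aggregate_taxa(reads, family_of)
```

Total reads per individual are unchanged by the aggregation, which is a useful sanity check.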

The last time this happened I was able to divide the data matrix by 100 to make the error go away, but that doesn't seem to be working this time.

This is the distribution of size, in case it's helpful!
[image: distribution of crab carapace width (CW_mm)]

issues with random effect + question output get_pars()

Heya,
thanks for your nice package,
I have tried to fit a model with a random effect, but I got an error.
I also tried the example you provided and got the same error, which is weird, because in both cases (my data and your example) every row has a value.

set.seed(123)
y <- matrix(runif(99,1,4), ncol=3)
design <- data.frame("fac" = sample(letters[1:5], size=nrow(y), replace=TRUE))
design$fac <- as.factor(design$fac)
fit<-fit_zoid(formula=~(1|fac),design_matrix=design,data_matrix=y,chains=1,iter=100)
Error in na.fail.default(list(`1 | fac` = c(NA, NA, NA, NA, NA, NA, NA,  : 
  missing values in object
In addition: Warning message:
In Ops.factor(1, fac) : ‘|’ not meaningful for factors

While I am here, I also wanted to ask: in the $betas output of get_pars(), how can I recognize my variables?
For example, if I have Y = cbind(var1, var2, var3) and I see m = 1 in $betas from get_pars(), is that var1, var2, or var3? How can we tell which variable each m refers to (e.g. which variable is used as the reference)?

Thank you!

trouble with installation

I'm excited to try the new release with some data that are grouped by site habitat, but I keep getting a non-zero exit status warning when I use devtools to install zoid from github. It looks like there might be an error in the code for model_dirichregmod? Or maybe I need to try updating another package / R permissions on my desktop first? I'm in R version 4.1.3 (2022-03-10). Thanks for the help!

> devtools::install_github("noaa-nwfsc/zoid")

The downloaded source packages are in
	‘C:\Users\mfisher5\AppData\Local\Temp\RtmpOErnoU\downloaded_packages’
-- R CMD build ------------------------------------------------------------------------------------------------------------------------
v  checking for file 'C:\Users\mfisher5\AppData\Local\Temp\RtmpOErnoU\remotes2ae820826225\noaa-nwfsc-zoid-c82b045/DESCRIPTION' ...
-  preparing 'zoid': (714ms)
v  checking DESCRIPTION meta-information ... 
-  cleaning src
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  building 'zoid_1.3.1.tar.gz'
   Warning: file 'zoid/configure' did not have execute permissions: corrected
   
* installing *source* package 'zoid' ...
** using staged installation
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
 error in 'model_dirichregmod' at line 7, column 2
  -------------------------------------------------
     5:   int N_covar; // number of covariates in design matrix X
     6:   matrix[N_samples, N_covar] design_X;
     7:   array[N_bins,N_bins-1] int prod_idx;
         ^
     8:   int overdisp; // whether or not to include overdispersion term
  -------------------------------------------------

PARSER EXPECTED: <one of the following:
  a variable declaration, beginning with type,
      (int, real, vector, row_vector, matrix, unit_vector,
       simplex, ordered, positive_ordered,
       corr_matrix, cov_matrix,
       cholesky_corr, cholesky_cov
  or '}' to close variable declarations>
Error in rstan::stanc(file_name, allow_undefined = TRUE, obfuscate_model_name = FALSE,  : 
  failed to parse Stan model 'dirichregmod' due to the above error.
Calls: <Anonymous> -> sapply -> lapply -> FUN -> <Anonymous>
Execution halted
ERROR: configuration failed for package 'zoid'
* removing 'C:/Users/mfisher5/Documents/R/R-4.1.3/library/zoid'
* restoring previous 'C:/Users/mfisher5/Documents/R/R-4.1.3/library/zoid'
Warning message:
In i.p(...) :
  installation of package ‘C:/Users/mfisher5/AppData/Local/Temp/RtmpOErnoU/file2ae8f3c3fa4/zoid_1.3.1.tar.gz’ had non-zero exit status
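
One possible reading of the parser error above: the `array[...] int` declaration style was, as far as I know, introduced in Stan 2.26, so an older Stan parser would reject line 7. The sketch below only compares version strings; the helper name `supports_array_syntax()` and the 2.26.0 threshold are assumptions, not something the zoid documentation states.

```r
# Assumption: `array[N, M] int x;` declarations need Stan >= 2.26.0.
supports_array_syntax <- function(stan_version, minimum = "2.26.0") {
  package_version(stan_version) >= package_version(minimum)
}

# Example with literal version strings.
old_ok <- supports_array_syntax("2.21.0")
new_ok <- supports_array_syntax("2.26.1")

# With rstan installed, the installed Stan version could be checked like this:
# supports_array_syntax(rstan::stan_version())
```

If the check returns FALSE, updating rstan (and StanHeaders) before reinstalling zoid may be worth trying.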

Rejecting initial value when sampling

In some cases, there's a warning from Stan thrown:

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.

and sampling proceeds normally. In other cases, the initial sampling doesn't produce valid estimates and initialization fails:

Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

The issue in these cases is often that sample sizes (e.g. the number of reads for genetic data) are very large and/or there are many 0s. The solution here is to manually specify the prior standard deviation of the beta parameters using the prior_sd argument, e.g.

fit <- fit_zoid(formula = y ~ 1, ..., prior_sd = 1)

Sometimes, several alternative values of prior_sd may need to be explored.
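
Exploring several values can be scripted. The sketch below is generic: `first_working_prior_sd()`, `fit_fun`, `ok_fun`, and the stand-in `fake_fit()` are hypothetical names for illustration, and how to detect a failed fit from a real fit_zoid() result is left to the caller as an assumption.

```r
# Try candidate prior_sd values in order and return the first acceptable fit.
# fit_fun: function(prior_sd) returning a fit object (may throw an error).
# ok_fun:  function(fit) returning TRUE when the fit is usable.
first_working_prior_sd <- function(fit_fun, ok_fun, prior_sds = c(1, 0.5, 0.1)) {
  for (s in prior_sds) {
    fit <- tryCatch(fit_fun(s), error = function(e) NULL)
    if (!is.null(fit) && isTRUE(ok_fun(fit))) {
      return(list(prior_sd = s, fit = fit))
    }
  }
  NULL  # no candidate produced a usable fit
}

# Stand-in fitter for illustration only: errors for wide priors,
# and reports a successful "sampling" only when prior_sd <= 1.
fake_fit <- function(prior_sd) {
  if (prior_sd > 2) stop("Initialization failed.")
  list(sampled = prior_sd <= 1, prior_sd = prior_sd)
}

result <- first_working_prior_sd(fake_fit, function(f) f$sampled,
                                 prior_sds = c(5, 2, 1, 0.5))
```

With zoid, `fit_fun` might be something like `function(s) fit_zoid(..., prior_sd = s)`. Since Stan can report an initialization failure as a printed message instead of an R error, `ok_fun` would need to inspect the returned object rather than rely on `tryCatch()` alone.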

Hierarchical version?

This might be too hard to code in a general way, but might we/you make zoid hierarchical?

For example, I have a dataset with 4 discrete temperature treatments (10, 13, 15, and 18 degrees C), with many observations in each; each observation is a set of proportions divided among 4 possible categories. If I treat temperature as a continuous predictor, I'm pretty sure zoid (and, for that matter, a linear model or anything similar) will have the pseudoreplication problem: each observation is treated as independent, when really what I should be doing is asking whether the mean of each treatment is responding to the predictor variable (here, temperature).

If zoid could handle the syntax of stan_glm (or similar), we could do something like formula = ~ treatment + (1 | group), and make it work for a more general class of problems. Just an idea.
