japal / zcompositions

Imputation of zeros, nondetects and missing data in compositional data sets

R 100.00%

compositional-data missing-data censored-data imputation-methods nondetection r-package

zcompositions's Introduction

zCompositions

Imputation of Zeros, Nondetects and Missing Data in Compositional Data Sets

zcompositions's Issues

Unable to replicate result of cmultRepl(); An error "missing value where TRUE/FALSE needed" appears after update

Hi all,

I used the function cmultRepl() about half a year ago on a dataset without any issue (just warnings about too many zeros).

However, I now get the following error when I re-run the same code on the same dataset:
Error in if (any(checkNumZerosRow/ncol(X) >= z.warning)) { : missing value where TRUE/FALSE needed
I believe the error may be related to the latest release or a recent update of the package.

Here is the code I used:
df_2 <- cmultRepl(df_1, method = "GBM", output = "p-counts")

I also checked that the dataset contains no NA or other missing values, and that no row or column consists entirely of zeros.
I am not sure whether this information is relevant.

Any advice would be highly appreciated!
Thank you.
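
One way to narrow this down is to reproduce the failing check outside the package. The sketch below uses base R only and assumes, based on the error message, that the check compares per-row zero proportions with z.warning. X is a made-up toy data set, not the reporter's data, and the code is not the package's own implementation. It shows that a single NA or NaN in a row is enough to make the condition evaluate to NA.

```r
# Toy data (hypothetical): one NA hidden among the counts
X <- data.frame(a = c(1, 0, 3), b = c(0, NA, 2), c = c(4, 5, 0))
z.warning <- 0.8

# Proportion of zeros per row; the row containing NA yields NA
zero_prop_row <- rowSums(X == 0) / ncol(X)

# any() over FALSE and NA values returns NA, which if() cannot handle
cond <- any(zero_prop_row >= z.warning)

msg <- tryCatch(
  {
    if (cond) "too many zeros" else "ok"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Quick checks worth running on the real data before calling cmultRepl()
na_cells <- which(is.na(as.matrix(X)), arr.ind = TRUE)
non_numeric_cols <- names(X)[!vapply(X, is.numeric, logical(1))]
```

If na_cells is empty and non_numeric_cols has length zero on the real data, the NA is probably produced inside the function rather than present in the input, which would be worth stating in the report.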

missing value where TRUE/FALSE needed

When using rather sparse input tables for cmultRepl(), the function often throws the error:

Error in if (any(X2[i, z] > colmins[z])) {: missing value where TRUE/FALSE needed
Traceback:

1. cmultRepl(mtx, output = "p-counts")

It would be helpful to at least provide a more informative error on what the problem is.
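
Until the package reports something more specific, a pre-flight check on the input can make the cause easier to spot. The helper below is a hypothetical sketch in base R: check_count_table is not a zCompositions function, and the conditions it tests are assumptions about what commonly breaks zero-replacement routines, not a list taken from the package.

```r
# Hypothetical helper (not part of zCompositions): describe likely problems
# in a count table before it is passed to an imputation function.
check_count_table <- function(X) {
  X <- as.matrix(X)
  problems <- character(0)
  if (!is.numeric(X)) {
    return("non-numeric entries")
  }
  if (anyNA(X)) {
    problems <- c(problems, "NA or NaN entries")
  }
  if (any(X < 0, na.rm = TRUE)) {
    problems <- c(problems, "negative entries")
  }
  if (any(rowSums(X, na.rm = TRUE) == 0)) {
    problems <- c(problems, "rows that sum to zero")
  }
  if (any(colSums(X, na.rm = TRUE) == 0)) {
    problems <- c(problems, "columns that sum to zero")
  }
  problems
}

# Toy sparse table (hypothetical): first row and second column are all zero
mtx <- matrix(c(0, 0, 0,
                2, 0, 1,
                0, 0, 5), nrow = 3, byrow = TRUE)
problems <- check_count_table(mtx)
print(problems)
```

An empty result does not guarantee that cmultRepl() will succeed; it only rules out the conditions listed in the helper.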

Unexpected Warning and Error Using lrEM Function with High z.warning Threshold and z.delete Set to FALSE

Hi! First of all, thank you for creating this package!

I'm encountering an issue with the lrEM function from the zCompositions package when handling a dataset containing a significant amount of zeros. The warnings suggest that columns and rows with more than 80% zeros/unobserved values are being deleted, even though I have explicitly set z.warning to 0.992 and z.delete to FALSE. Additionally, the process results in an error related to undefined columns being selected.

Function Call and Warning Messages:

Here is the function call I used:

lrEM(df, 
     label = 0, 
     dl = rep(10, ncol(df)), 
     rob = TRUE, 
     ini.cov = "multRepl", 
     z.warning = 0.992, 
     z.delete = FALSE,
     closure = 1440)

And these are the warning messages received:

Warning: Column no. 4 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Column no. 5 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Column no. 8 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Column no. 10 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Column no. 11 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Warning: Row no. 513 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Row no. 1482 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Row no. 1503 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Row no. 2072 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Row no. 2169 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Row no. 2515 containing >80% zeros/unobserved values deleted (see arguments z.warning and z.delete).
Error in `[.data.frame`(X.mr, , obs[[npat]]) : undefined columns selected

Expected Behavior:

With z.delete explicitly set to FALSE and a z.warning threshold of 0.992, my expectation was that no columns or rows would be deleted based on the proportion of zeros/unobserved values, and that I would not receive warnings indicating otherwise.

Observed Behavior:

Warning messages were received indicating the deletion of columns and rows with more than 80% zeros/unobserved values, contrary to the z.delete = FALSE setting.
An error occurred related to undefined columns selected, which might be a result of these unexpected deletions.

Additional Context:

My dataset includes a considerable amount of zero values, and retaining columns/rows with high proportions of zeros is crucial for my analysis.
I'm concerned that the deletion of these columns/rows could impact the integrity and outcome of my analysis.

I would greatly appreciate any guidance on why these deletions and warnings are occurring despite the z.delete setting, as well as any advice on resolving the issue or if there's a potential bug in the function handling.

Thank you for your time and assistance! :)
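
To see which columns and rows fall on either side of the two thresholds mentioned above, the zero proportions can be computed directly. This is a base R sketch on made-up data; it assumes that the warnings count exact zeros per column and per row, which matches the message text but is not confirmed here against the package source.

```r
# Toy data (hypothetical): column x is about 83% zeros
df <- data.frame(
  x = c(0, 0, 0, 0, 0, 3),
  y = c(1, 2, 3, 4, 5, 6),
  z = c(7, 0, 9, 1, 2, 3)
)

# Proportion of zeros per column and per row
col_zero_prop <- colMeans(df == 0)
row_zero_prop <- rowMeans(df == 0)

# Columns exceeding the 0.8 value quoted in the warnings
flagged_080 <- names(col_zero_prop)[col_zero_prop > 0.8]

# Columns exceeding the 0.992 value passed as z.warning
flagged_0992 <- names(col_zero_prop)[col_zero_prop > 0.992]

print(flagged_080)
print(flagged_0992)
```

If the columns named in the warnings match flagged_080 rather than flagged_0992 on the real data, that would indicate that some step is using 0.8 instead of the supplied z.warning value.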

How to handle zeros whilst ignoring NA in lrEM or lrEMplus

Hi All,
I am using lrEM on a dataset that contains both zeros and NAs, but I only want to change the zeros. When I use lrEMplus I can't seem to specify only zeros, and when I use lrEM it gives me the error message

NA values not labelled as censored or missing values were found in the data set

From this I assume I can somehow censor the NAs (I don't want to remove every row containing an NA, just exclude the NAs from this function), but I can't figure out how to do this or whether it is possible.

Any help would be greatly appreciated
Thanks
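
One possible workaround, sketched below in base R, is to record where the NAs are, run an imputation that handles both zeros and NAs, and then put the NAs back. impute_stub is a deliberately crude stand-in written for this example and is not a zCompositions function; in practice a function that accepts both zeros and missing values would take its place. Note the caveat that the values imputed for zeros may still be influenced by the temporary NA imputations.

```r
# Toy data (hypothetical): zeros to impute, NAs to keep
X <- data.frame(
  a = c(5, 0, 8, NA),
  b = c(2, 3, 0, 4),
  c = c(NA, 6, 7, 9)
)

# Remember where the NAs are before any imputation
na_mask <- is.na(X)

# Stand-in imputer (an assumption, NOT a zCompositions function):
# NAs become the column mean of positive values, zeros become half
# the smallest positive value in the column.
impute_stub <- function(d) {
  d[] <- lapply(d, function(col) {
    pos <- col[!is.na(col) & col > 0]
    col[is.na(col)] <- mean(pos)
    col[col == 0] <- min(pos) / 2
    col
  })
  d
}

X_imp <- impute_stub(X)

# Restore the original NAs so only the zeros end up changed
X_imp[na_mask] <- NA
print(X_imp)
```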

issue with NAs in lrEM convergence

Hello! I come from a Python background, so this issue has been tough for me to parse.

The error happens when checking convergence on line 225 of lrEM:
if ((max(c(Mdif,Cdif)) < tolerance) | (niters == max.iter)) iter_again <- 0

From picking it apart, I have NA values in my M and C, which cause Mdif and Cdif to be NA as well.

The error is: "Error in if ((max(c(Mdif, Cdif)) < tolerance) | (niters == max.iter)) iter_again <- 0: missing value where TRUE/FALSE needed"

thank you for any help!
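
The way NA propagates through that condition can be checked in isolation. The values below are made up for illustration; only the shape of the expression comes from the line quoted above. The guard at the end is a suggestion for debugging, not code from the package.

```r
# Hypothetical values standing in for one EM iteration
tolerance <- 1e-4
max.iter <- 50
niters <- 3

Mdif <- NA_real_  # NA in the mean estimates propagates to the difference
Cdif <- 0.02

# max() returns NA as soon as any element is NA
converged <- max(c(Mdif, Cdif)) < tolerance

# NA | FALSE is NA, so if() has no TRUE/FALSE to act on
stop_now <- converged | (niters == max.iter)
print(stop_now)

# NA | TRUE is TRUE, so the error disappears only on the last iteration
stop_at_limit <- converged | (max.iter == max.iter)
print(stop_at_limit)

# Debugging guard: report the NA explicitly instead of failing inside if()
status <- if (anyNA(c(Mdif, Cdif))) {
  "NA in Mdif/Cdif: inspect M and C from the previous step"
} else if (stop_now) {
  "stop"
} else {
  "continue"
}
print(status)
```

For readers coming from Python: the single | in R is the element-wise operator and returns NA for NA | FALSE, which is what if() rejects here.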

Changing cmultRepl function's behaviour drastically in a patch release?

As its name indicates, the z.warning parameter should only produce a warning. It should not trigger any action, such as implicitly deleting records from the dataset when the proportion of zeros is above the given z.warning threshold.

Trying to reconcile multRepl() with cmultRepl(method="CZM")

I've been trying to run cmultRepl(method="CZM") on some very sparse microbiome data where the sample totals vary widely. I am finding that the output values are negative for some datasets. The Bayesian methods all fail with the datasets I'm using, likely due to their sparseness.

I wonder if multRepl is meant to be similar to cmultRepl(method="CZM")? I noticed that multRepl has the nice feature of checking for negative output values and letting the user supply a closure value to make sure that there are no negative output values. When I run the following, I get different outputs:

multRepl(LPdata, label = 0, imp.missing = TRUE, closure = 10^6)$Cu

cmultRepl(LPdata, output = "p-counts", method="CZM")$Cu

It's likely I misunderstand the differences between these two functions. If I'm asking an impossible question, then is there a way to "correct" the output of cmultRepl to not get negative values?
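
A small worked example may help explain where negative values can come from. The function below is a simplified, generic multiplicative replacement written for illustration; it is not the code of multRepl or cmultRepl, and delta (the proportion assigned to each zero) is a hypothetical value chosen by hand. In such a scheme the non-zero parts are scaled by one minus the total imputed mass, so they turn negative as soon as the number of zeros times delta exceeds 1. Whether this is what happens with CZM on a given data set depends on how the package sets the replacement value, which should be checked against its documentation.

```r
# Simplified multiplicative replacement for one row of counts.
# Zeros receive proportion 'delta'; non-zero proportions are shrunk
# so that the row still sums to 1.
mult_replace_row <- function(counts, delta) {
  p <- counts / sum(counts)
  zero <- counts == 0
  p[zero] <- delta
  p[!zero] <- p[!zero] * (1 - sum(zero) * delta)
  p
}

# Few zeros and a small delta: every part stays positive
dense <- mult_replace_row(c(40, 0, 60), delta = 0.01)
print(dense)

# Nine zeros with delta = 0.15: imputed mass is 1.35, which exceeds 1,
# so the two observed parts become negative
sparse <- mult_replace_row(c(1, rep(0, 9), 1), delta = 0.15)
print(sparse)
```

Under these assumptions, very sparse rows with small totals are the ones at risk, which is consistent with the description of the microbiome data above.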