zdebruine / rcppml

Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more

License: GNU General Public License v2.0

R 2.33% C++ 97.32% C 0.35%

matrix-factorization clustering nmf sparse-matrix r rcpp rcppeigen

rcppml's Issues

`SparseMatrix::t()` doesn't work unless Matrix is fully attached

In a fresh R session:

library(modeldata)
data("Chicago")

df <- Chicago[, c("Austin", "Quincy_Wells")]
x <- t(as.matrix(df))
x <- Matrix::Matrix(x, sparse = TRUE)

RcppML::nmf(
  A = x,
  k = 7,
  L1 = c(0.001, 0.001), 
  verbose = FALSE, 
  seed = 86021L, 
  nonneg = TRUE
)
#> Error in RcppML::nmf(A = x, k = 7, L1 = c(0.001, 0.001), verbose = FALSE, : Cannot convert object to an environment: [type=character; target=ENVSXP].

If you library(Matrix), then it works.

The problem is with this line here:

RcppML/inst/include/RcppML/SparseMatrix.hpp

Line 82 in 134be7e

Rcpp::Environment base("package:Matrix");

It can't seem to look up the Matrix package, so it returns the error from above.

I'm not entirely sure how Rcpp::Environment works, but I assume it is trying to find that environment on the search path, as returned by search(), which only lists fully attached packages (rather than loaded ones). I think Matrix will be loaded, but not fully attached, so it can't find it.
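The distinction between attached and loaded packages can be checked directly from R. The sketch below assumes Matrix is installed and is run in a session where it may or may not have been attached. On the C++ side, Rcpp also provides Rcpp::Environment::namespace_env("Matrix"), which looks up the namespace rather than the search path; whether that is the appropriate fix for this header is an assumption, not something confirmed in this thread.

```r
# Load the Matrix namespace without attaching it, as a Matrix:: call does.
loadNamespace("Matrix")

# TRUE only after library(Matrix): "package:Matrix" must be on the search path.
is_attached <- "package:Matrix" %in% search()

# TRUE as soon as the namespace is loaded, whether or not it is attached.
is_loaded <- isNamespaceLoaded("Matrix")

# Looking up "package:Matrix" fails unless the package is attached ...
pkg_env <- tryCatch(as.environment("package:Matrix"), error = function(e) NULL)

# ... whereas the namespace environment is reachable once it is loaded.
ns_env <- asNamespace("Matrix")

print(c(attached = is_attached, loaded = is_loaded))
```

In a fresh session without library(Matrix), pkg_env is NULL while ns_env is a valid environment, which matches the behaviour described above.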

crossValidate aborting R

Hello,

Thank you for developing this incredibly fast NMF package.
I am using RcppML version 0.5.2.

Firstly, when I run the crossValidate function with method = "impute" on a 36591x1098 dgCMatrix of scRNAseq data, my R session aborts almost instantly. I don't have any issue with the two other methods.

Secondly, since I am missing the plot generated by the imputation method for my data, and even though I read your blog post "Cross-validation for NMF rank determination", I am still confused about how to choose k when bi-cross-validation (method = "predict") and robustness (method = "robust") return results similar to those found for the hawaiibirds dataset.

Regards,
Yannick

Clarification wrt L1/Lasso penalties

Hi Zack and team, thanks for this awesome package!

I'd like to seek some clarification about the L1 penalty parameters in the nmf function:

L1/LASSO penalties between 0 and 1, array of length two for c(w,h)

Are the regularizations of w and h achieved independently? In Figure 4 of the preprint you show the effect of varying the penalty parameter per run, but it is not clear if you're referring to the penalty term for w, h, or both (set to the same value?). In Figure S4, however, you specifically focus on regularization of w. What are the implications of regularizing one of the two matrices (w or h) vs both simultaneously? I'm thinking of how to best vary these two parameters in a CV scheme; Does a grid-based search of pairwise combinations of L1 penalties c(w,h) sound reasonable to you?
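For reference, a grid of pairwise c(w, h) penalties like the one asked about can be built with base R. This is only a sketch of constructing the grid: the candidate values are arbitrary placeholders, and whether a full grid search is the recommended strategy is left open here.

```r
# Candidate L1 penalties (placeholder values within the documented 0-1 range).
l1_values <- c(0, 0.01, 0.05, 0.1)

# All pairwise combinations of a penalty on w and a penalty on h.
grid <- expand.grid(L1_w = l1_values, L1_h = l1_values)

# One length-two vector c(w, h) per grid row, ready to pass as the L1 argument.
penalties <- lapply(seq_len(nrow(grid)), function(i) {
  c(grid$L1_w[i], grid$L1_h[i])
})

print(nrow(grid))
```

Each element of penalties could then be supplied as L1 in a separate nmf() fit and scored with whatever cross-validation criterion is being used.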

Many thanks,
Sina

lnmf

Hi Zack!

Thanks for developing RcppML and for providing such clear posts about NMF. I would like to run Linked NMF in my cohort data to obtain shared factors. Before that, I was going through the example at the end of your post of Linked NMF (https://www.zachdebruine.com/post/linked-nmf-for-signal-source-separation/) and realized you updated the aml dataset. I tried the following to get a quick idea of how it works. I obtained the following error:

library(RcppML)
library(singlet)

data(aml)

# make sample names unique
aml$metadata_h$new_names <- paste(aml$metadata_h$samples, aml$metadata_h$category, sep="-")
colnames(aml$data) <- aml$metadata_h$new_names

# make list of datasets
data_examp <- list(
  aml$data[, which(grepl( "AML" , colnames(aml$data)))],
  aml$data[, which(grepl( "Control" , colnames(aml$data)))]
)

# run linked nmf
lnmf_model <- lnmf(data_examp, k_wh = 3, k_uv = c(2, 2))

Error in getClass(Class, where = topenv(parent.frame())): “lnmf” is not a defined class
Traceback:

1. lnmf(data_examp, k_wh = 3, k_uv = c(2, 2))
2. new("lnmf", w = w, u = u, v = v, h = h, d_wh = d_wh, d_uv = d_uv, 
 .     misc = model@misc)
3. getClass(Class, where = topenv(parent.frame()))
4. stop(gettextf("%s is not a defined class", dQuote(Class)), domain = NA)

These are the package versions

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/isentis/software/anaconda3/envs/ines_r4.1.1c/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] singlet_0.99.27    dplyr_1.0.9        SeuratObject_4.0.2 Seurat_4.0.5      
[5] RcppML_0.5.6      

loaded via a namespace (and not attached):
  [1] fgsea_1.20.0          Rtsne_0.16            colorspace_2.0-3     
  [4] deldir_1.0-6          ellipsis_0.3.2        ggridges_0.5.3       
  [7] IRdisplay_1.0         base64enc_0.1-3       spatstat.data_3.0-1  
 [10] leiden_0.4.2          listenv_0.8.0         ggrepel_0.9.1        
 [13] fansi_1.0.3           codetools_0.2-18      splines_4.1.1        
 [16] knitr_1.40            polyclip_1.10-0       IRkernel_1.1         
 [19] jsonlite_1.8.0        ica_1.0-3             cluster_2.1.4        
 [22] rgeos_0.5-9           png_0.1-7             uwot_0.1.10          
 [25] shiny_1.7.2           sctransform_0.3.4     spatstat.sparse_3.0-1
 [28] msigdbr_7.5.1         compiler_4.1.1        httr_1.4.4           
 [31] assertthat_0.2.1      Matrix_1.4-1          fastmap_1.1.0        
 [34] lazyeval_0.2.2        limma_3.50.3          cli_3.3.0            
 [37] later_1.3.0           htmltools_0.5.3       tools_4.1.1          
 [40] igraph_1.3.4          gtable_0.3.0          glue_1.6.2           
 [43] RANN_2.6.1            reshape2_1.4.4        fastmatch_1.1-3      
 [46] Rcpp_1.0.9            scattermore_0.8       vctrs_0.4.1          
 [49] babelgene_22.9        nlme_3.1-159          lmtest_0.9-40        
 [52] spatstat.random_3.1-4 xfun_0.32             stringr_1.4.1        
 [55] globals_0.16.1        mime_0.12             miniUI_0.1.1.1       
 [58] lifecycle_1.0.1       irlba_2.3.5           goftest_1.2-3        
 [61] future_1.22.1         MASS_7.3-58.1         zoo_1.8-10           
 [64] scales_1.2.1          spatstat.core_2.4-4   promises_1.2.0.1     
 [67] spatstat.utils_3.0-2  parallel_4.1.1        RColorBrewer_1.1-3   
 [70] reticulate_1.25       pbapply_1.5-0         gridExtra_2.3        
 [73] ggplot2_3.3.6         rpart_4.1.16          stringi_1.7.8        
 [76] BiocParallel_1.28.3   repr_1.1.3            rlang_1.0.4          
 [79] pkgconfig_2.0.3       matrixStats_0.62.0    evaluate_0.16        
 [82] lattice_0.20-45       ROCR_1.0-11           purrr_0.3.4          
 [85] tensor_1.5            patchwork_1.1.2       htmlwidgets_1.5.4    
 [88] cowplot_1.1.1         tidyselect_1.1.2      parallelly_1.32.1    
 [91] RcppAnnoy_0.0.19      plyr_1.8.7            magrittr_2.0.3       
 [94] R6_2.5.1              generics_0.1.3        pbdZMQ_0.3-5         
 [97] DBI_1.1.3             mgcv_1.8-40           pillar_1.8.1         
[100] fitdistrplus_1.1-8    sp_1.5-0              survival_3.4-0       
[103] abind_1.4-5           tibble_3.1.8          future.apply_1.9.0   
[106] crayon_1.5.1          uuid_1.1-0            KernSmooth_2.23-20   
[109] utf8_1.2.2            spatstat.geom_3.1-0   plotly_4.10.0        
[112] grid_4.1.1            data.table_1.14.2     digest_0.6.29        
[115] xtable_1.8-4          tidyr_1.2.0           httpuv_1.6.5         
[118] munsell_0.5.0         viridisLite_0.4.1

Thank you!

Initialization

Hi there,

Thanks for an excellent R package!

I was wondering if you have any plan to implement an option to leverage user specified initial matrices for the nmf similar to the nnmf function in this package?

Cheers,
Ludvig

error in installing devtools::install_github("zdebruine/RcppML")

devtools::install_github("zdebruine/RcppML")
Using GitHub PAT from the git credential store.
Downloading GitHub repo zdebruine/RcppML@HEAD
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All
2: CRAN packages only
3: None
4: xfun (0.42 -> 0.43) [CRAN]

Enter one or more numbers, or an empty line to skip updates: 1
xfun (0.42 -> 0.43) [CRAN]
Installing 1 packages: xfun
trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.3/xfun_0.42.tgz'
Content type 'application/x-gzip' length 475932 bytes (464 KB)

downloaded 464 KB

The downloaded binary packages are in
/var/folders/xf/8l51mcd939x_wh_596kx9lpw0000gn/T//RtmpSU7Lxl/downloaded_packages
── R CMD build ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/private/var/folders/xf/8l51mcd939x_wh_596kx9lpw0000gn/T/RtmpSU7Lxl/remotes5088378f0bf2/zdebruine-RcppML-5449a5b/DESCRIPTION’ ...
─ preparing ‘RcppML’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
NB: this package now depends on R (>= 3.5.0)
WARNING: Added dependency on R >= 3.5.0 because serialized objects in
serialize/load version 3 cannot be read in older versions of R.
File(s) containing such objects:
‘RcppML/data/aml.rdata’ ‘RcppML/data/hawaiibirds.rdata’
‘RcppML/data/movielens.rdata’
─ building ‘RcppML_0.5.6.tar.gz’

installing source package ‘RcppML’ ...
** using staged installation
** libs
using C++ compiler: ‘Apple clang version 15.0.0 (clang-1500.3.9.4)’
using C++11
using SDK: ‘MacOSX14.4.sdk’
clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I../inst/include/ -I'/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rcpp/include' -I/opt/R/arm64/include -DEIGEN_INITIALIZE_MATRICES_BY_ZERO -DEIGEN_NO_DEBUG -fPIC -falign-functions=64 -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
In file included from RcppExports.cpp:4:
In file included from ./../inst/include/RcppML.h:13:
../inst/include/RcppML/SparseMatrix.h:96:63: warning: private field 's_size' is not used [-Wunused-private-field]
int col_, index, max_index, s_max_index, s_index = 0, s_size;
^
1 warning generated.
clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I../inst/include/ -I'/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rcpp/include' -I/opt/R/arm64/include -DEIGEN_INITIALIZE_MATRICES_BY_ZERO -DEIGEN_NO_DEBUG -fPIC -falign-functions=64 -Wall -g -O2 -c RcppFunctions.cpp -o RcppFunctions.o
In file included from RcppFunctions.cpp:9:
./../inst/include/RcppML/SparseMatrix.h:96:63: warning: private field 's_size' is not used [-Wunused-private-field]
int col_, index, max_index, s_max_index, s_index = 0, s_size;
^
1 warning generated.
clang++ -arch arm64 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I../inst/include/ -I'/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rcpp/include' -I/opt/R/arm64/include -DEIGEN_INITIALIZE_MATRICES_BY_ZERO -DEIGEN_NO_DEBUG -fPIC -falign-functions=64 -Wall -g -O2 -c bipartiteMatch.cpp -o bipartiteMatch.o
clang++ -arch arm64 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -o RcppML.so RcppExports.o RcppFunctions.o bipartiteMatch.o -L/Library/Frameworks/R.framework/Resources/lib -lRlapack -L/Library/Frameworks/R.framework/Resources/lib -lRblas -L/opt/gfortran/lib/gcc/aarch64-apple-darwin20.0/12.2.0 -L/opt/gfortran/lib -lgfortran -lemutls_w -lquadmath -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: warning: search path '/opt/gfortran/lib/gcc/aarch64-apple-darwin20.0/12.2.0' not found
ld: warning: search path '/opt/gfortran/lib' not found
ld: library 'gfortran' not found
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [RcppML.so] Error 1
ERROR: compilation failed for package ‘RcppML’
removing ‘/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppML’
restoring previous ‘/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppML’
Warning message:
In i.p(...) :
installation of package ‘/var/folders/xf/8l51mcd939x_wh_596kx9lpw0000gn/T//RtmpSU7Lxl/file508859f9f10d/RcppML_0.5.6.tar.gz’ had non-zero exit status

any thoughts on how to solve this?

Factor loadings dominated by mitochondrial and ribosomal genes

Thank you for writing the blazing-fast package!

I have more of a practical question regarding interpretation of the NMF results. I saw that my factor loadings are mostly dominated by mitochondrial and ribosomal genes. I have made sure that the cells are of good quality, so I am not sure how to interpret the results. It seems like the factors are picking up the highly expressed genes. Would it make sense to remove those genes prior to NMF?

Apologies in advance if I am asking in the wrong venue.

Regards,
Mikhael

scope to support semi NMF?

Hi,

Thank you for this great package!

Any scope to include semi-nonnegative matrix factorization? Your docs seem to suggest the ability to allow negative numbers in the matrix to be factorized, but the functions no longer seem to support it. Some reference implementations:

Python implementation based on original paper "Ding et al. Convex and Semi-Nonnegative Matrix Factorizations." at https://github.com/cthurau/pymf/blob/master/pymf/snmf.py
Matlab implementation based on what appears to be an improved method by "Gillis et al. Exact and Heuristic Algorithms for Semi-Nonnegative Matrix Factorization" at https://gitlab.com/ngillis/nmfbook/-/tree/master/algorithms/semi-NMF?ref_type=heads

The Matlab implementation appears closest to the method used here, using alternating least squares, but replacing the basis matrix update step with ordinary least squares, since the basis matrix is allowed to be negative.

How to perform orthogonal NMF and other questions.

First of all, thanks for this great package, it's blazing fast! But when I read the paper and the code I got confused. Here are some questions:

In the NMF Problem Definition section, right above equation (1), you say that NMF seeks to decompose A into orthogonal lower-rank non-negative matrices w and h. How is this achieved? The algorithm seems to be based on solving linear systems, and I don't see anything about orthogonalizing the solution. Also, if w is orthogonal, does this mean that w^T w = I and that a in equation (3) is just the identity matrix?
In predict.hpp, the L1 penalty is applied with if (L1 != 0) b.array() -= L1; and the system is then solved just like an ordinary linear regression. Is this an approximate solution for the L1-penalized regression? In general, that is not how it is solved.
Equation (4) also seems to contain a typo: to update the ith column of h (h_i), given the ith column of A (A_i) and w, the NMF problem is A_i = w h_i and the corresponding system is w^T A_i = w^T w h_i, which means that (4) should read b = w^T A_i.
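The system stated in the last question, w^T A_i = w^T w h_i, can be checked numerically. The sketch below uses made-up data and an unconstrained solve, so it illustrates only the normal equations with b = w^T A_i, not the package's non-negative solver. As a side note, for non-negative h_i the L1 norm equals sum(h_i), so adding an L1 term shifts the gradient by a constant, which is one way to read the b -= L1 step; whether that matches the package's intent is not confirmed here.

```r
set.seed(1)

# Toy problem: 20 features, 3 factors, one sample column (all values made up).
w   <- matrix(runif(20 * 3), nrow = 20, ncol = 3)
h_i <- c(0.5, 1.0, 2.0)
A_i <- as.vector(w %*% h_i)          # exact, noise-free column of A

a <- crossprod(w)                    # w^T w   (3 x 3)
b <- crossprod(w, A_i)               # w^T A_i (3 x 1)

# Solving a h = b recovers h_i when A_i = w h_i holds exactly.
h_hat <- as.vector(solve(a, b))
print(all.equal(h_hat, h_i))
```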

How to emulate functionality from NMF package for estimating the factorization rank

I am interested in using this package to run an NMF analysis on some sparse data. The NMF R package from CRAN provides the NMF rank survey functionality when using a range of values for k. I was wondering whether it's possible to obtain the sparseness/cophenetic/silhouette metrics per rank from RcppML::nmf()?
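Of the metrics mentioned, sparseness is simple to compute by hand from a fitted factor matrix. The sketch below implements Hoyer's sparseness for a vector and averages it over the columns of a matrix; the function names are hypothetical, and it is an assumption that this matches the exact convention used by the NMF package's rank survey.

```r
# Hoyer sparseness of a vector: 1 for a single non-zero entry, 0 for a flat vector.
hoyer_sparseness <- function(x) {
  n <- length(x)
  (sqrt(n) - sum(abs(x)) / sqrt(sum(x^2))) / (sqrt(n) - 1)
}

# Mean sparseness over the columns of a factor matrix (one column per factor).
matrix_sparseness <- function(m) {
  mean(apply(m, 2, hoyer_sparseness))
}

print(hoyer_sparseness(c(0, 0, 0, 1)))   # 1
print(hoyer_sparseness(rep(1, 4)))       # 0
```

Running matrix_sparseness() on the w matrix of each fitted rank would give one sparseness value per k.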

how to check the top features in each rank?

Hi Zach,

Thank you for implementing this great package for fast NMF. I was just wondering whether there is a function similar to extractFeatures from the original NMF package that lets us quickly get the top features from each rank. Thanks!
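In the absence of a built-in helper, the top features per factor can be pulled out with a few lines of base R. This is a minimal sketch, assuming w is a features x factors matrix with feature names as row names; top_features is a hypothetical helper, not an RcppML function.

```r
# Return the names of the n highest-weighted features for every factor (column).
top_features <- function(w, n = 10) {
  if (is.null(rownames(w))) {
    rownames(w) <- paste0("feature", seq_len(nrow(w)))
  }
  lapply(seq_len(ncol(w)), function(j) {
    ord <- order(w[, j], decreasing = TRUE)
    rownames(w)[head(ord, n)]
  })
}

# Toy loadings: 4 features, 2 factors (values made up).
w <- matrix(c(0.1, 0.6, 0.2, 0.1,
              0.5, 0.0, 0.1, 0.4),
            nrow = 4,
            dimnames = list(c("g1", "g2", "g3", "g4"), NULL))
print(top_features(w, n = 2))
```

If the fitted model stores w with factors in rows instead, it would need to be transposed first.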

Best,
Rui

Functions don't work unless Matrix is loaded

Hello 👋

nmf() doesn't work unless the {Matrix} package is explicitly loaded. Below is a reprex using examples from the README.

A <- Matrix::rsparsematrix(1000, 100, 0.1) # sparse Matrix::dgCMatrix
model <- RcppML::nmf(A, k = 10, nonneg = TRUE)
#> 
#> iter |      tol 
#> ---------------
#> Error in Rcpp_nmf_sparse(A, k, tol, maxit, verbose, nonneg, L1, seed, : Cannot convert object to an environment: [type=character; target=ENVSXP].

library(Matrix)
model <- RcppML::nmf(A, k = 10, nonneg = TRUE)
#> 
#> iter |      tol 
#> ---------------
#>    1 | 9.03e-01
#>    2 | 3.69e-02
#>    3 | 1.85e-02
#>    4 | 1.15e-02
#>    5 | 7.51e-03
#>    6 | 5.05e-03
#>    7 | 3.25e-03
#>    8 | 2.03e-03
#>    9 | 1.44e-03
#>   10 | 9.80e-04
#>   11 | 7.43e-04
#>   12 | 5.99e-04
#>   13 | 5.74e-04
#>   14 | 5.76e-04
#>   15 | 5.64e-04
#>   16 | 5.48e-04
#>   17 | 4.88e-04
#>   18 | 4.27e-04
#>   19 | 3.45e-04
#>   20 | 2.88e-04
#>   21 | 2.47e-04
#>   22 | 2.42e-04
#>   23 | 2.39e-04
#>   24 | 2.42e-04
#>   25 | 2.40e-04
#>   26 | 1.75e-04
#>   27 | 1.56e-04
#>   28 | 1.43e-04
#>   29 | 1.34e-04
#>   30 | 1.30e-04
#>   31 | 1.24e-04
#>   32 | 1.18e-04
#>   33 | 1.09e-04
#>   34 | 9.56e-05

Created on 2022-01-26 by the reprex package (v2.0.1)

Add scope to support consensus connectivity heatmap

Hi!

Great package, and I enjoy how well and how fast RcppML performs!

I was just wondering whether adding a consensus map, similar to the ones implemented by Brunet et al. (https://doi.org/10.1073/pnas.0308531101), would be useful here!
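For context, a consensus matrix of this kind can be assembled from the h matrices of repeated runs. The sketch below assumes each h is a factors x samples matrix, assigns every sample to its highest-weighted factor, and averages the resulting connectivity matrices; the function names and toy values are hypothetical.

```r
# Connectivity matrix: 1 where two samples share the same dominant factor.
connectivity <- function(h) {
  cluster <- apply(h, 2, which.max)
  outer(cluster, cluster, "==") * 1
}

# Consensus matrix: average connectivity over several runs.
consensus <- function(h_list) {
  Reduce(`+`, lapply(h_list, connectivity)) / length(h_list)
}

# Two toy runs with 2 factors and 3 samples (values made up).
h1 <- matrix(c(0.9, 0.1, 0.8, 0.2, 0.1, 0.9), nrow = 2)
h2 <- matrix(c(0.7, 0.3, 0.4, 0.6, 0.2, 0.8), nrow = 2)
C <- consensus(list(h1, h2))
print(C)
```

The result could be passed to a plotting function such as heatmap(C, symm = TRUE) to draw the map.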

Regards,

ttj131

Alternative Strategy for NNLS Updates

Current NMF updates use a step of $\frac{-W^TA_{:j}}{W^TW}$ for updates of $H$ with Sequential Coordinate Descent (SCD).

We could avoid SCD by adding an additional term to the updates: $\frac{W^TWH_{:j} - W^TA_{:j}}{W^TW}$ for updates of $H$. The same logic applies to updates of $W$. Since we already have $a = W^TW$, the only additional operations we have are $aH_{:j}$ (which is very cheap) and the subtraction of the two terms in the numerator. This is likely to be faster, but it is unclear whether the convergence properties will be as excellent. Perhaps an Adam optimizer would be applicable to this update, while it does not benefit ALS SCD.

do we need to remove batch effect?

Hi team, much better than the original NMF on saving time!
When I did the original NMF, I used
A <- scRNA@assays$[email protected]
all the samples match.
However, in your article, "For example, different normalizations of the data or batch effects can lead to fundamentally different SVD results across most factors. On the other hand, because NMF factors are collectively updated, distinct technical issues are usually explained by a single factor while other factors are left unaffected, making it robust across datasets. "
So I used
A <- scRNA@assays$RNA@counts
But the samples are dispersed. Which one should I use?
Thanks!

Is the number of iterations to convergence significant?

I'm running NMF on several 10x samples with a tol of 1e-08. For some samples ~100 iterations are enough to converge; for others it takes closer to 1000.

Does this indicate anything specific about the underlying data / chosen k ? I ran crossValidate and selected a k that looked suitable among all samples rather than changing k per sample - could this be the reason?

What would you suggest the best approach here to be?

Cheers

Interpretation of NMF results

Hello,

We have been using RcppML to identify disease-related modules and have had great experiences. I really appreciate that you have developed this fast and robust NMF package, which is an impactful contribution to our community.

I have a few questions and wanted to hear your insights.

1. Feature selection and feature size
In addition to using variable genes from scRNAseq data, I feel RcppML::nmf works well with disease-related genes (up-regulated DEGs in disease) to decompose ~1000-3000 DEGs into 20-50 co-regulated modules.
I wonder if there is a limit on the number of features that we can feed into NMF.
In some programs, we have ~200-400 DEGs derived from bulk-seq data. I thought NMF might be useful for decomposing these genes into groups of modules using the corresponding scRNAseq data, but I am not sure whether 200-400 genes are enough to run NMF.
I tried, and in this particular case RcppML::crossValidate gave an optimal number of components of 4-5 (which is not too bad given that the feature set is small).
2. NMF scores for selecting top genes in each component
It seems that the values in each component (of the gene x NMF component matrix) sum to 1.
I wonder what a desirable way to determine the top contributing genes in each component would be.
I tried:
a. z-scoring genes within each component and picking genes with Z > 1.96
b. z-scoring genes across all components (since components are scaled to 1) and picking genes with Z > 1.96 in each component.
In Pelka et al., 2021, Cell (https://pubmed.ncbi.nlm.nih.gov/34450029/), the authors applied some scaling to order the genes in each NMF component and then picked the top 100-150 genes in each component (more inclusive than the Z-score approach), but I wonder how to determine the right number of top genes (e.g. ~100 genes) from NMF.
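Option (a) above can be written compactly with scale(), which z-scores each column of a matrix. This is a sketch on simulated loadings, assuming w is a genes x components matrix; the 1.96 cutoff is the one quoted in the question, not a recommendation.

```r
set.seed(42)

# Simulated loadings: 200 genes, 3 components (values made up).
w <- matrix(rexp(200 * 3), nrow = 200,
            dimnames = list(paste0("gene", 1:200), paste0("nmf", 1:3)))

# z-score genes within each component (column-wise centering and scaling).
z <- scale(w)

# Genes exceeding the cutoff, listed per component.
top_genes <- lapply(colnames(z), function(j) rownames(z)[z[, j] > 1.96])
names(top_genes) <- colnames(z)

print(lengths(top_genes))
```

Option (b) would instead subtract mean(w) and divide by sd(as.vector(w)) before applying the same cutoff to each column.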

Thanks so much!!
Koichi

Expecting a single value: [extent=0]

Hello, I am getting the below error when updating RcppML from 0.3.7 to the current GitHub release.

The error arises when using either Rcpp_nmf_dense or Rcpp_nmf_sparse. It seems related to C++ since both functions are calling C++ code internally which I cannot figure out since my C++ knowledge is limited to non-existent!

I am using RcppML in one of my packages which will be published soon. Although I will revert to using the 0.3.7 version, where the error is not present, I would love to use and incorporate the latest version of RcppML with all the extra functionality it has!

I tried it even with a random 20x5 matrix:
A <- matrix(rnorm(100), ncol = 5, nrow = 20)

The Error:

<error/rlang_error>
Error:
! Expecting a single value: [extent=0].
Caused by error:
! Expecting a single value: [extent=0].
---
Backtrace:
    ▆
 1. └─RcppML::nmf(data = A, k = 2, maxit = 250, seed = 1)
 2.   └─RcppML:::Rcpp_nmf_dense(...)

model failures

I have had a few cases where it appears that the objective function is nan. This has occurred when too much regularization is used but, in the example below, there is no penalty applied. This uses RcppML 0.1.0.

library(RcppML)
library(modeldata)
library(Matrix)

data(biomass)

biomass <- biomass[, 3:7]
biomass <- as.matrix(biomass)
biomass <- t(biomass)
biomass <- Matrix(biomass, sparse = TRUE)

res <- nmf(biomass, k = 5, seed = 1)
#> 
#> iter |      tol 
#> ---------------
#>    1 |      nan
#>    2 |      nan
#>    3 |      nan
#>    4 |      nan
#>    5 |      nan
#>    6 |      nan
#>    7 |      nan
#>    8 |      nan
#>    9 |      nan
#>   10 |      nan
#>   11 |      nan
#>   12 |      nan
#>   13 |      nan
#>   14 |      nan
#>   15 |      nan
#>   16 |      nan
#>   17 |      nan
#>   18 |      nan
#>   19 |      nan
#>   20 |      nan
#>   21 |      nan
#>   22 |      nan
#>   23 |      nan
#>   24 |      nan
#>   25 |      nan
#>   26 |      nan
#>   27 |      nan
#>   28 |      nan
#>   29 |      nan
#>   30 |      nan
#>   31 |      nan
#>   32 |      nan
#>   33 |      nan
#>   34 |      nan
#>   35 |      nan
#>   36 |      nan
#>   37 |      nan
#>   38 |      nan
#>   39 |      nan
#>   40 |      nan
#>   41 |      nan
#>   42 |      nan
#>   43 |      nan
#>   44 |      nan
#>   45 |      nan
#>   46 |      nan
#>   47 |      nan
#>   48 |      nan
#>   49 |      nan
#>   50 |      nan
#>   51 |      nan
#>   52 |      nan
#>   53 |      nan
#>   54 |      nan
#>   55 |      nan
#>   56 |      nan
#>   57 |      nan
#>   58 |      nan
#>   59 |      nan
#>   60 |      nan
#>   61 |      nan
#>   62 |      nan
#>   63 |      nan
#>   64 |      nan
#>   65 |      nan
#>   66 |      nan
#>   67 |      nan
#>   68 |      nan
#>   69 |      nan
#>   70 |      nan
#>   71 |      nan
#>   72 |      nan
#>   73 |      nan
#>   74 |      nan
#>   75 |      nan
#>   76 |      nan
#>   77 |      nan
#>   78 |      nan
#>   79 |      nan
#>   80 |      nan
#>   81 |      nan
#>   82 |      nan
#>   83 |      nan
#>   84 |      nan
#>   85 |      nan
#>   86 |      nan
#>   87 |      nan
#>   88 |      nan
#>   89 |      nan
#>   90 |      nan
#>   91 |      nan
#>   92 |      nan
#>   93 |      nan
#>   94 |      nan
#>   95 |      nan
#>   96 |      nan
#>   97 |      nan
#>   98 |      nan
#>   99 |      nan
#>  100 |      nan
all(is.na(res$w))
#> [1] TRUE

Created on 2021-09-09 by the reprex package (v2.0.0)

nmf error

Expecting a single value: [extent=0].

Same MSE for different crossValidate methods

I switched from python's sklearn NMF to your implementation because of the improved speed and convenient cross-validation. This also means I'm fairly new to R and I'm sorry if my question is obvious.

I have a dataset with 500000 features and 734 observations. I ran crossValidate() with methods 'predict', 'robust', and 'impute' for 2:10 components with ten repetitions each and the same random seed (everything else was default settings). The resulting MSEs were exactly the same across methods, which I believe shouldn't be the case despite using the same random seed. Any ideas why this might happen?

Thank you for developing this amazing package!

Upper-bounded NNLS

Feature request to impose upper bound on NNLS solutions.

Example code is broken, need to run library(RcppML) first

I'm using the latest development version of RcppML, installed via devtools::install_github("zdebruine/RcppML"). When I ran the example code:

A <- Matrix::rsparsematrix(1000, 100, 0.1) # sparse Matrix::dgCMatrix
model <- RcppML::nmf(A, k = 10, nonneg = TRUE)

I got the error:

Error in Rcpp_nmf_sparse(data, mask_matrix, tol, maxit, getOption("RcppML.verbose"),  :
  Expecting a single value: [extent=0].

However, it works fine if you load the library first:

library(RcppML)
A <- Matrix::rsparsematrix(1000, 100, 0.1)
model <- RcppML::nmf(A, k = 10, nonneg = TRUE)  # or: model <- nmf(A, k = 10, nonneg = TRUE)

Matrix-related error in dev version of crossValidate

Hello,

Matrix is throwing an error when trying to run crossValidate on the dev version of RcppML.

Relevant versions.

> R.Version()$version.string
[1] "R version 4.2.1 (2022-06-23)"
> packageVersion("RcppML")
[1] ‘0.5.6’
> packageVersion("Matrix")
[1] ‘1.5.1’

Reproducible example.

> RcppML::crossValidate(as.matrix(mtcars), k=1:5, rep=2)
Error: as(<ngCMatrix>, "dgCMatrix") is deprecated since Matrix 1.5-0; do as(., "dMatrix") instead
> traceback()
6: warning.(gettextf("as(<%s>, \"%s\") is deprecated since Matrix 1.5-0; do %s instead", 
       cln1, cln2, deparse1(.as.via.virtual(Class1, Class2, quote(.)))), 
       call. = FALSE, domain = NA)
5: Matrix.DeprecatedCoerce(cd1, cd2)
4: asMethod(object)
3: as(mask, "dgCMatrix")
2: nmf(data, rank, mask = mask_matrix, ...)
1: RcppML::crossValidate(as.matrix(mtcars), k = 1:5, rep = 2)

This error doesn't affect RcppML::nmf.
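The traceback points at as(mask, "dgCMatrix") applied to an ngCMatrix. The sketch below shows the coercion suggested by the error message on a toy pattern matrix, assuming Matrix 1.5 or later is installed; the mask object here is made up and is not the one built inside crossValidate.

```r
library(Matrix)

# Toy dense matrix converted via virtual classes only (no direct "dgCMatrix" coercion).
m  <- matrix(c(1, 0, 0, 2, 0, 3), nrow = 2)
sp <- as(as(as(m, "dMatrix"), "generalMatrix"), "CsparseMatrix")

# A pattern (ngCMatrix) mask, similar in class to the one in the traceback.
mask <- as(sp, "nMatrix")

# Coercion recommended by the deprecation message: go through "dMatrix".
mask_d <- as(mask, "dMatrix")

print(class(mask))
print(class(mask_d))
```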

Cheers,
Bob

Different cost functions

Hi!

Is it possible to set a different cost function instead of Frobenius norm?

For example, a Poisson cost or KL: https://pubmed.ncbi.nlm.nih.gov/15016911/

R code example in the README is broken?

I tried to run the R example code provided in the README page with RcppML 0.3.7 from CRAN, but ran into errors:

A <- Matrix::rsparsematrix(A, 1000, 1000, 0.1) # sparse Matrix::dgCMatrix
model <- RcppML::nmf(A, k = 10, nonneg = TRUE)
h0 <- RcppML::project(A, w = model$w)
RcppML::mse(A, m$w, m$d, m$h)

Maybe it should be this?

A <- Matrix::rsparsematrix(1000, 1000, 0.1) # sparse Matrix::dgCMatrix
model <- RcppML::nmf(A, k = 10, nonneg = TRUE)
h0 <- RcppML::project(A, w = t(model$w))
RcppML::mse(A, model$w, model$d, model$h)

Thanks for making this great R package available!

RcppML::project renamed?

Hello there, thanks for your work on this package. I installed via devtools::install_github("zdebruine/RcppML") but don't seem to see the method 'project' mentioned in the vignette.

> packageVersion('RcppML')
[1] ‘0.5.2’
> RcppML::project(X, res@w)
Error: 'project' is not an exported object from 'namespace:RcppML'

Has it possibly been replaced by predict?

Matrix package deprecations

Email from Matrix package maintainers:

Dear R Maintainer maintainer,

You are receiving this message because at least one CRAN or
Bioconductor package that you maintain requires revisions due to
deprecations in the forthcoming Matrix version 1.4-2, which you can
install with

install.packages("Matrix", repos = "http://r-forge.r-project.org/")

(The list of 259 affected packages with 209 unique maintainers
is at the end)

Matrix 1.4-2 will formally deprecate 187 coercion methods. More
precisely, coercions of the form

as(object, Class)

where

'object' inherits from the virtual class Matrix, is a traditional
matrix, or is a logical or numeric vector
'Class' specifies a non-virtual subclass of Matrix, such as
dgCMatrix, but really any subclass matching the pattern
```
^[dln]([gts][CRT]|di|ge|tr|sy|tp|sp)Matrix$
```

will continue to work as before but signal a deprecation message or
warning (message in the widely used dg.Matrix and d.CMatrix cases).

By default, the message or warning will be signaled with the first
deprecated method call and suppressed after that. Signaling can be
controlled via option Matrix.warnDeprecatedCoerce:

<=0 = be completely silent [[ at your own risk ! ]]
1 = warn each time
>=2 = error each time [[ for debugging ]]
NA = message or warn once then be silent [[ the default ]]

Deprecated coercions in your package sources (including examples,
tests, and vignettes) should be revised to go via virtual classes
only, as has been advocated for quite some time in

vignette("Design-issues", package = "Matrix")

For example, rather than

as(object, "dgCMatrix")

we recommend (the full, a "permutation", or a simplification given the
context of)

as(as(as(object, "dMatrix"), "generalMatrix"), "CsparseMatrix")

To simplify the revision process, the development version of Matrix
provides Matrix:::.as.via.virtual(), taking a pair of class names and
returning as a call the correct nesting of coercions:

> Matrix:::.as.via.virtual("matrix", "dgCMatrix")
as(as(as(from, "dMatrix"), "generalMatrix"), "CsparseMatrix")
> Matrix:::.as.via.virtual("matrix", "lsTMatrix")
as(as(as(from, "lMatrix"), "symmetricMatrix"), "TsparseMatrix")
> Matrix:::.as.via.virtual("dgCMatrix", "dtrMatrix")
as(as(from, "triangularMatrix"), "unpackedMatrix")

We suggest checking package tarballs built with

options(Matrix.warnDeprecatedCoerce = n) # where n >= 1

in the .onLoad() hook (see ?.onLoad), so that all deprecated coercions
are exposed in the check output. (If you find that a warning or error
has been signaled from 'Matrix' itself, then we'd welcome a report
containing a minimal reproducible example, so that we may revise our
own code.)

If you are unable to make the requested changes before Sep 8 (~4 weeks
from now), or if you need help or clarification while fixing your code,
please let us know by replying and copying

[email protected]

Thank you very much!

The authors of 'Matrix'

Martin Maechler (maintainer)
Douglas Bates
Mikael Jagan

how to set seed?

Hello!

I really appreciate that you have shared this fantastic package with the community.
I have been using RcppML::nmf for a while and find it very useful for identifying cellular programs in scRNAseq data.

I have a few questions about nmf, as below.

What is the optimal seed value? I found that "seed" affects the results a bit (the results are largely similar irrespective of the seed value). What if I choose a large value like 1000 or a small value like 0? What difference does it make, and how?
I had a chance to run scikit-learn's NMF on the same data. Although it gave largely similar results, I see a few differences. I wonder which components of the code could cause differences in the final outputs. Is your nmf independent of scikit-learn's?
For single-cell RNAseq data, can we use raw counts or log-normalized data as input to nmf?
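On the first question, a seed in R only selects which pseudo-random sequence is generated; its magnitude carries no meaning. The sketch below illustrates this with a hypothetical random initialization of w, on the assumption that the seed argument of nmf() is used in a comparable way to initialize the model.

```r
# Hypothetical random initialization of a 5 x 2 factor matrix for a given seed.
init_w <- function(seed, n = 5, k = 2) {
  set.seed(seed)
  matrix(runif(n * k), nrow = n, ncol = k)
}

# The same seed reproduces the same starting point exactly.
same_seed <- identical(init_w(1), init_w(1))

# Different seeds (small or large) simply give different starting points.
diff_seed <- identical(init_w(0), init_w(1000))

print(c(same_seed = same_seed, diff_seed = diff_seed))
```

Comparing fits across a handful of seeds is therefore a way to gauge how sensitive the factors are to initialization, rather than a search for a best seed.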

Thank you so much!

Koichi

Cran update

Hi RcppML team,

It seems that the GitHub dev version has already reached 0.5.x while the version on CRAN is still 0.3.x. I'm working on a package and would love to include your methods as a dependency. Since the CRAN version is really out of date (I can see that those R version-related errors are already fixed in the dev version), I would like to request a CRAN update with your latest stable version.

Sincerely,
Yichen

compilation of the development version failing

Hi Zach,
First of all, thank you for this great package. I recently started working with it in R (version 4.2.0) and I'm having some issues when trying to compile the development version with the command devtools::install_github("zdebruine/RcppML"). Could you help me, please?

I've tried to install it on two different computers, both with Windows 11 and Rtools installed, and I keep getting the "package installation had non-zero exit status" error. The first warning message says that the package RcppSparse is not available for my R version, which is the latest one, and I just can't download it. Do you have any idea where this could be coming from?

Best regards,
Sonia
