mims-harvard / nimfa
Nimfa: Nonnegative matrix factorization in Python
Home Page: http://ai.stanford.edu/~marinka/nimfa/
License: Other
Whenever I perform MF, I'd like to know whether the objective has converged or whether I should have run more iterations. So I was happy to read about the "track_error" option. However, I can't figure out how to use this option.
I have looked through the examples and none of them seem to use this option. In fact, none of them seem to use the "options" parameter, which is a dictionary.
So this "issue" is really just a question: how do I actually track the residuals? It's also a bit of a suggestion to make this clear in one of your examples, or perhaps even in one of the examples on the main mf page.
Here is what I have tried:
model = mf.mf(norm_X, seed="nndsvd",rank=5, max_iter=20, initialize_only=True, objective="div", update="divergence",options={"track_error":True})
fit = mf.mf_run(model)
summary = fit.summary()
print summary.keys()
The fit object does not seem to have an "error_tracking" attribute or function, and neither does the summary object.
Also, when I now type
print model.track_error
it prints "False"
but when I type
print model.options
it prints {'objective': 'div', 'options': {'track_error': True}, 'update': 'divergence'}
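For the convergence question above, here is a minimal sketch of what recording the objective at every iteration looks like. It is written in plain NumPy rather than through nimfa's own API, because the accessor for tracked errors is not shown in this thread. The helper name `nmf_with_error_tracking` and its parameters are hypothetical, and it uses the standard multiplicative updates for the Euclidean (Frobenius) objective, not the divergence objective from the snippet above.

```python
import numpy as np


def nmf_with_error_tracking(V, rank, max_iter=20, seed=0, eps=1e-9):
    """Hypothetical helper: Euclidean NMF that keeps a per-iteration error history."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(max_iter):
        # Multiplicative updates for the squared Frobenius norm ||V - WH||^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H) ** 2))
    return W, H, errors


V = np.random.default_rng(1).random((30, 20))
W, H, errors = nmf_with_error_tracking(V, rank=5, max_iter=20)
# First and last recorded objective values.
print(errors[0], errors[-1])
```

If the last few entries of `errors` are still dropping noticeably, the run probably stopped before converging and more iterations may be worthwhile.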
I am trying to factor the matrix which you can download here.
I am using the same settings as are used in your document clustering example. When I set max_iter to 25, I get this error. Any idea what's going on?
My parameter settings are:
rank=5
meth=nmf
max_iter=25
update=divergence
obj=div
Really appreciating Nimfa, and it's working well for me in factorizing some preference data. However, I am trying to test different ranks and would like to use the built-in method to return quality parameters. I cannot get the estimate_rank method to work. Sample code:
import numpy as np
import nimfa
V = np.array([[1, 2, 3], [4, 5, 6], [6, 7, 8]])
print('Target:\n%s' % V)
lsnmf = nimfa.Lsnmf(V, seed='random_vcol', max_iter=10, rank=3, track_error=True)
lsnmf_fit = lsnmf()
W = lsnmf_fit.basis()
print('Basis matrix:\n%s' % W)
H = lsnmf_fit.coef()
print('Mixture matrix:\n%s' % H)
r = lsnmf_fit.estimate_rank()
print('Rank estimate:\n%s' % r)
Running Python 3.5 with Nimfa 1.2.2
Thank you!
When running build_ext on Python 3.5:
running build_ext
/tmp/nix-build-python3.5-nimfa-1.3.2.drv-0/nimfa-1.3.2/nimfa/examples/cbcl_images.py:98: UserWarning: PIL must be installed to run CBCL images example.
warn("PIL must be installed to run CBCL images example.")
/tmp/nix-build-python3.5-nimfa-1.3.2.drv-0/nimfa-1.3.2/nimfa/examples/orl_images.py:110: UserWarning: PIL must be installed to run ORL images example.
warn("PIL must be installed to run ORL images example.")
Traceback (most recent call last):
File "nix_run_setup.py", line 8, in <module>
exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))
File "setup.py", line 145, in <module>
setup_package()
File "setup.py", line 140, in setup_package
'Programming Language :: Python :: 3',],
File "/nix/store/3h0zjr4jnl7z7z6m33v9sk2ssswyw6ir-python3.5-bootstrapped-pip-9.0.1/lib/python3.5/site-packages/setuptools/__init__.py", line 129, in setup
return distutils.core.setup(**attrs)
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/distutils/core.py", line 148, in setup
dist.run_commands()
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/nix/store/3h0zjr4jnl7z7z6m33v9sk2ssswyw6ir-python3.5-bootstrapped-pip-9.0.1/lib/python3.5/site-packages/setuptools/command/test.py", line 226, in run
self.run_tests()
File "/nix/store/3h0zjr4jnl7z7z6m33v9sk2ssswyw6ir-python3.5-bootstrapped-pip-9.0.1/lib/python3.5/site-packages/setuptools/command/test.py", line 248, in run_tests
exit=False,
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/main.py", line 94, in __init__
self.parseArgs(argv)
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/main.py", line 118, in parseArgs
self._do_discovery(argv[2:])
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/main.py", line 229, in _do_discovery
self.test = loader.discover(self.start, self.pattern, self.top)
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/loader.py", line 341, in discover
tests = list(self._find_tests(start_dir, pattern))
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/loader.py", line 398, in _find_tests
full_path, pattern, namespace)
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/loader.py", line 475, in _find_test_path
tests = self.loadTestsFromModule(package, pattern=pattern)
File "/nix/store/3h0zjr4jnl7z7z6m33v9sk2ssswyw6ir-python3.5-bootstrapped-pip-9.0.1/lib/python3.5/site-packages/setuptools/command/test.py", line 52, in loadTestsFromModule
tests.append(self.loadTestsFromName(submodule))
File "/nix/store/xbqvpwhj2c9gz28sdh3s543rnfg9ycb5-python3-3.5.4/lib/python3.5/unittest/loader.py", line 213, in loadTestsFromName
raise TypeError("don't know how to make test from: %s" % obj)
TypeError: don't know how to make test from: {'sepnmf': <class 'nimfa.methods.factorization.sepnmf.SepNmf'>, 'snmnmf': <class 'nimfa.methods.factorization.snmnmf.Snmnmf'>, 'bd': <class 'nimfa.methods.factorization.bd.Bd'>, 'icm': <class 'nimfa.methods.factorization.icm.Icm'>, 'lfnmf': <class 'nimfa.methods.factorization.lfnmf.Lfnmf'>, 'lsnmf': <class 'nimfa.methods.factorization.lsnmf.Lsnmf'>, 'none': None, 'pmfcc': <class 'nimfa.methods.factorization.pmfcc.Pmfcc'>, 'bmf': <class 'nimfa.methods.factorization.bmf.Bmf'>, 'psmf': <class 'nimfa.methods.factorization.psmf.Psmf'>, 'nmf': <class 'nimfa.methods.factorization.nmf.Nmf'>, 'pmf': <class 'nimfa.methods.factorization.pmf.Pmf'>, 'snmf': <class 'nimfa.methods.factorization.snmf.Snmf'>, 'nsnmf': <class 'nimfa.methods.factorization.nsnmf.Nsnmf'>}
The error does not seem to be present in Python 2.7.
I have found a minor bug: the RMSE calculation does not take a square root (line 128). Thanks!
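To make the missing step concrete, here is a small standalone sketch of an RMSE computation. The function name `rmse` and its list-based signature are illustrative only and are not taken from the nimfa example code.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    mse = sum(squared_errors) / len(squared_errors)
    # The final square root is what turns the MSE into an RMSE.
    return math.sqrt(mse)


print(rmse([2.0, 2.0], [0.0, 0.0]))  # prints 2.0 (the MSE alone would be 4.0)
```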
I've been using nimfa's NMF implementation for a bit, but recently switched to a python3 environment and decided to install it there. When I attempt to import nimfa in python3.4, I get
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/site-packages/nimfa/__init__.py", line 18, in <module>
from mf_run import *
ImportError: No module named 'mf_run'
It seems this is because python3 requires explicit relative imports, so the correct import statement would be from .mf_run import *. After some more digging, I found several places where the library is not currently python3 compatible. I ran the 2to3 utility on the library in order to look for more, and got a number of changes that I have recorded in the following diff: https://gist.github.com/nadesai/0727db6af37383a153e9.
Are there any long-term plans to address these compatibility issues, and perhaps allow the library to be used under python3?
I found that the methods return the basis / coefficient matrix in different data types.
For example,
pmf and nsnmf (as far as I noticed) return the basis matrix as a sparse matrix of type '<type 'numpy.float64'>';
bd, lfnmf and nmf return the basis matrix in matrix format.
Could you kindly make the returned matrices use the same data type?
Thanks,
I am using Python 2.7.11, and my pip is up-to-date.
I use pip install nimfa, and it says that version 1.2.3 is being installed.
But when I run the example code, it says:
'module' object has no attribute 'examples'
Moreover, strangely enough, when I uninstalled nimfa using pip uninstall nimfa, the module somehow persisted, even after I restarted my computer. What could be going on?
My stack trace is here:
https://gist.github.com/2707012
I want to take a look at the initialized W and H (using nndsvd initialization). I try the following:
model = mf.mf(X, rank=10, initialize_only=True, seed="nndsvd")
init_W, init_H = model.basis(), model.coef()
but init_W and init_H are just assigned to be "None".
Any idea what I'm doing wrong?
I have also tried
initializer = mf.methods.seeding.nndsvd.Nndsvd()
init_W, init_H = initializer.initialize(X, rank, {})
but I get the following error
/home/conradlee/local/lib/python2.6/site-packages/mf/methods/seeding/nndsvd.pyc in initialize(self, V, rank, options)
58 if negative(V):
59 raise MFError("The input matrix contains negative elements.")
---> 60 U, S, E = svd(V)
61 if sp.isspmatrix(U):
62 return self.init_sparse(V, U, S, E)
/home/conradlee/local/lib/python2.6/site-packages/mf/utils/linalg.pyc in svd(X)
313 U, S, V = _svd_left(X)
314 else:
--> 315 U, S, V = _svd_right(X)
316 else:
317 U, S, V = nla.svd(np.mat(X))
/home/conradlee/local/lib/python2.6/site-packages/mf/utils/linalg.pyc in _svd_right(X)
337 u_vec = err.eigenvectors
338 else:
--> 339 val, u_vec = sla.eigen_symmetric(XXt, k = X.shape[0] - 1)
340 else:
341 val, u_vec = nla.eigh(XXt.todense())
AttributeError: 'module' object has no attribute 'eigen_symmetric'
Hi there,
Just wondering if there is a way in nimfa to obtain the mixing matrix of unseen data, equivalent to transform in sklearn.decomposition?
Thanks!
Hi,
this is more of an FYI than an issue but I just wanted to let the maintainers know that I added nimfa to conda-forge so that it can be downloaded with conda.
The repo is here: https://github.com/conda-forge/nimfa-feedstock
Let me know if any of the maintainers here would like to be added as maintainers there, or if I should make a PR here to add a badge or install instructions. Otherwise, feel free to just close the issue.
Thanks!
Since there are multiple algorithms for performing NMF in https://ai.stanford.edu/~marinka/nimfa/, it would be good to do a speed and accuracy test.
Running the recommendations example doesn't work
import nimfa.examples
nimfa.examples.recommendations.run()
Read MovieLens data set
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-1-6947a2ae96c5> in <module>
1 import nimfa.examples
----> 2 nimfa.examples.recommendations.run()
~/.local/lib/python3.8/site-packages/nimfa/examples/recommendations.py in run()
65 """
66 for data_set in ['ua', 'ub']:
---> 67 V = read(data_set)
68 W, H = factorize(V)
69 rmse(W, H, data_set)
~/.local/lib/python3.8/site-packages/nimfa/examples/recommendations.py in read(data_set)
103 fname = join(dirname(dirname(abspath(__file__))), "datasets", "MovieLens", "%s.base" % data_set)
104 V = np.ones((943, 1682)) * 2.5
--> 105 for line in open(fname):
106 u, i, r, _ = list(map(int, line.split()))
107 V[u - 1, i - 1] = r
FileNotFoundError: [Errno 2] No such file or directory: '/home/michele/.local/lib/python3.8/site-packages/nimfa/datasets/MovieLens/ua.base'
Running the script directly doesn't work either
$ pwd
/home/michele/.local/lib/python3.8/site-packages/nimfa/examples
$ ls
all_aml.py documents.py __init__.py orl_images.py recommendations.py
cbcl_images.py gene_func_prediction.py medulloblastoma.py __pycache__ synthetic.py
$ python recommendations.py
Read MovieLens data set
Traceback (most recent call last):
File "recommendations.py", line 133, in <module>
run()
File "recommendations.py", line 67, in run
V = read(data_set)
File "recommendations.py", line 105, in read
for line in open(fname):
FileNotFoundError: [Errno 2] No such file or directory: '/home/michele/.local/lib/python3.8/site-packages/nimfa/datasets/MovieLens/ua.base'
I installed the library via pip,
$ pip install --user nimfa
$ python --version
Python 3.8.1
$ pip --version
pip 19.3 from /usr/lib/python3.8/site-packages/pip (python 3.8)
When running snmf() the following warning appears:
C:\Python27-64\lib\site-packages\numpy\matrixlib\defmatrix.py:318: VisibleDeprecationWarning: non integer (and non boolean) array-likes will not be accepted as indices in the future
out = N.ndarray.__getitem__(self, index)
I'm seeing this warning:
nimfa/methods/factorization/snmf.py:610: RuntimeWarning: invalid value encountered in power
np.mat(2 ** np.array(list(range(l_var - 1, -1, -1)))), p_set)
I see this happens when l_var is 64, which exceeds the int64 range. Shouldn't we use floating point here, like
np.mat(2 ** np.array(list(range(l_var - 1, -1, -1)))).astype(np.float64)
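One caveat worth checking with the cast shown above: it is applied after the integer power has already been evaluated, so it would convert values that have already overflowed. A minimal sketch of the alternative, assuming only that the goal is a float64 row of powers of two, is to raise a float base in the first place. The variable names `exponents` and `weights` are illustrative and do not come from snmf.py.

```python
import numpy as np

l_var = 64  # size at which the warning was reported

# Same exponents as before, but a float base makes the result float64 from the start.
exponents = np.arange(l_var - 1, -1, -1)
weights = 2.0 ** exponents

print(weights.dtype, weights[0] == 2.0 ** 63)
```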
I'm also seeing that most of the time is spent in the flatten() function, which can be hoisted out of the loop, from
for i in range(len(i_f)):
    alpha[i_f[i], j_f[i]] = t_d.todense().flatten()[0, i]
to
t_d_flattened = t_d.todense().flatten()
for i in range(len(i_f)):
    alpha[i_f[i], j_f[i]] = t_d_flattened[0, i]
I'm using Python 3.6.8 and tried SepNMF. (I know that nimfa is not tested on 3.6.8.)
Then I got KeyError: <built-in function format> with the following code:
import nimfa
model = nimfa.SepNmf(X_uw) # lil_matrix
model.factorize()
I have checked the code and I wonder whether the format variable is not defined? (Of course, format is a built-in in Python 3.x.) Is this a bug in SepNMF?
https://github.com/marinkaz/nimfa/blob/master/nimfa/methods/factorization/sepnmf.py#L300
Hi, I would like to apply NMF to a dataset with missing values and uncertainties. I know there exists this package: https://github.com/guangtunbenzhu/NonnegMFPy, but it hasn't been updated in a long time, so I am not sure if I trust it.
Nimfa should be able to do the same, right? However, I can't find this application anywhere in the documentation. Is it actually possible to do this with nimfa? Otherwise, I think I would need to hard code this.
Is it possible to obtain uncertainties for the H and/or W matrices?
Thanks in advance
In 539f069 (see here), is not None was introduced to replace implicit boolean use (i.e., not alpha) on multiple models, but I believe the meaning was reversed and it should be is None instead. Currently, leaving the defaults as None doesn't set them to a default value and leaves them as None, throwing errors further down the line.
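The intended pattern can be sketched as follows. The class `Model`, its parameter names, and the default values 1.0 and 0.5 are hypothetical stand-ins and are not copied from any nimfa model.

```python
class Model:
    """Hypothetical stand-in for a factorization model with optional parameters."""

    def __init__(self, alpha=None, beta=None):
        # Fill in a default only when the caller passed nothing.
        # `is None` (rather than `not alpha`) keeps an explicit 0 or 0.0 intact.
        self.alpha = 1.0 if alpha is None else alpha
        self.beta = 0.5 if beta is None else beta


print(Model().alpha, Model(alpha=0.0).alpha)
```

Writing the test the other way round (`if alpha is not None`) would replace every value the caller supplied and leave the untouched `None` in place, which matches the behavior described above.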
This is not the correct place for this question, but as this project has no mailing list, I don't know where else to post it.
I need to factor some rather large matrices (300,000 examples and 30,000 features). I am wondering what method will be able to do so in a reasonable amount of time.
As the author of this library, you must be familiar with a large range of NMF methods. In your experience, which of your implementations is quick while at the same time not making big sacrifices in terms of quality? I know that this question will depend on the dataset and on how I define "quality," but do you have any favorite methods for larger matrices?
Best regards,
Conrad
Often, scalar-only operations like math.sqrt are passed to sop. These are then applied directly to a numpy array, as in
return op(X + eps, s) if s != None else op(X + eps)
resulting in a type error because math.sqrt doesn't know how to operate on numpy arrays.
It seems like either the calling function needs to pass np.sqrt, or sop needs to be smart enough to select the numpy version if given a scalar version.
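A rough sketch of the second option, assuming a simple lookup table is acceptable. The names `_ELEMENTWISE` and `elementwise` are hypothetical and are not part of nimfa. The snippet also reproduces the failure on a small array.

```python
import math

import numpy as np

# Hypothetical lookup from scalar-only functions to their NumPy equivalents.
_ELEMENTWISE = {math.sqrt: np.sqrt, math.log: np.log, math.exp: np.exp}


def elementwise(op):
    """Return an array-capable version of `op` when one is known, else `op` itself."""
    return _ELEMENTWISE.get(op, op)


X = np.array([[1.0, 4.0], [9.0, 16.0]])
try:
    math.sqrt(X)  # scalar-only function applied to a 2x2 array
except TypeError as exc:
    print("math.sqrt failed:", exc)

print(elementwise(math.sqrt)(X))
```

A table like this only covers the functions listed in it; passing np.sqrt from the caller avoids the lookup entirely.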
If you print the length of f_set in each loop, the value never decreases in this while loop:
https://github.com/marinkaz/MF/blob/master/nimfa/methods/factorization/snmf.py#L289
You can reproduce the issue by running the following code:
import scipy.sparse
import random
import nimfa
from time import time
import numpy
m1 = scipy.sparse.lil_matrix((10, 95))
for i in xrange(10):
    for j in xrange(95):
        if random.random() > 0.8: m1[i, j] = 1
m1 = scipy.sparse.csc_matrix(m1)
m1.sort_indices()
t = time()
fctr = nimfa.mf(m1,
                seed = "random_vcol",
                rank = 2,
                method = "snmf",
                max_iter = 15,
                initialize_only = True,
                version = 'r',
                eta = 1.,
                beta = 1e-4,
                i_conv = 10,
                w_min_change = 0)
print numpy.shape(m1)
a = nimfa.mf_run(fctr)
The code never completes. If a dense matrix is passed, it completes in less than a second.
Hi,
I was wondering if nimfa can be used in widget form within the latest Orange3 version? I cannot seem to find its add-on.
Regards,
Bernard
Negative evar in the nsnmf and pmf methods (version 1.2.1), with a huge rss.
One example for nsnmf:
import nimfa
import numpy as np
V = np.random.rand(40, 100)
nsnmf = nimfa.Nsnmf(V, seed="random", rank=10, max_iter=12, theta=0.5)
nsnmf_fit = nsnmf()
nsnmf_fit.fit.evar()
-859782115351077.0
nsnmf_fit.fit.rss()
1.1591715624174236e+18
One example for Pmf:
V = nimfa.examples.medulloblastoma.read(normalize=True)
pmf = nimfa.Pmf(V, seed='nndsvd', rank=10, max_iter=100)
pmf_fit = pmf()
pmf_fit.fit.evar()
-875.87260969308397
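To illustrate how a negative value can arise at all, here is a small sketch that assumes explained variance is defined as 1 - RSS / sum(V**2). That definition is an assumption made for this example and should be checked against the library source. The helper name `explained_variance` is hypothetical.

```python
import numpy as np


def explained_variance(V, W, H):
    """1 - RSS / sum(V**2), the definition assumed in this sketch."""
    residuals = V - W @ H
    rss = float((residuals ** 2).sum())
    return 1.0 - rss / float((V ** 2).sum())


V = np.array([[1.0, 2.0], [3.0, 4.0]])
good_W, good_H = V.copy(), np.eye(2)                      # exact reconstruction
bad_W, bad_H = 100.0 * np.ones((2, 1)), np.ones((1, 2))   # estimate far too large

print(explained_variance(V, good_W, good_H))
print(explained_variance(V, bad_W, bad_H))
```

Under this definition, any estimate whose residual sum of squares exceeds the squared norm of V yields a negative value, so a huge rss and a negative evar would go together.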
Hi!
The datasets (leukemia and medulloblastoma) located here are no longer available:
http://nimfa.biolab.si/nimfa.datasets.html
Is there a way to get them?
Thanks
The Snmf factorization method Snmf.factorize() does not handle transposes correctly across multiple runs.
In order to enforce sparseness on the left factor, the method transposes self.V (lines 175-176), fits the model, and then back-transposes V and swaps W and H (lines 224-226). However, the initial transpose is done outside the run loop, which starts at line 178. Therefore, sparseness is not enforced on the same factor across runs, with odd runs being correct and even runs operating on the original self.V (self.V.T.T). As a symptom of this, if nruns is even, W and H will be returned with the wrong dimensionality (i.e. swapped and transposed).
This should be easily fixable by moving lines 175-176 of snmf.py below line 178 to ensure a re-transpose of self.V before each new fit.
#36
Hi,
I am now trying to integrate NMF, with the ability to automatically detect the rank, into my program. However, when I tried to use the cophenetic correlation to estimate the best rank, the values were all 1. I saw another issue and learned that this happens because it was run on only one model. However, I could not figure out how I should modify the code to make it work. Is there any template or tutorial for rank estimation?
Thank you
When I tried to test the example (Lsnmf) given by the official site of Nimfa, like:
import nimfa
V = nimfa.examples.medulloblastoma.read(normalize=True)
lsnmf = nimfa.Lsnmf(V, seed='random_vcol', rank=50, max_iter=100)
then I got the following error:
AttributeError: 'module' object has no attribute 'Lsnmf'
I guess the reason might be we need to give the whole path of Lsnmf like, nimfa.methods.factorization.lsnmf.Lsnmf.
But when I did this,
lsnmf = nimfa.methods.factorization.lsnmf.Lsnmf(V, seed='random_vcol', rank=50, max_iter=100),
then it gave me:
TypeError: __init__() takes exactly 1 argument (5 given).
How can I solve this problem?
Hi, thanks for this package.
I read the gillis2014 paper entitled "Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization" in IEEE PAMI. This article provides a spa-like algorithm; it does not give xray.
Where does this xray algorithm come from? Should the comment on sepnmf be updated?
Best.
Previously, it was possible to use different NMF methods through the nimfa.mf interface, as follows:
import nimfa
import numpy as np
V = np.random.random((10000, 1000))
fctr = nimfa.mf(V, seed = 'random_vcol', method = 'lsnmf', rank = 40, max_iter = 50)
fctr_res = nimfa.mf_run(fctr)
Now it gives the error:
AttributeError: 'module' object has no attribute 'mf'
What is the proper way to use different methods through a common interface?
I just see seed options in the docs, but not how to reproduce the same result. Is there a good way?
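On the reproducibility question, one library-independent approach is to generate the starting matrices yourself from a seeded random generator, so that every run begins from the same point. The sketch below shows only that step. The helper `make_initial_factors` is hypothetical, and whether and how a given nimfa version accepts user-supplied starting matrices should be checked in its documentation.

```python
import numpy as np


def make_initial_factors(shape, rank, seed):
    """Hypothetical helper: reproducible nonnegative starting matrices W and H."""
    rng = np.random.default_rng(seed)
    n, m = shape
    return rng.random((n, rank)), rng.random((rank, m))


W_a, H_a = make_initial_factors((10, 6), rank=3, seed=123)
W_b, H_b = make_initial_factors((10, 6), rank=3, seed=123)
# Same seed, same starting point.
print(np.array_equal(W_a, W_b) and np.array_equal(H_a, H_b))
```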
I was just playing around with this awesome toolbox when I encountered an occasional IndexError in snmf.py, line 565:
t_d = D[l_1n, l_2n] / (D[l_1n, l_2n] - K[l_1n, l_2n])
I checked the code and found the problem in the if-statement above:
if n_h_set == 1:
    h_n = h_set * np.ones((1, len(j_f)))
    l_1n = i_f
    l_2n = map(int, h_n.tolist()[0])
else:
    l_1n = i_f
    l_2n = map(int, [h_set[e] for e in j_f])
In newer Python versions, a map object cannot be used as an index, hence indexing D with l_2n fails. A simple fix would be to put a list() around the map call.
Thanks!
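The behavior described above can be reproduced on a small array. The names `D`, `rows`, and `cols_float` below are made up for the demonstration and only mimic the shape of the indexing in snmf.py.

```python
import numpy as np

D = np.arange(12.0).reshape(3, 4)
rows = [0, 1, 2]
cols_float = [1.0, 3.0, 0.0]

# In Python 3, map() returns a lazy iterator, which NumPy cannot use as an index.
lazy_cols = map(int, cols_float)
try:
    D[rows, lazy_cols]
except (IndexError, TypeError) as exc:
    print("indexing with a map object failed:", type(exc).__name__)

# Materializing the iterator into a list restores the Python 2 behavior.
cols = list(map(int, cols_float))
print(D[rows, cols])
```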
The latest version on PyPI is still version 1.1, despite v1.2 having been released 8 months ago. Could you update the PyPI release so it is up to date?
Since the API has changed completely and we are developing code to work with the latest version, we need to require v1.2 as a dependency for a package we (@swkeemink) are creating.
Thanks!
I have tried both 1.1 and master (as of 2015-03-19) using Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32 and both produce "TypeError: super() takes at least 1 argument (0 given)" when running:
import nimfa
V = nimfa.examples.medulloblastoma.read(normalize=True)
lsnmf = nimfa.Lsnmf(V, seed='random_vcol', rank=50, max_iter=100)
Traceback (most recent call last):
File "", line 1, in
File "nimfa\methods\factorization\lsnmf.py", line 145, in __init__
super().__init__(vars())
TypeError: super() takes at least 1 argument (0 given)
Hi all,
Just started using your package (v1.4.0) and note the following warnings popping up from Numpy:
PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
One place, for example, is the call to np.asmatrix here, but I'm sure there are more.
Thanks
Hello,
I'm a new user of python and nimfa. Sorry to submit this as an issue, because it's not. I just need some help about what's possible to do with the library.
So basically, my NMF problem is that I have a single vector v and a learned matrix W that is already defined. I want to compute the H vector without running updates on W, that is, running updates only on H, minimizing either the squared Euclidean distance or the Kullback-Leibler divergence.
Would that be possible with nimfa?
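Independently of what nimfa offers for this, the computation itself is short enough to sketch in plain NumPy. The helper `solve_coefficients` below is hypothetical. It keeps W fixed and applies the standard multiplicative update for the squared Euclidean objective to H only. The Kullback-Leibler variant is not shown.

```python
import numpy as np


def solve_coefficients(V, W, max_iter=200, seed=0, eps=1e-9):
    """Hypothetical helper: estimate H >= 0 with V ~= W @ H while W stays fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(max_iter):
        # Multiplicative update for the Euclidean objective; W is never modified.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


rng = np.random.default_rng(42)
W = rng.random((20, 3))        # stands in for the already-learned basis
h_true = rng.random((3, 1))
v = W @ h_true                 # a single column vector to explain
h = solve_coefficients(v, W)
print(np.linalg.norm(v - W @ h) / np.linalg.norm(v))
```

The same function handles several new columns at once, since V may have any number of columns.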
It would be nice if you could add a link (e.g., ADS or arXiv) to all the cited papers in the documentation.
Hello;
I hope you can provide me with a hint to figure this out.
Assume we already have successfully computed X ~= W1 * H1, using SNMF.
Now, say we have an X2 and we want to use the same W1 (as above, the pre-computed W1) to get a new H that can best use the existing W to estimate X2. How would one do this?
Do I just call:
snmf = nimfa.Snmf(X2 , ..., W=W1 ...)
fit = snmf()
H2 = fit.coef()
The new estimated X2 ~= W1 * H2
If my hunch is correct, then I think there is a problem, but I am not sure I am using NIMFA correctly, hence this question.
Thanks!
Regards;
When I try to use the purity function with a fitted nmf model, using a list of numbers as the membership list, I receive the following error:
nmf.purity(classes)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2878, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
nmf.purity(classes)
File "/usr/local/lib/python2.7/dist-packages/nimfa/models/nmf.py", line 388, in purity
[dmbs.setdefault(mbs[i], set()).add(i) for i in range(len(mbs))]
TypeError: unhashable type: 'matrix'
The nmf model was fitted using the Bd function. "classes" is a plain Python list. The documentation asks for a list. Does it need something else? How can this be fixed?
Currently, the RSS is calculated in the following lines:
V = self.target(idx)
X = self.residuals(idx = idx)
xX = V - (self.V - dot(self.W, self.H))
return multiply(xX, xX).sum()
Note that the function of residuals performs the following:
return self.V - dot(self.W, self.H)
So to create xX, you're subtracting the residuals from the original matrix, and then you return the sum of the squares of xX. Are you sure this is the correct way to calculate the RSS? What is the purpose of xX? Isn't X already the residuals matrix? Shouldn't you just return the sum of the squares of X? In other words:
V = self.target(idx)
X = self.residuals(idx = idx)
return multiply(X, X).sum()
I have not read anywhere how the RSS is to be calculated, so I might be completely wrong. I'm just going off the name :)
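A small numeric check of the two expressions, using made-up 2x2 data. The matrices `V`, `W`, and `H` below are illustrative only.

```python
import numpy as np

V = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.array([[1.0], [2.0]])
H = np.array([[1.0, 1.5]])

residuals = V - W @ H                 # what residuals() is described as returning
rss = float((residuals ** 2).sum())   # sum of squared residuals

# The expression questioned above: V - (V - WH) simplifies to WH itself,
# so squaring and summing it measures the estimate, not the error.
xX = V - (V - W @ H)
questioned = float((xX ** 2).sum())

print(rss, questioned)  # prints 2.25 16.25
```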
Dear all,
Is there a method to work with missing values?
I looked in the documentation. I have sparse data (lots of zeros) in which the zeros represent missing data. Is there a method that considers 0 as missing data and creates a mask or uses imputation to discard the missing values from the cost function?
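Whether nimfa supports this directly is not answered in this thread. As a reference point, here is a minimal NumPy sketch of the masking idea: a binary mask removes missing entries from the Euclidean cost, using weighted multiplicative updates. The helper `masked_nmf` and its parameters are hypothetical.

```python
import numpy as np


def masked_nmf(V, mask, rank, max_iter=100, seed=0, eps=1e-9):
    """Hypothetical helper: Euclidean NMF that ignores entries where mask == 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    MV = mask * V
    history = []
    for _ in range(max_iter):
        # Weighted multiplicative updates: only observed entries contribute.
        H *= (W.T @ MV) / (W.T @ (mask * (W @ H)) + eps)
        W *= (MV @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        history.append(float((mask * (V - W @ H) ** 2).sum()))
    return W, H, history


rng = np.random.default_rng(3)
V = rng.random((15, 10))
mask = (rng.random(V.shape) > 0.3).astype(float)   # 1 = observed, 0 = missing
W, H, history = masked_nmf(V, mask, rank=3)
print(history[0], history[-1])
```

For data where zeros mark missing values, the mask could be built as `(V != 0).astype(float)`; the product `W @ H` then provides estimates for the masked entries.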
Hello Marinka Zitnik,
I have been using your nimfa module for my research. While comparing results from various seeding methods, I realized that the variant of the NNDSVD algorithm is not taken into effect by the flag. So, for example, in nmf.py,
self.W, self.H = self.seed.initialize(
    self.V, self.rank, self.options)
should be like
self.W, self.H = self.seed.initialize(
    self.V, self.rank, self.options['options'])
for the non-default NNDSVD algorithms, NNDSVDa or NNDSVDar, to take effect. As you already know, this applies to almost all the py files in the factorization directory.
Thank you,
yskangtamu
Hi,
I am trying to use Nimfa for NMF. I have installed nimfa in my Python environment, and this is the relevant part of my code:
import nimfa
nmf = nimfa.Nmf(tfidf, max_iter=200, rank=2, update='divergence', objective='div')
fit = nmf()
print("Residual sum of squares: ", fit.summary(None)['rss'])
However, I keep on getting this error
AttributeError: 'module' object has no attribute 'Nmf'
Am I doing something wrong?
Thank you
The guys over at scikit.learn also do NMF in Python. They've implemented only the projected gradient method from Lin et al. and the SVD seeding (as you have included in your package).
The scikit.learn project puts an emphasis on performance, which means they emphasize using vectorized numpy operations rather than higher-level Python operations. For example, here you can see how they went about optimizing their NMF code. However, your package is much more comprehensive than theirs.
If you are interested in getting more people to use your code as well as getting some good criticism and suggestions, I would suggest integrating your work into scikit.learn. I would be willing to help out with porting some of your documentation, but I am not very familiar with numpy.
When the 'l' version of SNMF is chosen and initialisation matrices are provided, the factorize method does not swap and transpose the initial W and H as it says it should in lines 176-177. Instead, it keeps them in their original state, which results in an error in line 197 when the matrices can't be broadcast together.
I get weirdly different cophenetic correlation numbers.
When I go with model estimation and ask for
model = nimfa.mf(V, method = "nmf", rank = 3)
est_rank = model.estimate_rank(range=xrange(2,6), n_run=2)
cophenetic_list = [est_rank[item]['cophenetic'] for item in est_rank]
the set of cophenetic correlation scores is different from the low_memory version
model = nimfa.mf(V, method = "nmf", rank = 3)
fit = nimfa.mf_run(model)
sd = fit.summary()
cophenetic_list.append(sd['cophenetic'])
In the latter version (single mf run), ALL cophenetic correlations are weirdly equal to 1!
I have been playing with the implementation of PMFCC in nimfa and noticed that any arbitrary input for the theta matrix is accepted (and presumably used?) by nimfa. It doesn't seem to throw an error, even if I use a theta with the incorrect dimensions. Surprisingly, it seems that each distinct choice of theta, regardless of whether it has the appropriate dimensions or not, gives different outputs.
Is this an issue?