lyst / lightfm
A Python implementation of LightFM, a hybrid recommendation algorithm.
License: Apache License 2.0
I'm reading the WARP code in test_movielens and have a question about the input: why are all values > 4.0 set to 1 and all values <= 4.0 set to -1?
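For readers following the question, the transformation being asked about can be sketched as below. This is an illustration only, not the code from test_movielens: the function name binarize and the default threshold are assumptions taken from the wording of the question. Presumably the point is to treat high ratings as positive examples and everything else as negative ones.

```python
import numpy as np


def binarize(ratings, threshold=4.0):
    """Map ratings above `threshold` to +1.0 and all others to -1.0.

    Hypothetical helper: the name and the threshold follow the question,
    not the project's test code.
    """
    ratings = np.asarray(ratings, dtype=np.float32)
    return np.where(ratings > threshold, 1.0, -1.0).astype(np.float32)


print(binarize([5.0, 4.0, 3.5, 4.5]))
```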
Hi,
When trying to compile the software on Mac OS X (Yosemite) with GCC installed with brew I get the following error:
14:35 $ python setup.py install
...
building 'lightfm.lightfm_fast' extension
gcc-ranlib-4.9 -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/e/Anaconda/anaconda/envs/py3k/include -arch x86_64 -I/Users/e/Anaconda/anaconda/envs/py3k/include/python3.3m -c lightfm/lightfm_fast.c -o build/temp.macosx-10.5-x86_64-3.3/lightfm/lightfm_fast.o -fopenmp -march=native -ffast-math
gcc-ranlib-4.9: Cannot find plugin 'liblto_plugin.so'
error: command 'gcc-ranlib-4.9' failed with exit status 1
I cannot find any liblto_plugin.so or liblto_plugin.dylib on my system.
I tried brew install llvm and conda install llvmpy, but with no success :(
Any hint on how to fix that?
Thank you for maintaining this project!
I'm just learning about recommender systems. After reading the example in the docs, I couldn't find where to add user_features to the model.
Am I missing anything? For example, for the MovieLens dataset, I'd like to add occupation and sex as user features.
Forgive me for being a newbie :)
I have seen on youtube your excellent presentation on hybrid recommender systems, i.e. lightfm, in Amsterdam recently. I have a couple of questions:
When I install this on the (almost) unmodified jupyter/all-spark-notebook docker image and try to import it in Python, I get the following error:
import lightfm
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.4/site-packages/lightfm/__init__.py", line 1, in <module>
from .lightfm import LightFM
File "/opt/conda/lib/python3.4/site-packages/lightfm/lightfm.py", line 7, in <module>
from .lightfm_fast import (CSRMatrix, FastLightFM,
ImportError: /opt/conda/lib/python3.4/site-packages/lightfm/lightfm_fast.cpython-34m.so:
undefined symbol: GOMP_parallel
Things I have tried:
lightfm.py
and moving all of the dependencies onto one line, changing
from .lightfm_fast import (CSRMatrix, FastLightFM,
fit_logistic, predict_lightfm,
fit_warp, fit_bpr, fit_warp_kos)
to
from .lightfm_fast import (CSRMatrix, FastLightFM, fit_logistic, predict_lightfm, fit_warp, fit_bpr, fit_warp_kos)
Same error.
But I think if you just pull the docker image and do a pip install lightfm it should replicate the error precisely.
It seems that BPR and Logistic loss functions are broken (at least on Windows), or maybe I'm doing something terribly wrong.
It seems that BPR and Logistic always return the same predictions for every user (not exactly the same scores, but if you sort them, you get the same ranked list).
I'm using one of the MovieLens notebooks for this example (https://github.com/lyst/lightfm/blob/master/examples/movielens/warp_loss.ipynb).
I also tested this with my own datasets, and got the same results.
Apologies if this is user error, but I appear to be getting NaN embeddings from LightFM and I'm unsure what I could have done wrong. I followed the documentation, and have raised the issue on Stack Overflow.
Basically I have a large dataset where collaborative filtering is working fine, but where user / item embeddings are provided, the model produces NaN embeddings.
http://stackoverflow.com/questions/40967226/lightfm-user-item-producing-nan-embeddings
Following the issue #101, I tried running the test as per the Docs:
docker-compose run lightfm py.test -x tests/
And then I ran it from the Docker shell (see below). In both cases, the test fails to find lightfm_fast
root@002ca3eaa9c1:/home/lightfm# py.test -x tests/
================================================== test session starts ==================================================
platform linux2 -- Python 2.7.9, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /home/lightfm, inifile:
collecting 0 items / 1 errors
======================================================== ERRORS =========================================================
__________________________________________ ERROR collecting tests/test_api.py ___________________________________________
tests/test_api.py:7: in <module>
from lightfm import LightFM
lightfm/__init__.py:1: in <module>
from .lightfm import LightFM
lightfm/lightfm.py:9: in <module>
from .lightfm_fast import (CSRMatrix, FastLightFM,
E ImportError: No module named lightfm_fast
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================ 1 error in 0.30 seconds ================================================
max_sampled is the maximum number of negative examples to be chosen when using WARP (how about BPR?).
Which parameter controls the actual number of negative samples? Is it max_sampled, in a best-effort kind of way?
The reason I'm asking is that k and n do not apply to WARP, per the documentation.
Thank you,
I installed lightFM on ubuntu via pip, but the training does not seem to use multithreading. How do I enable multithreading without docker?
I ran pip install lightfm
and got this error:
running build_ext
building 'lightfm._lightfm_fast_openmp' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/lightfm
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c lightfm/_lightfm_fast_openmp.c -o build/temp.linux-x86_64-2.7/lightfm/_lightfm_fast_openmp.o -ffast-math -march=native -fopenmp
lightfm/_lightfm_fast_openmp.c:15:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Arrays are often created with different dtypes from those required by the lightfm C code. This can result in errors like:
File "/home/jattenberg/anaconda/lib/python2.7/site-packages/lightfm/lightfm.py", line 194, in fit
verbose=verbose)
File "/home/jattenberg/anaconda/lib/python2.7/site-packages/lightfm/lightfm.py", line 254, in fit_partial
self.loss)
File "/home/jattenberg/anaconda/lib/python2.7/site-packages/lightfm/lightfm.py", line 289, in _run_epoch
CSRMatrix(self._get_positives_lookup_matrix(interactions)),
File "lightfm/lightfm_fast.pyx", line 91, in lightfm.lightfm_fast.CSRMatrix.__init__ (lightfm/lightfm_fast.c:1966)
ValueError: Buffer dtype mismatch, expected 'int' but got 'long long'
Input matrices could be converted to int32 automatically.
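Until something like that is built in, a caller-side conversion might look like the sketch below. It assumes, based only on the error message above, that the compiled code wants 32-bit integer index arrays; the float32 data dtype and the helper name to_int32_csr are likewise assumptions, not part of lightfm.

```python
import numpy as np
import scipy.sparse as sp


def to_int32_csr(matrix):
    """Return a CSR matrix with int32 index arrays and float32 data.

    Hypothetical helper; the target dtypes are inferred from the
    'expected int but got long long' error, not from lightfm's docs.
    """
    csr = sp.csr_matrix(matrix, dtype=np.float32)
    # SciPy may choose int64 for these arrays depending on the input.
    csr.indices = csr.indices.astype(np.int32)
    csr.indptr = csr.indptr.astype(np.int32)
    return csr


rows = np.array([0, 1, 2], dtype=np.int64)
cols = np.array([2, 0, 1], dtype=np.int64)
converted = to_int32_csr(sp.coo_matrix((np.ones(3), (rows, cols)), shape=(3, 3)))
print(converted.indices.dtype, converted.indptr.dtype, converted.dtype)
```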
It doesn't seem to work for OS X El Capitan at the moment:
$ sudo python setup.py install
Password:
running install
Checking .pth file support in /Library/Python/2.7/site-packages/
/usr/bin/python -E -c pass
TEST PASSED: /Library/Python/2.7/site-packages/ appears to support .pth files
running bdist_egg
running egg_info
writing pbr to lightfm.egg-info/pbr.json
writing requirements to lightfm.egg-info/requires.txt
writing lightfm.egg-info/PKG-INFO
writing top-level names to lightfm.egg-info/top_level.txt
writing dependency_links to lightfm.egg-info/dependency_links.txt
reading manifest file 'lightfm.egg-info/SOURCES.txt'
writing manifest file 'lightfm.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.11-intel/egg
running install_lib
running build_py
running build_ext
building 'lightfm.lightfm_fast' extension
gcc-5 -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c lightfm/lightfm_fast.c -o build/temp.macosx-10.11-intel-2.7/lightfm/lightfm_fast.o -fopenmp -ffast-math -march=native
gcc-5: error: unrecognized command line option '-Wshorten-64-to-32'
error: command 'gcc-5' failed with exit status 1
Will update if I find a way to make it work.
Hey everyone,
thanks a lot for maintaining this project!
Do you also have some examples of how to recommend items without having data about users and ratings?
I'd like to recommend similar eBooks based on their content.
Best Regards
predict(user_ids, item_ids, item_features=None, user_features=None, num_threads=1)
returns prediction values for me. I would like to restrict the predictions to a range (e.g. 0 to 1),
but I cannot find anywhere to set that range.
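The signature quoted above has no range parameter, so one option is to rescale the scores after prediction. The sketch below shows two common choices on a plain array standing in for the output of predict; both helper names are hypothetical. Either mapping keeps the ordering of the scores, so the ranking is unchanged, but the results should not be read as calibrated probabilities.

```python
import numpy as np


def sigmoid(scores):
    """Squash arbitrary real-valued scores into the open interval (0, 1)."""
    scores = np.asarray(scores, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-scores))


def min_max(scores):
    """Rescale scores linearly so the smallest is 0 and the largest is 1."""
    scores = np.asarray(scores, dtype=np.float64)
    span = scores.max() - scores.min()
    if span == 0:
        # All scores are equal; there is no spread to rescale.
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


raw = np.array([-2.0, 0.0, 2.0])  # stand-in for model.predict(...) output
print(sigmoid(raw))
print(min_max(raw))
```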
Unable to test environment using Python 3.5
The error log is given by:
==================================================================== ERRORS ====================================================================
______________________________________________________ ERROR collecting tests/test_api.py ______________________________________________________
tests/test_api.py:7: in <module>
from lightfm import LightFM
lightfm/__init__.py:1: in <module>
from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
from ._lightfm_fast_no_openmp import * # NOQA
E ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
warnings.warn('LightFM was compiled without OpenMP support. '
___________________________________________________ ERROR collecting tests/test_datasets.py ____________________________________________________
tests/test_datasets.py:7: in <module>
from lightfm.datasets import fetch_movielens, fetch_stackexchange
lightfm/__init__.py:1: in <module>
from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
from ._lightfm_fast_no_openmp import * # NOQA
E ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
warnings.warn('LightFM was compiled without OpenMP support. '
__________________________________________________ ERROR collecting tests/test_evaluation.py ___________________________________________________
tests/test_evaluation.py:7: in <module>
from lightfm import LightFM, evaluation
lightfm/__init__.py:1: in <module>
from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
from ._lightfm_fast_no_openmp import * # NOQA
E ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
warnings.warn('LightFM was compiled without OpenMP support. '
________________________________________________ ERROR collecting tests/test_fast_functions.py _________________________________________________
tests/test_fast_functions.py:6: in <module>
from lightfm import _lightfm_fast
lightfm/__init__.py:1: in <module>
from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
from ._lightfm_fast_no_openmp import * # NOQA
E ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
warnings.warn('LightFM was compiled without OpenMP support. '
___________________________________________________ ERROR collecting tests/test_movielens.py ___________________________________________________
tests/test_movielens.py:12: in <module>
from lightfm import LightFM
lightfm/__init__.py:1: in <module>
from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
from ._lightfm_fast_no_openmp import * # NOQA
E ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
warnings.warn('LightFM was compiled without OpenMP support. '
=========================================================== 5 error in 0.71 seconds ============================================================
I tried to modify some of the Cython parts of the code, and I have trouble installing it afterwards.
I then tried re-installing the original version of lightfm (freshly cloned from the repo) with the same procedure. The problem occurs when I run:
python setup.py cythonize
I obtain the error message:
running cythonize
Traceback (most recent call last):
File "setup.py", line 152, in <module>
ext_modules=define_extensions(use_openmp)
File "/home/paolo/anaconda3/lib/python3.4/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/paolo/anaconda3/lib/python3.4/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/home/paolo/anaconda3/lib/python3.4/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "setup.py", line 81, in run
self.generate_pyx()
File "setup.py", line 74, in generate_pyx
fl.write(template.format(**template_params))
AttributeError: 'bytes' object has no attribute 'format'
I suspect there is a problem with the version of Python I am using (from a quick googling it appears that bytes have the attribute format starting in Python 3.5). Is python 3.5 really necessary?
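As far as I know, bytes objects have no format method in any Python 3 release (Python 3.5 added only %-style formatting for bytes), so a newer interpreter alone may not help. A workaround that does not depend on the Python version is to decode the template to str before formatting. The sketch below uses a hypothetical render_template helper and assumes the template is UTF-8; it is not the project's setup.py.

```python
def render_template(template, **params):
    """Fill a str.format-style template that may have been read as bytes."""
    if isinstance(template, bytes):
        # Assumption: the template file is UTF-8 encoded.
        template = template.decode("utf-8")
    return template.format(**params)


print(render_template(b"threads = {num_threads}", num_threads=4))
```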
File "/private/tmp/pip_build_root/lightfm/setup.py", line 35, in set_gcc
raise Exception('No GCC available. Install gcc from Homebrew '
Exception: No GCC available. Install gcc from Homebrew using brew install gcc.
However gcc is installed via macports:
[terminal]: port installed | grep gcc
gcc48 @4.8.5_0 (active)
macports typically installs under /opt/local/. [1]
If you want to support this, I guess you could simply add to line 28 in setup.py?
[1] https://guide.macports.org/chunked/installing.macports.html
I have a model of (30k, 1m) user/item
when I use predict_rank to predict all the ranks of a single user, it gets slow:
interactions = csr_matrix(
([1] * item_size, ([user_id] * item_size, item_ids)),
shape=(user_size, item_size),
dtype=np.float32)
# this takes very long
model.predict_rank(interactions)
# very fast
model.predict(user_id, item_ids)
Aren't the ranks just the order of the recommendation scores?
I guess predict_rank runs a prediction for every interaction?
thanks,
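If only a single user's ranking is needed, one option is to derive it from the scores that predict returns. In the sketch below, scores stands in for the output of model.predict(user_id, item_ids); the helper name and the tie-breaking rule are assumptions and may not match what predict_rank does internally.

```python
import numpy as np


def ranks_from_scores(scores):
    """Return each item's rank, where rank 0 is the highest-scoring item.

    Ties are broken by original position (an assumption of this sketch).
    """
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")  # item positions, best first
    ranks = np.empty(len(scores), dtype=np.int64)
    ranks[order] = np.arange(len(scores))
    return ranks


scores = np.array([0.1, 0.9, 0.5])  # stand-in for model.predict(...) output
print(ranks_from_scores(scores))
```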
Hello,
the question here is:
How to deal with different users in training and test set?
In the example on the hybrid model for question recommendations to users
(https://github.com/lyst/lightfm/blob/master/examples/stackexchange/hybrid_crossvalidated.ipynb)
the model recommends the highest ranked question (item) to the user not answered by this user yet.
What about this different setting:
Each user only interacts with one item.
Training is done with these interactions
At test time, there is a new user.
The aim is to recommend the highest ranked item for this unknown user.
This means that the aim of this setting is to deal with different users in training and test set but always the same set of items.
Is this possible? How can this be achieved?
It would be appreciated very much if you could give an example or point to an existing example where this problem is tackled.
Thank you!
LightFM borrows from the scikit-learn interface but is not fully compatible with BaseEstimator. Full compatibility would make it possible to use scikit-learn functionality such as grid_search.RandomizedSearchCV to optimize the hyperparameters of LightFM models.
Added PR #107 to resolve this issue.
docker-compose build lightfm
Produces this on my Mac OS X:
File "/usr/lib/python2.7/dist-packages/numpy/distutils/misc_util.py", line 966, in add_subpackage caller_level = 2)
File "/usr/lib/python2.7/dist-packages/numpy/distutils/misc_util.py", line 935, in get_subpackage caller_level = caller_level + 1)
File "/usr/lib/python2.7/dist-packages/numpy/distutils/misc_util.py", line 872, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "scipy/linalg/setup.py", line 20, in configuration
raise NotFoundError('no lapack/blas resources found')
numpy.distutils.system_info.NotFoundError: no lapack/blas resources found
----------------------------------------
Can't roll back scipy; was not uninstalled
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-XnyZzK/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-cBi7ka-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-XnyZzK/scipy
Storing debug log for failure in /root/.pip/pip.log
ERROR: Service 'lightfm' failed to build: The command '/bin/sh -c pip install .' returned a non-zero code: 1
I'm new to Docker but see that it is SciPy complaining about missing lapack/blas
Looks like this is an issue with the gcc compiler on 64-bit architectures, in particular the -Wshorten-64-to-32 option.
Here's the error:
gcc-4.9: error: unrecognized command line option '-Wshorten-64-to-32'
error: command 'gcc-4.9' failed with exit status 1
My machine has a relatively new version of gcc, the one available from brew:
brew upgrade gcc
Error: gcc 4.9.2_1 already installed
Any ideas?
I'm using precision_at_k from lightFM library with the code below.
lightfm_model = LightFM(loss="bpr", no_components=25, random_state=0)
lightfm_model.fit(interactions=coo_matrix((np.ones(len(training)), (training["user"], training["product"]))), num_threads=20)
print lightfm_model.user_embeddings.shape
print lightfm_model.item_embeddings.shape
print (precision_at_k(model=lightfm_model,
train_interactions=coo_matrix((np.ones(len(training)), (training["user"], training["product"]))),
test_interactions=coo_matrix((np.ones(len(test)), (test["user"], test["product"]))),
k=5,
user_features=None,
item_features=None) )
the size of user and item embeddings is correct (25 for both) but I'm getting:
assert self.user_embeddings.shape[0] >= user_features.shape[1]
AssertionError
Notice that I'm not using user/item features.
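One possible cause, which is an assumption and not something confirmed in this report: coo_matrix infers its shape from the largest index when no shape is given, so the training and test matrices built above can end up with different dimensions. Passing an explicit shape to both rules that out. The helper name and the toy indices below are illustrative only.

```python
import numpy as np
from scipy.sparse import coo_matrix


def build_interactions(users, items, shape):
    """Build a binary interaction matrix with an explicit, shared shape."""
    users = np.asarray(users)
    items = np.asarray(items)
    values = np.ones(len(users), dtype=np.float32)
    return coo_matrix((values, (users, items)), shape=shape)


train_users, train_items = [0, 1, 4], [0, 2, 3]
test_users, test_items = [0, 2], [1, 2]

# Size both matrices from the full set of ids, not from each split alone.
n_users = max(train_users + test_users) + 1
n_items = max(train_items + test_items) + 1

train = build_interactions(train_users, train_items, (n_users, n_items))
test = build_interactions(test_users, test_items, (n_users, n_items))
print(train.shape, test.shape)
```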
Very cool library!
Is it the case that LightFM only supports binary classification? Can I use it for multinomial classification?
Thanks!
Could you tell me how to fix the one failing test?
The test log is:
=================================================================== FAILURES ===================================================================
______________________________________________________ test_basic_fetching_stackexchange _______________________________________________________
def test_basic_fetching_stackexchange():
test_fractions = (0.2, 0.5, 0.6)
for test_fraction in test_fractions:
data = fetch_stackexchange('crossvalidated',
min_training_interactions=0,
test_set_fraction=test_fraction)
train = data['train']
test = data['test']
assert isinstance(train, sp.coo_matrix)
assert isinstance(test, sp.coo_matrix)
assert train.shape == test.shape
frac = float(test.getnnz()) / (train.getnnz() + test.getnnz())
assert abs(frac - test_fraction) < 0.01
for dataset in ('crossvalidated', 'stackoverflow'):
data = fetch_stackexchange(dataset,
min_training_interactions=0,
> indicator_features=True, tag_features=False)
tests/test_datasets.py:58:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
lightfm/datasets/stackexchange.py:88: in fetch_stackexchange
data = np.load(path)
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py:399: in load
pickle_kwargs=pickle_kwargs)
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py:162: in __init__
_zip = zipfile_factory(fid)
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py:92: in zipfile_factory
return zipfile.ZipFile(*args, **kwargs)
/usr/lib/python2.7/zipfile.py:770: in __init__
self._RealGetContents()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <zipfile.ZipFile object at 0x7f8214b07c90>
def _RealGetContents(self):
"""Read in the table of contents for the ZIP file."""
fp = self.fp
try:
endrec = _EndRecData(fp)
except IOError:
raise BadZipfile("File is not a zip file")
if not endrec:
> raise BadZipfile, "File is not a zip file"
E BadZipfile: File is not a zip file
/usr/lib/python2.7/zipfile.py:811: BadZipfile
===================================================== 1 failed, 49 passed in 68.95 seconds =====================================================
Hi, I was wondering if we could use rand_s when compiling under Windows, because compiling as-is does not work. Replacing rand_r with rand_s makes the build succeed.
Considering the following:
Could you extend LightFM to support ANN (approximate nearest neighbour) retrieval, preferably HNSW? Could you also give more guidance and detail on how to get recommendations out of LightFM? That would seriously benefit most users of this great algorithm you made.
I realized that lightfm only supports binary user/item features, according to the paper titled "Metadata Embeddings for User and Item Cold-start Recommendations". Is that true, and is it possible to support float features?
thanks,
jianyi
Hi, I tried to install lightfm on my laptop. Python environment: Windows 10, Python 3.4 64-bit.
But I got this error:
E:\Download IDM\Compressed\lightfm-1.9.1\lightfm-1.9.1>python setup.py install
Compiling without OpenMP support.
running install
running bdist_egg
running egg_info
writing top-level names to lightfm.egg-info\top_level.txt
writing lightfm.egg-info\PKG-INFO
writing dependency_links to lightfm.egg-info\dependency_links.txt
writing requirements to lightfm.egg-info\requires.txt
reading manifest file 'lightfm.egg-info\SOURCES.txt'
writing manifest file 'lightfm.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
creating build
creating build\lib.win-amd64-3.4
creating build\lib.win-amd64-3.4\lightfm
copying lightfm\evaluation.py -> build\lib.win-amd64-3.4\lightfm
copying lightfm\lightfm.py -> build\lib.win-amd64-3.4\lightfm
copying lightfm\_lightfm_fast.py -> build\lib.win-amd64-3.4\lightfm
copying lightfm\__init__.py -> build\lib.win-amd64-3.4\lightfm
creating build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\datasets\movielens.py -> build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\datasets\stackexchange.py -> build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\datasets\_common.py -> build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\datasets\__init__.py -> build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\_lightfm_fast_no_openmp.c -> build\lib.win-amd64-3.4\lightfm
copying lightfm\_lightfm_fast_openmp.c -> build\lib.win-amd64-3.4\lightfm
running build_ext
building 'lightfm._lightfm_fast_no_openmp' extension
Traceback (most recent call last):
File "setup.py", line 154, in <module>
ext_modules=define_extensions(use_openmp)
File "C:\Python34\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "C:\Python34\lib\distutils\dist.py", line 955, in run_commands
self.run_command(cmd)
File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "C:\Python34\lib\site-packages\setuptools\command\install.py", line 67, in run
self.do_egg_install()
File "C:\Python34\lib\site-packages\setuptools\command\install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "C:\Python34\lib\site-packages\setuptools\command\bdist_egg.py", line 161, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "C:\Python34\lib\site-packages\setuptools\command\bdist_egg.py", line 147, in call_command
self.run_command(cmdname)
File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "C:\Python34\lib\site-packages\setuptools\command\install_lib.py", line 10, in run
self.build()
File "C:\Python34\lib\distutils\command\install_lib.py", line 107, in build
self.run_command('build_ext')
File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
cmd_obj.run()
File "C:\Python34\lib\site-packages\setuptools\command\build_ext.py", line 66, in run
_build_ext.run(self)
File "C:\Python34\lib\distutils\command\build_ext.py", line 339, in run
self.build_extensions()
File "C:\Python34\lib\distutils\command\build_ext.py", line 448, in build_extensions
self.build_extension(ext)
File "C:\Python34\lib\site-packages\setuptools\command\build_ext.py", line 178, in build_extension
_build_ext.build_extension(self, ext)
File "C:\Python34\lib\distutils\command\build_ext.py", line 503, in build_extension
depends=ext.depends)
File "C:\Python34\lib\distutils\msvc9compiler.py", line 460, in compile
self.initialize()
File "C:\Python34\lib\distutils\msvc9compiler.py", line 371, in initialize
vc_env = query_vcvarsall(VERSION, plat_spec)
File "C:\Python34\lib\site-packages\setuptools\msvc9_support.py", line 52, in query_vcvarsall
return unpatched['query_vcvarsall'](version, *args, **kwargs)
File "C:\Python34\lib\distutils\msvc9compiler.py", line 287, in query_vcvarsall
raise ValueError(str(list(result.keys())))
ValueError: ['path']
How can I fix that error? I tried the suggestion in #29,
but I still get the error.
Hi,
I have written a quick ipython notebook example where I used lightFM in the Kaggle RECRUIT Ponpare competition.
The goal of the competition was to predict user purchase on a specific week given a year of training data (log of purchase and visit).
This naturally fits into the lightFM hybrid recommendation paradigm : implicit feedback data with user and item metadata
If that fits you, I can create a new pull request to add it to your example section.
Cheers
When running the crossvalidated example notebook under Python 3.5, I get an error when executing:
import data
(interactions, question_features,
user_features, question_vectorizer,
user_vectorizer) = data.read_data() # This will download the data if not present
The error is:
AttributeError Traceback (most recent call last)
<ipython-input-1-0c850630fc9e> in <module>()
3 (interactions, question_features,
4 user_features, question_vectorizer,
----> 5 user_vectorizer) = data.read_data() # This will download the data if not present
/home/fwilhelm/Sources/lightfm/examples/crossvalidated/data.py in read_data()
201
202 user_about = {}
--> 203 for (user_id, about) in _read_raw_user_data():
204
205 if user_id == -1:
/home/fwilhelm/Sources/lightfm/examples/crossvalidated/data.py in _read_raw_user_data()
148 about_me = datum.get('AboutMe', '')
149
--> 150 yield int(user_id), _process_about(about_me)
151
152 except etree.XMLSyntaxError:
/home/fwilhelm/Sources/lightfm/examples/crossvalidated/data.py in _process_about(about)
106 def _process_about(about):
107
--> 108 clean_about = (strip_tags(about)
109 .replace('\n', ' ')
110 .lower())
/home/fwilhelm/Sources/lightfm/examples/crossvalidated/data.py in strip_tags(html)
37 def strip_tags(html):
38 s = MLStripper()
---> 39 s.feed(html)
40 return s.get_data()
41
/home/fwilhelm/.miniconda3/lib/python3.5/html/parser.py in feed(self, data)
109 """
110 self.rawdata = self.rawdata + data
--> 111 self.goahead(0)
112
113 def close(self):
/home/fwilhelm/.miniconda3/lib/python3.5/html/parser.py in goahead(self, end)
137 n = len(rawdata)
138 while i < n:
--> 139 if self.convert_charrefs and not self.cdata_elem:
140 j = rawdata.find('<', i)
141 if j < 0:
AttributeError: 'MLStripper' object has no attribute 'convert_charrefs'
Is it better to run lightfm under Python 2.7?
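The traceback suggests that MLStripper never runs HTMLParser.__init__, which is where convert_charrefs gets set on recent Python 3 versions. That is an inference from the error, not from reading data.py. A tag stripper that initialises its base class might look like the sketch below (Python 3 syntax); the class name TagStripper is hypothetical and this is not the notebook's actual code.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        # Calling the base initialiser sets convert_charrefs and other state.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def get_data(self):
        return "".join(self.chunks)


def strip_tags(html):
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any text still buffered by the parser
    return stripper.get_data()


print(strip_tags("<p>Hello <b>world</b> &amp; all</p>"))
```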
Hi, sorry for the stupid question:
Can this model predict top users for an item by design?
The predict and predict_rank methods seem to be designed around users, not items.
if I do:
prediction = model.predict(all_user_ids, np.repeat(item_id, user_size))
# then sort the prediction
Are the top results reliable?
thank you
I checked the documentation and couldn't find any info on this. Is it possible to serialize a model to disk? Would be useful when training takes multiple hours.
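For plain Python objects the usual route is the standard pickle module. Whether a fitted model from a given lightfm version pickles cleanly is an assumption worth checking on your own setup. The two helper names below are hypothetical, and a dict stands in for the model.

```python
import pickle


def save_model(model, path):
    """Write any picklable object to `path`."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Read back an object written by save_model (trusted files only)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    save_model({"no_components": 10}, "model.pickle")  # dict as a stand-in
    print(load_model("model.pickle"))
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.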
Hi!
I'm getting some strange behavior. If I train (fit) a model with item_features and an L2 penalty on the item features (item_alpha) equal to 0.05, and then obtain all the item predictions for a user, all of those predictions have the same value.
Also, if I obtain all the item predictions for another user (with the same model), again all the predictions have the same value, but different from the previous user's.
Also, the auc_score of that model is 1.0.
Hello,
I just started playing around with lightFM, so this can be something that I just missed while trying the Movielens example.
Here is the code that I did following the tutorial available on this link
import numpy as np
from lightfm.datasets import fetch_movielens # movielens dataset
from lightfm.evaluation import * # precision_at_k
from lightfm import *
movielens = fetch_movielens()
train = movielens['train']
test = movielens['test']
model = LightFM(learning_rate=0.05, loss='bpr')
model.fit(train, epochs=10)
train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10).mean()
train_auc = auc_score(model, train).mean()
test_auc = auc_score(model, test).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))
print('AUC: train %.2f, test %.2f.' % (train_auc, test_auc))
I'm getting this error as output:
Traceback (most recent call last):
File "main.py", line 17, in <module>
train_precision = precision_at_k(model, train, k=10).mean()
File "/Library/Python/2.7/site-packages/lightfm/evaluation.py", line 70, in precision_at_k
precision = precision[test_interactions.getnnz(axis=1) > 0]
TypeError: getnnz() got an unexpected keyword argument 'axis'
I'm fairly sure it's a silly mistake in my code, but because I couldn't figure it out, I'm opening this issue.
Thanks!
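The TypeError suggests the installed SciPy predates the axis argument of getnnz, so upgrading SciPy is probably the real fix (an assumption, since the SciPy version is not stated above). For reference, the per-row count that the failing line needs can also be computed without that argument, as in this sketch with a hypothetical helper name:

```python
import numpy as np
import scipy.sparse as sp


def nnz_per_row(matrix):
    """Count stored entries in each row without using getnnz(axis=1)."""
    csr = sp.csr_matrix(matrix)
    # Consecutive differences of indptr give the number of entries per row.
    return np.diff(csr.indptr)


interactions = sp.coo_matrix(
    (np.ones(3), ([0, 0, 2], [1, 2, 0])), shape=(3, 3)
)
print(nnz_per_row(interactions))
```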
Quite often more recent interactions of users with items carry more information than long past interactions. For instance the fashion taste of a hipster may change more towards rockabilly over time which will be reflected by his/her interactions. Having the possibility to weight interactions differently, more recent ones higher than older ones, would allow us to model these effects.
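One way to express this idea is an exponential decay on interaction age, sketched below. The function name, the half-life parameter and its default are all hypothetical, and how (or whether) such weights can be passed to the model depends on the lightfm version in use, which this post does not establish.

```python
import numpy as np


def recency_weights(timestamps, now, half_life_days=30.0):
    """Weight each interaction by 0.5 ** (age / half_life).

    `timestamps` and `now` are Unix times in seconds. An interaction that
    is one half-life old gets weight 0.5, two half-lives old 0.25, etc.
    """
    age_days = (now - np.asarray(timestamps, dtype=np.float64)) / 86400.0
    age_days = np.clip(age_days, 0.0, None)  # future timestamps count as new
    return np.power(0.5, age_days / half_life_days).astype(np.float32)


now = 100 * 86400
print(recency_weights([100 * 86400, 70 * 86400, 40 * 86400], now))
```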
Using the WARP model gives me right now a segmentation fault under 64bit Linux using an environment with conda (4.0.5), conda-env (2.4.5), lightfm (1.9), numpy (1.11.2), pandas (0.18.0), scipy (0.18.1). The attached script outputs:
Shape: (125288, 792409)
NNZ: 5013789
Segmentation fault (core dumped)
Reducing the shape to (100000, 10000) works, and so does changing the loss to logistic. This strongly suggests that the problem is in the WARP Cython code itself.
My laptop has 32GB of RAM, so running out of memory should not be the cause of this segmentation fault.
import numpy as np
import scipy as sp
from scipy import stats
import pandas as pd
from lightfm import LightFM
from lightfm.evaluation import precision_at_k
from coo_matrix import *
M = load_coo('./intrct_mat.npz')
# Uncommenting these makes it work
#M = M.tocsr()
#M = M[:100000,:10000].tocoo()
print('Shape: {}'.format(M.shape))
print('NNZ: {}'.format(M.nnz))
model = LightFM(loss='warp') # using 'logistic' here makes it work
model.fit(M, epochs=30, num_threads=1)
Hi Maciej,
I haven't used C before and I'm trying to understand how to read the _lightfm_fast_openmp.c file, since I'd like to see how you train the models with WARP, BPR and logistic loss. I'm interested in adapting the algorithms for multiple domains.
It's always good scientific practice to also give references to other freely available recommender systems and maybe a small matrix comparing their main features. This has several benefits:
I think the main alternatives are Mahout, PyFM and LensKit. All of them seem to provide only Collaborative Filtering. Interesting features would be: scalability/runtime performance, language bindings, algorithms/methods.
I know that this issue could be quite work-intensive, but it would be nice to have in order to convince people to use LightFM ;-)
I'm trying to fit a very basic instance of LightFM:
model = LightFM(loss='warp')
model = model.fit(df, user_features=user_features)
Here df.shape = (222113, 2269) and user_features.shape = (222113, 24). The model fits without problems, but when I subsequently call the auc_score method, I get the following error.
Traceback (most recent call last):
File "main.py", line 35, in <module>
model = engine.fit(train, test, user_features=user_feat)
File "/path/to/dir/lightfm-engine/engine.py", line 23, in fit
train_auc = auc_score(model, train, num_threads=self.NUM_THREADS).mean()
File "/Users/me/miniconda3/envs/py2/lib/python2.7/site-packages/lightfm/evaluation.py", line 118, in auc_score
num_threads=num_threads)
File "/Users/me/miniconda3/envs/py2/lib/python2.7/site-packages/lightfm/lightfm.py", line 649, in predict_rank
item_features)
File "/Users/me/miniconda3/envs/py2/lib/python2.7/site-packages/lightfm/lightfm.py", line 235, in _construct_feature_matrices
assert self.user_embeddings.shape[0] >= user_features.shape[1]
AssertionError
So I thought, maybe the shapes of the two sparse matrices are different but as it turns out:
model.user_embeddings.shape[0] = 24
user_features.shape[1] = 24
Not really sure what's wrong, because the boolean value of the assert statement should evaluate to True in this case.
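One reading of the traceback, stated as an assumption: `auc_score` is called without `user_features`, so the model may fall back to an identity feature matrix with one column per user (222113 here), which is more than the 24 embedding rows learned during fitting. The sketch below reproduces only that shape comparison with SciPy. `check_user_features` is a hypothetical helper, not part of LightFM.

```python
import scipy.sparse as sp


def check_user_features(n_embedding_rows, n_users, user_features=None):
    """Mimic the failing assertion's shape comparison."""
    if user_features is None:
        # Assumed fallback: one indicator feature per user.
        user_features = sp.identity(n_users, format="csr")
    return n_embedding_rows >= user_features.shape[1]


n_users, n_features = 222113, 24
features = sp.csr_matrix((n_users, n_features))  # empty stand-in matrix

without_features = check_user_features(n_features, n_users)
with_features = check_user_features(n_features, n_users, features)
print(without_features, with_features)
```

If this is indeed the cause, passing the same matrix to the evaluation call, for example `auc_score(model, train, user_features=user_feat)`, should satisfy the assertion.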
d:\Recommender systems\code>conda install lightfm
Fetching package metadata .........
Solving package specifications: .
PackageNotFoundError: Package not found: '' Package missing in current win-64 ch
annels:
You can search for packages on anaconda.org with
anaconda search -t conda lightfm
In the get_movielens_item_metadata() function (lines 114 to 123), the item ids are added to the same feature space as the genres, but a genre index may overlap with a movie_id.
Example: if a movie has movie_id 3 and also belongs to genre 3, there is no distinction between genre 3 and movie_id 3.
Is this intended?
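If the overlap is unintended, one common way to keep the two feature families apart is to offset genre indices by the number of items, so that columns 0 to n_items - 1 are per-item indicators and the remaining columns are genres. A minimal sketch, with a hypothetical helper name and toy values that are not taken from the LightFM source:

```python
def build_feature_columns(movie_id, genre_ids, n_items):
    """Return feature column indices for one movie without collisions."""
    # The movie's own indicator keeps its id as the column index.
    columns = [movie_id]
    # Genre columns start after all item columns.
    columns.extend(n_items + genre_id for genre_id in genre_ids)
    return columns


# Movie 3 that also belongs to genres 3 and 5, in a catalogue of 10 movies.
columns = build_feature_columns(3, [3, 5], n_items=10)
print(columns)
```

With the offset, movie 3 and genre 3 map to different columns, so their embeddings are learned separately.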
I trained a model lfm with logistic loss.
lfm = lightfm.LightFM(no_components=1, loss="logistic")
lfm.fit(sparse_positive, epochs=100)
lfm.predict(students, items)
Returns:
array([ 7.49537325, 8.20262432, 7.60994577, ..., 4.16664028,
5.1302681 , 3.18788409])
Shouldn't the output be bounded [0..1]?
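Assuming `predict` returns the raw, unbounded score (the value before the logistic link is applied), which is an assumption not confirmed in this thread, the scores can be mapped into (0, 1) with a sigmoid. The sketch below applies it to the values quoted above.

```python
import math


def sigmoid(score):
    """Numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


raw_scores = [7.49537325, 8.20262432, 7.60994577,
              4.16664028, 5.1302681, 3.18788409]
probabilities = [sigmoid(s) for s in raw_scores]
print(probabilities)
```

The sigmoid is monotonic, so the ranking of items is the same before and after the transformation.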
python27.lib(python27.dll) : fatal error LNK1112: module machine type 'x64'
conflicts with target machine type 'X86'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\Bi
n\link.exe' failed with exit status 1112
----------------------------------------
Command "c:\users\sanderoct27\anaconda2\python.exe -u -c "import setuptools, tok
enize;file='c:\users\sander1\appdata\local\temp\pip-build-d1jvqd\lig1\appdata\local\temp\pip-rqvifs-record\install-record.txt --si
htfm\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replac
e('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --recor
d c:\users\sander
ngle-version-externally-managed --compile" failed with error code 1 in c:\users
sander~1\appdata\local\temp\pip-build-d1jvqd\lightfm\
from lightfm.datasets import fetch_movielens
from lightfm import LightFM
#Fetch data and format it
data = fetch_movielens(min_rating=4.0)
#Print training and testing data
print(repr(data['train']))
print(repr(data['test']))
Running this Python code gave me the following error:
/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
warnings.warn('LightFM was compiled without OpenMP support. '
Traceback (most recent call last):
File "/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/_lightfm_fast.py", line 3, in <module>
from ._lightfm_fast_openmp import * # NOQA
ImportError: /home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/_lightfm_fast_openmp.cpython-35m-x86_64-linux-gnu.so: undefined symbol: GOMP_parallel
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "recommendation_system.py", line 2, in <module>
from lightfm.datasets import fetch_movielens
File "/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/__init__.py", line 1, in <module>
from .lightfm import LightFM
File "/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/lightfm.py", line 7, in <module>
from ._lightfm_fast import (CSRMatrix, FastLightFM,
File "/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/_lightfm_fast.py", line 12, in <module>
from ._lightfm_fast_no_openmp import * # NOQA
ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
I installed with pip as per instructions (under Ubuntu 16).
$ python setup.py test
running test
running egg_info
writing requirements to lightfm.egg-info/requires.txt
writing lightfm.egg-info/PKG-INFO
writing top-level names to lightfm.egg-info/top_level.txt
writing dependency_links to lightfm.egg-info/dependency_links.txt
reading manifest file 'lightfm.egg-info/SOURCES.txt'
writing manifest file 'lightfm.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-2.7/lightfm/_lightfm_fast_openmp.so -> lightfm
==================================================================================================================== test session starts ====================================================================================================================
platform linux2 -- Python 2.7.12, pytest-2.8.7, py-1.4.31, pluggy-0.3.1
collected 49 items
tests/test_api.py ...........
tests/test_datasets.py ..
tests/test_evaluation.py ...
tests/test_fast_functions.py .
tests/test_movielens.py ................................
================================================================================================================ 49 passed in 114.88 seco
I'm seeing one python job using 100% of the CPU when there should be 20 threads running.
If I type the following in Python:
data = fetch_movielens(min_rating=4.0)
I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/Aryan/anaconda/lib/python2.7/site-packages/lightfm/datasets/movielens.py", line 169, in fetch_movielens
item_metadata_raw, genres_raw) = _read_raw_data(zip_path)
File "/Users/Aryan/anaconda/lib/python2.7/site-packages/lightfm/datasets/movielens.py", line 20, in _read_raw_data
with zipfile.ZipFile(path) as datafile:
File "/Users/Aryan/anaconda/lib/python2.7/zipfile.py", line 770, in __init__
self._RealGetContents()
File "/Users/Aryan/anaconda/lib/python2.7/zipfile.py", line 811, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
I ran these imports beforehand:
from lightfm import LightFM
from lightfm.datasets import fetch_movielens
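A `BadZipfile` error at this point often means the cached download is incomplete or is not an archive at all (for example an HTML error page). That is an assumption about this case, not a confirmed diagnosis. The sketch below checks a file with the standard library and deletes it if it is not a valid zip, so that the next `fetch_movielens` call can download it again. `remove_if_corrupt` is a hypothetical helper, and the demo runs on a throwaway file rather than on the real cache.

```python
import os
import tempfile
import zipfile


def remove_if_corrupt(path):
    """Delete `path` if it exists but is not a valid zip archive."""
    if os.path.exists(path) and not zipfile.is_zipfile(path):
        os.remove(path)
        return True
    return False


# Demo on a temporary file that only pretends to be an archive.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "movielens.zip")
    with open(fake, "w") as handle:
        handle.write("<html>not a zip</html>")
    removed = remove_if_corrupt(fake)
    still_there = os.path.exists(fake)

print(removed, still_there)
```

The location of the cached archive depends on the installation; check the `data_home` argument of `fetch_movielens` to find it before deleting anything.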
When evaluating on implicit feedback (using mean average precision@k, for example, or even AUC), one needs to score all users against all items. That is done by looping over the interaction matrix row by row and invoking model.predict().
That becomes prohibitively slow when faced with a 300K x 20M matrix. (I estimated ~5 days on an 8-core server using a simple MF model with no content features.)
Cython might be able to cut this time by 30%. Sharding the user base and distributing it across different servers would reduce that time by, let's say, another factor of 10. We would still be looking at 9 hours.
What is the way around this? Increasing memory so that a dense interaction matrix of all ones can be passed to model.predict in one or two calls?
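For a pure matrix factorization model without side features, one option is to bypass per-row `predict` calls and score users in blocks with dense matrix products on the learned embeddings, keeping only the top k items per user. The sketch below uses random arrays as stand-ins for the embeddings and item biases. `top_k_items` and the chunk size are hypothetical choices, and the user bias is left out because it does not change the ranking within a user.

```python
import numpy as np


def top_k_items(user_emb, item_emb, item_bias, k=10, chunk_size=1024):
    """Return the indices of the k highest-scoring items for every user."""
    n_users = user_emb.shape[0]
    top = np.empty((n_users, k), dtype=np.int64)
    for start in range(0, n_users, chunk_size):
        stop = min(start + chunk_size, n_users)
        # Dense scores for one block of users against all items.
        scores = user_emb[start:stop] @ item_emb.T + item_bias
        # Unordered top k per row, then sort just those k columns.
        part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
        part_scores = np.take_along_axis(scores, part, axis=1)
        order = np.argsort(-part_scores, axis=1)
        top[start:stop] = np.take_along_axis(part, order, axis=1)
    return top


rng = np.random.RandomState(0)
U = rng.randn(50, 8)    # stand-in user embeddings
V = rng.randn(200, 8)   # stand-in item embeddings
b = rng.randn(200)      # stand-in item biases
ranked = top_k_items(U, V, b, k=5, chunk_size=16)
print(ranked.shape)
```

With tens of millions of items, a block of users times all items may not fit in memory, so the item axis would need to be chunked as well, merging the partial top-k lists per user.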