lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.

License: Apache License 2.0

Python 99.09% Makefile 0.62% Dockerfile 0.28%

learning-to-rank machine-learning matrix-factorization python recommender recommender-system

lightfm's Issues

About build interaction matrix

I'm reading the WARP code in test_movielens and have a question about the input: why are all values > 4.0 set to 1 and all values <= 4.0 set to -1?
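
For readers unfamiliar with the thresholding being asked about, it can be reproduced in a few lines. This is a minimal sketch with made-up ratings, assuming NumPy is available; it is not the code from test_movielens, and the variable names are illustrative only.

```python
import numpy as np

# Made-up explicit ratings on a 1-5 scale (illustrative only).
ratings = np.array([5.0, 4.0, 3.0, 4.5, 1.0], dtype=np.float32)

# Ratings strictly above 4.0 become positives (1); everything else,
# including exactly 4.0, becomes a negative (-1).
binary = np.where(ratings > 4.0, 1, -1).astype(np.int32)

print(binary.tolist())
```

The effect is to turn explicit ratings into signed implicit feedback, so that only clearly liked items count as positives.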

Cannot find plugin 'liblto_plugin.so'

Hi,

When trying to compile the software on Mac OS X (Yosemite) with GCC installed with brew I get the following error:

14:35 $ python setup.py install
...
building 'lightfm.lightfm_fast' extension
gcc-ranlib-4.9 -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/e/Anaconda/anaconda/envs/py3k/include -arch x86_64 -I/Users/e/Anaconda/anaconda/envs/py3k/include/python3.3m -c lightfm/lightfm_fast.c -o build/temp.macosx-10.5-x86_64-3.3/lightfm/lightfm_fast.o -fopenmp -march=native -ffast-math
gcc-ranlib-4.9: Cannot find plugin 'liblto_plugin.so'
error: command 'gcc-ranlib-4.9' failed with exit status 1

I cannot find any liblto_plugin.so or liblto_plugin.dylib on my system.

I tried brew install llvm and conda install llvmpy, but with no success :(

Any hint on how to fix that?

How to incorporate `user_features` in the movielens example?

Thank you for maintaining this project!

I'm just learning about recommendation systems. After reading the example in the docs, I couldn't find where to add user_features to the model.

Am I missing anything? E.g. for the MovieLens dataset, I'd like to add occupation and sex as user features.
Forgive me for being a newbie :)
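
As an illustration of what such a user feature matrix could look like, here is a minimal sketch, assuming NumPy and SciPy. The user records are invented (they are not the real MovieLens files), and the "field:value" naming is an arbitrary choice. The result is a sparse matrix of shape (number of users, number of features), which is the form the `user_features` argument of `fit` and `predict` expects.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical metadata for three users (not the real MovieLens files).
users = [
    {"occupation": "engineer", "sex": "F"},
    {"occupation": "student", "sex": "M"},
    {"occupation": "engineer", "sex": "M"},
]

# Assign one column per distinct "field:value" feature.
feature_index = {}
rows, cols = [], []
for user_id, meta in enumerate(users):
    for field, value in meta.items():
        col = feature_index.setdefault(f"{field}:{value}", len(feature_index))
        rows.append(user_id)
        cols.append(col)

user_features = sp.csr_matrix(
    (np.ones(len(rows), dtype=np.float32), (rows, cols)),
    shape=(len(users), len(feature_index)),
)

# With LightFM installed, the matrix would be passed to both fit and predict:
#   model.fit(interactions, user_features=user_features)
#   model.predict(user_ids, item_ids, user_features=user_features)
```

As far as the documentation describes, a matrix like this replaces the default per-user indicator features, so stacking an identity matrix next to it (for example with `sp.hstack`) is worth considering if per-user embeddings should be kept.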

Use of database to scale; and comparison vs predictionio

I watched your excellent presentation on hybrid recommender systems (i.e. lightfm), given recently in Amsterdam, on YouTube. I have a couple of questions:

You mention, when asked about a 'typical website with recommendations that needs to scale', that you use lightfm with Postgres and that nearest-neighbour search is done in C against Postgres. This does not sound very feasible for me to set up. Is that the only way to make this scale in production? What exactly is programmed in C?
A popular open source recommender is PredictionIO, which also seems to support 'hybrid recommenders'. Could you say what sets lightfm apart from PredictionIO?

Could not find symbol

When running this on the (almost) unmodified jupyter/all-spark-notebook docker image, importing lightfm in Python fails with the following error:

import lightfm
Traceback (most recent call last):
File "", line 1, in
File "/opt/conda/lib/python3.4/site-packages/lightfm/__init__.py", line 1, in
from .lightfm import LightFM
File "/opt/conda/lib/python3.4/site-packages/lightfm/lightfm.py", line 7, in
from .lightfm_fast import (CSRMatrix, FastLightFM,
ImportError: /opt/conda/lib/python3.4/site-packages/lightfm/lightfm_fast.cpython-34m.so:
undefined symbol: GOMP_parallel

Things I have tried:

Going to the file lightfm.py and moving the whole import onto one line, i.e. changing

from .lightfm_fast import (CSRMatrix, FastLightFM,
fit_logistic, predict_lightfm,
fit_warp, fit_bpr, fit_warp_kos)

to

from .lightfm_fast import (CSRMatrix, FastLightFM, fit_logistic, predict_lightfm, fit_warp, fit_bpr, fit_warp_kos)

Same error.

Changing ".lightfm" to "lightfm" to switch away from the relative import. Same error.
Checking gcc and kernel versions:
gcc 4.9.2
Ubuntu 14.04
Linux 00846c176840 3.13.0-67-generic #110-Ubuntu SMP Fri Oct 23 13:24:41 UTC 2015 x86_64
GNU/Linux

But I think if you just pull the docker image and do a pip install lightfm it should replicate the error precisely.

BPR and Logistic seem to be broken

It seems that BPR and Logistic loss functions are broken (at least on Windows), or maybe I'm doing something terribly wrong.
It seems that BPR and Logistic always return the same predictions for every user (not exactly the same scores, but if you sort them, you get the same ranked list).

Example:

I'm using one of the MovieLens notebooks for this example (https://github.com/lyst/lightfm/blob/master/examples/movielens/warp_loss.ipynb).

I also tested this with my own datasets, and got the same results.

User / item embeddings Nan with large training set

Apologies if this is user error, but I appear to be getting NaN embeddings from LightFM and I'm unsure what I could have done wrong. I followed the documentation, and have raised the issue on Stack Overflow.

Basically I have a large dataset where collaborative filtering is working fine, but where user / item embeddings are provided, the model produces NaN embeddings.

http://stackoverflow.com/questions/40967226/lightfm-user-item-producing-nan-embeddings

Docker Build Fails Tests

Following the issue #101, I tried running the test as per the Docs:

docker-compose run lightfm py.test -x tests/

And then I ran it from the Docker shell (see below). In both cases, the test fails to find lightfm_fast

root@002ca3eaa9c1:/home/lightfm# py.test -x tests/
================================================== test session starts ==================================================
platform linux2 -- Python 2.7.9, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /home/lightfm, inifile: 
collecting 0 items / 1 errors
======================================================== ERRORS =========================================================
__________________________________________ ERROR collecting tests/test_api.py ___________________________________________
tests/test_api.py:7: in <module>
from lightfm import LightFM
lightfm/__init__.py:1: in <module>
from .lightfm import LightFM
lightfm/lightfm.py:9: in <module>
from .lightfm_fast import (CSRMatrix, FastLightFM,
E   ImportError: No module named lightfm_fast
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================ 1 error in 0.30 seconds ================================================

max_sampled vs k vs n

max_sampled is the maximum number of negative examples to be chosen when using WARP (how about BPR?).
Which parameter controls the actual number of negative samples? Is it max_sampled, on a best-effort basis?

The reason I'm asking is that, according to the documentation, k and n do not apply to WARP.

Thank you,

Enable multithreading without docker

I installed LightFM on Ubuntu via pip, but training does not seem to use multithreading. How do I enable multithreading without Docker?

Installation failed on Linux Mint

I ran pip install lightfm. The error is:
running build_ext

building 'lightfm._lightfm_fast_openmp' extension

creating build/temp.linux-x86_64-2.7

creating build/temp.linux-x86_64-2.7/lightfm

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c lightfm/_lightfm_fast_openmp.c -o build/temp.linux-x86_64-2.7/lightfm/_lightfm_fast_openmp.o -ffast-math -march=native -fopenmp

lightfm/_lightfm_fast_openmp.c:15:20: fatal error: Python.h: No such file or directory

compilation terminated.

error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Input matrices need to be int32

Arrays are often created with dtypes other than those required by the lightfm C code. This can result in errors like:

File "/home/jattenberg/anaconda/lib/python2.7/site-packages/lightfm/lightfm.py", line 194, in fit
verbose=verbose)
File "/home/jattenberg/anaconda/lib/python2.7/site-packages/lightfm/lightfm.py", line 254, in fit_partial
self.loss)
File "/home/jattenberg/anaconda/lib/python2.7/site-packages/lightfm/lightfm.py", line 289, in _run_epoch
CSRMatrix(self._get_positives_lookup_matrix(interactions)),
File "lightfm/lightfm_fast.pyx", line 91, in lightfm.lightfm_fast.CSRMatrix.__init__ (lightfm/lightfm_fast.c:1966)
ValueError: Buffer dtype mismatch, expected 'int' but got 'long long'

Input matrices could be converted to int32 automatically.
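
A caller-side version of that conversion can be sketched as follows, assuming NumPy and SciPy. `to_int32_csr` is a hypothetical helper, not part of lightfm, and whether it avoids the error above in a given lightfm version is untested here.

```python
import numpy as np
import scipy.sparse as sp

def to_int32_csr(matrix):
    """Return a CSR matrix whose index arrays are int32 and data float32."""
    csr = sp.csr_matrix(matrix, dtype=np.float32)
    # Guard against silently truncating indices that do not fit in int32.
    limit = np.iinfo(np.int32).max
    if max(csr.shape) > limit or csr.nnz > limit:
        raise ValueError("matrix is too large for int32 indices")
    csr.indices = csr.indices.astype(np.int32)
    csr.indptr = csr.indptr.astype(np.int32)
    return csr

# A matrix built from int64 coordinates.
rows = np.array([0, 1, 2], dtype=np.int64)
cols = np.array([1, 0, 2], dtype=np.int64)
interactions = sp.coo_matrix((np.ones(3), (rows, cols)), shape=(3, 3))

fixed = to_int32_csr(interactions)
```

The size check is there because a plain `astype(np.int32)` would wrap around without warning on a matrix with more than about two billion rows, columns, or stored entries.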

Installation for OS X El Capitan

It doesn't seem to work for OS X El Capitan at the moment:

$ sudo python setup.py install
Password:
running install
Checking .pth file support in /Library/Python/2.7/site-packages/
/usr/bin/python -E -c pass
TEST PASSED: /Library/Python/2.7/site-packages/ appears to support .pth files
running bdist_egg
running egg_info
writing pbr to lightfm.egg-info/pbr.json
writing requirements to lightfm.egg-info/requires.txt
writing lightfm.egg-info/PKG-INFO
writing top-level names to lightfm.egg-info/top_level.txt
writing dependency_links to lightfm.egg-info/dependency_links.txt
reading manifest file 'lightfm.egg-info/SOURCES.txt'
writing manifest file 'lightfm.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.11-intel/egg
running install_lib
running build_py
running build_ext
building 'lightfm.lightfm_fast' extension
gcc-5 -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c lightfm/lightfm_fast.c -o build/temp.macosx-10.11-intel-2.7/lightfm/lightfm_fast.o -fopenmp -ffast-math -march=native
gcc-5: error: unrecognized command line option '-Wshorten-64-to-32'
error: command 'gcc-5' failed with exit status 1

Will update if I find a way to make it work 😃

Item-based recommendation

Hey everyone,
thanks a lot for maintaining this project!
Do you also have some examples of how to recommend items without having data about users and ratings?
I'd like to recommend similar eBooks based on their content.
Best Regards

Value ranges for Predict()

predict(user_ids, item_ids, item_features=None, user_features=None, num_threads=1)
gives me prediction values. I am trying to restrict the predictions to a range (e.g. 0 to 1),
but I cannot find anywhere to set that range.
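
If a bounded range is needed, one option is to rescale the scores after prediction. The sketch below assumes NumPy and uses a made-up array in place of the output of `predict`; both helper functions are illustrative and not part of lightfm.

```python
import numpy as np

def min_max_scale(scores):
    """Linearly map scores onto [0, 1]; constant input maps to all zeros."""
    scores = np.asarray(scores, dtype=np.float64)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def sigmoid(scores):
    """Squash scores into (0, 1) independently of the other scores."""
    return 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=np.float64)))

# Stand-in for the array returned by model.predict(...).
raw = np.array([-2.0, 0.0, 3.0])
scaled = min_max_scale(raw)
```

Both transformations preserve the ordering of the items. Min-max scaling depends on which items were scored together, while the sigmoid does not, so the choice depends on whether values need to be comparable across calls.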

0 Passed, 5 Failed using Python 3.5

Unable to test environment using Python 3.5

The error log is given by:

==================================================================== ERRORS ====================================================================
______________________________________________________ ERROR collecting tests/test_api.py ______________________________________________________
tests/test_api.py:7: in <module>
    from lightfm import LightFM
lightfm/__init__.py:1: in <module>
    from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
    from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
    from ._lightfm_fast_no_openmp import *  # NOQA
E   ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
  warnings.warn('LightFM was compiled without OpenMP support. '
___________________________________________________ ERROR collecting tests/test_datasets.py ____________________________________________________
tests/test_datasets.py:7: in <module>
    from lightfm.datasets import fetch_movielens, fetch_stackexchange
lightfm/__init__.py:1: in <module>
    from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
    from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
    from ._lightfm_fast_no_openmp import *  # NOQA
E   ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
  warnings.warn('LightFM was compiled without OpenMP support. '
__________________________________________________ ERROR collecting tests/test_evaluation.py ___________________________________________________
tests/test_evaluation.py:7: in <module>
    from lightfm import LightFM, evaluation
lightfm/__init__.py:1: in <module>
    from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
    from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
    from ._lightfm_fast_no_openmp import *  # NOQA
E   ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
  warnings.warn('LightFM was compiled without OpenMP support. '
________________________________________________ ERROR collecting tests/test_fast_functions.py _________________________________________________
tests/test_fast_functions.py:6: in <module>
    from lightfm import _lightfm_fast
lightfm/__init__.py:1: in <module>
    from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
    from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
    from ._lightfm_fast_no_openmp import *  # NOQA
E   ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
  warnings.warn('LightFM was compiled without OpenMP support. '
___________________________________________________ ERROR collecting tests/test_movielens.py ___________________________________________________
tests/test_movielens.py:12: in <module>
    from lightfm import LightFM
lightfm/__init__.py:1: in <module>
    from .lightfm import LightFM
lightfm/lightfm.py:7: in <module>
    from ._lightfm_fast import (CSRMatrix, FastLightFM,
lightfm/_lightfm_fast.py:12: in <module>
    from ._lightfm_fast_no_openmp import *  # NOQA
E   ImportError: No module named 'lightfm._lightfm_fast_no_openmp'
--------------------------------------------------------------- Captured stderr ----------------------------------------------------------------
/home/salman/Documents/Codes/TensorFlow/lightfm/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
  warnings.warn('LightFM was compiled without OpenMP support. '
=========================================================== 5 error in 0.71 seconds ============================================================

Installation as suggested in the readme?

I tried to modify some of the Cython parts of the code, and had problems installing it afterwards.
I then tried re-installing the original version of lightfm (freshly cloned from the repo) with the same procedure. The problem occurs when I run:
python setup.py cythonize

I get the error message:

running cythonize
Traceback (most recent call last):
File "setup.py", line 152, in
ext_modules=define_extensions(use_openmp)
File "/home/paolo/anaconda3/lib/python3.4/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/paolo/anaconda3/lib/python3.4/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/home/paolo/anaconda3/lib/python3.4/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "setup.py", line 81, in run
self.generate_pyx()
File "setup.py", line 74, in generate_pyx
fl.write(template.format(**template_params))
AttributeError: 'bytes' object has no attribute 'format'

I suspect there is a problem with the version of Python I am using (from a quick search it appears that bytes only gained the format attribute in Python 3.5). Is Python 3.5 really necessary?
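
The error itself can be reproduced without lightfm. Note that `bytes` has no `.format` method in Python 3; Python 3.5 added only `%`-style formatting for bytes. The sketch below uses a made-up template string (the real template in setup.py is not shown in this issue) and illustrates one way around the problem, which is to decode to `str` before formatting.

```python
# A bytes template, as obtained from a file opened in binary mode.
# The content is hypothetical; it is not the template used by setup.py.
template = b"threads = {value}\n"

# bytes objects have no .format method on Python 3.
assert not hasattr(template, "format")

# Decoding to str first makes .format available.
rendered = template.decode("utf-8").format(value=1)
```

Opening the template file in text mode would have the same effect as the explicit `decode` call.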

[OSX] GCC available via macports, yet setup does not find it

Because the support text asked for feedback on installation on OS X, here is my current installation issue:

  File "/private/tmp/pip_build_root/lightfm/setup.py", line 35, in set_gcc
    raise Exception('No GCC available. Install gcc from Homebrew '
Exception: No GCC available. Install gcc from Homebrew using brew install gcc.

However, gcc is installed via MacPorts:
[terminal]: port installed | grep gcc
gcc48 @4.8.5_0 (active)
gcc49 @4.9.3_0 (active)

MacPorts typically installs under /opt/local/. [1]
If you want to support this, I guess you could simply add to line 28 in setup.py?

[1] https://guide.macports.org/chunked/installing.macports.html
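
A compiler lookup that covers both package managers could look roughly like this. It is a sketch only: `find_gcc` is a hypothetical helper, not the `set_gcc` function from setup.py, and the directory list and glob patterns are assumptions based on the usual naming (Homebrew binaries such as gcc-4.9, MacPorts binaries such as gcc-mp-4.9).

```python
import glob
import os

# Directories to search; /opt/local/bin is the MacPorts default prefix,
# /usr/local/bin is a common Homebrew location. Both are assumptions.
DEFAULT_DIRS = ("/usr/local/bin", "/opt/local/bin")
# Homebrew names binaries like gcc-4.9; MacPorts like gcc-mp-4.9.
PATTERNS = ("gcc-[0-9]*", "gcc-mp-[0-9]*")

def find_gcc(search_dirs=DEFAULT_DIRS):
    """Return the sorted list of versioned gcc binaries found, possibly empty."""
    found = []
    for directory in search_dirs:
        for pattern in PATTERNS:
            found.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(found)
```

Returning an empty list instead of raising leaves it to the caller to decide whether to fall back to a build without OpenMP.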

predict_rank is slow for all items of a single user

I have a model with (30k, 1m) users/items.
When I use predict_rank to predict all the ranks for a single user, it is slow:

interactions = csr_matrix(
          ([1] * item_size, ([user_id] * item_size, item_ids)),
          shape=(user_size, item_size),
          dtype=np.float32)
# this takes very long
model.predict_rank(interactions)
# very fast
model.predict(user_id, item_ids)

Aren't the ranks just the order of the recommendation scores?
I guess predict_rank computes predictions for every interaction?

thanks,
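
For a single user, ranks can be derived directly from the scores that `predict` returns. The sketch below assumes NumPy and, as the documentation describes for predict_rank, that rank 0 is the highest-scored item; the score array is made up, and `ranks_from_scores` is an illustrative helper, not a lightfm function.

```python
import numpy as np

def ranks_from_scores(scores):
    """Rank 0 goes to the highest score, rank 1 to the next, and so on."""
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")  # item positions, best first
    ranks = np.empty(len(scores), dtype=np.int64)
    ranks[order] = np.arange(len(scores))
    return ranks

# Stand-in for model.predict(user_id, item_ids) on four items.
scores = np.array([0.1, 2.5, -1.0, 0.7])
ranks = ranks_from_scores(scores)
```

This costs one sort over the items for that user, which is why it can be much cheaper than a call that processes the full interaction matrix.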

different users in training and test set

Hello,
the question here is:
How to deal with different users in training and test set?
In the example on the hybrid model for question recommendations to users
(https://github.com/lyst/lightfm/blob/master/examples/stackexchange/hybrid_crossvalidated.ipynb)
the model recommends the highest ranked question (item) to the user not answered by this user yet.
What about this different setting:
Each user only interacts with one item.
Training is done with these interactions
At test time, there is a new user.
The aim is to recommend the highest ranked item for this unknown user.
This means that the aim of this setting is to deal with different users in training and test set but always the same set of items.
Is this possible? How can this be achieved?
It would be appreciated very much if you could give an example or point to an existing example where this problem is tackled.
Thank you!

Scikit-Learn compatibility

LightFM borrows from the Scikit-Learn interface but is not fully compatible with BaseEstimator. Compatibility would be beneficial, since it would allow Scikit-Learn functionality such as grid_search.RandomizedSearchCV to be used for optimizing the hyper-parameters of LightFM models.
Added PR #107 to resolve this issue.

Docker Build Fails to Install SciPy

docker-compose build lightfm

Produces this on my Mac OS X:

File "/usr/lib/python2.7/dist-packages/numpy/distutils/misc_util.py", line 966, in add_subpackage caller_level = 2)
File "/usr/lib/python2.7/dist-packages/numpy/distutils/misc_util.py", line 935, in get_subpackage caller_level = caller_level + 1)
File "/usr/lib/python2.7/dist-packages/numpy/distutils/misc_util.py", line 872, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "scipy/linalg/setup.py", line 20, in configuration
raise NotFoundError('no lapack/blas resources found')
numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

----------------------------------------
Can't roll back scipy; was not uninstalled
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-XnyZzK/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-cBi7ka-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-XnyZzK/scipy
Storing debug log for failure in /root/.pip/pip.log
ERROR: Service 'lightfm' failed to build: The command '/bin/sh -c pip install .' returned a non-zero code: 1

I'm new to Docker, but I can see that it is SciPy complaining about missing lapack/blas.

error when building 'lightfm.lightfm_fast' extension on OSX

Looks like this is an issue with the gcc compiler on 64-bit architectures, in particular the -Wshorten-64-to-32 option.

Here's the error:

gcc-4.9: error: unrecognized command line option '-Wshorten-64-to-32'
error: command 'gcc-4.9' failed with exit status 1

My machine has a relatively new version of gcc, the one available from brew:

brew upgrade gcc
Error: gcc 4.9.2_1 already installed

Any ideas?

Assert error in BPR without user/item features

I'm using precision_at_k from the LightFM library with the code below.

    lightfm_model = LightFM(loss="bpr", no_components=25, random_state=0)

    lightfm_model.fit(interactions=coo_matrix((np.ones(len(training)), (training["user"], training["product"]))), num_threads=20)

    print(lightfm_model.user_embeddings.shape)
    print(lightfm_model.item_embeddings.shape)

    print (precision_at_k(model=lightfm_model,
                              train_interactions=coo_matrix((np.ones(len(training)), (training["user"], training["product"]))),
                              test_interactions=coo_matrix((np.ones(len(test)), (test["user"], test["product"]))),
                              k=5,
                              user_features=None,
                              item_features=None) )

The size of the user and item embeddings is correct (25 for both), but I'm getting:

assert self.user_embeddings.shape[0] >= user_features.shape[1]

AssertionError

Notice that I'm not using user/item features.
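
One possible cause, which the issue does not confirm, is that the training and test matrices end up with different shapes: without an explicit `shape` argument, `coo_matrix` sizes each matrix from its own largest index. The sketch below uses invented ids to show the effect and one way to keep the two matrices aligned, assuming NumPy and SciPy.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical ids: the test split mentions a user (id 4) and a product
# (id 6) that never occur in the training split.
train_users, train_items = np.array([0, 1, 2]), np.array([0, 3, 5])
test_users, test_items = np.array([1, 4]), np.array([2, 6])

# Without shape=, each matrix is sized from its own largest index.
train_inferred = coo_matrix((np.ones(3), (train_users, train_items)))
test_inferred = coo_matrix((np.ones(2), (test_users, test_items)))

# Passing one shared shape keeps both matrices aligned.
n_users = max(train_users.max(), test_users.max()) + 1
n_items = max(train_items.max(), test_items.max()) + 1
shape = (int(n_users), int(n_items))
train = coo_matrix((np.ones(3), (train_users, train_items)), shape=shape)
test = coo_matrix((np.ones(2), (test_users, test_items)), shape=shape)
```

If the model was fitted on the smaller inferred matrix, it has no embeddings for the extra users or items in the test matrix, which would be consistent with a shape assertion failing at evaluation time.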

Multiclass logloss

Very cool library!

Is it the case that LightFM only supports binary classification? Can I use it for multinomial classification?

Thanks!

1 Failed, 49 Passed on running python2 setup.py test

Kindly tell me how to fix the one failing test.

The log is:

=================================================================== FAILURES ===================================================================
______________________________________________________ test_basic_fetching_stackexchange _______________________________________________________

    def test_basic_fetching_stackexchange():
    
        test_fractions = (0.2, 0.5, 0.6)
    
        for test_fraction in test_fractions:
            data = fetch_stackexchange('crossvalidated',
                                       min_training_interactions=0,
                                       test_set_fraction=test_fraction)
    
            train = data['train']
            test = data['test']
    
            assert isinstance(train, sp.coo_matrix)
            assert isinstance(test, sp.coo_matrix)
    
            assert train.shape == test.shape
    
            frac = float(test.getnnz()) / (train.getnnz() + test.getnnz())
            assert abs(frac - test_fraction) < 0.01
    
        for dataset in ('crossvalidated', 'stackoverflow'):
    
            data = fetch_stackexchange(dataset,
                                       min_training_interactions=0,
>                                      indicator_features=True, tag_features=False)

tests/test_datasets.py:58: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
lightfm/datasets/stackexchange.py:88: in fetch_stackexchange
    data = np.load(path)
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py:399: in load
    pickle_kwargs=pickle_kwargs)
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py:162: in __init__
    _zip = zipfile_factory(fid)
/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py:92: in zipfile_factory
    return zipfile.ZipFile(*args, **kwargs)
/usr/lib/python2.7/zipfile.py:770: in __init__
    self._RealGetContents()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <zipfile.ZipFile object at 0x7f8214b07c90>

    def _RealGetContents(self):
        """Read in the table of contents for the ZIP file."""
        fp = self.fp
        try:
            endrec = _EndRecData(fp)
        except IOError:
            raise BadZipfile("File is not a zip file")
        if not endrec:
>           raise BadZipfile, "File is not a zip file"
E           BadZipfile: File is not a zip file

/usr/lib/python2.7/zipfile.py:811: BadZipfile
===================================================== 1 failed, 49 passed in 68.95 seconds =====================================================
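
A BadZipfile raised from np.load suggests that the cached dataset file is not a valid archive, for example because a download was interrupted; the log does not confirm this. Under that assumption, a sketch of a cleanup step using only the standard library follows. `remove_if_corrupt` is a hypothetical helper, and the location of the cached file is left to the caller because the issue does not state it.

```python
import os
import zipfile

def remove_if_corrupt(path):
    """Delete path when it exists but is not a valid zip archive.

    Returns True if a file was removed, False otherwise.
    """
    if os.path.exists(path) and not zipfile.is_zipfile(path):
        os.remove(path)
        return True
    return False
```

After the broken file is removed, calling the fetch function again would be expected to trigger a fresh download.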

rand_r missing stdlib.h on Windows

Hi, I was wondering if we could use rand_s when compiling under Windows, because compiling as-is does not work. Compiling with rand_r replaced by rand_s does work.

Extend LightFM with ANN ordering for recommendations

Consider the following:

Practically all deployments will implement not just the LightFM model but also an ANN (approximate nearest neighbour) ordering to get the 'top x results'.
Because LightFM scores with a dot product, not all ANN algorithms are suitable or easy to apply. Questions to settle include: is the ANN ordering method applicable at all, should biases be discarded, and should item and user vectors be normalised?
The ANN benchmark (https://github.com/erikbern/ann-benchmarks) has a clear outperformer, hnsw (part of nmslib), which supports (approximate) inner product search and thus suits LightFM well.

Could you extend LightFM to have an ANN ordering (preferably hnsw)? Alternatively, could you give more guidance and details on how to go about getting recommendations out of LightFM? That would seriously benefit most users of this great algorithm you made.
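
On the question of biases, one common trick is to fold the item bias into the vectors so that a plain inner product search reproduces the full score. The sketch below assumes NumPy and a score of the form embedding dot product plus user bias plus item bias; the arrays are random stand-ins rather than a trained model, and the search is exact, so an ANN index such as hnsw would replace the final `argsort` in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 6, 4

# Random stand-ins for a trained model's item embeddings and biases.
item_embeddings = rng.normal(size=(n_items, dim))
item_biases = rng.normal(size=n_items)
user_embedding = rng.normal(size=dim)

# Reference scores: dot product plus item bias. A per-user bias would
# shift every item equally, so it cannot change the ranking and is dropped.
reference = item_embeddings @ user_embedding + item_biases

# Fold the bias into the vectors: items get the bias as an extra
# coordinate, the user gets a constant 1 in the same position.
items_aug = np.hstack([item_embeddings, item_biases[:, None]])
user_aug = np.append(user_embedding, 1.0)
augmented = items_aug @ user_aug

top_k = np.argsort(-augmented)[:3]
```

Normalising the vectors would turn the inner product into a cosine similarity and change the ranking, so it only makes sense when that different ordering is actually wanted.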

Only binary user/item features supported?

I gather from the paper "Metadata Embeddings for User and Item Cold-start Recommendations" that lightfm only supports binary user/item features. Is that true, and would it be possible to support float features?
Thanks,
jianyi

Error when building 'lightfm._lightfm_fast_no_openmp' extension

Hi, I am installing lightfm on my laptop. Python environment: Windows 10, Python 3.4 64-bit.
But I got this error:

E:\Download IDM\Compressed\lightfm-1.9.1\lightfm-1.9.1>python setup.py install
Compiling without OpenMP support.
running install
running bdist_egg
running egg_info
writing top-level names to lightfm.egg-info\top_level.txt
writing lightfm.egg-info\PKG-INFO
writing dependency_links to lightfm.egg-info\dependency_links.txt
writing requirements to lightfm.egg-info\requires.txt
reading manifest file 'lightfm.egg-info\SOURCES.txt'
writing manifest file 'lightfm.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
creating build
creating build\lib.win-amd64-3.4
creating build\lib.win-amd64-3.4\lightfm
copying lightfm\evaluation.py -> build\lib.win-amd64-3.4\lightfm
copying lightfm\lightfm.py -> build\lib.win-amd64-3.4\lightfm
copying lightfm\_lightfm_fast.py -> build\lib.win-amd64-3.4\lightfm
copying lightfm\__init__.py -> build\lib.win-amd64-3.4\lightfm
creating build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\datasets\movielens.py -> build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\datasets\stackexchange.py -> build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\datasets\_common.py -> build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\datasets\__init__.py -> build\lib.win-amd64-3.4\lightfm\datasets
copying lightfm\_lightfm_fast_no_openmp.c -> build\lib.win-amd64-3.4\lightfm
copying lightfm\_lightfm_fast_openmp.c -> build\lib.win-amd64-3.4\lightfm
running build_ext
building 'lightfm._lightfm_fast_no_openmp' extension
Traceback (most recent call last):
  File "setup.py", line 154, in <module>
    ext_modules=define_extensions(use_openmp)
  File "C:\Python34\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "C:\Python34\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Python34\lib\site-packages\setuptools\command\install.py", line 67, in run
    self.do_egg_install()
  File "C:\Python34\lib\site-packages\setuptools\command\install.py", line 109, in do_egg_install
    self.run_command('bdist_egg')
  File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Python34\lib\site-packages\setuptools\command\bdist_egg.py", line 161, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "C:\Python34\lib\site-packages\setuptools\command\bdist_egg.py", line 147, in call_command
    self.run_command(cmdname)
  File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Python34\lib\site-packages\setuptools\command\install_lib.py", line 10, in run
    self.build()
  File "C:\Python34\lib\distutils\command\install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "C:\Python34\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "C:\Python34\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Python34\lib\site-packages\setuptools\command\build_ext.py", line 66, in run
    _build_ext.run(self)
  File "C:\Python34\lib\distutils\command\build_ext.py", line 339, in run
    self.build_extensions()
  File "C:\Python34\lib\distutils\command\build_ext.py", line 448, in build_extensions
    self.build_extension(ext)
  File "C:\Python34\lib\site-packages\setuptools\command\build_ext.py", line 178, in build_extension
    _build_ext.build_extension(self, ext)
  File "C:\Python34\lib\distutils\command\build_ext.py", line 503, in build_extension
    depends=ext.depends)
  File "C:\Python34\lib\distutils\msvc9compiler.py", line 460, in compile
    self.initialize()
  File "C:\Python34\lib\distutils\msvc9compiler.py", line 371, in initialize
    vc_env = query_vcvarsall(VERSION, plat_spec)
  File "C:\Python34\lib\site-packages\setuptools\msvc9_support.py", line 52, in query_vcvarsall
    return unpatched['query_vcvarsall'](version, *args, **kwargs)
  File "C:\Python34\lib\distutils\msvc9compiler.py", line 287, in query_vcvarsall
    raise ValueError(str(list(result.keys())))
ValueError: ['path']

How can I fix this error? I tried the suggestion in #29,
but I still get the error.

New lightFM usage example

Hi,

I have written a quick ipython notebook example where I used lightFM in the Kaggle RECRUIT Ponpare competition.

The goal of the competition was to predict user purchase on a specific week given a year of training data (log of purchase and visit).

This fits naturally into the lightFM hybrid recommendation paradigm: implicit feedback data with user and item metadata.

If that suits you, I can open a pull request to add it to your examples section.

Cheers

Install Problem

When I install it, I get the problem shown below. Could you please help? Thank you!

Crossvalidated example under Python 3.5

When running the crossvalidated example notebook under Python 3.5, executing the following raises an error:

import data
(interactions, question_features,
user_features, question_vectorizer,
user_vectorizer) = data.read_data() # This will download the data if not present

The error is:

AttributeError                            Traceback (most recent call last)
<ipython-input-1-0c850630fc9e> in <module>()
      3 (interactions, question_features,
      4  user_features, question_vectorizer,
----> 5  user_vectorizer) = data.read_data() # This will download the data if not present

/home/fwilhelm/Sources/lightfm/examples/crossvalidated/data.py in read_data()
    201 
    202     user_about = {}
--> 203     for (user_id, about) in _read_raw_user_data():
    204 
    205         if user_id == -1:

/home/fwilhelm/Sources/lightfm/examples/crossvalidated/data.py in _read_raw_user_data()
    148                 about_me = datum.get('AboutMe', '')
    149 
--> 150                 yield int(user_id), _process_about(about_me)
    151 
    152             except etree.XMLSyntaxError:

/home/fwilhelm/Sources/lightfm/examples/crossvalidated/data.py in _process_about(about)
    106 def _process_about(about):
    107 
--> 108     clean_about = (strip_tags(about)
    109                    .replace('\n', ' ')
    110                    .lower())

/home/fwilhelm/Sources/lightfm/examples/crossvalidated/data.py in strip_tags(html)
     37 def strip_tags(html):
     38     s = MLStripper()
---> 39     s.feed(html)
     40     return s.get_data()
     41 

/home/fwilhelm/.miniconda3/lib/python3.5/html/parser.py in feed(self, data)
    109         """
    110         self.rawdata = self.rawdata + data
--> 111         self.goahead(0)
    112 
    113     def close(self):

/home/fwilhelm/.miniconda3/lib/python3.5/html/parser.py in goahead(self, end)
    137         n = len(rawdata)
    138         while i < n:
--> 139             if self.convert_charrefs and not self.cdata_elem:
    140                 j = rawdata.find('<', i)
    141                 if j < 0:

AttributeError: 'MLStripper' object has no attribute 'convert_charrefs'

Would it be better to run lightfm under Python 2.7?
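
The traceback ends in HTMLParser.goahead reading self.convert_charrefs, an attribute that Python 3's HTMLParser sets in its own __init__. A likely cause, assuming MLStripper follows the common recipe that overrides __init__ and only calls self.reset(), is that the base initializer never runs. The class body below is a guess at that recipe, not the actual data.py code; it is a minimal sketch of a version that works on Python 3:

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collects the text content of an HTML fragment (assumed recipe)."""

    def __init__(self):
        # Calling the base initializer is what sets convert_charrefs on Python 3.
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, data):
        self.fed.append(data)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text still buffered by the parser
    return s.get_data()
```

With this change the same code path should no longer depend on the Python version for this particular attribute.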

recommend users for certain item

Hi, sorry for the stupid question:

Can this model predict the top users for an item by design?
The predict and predict_rank methods seem to be designed per user, not per item.
If I do:

prediction = model.predict(all_user_ids, np.repeat(item_id, user_size))
# then sort the prediction

Are the top results reliable?

thank you
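
The approach in the question can be written as a small helper. This is a sketch only: top_users_for_item is a hypothetical name, and predict stands for any callable with the (user_ids, item_ids) signature used above, such as model.predict. Whether scores are comparable across users is exactly the open question and is not settled by this code.

```python
import numpy as np


def top_users_for_item(predict, item_id, n_users, k=10):
    """Score every user against one item and return the k highest scoring users."""
    user_ids = np.arange(n_users, dtype=np.int32)
    item_ids = np.full(n_users, item_id, dtype=np.int32)
    scores = np.asarray(predict(user_ids, item_ids))
    top = np.argsort(-scores)[:k]  # indices of the largest scores first
    return user_ids[top], scores[top]
```

A call would look like top_users_for_item(model.predict, item_id, user_size, k=20).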

Save model to disk

I checked the documentation and couldn't find any info on this. Is it possible to serialize a model to disk? Would be useful when training takes multiple hours.
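
One common option for Python objects is the standard pickle module. The sketch below assumes the fitted model object is picklable, which this thread does not confirm; save_model and load_model are hypothetical helper names, and a plain dictionary stands in for the model.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Write any picklable object to disk."""
    with open(path, 'wb') as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Read back an object written by save_model."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Round trip with a stand-in object; a fitted model would take its place.
stand_in = {'no_components': 10, 'loss': 'warp'}
path = os.path.join(tempfile.mkdtemp(), 'model.pickle')
save_model(stand_in, path)
restored = load_model(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.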

Possible values of `item_alpha`

Hi!

I'm seeing strange behavior. If I train (fit) a model with item_features and an L2 penalty on the item features (item_alpha) equal to 0.05, and then obtain all the item predictions for a user, all those predictions have the same value.
Also, if I obtain all the item predictions for another user (with the same model), again all the predictions have the same value, but one that differs from the previous user's.

Also, the auc_score of that model is 1.0. 😔
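
One hypothesis that would match this symptom, not verified against the reporter's data, is that the penalty is strong enough to shrink every item parameter to zero. Assuming the usual factorization score (embedding dot product plus user and item biases), each user's score then reduces to that user's bias. A small NumPy illustration with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 3, 5, 4

user_emb = rng.normal(size=(n_users, dim))
user_bias = rng.normal(size=n_users)

# Assumption: regularisation has driven all item parameters to zero.
item_emb = np.zeros((n_items, dim))
item_bias = np.zeros(n_items)

# score[u, i] = user_emb[u] . item_emb[i] + user_bias[u] + item_bias[i]
scores = user_emb @ item_emb.T + user_bias[:, None] + item_bias[None, :]
```

Every row of scores is constant, and rows differ from one user to the next, which is the pattern described above. If this is the cause, a much smaller item_alpha would be the first thing to try.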

Movielens: getnnz() got an unexpected keyword argument 'axis'

Hello,
I just started playing around with lightFM, so this can be something that I just missed while trying the Movielens example.

Here is the code I wrote, following the tutorial available at this link:
import numpy as np
from lightfm.datasets import fetch_movielens # movielens dataset
from lightfm.evaluation import * # precision_at_k
from lightfm import *

movielens = fetch_movielens()

train = movielens['train']
test = movielens['test']

model = LightFM(learning_rate=0.05, loss='bpr')

model.fit(train, epochs=10)

train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10).mean()

train_auc = auc_score(model, train).mean()
test_auc = auc_score(model, test).mean()

print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))
print('AUC: train %.2f, test %.2f.' % (train_auc, test_auc))

I'm getting this error as output:

Traceback (most recent call last):
File "main.py", line 17, in <module>
train_precision = precision_at_k(model, train, k=10).mean()
File "/Library/Python/2.7/site-packages/lightfm/evaluation.py", line 70, in precision_at_k
precision = precision[test_interactions.getnnz(axis=1) > 0]
TypeError: getnnz() got an unexpected keyword argument 'axis'

I suspect it's a silly mistake in my code, but since I couldn't figure it out, I'm opening this issue.

Thanks!
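
The TypeError comes from the sparse matrix method itself, so one possible cause (an assumption, since the SciPy version is not given) is an older SciPy whose getnnz does not accept an axis argument. Upgrading SciPy would be the direct fix in that case. For reference, the same per-row count can be obtained from the CSR index pointer; nnz_per_row is a hypothetical helper name:

```python
import numpy as np
import scipy.sparse as sp


def nnz_per_row(matrix):
    """Number of stored entries in each row, without calling getnnz(axis=1)."""
    csr = sp.csr_matrix(matrix)
    # indptr[i + 1] - indptr[i] is the number of stored entries in row i.
    return np.diff(csr.indptr)
```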

Add per-interaction weights to simulate time decay effects

Quite often more recent interactions of users with items carry more information than long past interactions. For instance the fashion taste of a hipster may change more towards rockabilly over time which will be reflected by his/her interactions. Having the possibility to weight interactions differently, more recent ones higher than older ones, would allow us to model these effects.
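
To make the request concrete, here is a sketch of how such weights could be computed with an exponential half-life. The function name, the half_life_days parameter and the idea of passing ages as an array aligned with the stored interactions are all assumptions for illustration; how the model would consume the resulting matrix is left open.

```python
import numpy as np
import scipy.sparse as sp


def decay_weights(interactions, ages_in_days, half_life_days=30.0):
    """Return a COO matrix of weights 0.5 ** (age / half_life), one per interaction.

    ages_in_days must be aligned with the stored entries of the COO matrix.
    """
    coo = sp.coo_matrix(interactions)
    ages = np.asarray(ages_in_days, dtype=np.float64)
    if ages.shape != coo.data.shape:
        raise ValueError('need exactly one age per stored interaction')
    weights = np.power(0.5, ages / half_life_days)
    return sp.coo_matrix((weights, (coo.row, coo.col)), shape=coo.shape)
```

With a 30 day half-life, an interaction from today gets weight 1, one from 30 days ago gets 0.5, and one from 60 days ago gets 0.25.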

Segmentation fault with WARP

Using the WARP model currently gives me a segmentation fault on 64-bit Linux, in an environment with conda (4.0.5), conda-env (2.4.5), lightfm (1.9), numpy (1.11.2), pandas (0.18.0), scipy (0.18.1). The attached script outputs:

Shape: (125288, 792409)
NNZ: 5013789
Segmentation fault (core dumped)

Reducing the shape to (100000, 10000) works, and so does changing the model to logistic. That strongly suggests the problem lies in the WARP Cython code itself.
My laptop has 32GB RAM, so memory size should not be the cause of this segmentation fault.

import numpy as np
import scipy as sp
from scipy import stats
import pandas as pd
from lightfm import LightFM
from lightfm.evaluation import precision_at_k

from coo_matrix import *

M = load_coo('./intrct_mat.npz')

# Uncommenting these makes it work
#M = M.tocsr()
#M = M[:100000,:10000].tocoo()

print('Shape: {}'.format(M.shape))
print('NNZ: {}'.format(M.nnz))
model = LightFM(loss='warp')  # using 'logistic' here makes it work
model.fit(M, epochs=30, num_threads=1)
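
One thing that distinguishes the failing shape from the working one is the size of the row-times-column product. Purely as a hypothesis, if any code path computed a flattened index such as row * n_cols + col in a 32-bit integer, the larger shape would overflow and the smaller one would not. Whether the WARP code does this is an assumption that the report does not establish; the check below only does the arithmetic.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2147483647


def flat_index_fits_int32(shape):
    """True if the largest flattened index row * n_cols + col fits in int32."""
    n_rows, n_cols = int(shape[0]), int(shape[1])
    return n_rows * n_cols - 1 <= INT32_MAX
```

For (125288, 792409) the product is 99,279,338,792, far above the int32 limit, while (100000, 10000) gives 1,000,000,000, which fits.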

Loss functions

Hi Maciej,
I haven't used C before, and I am trying to understand how to read the _lightfm_fast_openmp.c file, since I am interested in seeing how you train the models via WARP, BPR and logistic. I would like to adapt the algorithms for multiple domains.

I was hoping to clearly see, for example, the following algorithm:
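
The algorithm referred to in the post is not reproduced here. As general background for reading the compiled code, a textbook BPR update for a single (user, positive item, negative item) triple can be sketched in NumPy as follows. This is the standard formulation with plain SGD, not a transcription of LightFM's implementation; bpr_step and its parameters are hypothetical names.

```python
import numpy as np


def bpr_step(user_vec, pos_vec, neg_vec, lr=0.05, reg=0.0):
    """One gradient ascent step on log sigmoid(u . (i - j)) with L2 penalty reg."""
    x = user_vec @ (pos_vec - neg_vec)
    g = 1.0 / (1.0 + np.exp(x))  # derivative of log sigmoid(x), i.e. sigmoid(-x)
    new_user = user_vec + lr * (g * (pos_vec - neg_vec) - reg * user_vec)
    new_pos = pos_vec + lr * (g * user_vec - reg * pos_vec)
    new_neg = neg_vec + lr * (-g * user_vec - reg * neg_vec)
    return new_user, new_pos, new_neg
```

Each step moves the positive item's score up and the negative item's score down for that user. WARP differs mainly in how the negative item is sampled and how the update is scaled.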

Provide references and a small comparison matrix

It's always good scientific practice to also give references to other freely available recommender systems and maybe a small matrix comparing their main features. This has several benefits:

It allows new users to get a fast overview of freely available alternatives.
It provides an overview of the main differences to the alternatives.
Showing that there are alternatives and that their main features are known to the authors of LightFM increases their reputation and perceived expertise.

I think the main alternatives are Mahout, PyFM and LensKit. All of them seem to provide only Collaborative Filtering. Interesting features would be: scalability/runtime performance, language bindings, algorithms/methods.

I know that this issue could be quite work-intensive, but it would be nice to have in order to convince people to use LightFM ;-)

AssertionError regarding shapes of user_embedding variable and user_features parameter

I'm trying to fit a very basic instance of LightFM

model = LightFM(loss='warp')
model = model.fit(df, user_features=user_features)

Here df.shape = (222113, 2269) and user_features.shape = (222113, 24). The model fits without error, but when I subsequently call auc_score, I get the following error.

Traceback (most recent call last):
  File "main.py", line 35, in <module>
    model = engine.fit(train, test, user_features=user_feat)
  File "/path/to/dir/lightfm-engine/engine.py", line 23, in fit
    train_auc = auc_score(model, train, num_threads=self.NUM_THREADS).mean()
  File "/Users/me/miniconda3/envs/py2/lib/python2.7/site-packages/lightfm/evaluation.py", line 118, in auc_score
    num_threads=num_threads)
  File "/Users/me/miniconda3/envs/py2/lib/python2.7/site-packages/lightfm/lightfm.py", line 649, in predict_rank
    item_features)
  File "/Users/me/miniconda3/envs/py2/lib/python2.7/site-packages/lightfm/lightfm.py", line 235, in _construct_feature_matrices
    assert self.user_embeddings.shape[0] >= user_features.shape[1]
AssertionError

So I thought, maybe the shapes of the two sparse matrices are different but as it turns out:

model.user_embeddings.shape[0] = 24
user_features.shape[1] = 24

Not really sure what's wrong, because the condition in the assert statement should evaluate to True in this case.
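
The traceback shows auc_score being called with only the model, the interactions and num_threads, so no user_features reach predict_rank. Assuming the library falls back to one indicator feature per user when none are passed (an identity matrix; this thread does not confirm it), the assertion would compare 24 embedding rows against 222113 features. The sketch below mimics only that shape check; feature_check_passes is a hypothetical name.

```python
def feature_check_passes(n_embedding_rows, n_users, user_features_shape=None):
    """Mimic the failing assertion under the identity-fallback assumption."""
    if user_features_shape is None:
        # Assumed fallback: one indicator feature per user.
        n_features = n_users
    else:
        n_features = user_features_shape[1]
    return n_embedding_rows >= n_features


# Evaluation call without features, as in the traceback.
without_features = feature_check_passes(24, 222113)
# Evaluation call with the same features used in fit, e.g.
# auc_score(model, train, user_features=user_feat, num_threads=...).
with_features = feature_check_passes(24, 222113, (222113, 24))
```

Under that assumption, passing the same user_features to auc_score that were passed to fit would make the shapes agree.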

fails to install conda install lightfm

d:\Recommender systems\code>conda install lightfm
Fetching package metadata .........
Solving package specifications: .
PackageNotFoundError: Package not found: '' Package missing in current win-64 ch
annels:

lightfm

You can search for packages on anaconda.org with

anaconda search -t conda lightfm

Item features in get_movielens_item_metadata()

In the get_movielens_item_metadata() function (lines 114 to 123), the item ids are added to the genres, but a genre index may overlap with a movie_id.
Example: if a movie has movie_id 3 and also genre 3, there is no distinction between genre 3 and movie_id 3.

Is this intended?
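
For comparison, one way to keep the two kinds of feature apart is to give per-item indicator features and genre features separate column ranges by stacking an identity matrix next to the genre matrix. This is a sketch of that layout, not a description of what get_movielens_item_metadata() does; build_item_features is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp


def build_item_features(genre_matrix):
    """Columns [0, n_items) are item indicators, the rest are genres."""
    genres = sp.csr_matrix(genre_matrix)
    n_items = genres.shape[0]
    identity = sp.identity(n_items, format='csr', dtype=genres.dtype)
    return sp.hstack([identity, genres], format='csr')
```

With this layout, item 3's indicator lives in column 3 and genre 3 lives in column n_items + 3, so the two can no longer collide.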

logistic loss not working?

I trained a model lfm with logistic loss.

lfm = lightfm.LightFM(no_components=1, loss="logistic")
lfm.fit(sparse_positive, epochs=100)
lfm.predict(students,  items)

Returns:

array([ 7.49537325,  8.20262432,  7.60994577, ...,  4.16664028,
        5.1302681 ,  3.18788409])

Shouldn't the output be bounded in [0, 1]?
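
If predict returns the raw score before the logistic link is applied (an assumption; the thread does not state it), the values can be mapped into (0, 1) with a sigmoid. The mapping is monotonic, so it does not change the ranking. A short sketch using the values from the output above:

```python
import numpy as np


def to_probability(raw_scores):
    """Apply the logistic function elementwise."""
    raw = np.asarray(raw_scores, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-raw))


raw = np.array([7.49537325, 8.20262432, 7.60994577,
                4.16664028, 5.1302681, 3.18788409])
probs = to_probability(raw)
```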

installation failed on windows computer with python 2.7

python27.lib(python27.dll) : fatal error LNK1112: module machine type 'x64'
conflicts with target machine type 'X86'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\Bi
n\link.exe' failed with exit status 1112

----------------------------------------

Command "c:\users\sanderoct27\anaconda2\python.exe -u -c "import setuptools, tok
enize;file='c:\users\sander1\appdata\local\temp\pip-build-d1jvqd\lig
htfm\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replac
e('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --recor
d c:\users\sander1\appdata\local\temp\pip-rqvifs-record\install-record.txt --si
ngle-version-externally-managed --compile" failed with error code 1 in c:\users
sander~1\appdata\local\temp\pip-build-d1jvqd\lightfm\
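
The linker message says python27.lib is an x64 module while the build targets X86, which points to a 64-bit interpreter paired with a 32-bit compiler toolchain. A quick way to confirm the interpreter side is to check its pointer size; interpreter_bits is a hypothetical helper name.

```python
import platform
import struct


def interpreter_bits():
    """Pointer size of the running Python interpreter, in bits."""
    return struct.calcsize('P') * 8


print(interpreter_bits(), platform.machine())
```

If this prints 64, the extension would need to be built with the 64-bit variant of the compiler tools rather than the x86 ones.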

Unable to import LightFM DataSets using Python 3.5

from lightfm.datasets import fetch_movielens
from lightfm import LightFM

#Fetch data and format it
data = fetch_movielens(min_rating=4.0)

#Print training and testing data
print(repr(data['train']))
print(repr(data['test']))

Running this code produced the following error:

/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/_lightfm_fast.py:9: UserWarning: LightFM was compiled without OpenMP support. Only a single thread will be used.
  warnings.warn('LightFM was compiled without OpenMP support. '
Traceback (most recent call last):
  File "/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/_lightfm_fast.py", line 3, in <module>
    from ._lightfm_fast_openmp import *  # NOQA
ImportError: /home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/_lightfm_fast_openmp.cpython-35m-x86_64-linux-gnu.so: undefined symbol: GOMP_parallel

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "recommendation_system.py", line 2, in <module>
    from lightfm.datasets import fetch_movielens
  File "/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/__init__.py", line 1, in <module>
    from .lightfm import LightFM
  File "/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/lightfm.py", line 7, in <module>
    from ._lightfm_fast import (CSRMatrix, FastLightFM,
  File "/home/salman/anaconda3/envs/tensorflow/lib/python3.5/site-packages/lightfm/_lightfm_fast.py", line 12, in <module>
    from ._lightfm_fast_no_openmp import *  # NOQA
ImportError: No module named 'lightfm._lightfm_fast_no_openmp'

Multithreading not working under Ubuntu

I installed with pip as per instructions (under Ubuntu 16).

$ python setup.py test
running test
running egg_info
writing requirements to lightfm.egg-info/requires.txt
writing lightfm.egg-info/PKG-INFO
writing top-level names to lightfm.egg-info/top_level.txt
writing dependency_links to lightfm.egg-info/dependency_links.txt
reading manifest file 'lightfm.egg-info/SOURCES.txt'
writing manifest file 'lightfm.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-2.7/lightfm/_lightfm_fast_openmp.so -> lightfm
==================================================================================================================== test session starts ====================================================================================================================
platform linux2 -- Python 2.7.12, pytest-2.8.7, py-1.4.31, pluggy-0.3.1
collected 49 items 

tests/test_api.py ...........
tests/test_datasets.py ..
tests/test_evaluation.py ...
tests/test_fast_functions.py .
tests/test_movielens.py ................................

================================================================================================================ 49 passed in 114.88 seco

I'm seeing one python job using 100% of the CPU when there should be 20 threads running.

Build error if you try to use Light fm data sets

If I type the following in Python:
data = fetch_movielens(min_rating=4.0)

I get an error like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Aryan/anaconda/lib/python2.7/site-packages/lightfm/datasets/movielens.py", line 169, in fetch_movielens
    item_metadata_raw, genres_raw) = _read_raw_data(zip_path)
  File "/Users/Aryan/anaconda/lib/python2.7/site-packages/lightfm/datasets/movielens.py", line 20, in _read_raw_data
    with zipfile.ZipFile(path) as datafile:
  File "/Users/Aryan/anaconda/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/Users/Aryan/anaconda/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

I ran these imports beforehand:

from lightfm import LightFM
from lightfm.datasets import fetch_movielens
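
BadZipfile means the file handed to zipfile.ZipFile is not a valid archive, which often happens after an interrupted or redirected download. Assuming that is the case here, checking the cached file and deleting it so that it is downloaded again may help. The sketch below shows the check on two temporary files; the location of the real cached archive is not given in the report, so none is assumed.

```python
import os
import tempfile
import zipfile


def is_usable_zip(path):
    """True if path exists and looks like a valid zip archive."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


# Demonstration on temporary files (hypothetical names).
tmp_dir = tempfile.mkdtemp()
bad_path = os.path.join(tmp_dir, 'broken.zip')
with open(bad_path, 'wb') as f:
    f.write(b'<html>not a zip</html>')

good_path = os.path.join(tmp_dir, 'ok.zip')
with zipfile.ZipFile(good_path, 'w') as zf:
    zf.writestr('hello.txt', 'hi')
```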

slow WARP evaluation

When evaluating implicit feedback (using mean average precision@k for example, or even AUC), one needs to predict all users against all items. That is done by looping over the interactions matrix row by row and invoking model.predict().
That gets prohibitively expensive with a 300K x 20M matrix. (I estimated ~5 days on an 8-core server using a simple MF model with no content features.)
Cython might be able to cut this time by 30%. Sharding the user base and distributing it across different servers would reduce that time by, say, another factor of 10. We would still be looking at 9 hours.

What is the way around this? Increasing memory so that a dense interaction matrix of all ones can be passed to model.predict in one or two calls?
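
On the memory idea: a dense 300K x 20M score matrix has 6e12 entries, roughly 24 TB at 4 bytes each, so it cannot be held at once. A common alternative is to score one user (or a small batch) at a time and keep only the top k items. The sketch below assumes predict is a callable with the (user_ids, item_ids) signature of model.predict; top_k_items is a hypothetical name.

```python
import numpy as np


def top_k_items(predict, user_id, n_items, k=10):
    """Return the indices of the k best scoring items for one user, best first."""
    item_ids = np.arange(n_items, dtype=np.int32)
    user_ids = np.full(n_items, user_id, dtype=np.int32)
    scores = np.asarray(predict(user_ids, item_ids))
    k = min(k, n_items)
    # argpartition finds the k largest in linear time; only those k get sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]
```

This does not reduce the number of scores that must be computed, only the memory and sorting cost per user, so evaluating on a random sample of users may still be needed to bring the total time down.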