nicolashug / surprise

A Python scikit for building and analyzing recommender systems

Home Page: http://surpriselib.com

License: BSD 3-Clause "New" or "Revised" License

Python 77.07% Cython 22.71% Shell 0.22%
recommender systems recommendation svd matrix factorization machine-learning

surprise's Introduction


Overview

Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data.

Surprise was designed with simplicity and ease of use in mind.

The name SurPRISE (roughly :) ) stands for Simple Python RecommendatIon System Engine.

Please note that Surprise does not support implicit ratings or content-based information.

Getting started, example

Here is a simple example showing how you can (down)load a dataset, split it for 5-fold cross-validation, and compute the MAE and RMSE of the SVD algorithm.

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# Use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Output:

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9367  0.9355  0.9378  0.9377  0.9300  0.9355  0.0029  
MAE (testset)     0.7387  0.7371  0.7393  0.7397  0.7325  0.7375  0.0026  
Fit time          0.62    0.63    0.63    0.65    0.63    0.63    0.01    
Test time         0.11    0.11    0.14    0.14    0.14    0.13    0.02    

Surprise can do much more (e.g., GridSearchCV)! You'll find more usage examples in the documentation.

Benchmarks

Here are the average RMSE, MAE, and total execution time of various algorithms (with their default parameters) on a 5-fold cross-validation procedure. The datasets are the Movielens 100k and 1M datasets. The folds are the same for all algorithms. All experiments were run on a laptop with an 11th Gen Intel i5 @ 2.60 GHz. The code for generating these tables can be found in the benchmark example.

Movielens 100k               RMSE   MAE    Time
SVD                          0.934  0.737  0:00:06
SVD++ (cache_ratings=False)  0.919  0.721  0:01:39
SVD++ (cache_ratings=True)   0.919  0.721  0:01:22
NMF                          0.963  0.758  0:00:06
Slope One                    0.946  0.743  0:00:09
k-NN                         0.98   0.774  0:00:08
Centered k-NN                0.951  0.749  0:00:09
k-NN Baseline                0.931  0.733  0:00:13
Co-Clustering                0.963  0.753  0:00:06
Baseline                     0.944  0.748  0:00:02
Random                       1.518  1.219  0:00:01

Movielens 1M                 RMSE   MAE    Time
SVD                          0.873  0.686  0:01:07
SVD++ (cache_ratings=False)  0.862  0.672  0:41:06
SVD++ (cache_ratings=True)   0.862  0.672  0:34:55
NMF                          0.916  0.723  0:01:39
Slope One                    0.907  0.715  0:02:31
k-NN                         0.923  0.727  0:05:27
Centered k-NN                0.929  0.738  0:05:43
k-NN Baseline                0.895  0.706  0:05:55
Co-Clustering                0.915  0.717  0:00:31
Baseline                     0.909  0.719  0:00:19
Random                       1.504  1.206  0:00:19

Installation

With pip (you'll need numpy and a C compiler; Windows users might prefer using conda):

$ pip install numpy
$ pip install scikit-surprise

With conda:

$ conda install -c conda-forge scikit-surprise

For the latest version, you can also clone the repo and build the source (you'll first need Cython and numpy):

$ pip install numpy cython
$ git clone https://github.com/NicolasHug/surprise.git
$ cd surprise
$ python setup.py install

License and reference

This project is licensed under the BSD 3-Clause license, so it can be used for pretty much everything, including commercial applications.

I'd love to know how Surprise is useful to you. Please don't hesitate to open an issue and describe how you use it!

Please make sure to cite the paper if you use Surprise for your research:

@article{Hug2020,
  doi = {10.21105/joss.02174},
  url = {https://doi.org/10.21105/joss.02174},
  year = {2020},
  publisher = {The Open Journal},
  volume = {5},
  number = {52},
  pages = {2174},
  author = {Nicolas Hug},
  title = {Surprise: A Python library for recommender systems},
  journal = {Journal of Open Source Software}
}

Contributors

The following persons have contributed to Surprise:

ashtou, bobbyinfj, caoyi, Олег Демиденко, Charles-Emmanuel Dias, dmamylin, Lauriane Ducasse, Marc Feger, franckjay, Lukas Galke, Tim Gates, Pierre-François Gimenez, Zachary Glassman, Jeff Hale, Nicolas Hug, Janniks, jyesawtellrickson, Doruk Kilitcioglu, Ravi Raju Krishna, lapidshay, Hengji Liu, Ravi Makhija, Maher Malaeb, Manoj K, James McNeilis, Naturale0, nju-luke, Pierre-Louis Pécheux, Jay Qi, Lucas Rebscher, Skywhat, Hercules Smith, David Stevens, Vesna Tanko, TrWestdoor, Victor Wang, Mike Lee Williams, Jay Wong, Chenchen Xu, YaoZh1918.

Thanks a lot :) !

Development Status

Starting from version 1.1.0 (September 2019), I will only maintain the package, provide bugfixes, and perhaps occasional performance improvements. I have less time to dedicate to it now, so I'm unable to consider new features.

For bugs, issues, or questions about Surprise, please avoid sending me emails; I will most likely not be able to answer. Please use the GitHub project page instead, so that others can also benefit from it.


surprise's Issues

How to split my dataset into training and testing data

I am trying to understand how I can split my dataset into a training and a testing set.

I am using the following code to read my dataset:

import os
from os.path import dirname
import pandas as pd  
from surprise import SVD
from surprise import Dataset, Reader
from surprise import evaluate, print_perf
from surprise import GridSearch
from surprise.dump import dump
from surprise import KNNBasic


# Path to the ratings table.
file_path = dirname(os.path.abspath(os.getcwd())) + \
    "/Data/Compras_normalizado_media_surprise.csv"
print(file_path)

# Prepare and configure the Reader for scikit-surprise.
reader = Reader(line_format='user item rating', sep=';')


# Read the data as customer, product, rating.
data = Dataset.load_from_file(file_path, reader=reader)

How can I split data into data_train and data_test?

How to retrieve the k-nearest neighbors?

In a collaborative filtering setting, how can I retrieve the top K recommendations for a given user?
The documentation is not clear enough on this.

Is scikit-surprise only an evaluation tool, or can it be applied in real-world scenarios?

Kind Regards,

Tauranis

SlopeOne keeps saying {'was_impossible': True, 'reason': 'User and/or item is unkown.'}

When I am trying to look at the training dataset, I have this

for trainset, testset in data.folds():
    # train and test algorithm.
    slopeone.train(trainset)
    for uid, mid, rate in trainset.all_ratings():
        uid, mid = str(uid), str(mid)
        slopepred = slopeone.predict(uid, mid, r_ui=4, verbose=True).est

and this keeps telling me that the user and item cannot be found.

How do I find training error?

I used SVD matrix factorization. First I used GridSearch to tune my hyperparameters. Then I trained the algorithm on my whole dataset using trainset = data.build_full_trainset().
Now I want to find the training error on the trainset, to check whether my algorithm suffers from high bias. How should I do that? Can you send me the code?

Missing prediction algorithms when installing through pip

Hello,

I have installed this library using pip on my Ubuntu machine 2 days ago. However I am not able to import the SlopeOne and CoClustering algorithms. I have checked the files inside the library and noticed that the corresponding python files are actually missing.


I am not an expert in python or pip, but I thought I should highlight this issue. Please let me know if I am missing something.

Thanks for all the hard work put into this library.

Cheers,
Maher

Install Surprise on Datalabs

Hi there,

I'm trying to install Surprise on Datalabs directly in a script using the command:
%%bash
pip install scikit-surprise

I use Windows.

Getting the following error:
Collecting scikit-surprise
Using cached scikit-surprise-1.0.3.tar.gz
Requirement already satisfied: numpy>=1.11.2 in /usr/local/lib/python2.7/dist-packages (from scikit-surprise)
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python2.7/dist-packages (from scikit-surprise)
Building wheels for collected packages: scikit-surprise
Running setup.py bdist_wheel for scikit-surprise: started
Running setup.py bdist_wheel for scikit-surprise: finished with status 'error'
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9Dmyhg/scikit-surprise/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpSd64VXpip-wheel- --python-tag cp27:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/surprise
copying surprise/accuracy.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/dataset.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/dump.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/__init__.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/__main__.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/evaluate.py -> build/lib.linux-x86_64-2.7/surprise
creating build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/random_pred.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/predictions.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/baseline_only.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/__init__.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/algo_base.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/knns.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
running egg_info
writing requirements to scikit_surprise.egg-info/requires.txt
writing scikit_surprise.egg-info/PKG-INFO
writing top-level names to scikit_surprise.egg-info/top_level.txt
writing dependency_links to scikit_surprise.egg-info/dependency_links.txt
writing entry points to scikit_surprise.egg-info/entry_points.txt
reading manifest file 'scikit_surprise.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'scikit_surprise.egg-info/SOURCES.txt'
copying surprise/similarities.c -> build/lib.linux-x86_64-2.7/surprise
copying surprise/similarities.pyx -> build/lib.linux-x86_64-2.7/surprise
copying surprise/prediction_algorithms/co_clustering.c -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/co_clustering.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/matrix_factorization.c -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/matrix_factorization.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/optimize_baselines.c -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/optimize_baselines.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/slope_one.c -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/slope_one.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
running build_ext
building 'surprise.similarities' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/surprise
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c surprise/similarities.c -o build/temp.linux-x86_64-2.7/surprise/similarities.o
unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1


Running setup.py clean for scikit-surprise
Failed to build scikit-surprise
Installing collected packages: scikit-surprise
Running setup.py install for scikit-surprise: started
Running setup.py install for scikit-surprise: finished with status 'error'
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9Dmyhg/scikit-surprise/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-ulJEFx-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/surprise
copying surprise/accuracy.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/dataset.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/dump.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/__init__.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/__main__.py -> build/lib.linux-x86_64-2.7/surprise
copying surprise/evaluate.py -> build/lib.linux-x86_64-2.7/surprise
creating build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/random_pred.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/predictions.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/baseline_only.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/__init__.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/algo_base.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/knns.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
running egg_info
writing requirements to scikit_surprise.egg-info/requires.txt
writing scikit_surprise.egg-info/PKG-INFO
writing top-level names to scikit_surprise.egg-info/top_level.txt
writing dependency_links to scikit_surprise.egg-info/dependency_links.txt
writing entry points to scikit_surprise.egg-info/entry_points.txt
reading manifest file 'scikit_surprise.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'scikit_surprise.egg-info/SOURCES.txt'
copying surprise/similarities.c -> build/lib.linux-x86_64-2.7/surprise
copying surprise/similarities.pyx -> build/lib.linux-x86_64-2.7/surprise
copying surprise/prediction_algorithms/co_clustering.c -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/co_clustering.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/matrix_factorization.c -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/matrix_factorization.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/optimize_baselines.c -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/optimize_baselines.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/slope_one.c -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
copying surprise/prediction_algorithms/slope_one.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
running build_ext
building 'surprise.similarities' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/surprise
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c surprise/similarities.c -o build/temp.linux-x86_64-2.7/surprise/similarities.o
unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

----------------------------------------

Any idea what's going on? I'm a simple statistician with average IT experience/knowledge, looking for an easy fix.

Cheers!

Estimates on unseen user or item

When producing an estimate for a known user but unknown item, would it be helpful to provide self.trainset.global_mean + self.bu[u] instead? Similarly, we can provide self.trainset.global_mean + self.bi[i] when the user is unknown but the item is known. I believe the current implementation returns self.trainset.global_mean when PredictionImpossible is raised.

https://github.com/NicolasHug/Surprise/blob/master/surprise/prediction_algorithms/matrix_factorization.pyx#L244

    def estimate(self, u, i):
        # Should we cythonize this as well?

        est = self.trainset.global_mean if self.biased else 0

        if self.trainset.knows_user(u):
            est += self.bu[u]

        if self.trainset.knows_item(i):
            est += self.bi[i]

        if self.trainset.knows_user(u) and self.trainset.knows_item(i):
            est += np.dot(self.qi[i], self.pu[u])
        else:
            raise PredictionImpossible

        return est

As a side note, if we're checking for both user and item anyway, why not do it in one step? i.e.

    def estimate(self, u, i):
        # Should we cythonize this as well?

        est = self.trainset.global_mean if self.biased else 0

        if self.trainset.knows_user(u) and self.trainset.knows_item(i):
            est += self.bu[u] + self.bi[i] + np.dot(self.qi[i], self.pu[u])
        else:
            raise PredictionImpossible

        return est

Installation on Windows 8.1 failed

I have an issue of Surprise installation on Windows 8.1.

When running

pip install scikit-surprise

from command line, I get the following output:

C:\Users\Drugi>pip install scikit-surprise
Collecting scikit-surprise
  Using cached scikit-surprise-1.0.3.tar.gz
Requirement already satisfied: numpy>=1.11.2 in c:\programdata\anaconda3\lib\si
e-packages (from scikit-surprise)
Requirement already satisfied: six>=1.10.0 in c:\programdata\anaconda3\lib\site
packages (from scikit-surprise)
Building wheels for collected packages: scikit-surprise
  Running setup.py bdist_wheel for scikit-surprise ... error
  Complete output from command C:\ProgramData\Anaconda3\python.exe -u -c "impor
 setuptools, tokenize;__file__='C:\\Users\\Drugi\\AppData\\Local\\Temp\\pip-bui
d-qbu2qjqt\\scikit-surprise\\setup.py';f=getattr(tokenize, 'open', open)(__file
_);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, '
xec'))" bdist_wheel -d C:\Users\Drugi\AppData\Local\Temp\tmp_h27f0pzpip-wheel-
-python-tag cp36:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.6
  creating build\lib.win-amd64-3.6\surprise
  copying surprise\accuracy.py -> build\lib.win-amd64-3.6\surprise
  copying surprise\dataset.py -> build\lib.win-amd64-3.6\surprise
  copying surprise\dump.py -> build\lib.win-amd64-3.6\surprise
  copying surprise\evaluate.py -> build\lib.win-amd64-3.6\surprise
  copying surprise\__init__.py -> build\lib.win-amd64-3.6\surprise
  copying surprise\__main__.py -> build\lib.win-amd64-3.6\surprise
  creating build\lib.win-amd64-3.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\algo_base.py -> build\lib.win-amd64-3.
\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\baseline_only.py -> build\lib.win-amd6
-3.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\knns.py -> build\lib.win-amd64-3.6\sur
rise\prediction_algorithms
  copying surprise\prediction_algorithms\predictions.py -> build\lib.win-amd64-
.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\random_pred.py -> build\lib.win-amd64-
.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\__init__.py -> build\lib.win-amd64-3.6
surprise\prediction_algorithms
  running egg_info
  writing scikit_surprise.egg-info\PKG-INFO
  writing dependency_links to scikit_surprise.egg-info\dependency_links.txt
  writing entry points to scikit_surprise.egg-info\entry_points.txt
  writing requirements to scikit_surprise.egg-info\requires.txt
  writing top-level names to scikit_surprise.egg-info\top_level.txt
  warning: manifest_maker: standard file '-c' not found

  reading manifest file 'scikit_surprise.egg-info\SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  writing manifest file 'scikit_surprise.egg-info\SOURCES.txt'
  copying surprise\similarities.c -> build\lib.win-amd64-3.6\surprise
  copying surprise\similarities.pyx -> build\lib.win-amd64-3.6\surprise
  copying surprise\prediction_algorithms\co_clustering.c -> build\lib.win-amd64
3.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\matrix_factorization.c -> build\lib.wi
-amd64-3.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\optimize_baselines.c -> build\lib.win-
md64-3.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\slope_one.c -> build\lib.win-amd64-3.6
surprise\prediction_algorithms
  copying surprise\prediction_algorithms\co_clustering.pyx -> build\lib.win-amd
4-3.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\matrix_factorization.pyx -> build\lib.
in-amd64-3.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\optimize_baselines.pyx -> build\lib.wi
-amd64-3.6\surprise\prediction_algorithms
  copying surprise\prediction_algorithms\slope_one.pyx -> build\lib.win-amd64-3
6\surprise\prediction_algorithms
  running build_ext
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "C:\Users\Drugi\AppData\Local\Temp\pip-build-qbu2qjqt\scikit-surprise\
etup.py", line 101, in <module>
      ['surprise = surprise.__main__:main']},
    File "C:\ProgramData\Anaconda3\lib\distutils\core.py", line 148, in setup
      dist.run_commands()
    File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 955, in run_com
ands
      self.run_command(cmd)
    File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_com
and
      cmd_obj.run()
    File "C:\ProgramData\Anaconda3\lib\site-packages\wheel\bdist_wheel.py", lin
 179, in run
      self.run_command('build')
    File "C:\ProgramData\Anaconda3\lib\distutils\cmd.py", line 313, in run_comm
nd
      self.distribution.run_command(command)
    File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_com
and
      cmd_obj.run()
    File "C:\ProgramData\Anaconda3\lib\distutils\command\build.py", line 135, i
 run
      self.run_command(cmd_name)
    File "C:\ProgramData\Anaconda3\lib\distutils\cmd.py", line 313, in run_comm
nd
      self.distribution.run_command(command)
    File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_com
and
      cmd_obj.run()
    File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build
ext.py", line 185, in run
      _build_ext.build_ext.run(self)
    File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 30
, in run
      force=self.force)
    File "C:\ProgramData\Anaconda3\lib\distutils\ccompiler.py", line 1031, in n
w_compiler
      return klass(None, dry_run, force)
    File "C:\ProgramData\Anaconda3\lib\distutils\cygwinccompiler.py", line 282,
in __init__
      CygwinCCompiler.__init__ (self, verbose, dry_run, force)
    File "C:\ProgramData\Anaconda3\lib\distutils\cygwinccompiler.py", line 126,
in __init__
      if self.ld_version >= "2.10.90":
  TypeError: '>=' not supported between instances of 'NoneType' and 'str'

  ----------------------------------------
  Failed building wheel for scikit-surprise
  Running setup.py clean for scikit-surprise
Failed to build scikit-surprise
Installing collected packages: scikit-surprise
  Running setup.py install for scikit-surprise ... error
    Complete output from command C:\ProgramData\Anaconda3\python.exe -u -c "imp
rt setuptools, tokenize;__file__='C:\\Users\\Drugi\\AppData\\Local\\Temp\\pip-b
ild-qbu2qjqt\\scikit-surprise\\setup.py';f=getattr(tokenize, 'open', open)(__fi
e__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__,
'exec'))" install --record C:\Users\Drugi\AppData\Local\Temp\pip-l4ooc5n4-recor
\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.6
    creating build\lib.win-amd64-3.6\surprise
    copying surprise\accuracy.py -> build\lib.win-amd64-3.6\surprise
    copying surprise\dataset.py -> build\lib.win-amd64-3.6\surprise
    copying surprise\dump.py -> build\lib.win-amd64-3.6\surprise
    copying surprise\evaluate.py -> build\lib.win-amd64-3.6\surprise
    copying surprise\__init__.py -> build\lib.win-amd64-3.6\surprise
    copying surprise\__main__.py -> build\lib.win-amd64-3.6\surprise
    creating build\lib.win-amd64-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\algo_base.py -> build\lib.win-amd64-
.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\baseline_only.py -> build\lib.win-am
64-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\knns.py -> build\lib.win-amd64-3.6\s
rprise\prediction_algorithms
    copying surprise\prediction_algorithms\predictions.py -> build\lib.win-amd6
-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\random_pred.py -> build\lib.win-amd6
-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\__init__.py -> build\lib.win-amd64-3
6\surprise\prediction_algorithms
    running egg_info
    writing scikit_surprise.egg-info\PKG-INFO
    writing dependency_links to scikit_surprise.egg-info\dependency_links.txt
    writing entry points to scikit_surprise.egg-info\entry_points.txt
    writing requirements to scikit_surprise.egg-info\requires.txt
    writing top-level names to scikit_surprise.egg-info\top_level.txt
    warning: manifest_maker: standard file '-c' not found

    reading manifest file 'scikit_surprise.egg-info\SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'scikit_surprise.egg-info\SOURCES.txt'
    copying surprise\similarities.c -> build\lib.win-amd64-3.6\surprise
    copying surprise\similarities.pyx -> build\lib.win-amd64-3.6\surprise
    copying surprise\prediction_algorithms\co_clustering.c -> build\lib.win-amd
4-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\matrix_factorization.c -> build\lib.
in-amd64-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\optimize_baselines.c -> build\lib.wi
-amd64-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\slope_one.c -> build\lib.win-amd64-3
6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\co_clustering.pyx -> build\lib.win-a
d64-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\matrix_factorization.pyx -> build\li
.win-amd64-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\optimize_baselines.pyx -> build\lib.
in-amd64-3.6\surprise\prediction_algorithms
    copying surprise\prediction_algorithms\slope_one.pyx -> build\lib.win-amd64-3.6\surprise\prediction_algorithms
    running build_ext
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Drugi\AppData\Local\Temp\pip-build-qbu2qjqt\scikit-surprise\setup.py", line 101, in <module>
        ['surprise = surprise.__main__:main']},
      File "C:\ProgramData\Anaconda3\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\ProgramData\Anaconda3\lib\site-packages\setuptools-27.2.0-py3.6.egg\setuptools\command\install.py", line 61, in run
      File "C:\ProgramData\Anaconda3\lib\distutils\command\install.py", line 54, in run
        self.run_command('build')
      File "C:\ProgramData\Anaconda3\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\ProgramData\Anaconda3\lib\distutils\command\build.py", line 135, in run
        self.run_command(cmd_name)
      File "C:\ProgramData\Anaconda3\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
        cmd_obj.run()
      File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 185, in run
        _build_ext.build_ext.run(self)
      File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 08, in run
        force=self.force)
      File "C:\ProgramData\Anaconda3\lib\distutils\ccompiler.py", line 1031, in new_compiler
        return klass(None, dry_run, force)
      File "C:\ProgramData\Anaconda3\lib\distutils\cygwinccompiler.py", line 28, in __init__
        CygwinCCompiler.__init__ (self, verbose, dry_run, force)
      File "C:\ProgramData\Anaconda3\lib\distutils\cygwinccompiler.py", line 12, in __init__
        if self.ld_version >= "2.10.90":
    TypeError: '>=' not supported between instances of 'NoneType' and 'str'

    ----------------------------------------
Command "C:\ProgramData\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Drugi\\AppData\\Local\\Temp\\pip-build-qbu2qjqt\\scikit-surprise\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Drugi\AppData\Local\Temp\pip-l4ooc5n4-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Drugi\AppData\Local\Temp\pip-build-qbu2qjqt\scikit-surprise\

Could someone help me?

Regards,
Dragan

Matrix factorization algorithms - Utility of estimate function

I'm not sure I understand why the function estimate is defined for each factorization algorithm.

Take for example SVD from matrix_factorization.pyx:

At lines 210 to 214 the prediction is explicitly calculated as:

(global_mean + bu[u] + bi[i] + dot)

This is exactly what the estimate function (defined at line 233) does, so the same calculation for each user/item pair could be obtained by simply calling it.

This renders the estimate function completely useless unless:

  • lines 210 to 214 are substituted by a simple call to estimate, or
  • maybe the function estimate is used somewhere else in the framework, e.g. for evaluation purposes (I haven't thoroughly checked whether this is true), for instance when calling:
# We'll use the famous SVD algorithm.
algo = SVD()
# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
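For reference, the prediction rule quoted above (global mean plus both biases plus the dot product of the factors) can be sketched with toy numbers — the factor values below are made up purely for illustration:

```python
import numpy as np

def svd_estimate(global_mean, bu, bi, pu, qi, u, i):
    """SVD prediction rule: mu + b_u + b_i + q_i . p_u (toy sketch)."""
    return global_mean + bu[u] + bi[i] + np.dot(qi[i], pu[u])

# Toy biases/factors for 2 users and 2 items, 2 latent dimensions.
bu = np.array([0.1, -0.2])                   # user biases
bi = np.array([0.05, 0.0])                   # item biases
pu = np.array([[0.5, 0.1], [0.3, -0.4]])     # user factors
qi = np.array([[0.2, 0.6], [-0.1, 0.2]])     # item factors

print(svd_estimate(3.5, bu, bi, pu, qi, u=0, i=0))   # 3.65 + 0.16 ≈ 3.81
```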

real time recommendations

I want to build a movie recommendation website with real-time predictions. I am thinking of combining scikit-surprise with Django and socket.io/websockets/gevent for the real-time part. Are these the right tools to use, or do you recommend something else?

Simple example so difficult to understand

Hi,

I'm building a simple book recommender. I'm using your "custom_dataset" in my example:

import os

from surprise import Dataset
from surprise import KNNBasic
from surprise import Reader

reader = Reader(line_format='user item rating', sep=' ', skip_lines=3, rating_scale=(1, 5))

custom_dataset_path = (os.path.dirname(os.path.realpath(__file__)) + '/custom_dataset')
print("Using: " + custom_dataset_path)

data = Dataset.load_from_file(file_path=custom_dataset_path, reader=reader)
trainset = data.build_full_trainset()

sim_options = {
    'name': 'cosine',
    'user_based': True  # compute similarities between users
}

algo = KNNBasic(sim_options=sim_options)
algo.train(trainset)

uid = "user0"

pred = algo.predict(uid=uid, iid="", r_ui=0, verbose=True)
print(pred)

Results of execution:

python process.py 

Using: /Users/paulo/Developer/workspaces/python/ubook-recommender/custom_dataset
Computing the cosine similarity matrix...
Done computing similarity matrix.
user: user0      item:            r_ui = 0.00   est = 3.00   {u'reason': 'User and/or item is unkown.', u'was_impossible': True}
user: user0      item:            r_ui = 0.00   est = 3.00   {u'reason': 'User and/or item is unkown.', u'was_impossible': True}

Can you help me?

I need to understand how it works. I think it is simple, but I can't wrap my head around it. It is not as intuitive as this:

https://www.librec.net/dokuwiki/doku.php?id=introduction

I want to understand your library well enough to make it work at the same level.

Can you help me?
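For what it's worth, the est = 3.00 lines above are Surprise's fallback path: predict was called with an empty item id that was never in the trainset, so the prediction is flagged impossible and the estimate falls back to a default value (the mean of the training ratings, if I recall the implementation correctly). A plain-Python sketch of that behaviour — not Surprise's actual code:

```python
def predict_with_fallback(known_est, global_mean, user_known, item_known):
    # Mirrors the behaviour seen in the log (sketch): if the user or item
    # was not in the trainset, the prediction is "impossible" and the
    # estimate falls back to the global mean of the training ratings.
    if not (user_known and item_known):
        return global_mean, {'was_impossible': True,
                             'reason': 'User and/or item is unknown.'}
    return known_est, {'was_impossible': False}

# iid="" was never in the trainset, so the call fell back to the mean (3.0):
est, details = predict_with_fallback(4.2, 3.0, user_known=True, item_known=False)
print(est, details['was_impossible'])   # 3.0 True
```

Passing a raw item id that actually appears in custom_dataset should avoid the fallback and produce a real estimate.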

Implementing social recommendation algorithms

Hi
First of all, thanks for this awesome tool. I wanted to know how I should use Surprise to implement recommender algorithms that require social data, such as SocialMF, TrustMF, SoRec, SoReg, etc.?

algo.predict function returns PredictionImpossible for user = 0 and item = 0 for 100k dataset

Hi,
While testing the predict function for SVD algorithm I observed that the predict function returns was_impossible = True in the details section of Prediction Object. I have mentioned the details below.

Dataset : 100k dataset (Default)
Algorithm : SVD
Validation : 3 Folds cross validation

The predict function associated with the algorithm object returns was_impossible = True for user = 0 with any item id (in range). It also returns True for item = 0 with any user id (in range).
Check logs below :
For User = 0
algo.trainset.knows_user(0)
True
algo.trainset.knows_item(1)
True
algo.predict('0','1',4)
Prediction(uid='0', iid='1', r_ui=4, est=3.5444502774861255, details={u'reason': '', u'was_impossible': True})

For Item = 0
algo.trainset.knows_user(1)
True
algo.trainset.knows_item(0)
True
algo.predict('1','0',4)
Prediction(uid='1', iid='0', r_ui=4, est=3.5444502774861255, details={u'reason': '', u'was_impossible': True})

Predictions work for other inputs:
algo.predict('1','10',4)
Prediction(uid='1', iid='10', r_ui=4, est=2.7008138081499409, details={u'was_impossible': False})

algo.predict('4','10',4)
Prediction(uid='4', iid='10', r_ui=4, est=4.0589210159479157, details={u'was_impossible': False})

Thanks!

Cold-Start problem

Hi Nicolas.

I'm thinking of using the Surprise library for my thesis, but I need to attack the cold-start problem because it's one of the main topics of my thesis. For that, I'm working on taxonomic user profiling. Can the Surprise library be modified to support this approach?
How do you address the cold-start problem with the Surprise library?

Many thanks.

Suggestion needed on recommendations made on own dataset?

Hi,

I have seen the format of the ratings data that you used in the example code.
It is in the following form:
user id | item | ratings
I have similar data, but the ratings are not given by users. My data contains:
userid | textid | score
Here the score is something like an index rate of the text (e.g. a readability score produced by an algorithm like the Automated Readability Index). Now I want to apply the same SVD example code to my data. It gives me recommendations of texts based on scores, but how can I know that I'm getting correct recommendations based on scores? My scores are between 1 and 100.
Am I doing the correct thing by directly using the example code and just replacing the input data?

Error while install from "master"

Hi,

I'm trying to install from master to use a pandas DataFrame, but I get this error:

Command:

pip install https://github.com/NicolasHug/Surprise/archive/master.zip

Log:

Step 14/14 : RUN pip install https://github.com/NicolasHug/Surprise/archive/master.zip
 ---> Running in f1f4b0b9ee7f
Collecting https://github.com/NicolasHug/Surprise/archive/master.zip
  Downloading https://github.com/NicolasHug/Surprise/archive/master.zip (275kB)
Requirement already satisfied: numpy>=1.11.2 in /usr/local/lib/python2.7/dist-packages (from scikit-surprise==1.0.3)
Collecting six>=1.10.0 (from scikit-surprise==1.0.3)
  Downloading six-1.10.0-py2.py3-none-any.whl
Installing collected packages: six, scikit-surprise
  Running setup.py install for scikit-surprise: started
    Running setup.py install for scikit-surprise: finished with status 'error'
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-Gcx41f-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-qnkjKl-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/surprise
    copying surprise/dump.py -> build/lib.linux-x86_64-2.7/surprise
    copying surprise/__init__.py -> build/lib.linux-x86_64-2.7/surprise
    copying surprise/__main__.py -> build/lib.linux-x86_64-2.7/surprise
    copying surprise/evaluate.py -> build/lib.linux-x86_64-2.7/surprise
    copying surprise/dataset.py -> build/lib.linux-x86_64-2.7/surprise
    copying surprise/accuracy.py -> build/lib.linux-x86_64-2.7/surprise
    creating build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/algo_base.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/__init__.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/random_pred.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/baseline_only.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/predictions.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/knns.py -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    running egg_info
    creating scikit_surprise.egg-info
    writing requirements to scikit_surprise.egg-info/requires.txt
    writing scikit_surprise.egg-info/PKG-INFO
    writing top-level names to scikit_surprise.egg-info/top_level.txt
    writing dependency_links to scikit_surprise.egg-info/dependency_links.txt
    writing entry points to scikit_surprise.egg-info/entry_points.txt
    writing manifest file 'scikit_surprise.egg-info/SOURCES.txt'
    warning: manifest_maker: standard file '-c' not found
    
    reading manifest file 'scikit_surprise.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching '*.c' under directory 'surprise'
    writing manifest file 'scikit_surprise.egg-info/SOURCES.txt'
    copying surprise/similarities.pyx -> build/lib.linux-x86_64-2.7/surprise
    copying surprise/prediction_algorithms/co_clustering.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/matrix_factorization.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/optimize_baselines.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    copying surprise/prediction_algorithms/slope_one.pyx -> build/lib.linux-x86_64-2.7/surprise/prediction_algorithms
    running build_ext
    building 'surprise.similarities' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/surprise
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c surprise/similarities.c -o build/temp.linux-x86_64-2.7/surprise/similarities.o
    x86_64-linux-gnu-gcc: error: surprise/similarities.c: No such file or directory
    x86_64-linux-gnu-gcc: fatal error: no input files
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-Gcx41f-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-qnkjKl-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-Gcx41f-build/
The command '/bin/sh -c pip install https://github.com/NicolasHug/Surprise/archive/master.zip' returned a non-zero code: 1
make: *** [docker-build] Error 1

ValueError: could not convert string to float: rating, when loading CSV data

I followed the load_custom_dataset.py example and tried to write my own version like this:

reader = Reader(line_format='user item rating timestamp', sep=',', rating_scale=(1, 5), skip_lines=0)
data = Dataset.load_from_file('ml-latest-small/ratings.csv', reader=reader)

When I run those 2 lines, it gives me the error as below:

Traceback (most recent call last):
  File "run2.py", line 9, in <module>
    data = Dataset.load_from_file('ml-latest-small/ratings.csv', reader=reader)
  File "/usr/local/lib/python2.7/site-packages/surprise/dataset.py", line 173, in load_from_file
    return DatasetAutoFolds(ratings_file=file_path, reader=reader)
  File "/usr/local/lib/python2.7/site-packages/surprise/dataset.py", line 306, in __init__
    self.raw_ratings = self.read_ratings(self.ratings_file)
  File "/usr/local/lib/python2.7/site-packages/surprise/dataset.py", line 205, in read_ratings
    itertools.islice(f, self.reader.skip_lines, None)]
  File "/usr/local/lib/python2.7/site-packages/surprise/dataset.py", line 455, in parse_line
    return uid, iid, float(r) + self.offset, timestamp
ValueError: could not convert string to float: rating

Is it possible to load the custom data from a CSV file successfully?
Thank you.
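A likely cause: ml-latest-small's ratings.csv starts with a header line (userId,movieId,rating,timestamp), and with skip_lines=0 the loader tries float('rating'). A self-contained sketch of the failure and the fix — the real fix would simply be Reader(..., skip_lines=1):

```python
import io

csv_text = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
"""

def parse(f, skip_lines):
    # Minimal imitation of the loader: split each line, convert the rating.
    rows = f.read().splitlines()[skip_lines:]
    return [(u, i, float(r), ts)
            for u, i, r, ts in (line.split(',') for line in rows)]

# skip_lines=0 chokes on the header: float('rating') raises ValueError.
try:
    parse(io.StringIO(csv_text), skip_lines=0)
except ValueError as e:
    print(e)

# skip_lines=1 works -- the analogous fix for Reader(..., skip_lines=1).
print(parse(io.StringIO(csv_text), skip_lines=1)[0])
```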

Not able to load the dumped predictions

Hi, I am using this library to predict ratings. I am able to dump the predictions using:
surprise.evaluate(algo, data, with_dump=True, verbose=2). This command dumps my predictions and also prints them. But I want to construct a DataFrame from those predictions. I am using dump.load to load the dumped file, but it's throwing an AttributeError: 'function' object has no attribute 'dump'.
Likewise, even if I try to dump my predictions using dump.dump('./dump_file', predictions, algo), it throws the same error. Can you tell me a way out of this?
I have referred this example : http://nbviewer.jupyter.org/github/NicolasHug/Surprise/blob/master/examples/notebooks/KNNBasic_analysis.ipynb . Everything can get sorted if the "dump and load " thing works.
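A guess at the cause, based on the error message: if dump was imported with from surprise.dump import dump, the name dump is bound to the function itself, so dump.dump(...) raises exactly that AttributeError; importing the module instead (from surprise import dump) should fix it. A stand-alone illustration using a stand-in module, since this snippet doesn't assume the package is installed:

```python
import types

mod = types.ModuleType('dumpmod')          # stands in for surprise.dump
mod.dump = lambda path, *objs: 'dumped'    # stands in for the dump() function

dump = mod.dump          # what `from surprise.dump import dump` binds
try:
    dump.dump('./dump_file')               # function has no .dump attribute
except AttributeError as e:
    print(e)

dump = mod               # what `from surprise import dump` binds
print(dump.dump('./dump_file'))            # works: module.dump(...)
```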

Online Learning

I am making a movie recommendation website. I am using matrix factorization using SVD.
I will have new users signing up and rating movies, and I am not sure how to go about the online learning part of it. Can you give me a suggestion? Does the library have features that make this easier?

Binarizing dataset without rating

Hi,
I wanted to ask about giving Surprise a binarized dataset that doesn't have any ratings. For example, in the ml-100k dataset, a user either watched a movie or not, without any rating.
For this purpose I converted:

196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
298	474	4	884182806

to this:

196	242	1
186	302	1
22	377	1
244	51	1
166	346	1
298	474	1

and then I read the dataset like this:

reader = Reader(line_format='user item rating', sep='\t', rating_scale=(1, 1))

This gave me all zeros in the results, so I know I'm doing it wrong.
Or should I also add the not-rated (and not-watched) movies with a 0 rating and read the dataset with rating_scale=(0, 1)?

196	242	1
196	302	0
196	377	0
196	51	0
196	474	0
.
.
.
reader = Reader(line_format='user item rating', sep='\t', rating_scale=(0, 1))

Can I even do this in Surprise?

Thanks
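A sketch of the second approach (treating every unobserved user/item pair as an explicit 0), with the caveat that this conflates "not watched" with "disliked" — whether that is appropriate depends on the data:

```python
# Watched (user, item) pairs taken from the sample above (truncated).
watched = {(196, 242), (186, 302), (22, 377)}
users = sorted({u for u, _ in watched})
items = sorted({i for _, i in watched})

# Every unobserved pair becomes an explicit 0 "rating", so the (0, 1)
# rating scale actually contains both classes.
rows = [(u, i, 1 if (u, i) in watched else 0) for u in users for i in items]
print(len(rows), sum(r for _, _, r in rows))   # 9 pairs, 3 of them watched
```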

Create DataFrame from database with timestamp

Hi,

I'm using Dataset.load_from_df as you said, but I need to put the timestamp column inside the dictionary. Reading the source code, I only see a comment about the 3 columns (user, item, rating).

Does it support timestamps?

Thanks for any help
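As far as I can tell, load_from_df expects a frame with exactly three columns in the order user, item, rating, so the timestamp column would have to be dropped before handing the frame over. A sketch with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({
    'userID':    [9, 32, 2],
    'itemID':    [1, 1, 1],
    'rating':    [3, 2, 4],
    'timestamp': [1111111111, 1111111112, 1111111113],
})

# Keep only the three columns load_from_df understands:
df3 = df[['userID', 'itemID', 'rating']]
print(list(df3.columns))   # ['userID', 'itemID', 'rating']

# Then (with scikit-surprise installed):
# data = Dataset.load_from_df(df3, Reader(rating_scale=(1, 5)))
```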

[Feature Request] Recommendation strategy to recommend the items with the 10 highest estimation

Hi Nicolas,
I have been using the Surprise package for a few days now. Great job 👍
In your todo list you have an item "Implement some recommendation strategy (like recommend the items with the 10 highest estimation)". Do you have an idea as to when it will be released, approximately? Any info regarding this would be really helpful.

What would be your basic strategy to implement this feature?

I wrote a function for SVD to return a list of top K recommendations. I follow these steps to get the list,

  1. Iterate through all items for a specific user
  2. Run predict function for items that don't have a rating ( Item for a user without rating)
  3. Sort top K and return them as recommendations.

However, the function returns the same or almost the same 10/20 recommendations irrespective of the user.

Is the above strategy wrong? Any thoughts on why I am getting almost similar recommendations?

Thanks :)
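The three steps listed above can be sketched as follows — predict here is a stand-in for algo.predict(uid, iid).est, and the toy scorer is made up:

```python
import heapq

def top_n_for_user(user_items, all_items, predict, n=10):
    """Steps 1-3 above: score every item the user has not rated, keep the n best.

    `predict(iid)` stands in for algo.predict(uid, iid).est."""
    unrated = [iid for iid in all_items if iid not in user_items]   # steps 1-2
    scored = [(predict(iid), iid) for iid in unrated]
    return heapq.nlargest(n, scored)                                # step 3

# Toy scorer (made up): higher item id -> higher estimate.
est = lambda iid: iid / 100.0
print(top_n_for_user(user_items={3, 7}, all_items=range(10), predict=est, n=3))
# -> [(0.09, 9), (0.08, 8), (0.06, 6)]
```

As for near-identical lists across users: that is not necessarily a bug — with default SVD parameters the bias terms can dominate the estimates, so globally popular items rise to the top for almost everyone.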

Can we load data using pandas data frame?

Hi,
In the code example, Input data is taken from a file.

file_path = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/u.data')

reader = Reader(line_format='user item rating timestamp', sep='\t')

data = Dataset.load_from_file(file_path, reader=reader)

Can we load a pandas DataFrame in place of the file input? Please share a code snippet if so.
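At the time of writing, recent versions of Surprise expose Dataset.load_from_df for exactly this. A sketch — the surprise lines are commented out so the snippet runs without the package installed, and the column names are just examples:

```python
import pandas as pd
# from surprise import Dataset, Reader   # needs scikit-surprise installed

ratings = pd.DataFrame({
    'userID': [1, 1, 2, 2, 3],
    'itemID': [10, 20, 10, 30, 20],
    'rating': [4.0, 3.0, 5.0, 2.0, 4.5],
})

# reader = Reader(rating_scale=(1, 5))
# data = Dataset.load_from_df(ratings[['userID', 'itemID', 'rating']], reader)
print(ratings.shape)   # (5, 3)
```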

How do 'RMSE' and 'MAE' help in building a correct recommender system?

Theoretically, I have gone through RMSE and MAE, but I did not understand how they help us build a correct recommender system. Can you explain RMSE and MAE in simple words?
When I use my own dataset, which looks like the MovieLens data, I get the following values:

        Fold 1  Fold 2  Fold 3  Mean
MAE     5.4207  5.4768  5.4711  5.4562
RMSE    6.4610  6.5579  6.6052  6.5414
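In short, MAE and RMSE measure how far the predicted ratings are from the held-out true ratings, so they are only meaningful relative to the rating scale: an MAE of 5.4 on a 1-100 scale is roughly a 5% average error. A minimal worked example of both formulas, with made-up ratings:

```python
import math

true = [4.0, 3.0, 5.0, 2.0]   # held-out ratings (toy values)
pred = [3.5, 3.0, 4.0, 2.5]   # the algorithm's estimates

errors = [t - p for t, p in zip(true, pred)]
mae = sum(abs(e) for e in errors) / len(errors)            # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors)) # penalizes big misses more

print(round(mae, 4), round(rmse, 4))   # 0.5 0.6124
```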

Can we use it with Spark?

Hi,

Can we use it with Spark?

If yes, do you have any example?

1 - I have a database of 76M rows (10.5 MB); I can't run the predictions on my machine (a MacBook with 8 GB and an SSD), so I'm processing on AWS on a machine with 64 GB of RAM. Is that normal?
2 - Does it consume a lot of memory? Could Spark solve this?

Thanks.

Accuracy behavior is strange

Hi,

With my simple test, the accuracy behavior is very strange:

DATASET [user, item, rating]:

100	1	1
100	2	1
100	3	1
100	4	1
100	5	1
200	1	1
200	2	1
200	3	1
200	6	1

PRECISION IN 10x EXECUTIONS:

0.00771723811595
0.0278066509315
0.0170203594873
0.111386411745
0.103972524222
0.0
0.0134623913693
0.0974465112307
0.0574640800362
0.0122466495424

Why does the precision vary between 0 and roughly 11% on every execution?

Problem between decimal and float value

Hi,

I computed the ratings for my dataset, and I'm selecting everything from the database table to make the predictions.

My ratings are between 0.0 and 10.0, but after some processing time I get this error:

Traceback (most recent call last):
  File "process.py", line 112, in <module>
    predictions = algo.test(testset, verbose=False)
  File "/usr/local/lib/python2.7/dist-packages/surprise/prediction_algorithms/algo_base.py", line 152, in test
    for (uid, iid, r_ui_trans) in testset]
  File "/usr/local/lib/python2.7/dist-packages/surprise/prediction_algorithms/algo_base.py", line 99, in predict
    est = self.estimate(iuid, iiid)
  File "surprise/prediction_algorithms/matrix_factorization.pyx", line 243, in surprise.prediction_algorithms.matrix_factorization.SVD.estimate (surprise/prediction_algorithms/matrix_factorization.c:4637)
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
Makefile:60: recipe for target 'importer-user-prod' failed
make: *** [importer-user-prod] Error 1

gcc failed when I run basic_usage.py

  File "D:\python\lib\distutils\command\build_ext.py", line 499, in build_extension
    depends=ext.depends)
  File "D:\python\lib\distutils\ccompiler.py", line 574, in compile
    self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
  File "D:\python\lib\distutils\cygwinccompiler.py", line 166, in _compile
    raise CompileError, msg
ImportError: Building module surprise.similarities failed: ["CompileError: command 'gcc' failed: No such file or directory\n"]

[Feature] GridSearchCV for Surprise algorithms

Add the ability for a user to easily test an algorithm with different parameters similar to GridSearchCV of sklearn.

One can give an algorithm a dictionary of the different parameters to try, and it generates the best combination of parameters based on some error measurement. Since the data model of Surprise does not directly plug into GridSearchCV, an analogous functionality should be added.
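Until such a feature exists, the behaviour can be approximated by hand. A sketch, where evaluate_params is a hypothetical stand-in for running cross-validation with a given parameter combination and returning its RMSE:

```python
from itertools import product

# Candidate values per parameter, sklearn-GridSearchCV style.
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005]}

def evaluate_params(params):
    # Stand-in for "run cross-validation with these params, return RMSE".
    # A real version would construct the algorithm and evaluate it.
    return params['lr_all'] * 100 - params['n_epochs'] * 0.1

keys = sorted(param_grid)
combos = [dict(zip(keys, vals))
          for vals in product(*(param_grid[k] for k in keys))]

best = min(combos, key=evaluate_params)   # lowest "RMSE" wins
print(best)
```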

Recommendation for a specific user

How do I get recommendations for a single user, i.e. without calculating predictions for all users?
Also, is it possible to get the data from an RDBMS?

"Surprise" module not recognized in Jupyter notebook

Dear Nicolas,

I'm a newbie to recommender systems and found your package to kickstart with basic examples.

[screenshot of the error]

Your kind help in resolving the issue shown in the snapshot would be appreciated.

Thanks.
Bala

SVD variants - dump user/item features

Is there an easy way to dump the user (pu) and item (qi) features after training an SVD variant, e.g.:

from surprise import dump
algo_svd = SVD()
...
dump.dump('./dump_SVD_features', get_features_somehow, algo_svd)
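If I remember the implementation correctly, a fitted SVD object exposes its factors as plain numpy arrays (algo.pu, algo.qi, plus the biases bu and bi), so they can be saved directly without going through surprise.dump. A sketch with a stub standing in for the fitted algorithm:

```python
import numpy as np

# Stub standing in for a fitted SVD object: after algo_svd.fit(trainset),
# pu and qi are numpy arrays of shape (n_users, n_factors) and
# (n_items, n_factors).
class AlgoStub:
    pu = np.random.rand(4, 2)   # user factors
    qi = np.random.rand(3, 2)   # item factors

algo_svd = AlgoStub()
np.savez('svd_features.npz', pu=algo_svd.pu, qi=algo_svd.qi)

loaded = np.load('svd_features.npz')
print(loaded['pu'].shape, loaded['qi'].shape)   # (4, 2) (3, 2)
```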

How to make top N recommendations

It is still not clear to me how to make top N recommendations using surprise.

I ran the simple sample code described below to create a SVD-based recommender and now I would like to make top N recommendations to a user.

I am using a rating scale in the interval [0, 1].

import os
from os.path import dirname
import pandas as pd  
from surprise import SVD
from surprise import Dataset, Reader
from surprise import evaluate, print_perf
from surprise import GridSearch
from surprise.dump import dump
from surprise import KNNBasic


# Entrar endereço da tabela de ratings.
file_path = dirname(os.path.abspath(os.getcwd())) + \
    "/Data/Compras_normalizado_media_surprise.csv"
print(file_path)

# Preparar e configurar o reader to scikit-surprise.
reader = Reader(line_format='user item rating', sep=';')


# Ler os dados na forma cliente, produto, rating.
data = Dataset.load_from_file(file_path, reader=reader)
data.split(n_folds=5)

# Apresentar as primeiras linhas da tabela.
data.raw_ratings[0:10]

# Preparar o algoritmo de recomendação, por exemplo, SVD.
algo = SVD()


# Avaliar a performance do algoritmo de recomendação.
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
print_perf(perf)

raise ValueError('line_format parameter is incorrect.')

When I try to change the names in line_format I get the following error. Can't I change the parameter names?

  File "Reco.py", line 114, in <module>
    reader = Reader(line_format='user itemd ratinge', rating_scale=(0, 5))
  File "/usr/local/lib/python2.7/dist-packages/surprise/dataset.py", line 423, in __init__
    raise ValueError('line_format parameter is incorrect.')
ValueError: line_format parameter is incorrect.

k-NN-based algorithms use too much memory

I tried using this library to make k-NN-based recommendations for a dataset with about 10,000 items and 150,000 users, but when I do, my OS always kills the process running surprise for using too much memory. SVD and SVD++ work just fine, though.

Have you tested the knns module with that much data, and do you have any thoughts about what causes it in particular to fail? I glanced through the code and on the off-chance the issue is the call to sorted, it would be relatively simple to avoid sorting the entire set of neighbors, and instead just scan through them once, storing a running list of the k most similar.
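The single-pass top-k scan suggested above is easy to sketch with heapq.nlargest (O(n log k) time, O(k) extra memory, versus sorting all neighbors):

```python
import heapq
import random

# Fake similarity list for one target user: (similarity, neighbor id) pairs.
random.seed(0)
sims = [(random.random(), uid) for uid in range(100_000)]

k = 40
top_k = heapq.nlargest(k, sims)   # keeps only a running list of the k best

# Same result as a full sort, without materializing one:
assert top_k == sorted(sims, reverse=True)[:k]
print(len(top_k))   # 40
```

That said, the dominant cost in the k-NN algorithms is likely the dense similarity matrix itself rather than the sort: user-based similarities over 150,000 users mean a 150,000 × 150,000 matrix, far beyond 8 GB of RAM, whereas item-based similarities over 10,000 items (10,000² entries) should fit comfortably.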

How to calculate a confusion matrix, precision and recall

Hello,

This library is amazing, so I'm really interested in it. I'm now trying to build a recommender system with collaborative filtering, using some of the algorithms in the library to build a model. I need to measure the performance of the algorithm I used. Is it possible to get a confusion matrix to calculate precision and recall?

Thank you,
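One common sketch (not a Surprise API): pick a relevance threshold, turn each (true rating, estimated rating) pair into a binary relevant/recommended decision, and count true/false positives and negatives. The toy ratings below are made up:

```python
predictions = [  # (true rating, estimated rating) pairs, toy values
    (5.0, 4.6), (2.0, 4.1), (4.0, 3.2), (1.0, 1.5), (4.5, 4.9),
]
threshold = 3.5  # rating >= threshold counts as "relevant"/"recommended"

tp = sum(1 for r, est in predictions if r >= threshold and est >= threshold)
fp = sum(1 for r, est in predictions if r < threshold and est >= threshold)
fn = sum(1 for r, est in predictions if r >= threshold and est < threshold)
tn = len(predictions) - tp - fp - fn

precision = tp / (tp + fp)   # of what we recommended, how much was relevant
recall = tp / (tp + fn)      # of what was relevant, how much we recommended
print(precision, recall)
```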

Serializing recommenders

It is not clear to me whether it is possible to serialize a recommender for later use with the surprise.dump.dump function. If it is, I would change the documentation to be more explicit about this.
