python-recsys's Introduction

python-recsys

A python library for implementing a recommender system.

Installation

Dependencies

python-recsys is built on top of Divisi2, with csc-pysparse (Divisi2 also requires NumPy and uses NetworkX).

python-recsys also requires SciPy.

To install the dependencies do something like this (Ubuntu):

sudo apt-get install python-scipy python-numpy
sudo apt-get install python-pip
sudo pip install csc-pysparse networkx divisi2

# If you don't have pip installed then do:
# sudo easy_install csc-pysparse
# sudo easy_install networkx
# sudo easy_install divisi2

Download

Download python-recsys from GitHub.

Install

tar xvfz python-recsys.tar.gz
cd python-recsys
sudo python setup.py install

Example

  1. Load the MovieLens dataset:
from recsys.algorithm.factorize import SVD
svd = SVD()
svd.load_data(filename='./data/movielens/ratings.dat', 
            sep='::', 
            format={'col':0, 'row':1, 'value':2, 'ids': int})
  2. Compute the Singular Value Decomposition (SVD), M = U Sigma V^t:
k = 100
svd.compute(k=k, 
            min_values=10, 
            pre_normalize=None, 
            mean_center=True, 
            post_normalize=True, 
            savefile='/tmp/movielens')
  3. Get the similarity between two movies:
ITEMID1 = 1    # Toy Story (1995)
ITEMID2 = 2355 # A Bug's Life (1998)

svd.similarity(ITEMID1, ITEMID2)
# 0.67706936677315799
  4. Get movies similar to Toy Story:
svd.similar(ITEMID1)

# Returns: <ITEMID, Cosine Similarity Value>
[(1,    0.99999999999999978), # Toy Story
 (3114, 0.87060391051018071), # Toy Story 2
 (2355, 0.67706936677315799), # A Bug's Life
 (588,  0.5807351496754426),  # Aladdin
 (595,  0.46031829709743477), # Beauty and the Beast
 (1907, 0.44589398718134365), # Mulan
 (364,  0.42908159895574161), # The Lion King
 (2081, 0.42566581277820803), # The Little Mermaid
 (3396, 0.42474056361935913), # The Muppet Movie
 (2761, 0.40439361857585354)] # The Iron Giant
  5. Predict the rating a user (USERID) would give to a movie (ITEMID):
MIN_RATING = 0.0
MAX_RATING = 5.0
ITEMID = 1
USERID = 1

svd.predict(ITEMID, USERID, MIN_RATING, MAX_RATING)
# Predicted value 5.0 

svd.get_matrix().value(ITEMID, USERID)
# Real value 5.0 
  6. Recommend (non-rated) movies to a user:
svd.recommend(USERID, is_row=False) # cols are users and rows are items, thus we set is_row=False

# Returns: <ITEMID, Predicted Rating>
[(2905, 5.2133848204673416), # Shaggy D.A., The
 (318,  5.2052108435956033), # Shawshank Redemption, The
 (2019, 5.1037438278755474), # Seven Samurai (The Magnificent Seven)
 (1178, 5.0962756861447023), # Paths of Glory (1957)
 (904,  5.0771405690055724), # Rear Window (1954)
 (1250, 5.0744156653222436), # Bridge on the River Kwai, The
 (858,  5.0650911066862907), # Godfather, The
 (922,  5.0605327279819408), # Sunset Blvd.
 (1198, 5.0554543765500419), # Raiders of the Lost Ark
 (1148, 5.0548789542105332)] # Wrong Trousers, The
  7. Which users should see Toy Story? (i.e., which users who have not rated Toy Story would give it a high rating?)
svd.recommend(ITEMID)

# Returns: <USERID, Predicted Rating>
[(283,  5.716264440514446),
 (3604, 5.6471765418323141),
 (5056, 5.6218800339214496),
 (446,  5.5707524860615738),
 (3902, 5.5494529168484652),
 (4634, 5.51643364021289),
 (3324, 5.5138903299082802),
 (4801, 5.4947999354188548),
 (1131, 5.4941438045650068),
 (2339, 5.4916048051511659)]
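
The scores returned in steps 3 and 4 are labelled as cosine similarity values. As a rough illustration of that measure only (not the library's implementation, which works on the factorized matrix), a minimal pure-Python sketch could look like this. The function name cosine_similarity is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The measure is undefined for an all-zero vector.
        return float("nan")
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction score 1.0, orthogonal vectors score 0.0, and an all-zero vector has no defined similarity, which this sketch reports as nan.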

Documentation

Documentation and examples available here.

To create the HTML documentation files from doc/source do:

cd doc
make html

HTML files are created here:

doc/build/html/index.html

python-recsys's People

Contributors

fcurella, ocelma, robottwo

python-recsys's Issues

IndexError: Error creating second index list

Hi,

I have this code:
user=1
stars=nn.get("rating")
movie_id=nn.get("movieId")
id_user=nn.get("userId")
tupla=(stars,movie_id,id_user) # e.g. tupla=(4.0, 1193, 2)
data.add_tuple(tupla)

and I get the following error when I call:
recomendaciones= svd.recommend(user, n=10, only_unknowns=True, is_row=False)

ERROR

File "/usr/local/lib/python2.7/dist-packages/pysparse/sparse/pysparseMatrix.py", line 149, in __getitem__
m = self.matrix[index]
IndexError: Error creating second index list

Multiple Values

How do I add multiple values instead of just ratings in the module?

RuntimeWarning: invalid value encountered in divide (special characters in user_id)

Hi, I need to build a recommender with id_user="X3escB3aJ_rP1u5DaTN9cw", but it fails, apparently because of the hyphen character.
This is the output for id_user="X3escB3aJ_rP1u5DaTN9cw":

"X3escB3aJ_rP1u5DaTN9cw
Creating matrix (426351 tuples)
Matrix density is: 0.0237%
Updating matrix: squish to at least 10 values
Computing svd k=10, min_values=10, pre_normalize=None, mean_center=True, post_normalize=True
[WARNING] mean_center is True. svd.similar(...) might return nan's. If so, then do svd.compute(..., mean_center=False)
/root/anaconda2/lib/python2.7/site-packages/divisi2/dense.py:269: RuntimeWarning: invalid value encountered in divide
return self / norms
Traceback (most recent call last):
File "ejemplo.py", line 47, in <module>
recomendaciones= svd.recommend(id_user, n=10,is_row=False)
File "build/bdist.linux-x86_64/egg/recsys/algorithm/factorize.py", line 352, in recommend
File "build/bdist.linux-x86_64/egg/recsys/algorithm/factorize.py", line 300, in _get_col_reconstructed
File "/root/anaconda2/lib/python2.7/site-packages/divisi2/labels.py", line 65, in col_named
return self[:,self.col_index(label)]
File "/root/anaconda2/lib/python2.7/site-packages/divisi2/labels.py", line 57, in col_index
return self.col_labels.index(label)
KeyError: 'X3escB3aJ_rP1u5DaTN9cw'"

Thanks

Replace csc-pysparse with SciPy

While trying to install csc-pysparse, I got this error:

pysparse/sparse/src/spmatrixmodule.c:1:20: fatal error: Python.h: No such file or directory

It seems that csc-pysparse cannot be installed on newer versions of Python (Python 3); check here.

Could you please port python-recsys to SciPy as an alternative, or fix this problem?

AveragePrecision in recsys.evaluation.ranking

Hi Oscar,
The calculation of AveragePrecision in recsys.evaluation.ranking is not correct. The returned value should be sum(p_at_k)/number of relevant items, rather than sum(p_at_k)/hits.

The corresponding part of the documentation (Evaluation section) also needs to be changed.

from recsys.evaluation.ranking import AveragePrecision

ap = AveragePrecision()

GT = [1,2,3,4,5]
q = [1,3,5]
ap.load(GT, q)
ap.compute() # returns 1.0, should return 0.6

GT = [1,2,3,4,5]
q = [99,3,5]
ap.load(GT, q)
ap.compute() # returns 0.5833335, should return 0.23333

Kind Regards,
Siqi
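
To make the difference between the two normalizations concrete, here is a minimal standalone sketch. It is not the code in recsys.evaluation.ranking; the function average_precision and its normalize_by parameter are hypothetical names introduced only for illustration.

```python
def average_precision(ground_truth, ranked, normalize_by="relevant"):
    """Average precision of a ranked list against a set of relevant items.

    normalize_by="relevant" divides by the number of relevant items (the
    definition proposed in this issue); normalize_by="hits" divides by the
    number of hits (the behaviour reported in this issue).
    """
    relevant = set(ground_truth)
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / float(k)  # precision at rank k
    denominator = len(relevant) if normalize_by == "relevant" else hits
    if denominator == 0:
        return 0.0
    return precision_sum / denominator
```

With GT = [1,2,3,4,5] and q = [99,3,5], the hits are at ranks 2 and 3, so the precision sum is 1/2 + 2/3. Dividing by the 5 relevant items gives about 0.2333, while dividing by the 2 hits gives about 0.5833, matching the two values quoted above.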

Can't install csc

I installed the csc module with pip, and I can import recsys after that.
But every time csc is imported, this error shows up:
ImportError: No module named csc

ValueError: No data set, Matrix is empty!

I have tried several datasets, but when I call svd.similarity() or svd.recommend(), the console output is:

Traceback (most recent call last):
File "recsys_data.py", line 20, in <module>
svd.compute(k=k, min_values=5, pre_normalize=None, mean_center=True, post_normalize=True,savefile=None)
File "/usr/local/lib/python2.7/dist-packages/python_recsys-0.2-py2.7.egg/recsys/algorithm/factorize.py", line 244, in compute
super(SVD, self).compute(min_values)
File "/usr/local/lib/python2.7/dist-packages/python_recsys-0.2-py2.7.egg/recsys/algorithm/baseclass.py", line 126, in compute
raise ValueError('No data set. Matrix is empty!')
ValueError: No data set. Matrix is empty!

I would like to know why.

SVD.predict MIN_VALUE parameter does not work if equal to 0.0

A piece of code:

if MIN_VALUE:
    predicted_value = max(predicted_value, MIN_VALUE)
if MAX_VALUE:
    predicted_value = min(predicted_value, MAX_VALUE)

0.0 evaluates to False (e.g. print "True" if 0.0 else "False" prints "False"), so you can still get results below 0 even if you specify MIN_VALUE=0.0.
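
One possible way around this, sketched as a standalone function rather than a patch to SVD.predict, is to compare against None instead of relying on truthiness. The name clamp_prediction and its lowercase parameters are hypothetical.

```python
def clamp_prediction(predicted_value, min_value=None, max_value=None):
    """Clamp a predicted rating to [min_value, max_value].

    A bound is skipped only when it is None, so 0.0 is a valid bound.
    """
    if min_value is not None:
        predicted_value = max(predicted_value, min_value)
    if max_value is not None:
        predicted_value = min(predicted_value, max_value)
    return predicted_value
```

With this check, a raw prediction of -0.3 with min_value=0.0 is raised to 0.0, while leaving both bounds as None returns the raw prediction unchanged.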

factorize.py shows an indentation error on every line after line 15, even though there is none

After doing step 1 (loading the dataset), I get errors at line 15 and onwards of factorize.py (I am new to Python).
Is there something I am doing wrong? This is what I did:
1) installed Python first
2) then installed pip and the dependencies
3) then placed the MovieLens dataset in the expected location
4) installed python-recsys-master

Now when I run the first step I get this error, even though I have installed divisi2:

Traceback (most recent call last):
File "load.py", line 1, in <module>
from recsys.algorithm.factorize import SVD
File "/home/abhimanyu/Desktop/python-recsys-master/recsys/algorithm/factorize.py", line 15, in <module>
from csc import divisi2
ImportError: No module named csc

SVD.compute() kernel fail on Windows

I used the MovieLens SVD example in my class tonight to show the students a recommender system.

People (including me) who were using a Mac went through the tutorial without issues.

For everyone on Windows (7 and 8.1), the kernel fails on the "svd.compute" step.

I believe everyone is using the 2.7+ Anaconda version of Python.

Thoughts?

Getting error while loading the data.

I want to try out python-recsys, but it gives me errors while loading the movie ratings file.

>>> svd.load_data(filename='./movierates.dat', sep='::', format={'col':0,'row':1,'value':2,'ids': int})

Error (ID is not int) while reading: [u'userId', u'movieId', u'rating', u'timestamp']
Error while reading: [u'']
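
The two messages look as if they could come from a header row and an empty line in the file, although that is an assumption based only on the output shown. Under that assumption, a small pre-processing sketch that drops such lines before loading might look like this. The helper clean_ratings_lines is hypothetical and uses only the standard library.

```python
def clean_ratings_lines(lines, sep="::"):
    """Yield only the lines whose first two fields parse as integer IDs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing empty line
        fields = line.split(sep)
        try:
            int(fields[0])
            int(fields[1])
        except (ValueError, IndexError):
            continue  # skip header rows such as userId::movieId::rating::timestamp
        yield line
```

The surviving lines could then be written to a new file and that file passed to svd.load_data with the same sep and format arguments as above.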

GPL License

Is there a specific reason why the code is under the GPL, or are you open to changing the license to something more permissive, like BSD or MIT?

Storing results for all dataset in json file

From the code, we can get the movies similar to a particular movie. I want to store these similar-movie results in separate JSON files, for example one JSON file per movie listing the movies similar to it, and I want to do the same for the rest of the recommendations.
I have a way to do it with a CSV dataset, but I am unable to do it with the .dat format.

Can someone please help!
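
As a rough sketch of the JSON part only: the README example shows svd.similar returning a list of (item ID, similarity) pairs, and a list of that shape can be serialized with the standard json module regardless of the input file format. The function similar_to_json and the key names below are hypothetical.

```python
import json


def similar_to_json(item_id, similar_items):
    """Serialize a list of (item_id, score) pairs as a JSON document."""
    payload = {
        "item_id": item_id,
        "similar": [
            {"item_id": other_id, "score": float(score)}
            for other_id, score in similar_items
        ],
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

The returned string could be written to one file per movie, for example a file named after the item ID. The same approach should work for the (ID, predicted rating) pairs shown for svd.recommend.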

Python 3 support

Hey,

Are there any plans to move towards Python 3? The dependencies are obsolete (divisi2 can easily be replaced by SciPy, and csc-pysparse as well).

Thanks

Can we load data using pandas dataframe?

Hi,
In the code example, the input data is read from a file.

svd.load_data(filename='./data/movielens/ratings.dat',
            sep='::',
            format={'col':0, 'row':1, 'value':2, 'ids': int}) 

Can we load a pandas DataFrame in place of the file input? Please share a code snippet if so.
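
One approach that might work, sketched here without calling python-recsys itself, is to turn each record into a (value, row, col) tuple, the order used in the add_tuple examples elsewhere on this page. The helper records_to_tuples is hypothetical, and the default column names are taken from the header seen in another issue (userId, movieId, rating).

```python
def records_to_tuples(records, value_key="rating", row_key="movieId", col_key="userId"):
    """Convert dict-like records into (value, row_id, col_id) tuples."""
    return [
        (float(record[value_key]), record[row_key], record[col_key])
        for record in records
    ]


# With pandas, df.to_dict("records") produces a list of such dicts.
# Each resulting tuple could then be passed to Data.add_tuple(...) and the
# Data object to svd.set_data(...), as those calls are used elsewhere on
# this page; that wiring is not exercised in this sketch.
```

Items are rows and users are columns here, which matches the README's use of is_row=False when recommending for a user.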

How to increase the number of similarity/recommending item results

Hi, I am currently testing the recsys algorithm only for finding similar users. Each result contains 10 users with similarity scores (actually 9 similar users plus the target user itself). I want to increase the number of results from 10 to 50, but I couldn't find a parameter for this in the documentation. Could you please point me in the right direction?

No data set. Matrix is empty!

Following the Algorithms section of the python-recsys v1.0 documentation, I put the MovieLens 1M ratings.dat file in /usr/local/python-recsys-master/recsys/data/movielens/ and then loaded the data:

from recsys.algorithm.factorize import SVD
filename = '/usr/local/python-recsys-master/recsys/data/movielens/ratings.dat'
svd = SVD()
svd.load_data(filename=filename, sep='::', format={'col':0, 'row':1, 'value':2, 'ids':int})

from recsys.datamodel.data import Data
from recsys.algorithm.factorize import SVD
filename = '/usr/local/python-recsys-master/recsys/data/movielens/ratings.dat'
data = Data()
format = {'col':0, 'row':1, 'value':2, 'ids': int}
data.load(filename, sep='::', format=format)
train, test = data.split_train_test(percent=80)
svd = SVD()
svd.set_data(train)

from recsys.utils.svdlibc import SVDLIBC
svdlibc = SVDLIBC('/usr/local/python-recsys-master/recsys/data/movielens/ratings.dat')
svdlibc.to_sparse_matrix(sep='::', format={'col':0, 'row':1, 'value':2, 'ids': int})
svdlibc.compute(k=100)
svd = svdlibc.export()

But when I call compute, I get ValueError: No data set. Matrix is empty! I would like to know why.

K=100
svd.compute(k=K, min_values=10, pre_normalize=None, mean_center=True, post_normalize=True, 
savefile=None)
Traceback (most recent call last):
File "", line 3, in 
File "recsys/algorithm/factorize.py", line 244, in compute
super(SVD, self).compute(min_values)
File "recsys/algorithm/baseclass.py", line 126, in compute
raise ValueError('No data set. Matrix is empty!')
ValueError: No data set. Matrix is empty!

How to score just one new user at a time

I need to be able to score one new user without having to recalculate the model. For example, we'll be loading a few million historical records of customers and what they bought. We can do this offline and are not too worried about run-time.

But then, each time a new user comes into our system, we need to score them and make recommendations without rebuilding the model. How do we do that?

Just as a mock-up, it would be something like this:

 svd.load_data(filename=sFileSource, sep=',', format=dictFormat)
 k = 5 # Number of clusters
 svd.compute(
     k=k, min_values=3, pre_normalize=None, 
     mean_center=False,
     post_normalize=True, savefile=sFileTarget
 )

Where we can do this overnight and are not worried if it takes a while. But then, when a new, never-seen user comes in, we can put in the products they bought and get a recommendation. Sort of like this code, which does not do what I'm describing, but emulates it:

svd.add_tuple((1.0, 1, 88))
svd.add_tuple((1.0, 2, 88))
svd.add_tuple((1.0, 61, 88))
svd.recommend(88, n=3, is_row=False, only_unknowns=True)

While we don't worry if the model build takes a while, we do need the recommendation for a new user to happen in about one second.

How can we go about doing that?

`recsys` package ownership on PyPI

Hi @ocelma ,

I currently "own" (but do not use) the recsys package name on PyPI. recsys was the name I wanted to use when I was developing Surprise, but I later realized that it would conflict with your own python-recsys package, whose import-time name is also recsys. (so I had to come up with a more far-fetched acronym! 😅).

@jiwidi recently reached out to me, asking if I'd be open to transferring the PyPI recsys name to him. I don't use the namespace myself so I'm happy to transfer; that being said, @jiwidi and I both agree that it's best to reach out to you first.

Is it fine with you for @jiwidi to own and use the recsys name on PyPI? (that means users will install and use the project with pip install recsys / import recsys). We also reached out to you via email a few weeks ago, so feel free to follow up there if you prefer.

Thank you!

How to?

Does anybody know how to cluster one set?
I have a bunch of products, I want to rank products to clusters of similar users.

I am collecting user preference data for products (rank 1-5).

How do I achieve this with python-recsys?

ImportError: No module named algorithm

When I run the MovieLens example I get the error ImportError: No module named algorithm. I have installed all the packages except divisi2, which keeps failing to install. My Python version is 2.7. Is there a way to make it work, or to install divisi2 some other way than with pip?

Problem in load() method

If you do this:

rmse = RMSE()
rmse.load(x, y)

with x and y being numpy uni-dimensional arrays, you get this:

in compute(self)
98 Computes the evaluation using the loaded ground truth and test lists
99 """
--> 100 if not self._ground_truth:
101 raise ValueError('Ground Truth dataset is empty!')
102 if not self._test:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I fixed it this way:

    if self._ground_truth is None:
        raise ValueError('Ground Truth dataset is empty!')
    if self._test is None:
        raise ValueError('Test dataset is empty!')
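
Note that checking only for None no longer rejects an empty list. A possible variant that keeps the emptiness check and also avoids the ambiguous truth value is to test the length explicitly. This is a sketch, not the library's code, and the helper require_non_empty is a hypothetical name.

```python
def require_non_empty(values, name):
    """Raise ValueError if values is None or has no elements.

    len() works for lists, tuples and one-dimensional NumPy arrays alike,
    so no truth-value test on an array is needed.
    """
    if values is None or len(values) == 0:
        raise ValueError("%s dataset is empty!" % name)
    return values
```

Inside compute, the two checks would then become calls such as require_non_empty(self._ground_truth, 'Ground Truth') and require_non_empty(self._test, 'Test').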

KeyError in svd.recommend()

I have a .dat file with 3705912 lines and a 602x6156 matrix. When I call svd.recommend for identifiers that have few values, I get a KeyError.

svd.get_matrix().get_col(50652)

SparseVector (4 of 602 entries): [6840789=14, 6843925=100, 6843926=100, 6843927=16]


svd.recommend(50652, is_row=False)

Traceback (most recent call last):
  File "svd.py", line 251, in <module>
    print svd.recommend(50652, is_row=False)
  File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/recsys/algorithm/factorize.py", line 352, in recommend
    item = self._get_col_reconstructed(i, zeros)
  File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/recsys/algorithm/factorize.py", line 300, in _get_col_reconstructed
    return self._matrix_reconstructed.col_named(j)
  File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/divisi2/labels.py", line 65, in col_named
    return self[:,self.col_index(label)]
  File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/divisi2/labels.py", line 57, in col_index
    return self.col_labels.index(label)
KeyError: 50652

But for identifiers that have many values, it works:

svd.get_matrix().get_col(10536)

SparseVector (22 of 602 entries): [6840778=96, 6840779=100, 6840780=100, 6840781=100, 6840782=100, 6840783=100, 6840784=100, 6840785=100, 6840786=65, 6840818=83, 6840819=100, 6840820=100, 6840821=100, 6840822=100, 6840823=100, 6840824=100, 6840825=100, 6840826=100, 6840827=100, 6840828=21, ...]


svd.recommend(10536, is_row=False)

[(6900161, 100.00000000232269), (6840819, 100.00000000214945), (6840822, 100.00000000186564), (6840821, 100.00000000178625), (6840820, 100.0000000016603), (6840783, 100.00000000144556), (6840827, 100.00000000137024), (6840826, 100.00000000134551), (6840784, 100.00000000123218), (6840825, 100.00000000112296)]

Working with csv

Since most data files are in CSV format, can we use CSV files for the analysis?
Moreover, I found the MovieLens dataset files in CSV format.
Thank you so much.

How to install on Python 3?

When I install on Python 3, this error occurs: "missing parentheses in call to 'print'".
Please tell me what to do. Thank you!

Loaded SVD model with only_unknowns=True, so need create_matrix(), but it finds 0 tuples

When I run this code:

svd.recommend(x, n=3, is_row=False, only_unknowns=True)

I get this error:

ValueError: Matrix is empty! If you loaded an SVD model you can't use only_unknowns=True, unless svd.create_matrix() is called

And it's quite right: I'm loading the model with this code:

svd = SVD(filename=sFileTarget) # Loading already computed SVD model

Which I had previously generated with this code:

svd = SVD()
svd.load_data(filename=sFileSource, sep=',', format=dictFormat)
k = 5 # Number of clusters
svd.compute(
    k=k, min_values=3, pre_normalize=None, 
    mean_center=False,
    post_normalize=True, savefile=sFileTarget
)

But when I use create_matrix(), I get this:

>>> svd = SVD(filename=sFileTarget) # Loading already computed SVD model
>>> svd.create_matrix()
Creating matrix (0 tuples)
Matrix density is: None%

And nothing works from there, of course.

What might the solution be?

Thanks!
