statisticianinstilettos / recmetrics

A library of metrics for evaluating recommender systems

License: MIT License

Jupyter Notebook 98.56% Python 1.39% Makefile 0.03% Dockerfile 0.02%

recmetrics's Introduction

recmetrics

A Python library of evaluation metrics and diagnostic tools for recommender systems.

**This library is actively maintained. My goal is to continue to develop this as the main source of recommender metrics in Python. Please submit issues, bug reports, feature requests or contribute directly through a pull request. If I do not respond, you can ping me directly at [email protected]**

Description      Command
Installation     pip install recmetrics
Notebook Demo    make run_demo
Test             make test

Full documentation coming soon. In the interim, the Python notebook in this repo, example.ipynb, contains examples of these plots and metrics in action using the MovieLens 20M Dataset. You can also view my Medium Article.

This library is an open source project. The goal is to create a go-to source for metrics related to recommender systems. I have begun by adding metrics and plots I found useful during my career as a Data Scientist at a retail company, and encourage the community to contribute. If you would like to see a new metric in this package, or find a bug, or have suggestions for improvement, please contribute!

Long Tail Plot

recmetrics.long_tail_plot()

The Long Tail plot is used to explore popularity patterns in user-item interaction data. Typically, a small number of items will make up most of the volume of interactions, and this is referred to as the "head". The "long tail" typically consists of most products, but makes up a small percent of interaction volume.

Long Tail Plot

The items in the "long tail" usually do not have enough interactions to accurately be recommended using user-based recommender systems like collaborative filtering due to inherent popularity bias in these models and data sparsity. Many recommender systems require a certain level of sparsity to train. A good recommender must balance sparsity requirements with popularity bias.
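To make the head/tail split concrete, here is a minimal sketch that does not use recmetrics.long_tail_plot(). The function name head_items, the 50% volume cutoff, and the toy interaction log are assumptions chosen for illustration only.

```python
from collections import Counter


def head_items(interactions, volume_share=0.5):
    """Return the most popular items that together cover `volume_share` of interactions."""
    counts = Counter(interactions)
    total = sum(counts.values())
    head, covered = [], 0
    # Walk items from most to least popular until the cutoff is reached.
    for item, count in counts.most_common():
        if covered >= volume_share * total:
            break
        head.append(item)
        covered += count
    return head


# Hypothetical interaction log: one entry per user-item interaction.
log = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
print(head_items(log))  # ['a']: one of four items covers half of the volume
```

Everything not returned by head_items belongs to the long tail under this cutoff.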

Mar@K and Map@K

recmetrics.mark()

recmetrics.mark_plot()

recmetrics.mapk_plot()

Mean Average Recall at K (Mar@k) measures the recall over the top k recommendations. Mar@k considers the order of recommendations and gives less credit to correct recommendations that appear lower in the list. Map@k and Mar@k are ideal for evaluating an ordered list of recommendations.

Mar@k

Map@k and Mar@k metrics suffer from popularity bias. If a model works well on popular items, the majority of recommendations will be correct, and Mar@k and Map@k can appear to be high while the model may not be making useful or personalized recommendations.
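The order sensitivity described above can be sketched as follows. This is one common formulation of average recall at k, written for illustration; the function names are hypothetical and it is not guaranteed to match what recmetrics.mark() computes internally.

```python
def average_recall_at_k(actual, predicted, k=10):
    """Rank-weighted recall for one user: earlier hits contribute more."""
    if not actual:
        return 0.0
    relevant = set(actual)
    seen = set()
    hits = 0
    score = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        # Count each relevant item once, weighted by how early it appears.
        if item in relevant and item not in seen:
            seen.add(item)
            hits += 1
            score += hits / rank
    return score / len(relevant)


def mean_average_recall_at_k(actual_lists, predicted_lists, k=10):
    """Average the per-user scores over all users."""
    scores = [
        average_recall_at_k(a, p, k) for a, p in zip(actual_lists, predicted_lists)
    ]
    return sum(scores) / len(scores)


# The same hit scores lower when it is ranked second instead of first.
print(average_recall_at_k(["A"], ["A", "Z"]))  # 1.0
print(average_recall_at_k(["A"], ["Z", "A"]))  # 0.5
```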

Coverage

recmetrics.prediction_coverage()

recmetrics.catalog_coverage()

recmetrics.coverage_plot()

Coverage is the percent of items that the recommender is able to recommend. It is given by this formula:

$$\text{coverage} = \frac{I}{N} \times 100$$

Where 'I' is the number of unique items the model recommends in the test data, and 'N' is the total number of unique items in the training data. The catalog coverage is the rate of distinct items recommended to the user over a period of time. For this purpose, the catalog coverage function also takes a parameter 'k', the number of observed recommendation lists. In essence, both metrics quantify the proportion of items that the system is able to work with.

Coverage Plot
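The coverage definition above (unique recommended items over unique catalog items, as a percent) can be sketched in a few lines. The function name prediction_coverage_sketch and the sample data are assumptions; this is not the library's own implementation.

```python
def prediction_coverage_sketch(predicted, catalog):
    """Percent of catalog items that appear in at least one recommendation list."""
    recommended = {item for user_list in predicted for item in user_list}
    return round(len(recommended) / len(set(catalog)) * 100, 2)


# Two users, four catalog items; item "w" is never recommended.
print(prediction_coverage_sketch([["x", "y"], ["x", "z"]], ["w", "x", "y", "z"]))  # 75.0
```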

Novelty

recmetrics.novelty()

Novelty measures the capacity of a recommender system to propose novel and unexpected items which a user is unlikely to know about already. It uses the self-information of the recommended item and it calculates the mean self-information per top-N recommended list and averages them over all users.

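$$\text{novelty} = \frac{1}{|U|} \sum_{u \in U} \sum_{i \in \text{top-}N(u)} \frac{-\log_2\left(\frac{\text{count}(i)}{|U|}\right)}{N}$$

Where |U| is the number of users, count(i) is the number of users who consumed item i, and N is the length of the recommended list.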
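As a rough illustration of the description above, the sketch below computes the mean self-information per list and then averages over users. The function name novelty_sketch, its parameters, and the toy counts are assumptions; it divides by each list's own length rather than a fixed N.

```python
import math


def novelty_sketch(predicted, item_counts, num_users):
    """Mean self-information per recommended list, averaged over all users."""
    per_list = []
    for rec_list in predicted:
        # Self-information of an item: -log2 of the share of users who consumed it.
        info = sum(-math.log2(item_counts[item] / num_users) for item in rec_list)
        per_list.append(info / len(rec_list))
    return sum(per_list) / len(per_list)


# Hypothetical popularity counts for 4 users: "A" is known to everyone, "C" to one user.
counts = {"A": 4, "B": 2, "C": 1}
print(novelty_sketch([["A", "B"], ["B", "C"]], counts, num_users=4))  # 1.0
```

Lists built from rarely consumed items score higher, which is the behavior the metric is meant to reward.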

Personalization

recmetrics.personalization()

Personalization is the dissimilarity between users' lists of recommendations. A high score indicates that users' recommendations are different. A low personalization score indicates that users' recommendations are very similar.

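For example, if two users have recommendation lists [A,B,C,D] and [A,B,C,Y], each list can be encoded as a binary vector over the items {A, B, C, D, Y}. The cosine similarity between the two vectors is 3 / (2 × 2) = 0.75, so the personalization is 1 − 0.75 = 0.25.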
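The two-user example can be reproduced with a minimal sketch. It assumes each list is treated as a binary (one-hot) vector, in which case the cosine similarity of two lists reduces to the size of their overlap divided by the square root of the product of their lengths. The function name personalization_sketch is hypothetical.

```python
from itertools import combinations
from math import sqrt


def personalization_sketch(predicted):
    """1 minus the mean pairwise cosine similarity between users' lists."""
    item_sets = [set(rec_list) for rec_list in predicted]
    similarities = [
        len(a & b) / sqrt(len(a) * len(b)) for a, b in combinations(item_sets, 2)
    ]
    return 1 - sum(similarities) / len(similarities)


print(personalization_sketch([["A", "B", "C", "D"], ["A", "B", "C", "Y"]]))  # 0.25
```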

Intra-list Similarity

recmetrics.intra_list_similarity()

Intra-list similarity uses a feature matrix to calculate the cosine similarity between the items in a list of recommendations. The feature matrix is indexed by the item id and includes one-hot-encoded features. If a recommender system is recommending lists of very similar items, the intra-list similarity will be high.
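A small sketch of the idea for a single recommendation list follows. The names cosine and intra_list_similarity_sketch and the toy feature vectors are assumptions, and a plain dictionary stands in for the feature matrix indexed by item id.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intra_list_similarity_sketch(rec_list, features):
    """Mean cosine similarity over all pairs of items in one list."""
    pairs = list(combinations(rec_list, 2))
    return sum(cosine(features[a], features[b]) for a, b in pairs) / len(pairs)


# Hypothetical one-hot genre features: m1 and m2 share genres, m3 does not.
features = {"m1": [1, 0, 1], "m2": [1, 0, 1], "m3": [0, 1, 0]}
print(intra_list_similarity_sketch(["m1", "m2"], features))  # close to 1
print(intra_list_similarity_sketch(["m1", "m3"], features))  # 0.0
```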

MSE and RMSE

recmetrics.mse()
recmetrics.rmse()

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are used to evaluate the accuracy of predicted values, such as ratings, compared to the true value, y. These can also be used to evaluate the reconstruction of a ratings matrix.

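$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$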
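Both quantities follow directly from their definitions. The sketch below is a plain-Python illustration with hypothetical function names and made-up ratings, not the library's own code.

```python
from math import sqrt


def mse_sketch(y, yhat):
    """Mean of squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y, yhat)) / len(y)


def rmse_sketch(y, yhat):
    """Square root of the MSE, expressed in the same units as the ratings."""
    return sqrt(mse_sketch(y, yhat))


true_ratings = [4, 3, 5, 1]
predicted_ratings = [3, 3, 3, 1]
print(mse_sketch(true_ratings, predicted_ratings))  # 1.25
print(rmse_sketch(true_ratings, predicted_ratings))
```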

Predicted Class Probability Distribution Plots

recmetrics.class_separation_plot()

This is a plot of the distribution of the predicted class probabilities from a classification model. The plot is typically used to visualize how well a model is able to distinguish between two classes, and can assist a Data Scientist in picking the optimal decision threshold to classify observations to class 1 (0.5 is usually the default threshold for this method). The colors of the distributions represent true class 0 and 1, and everything to the right of the decision threshold is classified as class 1.

binary class probs

This plot can also be used to visualize the recommendation scores in two ways.

In this example, an item is considered class 1 if it is rated more than 3 stars, and class 0 if it is not. This example shows the performance of a model that recommends an item when the predicted 5-star rating is greater than 3 (plotted as a vertical decision threshold line). This plot shows that the recommender model will perform better if items with a predicted rating of 3.5 stars or greater are recommended.

ratings scores

The raw predicted 5-star rating for all recommended movies could be visualized with this plot to find the predicted rating above which a movie should be recommended. This plot also visualizes how well the model is able to distinguish between each rating value.

ratings distributions

ROC and AUC

recmetrics.roc_plot()

The Receiver Operating Characteristic (ROC) plot is used to visualize the trade-off between true positives and false positives for binary classification. The Area Under the Curve (AUC) is sometimes used as an evaluation metric.

ROC
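For intuition about the AUC, it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python. The function name auc_sketch and the sample scores are assumptions, and this is not how recmetrics.roc_plot() is implemented.

```python
def auc_sketch(y_true, y_score):
    """AUC as the share of positive/negative pairs that are ranked correctly."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


print(auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic, so this is only suitable for small samples.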

Recommender Precision and Recall

recmetrics.recommender_precision()
recmetrics.recommender_recall()

Recommender precision and recall use all recommended items over all users to calculate traditional precision and recall. A recommended item that was actually interacted with in the test data is considered an accurate prediction, and a recommended item that is not interacted with, or received a poor interaction value, can be considered an inaccurate recommendation. The user can assign these values based on their judgment.
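One way to read the description above is to pool hits over all users (micro-averaging). The sketch below takes that reading as an assumption; the library may aggregate differently, for example by averaging per-user values. The function name and sample data are hypothetical.

```python
def recommender_precision_recall(predicted, actual):
    """Pooled precision and recall over all users' recommendation lists."""
    hits = recommended = relevant = 0
    for recs, truth in zip(predicted, actual):
        recs, truth = set(recs), set(truth)
        hits += len(recs & truth)   # recommended items the user interacted with
        recommended += len(recs)    # everything that was recommended
        relevant += len(truth)      # everything the user interacted with
    precision = hits / recommended if recommended else 0.0
    recall = hits / relevant if relevant else 0.0
    return precision, recall


predicted = [["A", "B"], ["C", "D"]]
actual = [["A"], ["C", "D", "E"]]
print(recommender_precision_recall(predicted, actual))  # (0.75, 0.75)
```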

Precision and Recall Curve

recmetrics.precision_recall_plot()

The Precision and Recall plot is used to visualize the trade-off between precision and recall for one class in a classification.

PandRcurve

Confusion Matrix

recmetrics.make_confusion_matrix()

Traditional confusion matrix used to evaluate false positive and false negative trade-offs.

PandRcurve

recmetrics's Issues

dev dependencies breaking installation

» poetry add recmetrics   
Using version ^0.1.5 for recmetrics

Updating dependencies
Resolving dependencies... (0.2s)

Because no versions of recmetrics match >0.1.5,<0.2.0
 and recmetrics (0.1.5) depends on pytest-cov (>=2.10.1,<3.0.0), recmetrics (>=0.1.5,<0.2.0) requires pytest-cov (>=2.10.1,<3.0.0).
So, because jewel-ml-models depends on both recmetrics (^0.1.5) and pytest-cov (^4.0.0), version solving failed.

pytest-cov is a development dependency, so it shouldn't break installation like this. You can easily solve this by listing pytest-cov as a development dependency. There are others as well... ipython maybe? Jupyter and twine too.

License

This is missing a license. You can use https://tldrlegal.com/ for an overview. The top-3 are MIT, BSD and GPL (see my analysis).

The simplest way to add it is in the setup.py as license='MIT' or similar.

Unused Requirement

Surprise is listed as a module dependency but is not used in metrics or plots. Might be worth removing the dependency, especially since it requires additional build tools (Visual C++) and thus may throw unnecessary errors.

Unable to import recmetrics

I am working on a recommendation engine using collaborative filtering and wanted to try the metrics provided by recmetrics. Here is the error I get when trying to import the package (version 0.0.12).

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-309-301854677c00> in <module>
----> 1 import recmetrics
      2 
      3 recmetrics.long_tail_plot()

~/.virtualenvs/py3/lib/python3.6/site-packages/recmetrics/__init__.py in <module>
----> 1 from .plots import long_tail_plot, mark_plot, mapk_plot, coverage_plot, class_separation_plot, roc_plot, precision_recall_plot
      2 from .metrics import mark, coverage, personalization, intra_list_similarity, rmse, mse, make_confusion_matrix, recommender_precision, recommender_recall

~/.virtualenvs/py3/lib/python3.6/site-packages/recmetrics/plots.py in <module>
      5 from matplotlib.lines import Line2D
      6 from sklearn.metrics import roc_curve, auc, precision_recall_curve, average_precision_score
----> 7 from sklearn.utils.fixes import signature
      8 
      9 

ImportError: cannot import name 'signature'

Coverage over 100%

In the example below, the coverage measured exceeds 100%, which does not make sense.

This happens when items that are not listed on the catalog are recommended.

> from recmetrics import prediction_coverage
> prediction_coverage([['x', 'y'], ['w', 'z']], catalog=['w', 'x', 'y'])
133.33
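One possible way to keep the result within 0 to 100 is to count only recommended items that are actually in the catalog. This is a sketch of that idea, not the maintainers' fix; the function name bounded_prediction_coverage is hypothetical.

```python
def bounded_prediction_coverage(predicted, catalog):
    """Coverage in percent, ignoring recommended items that are not in the catalog."""
    catalog_items = set(catalog)
    recommended = {item for user_list in predicted for item in user_list}
    in_catalog = recommended & catalog_items
    return round(len(in_catalog) / len(catalog_items) * 100, 2)


# Same input as the report above: 'z' is recommended but not in the catalog.
print(bounded_prediction_coverage([["x", "y"], ["w", "z"]], catalog=["w", "x", "y"]))  # 100.0
```

Whether out-of-catalog items should be ignored silently or raise an error is a design choice left open here.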

module 'recmetrics' has no attribute 'prediction_coverage'

Hi there
I am trying to run the example notebook, but I am getting 'module 'recmetrics' has no attribute 'prediction_coverage'' and "attribute error: module 'recmetrics' has no attribute 'catalog_coverage'".

Any pointers or suggestions?

Thanks in advance

Is surprise really required?

First of all: this package looks great! It's exactly what I need for some small projects, so thanks for putting it out there!

I'm looking at the setup.py, and it lists surprise as a requirement. I don't see it imported anywhere in the package though, so I'm wondering if it can be removed? I get that it's useful for the example notebook, but that wouldn't be included in the pip install anyway. (I might suggest making surprise an extras_require if you want to keep it in there for demo purposes.)

If you're open to some packaging changes along these lines, I'd be happy to send a PR your way.

Personalization metric calculation optimization

Hi @statisticianinstilettos,

kudos for a great tool!
I would like to propose an optimization for calculating Personalization Metric here:

#get indicies for upper right triangle w/o diagonal
upper_right = np.triu_indices(similarity.shape[0], k=1)

#calculate average similarity
personalization = np.mean(similarity[upper_right])
return 1-personalization

There is no need to get the upper triangle indices, as the cosine similarity is a symmetric distance.
I will follow up with a pull request for this.

mapk shouldn't require actual and predicted have the same length

This assertion check is incorrect. The actual parameter as used in _apk is expecting a list of true items and the predicted parameter is expecting a list of predicted items that can be true or false. See an example below where only A-C are true items and the prediction can be longer than the true list because it can contain false items.

if len(actual) != len(predicted):
    raise AssertionError("Length mismatched")

true_items = ["A","B","C"]
prediction = ["A","Z","B","X"]
metrics.mapk(actual=true_items, predicted=prediction, k = 3)

ImportError: cannot import name 'signature'

Importing the repository does not work. I am getting the following error ImportError: cannot import name 'signature'

pip install recmetrics
import recmetrics as re

The problem is this import from sklearn.utils.fixes import signature.
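As far as I can tell, sklearn.utils.fixes.signature was a compatibility shim for the standard library's inspect.signature, and newer scikit-learn releases no longer ship it. A hedged sketch of an import that works either way is shown below; it is a suggestion, not the change the maintainers made.

```python
try:
    # Older scikit-learn releases exposed a backported `signature` here.
    from sklearn.utils.fixes import signature
except ImportError:
    # Fall back to the standard library version (Python 3.3+).
    from inspect import signature


def example(a, b=1):
    return a + b


print(list(signature(example).parameters))  # ['a', 'b']
```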

Update PyPI package

The current version needs to be updated as the package depends on deprecated/removed functionality from different dependencies.

personalization() has explosive memory requirements due to pairwise comparison

On my system (16 GB RAM), a list of 10k recommendations will run. A list of 50k will crash out. I'd like to try to understand the personalization score across my entire hypothetical customer base of 250k+.

Is there a way to chunk the scipy.sparse.csr_matrix and iteratively calculate the cosine similarity to avoid holding the whole thing in memory?
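One possible answer that avoids the pairwise matrix entirely: if every user's list is treated as a binary vector and scaled to unit length, the sum of all pairwise cosine similarities can be read off the squared norm of the summed vectors. With s the sum of the n unit vectors, the mean pairwise similarity is (||s||^2 - n) / (n(n - 1)). The sketch below assumes that binary encoding, non-empty lists, and at least two users; the function name is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def personalization_linear_memory(predicted):
    """Personalization without building the user-by-user similarity matrix."""
    n = len(predicted)
    column_sums = defaultdict(float)
    for rec_list in predicted:
        items = set(rec_list)
        weight = 1.0 / sqrt(len(items))  # scales the binary vector to unit length
        for item in items:
            column_sums[item] += weight
    # ||s||^2 counts every ordered pair of users plus n self-similarities of 1.
    squared_norm = sum(value * value for value in column_sums.values())
    mean_similarity = (squared_norm - n) / (n * (n - 1))
    return 1 - mean_similarity


print(personalization_linear_memory([["A", "B", "C", "D"], ["A", "B", "C", "Y"]]))
```

Memory grows with the number of distinct items rather than with the square of the number of users, so the 250k-user case should no longer need the full similarity matrix.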

Integration with Deep Learning Based Frameworks

Is there any way to integrate this with recommender system frameworks that involve more deep learning-based algorithms, such as PyTorch? scikit-learn with Surprise doesn't really support such algorithms.

Installation issues

Hi! Have been trying to install recmetrics with "pip install recmetrcis", keep getting an error "ERROR: Could not build wheels for scikit-learn, which is required to install pyproject.toml-based projects". I'm using Windows, Python version 3.9.7, pip all upgraded. pip freeze shows that scikit-learn is actually already installed: "scikit-learn==0.24.2". I've also tried installing with pip from git, same result. Any ideas what I could still try?

TypeError on class_separation_plot of example notebook

I attached the error below

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-05160122655c> in <module>
----> 1 recmetrics.class_separation_plot(pred_df, n_bins=45, class0_label="True class 0", class1_label="True class 1")

TypeError: class_separation_plot() got an unexpected keyword argument 'class0_label'

Slows down even if x_labels=False

First, thank you for providing this great library! I ran into an issue on rather large data sets, in particular when the option x_labels is set to False. I suggest inserting an
if x_labels == True: check before the call below, similar to what is done at the bottom of the function. When I don't want to plot labels, why should plt.xticks(x) be executed?

plt.xticks(x)
