mobiletelesystems / rectools

RecTools - library to build Recommendation Systems easier and faster than ever before

License: Apache License 2.0

Makefile 0.25% Python 99.74% Dockerfile 0.01%
deep-learning machine-learning personalization recomendations recommendation-system recsys recommendation-algorithms recommendation-engine recommender-system

rectools's Introduction

RecTools

RecTools is an easy-to-use Python library which makes the process of building recommendation systems easier, faster and more structured than ever before. It includes built-in toolkits for data processing and metrics calculation, a variety of recommender models, wrappers for existing implementations of popular algorithms, and a model selection framework. The aim is to collect ready-to-use solutions and best practices in one place, so that creating your first MVP and deploying a model to production are as fast and easy as possible.

For more details, see the Documentation and Tutorials.

Get started

Prepare data with

wget https://files.grouplens.org/datasets/movielens/ml-1m.zip
unzip ml-1m.zip

import pandas as pd
from implicit.nearest_neighbours import TFIDFRecommender

from rectools import Columns
from rectools.dataset import Dataset
from rectools.models import ImplicitItemKNNWrapperModel

# Read the data
ratings = pd.read_csv(
    "ml-1m/ratings.dat", 
    sep="::",
    engine="python",  # Because of 2-chars separators
    header=None,
    names=[Columns.User, Columns.Item, Columns.Weight, Columns.Datetime],
)
    
# Create dataset
dataset = Dataset.construct(ratings)
    
# Fit model
model = ImplicitItemKNNWrapperModel(TFIDFRecommender(K=10))
model.fit(dataset)

# Make recommendations
recos = model.recommend(
    users=ratings[Columns.User].unique(),
    dataset=dataset,
    k=10,
    filter_viewed=True,
)

Installation

RecTools is on PyPI, so you can use pip to install it.

pip install rectools

The default version doesn't contain all the dependencies, because some of them are needed only for specific models. Available user extensions are the following:

  • lightfm: adds wrapper for LightFM model,
  • torch: adds models based on neural nets,
  • visuals: adds visualization tools,
  • nmslib: adds fast ANN recommenders.

Install extension:

pip install rectools[extension-name]

Install all extensions:

pip install rectools[all]

Important: If you're using poetry and want to add rectools to your project, you should either install rectools without the lightfm extra, or use poetry==1.4.0 and add the following lines to your poetry.toml file:

[experimental]
new-installer = false

Recommender Models

The table below lists recommender models that are available in RecTools.

| Model | Type | Description | Extra features |
| --- | --- | --- | --- |
| implicit ALS Wrapper | Matrix Factorization | rectools.models.ImplicitALSWrapperModel - Alternating Least Squares Matrix Factorization algorithm for implicit feedback | Support for user/item features! Check our boost to metrics |
| implicit ItemKNN Wrapper | Collaborative Filtering | rectools.models.ImplicitItemKNNWrapperModel - Algorithm that calculates item-item similarity matrix using distances between item vectors in user-item interactions matrix | - |
| LightFM Wrapper | Matrix Factorization | rectools.models.LightFMWrapperModel - Hybrid matrix factorization algorithm which utilises user and item features and supports a variety of losses | 10-25 times faster inference! Check our boost to inference |
| EASE | Collaborative Filtering | rectools.models.EASEModel - Embarrassingly Shallow Autoencoders implementation that explicitly calculates dense item-item similarity matrix | - |
| PureSVD | Matrix Factorization | rectools.models.PureSVDModel - Truncated Singular Value Decomposition of user-item interactions matrix | - |
| DSSM | Neural Network | rectools.models.DSSMModel - Two-tower Neural model that learns user and item embeddings utilising their explicit features and learning on triplet loss | - |
| Popular | Heuristic | rectools.models.PopularModel - Classic baseline which computes popularity of items | Hyperparams (time window, pop computation) |
| Popular in Category | Heuristic | rectools.models.PopularInCategoryModel - Model that computes popularity within category and applies mixing strategy to increase Diversity | Hyperparams (time window, pop computation, mixing/ratio strategy) |
| Random | Heuristic | rectools.models.RandomModel - Simple random algorithm useful to benchmark Novelty, Coverage, etc. | - |

  • All of the models follow the same interface. No exceptions
  • No need for manual creation of sparse matrices or mapping ids. Preparing data for models is as simple as dataset = Dataset.construct(interactions_df)
  • Fitting any model is as simple as model.fit(dataset)
  • For getting recommendations filter_viewed and items_to_recommend options are available
  • For item-to-item recommendations use recommend_to_items method
  • For feeding user/item features to model just specify dataframes when constructing Dataset. Check our tutorial

Contribution

Contributing guide

To install all requirements

  • you must have python3 and poetry==1.4.0 installed
  • make sure you have no active virtual environments (deactivate conda base if applicable)
  • run
make install

For autoformatting run

make format

For linters check run

make lint

For tests run

make test

For coverage run

make coverage

To remove virtual environment run

make clean

RecTools Team

Previous contributors: Ildar Safilo [ex-Maintainer], Daniil Potapov [ex-Maintainer], Igor Belkov, Artem Senin, Mikhail Khasykov, Julia Karamnova

rectools's People

Contributors

altair7610, blondered, feldlime, in48semenov, iomallach, irsafilo, jegorus, mikesokolovv, redfox193, sharthz23, yukeeul

rectools's Issues

Add random seed tests to all applicable models

Feature Description

Check which models have random seed fixation and write tests for all of them:

  • models with fixed seed produce same results
  • models with different fixed seeds produce different results
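
The two checks above could be sketched as follows. `ToyRandomModel` is a hypothetical stand-in and not a RecTools class; a real test suite would presumably build each RecTools model that accepts a seed and compare its recommendations in the same way.

```python
import random
from typing import List


class ToyRandomModel:
    """Hypothetical stand-in for a model that accepts a random seed."""

    def __init__(self, random_state: int) -> None:
        self.random_state = random_state

    def recommend(self, items: List[int], k: int) -> List[int]:
        # A fresh generator per call makes the output a pure function of the seed
        rng = random.Random(self.random_state)
        return rng.sample(items, k)


def test_same_seed_gives_same_results() -> None:
    items = list(range(100))
    first = ToyRandomModel(random_state=42).recommend(items, k=10)
    second = ToyRandomModel(random_state=42).recommend(items, k=10)
    assert first == second


def test_different_seeds_give_different_results() -> None:
    items = list(range(100))
    first = ToyRandomModel(random_state=1).recommend(items, k=10)
    second = ToyRandomModel(random_state=2).recommend(items, k=10)
    assert first != second
```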

Why this feature?

Avoid problems with seed fixation

Additional context

No response

Fix `Serendipity` and `MIUF` docstring formulas

Feature Description

We need to change k+1 to k in the sum. We also need to state that the formula is correct for one user; for multiple users the results need to be averaged.

Why this feature?

Doc is not 100% correct

Additional context

No response

Save/load functions to all of the models

Feature Description

Support for saving and loading models in appropriate formats

Why this feature?

It's just super helpful in many cases

Additional context

No response

ALS example notebooks are not reproducible on Colab

What happened?

When code is executed on Colab, kernel dies.

import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"

This is not enough to run the implicit library's iALS on Colab. The user still gets a warning about multithreading. It is necessary to do both:

import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"
import threadpoolctl
threadpoolctl.threadpool_limits(1, "blas")

Expected behavior

No response

Relevant logs and/or screenshots

No response

Operating System

Google Colab

Python Version

3.10

RecTools version

0.4.2

Functions to load popular datasets

Add some functions to load commonly used recommendation datasets like movielens, lastfm, kion, etc.

Think about:

  • Caching (should we implement saving loaded data? if yes, then how to do this?)
  • Should we provide raw data or convert it to our structures?
  • Should we keep some tiny datasets together with the code like scikit-learn does?

HitRate metric

Feature Description

Implementation of the HitRate metric
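
As a rough illustration, here is a minimal sketch assuming the common definition of HitRate@k: the share of test users who have at least one relevant item among their top k recommendations. The function name and the dict-based inputs are assumptions made for this example and are not part of the RecTools API.

```python
from typing import Dict, Hashable, List, Set


def hit_rate_at_k(
    reco: Dict[Hashable, List[Hashable]],
    relevant: Dict[Hashable, Set[Hashable]],
    k: int,
) -> float:
    """Share of test users with at least one relevant item in their top-k list."""
    users = list(relevant)
    if not users:
        return 0.0
    hits = sum(
        1 for user in users
        if set(reco.get(user, [])[:k]) & relevant[user]
    )
    return hits / len(users)


# Ranked recommendations and test interactions per user
reco = {1: [7, 8], 2: [1, 2], 3: [1, 2, 3, 4], 4: [1, 2, 3]}
relevant = {1: {1, 2}, 2: {1}, 3: {1, 3, 4}, 4: {1, 2, 3}}
print(hit_rate_at_k(reco, relevant, k=2))  # 0.75
```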

Why this feature?

Classic RecSys metric

Additional context

No response

rectools.metrics.ranking.MAP does not work

The method does not work even with the example from the documentation :(

!pip install RecTools

from rectools.metrics.ranking import MAP
from rectools import Columns
import pandas as pd

Columns.Item = 'movie_id'
Columns.User = 'user_id'
Columns.Rank = 'rank'

reco = pd.DataFrame(
    {
        Columns.User: [1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4],
        Columns.Item: [7, 8, 1, 2, 1, 2, 3, 4, 1, 2, 3],
        Columns.Rank: [1, 2, 1, 2, 1, 2, 3, 4, 1, 2, 3],
    }
)
interactions = pd.DataFrame(
    {
        Columns.User: [1, 1, 2, 3, 3, 3, 4, 4, 4],
        Columns.Item: [1, 2, 1, 1, 3, 4, 1, 2, 3],
    }
)

MAP(k=3).calc_per_user(reco, interactions)


File /opt/conda/lib/python3.10/site-packages/rectools/metrics/ranking.py:247, in MAP.calc_per_user(self, reco, interactions)
245 self._check(reco, interactions=interactions)
246 merged_reco = merge_reco(reco, interactions)
--> 247 fitted = self.fit(merged_reco, k_max=self.k)
248 return self.calc_per_user_from_fitted(fitted)

File /opt/conda/lib/python3.10/site-packages/rectools/metrics/ranking.py:192, in MAP.fit(cls, merged, k_max)
189 prec_at_k_csr = sparse.csr_matrix(np.array([]).reshape(0, 0))
190 return MAPFitted(prec_at_k_csr, users, np.array([]))
--> 192 n_relevant_items = merged.groupby(Columns.User, sort=False)[Columns.Item].agg("size")[users].values
194 user_to_idx_map = pd.Series(np.arange(users.size), index=users)
195 df_prepared = merged.query(f"{Columns.Rank} <= @k_max")

KeyError: 'Column not found: movie_id'

Wrappers for some models do not work

Some model wrappers (LightFM, Implicit) do not work with float32; they require float64.

In this line the sparse matrix is converted to float32.

Possible solutions:

  • Convert the interactions matrix to float64 in the Interactions class, then convert it to the required precision inside the models' _fit methods.
  • Keep the precision received by the Interactions class and convert the interactions matrix to the required format inside the models' _fit methods.
  • Convert the interactions matrix to float32 in the Interactions class, then convert it to the required precision inside the models' _fit methods.

The third option is the least optimal, because the np.float64 -> np.float32 -> np.float64 conversion introduces rounding.
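
The rounding caused by a float64 -> float32 -> float64 round trip can be seen without any library. This is only an illustrative sketch: it uses the standard `struct` module in place of NumPy dtypes, and `to_float32` is a helper name invented for the example.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


original = 0.1
round_tripped = to_float32(original)

print(original == round_tripped)      # False: precision was lost
print(abs(original - round_tripped))  # about 1.5e-09
print(to_float32(0.5) == 0.5)         # True: 0.5 is exactly representable
```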

I can take this ticket and open a PR.

Community guidelines

Add

  • issue template
  • pull request template
  • contribution guide
  • code of conduct

Item-to-item meta affinity metric

Feature Description

I2I validation on meta-vectors distances between target item and recommended items

Why this feature?

One of the approaches to item-to-item validation

Additional context

No response

rectools.metrics.calc_metrics method does not work

The rectools.metrics.calc_metrics method does not work: when I run it, I get an empty result.

The code I tried to run:

import pandas as pd

from rectools import Columns
from rectools.metrics import Accuracy, NDCG, calc_metrics

reco = pd.DataFrame(
    {
        Columns.User: [1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4],
        Columns.Item: [7, 8, 1, 2, 1, 2, 3, 4, 1, 2, 3],
        Columns.Rank: [1, 2, 1, 2, 1, 2, 3, 4, 1, 2, 3],
    }
)
interactions = pd.DataFrame(
    {
        Columns.User: [1, 1, 2, 3, 3, 3, 4, 4, 4],
        Columns.Item: [1, 2, 1, 1, 3, 4, 1, 2, 3],
        Columns.Datetime: [1, 1, 1, 1, 1, 2, 2, 2, 2],
    }
)
# Split interactions into train and test by time
split_dt = 2
df_train = interactions.loc[interactions[Columns.Datetime] < split_dt]
df_test = interactions.loc[interactions[Columns.Datetime] >= split_dt]
metrics = {
    'ndcg@1': NDCG(k=1),
    'accuracy@1': Accuracy(k=1),
}
calc_metrics(
    metrics,
    reco=reco,
    interactions=df_test,
    prev_interactions=df_train,
    catalog=df_train[Columns.Item].unique(),
)

Output:
{}

RecallGain metric

Feature Description

Given test interactions, we can exclude all positives that would have been recommended by the reference model (e.g. a popular model). After that we can calculate Recall as usual. The resulting metric is interpreted as the gain in overall Recall that the new algorithm achieves in comparison with the reference algorithm.
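
One possible reading of this description is sketched below in plain Python. The function name, the dict-based inputs and the decision to skip users whose positives are all covered by the reference model are assumptions made for illustration, not the proposed RecTools implementation.

```python
from typing import Dict, Hashable, List, Set


def recall_gain_at_k(
    reco: Dict[Hashable, List[Hashable]],
    ref_reco: Dict[Hashable, List[Hashable]],
    test: Dict[Hashable, Set[Hashable]],
    k: int,
    k_ref: int,
) -> float:
    """Mean Recall@k over positives that the reference model did not recommend."""
    per_user = []
    for user, positives in test.items():
        # Drop positives already covered by the reference top-k_ref list
        remaining = positives - set(ref_reco.get(user, [])[:k_ref])
        if not remaining:
            continue  # assumption: users with nothing left to gain are skipped
        hits = len(remaining & set(reco.get(user, [])[:k]))
        per_user.append(hits / len(remaining))
    return sum(per_user) / len(per_user) if per_user else 0.0


test = {"u1": {"a", "b", "c"}, "u2": {"a", "d"}}
ref_reco = {"u1": ["a", "x"], "u2": ["a", "y"]}  # e.g. output of a popular model
reco = {"u1": ["b", "a", "z"], "u2": ["a", "y", "q"]}
print(recall_gain_at_k(reco, ref_reco, test, k=3, k_ref=2))  # 0.25
```

Here u1 keeps positives {b, c} and the model hits one of them (0.5), while u2 keeps only {d}, which is missed (0.0), so the mean is 0.25.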

Why this feature?

Useful for both candidate-generators selection and simple validation protocol in case of strong popularity bias in data (e.g. calculating RecallGain from a Popular model).

Additional context

recall_gain = RecallGain(k=10, k_ref=20)

# one metric
value = recall_gain.calc(reco=recos, ref_recos=ref_recos)
per_user = recall_gain.calc_per_user(reco=recos, ref_recos=ref_recos)

# calc_metrics
calc_metrics(
    metrics={"precision": precision, "intersecton": intersecton, "recall_gain": recall_gain},
    reco=recos,
    interactions=df_test,
    ref_recos=ref_recos,  # Union[pd.DataFrame, Dict[Hashable, pd.DataFrame]]
)
# here we can pass multiple models with:
ref_recos = {"one": ref_recos_one, "two": ref_recos_two}
# result dict will have `intersecton_one`, `intersecton_two`, `recall_gain_one`, `recall_gain_two` as keys

# cross_validate
cv_results = cross_validate(
    dataset=dataset,
    splitter=splitter,
    models=models,
    metrics=metrics,
    k=10,
    filter_viewed=True,
    ref_models=["one", "two"], # just selecting keys from `models` argument
    validate_ref_models=False  # optionally exclude ref_models from other metrics
)

Users parameter for cross-validate

Feature Description

users parameter in cross-validate will accept the list of external user ids to run all experiments only on these users.

Why this feature?

This will help to solve the common task of calculating metrics only on a specific subset of users

Additional context

No response

DSSM default model fix

Feature Description

Right now DSSMModel has one parameter without the default value: dataset_type: TorchDataset[tp.Any]. This is very confusing since DSSMModel also has a default model. But user can't use it out of the box.

Why this feature?

DSSMModel doesn't follow the simple interface from other RecTools models

Additional context

No response

np.setdiff1d is too slow

If the user_id and item_id columns have a categorical dtype (CategoricalDtype), then np.setdiff1d works very slowly on large volumes (more than 10 million unique values).
Possible solution is to replace:

new_users = np.setdiff1d(df_test[Columns.User].unique(), df_train[Columns.User].unique())

with

new_users = set(df_test[Columns.User].unique()) - set(df_train[Columns.User].unique()) 

And same for

new_items = np.setdiff1d(df_test[Columns.Item].unique(), df_train[Columns.Item].unique())
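
A small pure-Python sketch of the set-based approach follows. The helper name `new_ids` is hypothetical. One difference to keep in mind: np.setdiff1d returns a sorted array of unique values, while a plain set difference is unordered; this sketch keeps first-seen order instead.

```python
from typing import Hashable, Iterable, List


def new_ids(test_ids: Iterable[Hashable], train_ids: Iterable[Hashable]) -> List[Hashable]:
    """Ids that occur in test but not in train, found with hash-based lookups."""
    train_set = set(train_ids)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return [x for x in dict.fromkeys(test_ids) if x not in train_set]


train_users = [1, 2, 3, 3, 4]
test_users = [3, 5, 4, 6, 5]
print(new_ids(test_users, train_users))  # [5, 6]
```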

DebiasWrapper for metrics

Feature Description

A metric wrapper that provides debiased validation in case of strong popularity bias in test data. One way to do this is to counter the power-law popularity distribution in the test interactions of each fold by down-sampling the popular items in that fold.

Why this feature?

It provides a more correct objective for hyper-parameter tuning and model selection

Additional context

Algorithm to detect and down-sample excessively popular items. More algorithms and modifications can be proposed here. For now we can use the IQR (interquartile range) logic that is also used for boxplots.

  1. We find first and third quartiles in test items popularity distribution (Q1 and Q3)
  2. IQR = Q3 - Q1. This is interquartile range. 50% of the observed data is inside this range.
  3. Outliers popularity border will be defined as Q3 + iqr_coef * IQR
  4. Maximum accepted popularity will be defined as the maximum value inside the border.
  5. Every item that exceeds the border should be down-sampled to match the maximum accepted popularity.

For all exceeding items in the test fold we need to randomly keep only the maximum allowed subset of users. We use downsampling for this.
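
The steps above can be sketched in plain Python. This is an illustration under assumptions, not the proposed wrapper itself: the function name, the list-of-pairs input and the use of `statistics.quantiles` with the "inclusive" method for the quartiles are all choices made for this example, and it needs at least two distinct items in the test data.

```python
import random
import statistics
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Interaction = Tuple[Hashable, Hashable]  # (user, item)


def debias_interactions(
    interactions: List[Interaction], iqr_coef: float = 1.5, random_state: int = 32
) -> List[Interaction]:
    """Down-sample items whose test popularity exceeds Q3 + iqr_coef * IQR."""
    users_by_item: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for user, item in interactions:
        users_by_item[item].append(user)

    popularity = [len(users) for users in users_by_item.values()]
    # Steps 1-3: quartiles, IQR and the outlier border
    q1, _, q3 = statistics.quantiles(popularity, n=4, method="inclusive")
    border = q3 + iqr_coef * (q3 - q1)
    # Step 4: the largest observed popularity that is still inside the border
    max_accepted = max(p for p in popularity if p <= border)

    # Step 5: randomly keep only max_accepted users for each exceeding item
    rng = random.Random(random_state)
    result: List[Interaction] = []
    for item, users in users_by_item.items():
        kept = rng.sample(users, max_accepted) if len(users) > border else users
        result.extend((user, item) for user in kept)
    return result


interactions = [(f"u{i}", "e") for i in range(20)]  # one very popular item
interactions += [("u0", "a"), ("u0", "b"), ("u1", "b"), ("u0", "c"), ("u1", "c")]
interactions += [("u0", "d"), ("u1", "d"), ("u2", "d")]
debiased = debias_interactions(interactions)
print(len(interactions), len(debiased))  # 28 11
```

In this toy data the item popularities are 20, 1, 2, 2 and 3, so Q1 = 2, Q3 = 3, the border is 4.5 and the maximum accepted popularity is 3; only item "e" is down-sampled, from 20 users to 3.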

The wrapper changes test interactions, but afterwards any metrics can be calculated as usual.

from rectools.metrics import DebiasWrapper, Precision

debiased_precision = DebiasWrapper(Precision(k=10), iqr_coef=1.5, random_state=32)

Other possible namings are: PopDownSamplingWrapper, DownSamplingWrapper, UnbiasedWrapper

Models method to generate a dict of its hyper-params (including wrapped)

Feature Description

Method get_params for all RecTools models. It outputs a dict of all hyper-params available for tuning together with the values of the current instance. Wrapped models params are also added.
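
A minimal sketch of what such a method might return is shown below. Both classes are hypothetical toys rather than RecTools code, and the `model.` prefix used for wrapped params is just one possible naming convention.

```python
from typing import Any, Dict


class ToyWrappedModel:
    """Hypothetical third-party model that is being wrapped."""

    def __init__(self, K: int = 10, num_threads: int = 0) -> None:
        self.K = K
        self.num_threads = num_threads


class ToyWrapper:
    """Hypothetical wrapper; not the RecTools implementation."""

    def __init__(self, model: ToyWrappedModel, verbose: int = 0) -> None:
        self.model = model
        self.verbose = verbose

    def get_params(self) -> Dict[str, Any]:
        params: Dict[str, Any] = {"verbose": self.verbose}
        # Prefix wrapped-model params to avoid clashes with wrapper params
        for name, value in vars(self.model).items():
            params[f"model.{name}"] = value
        return params


print(ToyWrapper(ToyWrappedModel(K=20)).get_params())
# {'verbose': 0, 'model.K': 20, 'model.num_threads': 0}
```

A flat dict like this can be passed directly to most experiment trackers as the run's parameters.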

Why this feature?

This can be used in validation pipelines for easy-to-use integration with experiments trackers and metrics visualisation that is based on hyper-params.

Additional context

No response

Holdout fold in splitter and cross-validate

Feature Description

add_holdout_fold parameter for splitters. default is False
run_on_holdout_fold parameter in cross-validate. default is False

Why this feature?

Holdout validation is an important part of experiments. It's nice to have it out of the box.

Additional context

No response

Widgets for simultaneous dual metrics analysis

Feature Description

Plotly scatterplot widgets with functionality to select metrics for axes and hue from model parameters

Why this feature?

Great way to find Pareto-optimal solutions when there is a trade-off between metrics

Additional context

No response

AttributeError: 'IntegerArray' object has no attribute 'size'

Reproducible example

With pandas==0.25.3

import pandas as pd
from rectools.dataset import IdMap

user_id_map = IdMap.from_values(pd.array([1, 2], dtype=pd.Int32Dtype()))
user_id_map.size
# AttributeError: 'IntegerArray' object has no attribute 'size'

Issue description

The issue seems to be affecting not only pandas==0.25.3, but also pandas==1.0.x (didn't check it thoroughly)
There is no such problem with version pandas==1.1.0 and higher

Expected behaviour

Accessing size of IdMap with IntegerArray type doesn't throw an AttributeError

Visual analysis

Add tool for visual analysis of recommendations in Jupyter Notebook

Tutorial on how to create a custom model from ModelBase

Feature Description

A jupyter notebook which shows how to create and use a custom recommender model that inherits from ModelBase

Why this feature?

This allows using any models in our pipelines

Additional context

No response

Add all models scores descriptions to docstrings

Feature Description

Extend models docstrings in the following way:

  1. Add short description of the algorithm if necessary (examples of such short descriptions can be found in our README)
  2. Add description to scores. For example, for ALS model it could be something like: Model scores are dot products of learnt user embedding and recommended item embeddings for user-to-item recommendations. For item-to-item recommendations scores are cosine similarities between target item embedding and recommended item embeddings.
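
To make the ALS example in item 2 concrete, the two kinds of scores can be computed by hand. The embeddings below are made-up numbers, and the helper functions are written for this illustration only.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


user_embedding = [1.0, 2.0]
item_embedding = [3.0, 4.0]
target_item_embedding = [4.0, 3.0]

print(dot(user_embedding, item_embedding))            # 11.0 (user-to-item score)
print(cosine(target_item_embedding, item_embedding))  # 0.96 (item-to-item score)
```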

Why this feature?

Helpful for new users

Additional context

No response

Support warm and cold users and items

Hot user (item) - present in interactions
Warm - has features, but is not present in interactions
Cold - totally new
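
These three definitions translate directly into a small classifier. The function name and the set-based inputs are assumptions made for the sketch; the same logic applies to items.

```python
from typing import Hashable, Set


def user_temperature(
    user: Hashable, interacted: Set[Hashable], with_features: Set[Hashable]
) -> str:
    """Classify a user as hot, warm or cold using the definitions above."""
    if user in interacted:
        return "hot"  # present in interactions
    if user in with_features:
        return "warm"  # has features but no interactions
    return "cold"  # totally new


interacted = {"u1", "u2"}
with_features = {"u2", "u3"}
print([user_temperature(u, interacted, with_features) for u in ["u1", "u2", "u3", "u4"]])
# ['hot', 'hot', 'warm', 'cold']
```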

Now Dataset cannot include warm users (items). And models cannot return reco for users (items) that are not in the dataset. Also models cannot recommend items not from the dataset.

The goal is:

  • Support warm users (items) in the Dataset
  • If a model by design can produce recommendations for warm/cold users (items) and/or can recommend warm/cold items, we should implement this in code. If not, we should raise an error.

ImportError: cannot import name 'KFoldSplitter' from 'rectools.model_selection'

Good afternoon!
In version 0.3.0, KFoldSplitter cannot be imported after installing via pip.

(test) grigoriy@T430:~/Repos/Book_Crossing_RecSys$ python3
Python 3.8.16 (default, Mar  2 2023, 03:21:46) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from rectools.model_selection import KFoldSplitter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'KFoldSplitter' from 'rectools.model_selection' (/home/grigoriy/anaconda3/envs/test/lib/python3.8/site-packages/rectools/model_selection/__init__.py)
>>> 

If you check the contents of the model_selection folder in the working environment, the following files are missing:

  • kfold_split.py
  • splitter.py
  • utils.py
ls ~/anaconda3/envs/test/lib/python3.8/site-packages/rectools/model_selection
__init__.py  __pycache__  time_split.py

To reproduce the problem:

conda create -n test python=3.8
conda activate test
pip3 install rectools
python3
from rectools.model_selection import KFoldSplitter

Python version: 3.8.16
Other dependencies:

Package            Version
------------------ ---------
attrs              21.4.0
certifi            2022.12.7
charset-normalizer 3.1.0
idna               3.4
implicit           0.4.4
joblib             1.2.0
lightfm            1.17
Markdown           3.2.2
nmslib             2.1.1
numpy              1.24.3
pandas             1.5.3
pip                23.0.1
psutil             5.9.5
pybind11           2.6.1
python-dateutil    2.8.2
pytz               2023.3
rectools           0.3.0
requests           2.29.0
scikit-learn       1.2.2
scipy              1.10.1
setuptools         66.0.0
six                1.16.0
threadpoolctl      3.1.0
tqdm               4.65.0
typeguard          2.13.3
urllib3            1.26.15
wheel              0.38.4

Error installing [all]

pip install rectools and pip install rectools[extension-name] installed fine,
but
pip install rectools[all] produced the errors below.
The SDK and Visual Studio are installed!

    opt = self.warn_dash_deprecation(opt, section)
  running bdist_wheel
  running build
  running build_ext
  Extra compilation arguments: ['/EHsc', '/openmp', '/O2', '/DVERSION_INFO=\\"2.1.1\\"']
  building 'nmslib' extension
  creating build
  creating build\temp.win-amd64-cpython-310
  creating build\temp.win-amd64-cpython-310\Release
  creating build\temp.win-amd64-cpython-310\Release\similarity_search
  creating build\temp.win-amd64-cpython-310\Release\similarity_search\src
  creating build\temp.win-amd64-cpython-310\Release\similarity_search\src\method
  creating build\temp.win-amd64-cpython-310\Release\similarity_search\src\space
  creating build\temp.win-amd64-cpython-310\Release\tensorflow
  "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.37.32822\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -I.\similarity_search\include -Itensorflow -Ic:\temp\pip-install-b171phqx\nmslib_dde37a94bd0b4873a19978ddf40d8a69\.eggs\pybind11-2.6.1-py3.10.egg\pybind11\include -Ic:\temp\pip-install-b171phqx\nmslib_dde37a94bd0b4873a19978ddf40d8a69\.eggs\pybind11-2.6.1-py3.10.egg\pybind11\include -Ic:\temp\pip-install-b171phqx\nmslib_dde37a94bd0b4873a19978ddf40d8a69\.eggs\pybind11-2.6.1-py3.10.egg\pybind11\include -Ic:\temp\pip-install-b171phqx\nmslib_dde37a94bd0b4873a19978ddf40d8a69\.eggs\pybind11-2.6.1-py3.10.egg\pybind11\include -IC:\Users\TDL\mambaforge\envs\karp\lib\site-packages\numpy\core\include -IC:\Users\TDL\mambaforge\envs\karp\include -IC:\Users\TDL\mambaforge\envs\karp\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.37.32822\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" /EHsc /Tp.\similarity_search\src\distcomp_bregman.cc /Fobuild\temp.win-amd64-cpython-310\Release\.\similarity_search\src\distcomp_bregman.obj /EHsc /openmp /O2 /DVERSION_INFO=\\\"2.1.1\\\"
  distcomp_bregman.cc
  C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.37.32822\include\yvals.h(21): fatal error C1083: Cannot open include file: 'crtdbg.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.37.32822\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for nmslib
Running setup.py clean for nmslib
Failed to build nmslib
ERROR: Could not build wheels for nmslib, which is required to install pyproject.toml-based projects

DSSM tutorial

Feature Description

Create a Jupyter notebook that explains our default model architecture and shows how to use our wrapper with different architectures, with a special explanation of the dataset_type parameter

Why this feature?

DSSMModel is not clear in usage

Additional context

No response

Completeness metric (ratio of provided recos at top k positions)

Feature Description

We need to measure how many items are actually recommended to users at the top k positions out of all possible positions. Completeness = 100% means that every user has all k items recommended.
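
A minimal sketch of this ratio follows, assuming Completeness is the number of filled top-k positions divided by the total number of requested positions. The function name and inputs are hypothetical.

```python
from typing import Dict, Hashable, List


def completeness_at_k(
    reco: Dict[Hashable, List[Hashable]], users: List[Hashable], k: int
) -> float:
    """Share of the requested top-k positions that actually contain an item."""
    if not users:
        return 0.0
    filled = sum(min(len(reco.get(user, [])), k) for user in users)
    return filled / (len(users) * k)


reco = {1: ["a", "b", "c"], 2: ["a", "b"], 3: ["a"], 4: ["b"]}
print(completeness_at_k(reco, users=[1, 2, 3, 4], k=2))  # 0.75
```

Six of the eight requested positions are filled, so the result is 0.75; users 3 and 4 received only one item each.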

Why this feature?

Not all models recommend lists of the full requested length. We need an easy way to find cases where Completeness is less than 100%.

Additional context

Other names might be: Filling, Delivery, Fulfillment, Sufficiency or something else. But not Coverage, which is mostly used for other cases.

Recommendations intersection metric

Feature Description

Metric to measure intersection in user-item (or item-item) pairs between recommendation lists.
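
One way to read this is sketched below: for each user, take the share of their top-k recommended items that also appear in the reference recommendations, then average over users. The normalisation by the user's own list length and the use of the full reference list are assumptions, as are the function name and inputs.

```python
from typing import Dict, Hashable, List


def intersection_at_k(
    reco: Dict[Hashable, List[Hashable]],
    ref_reco: Dict[Hashable, List[Hashable]],
    k: int,
) -> float:
    """Mean share of top-k recommended items that the reference lists also contain."""
    per_user = []
    for user, items in reco.items():
        top = set(items[:k])
        if not top:
            continue  # assumption: users without recommendations are skipped
        ref = set(ref_reco.get(user, []))
        per_user.append(len(top & ref) / len(top))
    return sum(per_user) / len(per_user) if per_user else 0.0


reco = {"u1": ["a", "b"], "u2": ["c", "d"]}
ref_reco = {"u1": ["a", "x"], "u2": ["y", "z"]}
print(intersection_at_k(reco, ref_reco, k=2))  # 0.25
```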

Why this feature?

It helps for both candidate-generators selection in pipelines, and for popularity bias measurement.

Additional context


intersecton = Intersection(k=10)

# one metric
intersecton_value = intersecton.calc(reco=recos, ref_recos=ref_recos)
intersecton_per_user = intersecton.calc_per_user(reco=recos, ref_recos=ref_recos)

# calc_metrics
calc_metrics(
    metrics={"precision": precision, "intersection": intersecton},
    reco=recos,
    interactions=df_test,
    ref_recos=ref_recos,  # Union[pd.DataFrame, Dict[Hashable, pd.DataFrame]]
)

We can keep ref_recos a simple pd.DataFrame to calculate intersections with one algorithm.
For multiple intersection calculations we can pass multiple models recommendations in a dict:
ref_recos = {"one": ref_recos_one, "two": ref_recos_two}
The result dict from calc_metrics will have intersection_one and intersection_two keys if ref_recos is a dict (the keys from metrics and ref_recos are merged).

# cross_validate
cv_results = cross_validate(
    dataset=dataset,
    splitter=splitter,
    models=models,
    metrics=metrics,
    k=10,
    filter_viewed=True,
    ref_models=["one", "two"], # here we just select keys from `models` argument
    validate_ref_models=False  # optionally exclude ref_models from other metrics calculation
)
