GithubHelp home page GithubHelp logo

tnakae / irspack Goto Github PK

View Code? Open in Web Editor NEW

This project forked from tohtsky/irspack

0.0 1.0 0.0 758 KB

Train, evaluate, and optimize implicit feedback-based recommender systems.

Home Page: https://irspack.readthedocs.io/

License: MIT License

CMake 0.18% C++ 23.29% Python 76.34% Shell 0.19%

irspack's Introduction

irspack - Implicit recommender systems for practitioners

Python pypi GitHub license Build Read the Docs codecov

Docs

irspack is a Python package for recommender systems based on implicit feedback, designed to be used by practitioners.

Some of its features include:

  • Efficient parameter tuning enabled by C++/Eigen implementations of core recommender algorithms and optuna.
    • In particular, if an early stopping scheme is available, optuna can prune out unpromising trial based on the intermediate validation scores.
  • Various utility functions, including
    • ID/index mapping utilities
    • Fast, multithreaded argsort for batch recommendation retrieval
    • Efficient and configurable evaluation of recommender system performance

Installation & Optional Dependencies

In most cases, you can install the pre-build binaries via

pip install irspack

The binaries have been compiled to use AVX instruction. If you want to use AVX2/AVX512 or your environment does not support AVX (like Rosetta 2 on Apple M1), install it from source like

CFLAGS="-march=native" pip install git+https://github.com/tohtsky/irspack.git

In that case, you must have a decent version of C++ compiler (with C++11 support). If it doesn't work, feel free to make an issue!

Optional Dependencies

I have also prepared a wrapper class (BPRFMRecommender) to train/optimize BPR/warp loss Matrix factorization implemented in lightfm. To use it you have to install lightfm separately, e.g. by

pip install lightfm

If you want to use Mult-VAE, you'll need the following additional (pip-installable) packages:

Basic Usage

Step 1. Train a recommender

To begin with, we represent the user/item interaction as a scipy.sparse matrix. Then we can feed it into recommender classes:

import numpy as np
import scipy.sparse as sps
from irspack import IALSRecommender, df_to_sparse
from irspack.dataset import MovieLens100KDataManager

df = MovieLens100KDataManager().read_interaction()

# Convert pandas.Dataframe into scipy's sparse matrix.
# The i'th row of `X_interaction` corresponds to `unique_user_id[i]`
# and j'th column of `X_interaction` corresponds to `unique_movie_id[j]`.
X_interaction, unique_user_id, unique_movie_id = df_to_sparse(
  df, 'userId', 'movieId'
)

recommender = IALSRecommender(X_interaction)
recommender.learn()

# for user 0 (whose userId is unique_user_id[0]),
# compute the masked score (i.e., already seen items have the score of negative infinity)
# of items.
recommender.get_score_remove_seen([0])

Step 2. Evaluation on a validation set

To evaluate the performance of a recommenderm we have to split the dataset to train and validation sets:

from irspack.split import rowwise_train_test_split
from irspack.evaluation import Evaluator

# Random split
X_train, X_val = rowwise_train_test_split(
    X_interaction, test_ratio=0.2, random_state=0
)

evaluator = Evaluator(ground_truth=X_val)

recommender = IALSRecommender(X_train)
recommender.learn()
evaluator.get_score(recommender)

This will print something like

{
    'appeared_item': 435.0,
    'entropy': 5.160409123151053,
    'gini_index': 0.9198367595008214,
    'hit': 0.40084835630965004,
    'map': 0.013890322881619916,
    'n_items': 1682.0,
    'ndcg': 0.07867240014767263,
    'precision': 0.06797454931071051,
    'recall': 0.03327028758587522,
    'total_user': 943.0,
    'valid_user': 943.0
}

Step 3. Hyperparameter optimization

Now that we can evaluate the recommenders' performance against the validation set, we can use optuna-backed hyperparameter optimization.

best_params, trial_dfs  = IALSRecommender.tune(X_train, evaluator, n_trials=20)

# maximal ndcg around 0.43 ~ 0.45
trial_dfs["ndcg@10"].max()

Of course, we have to hold-out another interaction set for test, and measure the performance of tuned recommender against the test set.

See examples/ for more complete examples.

TODOs

  • more benchmark dataset

irspack's People

Contributors

k-tahiro avatar random1992 avatar tnakae avatar tohtsky avatar wararaki718 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.