
hildar / recsys-retail


Recommender system in retail

Jupyter Notebook 90.48% Python 9.52%
alternating-least-squares catboost implicit lightgbm matrix-factorization nlp matplotlib numpy pandas

recsys-retail's Introduction

Two-layer Hybrid Recommender System for retail

About

Two-layer hybrid recommender system for retail. Layer 1 uses the implicit library for sparse data (KNN and ALS approaches). Layer 2 uses a ranking model built with CatBoost (gradient boosting). The ranking layer doubled the result compared to the baseline, as evaluated by a custom precision metric.

Stack:

  • 1st layer: implicit (ItemItemRecommender, ALS), sklearn, pandas, numpy, matplotlib
  • 2nd layer: CatBoost, LightGBM

Data: from Retail X5 Hero Competition

Steps:

  1. Prepare data: prefiltering
  2. Matching model (initialize the MainRecommender 1st-layer model as a baseline)
  3. Evaluate Top@k Recall
  4. Ranking model (choose 2-nd layer model)
  5. Feature engineering for ranking

Usage

Open the train.ipynb Jupyter notebook and follow it to see how the recommender system is built step by step.

The project consists of the following steps:

1. Prepare data

First, inspect the datasets and prefilter the data:

data
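A minimal sketch of what prefiltering can look like, in plain Python. The row layout `(user_id, item_id, quantity)`, the thresholds, and the function name `prefilter_items` are assumptions made for illustration and are not taken from the notebook. The idea shown here is to drop empty rows and keep only the most popular items, which shrinks the user-item matrix before matching.

```python
from collections import Counter


def prefilter_items(purchases, take_n_popular=3, min_quantity=1):
    """Keep only purchases of the `take_n_popular` best-selling items.

    `purchases` is a list of (user_id, item_id, quantity) tuples
    (a hypothetical layout chosen for this example).
    """
    # Drop rows whose quantity is below the threshold (e.g. empty rows).
    rows = [row for row in purchases if row[2] >= min_quantity]

    # Total quantity sold per item.
    popularity = Counter()
    for _, item_id, quantity in rows:
        popularity[item_id] += quantity

    # Keep only rows that refer to the most popular items.
    top_items = {item for item, _ in popularity.most_common(take_n_popular)}
    return [row for row in rows if row[1] in top_items]


purchases = [
    (1, "milk", 2), (1, "bread", 1), (2, "milk", 1),
    (2, "caviar", 1), (3, "bread", 3), (3, "gum", 0),
]
filtered = prefilter_items(purchases, take_n_popular=2)
```

With this toy data only the `milk` and `bread` rows survive; the rare item and the zero-quantity row are filtered out.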

2. Matching model

Train the first-layer model as a baseline. The MainRecommender class wraps two base models from the implicit library, ItemItemRecommender and AlternatingLeastSquares:

implicit

ALS is used to find similar users and similar items and to produce ALS recommendations. ItemItemRecommender is used to produce "own" recommendations, that is, items picked from the user's own purchases.
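The idea behind "own" recommendations can be illustrated without the implicit library. The sketch below is a plain-Python stand-in, not the MainRecommender code: it simply ranks each user's own purchases by total quantity. The function name `own_recommendations` and the row layout `(user_id, item_id, quantity)` are assumptions.

```python
from collections import Counter, defaultdict


def own_recommendations(purchases, user_id, n=5):
    """Return up to `n` items the user bought most, most frequent first."""
    counts = defaultdict(Counter)
    for uid, item_id, quantity in purchases:
        counts[uid][item_id] += quantity

    # Sort by descending quantity, then by item id for a deterministic order.
    ranked = sorted(counts[user_id].items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


purchases = [(1, "milk", 2), (1, "bread", 1), (1, "milk", 1), (2, "tea", 4)]
top_for_user_1 = own_recommendations(purchases, user_id=1, n=2)
```

A user with no purchase history gets an empty list here; in practice such users would need a fallback such as popular items.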

3. Evaluate Top@k Recall

For the first-layer model we use the Recall@k metric because it shows the proportion of real purchases that the recommendations cover. With this approach we can significantly cut the dataset size for the second-layer model.

Here we evaluate different types of recommendations:
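For reference, Recall@k for a single user can be sketched as follows. This is the common textbook definition (hits in the top k divided by the number of items actually bought); the notebook's own helper may differ in details, and the name `recall_at_k` is an assumption.

```python
def recall_at_k(recommended, bought, k=5):
    """Share of the actually bought items that appear in the top-k recommendations."""
    bought_set = set(bought)
    if not bought_set:
        return 0.0
    hits = bought_set & set(recommended[:k])
    return len(hits) / len(bought_set)


recommended = ["a", "b", "c", "d"]
bought = ["b", "d", "x"]
recall_2 = recall_at_k(recommended, bought, k=2)  # only "b" is in the top 2
recall_4 = recall_at_k(recommended, bought, k=4)  # "b" and "d" are in the top 4
```

Recall can only grow as k grows, which is why a large k is attractive for the first layer: it keeps most real purchases among the candidates while still discarding the bulk of the catalogue.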

types_recs

Then we select the optimal Recall value:

recall

4. Ranking model

In this step we build a new X_train dataset with a target based on purchases:

target
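One way to picture this dataset: every candidate produced by the first layer becomes a row, and the target is 1 if the user actually bought that item and 0 otherwise. The sketch below assumes candidates are given as a dict of lists and purchases as `(user_id, item_id)` pairs; both layouts and the name `build_ranking_dataset` are hypothetical.

```python
def build_ranking_dataset(candidates, purchases):
    """Label first-layer candidates with a binary purchase target.

    candidates: dict mapping user_id -> list of candidate item ids.
    purchases: iterable of (user_id, item_id) pairs bought in the ranking period.
    Returns a list of (user_id, item_id, target) rows.
    """
    bought = set(purchases)
    rows = []
    for user_id, items in candidates.items():
        for item_id in items:
            target = int((user_id, item_id) in bought)
            rows.append((user_id, item_id, target))
    return rows


candidates = {1: ["a", "b"], 2: ["c"]}
purchases = [(1, "b"), (2, "z")]
x_train_rows = build_ranking_dataset(candidates, purchases)
```

Purchases that were never proposed as candidates, such as `(2, "z")` here, do not appear in the dataset at all, so the first layer's recall caps what the ranking model can achieve.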

Here we choose a classifier from LightGBM and CatBoost and evaluate it by Precision@k on the test data. At this step the result is not yet impressive.
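The evaluation can be sketched as re-ranking each user's candidates by the classifier's predicted score and then computing Precision@k. The scores below are made-up numbers standing in for model output, and dividing by k is the standard definition, which may not match the project's custom precision metric exactly. The names `rerank` and `precision_at_k` are assumptions.

```python
def rerank(candidates, scores):
    """Order candidate items by descending predicted score."""
    pairs = sorted(zip(candidates, scores), key=lambda pair: -pair[1])
    return [item for item, _ in pairs]


def precision_at_k(recommended, bought, k=5):
    """Share of the top-k recommended items that were actually bought."""
    if k <= 0:
        raise ValueError("k must be positive")
    bought_set = set(bought)
    hits = sum(1 for item in recommended[:k] if item in bought_set)
    return hits / k


candidates = ["a", "b", "c"]
scores = [0.1, 0.9, 0.5]  # hypothetical classifier probabilities
reranked = rerank(candidates, scores)
precision_1 = precision_at_k(reranked, bought=["b"], k=1)
precision_2 = precision_at_k(reranked, bought=["b"], k=2)
```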

5. Feature engineering for ranking

We add new features for the ranking model based on user data, item data, and paired user-item data.

paired
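To make the three feature groups concrete, here is a small sketch that derives one user feature, one item feature, and two paired user-item features from raw purchases. The feature names and the row layout `(user_id, item_id, quantity)` are invented for illustration; the notebook's actual feature set is not reproduced here.

```python
from collections import Counter


def build_features(purchases, pairs):
    """Build simple user, item and paired features for candidate pairs.

    purchases: list of (user_id, item_id, quantity) tuples.
    pairs: list of (user_id, item_id) candidate pairs to describe.
    Returns one feature dict per pair, in the same order as `pairs`.
    """
    user_total = Counter()
    item_total = Counter()
    pair_total = Counter()
    for user_id, item_id, quantity in purchases:
        user_total[user_id] += quantity
        item_total[item_id] += quantity
        pair_total[(user_id, item_id)] += quantity

    features = []
    for user_id, item_id in pairs:
        user_qty = user_total[user_id]
        pair_qty = pair_total[(user_id, item_id)]
        features.append({
            "user_total_qty": user_qty,                # user-level feature
            "item_total_qty": item_total[item_id],     # item-level feature
            "pair_qty": pair_qty,                      # paired feature
            # Share of the user's purchases that went to this item.
            "pair_share_of_user": pair_qty / user_qty if user_qty else 0.0,
        })
    return features


purchases = [(1, "milk", 3), (1, "bread", 1), (2, "milk", 2)]
feature_rows = build_features(purchases, [(1, "milk"), (2, "bread")])
```

Paired features like `pair_share_of_user` describe the specific user-item relationship rather than either side alone, which is the kind of signal a ranking model cannot get from user-only or item-only columns.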

Controlling overfitting for CatBoost and cutting extra estimators:

catboost

The ranking model doubled the result compared to the baseline.

As we can see, the paired user-item features have the highest feature importance:

catfeature_importance
