ra312 / personalization

A project to create personalised recommendation for restaurants

License: Apache License 2.0

Python 8.59% Jupyter Notebook 91.10% Dockerfile 0.19% Shell 0.12%
machine-learning-algorithms search

personalization


An end-to-end demo machine learning pipeline to provide an artifact for a real-time inference service

Aim

We want machine learning training code that, given data, trains a model and saves it as an artifact.

Solution

Our implementation is the package 'personalization'. We chose Polars to read the data: it is roughly 2-3 times faster than Pandas and offers a convenient API for aggregations and feature creation. For the model, we chose LightGBM for its speed, small size (a model artifact of up to 50 MB on 300 million rows of search data) and explainability. The user should choose LightGBM parameters carefully. We tested an example set of LightGBM parameters in notebooks/train.ipynb.
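
LightGBM's ranking objectives, such as the lambdarank objective used in the run command, need to know which rows belong to the same query. This is passed as a list of group sizes over contiguous rows. A minimal sketch of building that list, assuming (this is not confirmed by the package) that one session corresponds to one query and that rows are already sorted by session id; `query_group_sizes` and `session_ids` are hypothetical names:

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def query_group_sizes(session_ids: Sequence[Hashable]) -> List[int]:
    """Return the length of each consecutive run of equal session ids.

    Assumes rows are already ordered so that all rows of one session
    are adjacent; a session split into two runs is counted twice.
    """
    return [sum(1 for _ in run) for _, run in groupby(session_ids)]


# Hypothetical example: three sessions with 2, 1 and 3 candidate venues.
example_sizes = query_group_sizes(["a", "a", "b", "c", "c", "c"])
```

A list like this could then be supplied as the `group` argument of `lightgbm.Dataset`.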

Offline evaluation

The offline evaluation was done in notebooks/train.ipynb. Our model shows a significant increase in NDCG across venues compared with the baseline.
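
For readers unfamiliar with the metric, here is a minimal sketch of NDCG at a cutoff k for a single ranked list, using the common exponential-gain formulation. It is an illustration only, not the evaluation code from the notebook; the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        # No relevant items at all: define NDCG as 0 to avoid dividing by zero.
        return 0.0
    return dcg_at_k(relevances, k) / ideal
```

Here `relevances` holds the true relevance labels in the order the model ranked the items, so a ranking that places the most relevant items first scores 1.0.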

CI/CD: code style and PyPI

The code is checked with pre-commit hooks, tested, and published via GitHub Actions. Current coverage is around 80 percent.

The inference service code can be found at https://github.com/ra312/model-server

How to run

  1. Obtain sessions.csv and venues.csv and move them to the root folder
  2. Install personalization
    python -m pip install personalization
  3. Run the following command in shell to train the pipeline and get the artifact:
python3 -m personalization \
    --sessions-bucket-path sessions.csv \
    --venues-bucket-path venues.csv \
    --objective lambdarank \
    --num_leaves 100 \
    --min_sum_hessian_in_leaf 10 \
    --metric ndcg --ndcg_eval_at 10 20 \
    --learning_rate 0.8 \
    --force_row_wise True \
    --num_iterations 10 \
    --trained-model-path trained_model.joblib
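
If you prefer to launch training from Python, for example from a scheduler or a test, the same command can be assembled as an argument list. This is a sketch that only reuses the flags shown above; the helper name `build_training_command` and its parameters are hypothetical and not part of the package.

```python
import sys
from typing import List


def build_training_command(
    sessions_path: str,
    venues_path: str,
    model_path: str,
    python: str = sys.executable,
) -> List[str]:
    """Assemble the training command shown above as an argument list."""
    return [
        python, "-m", "personalization",
        "--sessions-bucket-path", sessions_path,
        "--venues-bucket-path", venues_path,
        "--objective", "lambdarank",
        "--num_leaves", "100",
        "--min_sum_hessian_in_leaf", "10",
        "--metric", "ndcg",
        "--ndcg_eval_at", "10", "20",
        "--learning_rate", "0.8",
        "--force_row_wise", "True",
        "--num_iterations", "10",
        "--trained-model-path", model_path,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with the file paths.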

TODO

Next steps:

  1. Scalability (e.g. use Flyte)
  2. Data: add support for ingesting sessions and venues data from a database
  3. Versioning: add MLflow integration

Development

  • Clone this repository
  • Requirements:
  • Create a virtual environment and install the dependencies
poetry install
  • Activate the virtual environment
poetry shell

Testing

pytest

Pre-commit

Pre-commit hooks run all the auto-formatters (e.g. black, isort), linters (e.g. mypy, flake8), and other quality checks to make sure the changeset is in good shape before a commit/push happens.

You can install the hooks with (runs for each commit):

pre-commit install

Or if you want them to run only for each push:

pre-commit install -t pre-push

Or, if you want to run all checks manually for all files:

pre-commit run --all-files

This project was generated using the wolt-python-package-cookiecutter template.
