miftahulridwan / temporal-performance-evaluation-cf-algorithms

Temporal performance evaluation of Collaborative Filtering Algorithms on Netflix Prize dataset.

Jupyter Notebook 100.00%
netflix-prize collaborative-filtering recommender-systems python

Temporal Performance Evaluation on Collaborative Filtering Algorithms

This project is a Master's thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Data Science and Society at Tilburg University. We evaluate four collaborative filtering (CF) algorithms (User-based k-NN, Item-based k-NN, Singular Value Decomposition (SVD), and SVD++) through a sequence of training and evaluation rounds on an expanding, dynamic dataset. The dataset is the Netflix Prize dataset, which is publicly available on Kaggle, and the results are specific to this dataset. The insights from this project can help businesses design or improve their recommender systems so that they provide better recommendations to customers, reducing exposure to information overload and, more broadly, improving overall quality of life. The full thesis text is provided in the root folder.

Directory Structure

  1. Dataset
  2. Figures

Abstract

Previous work on CF has mainly focused on comparing algorithms and on proposing new or enhanced algorithms that claim to outperform existing ones. The experimental setting usually starts by splitting the data into a training set and a test set; the algorithms are then trained and evaluated against the unseen test set. However, CF algorithms operate in a rather different setting once they are deployed in production. As the number of users and the business grow over time, the algorithms should be retrained periodically on all available data to provide users with up-to-date recommendations.

In this work, we evaluate CF algorithms on the Netflix Prize dataset by mimicking how they are deployed in production: we incrementally update the training and test sets, and iteratively train the algorithms and measure their performance over the course of the observation period. The algorithms are User-based k-NN, Item-based k-NN, SVD, and SVD++. With the update interval set to one month, we found that the performance of neither User-based k-NN nor Item-based k-NN overlaps with that of SVD and SVD++. However, we also found that SVD++ does not consistently outperform SVD, contrary to what the literature has suggested. Our work therefore indicates the need to perform a temporal evaluation before claiming that one algorithm outperforms another.
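
To make the monthly update loop concrete, here is a minimal sketch of an expanding-window evaluation. It is not the thesis code. It assumes that, at each step, the training set holds every rating dated before a given month and the test set holds the ratings from that month; the exact split used in the thesis may differ. The toy ratings, the function names, and the global-mean predictor (a stand-in for a real CF algorithm) are all hypothetical.

```python
from datetime import date
from math import sqrt

# Hypothetical toy ratings: (user_id, item_id, rating, rating_date).
RATINGS = [
    (1, 10, 4.0, date(2005, 1, 5)),
    (2, 10, 3.0, date(2005, 1, 20)),
    (1, 11, 5.0, date(2005, 2, 3)),
    (3, 10, 2.0, date(2005, 2, 14)),
    (2, 12, 4.0, date(2005, 3, 9)),
    (3, 11, 1.0, date(2005, 3, 30)),
]


def month_key(d):
    """Map a date to a sortable (year, month) tuple."""
    return (d.year, d.month)


def expanding_monthly_splits(ratings):
    """Yield (test_month, train, test) with a training set that grows each month.

    Assumption: train = all ratings before test_month, test = ratings in test_month.
    """
    months = sorted({month_key(r[3]) for r in ratings})
    for test_month in months[1:]:  # the first month has no earlier data to train on
        train = [r for r in ratings if month_key(r[3]) < test_month]
        test = [r for r in ratings if month_key(r[3]) == test_month]
        yield test_month, train, test


def global_mean_rmse(train, test):
    """RMSE of a placeholder model that always predicts the training mean rating."""
    mean = sum(r[2] for r in train) / len(train)
    return sqrt(sum((r[2] - mean) ** 2 for r in test) / len(test))


# One (month, train size, test size, RMSE) row per simulated system update.
results = [
    (month, len(train), len(test), global_mean_rmse(train, test))
    for month, train, test in expanding_monthly_splits(RATINGS)
]

for month, n_train, n_test, rmse in results:
    print(month, n_train, n_test, round(rmse, 4))
```

In a real run, the placeholder predictor would be replaced by fitting each CF algorithm on the training set and scoring its predictions on the test set, which yields one performance curve per algorithm over the observation period.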

Environment and Library

We initially planned to run our code in the Google Colaboratory environment. However, due to technical limitations and the long processing time required by some algorithms, we ran part of our experiments on a MacBook Pro with a 2.3 GHz 8-core Intel Core i9 and 16 GB of RAM. The code is written in Python and uses the following libraries:

  1. Pandas
  2. NumPy
  3. Matplotlib
  4. Tqdm
  5. Surprise
