rec-engine-cs205l-w19's Introduction

Movie Recommendation Engine

Stanford CS205L WIN 2018-2019

Major course project:

Creation and Analysis of Movie Recommendation Engine Using Continuous Mathematical Methods

Background

Recommendation systems are all around us in the modern world. A recommendation system is any system that attempts to predict a user's preferences and suggest a product for them to consume. These systems are a specialized subset of information filtering systems and are increasingly important for predicting users' preferences across a variety of content, including movies, books, games, and other products. Common examples include Spotify's "Made for you" playlist, Amazon's "Recommendations for you" and "Customers who shopped for ... also shopped for ..." product suggestions, and Netflix's recommendations and "Because you watched ..." suggestions.

Although the rapid collection of massive datasets has made these systems ubiquitous, they are still far from optimized. Current recommender systems take three main approaches:

  1. Collaborative Filtering - predictions based on the preferences of similar users
  2. Content-Based Filtering - predictions based on the same user's past preferences for similar content
  3. Hybrid Recommender Systems - a combination of collaborative and content-based filtering
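
As a rough illustration of the first approach, the sketch below predicts a rating with user-based collaborative filtering on a toy dataset. The ratings, the function names `cosine_similarity` and `predict_rating`, and the choice of cosine similarity over co-rated movies are all assumptions made for illustration; they are not taken from this project's code.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: rating}. A missing key means "not rated".
ratings = {
    "alice": {"Alien": 5.0, "Heat": 4.0, "Up": 1.0},
    "bob": {"Alien": 4.0, "Heat": 5.0, "Up": 2.0, "Big": 4.0},
    "carol": {"Alien": 1.0, "Up": 5.0, "Big": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users over the movies both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        # Skip the target user and anyone who has not rated the movie.
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += sim
    if weight_total == 0.0:
        return None  # no usable neighbours for this movie
    return weighted_sum / weight_total


# alice has not rated "Big"; bob (similar taste) gave it 4, carol (dissimilar) gave it 2.
prediction = predict_rating(ratings, "alice", "Big")
```

In this toy data the prediction lands closer to bob's rating than to carol's, because bob's past ratings agree more closely with alice's.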

This project will use a collaborative filtering approach to predict a given user's movie preferences based on their own movie ratings and the ratings of other users. We will frame the movie recommendation problem as a matrix problem. Given the high cardinality of both the movie list and the user list, the matrix will be sparse (most user-movie pairs have no rating). This will provide a good opportunity to apply techniques learned in class, including dimensionality reduction and other applications of the singular value decomposition (SVD). To evaluate the predictive accuracy of our model, we plan to use the sum of squared errors between predicted and given ratings as the performance metric.
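
To make the evaluation metric concrete, here is a minimal sketch that stores only the observed ratings and computes the sum of squared errors over them. The dictionary layout, the toy values, and the global-mean baseline are assumptions for illustration; a fitted low-rank model would take the place of `baseline` in practice.

```python
# Hypothetical observed ratings stored sparsely: (user_index, movie_index) -> rating.
# Only rated entries are stored; every other entry of the matrix is unknown.
observed = {
    (0, 0): 5.0,
    (0, 2): 3.0,
    (1, 1): 4.0,
    (2, 0): 1.0,
}


def sum_of_squared_errors(observed, predict):
    """Sum of (predicted - given)^2 over the observed entries only."""
    return sum((predict(u, m) - r) ** 2 for (u, m), r in observed.items())


def global_mean_predictor(observed):
    """Baseline model that predicts the mean observed rating for every pair."""
    mean = sum(observed.values()) / len(observed)
    return lambda user, movie: mean


baseline = global_mean_predictor(observed)
sse = sum_of_squared_errors(observed, baseline)
```

Restricting the sum to observed entries matters: unrated pairs have no given rating to compare against, so treating them as zeros would distort the metric.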

Challenges

The primary challenge is that the user × movie ratings matrix (A) is very large and sparse: only ~0.2% of its entries are non-zero, yet the total size of the matrix is ~100GB. To overcome this, we will try to perform all required computation with matrix methods that avoid materializing the complete ratings matrix and instead compute one row or column vector at a time.
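
One way to picture the row-at-a-time idea is sketched below, assuming the model is held as two small factor matrices rather than as a full ratings matrix. The factor values, the names `dense_size_bytes` and `predict_row`, and the 8-bytes-per-entry figure are hypothetical and are not drawn from this project's implementation.

```python
def dense_size_bytes(n_users, n_movies, bytes_per_entry=8):
    """Memory needed to store every entry of a dense ratings matrix."""
    return n_users * n_movies * bytes_per_entry


def predict_row(user_factors, movie_factors, user):
    """Predicted ratings for one user: a single row of U @ V^T, computed on demand."""
    u = user_factors[user]
    return [sum(a * b for a, b in zip(u, v)) for v in movie_factors]


# Hypothetical rank-2 factors for 3 users and 4 movies.
user_factors = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
movie_factors = [[4.0, 1.0], [2.0, 2.0], [0.0, 1.0], [3.0, 0.5]]

# Only one row of predictions is ever held in memory at a time.
row = predict_row(user_factors, movie_factors, user=1)
```

With rank-k factors, storage grows with (n_users + n_movies) * k instead of n_users * n_movies, which is why the full matrix never needs to be formed.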

Data

The data used in this project is from MovieLens. The data can be downloaded here. This page offers a detailed description of the data.

Software

The entire project was completed in Python 3.7.2_1 and Cython 0.29.6 using standard libraries including:

We also used the Surprise library for benchmarks (code found here).

Please see the bibliography in the formal report for academic resources. Other references are listed here:

Contact Info

Please reach out with any comments, questions, suggestions, ideas, or anything else.

License

This repository contains content created by third parties, which is distributed under the license provided by those parties. Content created by Annies Abduljaffar and Matt Vail is provided under the GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007.
