Building a Recommender System

1. Summary

The goal of this project is to build a movie recommendation engine for users. There are many ways to build such an engine: recommendations can be based on user habits, on movie content, or on genres.

To build the engine, I used a movie dataset from Kaggle. It is a detailed dataset about movies, users, and the ratings users gave to each movie. Depending on the scale of the project, smaller or larger versions of the dataset can be used.

To evaluate the model, I did not use traditional metrics; I manually checked the suggestions returned for each movie. Because the model is not supervised, evaluation was harder, and a logical interpretation of the suggestions played a bigger role.

When building the model, I was also constrained by the limitations of the data provided on Kaggle.com. For instance, because of the way the data is stored, a large amount of time went into reshaping the tables.

2. Data

The data I used consists of the CSV files below, all available from the Kaggle dataset mentioned above.

  • credits: Contains cast and crew information for all movies in the movies_metadata.csv file.
  • keywords: Contains plot keywords for all movies in the movies_metadata.csv file.
  • links: Contains IMDB and TMDB IDs of all movies featured in the ratings.csv file (about 45,000 movies).
  • links_small: Contains IMDB and TMDB IDs of the smaller subset of movies used in ratings_small.csv.
  • movies_metadata: Contains the main metadata for about 45,000 movies, such as title, overview, genres, release date, vote average, and vote count.
  • movies: CSV file generated by the second notebook.
  • ratings: Contains the full set of ratings that users gave to the movies.
  • ratings_small: Contains 100,000 ratings from 700 users on 9,000 movies. It is a subset of the ratings available in the Full MovieLens dataset.
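Much of the reshaping mentioned in the summary comes from columns that are stored as text. In movies_metadata.csv, for example, the genres column holds a stringified Python list of dictionaries rather than plain genre names. The sketch below, using only the standard library, shows one way to turn such a value into a list of names and then into a "long" table with one (title, genre) pair per row. The helper name parse_genres and the illustrative row are hypothetical, not the author's code.

```python
import ast


def parse_genres(raw):
    """Convert a stringified list of {'id': ..., 'name': ...} dicts into genre names.

    Missing values (e.g. NaN read by pandas), empty strings and text that is
    not a valid Python literal all yield an empty list.
    """
    if not isinstance(raw, str) or not raw.strip():
        return []
    try:
        items = ast.literal_eval(raw)  # safely evaluates literals only, never arbitrary code
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(items, list):
        return []
    return [item["name"] for item in items if isinstance(item, dict) and "name" in item]


# Illustrative row shaped like movies_metadata.csv (hypothetical sample)
sample = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]"
rows = [{"title": "Toy Story", "genres": sample}]

# "Long" format: one (title, genre) pair per row, convenient for per-genre rankings
long_rows = [
    {"title": row["title"], "genre": genre}
    for row in rows
    for genre in parse_genres(row["genres"])
]
print(long_rows)
```

With pandas, the same helper can be applied to the whole column with `Series.apply` before exploding it into one row per genre.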

3. Recommender Systems

3.1 Basic Recommender

IMDB ranks movies with a weighted rating formula. It gives a fairer picture of how well a movie is rated because it also takes the number of votes into account: a movie with only a few votes is pulled toward the overall mean rating.

$$
WR = \frac{v}{v+m}\,R + \frac{m}{v+m}\,C
$$

  • R = average (mean) rating of the movie
  • v = number of votes for the movie
  • m = minimum votes required to be listed in the Top 50 (currently 1000)
  • C = the mean vote across the whole dataset
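To make the effect of the formula concrete, here is a small sketch of it as a function. The threshold m = 1000 comes from the text; the value C = 6.0 and the two example movies are hypothetical numbers chosen for illustration.

```python
def weighted_rating(R, v, m, C):
    """IMDB-style weighted rating: WR = (v / (v + m)) * R + (m / (v + m)) * C.

    R: the movie's mean rating, v: its number of votes,
    m: minimum votes required to be listed, C: mean vote across the dataset.
    """
    if v < 0 or m < 0 or v + m == 0:
        raise ValueError("v and m must be non-negative and not both zero")
    return (v / (v + m)) * R + (m / (v + m)) * C


m = 1000  # threshold quoted in the text
C = 6.0   # hypothetical dataset-wide mean vote

popular = weighted_rating(R=8.0, v=3000, m=m, C=C)  # 0.75 * 8.0 + 0.25 * 6.0 = 7.5
niche = weighted_rating(R=9.5, v=50, m=m, C=C)      # pulled strongly toward C
print(popular, round(niche, 2))
```

The movie with a 9.5 average but only 50 votes scores about 6.17, well below the 8.0 movie with 3,000 votes (7.5). As v grows, the first term dominates and WR approaches R; with no votes at all, WR equals C.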

Using this formula, we can calculate a weighted rating for each movie. After filtering the movies by genre and sorting them from the highest to the lowest WR, we end up with the top-rated movies per genre.
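A minimal sketch of the per-genre ranking described above, assuming each movie is a dict with hypothetical keys title, genres (a list of names), vote_average (R) and vote_count (v). Following the definition of m, movies with fewer than m votes are left out; C is computed as the mean of vote_average over all movies. The sample movies are made up.

```python
from statistics import mean


def top_movies_by_genre(movies, genre, m=1000, n=10):
    """Rank the movies of one genre by IMDB-style weighted rating (highest first)."""
    C = mean(mv["vote_average"] for mv in movies)  # mean vote across the whole dataset

    def wr(mv):
        v, R = mv["vote_count"], mv["vote_average"]
        return (v / (v + m)) * R + (m / (v + m)) * C

    # Keep only movies of the requested genre that have at least m votes
    candidates = [mv for mv in movies if genre in mv["genres"] and mv["vote_count"] >= m]
    ranked = sorted(candidates, key=wr, reverse=True)
    return [(mv["title"], round(wr(mv), 2)) for mv in ranked[:n]]


movies = [
    {"title": "Movie A", "genres": ["Drama"], "vote_average": 8.0, "vote_count": 3000},
    {"title": "Movie B", "genres": ["Drama"], "vote_average": 9.0, "vote_count": 1000},
    {"title": "Movie C", "genres": ["Drama"], "vote_average": 8.5, "vote_count": 200},
    {"title": "Movie D", "genres": ["Comedy"], "vote_average": 6.0, "vote_count": 4000},
    {"title": "Movie E", "genres": ["Drama", "Comedy"], "vote_average": 3.5, "vote_count": 1000},
]
print(top_movies_by_genre(movies, "Drama"))
# [('Movie B', 8.0), ('Movie A', 7.75), ('Movie E', 5.25)]
```

Here C is 7.0. "Movie C" has a higher average than "Movie A" but does not appear, because 200 votes is below the threshold m.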

3.2 Content Based Recommender

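The second recommender is content-based and uses the movie descriptions. I converted each description into a TF-IDF vector, which weights each word by how often it appears in that description and how rare it is across all descriptions. Movies can then be recommended based on how similar their description vectors are.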
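The following is a minimal, standard-library sketch of a TF-IDF content-based recommender. The text names TF-IDF but not the similarity measure; cosine similarity is assumed here because it is the usual pairing. The function names, the tiny stop-word list and the example descriptions are all hypothetical. In practice, scikit-learn's `TfidfVectorizer` builds the TF-IDF matrix directly.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list)
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    tf  = number of times the term occurs in the document
    idf = log(N / df) + 1, where df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(titles, descriptions, title, k=5):
    """Return the k titles whose descriptions are most similar to the given title's."""
    vectors = tfidf_vectors(descriptions)
    i = titles.index(title)
    scored = [
        (cosine(vectors[i], vec), other)
        for j, (other, vec) in enumerate(zip(titles, vectors))
        if j != i
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


titles = ["Space Rescue", "Galaxy Drift", "Kitchen Wars", "Star Pilots"]
descriptions = [
    "An astronaut is stranded in space and a crew attempts a daring rescue.",
    "A lonely astronaut drifts through space after a ship accident.",
    "Two rival chefs compete in a cooking contest.",
    "Young pilots train to fly starships and defend space colonies.",
]
print(recommend(titles, descriptions, "Space Rescue", k=2))
```

"Galaxy Drift" ranks first because it shares both "astronaut" and "space" with "Space Rescue"; "Star Pilots" shares only "space", which appears in three descriptions and therefore gets a lower IDF weight; "Kitchen Wars" shares no words after stop-word removal.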

This approach gives more fine-grained recommendations than simply suggesting top movies from the same genre. Assuming the descriptions are summaries that capture each movie's key concepts, they give a better sense of how similar two movies actually are.
