Movie Recommender

This document describes the Movie Recommender API and its features.

It was developed on a machine with sbt 1.8.2, Scala 2.13.7 and Java 1.8.0_351.

Core functionality

The service comprises two parts:

  • A movie recommendation service which generates a recommendations db - built in Spark
  • The API service which exposes a RESTful API to serve these recommendations - built in Cats IO

The API depends on db.json, which the Spark job generates, to serve results.

To save time, the recommendations are read from a raw JSON file. This is not suitable for production; to scale this solution, more components (Redis, for example) would need to be brought into the fold.
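As a rough illustration of the file-backed lookup, here is a minimal sketch in plain Python rather than the project's Scala code. It assumes db.json is a JSON array of records shaped like the API responses shown later in this README; the actual file layout is not documented here, and `build_index` and `lookup` are hypothetical names.

```python
import json

# Assumption: db.json is a JSON array of records shaped like the API responses,
# e.g. {"id": 2, "recommended": [15, 113, 58], "relevance": [1, 2, 3]}.
SAMPLE_DB = '[{"id": 2, "recommended": [15, 113, 58], "relevance": [1, 2, 3]}]'


def build_index(raw_json):
    """Parse the file contents once and index the records by movie id."""
    return {record["id"]: record for record in json.loads(raw_json)}


def lookup(index, movie_id):
    """Return the stored record for movie_id, or None if the id is unknown."""
    return index.get(movie_id)


index = build_index(SAMPLE_DB)
print(lookup(index, 2))    # the record for movie 2
print(lookup(index, 999))  # None, because the id is not in the file
```

Loading the file once into a dictionary keeps each request to a single in-memory lookup, which is also the access pattern a key-value store such as Redis would provide.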

How often the movie database is updated would determine how often we need to run the Spark job, as it only needs to run periodically. Real-time recommendations for new additions would be a different problem, requiring a streaming hashable lookup solution or similar.

To run the tests

From the movie-recommendation-system root, run:

  • make test

This runs all the service tests via sbt test.

Build steps

Generate Recommendations

First, we need to generate the recommendations from the metadatas.json file.

N.B. Spark can be quite picky about Java/Scala versions, so we run it in Docker.

To run the recommendations job, from movie-recommendation-system root run:

  • make build_recs_docker

which will attempt to run the job in Docker.

If you wish to run it on your local machine instead:

  • make build_recs

This writes the recommendations to db.json via sbt build_recs.

Notes: This is written as a Spark ML pipeline using feature hashing and a similarity matrix
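To make the idea concrete, here is a minimal sketch of feature hashing followed by an all-pairs Euclidean comparison. It is plain Python, not the Spark ML pipeline itself. The bucket count, the CRC32 hash (a stand-in for whatever hash the real pipeline uses), the token lists and the function names are all assumptions made for illustration.

```python
import math
import zlib

NUM_BUCKETS = 16  # hypothetical vector size; the real pipeline's size is not stated


def bucket(token):
    """Map a token to a bucket with a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(token.encode("utf-8")) % NUM_BUCKETS


def hash_features(tokens):
    """Feature hashing: count the tokens that fall into each bucket."""
    vector = [0.0] * NUM_BUCKETS
    for token in tokens:
        vector[bucket(token)] += 1.0
    return vector


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(movies, k=3):
    """Compare every movie with every other one and keep the k closest ids."""
    vectors = {movie_id: hash_features(tokens) for movie_id, tokens in movies.items()}
    recommendations = {}
    for movie_id, vector in vectors.items():
        distances = [
            (euclidean(vector, other_vector), other_id)
            for other_id, other_vector in vectors.items()
            if other_id != movie_id
        ]
        recommendations[movie_id] = [other_id for _, other_id in sorted(distances)[:k]]
    return recommendations


# Hypothetical metadata tokens; the fields in metadatas.json are not documented here.
movies = {
    1: ["crime", "mafia", "drama"],
    2: ["crime", "mafia", "drama"],
    3: ["musical", "fantasy"],
}
print(recommend(movies, k=1)[1])  # [2]: identical tokens give distance 0
```

The nested loop is the n^2 comparison: every movie is measured against every other movie, which is simple but grows quadratically with the catalogue size.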

In a real production environment we would not use a text file for the lookup, but most likely a key-value store such as Redis. Places where this has affected design choices are outlined in the code.

Spark was chosen because it lets us scale appropriately. From experience, even with a relatively naive algorithm such as the n^2 Euclidean comparison we've chosen, a 20-node cluster can deliver recommendations for 6 million items in about 2.5 hours.

The method we've used could be made faster and more computationally feasible for huge datasets by using locality-sensitive hashing or another probabilistic data structure / join algorithm.
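One way this could look is sketched below: random-projection hashing for Euclidean distance, written in plain Python rather than Spark. Vectors that land in the same bucket become candidate pairs, and only those pairs would then need an exact distance check. The bucket width, number of hashes, seed and function names are assumptions, not values taken from the project.

```python
import math
import random
from collections import defaultdict


def make_projections(dim, num_hashes, seed=0):
    """Draw random unit-length directions to project vectors onto."""
    rng = random.Random(seed)
    projections = []
    for _ in range(num_hashes):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in direction))
        projections.append([x / norm for x in direction])
    return projections


def signature(vector, projections, bucket_width):
    """Hash a vector: one bucket index per projection, combined into a tuple."""
    return tuple(
        math.floor(sum(p * x for p, x in zip(projection, vector)) / bucket_width)
        for projection in projections
    )


def candidate_pairs(vectors, projections, bucket_width):
    """Group ids by signature and emit only the pairs that share a bucket."""
    buckets = defaultdict(list)
    for item_id, vector in vectors.items():
        buckets[signature(vector, projections, bucket_width)].append(item_id)
    pairs = set()
    for ids in buckets.values():
        for i, first in enumerate(ids):
            for second in ids[i + 1:]:
                pairs.add((min(first, second), max(first, second)))
    return pairs


# Hypothetical feature vectors: 1 and 2 are identical, 3 is far away.
vectors = {1: [1.0, 2.0, 3.0], 2: [1.0, 2.0, 3.0], 3: [40.0, -25.0, 9.0]}
projections = make_projections(dim=3, num_hashes=4, seed=42)
print(candidate_pairs(vectors, projections, bucket_width=2.0))
```

Nearby vectors are likely, though not guaranteed, to share a signature, so the result is approximate: some true neighbours can be missed in exchange for skipping most of the n^2 comparisons.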

Run the API

To run the recommendations API, from movie-recommendation-system root run:

  • make run_server

This starts the web server on http://0.0.0.0:8080 via sbt run_server.

Endpoints

The API is serviced by the following RESTful endpoints:

/ - GET

/<int:movie_id> - returns 3 recommendations and their relevance for the given movie_id

Example GET requests:

  • curl -X GET http://0.0.0.0:8080/2

    Returns {"id":2,"recommended":[15,113,58],"relevance":[1,2,3]}

    So if you like The Godfather, you may like Casino, The Godfather II and Goodfellas (IDs 15, 113 and 58). Pretty good!

  • curl -X GET http://0.0.0.0:8080/140

    Returns {"id":20,"recommended":[31,45,128],"relevance":[1,2,3]}

    So if you like The Exorcist, you may like The Shining, The Thing and The Help.

  • curl -X GET http://0.0.0.0:8080/1

    Returns {"id":1,"recommended":[58,105,124],"relevance":[1,2,3]}

    So if you like The Shawshank Redemption, you may like Goodfellas, The Help and Dogville.

  • curl -X GET http://0.0.0.0:8080/96

    Returns {"id":96,"recommended":[70,6,24],"relevance":[1,2,3]}

    So if you like The Wizard of Oz, you may like The Princess Bride, The General and The Message.
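For callers that prefer code over curl, here is a small client sketch in Python using only the standard library. It assumes the server is running on the address above and that relevance is a rank where 1 is the closest match; the README does not state this explicitly. `fetch` and `ranked_recommendations` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8080"  # address taken from this README


def fetch(movie_id, base_url=BASE_URL):
    """GET /<movie_id> and return the raw JSON body as text."""
    with urllib.request.urlopen(f"{base_url}/{movie_id}", timeout=5) as response:
        return response.read().decode("utf-8")


def ranked_recommendations(body):
    """Return the recommended ids ordered by relevance (assumed: 1 = best)."""
    payload = json.loads(body)
    pairs = zip(payload["relevance"], payload["recommended"])
    return [movie_id for _, movie_id in sorted(pairs)]


# Parsing the first example response above; no running server is needed for this.
sample = '{"id":2,"recommended":[15,113,58],"relevance":[1,2,3]}'
print(ranked_recommendations(sample))  # [15, 113, 58]
```

With the server running, `ranked_recommendations(fetch(2))` would perform the same request as the first curl example.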

TODOS

There are some TODOs still in the code which would be done if given more time. Exception handling could be cleaner in some cases.

The Spark code needs testing. Building the test infrastructure around Spark is quite tedious, so it has been left for now. Snapshot tests would probably fit best, run over a small data input with a low level of parallelism so as not to slow down CI.
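The snapshot idea can be sketched as follows, in plain Python rather than against Spark. `run_job` is a hypothetical stand-in for the recommendations job run over a tiny input, and the snapshot file name is made up for the example.

```python
import json
import tempfile
from pathlib import Path


def run_job(movies):
    """Hypothetical stand-in for the recommendations job on a tiny input."""
    ids = sorted(movies)
    return [{"id": i, "recommended": [j for j in ids if j != i]} for i in ids]


def check_snapshot(result, snapshot_path):
    """Write the snapshot on the first run; afterwards report whether the output still matches it."""
    if not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(result, indent=2, sort_keys=True))
        return True
    return json.loads(snapshot_path.read_text()) == result


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "recs.snapshot.json"
    first_run = check_snapshot(run_job({1: "a", 2: "b"}), path)          # records the snapshot
    unchanged = check_snapshot(run_job({1: "a", 2: "b"}), path)          # same output, passes
    changed = check_snapshot(run_job({1: "a", 2: "b", 3: "c"}), path)    # output differs, fails
    print(first_run, unchanged, changed)
```

In a real setup the snapshot file would be committed alongside the tests, so any change in the job's output shows up as a reviewable diff.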

More requirements would need to be gathered to make progress on the others. Educated guesses/assumptions have been made where possible.

Contributors

tjadunn
