This document describes the Movie Recommender API and its features.
It was developed on a machine with sbt 1.8.2, Scala 2.13.7, and Java 1.8.0_351.
The service comprises two parts:
- A movie recommendation service which generates a recommendations db - built in Spark
- The API service which exposes a RESTful API to serve these recommendations - built in Cats IO
The API depends on the db.json file generated by the Spark job in order to serve results.
To save time, the recommendations are read from a raw JSON file. This is obviously not suitable for production, and in order to scale this solution more components (Redis, for example) would need to be brought into the fold.
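For illustration only, the db.json lookup could take a shape mirroring the API responses, one record per movie (the exact schema here is an assumption, not the file's actual layout):

```json
[
  {"id": 2, "recommended": [15, 113, 58], "relevance": [1, 2, 3]},
  {"id": 96, "recommended": [70, 6, 24],  "relevance": [1, 2, 3]}
]
```

A flat array like this is cheap to load into memory at API startup, which is part of why it doesn't scale: every instance must hold and reload the whole dataset.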
How frequently the movie database is updated would also determine how often we need to run the Spark job, as it only needs to be run periodically. Real-time recommendations for new additions would be a different problem, requiring a streaming hashed-lookup solution or similar.
From the movie-recommendation-system root, run:
make test
This will run all the service tests via sbt test
First we need to generate the recommendations from the metadatas.json file.
N.B.: Spark can be quite picky about Java/Scala versions, so we run it in Docker.
To run the recommendations job, from the movie-recommendation-system root run:
make build_recs_docker
which will build and run the job in Docker.
If you wish to run it on your local machine instead:
make build_recs
This will output a db.json of recommendations via sbt build_recs.
Notes: This is written as a Spark ML pipeline using feature hashing and a similarity matrix
In a real production environment we would not be using a text file for the lookup, but most likely a key-value store like Redis. Places where this has impacted design choices are outlined in the code.
Spark was chosen because we can scale this appropriately - from experience, even using a relatively naive algorithm such as the n^2 Euclidean comparison we've chosen, a 20-node cluster can deliver recommendations for 6 million items in about 2.5 hours.
The method we've used could be made faster and more computationally feasible for huge datasets by using locality-sensitive hashing or another probabilistic data structure / join algorithm.
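The actual job is a Scala/Spark ML pipeline, but the naive approach described above (feature-hash each movie's attributes into a fixed-width vector, then do a pairwise n^2 Euclidean comparison and keep the nearest neighbours) can be sketched in a few lines. This is a toy illustration only; the function names and data below are made up, not the pipeline's real code:

```python
import hashlib
import math

def hash_features(tokens, dim=16):
    """Feature hashing: fold a variable-size bag of tokens into a fixed-size vector."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        # Signed hashing: the sign bit reduces the bias introduced by bucket collisions.
        sign = 1.0 if (h // dim) % 2 == 0 else -1.0
        vec[h % dim] += sign
    return vec

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_recommendations(movies, k=3):
    """Naive n^2 comparison: for each movie, rank every other movie by distance."""
    vecs = {mid: hash_features(toks) for mid, toks in movies.items()}
    return {
        mid: sorted((m for m in vecs if m != mid),
                    key=lambda m: euclidean(v, vecs[m]))[:k]
        for mid, v in vecs.items()
    }

# Illustrative data: movie id -> bag of feature tokens (genres, keywords, ...)
movies = {
    1: ["crime", "drama", "prison"],
    2: ["crime", "drama", "mafia"],
    3: ["comedy", "romance"],
    4: ["crime", "mafia", "heist"],
}
print(top_k_recommendations(movies))
```

Every pair of movies is compared, which is what makes the job n^2 and why locality-sensitive hashing (which only compares items landing in the same hash bucket) would cut the work for huge datasets.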
To run the recommendations API, from the movie-recommendation-system root run:
make run_server
Starts up the web server on http://0.0.0.0:8080 via sbt run_server.
The API is serviced by the following RESTful endpoints:
/<int:movie_id>
- returns 3 recommendations and their relevance for a given movie_id
Example GET requests:

- curl -X GET http://0.0.0.0:8080/2
  Returns {"id":2,"recommended":[15,113,58],"relevance":[1,2,3]}
  So if you like The Godfather, you may like Casino, The Godfather II and Goodfellas (ids 15, 113 and 58). Pretty good!
- curl -X GET http://0.0.0.0:8080/20
  Returns {"id":20,"recommended":[31,45,128],"relevance":[1,2,3]}
  So if you like The Exorcist, you may like The Shining, The Thing and The Help.
- curl -X GET http://0.0.0.0:8080/1
  Returns {"id":1,"recommended":[58,105,124],"relevance":[1,2,3]}
  So if you like The Shawshank Redemption, you may like Goodfellas, The Help and Dogville.
- curl -X GET http://0.0.0.0:8080/96
  Returns {"id":96,"recommended":[70,6,24],"relevance":[1,2,3]}
  So if you like The Wizard of Oz, you may like The Princess Bride, The General and The Message.
There are some TODOs still in the code which would be addressed given more time - exception handling could be cleaner in some cases.
The Spark code needs testing - building the test infrastructure around Spark is quite tedious, so it has been left. Snapshot tests would probably fit best, run over a small data input with a low level of parallelism so as not to slow down CI.
More requirements would need to be gathered in order to progress on the others. Educated guesses/assumptions have been made where possible.