Recommendation system using Alternating Least Squares (ALS) and Cosine Similarity on PySpark and Elasticsearch

spark-elasticsearch-recommendation

Theory

There are three main types of recommendation systems:

  • Content-Based Filtering
  • Collaborative Filtering
  • Hybrid

Content-Based Filtering

Recommendations are based on the attributes or characteristics of the items themselves. For example, to recommend songs, we look at the genre, duration, singer, and other attributes that describe each item.
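As a minimal sketch of this idea, the snippet below scores songs by how many attributes they share with a song the user already likes. The catalogue, the attribute names, and the helpers `attribute_overlap` and `recommend_similar` are hypothetical and only illustrate the principle; they are not part of this project's notebook.

```python
# Hypothetical catalogue: each song is described only by its own attributes.
songs = {
    "song_a": {"genre": "rock", "singer": "Singer 1", "decade": "1990s"},
    "song_b": {"genre": "rock", "singer": "Singer 1", "decade": "2000s"},
    "song_c": {"genre": "jazz", "singer": "Singer 2", "decade": "1990s"},
    "song_d": {"genre": "pop", "singer": "Singer 3", "decade": "2010s"},
}


def attribute_overlap(a, b):
    """Fraction of shared attribute keys on which two items agree."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)


def recommend_similar(liked, catalogue, top_n=2):
    """Rank all other items by attribute overlap with the liked item."""
    scores = [
        (attribute_overlap(catalogue[liked], attrs), name)
        for name, attrs in catalogue.items()
        if name != liked
    ]
    # Highest overlap first; ties broken alphabetically for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scores[:top_n]]


print(recommend_similar("song_a", songs))  # ['song_b', 'song_c']
```

No information about other users is needed here: only the item attributes and one liked item.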

Pros:

  • Requires less data.
  • It is not necessary to identify users with similar preferences.
  • It is less exposed to the cold start problem, a known issue in recommender systems: the algorithm cannot recommend items, or recommend to users, for which it does not have enough information. A new item can be recommended as soon as its attributes are known.

Cons:

  • Suffers from a lack of diversity: it can only recommend items that are strictly similar.
  • Depends on the item data being filled in correctly and on the systems being fed correctly.
  • If items have the same characteristics, they will be treated as equal.

Collaborative Filtering

Collaborative filtering analyzes the preferences of other users to make recommendations. It is divided into two types:

Memory Based

Memory-based methods compute similarity matrices between all users or between all items. Once this similarity is identified, it is possible to recommend new items.

There are several ways of computing similarity between vectors, such as Euclidean, Minkowski, and Jaccard measures, as well as cosine similarity, which is the cosine of the angle between two vectors.

Two vectors are most similar when the angle between them is 0°, where the cosine has a value of 1.
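A minimal sketch of cosine similarity in plain Python is shown below. The function name `cosine_similarity` and the choice to return 0.0 for a zero vector are assumptions made for this illustration, not something taken from the project's notebook.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined for a zero vector; 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return dot / (norm_u * norm_v)


# Parallel vectors (angle 0°) give 1, orthogonal vectors (angle 90°) give 0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the result depends only on the angle, two users who rate the same movies in the same proportions are treated as similar even if one of them rates consistently higher.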

For instance, consider a user-movie (or movie-user) interaction matrix, where each entry records an interaction between a user i and a movie j. In a real-world setting this matrix is extremely sparse, because the vast majority of movies receive very few or even no ratings from users.
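To make the notion of sparsity concrete, the sketch below stores only the observed ratings and computes the fraction of empty cells in the full user-movie matrix. The ratings, the identifiers, and the `sparsity` helper are made up for illustration.

```python
# Hypothetical observed ratings: (user, movie) -> rating.
# Any (user, movie) pair that is absent is an unobserved cell.
ratings = {
    ("u1", "m1"): 5.0,
    ("u1", "m3"): 3.0,
    ("u2", "m2"): 4.0,
    ("u3", "m1"): 1.0,
}


def sparsity(ratings, n_users, n_items):
    """Fraction of cells in the n_users x n_items matrix with no rating."""
    return 1.0 - len(ratings) / (n_users * n_items)


# 4 observed ratings in a 3 x 3 matrix leave 5 of the 9 cells empty.
print(sparsity(ratings, n_users=3, n_items=3))
```

The item count is passed in explicitly because a movie with no ratings at all never appears among the observed pairs, yet it still occupies a column of the matrix.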

With such a sparse matrix, which ML algorithms can be trained and still make reliable inferences? One solution is matrix factorization.

Matrix factorization decomposes a matrix into a product of matrices, here the user-movie interaction matrix into two smaller ones.

One matrix can be seen as the user matrix, where rows represent users and columns are attributes or characteristics (latent factors). The other matrix is the item matrix, where rows are attributes or characteristics and columns represent items.
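The shapes described above can be sketched as follows: a predicted rating is the dot product of a user's row with an item's column. The factor values here are invented for illustration, and `predict_all` is a hypothetical helper; in practice the factors are learned from the observed ratings.

```python
# Hypothetical factors with 2 latent factors.
# user_factors: one row per user, one column per latent factor.
user_factors = [
    [1.0, 0.0],
    [0.5, 0.5],
]
# item_factors: one row per latent factor, one column per item.
item_factors = [
    [4.0, 2.0, 1.0],
    [1.0, 3.0, 5.0],
]


def predict_all(user_factors, item_factors):
    """Multiply the user matrix by the item matrix to get every rating."""
    n_factors = len(item_factors)
    n_items = len(item_factors[0])
    return [
        [sum(u[k] * item_factors[k][j] for k in range(n_factors))
         for j in range(n_items)]
        for u in user_factors
    ]


print(predict_all(user_factors, item_factors))
# [[4.0, 2.0, 1.0], [2.5, 2.5, 3.0]]
```

The product is a dense 2 x 3 matrix, so every user gets a predicted rating for every item, including pairs that were never observed.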

This allows the model to predict better-personalized movie ratings for users. For example, less-known movies can have latent representations as rich as those of popular movies.
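The factors themselves have to be learned from the observed ratings. One way to do this is Alternating Least Squares: fix the item factors and solve a regularised least-squares problem for each user, then fix the user factors and do the same for each item, and repeat. The sketch below is a deliberately simplified, single-latent-factor version in plain Python, with made-up ratings and hypothetical names (`als_rank1`, `squared_error`, `reg`, `n_iters`). It is not Spark ML's implementation and is only meant to show the alternation.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, n_iters=50):
    """Fit ratings[(u, i)] ~ p[u] * q[i] on observed entries only."""
    p = [1.0] * n_users  # one latent factor per user
    q = [1.0] * n_items  # one latent factor per item
    for _ in range(n_iters):
        # Step 1: item factors fixed, closed-form update for each user.
        for u in range(n_users):
            obs = [(i, r) for (uu, i), r in ratings.items() if uu == u]
            num = sum(r * q[i] for i, r in obs)
            den = reg + sum(q[i] ** 2 for i, _ in obs)
            p[u] = num / den
        # Step 2: user factors fixed, closed-form update for each item.
        for i in range(n_items):
            obs = [(u, r) for (u, ii), r in ratings.items() if ii == i]
            num = sum(r * p[u] for u, r in obs)
            den = reg + sum(p[u] ** 2 for u, _ in obs)
            q[i] = num / den
    return p, q


def squared_error(ratings, p, q):
    """Sum of squared errors over the observed ratings."""
    return sum((r - p[u] * q[i]) ** 2 for (u, i), r in ratings.items())


# Hypothetical observed ratings (user index, item index) -> rating.
# Cells (0, 2) and (1, 0) are unobserved.
observed = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 4.0, (1, 2): 6.0}

p, q = als_rank1(observed, n_users=2, n_items=3)
print(squared_error(observed, p, q))  # training error on observed cells
print(p[0] * q[2])                    # prediction for the unobserved cell (0, 2)
```

Each inner update only touches the ratings of a single user or a single item, which is why the method copes with sparse data and parallelises well across users and items.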

TODO: Alternating Least Squares (ALS) with Spark ML

Setup

Set up a 3-node Spark cluster and a single-node Elasticsearch instance with:

docker-compose up -d --build

Then run the Jupyter notebook.
