dragon-fury / streaming-matrix-factorization

This project forked from brkyvz/streaming-matrix-factorization

Distributed Streaming Matrix Factorization implemented on Spark for Recommendation Systems

License: Apache License 2.0

Scala 68.65% Shell 31.35%

Streaming Matrix Factorization for Spark

This library contains methods to train a Matrix Factorization Recommendation System on Spark. For user u and product i, the predicted rating is:

r = U(u) * P^T(i) + bu(u) + bp(i) + mu,

where r is the rating, U is the user matrix, P^T is the transpose of the product matrix, U(u) is the uth row of U, P^T(i) is the ith column of P^T, bu(u) is the bias of the uth user, bp(i) is the bias of the ith product, and mu is the global average rating.
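
The formula above can be sketched in plain Scala, without Spark. The object and method names (PredictSketch, dot, predict) and the parameter names are illustrative assumptions, not part of this library's API:

```scala
// Illustrative sketch of r = U(u) * P^T(i) + bu(u) + bp(i) + mu.
// All names are hypothetical; the library's internals may differ.
object PredictSketch {
  /** Dot product of two latent-factor vectors of equal length (the rank). */
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  /** Predicted rating for one (user, product) pair. */
  def predict(
      userRow: Array[Double],     // U(u): latent factors of user u
      productRow: Array[Double],  // P^T(i): latent factors of product i
      userBias: Double,           // bu(u)
      productBias: Double,        // bp(i)
      globalMean: Double          // mu
  ): Double =
    dot(userRow, productRow) + userBias + productBias + globalMean
}
```

For example, PredictSketch.predict(Array(1.0, 2.0), Array(0.5, 0.25), 0.1, -0.2, 3.0) adds the dot product 1.0 to the two biases and the global mean.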

Gradient Descent is used to train the model.
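
As a rough illustration of what one gradient descent step on this model can look like, here is a textbook stochastic gradient update on the squared error of a single rating, with L2 regularization. The README only states that gradient descent is used, so the step size, the regularization term, the update order, and every name below (SgdSketch, Factors, step) are assumptions rather than the library's actual implementation:

```scala
// Textbook SGD step for a biased matrix factorization model.
// Hypothetical sketch; not taken from the library's source.
object SgdSketch {
  final case class Factors(
      user: Array[Double],     // U(u)
      product: Array[Double],  // P^T(i)
      userBias: Double,        // bu(u)
      productBias: Double      // bp(i)
  )

  /** Predicted rating: U(u) * P^T(i) + bu(u) + bp(i) + mu. */
  def predict(f: Factors, globalMean: Double): Double =
    f.user.zip(f.product).map { case (x, y) => x * y }.sum +
      f.userBias + f.productBias + globalMean

  /** One update for a single observed rating; returns new factors. */
  def step(f: Factors, rating: Double, globalMean: Double,
           stepSize: Double, reg: Double): Factors = {
    val err = rating - predict(f, globalMean)
    // Both updates read the old factors, so neither sees the other's change.
    val newUser = f.user.zip(f.product).map {
      case (u, p) => u + stepSize * (err * p - reg * u)
    }
    val newProduct = f.product.zip(f.user).map {
      case (p, u) => p + stepSize * (err * u - reg * p)
    }
    Factors(
      newUser,
      newProduct,
      f.userBias + stepSize * (err - reg * f.userBias),
      f.productBias + stepSize * (err - reg * f.productBias)
    )
  }
}
```

Calling step repeatedly on the observed ratings moves the prediction toward each rating; with a small enough step size the error on a rating shrinks after its update.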

Installation

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages brkyvz:streaming-matrix-factorization:0.1.0

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "brkyvz/streaming-matrix-factorization:0.1.0"

Otherwise,

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
		  
libraryDependencies += "brkyvz" % "streaming-matrix-factorization" % "0.1.0"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>brkyvz</groupId>
    <artifactId>streaming-matrix-factorization</artifactId>
    <version>0.1.0</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Usage

To train a streaming model, use the StreamingLatentMatrixFactorization class. The following example trains a model of rank 20 that predicts ratings between 1.0 and 5.0:

import com.brkyvz.spark.recommendation.StreamingLatentMatrixFactorization
import org.apache.spark.ml.recommendation.ALS.Rating
import org.apache.spark.streaming.dstream.DStream

val ratingStream: DStream[Rating[Long]] = ... // Your input stream of Ratings
// numUsers and numProducts are the number of users and products respectively
val algorithm = new StreamingLatentMatrixFactorization(numUsers, numProducts)
algorithm.trainOn(ratingStream)

val testStream: DStream[(Long, Long)] = ... // stream of (user, product) pairs to predict on
val predictions: DStream[Rating[Long]] = algorithm.predictOn(testStream)

You can also predict on a static RDD:

import org.apache.spark.rdd.RDD

val latestModel = algorithm.latestModel()
val testData: RDD[(Long, Long)] = ... // RDD of (user, product) pairs to predict on
val predictions: RDD[Rating[Long]] = latestModel.predict(testData)

You can also train on a static RDD and then predict on a DStream or RDD:

import com.brkyvz.spark.recommendation.LatentMatrixFactorization
import org.apache.spark.ml.recommendation.ALS.Rating
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

val ratings: RDD[Rating[Long]] = ... // Your input RDD of Ratings
// numUsers and numProducts are the number of users and products respectively
val algorithm = new LatentMatrixFactorization(numUsers, numProducts)
algorithm.trainOn(ratings)

val testStream: DStream[(Long, Long)] = ... // stream of (user, product) pairs to predict on
val predictions: DStream[Rating[Long]] = algorithm.predictOn(testStream)
