This project forked from radek1st/bigdatariver

Simple demo implementation of Lambda and Kappa architectures using Python, Docker, Kafka, Spark and Cassandra

Home Page: https://github.com/radek1st/BigDataRiver/blob/master/YowData-RadekOstrowski.pdf

License: MIT License

Python 19.13% Shell 0.46% Jupyter Notebook 80.42%

BigDataRiver

This project shows how one could simply implement Lambda and Kappa architectures to make product recommendations for an imaginary e-commerce store. It is written in Python and uses Kafka, Spark (in a Jupyter notebook), Cassandra and Falcon. All the components are tied together with Docker, and their relationships are captured in docker-compose.
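To make the "users who bought X also bought" idea concrete, here is a minimal plain-Python sketch of a co-purchase count. It is an illustration only and is not taken from the project's notebooks, which use Spark. The function name `also_bought` and the `(user, product)` event shape are assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(purchases, top_n=5):
    """Map each product to the products most often bought by the same users."""
    # Group the distinct products bought by each user.
    baskets = defaultdict(set)
    for user, product in purchases:
        baskets[user].add(product)

    # For every ordered pair (x, y) in a basket, count one co-purchase.
    counts = defaultdict(Counter)
    for products in baskets.values():
        for x, y in permutations(sorted(products), 2):
            counts[x][y] += 1

    # Keep the top_n most frequently co-purchased products per product.
    return {x: [y for y, _ in c.most_common(top_n)] for x, c in counts.items()}


# Hypothetical (user, product) purchase events, invented for illustration.
events = [(1, 59), (1, 29), (2, 59), (2, 29), (2, 49), (3, 59), (3, 49), (3, 29)]
recs = also_bought(events)
print(recs[59])  # [29, 49]
```

In a Lambda setup, a count like this would typically be computed over the full history in batch and topped up from the stream, while a Kappa setup would derive it from the stream alone.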

Presentation

Here are the video and the slides from my presentation at Yow! Data Sydney. I've also used the same slides at the Sydney Docker Meetup.

Setup

For ease of deployment, a Docker Compose script is used. It still needs some manual steps, however.

  • Docker
    • Edit the docker-compose.yml file and replace the paths in volumes to match your environment
    • To start all the services, run this command from the main project folder: docker-compose up
  • Simulation of user clicks/actions
    • In a terminal, go to the data folder and start the feeder script, which POSTs JSON messages to Falcon: ./user-simulator.py
  • Cassandra
    • In another terminal, connect to the Cassandra instance with a command like: docker exec -it bigdatariver_cassandra_1 bash
      • Once inside, initialise Cassandra's keyspace: cqlsh -f bdr/init.sql
      • You can also run cqlsh and start issuing CQL statements directly against Cassandra
  • Spark Notebook
    • In a browser, navigate to http://localhost:8888/ and choose Lambda - Stream - Users who bought X also bought. Shift+Enter all the cells or choose from the top menu: Cell->Run All
    • Once Spark Streaming is running and the data feeder is started, you should see the recommendation table become populated in Cassandra
    • Repeat the same for other notebooks if required:
      • Lambda - Batch- Users who bought X also bought
      • Kappa - Users who bought X also bought
      • Kappa - Collaborative Filtering
  • Falcon
    • Once every gear is in motion, you can finally get the recommendations. Open a browser (or otherwise issue a GET request) to hit Falcon and get recommendations like this:
      • Lambda: http://127.0.0.1:8000/bdr?product-lambda=59 should return a response like {"product":59, "recommendedProducts":[29,49,99,19,62]}
      • Kappa: http://127.0.0.1:8000/bdr?product-kappa=41 should return a response like {"product":41, "recommendedProducts":[21,5,95,83,37]}
      • Kappa Collaborative Filtering (recommendations customised per user): http://127.0.0.1:8000/bdr?user=2105 should return a response like {"user":2105, "recommendedProducts":[77,5,95,83,37]}
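The Falcon GET requests above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names (`build_url`, `parse_recommendations`, `fetch_recommendations`) are made up for this example, and it assumes the service is reachable at the address shown in the URLs above and returns JSON in the shape of the sample responses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URLs above.
BASE_URL = "http://127.0.0.1:8000/bdr"


def build_url(param, value, base_url=BASE_URL):
    """Build a query URL such as .../bdr?product-lambda=59."""
    return f"{base_url}?{urlencode({param: value})}"


def parse_recommendations(body):
    """Extract the recommendedProducts list from a JSON response body."""
    return json.loads(body)["recommendedProducts"]


def fetch_recommendations(param, value, timeout=5):
    """Issue the GET request; this needs the Docker services to be running."""
    with urlopen(build_url(param, value), timeout=timeout) as resp:
        return parse_recommendations(resp.read().decode("utf-8"))


# Offline check against the sample response shown above (no network needed).
sample = '{"product":59, "recommendedProducts":[29,49,99,19,62]}'
print(build_url("product-lambda", 59))
print(parse_recommendations(sample))
```

With the whole stack up, calling `fetch_recommendations("product-kappa", 41)` or `fetch_recommendations("user", 2105)` should return the corresponding list, provided the matching notebook has already populated Cassandra.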

Troubleshooting

If you need to connect to Kafka from the local host (outside of Docker), add the kafka hostname to your /etc/hosts like this:

more /etc/hosts
127.0.0.1       localhost kafka
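To check that the entry is in place, a small sketch like the one below can scan hosts-file text for a hostname. The function name `hosts_entry_for` is hypothetical, and the sample text simply mirrors the line shown above.

```python
def hosts_entry_for(hostname, hosts_text):
    """Return the IP address mapped to hostname in hosts-file text, or None."""
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address, the rest are hostnames and aliases.
        ip, *names = line.split()
        if hostname in names:
            return ip
    return None


sample_hosts = "127.0.0.1       localhost kafka\n"
print(hosts_entry_for("kafka", sample_hosts))  # 127.0.0.1
```

To check the real file, pass in the contents of /etc/hosts (for example `open("/etc/hosts").read()`), or call `socket.gethostbyname("kafka")` to see what the system resolver actually returns.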
