sureshb208 / streamify

This project forked from ankurchavda/streamify

A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform and much more!

Shell 39.48% HCL 25.80% Python 34.72%

streamify's Introduction

Setup

Terraform

We use Terraform to provision the infrastructure.

cd terraform
terraform init
terraform apply

This should create a VPC network and two n1-standard-2 VM instances: one for Kafka and one for Spark.

VM Setup

Create an SSH key on your local system in the .ssh folder:

ssh-keygen -t rsa -f ~/.ssh/KEY_FILENAME -C USER -b 2048

Add the public key to your VM instance using this link

Create a config file in your .ssh folder

touch ~/.ssh/config

Add the following content to it, replacing the placeholders with the relevant values.

Host streamify-kafka
    HostName <External IP Address>
    User <username>
    IdentityFile <path/to/home/.ssh/gcp>

Host streamify-spark
    HostName <External IP Address>
    User <username>
    IdentityFile <path/to/home/.ssh/gcp>
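If you would rather generate these entries than type them by hand, the layout above can be sketched in Python. This is only an illustration: the function names `render_ssh_host` and `render_ssh_config` are hypothetical, and the IP addresses, user, and key path are placeholders (documentation-range addresses), not values from this project. The script prints the text instead of writing to ~/.ssh/config, so an existing config is never overwritten.

```python
from typing import Mapping


def render_ssh_host(alias: str, external_ip: str, user: str, identity_file: str) -> str:
    """Return one Host block in the same layout as the entries above."""
    return (
        f"Host {alias}\n"
        f"    HostName {external_ip}\n"
        f"    User {user}\n"
        f"    IdentityFile {identity_file}\n"
    )


def render_ssh_config(hosts: Mapping[str, str], user: str, identity_file: str) -> str:
    """Join one Host block per (alias, external IP) pair, separated by a blank line."""
    return "\n".join(
        render_ssh_host(alias, ip, user, identity_file) for alias, ip in hosts.items()
    )


if __name__ == "__main__":
    # Placeholder values only; substitute your VMs' external IPs, your user
    # name, and the private key file you created with ssh-keygen.
    print(
        render_ssh_config(
            {"streamify-kafka": "203.0.113.10", "streamify-spark": "203.0.113.11"},
            user="USER",
            identity_file="~/.ssh/KEY_FILENAME",
        )
    )
```

Review the printed output, then paste it into ~/.ssh/config yourself.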

SSH into the servers using the commands below, in two separate terminals:

ssh streamify-kafka
ssh streamify-spark

Repo Clone

Clone the git repo onto your VMs

Run the setup scripts on the VMs to install Anaconda, Docker, docker-compose, and Spark:

bash scripts/vm_setup.sh username
bash scripts/spark_setup.sh username

Test Kafka-Spark Connection

  1. Open port 9092 on your Kafka server using these steps
  2. Set the environment variable KAFKA_ADDRESS to the external IP of your Kafka VM, on both the Spark and the Kafka VMs:
    export KAFKA_ADDRESS=IP.ADD.RE.SS
  3. Run docker-compose up in the kafka folder on the Kafka VM.
  4. Run the following command in the kafka/test_connection folder to start producing messages:
    python produce_taxi_json.py
  5. Move to the Spark VM and run the following command in the spark_streaming/test_connection folder:
    spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.3 stream_taxi_json.py
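
Before running the producer or the Spark job, it can help to confirm that port 9092 is actually reachable from the machine you are on. The following is a minimal sketch using only the Python standard library, not part of this repo: the helper names `bootstrap_server` and `can_connect` are hypothetical, and falling back to localhost when KAFKA_ADDRESS is unset is an assumption made for the example. A successful TCP connection shows only that the port is open; it does not prove that the broker is configured correctly.

```python
import os
import socket

KAFKA_PORT = 9092  # the port opened in step 1


def bootstrap_server(env=None, port=KAFKA_PORT) -> str:
    """Build 'host:port' from KAFKA_ADDRESS (assumed fallback: localhost)."""
    env = os.environ if env is None else env
    host = env.get("KAFKA_ADDRESS", "localhost")
    return f"{host}:{port}"


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    server = bootstrap_server()
    host, port = server.rsplit(":", 1)
    status = "reachable" if can_connect(host, int(port)) else "NOT reachable"
    print(f"{server} is {status}")
```

Run it on both VMs after exporting KAFKA_ADDRESS. If the Spark VM reports that the address is not reachable, recheck the firewall rule from step 1 and that docker-compose is still running on the Kafka VM.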

TODO

  1. Fix Spark Script with py4j eval
  2. Make setup easier with Makefile. Possibly a one-click setup.
