
A Spark Standalone cluster with ZooKeeper on Alpine Linux, with Java 8, Python 3.6, Spark 2.2 and Hadoop 2.7, plus examples of Java applications.


Spark-High-Availability-Zookeeper

Launch Basic Spark Container

To get the image:
docker pull foodytechnologies/spark-openjdk8-alpine
To run a simple container:
docker run -p 4040:4040 -dti --privileged foodytechnologies/spark-openjdk8-alpine

Setup Cluster

It is a Spark Standalone cluster with ZooKeeper, composed of two ZooKeeper servers, two Spark masters, two slaves (each running 5 workers) and one application submitter:
docker-compose up -d --scale LocalClusterNetwork.spark.Slave=2

Launch Applications on Spark Cluster

  • To launch a local Python application:
    docker exec -ti ApplicationSubmitter sh StartApplication.sh /apps/python-apps/example.py

  • To launch a local Java application

Compile the job sources:

$ cd ./data/dockervolumes/applications/java-apps/
$ mvn package

Docker Compose will mount the local ./data/dockervolumes/applications directory on the /apps directory of the application-submitter and slave containers.
We can also pass files/data as arguments to jobs by placing them in the local directory ./data/dockervolumes/data (give the directory write permission if jobs will save files to it); Docker Compose will bind this local directory to the /data directory of the started containers.


Examples

Manipulate a JSON file and generate a new one:
docker exec -ti ApplicationSubmitter sh StartApplication.sh --class com.databootcamp.sparkjobs.BasicLoadJson /apps/java-apps/target/sparkjobs-1.0.0-SNAPSHOT.jar /data/tweets.json /data/HaveTweets
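
The BasicLoadJson source is not reproduced here; a minimal sketch of such a job, assuming a SparkSession-based implementation where args[0] is the input JSON file and args[1] the output directory (the class body and the "text" field filter are hypothetical, not this repository's actual code):

package com.databootcamp.sparkjobs;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch; the real BasicLoadJson in this repository may differ.
public final class BasicLoadJson {
    public static void main(String[] args) {
        String inputPath = args[0];   // e.g. /data/tweets.json
        String outputPath = args[1];  // e.g. /data/HaveTweets

        SparkSession spark = SparkSession.builder()
                .appName("BasicLoadJson")
                .getOrCreate();

        // Spark expects one JSON record per line of the input file.
        Dataset<Row> tweets = spark.read().json(inputPath);

        // Keep records that actually carry a "text" field and
        // write them out as a new JSON dataset.
        tweets.filter(tweets.col("text").isNotNull())
              .write()
              .json(outputPath);

        spark.stop();
    }
}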

Basic flatMap by reading a file:
docker exec -ti ApplicationSubmitter sh StartApplication.sh --class com.databootcamp.sparkjobs.BasicFlatMap /apps/java-apps/target/sparkjobs-1.0.0-SNAPSHOT.jar /data/spark.txt
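
A minimal sketch of what a flatMap job like this can look like, assuming args[0] is the input text file (the word-splitting logic below is an assumption, not the repository's actual source):

package com.databootcamp.sparkjobs;

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch; the real BasicFlatMap in this repository may differ.
public final class BasicFlatMap {
    public static void main(String[] args) {
        String inputPath = args[0]; // e.g. /data/spark.txt

        SparkSession spark = SparkSession.builder()
                .appName("BasicFlatMap")
                .getOrCreate();
        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        // flatMap: one input line fans out into many word records.
        JavaRDD<String> words = sc.textFile(inputPath)
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator());

        System.out.println("word count: " + words.count());
        spark.stop();
    }
}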

Basic Avg:
docker exec -ti ApplicationSubmitter sh StartApplication.sh --class com.databootcamp.sparkjobs.BasicAvg /apps/java-apps/target/sparkjobs-1.0.0-SNAPSHOT.jar
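
A minimal sketch of an average computation over an in-memory dataset, which matches this job taking no arguments (the sample numbers are hypothetical; the repository's actual BasicAvg may differ):

package com.databootcamp.sparkjobs;

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch; the real BasicAvg in this repository may differ.
public final class BasicAvg {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("BasicAvg")
                .getOrCreate();
        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Count the elements, sum them, then divide on the driver.
        long count = numbers.count();
        int sum = numbers.reduce((a, b) -> a + b);
        System.out.println("avg = " + (double) sum / count);

        spark.stop();
    }
}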

Expose a context with the Thrift server:

  • Start a standalone Thrift server and expose a context as a temporary view (a sketch of such a job follows this list):
    docker exec -ti ApplicationSubmitter sh StartApplication.sh --class com.databootcamp.sparkjobs.ExposeContextWithLiveThrift /apps/java-apps/target/sparkjobs-1.0.0-SNAPSHOT.jar `docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}},{{end}}' ApplicationSubmitter | cut -d',' -f1` 10011 /data/tweets.json exposethecontext
  • Start a Thrift client and read the context:
    beeline -u jdbc:hive2://IP_TO_SPARK_EXECUTOR:10011
    0: jdbc:hive2://172.28.0.5:10011> show tables;
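
A minimal sketch of the Thrift-exposing job, assuming the four arguments above are the bind host, the Thrift port, the input JSON file and the view name (the class body and the config keys used to set the port are assumptions, not the repository's actual source):

package com.databootcamp.sparkjobs;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2;

// Hypothetical sketch; the real ExposeContextWithLiveThrift may differ.
public final class ExposeContextWithLiveThrift {
    public static void main(String[] args) throws InterruptedException {
        String host = args[0];      // bind address of the Thrift server
        String port = args[1];      // e.g. 10011
        String inputPath = args[2]; // e.g. /data/tweets.json
        String viewName = args[3];  // e.g. exposethecontext

        SparkSession spark = SparkSession.builder()
                .appName("ExposeContextWithLiveThrift")
                .config("hive.server2.thrift.bind.host", host)
                .config("hive.server2.thrift.port", port)
                .enableHiveSupport()
                .getOrCreate();

        Dataset<Row> df = spark.read().json(inputPath);
        df.createOrReplaceTempView(viewName);

        // Serve this session's temporary views over JDBC (beeline).
        HiveThriftServer2.startWithContext(spark.sqlContext());

        // Keep the driver alive so the Thrift server stays reachable.
        Thread.currentThread().join();
    }
}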

Export a JSON file to a Hive table. To keep data and metadata we should configure the hive-site.xml, core-site.xml (for security configuration) and hdfs-site.xml (for HDFS configuration) files in conf/:
docker exec -ti ApplicationSubmitter sh StartApplication.sh --class com.databootcamp.sparkjobs.SaveHive /apps/java-apps/target/sparkjobs-1.0.0-SNAPSHOT.jar /data/tweets.json tweets
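
A minimal sketch of such an export job, assuming args[0] is the input JSON file and args[1] the Hive table name (the save mode below is an assumption; the repository's actual SaveHive may differ):

package com.databootcamp.sparkjobs;

import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch; the real SaveHive in this repository may differ.
public final class SaveHive {
    public static void main(String[] args) {
        String inputPath = args[0]; // e.g. /data/tweets.json
        String tableName = args[1]; // e.g. tweets

        // enableHiveSupport() makes Spark use the Hive metastore
        // configured through conf/hive-site.xml.
        SparkSession spark = SparkSession.builder()
                .appName("SaveHive")
                .enableHiveSupport()
                .getOrCreate();

        // Persist both data and metadata as a managed Hive table.
        spark.read().json(inputPath)
             .write()
             .mode(SaveMode.Overwrite)
             .saveAsTable(tableName);

        spark.stop();
    }
}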

Read a stream from a Flume agent (polling mode) and save it into an HBase table (in batch mode):

- Start Flume Service:

Please refer to the example described in the "Flume" repository: MedAmineBB/Flume.

- Start HBase Cluster Service:

Please refer to the how-to described in the "HBase" repository: MedAmineBB/HBaseWithHDFS, applying the section "Launch Hdfs and Hbase".

- Connect Hbase, Spark and Flume containers:
$ # Create a network where we will expose all dockers that shall communicate in this example
$ docker network create -d bridge --subnet 172.28.0.0/16 bridge_nw
$ # Expose Hbase layer (zookeeper, HMaster and Region Servers)
$ docker network connect bridge_nw zoo1
$ docker network connect bridge_nw zoo2
$ docker network connect bridge_nw zoo3
$ docker network connect bridge_nw rs1
$ docker network connect bridge_nw rs2
$ docker network connect bridge_nw rs3
$ docker network connect bridge_nw hm1
$ docker network connect bridge_nw hm2
$ # Expose Flume layer
$ docker network connect bridge_nw relayer
$ # Expose Spark ApplicationSubmitter, slaves and masters
$ docker network connect bridge_nw ApplicationSubmitter
$ docker network connect bridge_nw ownspark_LocalClusterNetwork.spark.Slave_COMPLETE_THAT
$ docker network connect bridge_nw Master0
$ docker network connect bridge_nw Master1

- Start Spark Job (from streaming data to batch data):
EXTRA_JARS=`docker exec -ti ApplicationSubmitter sh -c 'ls -p /apps/java-apps/target/libs/*.jar | tr "\n" ","'` sh -c 'docker exec -ti ApplicationSubmitter sh StartApplication.sh --jars $EXTRA_JARS --class com.databootcamp.sparkjobs.StreamingFromFlumeToHBase /apps/java-apps/target/sparkjobs-1.0.0-SNAPSHOT.jar relayer 4545 zoo1,zoo2,zoo3 2181 databootcamp netcat data'
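
A minimal sketch of such a streaming job, assuming the seven arguments above are the Flume host and port, the ZooKeeper quorum and client port, and the HBase table name, column family and column qualifier (the class body and the row-key scheme are hypothetical; the repository's actual StreamingFromFlumeToHBase may differ):

package com.databootcamp.sparkjobs;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.flume.FlumeUtils;

// Hypothetical sketch; the real class in this repository may differ.
public final class StreamingFromFlumeToHBase {
    public static void main(String[] args) throws Exception {
        String flumeHost = args[0];                 // relayer
        int flumePort = Integer.parseInt(args[1]);  // 4545
        String zkQuorum = args[2];                  // zoo1,zoo2,zoo3
        String zkPort = args[3];                    // 2181
        String table = args[4];                     // databootcamp
        String family = args[5];                    // netcat
        String qualifier = args[6];                 // data

        SparkConf conf = new SparkConf().setAppName("StreamingFromFlumeToHBase");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Poll the Flume agent's Spark sink instead of having Flume push to us.
        FlumeUtils.createPollingStream(jssc, flumeHost, flumePort)
            .map(sparkEvent -> {
                ByteBuffer body = sparkEvent.event().getBody();
                byte[] bytes = new byte[body.remaining()];
                body.get(bytes);
                return new String(bytes, StandardCharsets.UTF_8);
            })
            .foreachRDD(rdd -> rdd.foreachPartition(lines -> {
                // One HBase connection per partition, reused for all puts.
                Configuration hconf = HBaseConfiguration.create();
                hconf.set("hbase.zookeeper.quorum", zkQuorum);
                hconf.set("hbase.zookeeper.property.clientPort", zkPort);
                try (Connection cx = ConnectionFactory.createConnection(hconf);
                     Table t = cx.getTable(TableName.valueOf(table))) {
                    while (lines.hasNext()) {
                        String line = lines.next();
                        // Hypothetical row key: timestamp plus content hash.
                        Put put = new Put(Bytes.toBytes(
                                System.currentTimeMillis() + "-" + line.hashCode()));
                        put.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier),
                                Bytes.toBytes(line));
                        t.put(put);
                    }
                }
            }));

        jssc.start();
        jssc.awaitTermination();
    }
}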
