trungtv / mapr-spark-structuredstreaming-uber Goto Github PK
View Code? Open in Web Editor NEWThis project forked from caroljmcdonald/mapr-spark-structuredstreaming-uber
This project forked from caroljmcdonald/mapr-spark-structuredstreaming-uber
This example runs on MapR 6.0.1 with Spark 2.2.1 Install and fire up the Sandbox using the instructions here: http://maprdocs.mapr.com/home/SandboxHadoop/c_sandbox_overview.html. ____________________________________________________________________ Step 1: Log into Sandbox, create data directory, MapR-ES Topic and MapR-DB table: Use an SSH client such as Putty (Windows) or Terminal (Mac) to login. See below for an example: use userid: mapr and password: mapr. For VMWare use: $ ssh mapr@ipaddress For Virtualbox use: $ ssh [email protected] -p 2222 after logging into the sandbox At the Sandbox unix command line: Create a directory for the data for this project mkdir /mapr/demo.mapr.com/data In a sandbox unix terminal window use the mapr command line interface to create a stream, a topic, get info and create a table: maprcli stream create -path /apps/uberstream -produceperm p -consumeperm p -topicperm p maprcli stream topic create -path /apps/uberstream -topic ubers Create the MapR-DB Table maprcli table create -path /apps/ubertable -tabletype json -defaultreadperm p -defaultwriteperm p to get info on the ubers topic : maprcli stream topic info -path /apps/uberstream -topic ubers ____________________________________________________________________ Step 2: Copy the data and jar files You can build this project with Maven using IDEs like Eclipse or NetBeans, and then copy the JAR file to your MapR Sandbox, or you can install Maven on your sandbox and build from the Linux command line, for more information on maven, eclipse or netbeans use google search. After building the project on your laptop, you can use scp to copy your JAR file from the project target folder to the MapR Sandbox: From your laptop command line or with a scp tool : use userid: mapr and password: mapr. For VMWare use: $ scp nameoffile.jar mapr@ipaddress:/home/mapr/. For Virtualbox use: $ scp -P 2222 target/*.jar [email protected]:/home/mapr/. Copy the data file from the project data folder to the sandbox using scp to this directory /user/mapr/data/uber.csv on the sandbox: For Virtualbox use: $ scp -P 2222 data/*.csv [email protected]:/mapr/demo.mapr.com/data/. ____________________________________________________________________ Step 3: Run the Spark k-means program which will create and save the machine learning model: From the Sandbox command line run the Spark program which will create and save the machine learning model: /opt/mapr/spark/spark-2.2.1/bin/spark-submit --class sparkml.ClusterUber --master local[2] mapr-spark-structuredstreaming-uber-1.0.jar This will read from the file mfs:///mapr/demo.mapr.com/data/uber.csv and save the model in mfs:///mapr/demo.mapr.com/data/ubermodel You can optionally pass the files as input parameters <file modelpath> (take a look at the code to see what it does) you can also copy paste the code from ClusterUber.scala into the spark shell /opt/mapr/spark/spark-2.2.1/bin/spark-shell --master local[2] - For Yarn you should change --master parameter to yarn-client - "--master yarn-client" _________________________________________________________________________________ After creating and saving the k-means model you can run the Streaming code: Step 4: Run the Java client to publish events to the topic java -cp ./mapr-spark-structuredstreaming-uber-1.0.jar:`mapr classpath` streams.MsgProducer This client will read lines from the file in "/mapr/demo.mapr.com/data/uber.csv" and publish them to the topic /apps/uberstream:ubers. You can optionally pass the file and topic as input parameters <file topic> Optional: run the MapR Streams Java consumer to see what was published : java -cp mapr-spark-structuredstreaming-uber-1.0.jar:`mapr classpath` streams.MsgConsumer _________________________________________________________________________________ Step 5: Run the the Spark Structured Streaming client to consume events enrich them and write them to MapR-DB (in separate consoles if you want to run at the same time) /opt/mapr/spark/spark-2.2.1/bin/spark-submit --class streaming.StructuredStreamingConsumer --master local[2] \ mapr-spark-structuredstreaming-uber-1.0.jar This spark streaming client will consume from the topic /apps/uberstream:ubers, enrich from the saved model at /mapr/demo.mapr.com/data/ubermodel and write to the table /apps/ubertable. You can optionally pass the input parameters <topic model table> You can use ctl-c to stop ____________________________________________________________________ Step 6: Spark SQL and MapR-DB In another window while the Streaming code is running, run the code to Query from MapR-DB /opt/mapr/spark/spark-2.2.1/bin/spark-submit --class sparkmaprdb.QueryUber --master local[2] \ mapr-spark-structuredstreaming-uber-1.0.jar ____________________________________________________________________ Step 7: Use the Mapr-DB shell to query the data start the hbase shell and scan to see results: $ /opt/mapr/bin/mapr dbshell maprdb mapr:> jsonoptions --pretty true --withtags false maprdb mapr:> find /apps/ubertable --limit 5 maprdb mapr:> find /apps/ubertable --where '{ "$eq" : {"cid":9} }' --limit 10 maprdb mapr:> find /apps/ubertable --where '{"$and":[{"$eq":{"base":"B02617"}},{ "$like" : {"_id":"9%"} }]}' --limit 10 _________________________________________________________________________________ to delete after using : Later to delete a topic: maprcli stream topic delete -path /apps/uberstream -topic ubers Later to delete a table: maprcli table delete -path /apps/ubertable to delete the checkpoint directory hadoop fs -rmr /tmp/uberdb _________________________________________________________________________________
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.