mtomas / spark-project-template

This project forked from nfo/spark-project-template

Template of a Spark project, with IDEA support, bundling with assembly, examples, ...

License: Do What The F*ck You Want To Public License

Scala 100.00%

spark-project-template's Introduction

Steps

  • Install SBT: brew install sbt
  • Copy this directory
  • Remove the .git setup: rm -rf .git
  • Change the project name and version in build.sbt
  • Change the project name in src/main/scala/myproject/Main.scala
  • Run an sbt console to check that everything works: sbt console. It can take a few minutes the first time. It will create the directories project/target and target (which are .gitignored). The result is a Scala 2.10.4 console with all the project dependencies loaded.
  • Optionally initialize a new git project: git init
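
The copy-and-reset steps above can be scripted. Below is a minimal shell sketch, not part of the template itself: the function name `copy_template` and its two arguments are hypothetical, and it covers only copying the directory and removing the .git setup. Renaming the project in build.sbt and Main.scala remains a manual step.

```shell
# Hypothetical helper: copy the template to a new directory and drop its git history.
# Usage: copy_template <template-dir> <new-project-dir>
copy_template() (
  src=$1
  dest=$2
  if [ ! -d "$src" ]; then
    echo "template directory not found: $src" >&2
    exit 1
  fi
  # Refuse to overwrite an existing project.
  if [ -e "$dest" ]; then
    echo "destination already exists: $dest" >&2
    exit 1
  fi
  cp -R "$src" "$dest" || exit 1   # "Copy this directory"
  rm -rf "$dest/.git"              # "Remove the .git setup"
)
```

For example, `copy_template spark-project-template my-job` followed by `cd my-job` and `sbt console`. Run `git init` afterwards if you want a fresh repository.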

Import the project in IntelliJ IDEA

  • Start IntelliJ IDEA. In the Welcome window, click on "Import Project"
  • Enter the project path
  • Choose the external model SBT
  • Check
    • Use auto-import
    • Download sources and docs
  • Choose a Project SDK with a version >= 1.7.
  • Move the Main and Schema classes to your package name
  • Right-click on the Main class and click Run Main

Build and run your Spark job on a Spark cluster

We use sbt-assembly to bundle the application into a fat JAR, ready to be submitted to a Spark cluster. The JAR must not include the Spark components (spark-core, spark-sql, hadoop-client, etc.) or their dependencies, since the cluster already provides them.

To build the JAR:

  • first exit IntelliJ IDEA if it's configured to track the changes to build.sbt
  • edit build.sbt to switch the spark-* and hadoop-* dependencies (see the comments inside build.sbt)
  • run sbt assembly. The generated JAR is in target/scala-2.10/{projectname}-assembly-{version}.jar
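
To locate the artifact from a script, the path pattern above can be wrapped in a small helper. This is a sketch only: `assembly_jar_path` and `build_assembly` are hypothetical names, and the scala-2.10 directory is taken from the path quoted above, so it will differ for other Scala versions.

```shell
# Hypothetical helper: print where sbt assembly writes the fat JAR,
# following target/scala-2.10/{projectname}-assembly-{version}.jar.
assembly_jar_path() {
  printf 'target/scala-2.10/%s-assembly-%s.jar\n' "$1" "$2"
}

# Hypothetical helper: build the JAR, then check that it is where we expect it.
# Usage: build_assembly <projectname> <version>   (run from the project root)
build_assembly() (
  jar=$(assembly_jar_path "$1" "$2")
  sbt assembly || exit 1
  if [ ! -f "$jar" ]; then
    echo "expected JAR not found: $jar" >&2
    exit 1
  fi
  echo "$jar"
)
```

Called as `build_assembly MyProject 1.0`, it prints the JAR path on success, which can then be passed to scp.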

TODO: try to remove the manual part of editing build.sbt.

To submit the JAR:

  • scp the JAR to the Spark master
  • ssh into the Spark master
  • to prevent the job from stopping if you disconnect from the server, run: screen
  • submit the JAR with the command: ~/spark/bin/spark-submit --master spark://ec2-w-x-y-z.eu-west-1.compute.amazonaws.com:7077 --class io.basilic.MySparkJob ~/MyProject-assembly-1.0.jar > /mnt/job.out 2> /mnt/job.err
  • tail logs with: tail -f /mnt/job.{out,err}
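
The submit step can be wrapped so that stdout and stderr always land in separate files. A minimal sketch, assuming the same layout as the command above: `submit_job` is a hypothetical function, and the `SPARK_SUBMIT` variable and the optional log prefix argument are assumptions added for illustration, not part of Spark.

```shell
# Hypothetical wrapper around spark-submit.
# Usage: submit_job <master-url> <main-class> <jar> [log-prefix]
submit_job() (
  master_url=$1
  main_class=$2
  jar=$3
  log_prefix=${4:-/mnt/job}
  # SPARK_SUBMIT is an assumed override; the default matches the path used above.
  spark_submit=${SPARK_SUBMIT:-$HOME/spark/bin/spark-submit}
  # stdout goes to <prefix>.out and stderr to <prefix>.err
  "$spark_submit" --master "$master_url" --class "$main_class" "$jar" \
    > "$log_prefix.out" 2> "$log_prefix.err"
)
```

Inside a screen session, this would be called as `submit_job spark://ec2-w-x-y-z.eu-west-1.compute.amazonaws.com:7077 io.basilic.MySparkJob ~/MyProject-assembly-1.0.jar`.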

Treats

Starting a Spark Cluster on EC2

TODO: write this paragraph

TODO: By default, spark-ec2 runs with hadoop-client on 1.0.4.
  One can also run the cluster on 2.0.x with `--hadoop-major-version=2`,
  which is an alpha version. @see http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client
  spark-ec2 does not provide a way to use the stable 2.4.x release.
  It would be nice to find a way to run spark-ec2 with hadoop-client 2.4.x.
  @see https://groups.google.com/d/msg/spark-users/pHaF01sPwBo/faHr-fEAFbYJ
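
Until this section is written, here is a rough sketch of what a launch command with the Hadoop 2 option might look like. It only prints the command instead of running it. The helper name `ec2_launch_command` and its argument order are hypothetical, and the `./spark-ec2` path assumes you are in the directory that contains the script. `-k`, `-i`, `-s`, `launch` and `--hadoop-major-version` are spark-ec2 options, but check them against the version of spark-ec2 you have.

```shell
# Hypothetical helper: print (not run) a spark-ec2 launch command.
# Usage: ec2_launch_command <keypair> <key-file> <num-slaves> <cluster-name> [hadoop-major-version]
ec2_launch_command() {
  # Defaults to 1, which the note above describes as the spark-ec2 default (hadoop-client 1.0.4).
  printf './spark-ec2 -k %s -i %s -s %s --hadoop-major-version=%s launch %s\n' \
    "$1" "$2" "$3" "${5:-1}" "$4"
}
```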

spark-project-template's People

Contributors

nfo
