dpnice / sparkrdma

This project forked from mellanox/sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

License: Apache License 2.0

Java 41.92% Scala 58.08%

SparkRDMA ShuffleManager Plugin

SparkRDMA is a high-performance ShuffleManager plugin for Apache Spark that uses RDMA (instead of TCP) to perform shuffle data transfers in Spark jobs.

This open-source project is developed, maintained and supported by Mellanox Technologies.

Performance results

Example performance speedup for HiBench TeraSort (chart not reproduced here):

Running TeraSort with SparkRDMA is 1.41x faster than with standard Spark (measured as job runtime in seconds).

Testbed:

  • 175GB workload
  • 15 Workers, each with 2x Intel Xeon E5-2697 v3 @ 2.60GHz (28 cores per Worker), 256GB RAM and non-flash storage (HDD)
  • Mellanox ConnectX-4 network adapter with a 100GbE RoCE fabric, connected through a Mellanox Spectrum switch
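To make the headline number concrete: reading "1.41x faster" as a ratio of job runtimes, the SparkRDMA run takes about 71% of the standard Spark runtime, i.e. roughly 29% less time. Here T denotes the measured runtime of each configuration; the absolute values were shown only in the original chart and are not reproduced.

```latex
\frac{T_{\text{Spark}}}{T_{\text{SparkRDMA}}} = 1.41
\;\Longrightarrow\;
\frac{T_{\text{SparkRDMA}}}{T_{\text{Spark}}} = \frac{1}{1.41} \approx 0.709,
\qquad
1 - 0.709 \approx 0.29
```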

Wiki pages

For more information on configuration, performance tuning and troubleshooting, please visit the SparkRDMA GitHub Wiki.

Runtime requirements

  • Apache Spark 2.0.0/2.1.0/2.2.0
  • Java 8
  • An RDMA-supported network, e.g. RoCE or InfiniBand
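A minimal pre-flight sketch for the version requirements above. The class name `RequirementsCheck` is hypothetical and not part of SparkRDMA; it only checks the Java and Spark versions, and does not test whether the network actually supports RDMA.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check, not part of SparkRDMA: covers only the Java
// and Spark version requirements; RDMA network support is NOT checked.
final class RequirementsCheck {
    // Spark versions for which SparkRDMA 1.0 jars are listed.
    static final List<String> SUPPORTED_SPARK = Arrays.asList("2.0.0", "2.1.0", "2.2.0");

    static boolean isSupportedSpark(String sparkVersion) {
        return SUPPORTED_SPARK.contains(sparkVersion);
    }

    // Java 8 reports java.specification.version as "1.8"; Java 9+ report "9", "11", ...
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    public static void main(String[] args) {
        String javaSpec = System.getProperty("java.specification.version");
        System.out.println("Running on Java 8: " + isJava8(javaSpec));
        if (args.length > 0) {
            System.out.println("Spark " + args[0] + " supported: " + isSupportedSpark(args[0]));
        }
    }
}
```

Arrays.asList is used instead of List.of so the sketch itself compiles on Java 8.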

Installation

Obtain SparkRDMA and DiSNI binaries

Please use the "Releases" page to download pre-built binaries.
If you would like to build the project yourself, please refer to the "Build" section below.

The pre-built binaries are packed as an archive that contains the following files:

  • spark-rdma-1.0-for-spark-2.0.0-jar-with-dependencies.jar
  • spark-rdma-1.0-for-spark-2.1.0-jar-with-dependencies.jar
  • spark-rdma-1.0-for-spark-2.2.0-jar-with-dependencies.jar
  • libdisni.so

libdisni.so must be installed on every Spark Master and Worker (usually in /usr/lib).
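Because the library has to be present on every node, a small per-node check can help. The sketch below is hypothetical (the `DisniLocator` class is not part of SparkRDMA or DiSNI), and its search list is an assumption: /usr/lib as suggested above, /usr/local/lib as the usual autotools default install prefix, plus the JVM's java.library.path entries.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SparkRDMA or DiSNI: lists which candidate
// directories contain libdisni.so. Run it on each Spark Master and Worker.
final class DisniLocator {
    static final String LIB_FILE = "libdisni.so";

    static List<Path> findLibdisni(List<String> dirs) {
        List<Path> found = new ArrayList<>();
        for (String dir : dirs) {
            if (dir.isEmpty()) {
                continue; // empty entries can appear in java.library.path
            }
            Path candidate = Paths.get(dir, LIB_FILE);
            if (Files.isRegularFile(candidate)) { // follows symlinks
                found.add(candidate);
            }
        }
        return found;
    }

    // Assumed search list: /usr/lib, /usr/local/lib (autotools default prefix)
    // and every entry of the JVM's java.library.path.
    static List<String> defaultDirs() {
        List<String> dirs = new ArrayList<>();
        dirs.add("/usr/lib");
        dirs.add("/usr/local/lib");
        for (String d : System.getProperty("java.library.path", "").split(File.pathSeparator)) {
            dirs.add(d);
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<Path> found = findLibdisni(defaultDirs());
        System.out.println(found.isEmpty() ? LIB_FILE + " not found" : "Found: " + found);
    }
}
```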

Configuration

Provide Spark with the location of the SparkRDMA plugin jar by using the extraClassPath options. For standalone mode, these can be added to either spark-defaults.conf or any runtime configuration file. For client mode, they must be added to spark-defaults.conf. For example, for Spark 2.0.0 (replace 2.0.0 with 2.1.0 or 2.2.0 to match your Spark version):

spark.driver.extraClassPath   /path/to/SparkRDMA/target/spark-rdma-1.0-for-spark-2.0.0-jar-with-dependencies.jar
spark.executor.extraClassPath /path/to/SparkRDMA/target/spark-rdma-1.0-for-spark-2.0.0-jar-with-dependencies.jar
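Since the jar name encodes the Spark version, the two lines can be generated rather than edited by hand. This is a hypothetical sketch (`ClasspathConf` is not part of SparkRDMA); the jar directory is a placeholder for wherever the jar lives on each node, such as the build's target/ directory or the unpacked release archive.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SparkRDMA: emits the two extraClassPath lines
// for spark-defaults.conf, picking the jar built for the given Spark version.
final class ClasspathConf {
    static final List<String> SUPPORTED = Arrays.asList("2.0.0", "2.1.0", "2.2.0");

    static String jarName(String sparkVersion) {
        if (!SUPPORTED.contains(sparkVersion)) {
            throw new IllegalArgumentException("No SparkRDMA 1.0 jar for Spark " + sparkVersion);
        }
        return "spark-rdma-1.0-for-spark-" + sparkVersion + "-jar-with-dependencies.jar";
    }

    // jarDir is a placeholder: the directory holding the jar on every node.
    static String confLines(String jarDir, String sparkVersion) {
        String jar = jarDir + "/" + jarName(sparkVersion);
        return "spark.driver.extraClassPath   " + jar + "\n"
             + "spark.executor.extraClassPath " + jar + "\n";
    }

    public static void main(String[] args) {
        // Usage: ClasspathConf <jar-dir> <spark-version>
        System.out.print(confLines(args[0], args[1]));
    }
}
```

Failing fast on an unsupported version avoids writing a path to a jar that was never built.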

Running

To enable the SparkRDMA Shuffle Manager plugin, add the following line to either spark-defaults.conf or any runtime configuration file:

spark.shuffle.manager   org.apache.spark.shuffle.rdma.RdmaShuffleManager
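A hedged sketch for confirming that a spark-defaults.conf enables the plugin. It relies on the fact that java.util.Properties accepts whitespace as the key/value separator and skips "#" comment lines, so it can read the "key   value" layout shown here; the `ShuffleConfCheck` class itself is hypothetical and only inspects the text, not whether the jar exists.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical checker, not part of SparkRDMA: reports whether a
// spark-defaults.conf enables the RDMA shuffle manager and sets both classpaths.
final class ShuffleConfCheck {
    static final String RDMA_MANAGER = "org.apache.spark.shuffle.rdma.RdmaShuffleManager";

    static Properties parse(Reader in) {
        Properties props = new Properties();
        try {
            // Whitespace separates key and value; '#' lines are comments.
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static Properties parse(String text) {
        return parse(new StringReader(text));
    }

    static boolean rdmaShuffleEnabled(Properties props) {
        return RDMA_MANAGER.equals(props.getProperty("spark.shuffle.manager", "").trim());
    }

    // Only checks that both entries mention a spark-rdma jar, not that the file exists.
    static boolean rdmaClasspathSet(Properties props) {
        return props.getProperty("spark.driver.extraClassPath", "").contains("spark-rdma-")
            && props.getProperty("spark.executor.extraClassPath", "").contains("spark-rdma-");
    }

    public static void main(String[] args) throws IOException {
        // Usage: ShuffleConfCheck /path/to/spark-defaults.conf
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            Properties props = parse(in);
            System.out.println("RDMA shuffle manager enabled: " + rdmaShuffleEnabled(props));
            System.out.println("SparkRDMA classpath set:      " + rdmaClasspathSet(props));
        }
    }
}
```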

Build

Building the SparkRDMA plugin requires Apache Maven and Java 8.

  1. Obtain a clone of SparkRDMA.

  2. Build the plugin for your Spark version (2.0.0, 2.1.0 or 2.2.0), e.g. for Spark 2.0.0:

mvn -DskipTests clean package -Pspark-2.0.0

  3. Obtain a clone of DiSNI for building libdisni:

git clone https://github.com/zrlio/disni.git
cd disni
git checkout tags/v1.3 -b v1.3

  4. Compile and install only libdisni (the DiSNI jars are already included in the SparkRDMA plugin):

cd libdisni
./autoprepare.sh
./configure --with-jdk=/path/to/java8/jdk
make
make install
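After make install, a quick JVM-side smoke test can confirm the library is loadable. This sketch assumes the library is looked up under the name `disni`, so System.loadLibrary searches java.library.path for libdisni.so on Linux; the `DisniLoadCheck` class is hypothetical. If the install step placed the library outside the JVM's default search path (for example under /usr/local/lib), pass that directory with -Djava.library.path.

```java
// Hypothetical smoke test, not part of DiSNI or SparkRDMA: checks that the JVM
// can resolve and load a native library named "disni".
final class DisniLoadCheck {
    // On Linux, System.mapLibraryName("disni") yields "libdisni.so".
    static String nativeFileName() {
        return System.mapLibraryName("disni");
    }

    static boolean tryLoad() {
        try {
            System.loadLibrary("disni");
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("libdisni not loadable: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Looking for " + nativeFileName() + " on java.library.path="
                + System.getProperty("java.library.path"));
        System.out.println(tryLoad() ? "libdisni loaded" : "libdisni NOT loaded");
    }
}
```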

Community discussions and support

For any questions, issues or suggestions, please use our Google group: https://groups.google.com/forum/#!forum/sparkrdma

Contributions

PR submissions are welcome.
