This project forked from h2oai/sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster.

License: Apache License 2.0

Sparkling Water

Sparkling Water integrates H2O's fast, scalable machine learning engine with Spark.

Requirements

  • Linux or OS X (Windows support is pending)
  • Java 7
  • Spark 1.2.0
    • SPARK_HOME shell variable must point to your local Spark installation
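Before building or launching anything, it can help to confirm that SPARK_HOME really points at a Spark installation. The function below is a minimal sketch, not part of Sparkling Water: the name check_spark_home is hypothetical, and it assumes that a usable installation contains an executable bin/spark-shell.

```shell
# Hypothetical helper (not shipped with Sparkling Water): check that
# SPARK_HOME is set and looks like a Spark installation.
check_spark_home() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # Assumption: a Spark installation ships an executable bin/spark-shell.
  if [ ! -x "$SPARK_HOME/bin/spark-shell" ]; then
    echo "No executable bin/spark-shell under $SPARK_HOME" >&2
    return 1
  fi
  echo "Using Spark at $SPARK_HOME"
}
```

Calling check_spark_home before any other step fails early with a readable message instead of a less obvious error later on.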

Contributing

Look at our list of JIRA tasks for new contributors or send your idea to [email protected].


Issues

To report issues, please use JIRA at http://jira.h2o.ai/.


Mailing list

Follow our H2O Stream.


Downloads of binaries


Making a build

Use the provided gradlew to build the project:

./gradlew build

To skip the tests, add the -x test option: ./gradlew build -x test


Sparkling shell

The Sparkling shell provides a regular Spark shell that supports creation of an H2O cloud and execution of H2O algorithms.

First, build a package containing Sparkling Water:

./gradlew assemble

Configure the location of the Spark cluster:

export SPARK_HOME="/path/to/spark/installation"
export MASTER="local-cluster[3,2,1024]"

In this case, local-cluster[3,2,1024] points to an embedded cluster of 3 worker nodes, each with 2 cores and 1024 MB (1 GB) of memory.

Then run the Sparkling Shell:
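To make the three fields of the master string explicit, the sketch below splits a value such as local-cluster[3,2,1024] into workers, cores per worker, and memory per worker in MB. The function name parse_local_cluster is hypothetical and exists only for illustration; neither Spark nor Sparkling Water provides it.

```shell
# Hypothetical helper: split a local-cluster master string into its fields.
# Prints "workers cores memory_mb"; returns 1 if the format does not match.
parse_local_cluster() {
  spec=$(printf '%s\n' "$1" |
    sed -n 's/^local-cluster\[\([0-9][0-9]*\),\([0-9][0-9]*\),\([0-9][0-9]*\)]$/\1 \2 \3/p')
  [ -n "$spec" ] || return 1
  echo "$spec"
}

parse_local_cluster "local-cluster[3,2,1024]"   # prints: 3 2 1024
```

Any other master value, for example local[2], makes the function return a non-zero status, so a script can tell the two forms apart.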

bin/sparkling-shell

Sparkling Shell accepts common Spark Shell arguments. For example, to increase the memory allocated to each executor, use the spark.executor.memory parameter: bin/sparkling-shell --conf "spark.executor.memory=4g"


Running examples

Build a package that can be submitted to a Spark cluster:

./gradlew assemble

Set the configuration of the demo Spark cluster, for example, local-cluster[3,2,1024]:

export SPARK_HOME="/path/to/spark/installation"
export MASTER="local-cluster[3,2,1024]"

In this example, the master value local-cluster[3,2,1024] creates an embedded cluster of 3 workers, each with 2 cores and 1024 MB of memory.

Then run the example:

bin/run-example.sh

For more details about the demo, please see the README.md file in the examples directory.


Additional Examples

You can find more examples in the examples folder.


Docker Support

See docker/README.md to learn about Docker support.


FAQ

  • Where do I find the Spark logs?

Look in $SPARK_HOME/work/app-XXX. The last component of the path identifies your application.

  • Spark is slow to start, or H2O is unable to form a cluster.

Configure the Spark variable SPARK_LOCAL_IP. For example:

export SPARK_LOCAL_IP='127.0.0.1'

  • How do I increase the amount of memory assigned to the Spark executors in Sparkling Shell?

Sparkling Shell accepts common Spark Shell arguments. For example, to increase the amount of memory allocated to each executor, use the spark.executor.memory parameter: bin/sparkling-shell --conf "spark.executor.memory=4g"
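Returning to the first question about Spark logs: the sketch below lists the application directories under $SPARK_HOME/work, most recently modified first, so the latest run is easy to spot. The function name list_spark_app_dirs is hypothetical, and the sketch assumes the work/app-XXX layout described in that answer.

```shell
# Hypothetical helper: list application directories under $SPARK_HOME/work,
# most recently modified first.
list_spark_app_dirs() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # Fails quietly (non-zero status, no output) when no app-* directory exists.
  ls -1dt "$SPARK_HOME"/work/app-* 2>/dev/null
}
```

Piping the output through head -n 1 gives the directory of the most recent application.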
