- Requirements
- Contributing
- Issues
- Mailing List
- Binary downloads
- Making a build
- Sparkling Shell
- Running Examples
- Additional Examples
- Docker Support
- FAQ
Sparkling Water integrates H2O's fast, scalable machine learning engine with Spark.
- Linux or OS X (Windows support is pending)
- Java 7
- Spark 1.2.0
- The SPARK_HOME shell variable must point to your local Spark installation
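A quick way to sanity-check these requirements (a minimal sketch; the Spark path below is a placeholder for your installation):

```bash
# Java 7 must be available on the PATH
java -version

# SPARK_HOME must point to a Spark 1.2.0 installation
export SPARK_HOME="/path/to/spark-1.2.0"
ls "$SPARK_HOME/bin/spark-shell"   # should exist
```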
Look at our list of JIRA tasks for new contributors or send your idea to [email protected].
To report issues, please use JIRA at http://jira.h2o.ai/.
Follow our H2O Stream.
Use the provided gradlew script to build the project:
./gradlew build
To avoid running tests, use the -x test option.
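For example:

```bash
./gradlew build -x test
```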
The Sparkling Shell provides a regular Spark shell that supports creation of an H2O cloud and execution of H2O algorithms.
First, build a package containing Sparkling Water:
./gradlew assemble
Configure the location of the Spark cluster:
export SPARK_HOME="/path/to/spark/installation"
export MASTER="local-cluster[3,2,1024]"
In this case, local-cluster[3,2,1024] points to an embedded cluster of 3 worker nodes, each with 2 cores and 1 GB of memory.
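The MASTER variable accepts any standard Spark master URL, not just the embedded local-cluster form. For example (the host and port below are placeholders):

```bash
# Run everything inside a single local JVM using all available cores
export MASTER="local[*]"

# Or point at an existing standalone Spark cluster (placeholder host and port)
export MASTER="spark://spark-master:7077"
```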
Then run the Sparkling Shell:
bin/sparkling-shell
Sparkling Shell accepts common Spark Shell arguments. For example, to increase the memory allocated to each executor, use the spark.executor.memory parameter:
bin/sparkling-shell --conf "spark.executor.memory=4g"
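Because these options are forwarded to Spark, several of them can be combined in a single invocation (a sketch; the values are only examples):

```bash
bin/sparkling-shell \
  --master "$MASTER" \
  --conf "spark.executor.memory=4g"
```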
Build a package that can be submitted to the Spark cluster:
./gradlew assemble
Set the configuration of the demo Spark cluster, for example, local-cluster[3,2,1024]:
export SPARK_HOME="/path/to/spark/installation"
export MASTER="local-cluster[3,2,1024]"
In this example, local-cluster[3,2,1024] creates an embedded cluster consisting of 3 worker nodes, each with 2 cores and 1 GB of memory.
Then run the example:
bin/run-example.sh
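Putting the steps together, a complete demo run looks like this (the Spark installation path is a placeholder):

```bash
./gradlew assemble
export SPARK_HOME="/path/to/spark/installation"
export MASTER="local-cluster[3,2,1024]"
bin/run-example.sh
```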
For more details about the demo, please see the README.md file in the examples directory.
You can find more examples in the examples folder.
See docker/README.md to learn about Docker support.
- Where do I find the Spark logs?
Look in $SPARK_HOME/work/app-XXX. The last part of the path is the name of your application.
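For example, on a standalone Spark deployment each executor writes its logs under the application's work directory (the application id below is a placeholder):

```bash
# List the per-application work directories
ls "$SPARK_HOME/work/"

# Inspect the stderr log of one executor of a given application (placeholder app id)
cat "$SPARK_HOME/work/app-20150101120000-0000/0/stderr"
```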
- Spark is too slow during start, or H2O is not able to form a cluster.
Configure the Spark variable SPARK_LOCAL_IP. For example:
export SPARK_LOCAL_IP='127.0.0.1'
- How do I increase the amount of memory assigned to the Spark executors in Sparkling Shell?
Sparkling Shell accepts common Spark Shell arguments. For example, to increase the amount of memory allocated to each executor, use the spark.executor.memory parameter:
bin/sparkling-shell --conf "spark.executor.memory=4g"