GithubHelp home page GithubHelp logo

openshift-spark's Introduction

Build status Docker build Layers info

Apache Spark images for OpenShift

This repository contains several files for building Apache Spark focused container images, targeted for usage on OpenShift Origin.

By default, it will build the following images into your local Docker registry:

  • openshift-spark, Apache Spark, Python 2.7
  • openshift-spark-py36, Apache Spark, Python 3.6

For Spark versions, please see the image.yaml file.

Instructions

Build

Create all images and save them in the local Docker registry.

make

Push

Tag and push the images to the designated reference.

make push SPARK_IMAGE=[REGISTRY_HOST[:REGISTRY_PORT]/]NAME[:TAG]

Customization

There are several ways to customize the construction and build process. This project uses the GNU Make tool for the build workflow, see the Makefile for more information. For container specification and construction, the Container image creation tool concreate is used as the primary point of investigation, see the image.yaml file for more information.

Partial images without an Apache Spark distribution installed

This repository also supports building 'incomplete' versions of the images which contain tooling for OpenShift but lack an actual Spark distribution. An s2i workflow can be used with these partial images to install a Spark distribution of a user's choosing. This gives users an alternative to checking out the repository and modifying build files if they want to run a custom Spark distribution. By default, the partial images built will be

  • openshift-spark-inc, Apache Spark, Python 2.7
  • openshift-spark-inc-py36, Apache Spark, Python 3.6

Build

To build the partial images, use make with Makefile.inc

make -f Makefile.inc

Push

Tag and push the images to the designated reference.

make -f Makefile.inc push SPARK_IMAGE=[REGISTRY_HOST[:REGISTRY_PORT]/]NAME[:TAG]

Image Completion

To produce a final image, a source-to-image build must be performed which takes a Spark distribution as input. This can be done in OpenShift or locally using the s2i tool if it's installed. The final images created can be used just like the openshfit-spark and openshift-spark-py36 images described above.

To complete the Python 2.7 image using the s2i tool

$ mkdir build_input
$ wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz -O build_input/spark-2.3.0-bin-hadoop2.7.tgz
$ wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz.md5 -O build_input/spark-2.3.0-bin-hadoop2.7.tgz.md5
$ s2i build build_input radanalyticsio/openshift-spark-inc openshift-spark

To complete the Python 2.7 image using OpenShift, for example:

$ oc new-build --name=openshift-spark --docker-image=radanalyticsio/openshift-spark-inc --binary
$ oc start-build openshift-spark --from-file=https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz

(note that the value of `--from-file` could also be the `build_input` directory from the s2i example above)

This will write the completed image to an imagestream called openshift-spark in the current project

A 'usage' command for all images

Note that all of the images described here will respond to a 'usage' command for reference. For example

$ docker run --rm openshift-spark:latest usage

openshift-spark's People

Contributors

mattf avatar tmckayus avatar jkremser avatar sub-mod avatar willb avatar elmiko avatar jayunit100 avatar soltysh avatar matzew avatar rnowling avatar dfeddema avatar rebeccasimmonds19 avatar

Watchers

Zh avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.