This project forked from canonical/charmed-spark-rock

This repository contains the packaging metadata for creating a ROCK for Apache Spark

Introduction to Charmed Spark ROCK (OCI Image)

Charmed Spark is a set of Canonical supported artifacts (including charms, ROCK OCI images and SNAPs) that makes operating Spark workloads on Kubernetes seamless, secure and production-ready.

The solution helps to simplify user interaction with Spark applications and the underlying Kubernetes cluster whilst retaining the traditional semantics and command line tooling that users already know. Operators benefit from straightforward, automated deployment of Spark components (e.g. Spark History Server) to the Kubernetes cluster, using Juju.

Deploying Spark applications to Kubernetes has several benefits over other cluster resource managers such as Apache YARN: it greatly simplifies deployment, operation and authentication while allowing for flexibility and scaling. However, it requires knowledge of Kubernetes, networking and the coordination between the different components of the Spark ecosystem in order to provide a scalable, secure and production-ready environment. This can significantly increase complexity for end users and administrators, because a number of parameters need to be configured and prerequisites must be met before an application deploys correctly or before the Spark CLI tools (e.g. pyspark and spark-shell) can be used.

Charmed Spark helps to address these usability concerns and provides a consistent management interface for operations engineers and cluster administrators who need to manage enablers like Spark History Server.

Features

The Charmed Spark Rock comes with some built-in tooling embedded:

Version

ROCKs are named <version>-<series>_<risk>.

<version> is the software version, <series> is the Ubuntu LTS series that the ROCK supports, and <risk> is the type of release: edge, candidate or stable. For example, 3.4-22.04_stable means that Charmed Spark is at version 3.4.x of the software, supports the Ubuntu 22.04 release and is currently a 'stable' version of the software. See versioning details here.

The channel can also be expressed by combining version and risk as <version>_<risk>.
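As an illustration of the naming scheme, the sketch below splits a tag into its parts using POSIX shell parameter expansion. `parse_rock_tag` is a hypothetical helper written for this example and is not part of Charmed Spark; it assumes the tag follows the <version>-<series>_<risk> pattern exactly.

```shell
#!/bin/sh
# Hypothetical helper: split a ROCK tag of the form <version>-<series>_<risk>.
parse_rock_tag() {
    tag=$1
    risk=${tag##*_}                 # text after the last underscore
    version_series=${tag%_*}        # text before the last underscore
    version=${version_series%%-*}   # text before the first hyphen
    series=${version_series#*-}     # text after the first hyphen
    # Only the three risk levels named in this README are accepted.
    case $risk in
        edge|candidate|stable) ;;
        *) echo "unknown risk: $risk" >&2; return 1 ;;
    esac
    echo "version=$version series=$series risk=$risk channel=${version}_${risk}"
}

parse_rock_tag "3.4-22.04_stable"
# prints: version=3.4 series=22.04 risk=stable channel=3.4_stable
```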

Release

Charmed Spark ROCKs are available at

https://github.com/canonical/charmed-spark-rock/pkgs/container/charmed-spark

ROCKS Usage

Using Charmed Spark OCI Image in K8s Job Execution

The image can be used straight away when running Spark on Kubernetes by setting the appropriate configuration property:

spark.kubernetes.container.image=ghcr.io/canonical/charmed-spark:3.4-22.04_edge
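A minimal sketch of how this property might be passed to spark-submit follows. It only prints the command instead of running it, so it can be inspected before use. The master URL, namespace, service account, main class and application jar are placeholders chosen for this example, and `build_submit_cmd` is a hypothetical helper; the `--conf` properties shown are standard Spark-on-Kubernetes settings.

```shell
#!/bin/sh
# Image tag taken from this README.
IMAGE="ghcr.io/canonical/charmed-spark:3.4-22.04_edge"

# Placeholder values (assumptions); replace them with your own.
K8S_MASTER="k8s://https://kubernetes.example:6443"
NAMESPACE="spark"
SERVICE_ACCOUNT="spark"
APP_CLASS="com.example.MyApp"
APP_JAR="local:///path/to/app.jar"

# Hypothetical helper: print the spark-submit command without executing it.
build_submit_cmd() {
    printf '%s ' spark-submit \
        --master "$K8S_MASTER" \
        --deploy-mode cluster \
        --conf "spark.kubernetes.container.image=$IMAGE" \
        --conf "spark.kubernetes.namespace=$NAMESPACE" \
        --conf "spark.kubernetes.authenticate.driver.serviceAccountName=$SERVICE_ACCOUNT" \
        --class "$APP_CLASS" \
        "$APP_JAR"
    printf '\n'
}

build_submit_cmd
```

Once the printed command looks right for your cluster, it can be copied and run directly.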

Using spark8t CLI

The spark8t CLI tooling interacts with the K8s API to create, manage and delete K8s resources representing the Spark service account. Make sure that the kube config file is correctly loaded into the container, e.g.

docker run --name charmed-spark -v /path/to/kube/config:/var/lib/spark/.kube/config ghcr.io/canonical/charmed-spark:3.4-22.04_edge

Note that this starts the container with a long-running service, allowing you to exec commands in it:

docker exec charmed-spark spark-client.service-account-registry list

If you prefer to run one-shot commands without keeping the Charmed Spark image running, use the \; exec prefix, e.g.

docker run -v ... ghcr.io/canonical/charmed-spark:3.4-22.04_edge \; exec spark-client.service-account-registry list
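If one-shot commands are used often, they could be wrapped in a small shell function. The sketch below is one possible way to do so. `charmed_spark_oneshot` is a hypothetical helper, the kubeconfig location `$HOME/.kube/config` is an assumption, and `--rm` (a standard docker flag that removes the container on exit) is an addition not shown in the command above. Setting `DOCKER=echo` prints the command instead of running it.

```shell
#!/bin/sh
# Image tag and in-container kubeconfig path taken from this README.
IMAGE="ghcr.io/canonical/charmed-spark:3.4-22.04_edge"
# Assumed location of the local kubeconfig file; adjust as needed.
KUBECONFIG_PATH="$HOME/.kube/config"
# Set DOCKER=echo to print the command instead of running it.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: run one command in a throwaway container.
charmed_spark_oneshot() {
    "$DOCKER" run --rm \
        -v "$KUBECONFIG_PATH:/var/lib/spark/.kube/config" \
        "$IMAGE" \; exec "$@"
}

# Dry run: show the command that would be executed.
DOCKER=echo
charmed_spark_oneshot spark-client.service-account-registry list
```

Remove the `DOCKER=echo` line to run the command for real.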

For more information about spark-client API and spark8t tooling, please refer to here.

Starting Pebble services

The Charmed Spark ROCK image is delivered with Pebble already included in order to manage services. If you want to start a service, use the \; start <service-name> prefix.

Starting History Server

docker run ghcr.io/canonical/charmed-spark:3.4-22.04_edge \; start history-server

Running Jupyter Lab

In the Charmed Spark bundle we also provide the charmed-spark-jupyter image, built specifically for running a JupyterLab server integrated with Spark, where each notebook starts its own dedicated executors and has a SparkSession and/or SparkContext injected into it.

To start a JupyterLab server using the charmed-spark-jupyter image, use

docker run \
  -v /path/to/kube/config:/var/lib/spark/.kube/config \
  -p <port>:8888 \
  ghcr.io/canonical/charmed-spark-jupyter:3.4-22.04_edge \
  --username <spark-service-account> --namespace <spark-namespace>

Make sure you have created the <spark-service-account> in the <spark-namespace> with the spark8t CLI beforehand. You should then be able to access the Jupyter server at http://0.0.0.0:<port>.

You can pass extra arguments to further configure the Spark executors by providing more spark8t commands. Mounting the local kubeconfig file is necessary so that the JupyterLab server can act as a Spark driver and request resources on the K8s cluster.

Developers and Contributing

Please see CONTRIBUTING.md for contribution guidelines and developer guidance.

Bugs and feature requests

If you find a bug in this ROCK or want to request a specific feature, here are the useful links:

Licence statement

Charmed Spark is free software, distributed under the Apache Software License, version 2.0.

Trademark Notice

Apache®, Apache Spark, Spark®, and the Spark logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
