janetkuo / spark

This project forked from apache-spark-on-k8s/spark

Apache Spark enhanced with native Kubernetes scheduler back-end

License: Apache License 2.0

Shell 0.56% Batchfile 0.09% R 3.22% Makefile 0.03% C 0.01% Java 9.97% Scala 77.52% JavaScript 0.49% CSS 0.09% HTML 0.03% PowerShell 0.01% Python 7.60% Roff 0.11% ANTLR 0.12% PLpgSQL 0.01% SQLPL 0.02% Thrift 0.12%

spark's Introduction

Apache Spark On Kubernetes

This repository, located at https://github.com/apache-spark-on-k8s/spark, contains a fork of Apache Spark that enables running Spark jobs natively on a Kubernetes cluster.

What is this?

This is a collaboratively maintained project working on SPARK-18278. The goal is to give Spark native support for Kubernetes as a cluster manager, fully supported and on par with the Spark Standalone, Mesos, and Apache YARN cluster managers.

Getting Started

Why does this fork exist?

Adding native integration for a new cluster manager is a large undertaking. If poorly executed, it could introduce bugs into Spark when run on other cluster managers, cause release blockers that slow down the overall Spark project, or require hotfixes that divert attention from development toward managing additional releases. Any work this deep inside Spark needs to be done carefully to minimize the risk of those negative externalities.

At the same time, a growing number of people from various companies and organizations want to work together on running Spark natively on Kubernetes. The group needs a code repository, a communication forum, issue tracking, and continuous integration in order to work together effectively on an open source product.

We've been asked by an Apache Spark Committer to work outside of the Apache infrastructure for a short period of time to allow this feature to be hardened and improved without creating risk for Apache Spark. The aim is to rapidly bring it to the point where it can be brought into the mainline Apache Spark repository for continued development within the Apache umbrella. If all goes well, this should be a short-lived fork rather than a long-lived one.

Who are we?

This is a collaborative effort by several folks from different companies who are interested in seeing this feature be successful. Companies active in this project include (alphabetically):

  • Google
  • Haiwen
  • Hyperpilot
  • Intel
  • Palantir
  • Pepperdata
  • Red Hat
