
james-hadoop / awesome-spark


This project forked from wypb/awesome-spark


A curated list of awesome Apache Spark packages and resources.

License: Creative Commons Zero v1.0 Universal


Awesome Spark


Table of Contents

Packages

Language Bindings

Notebooks and IDEs

  • Apache Zeppelin - Web-based notebook that enables interactive data analytics with pluggable backends, integrated plotting, and extensive Spark support out of the box.
  • Spark Notebook - Scalable and stable Scala- and Spark-focused notebook bridging the gap between the JVM and data scientists (incl. extendable, typesafe and reactive charts).
  • sparkmagic - Jupyter magics and kernels for working interactively with remote Spark clusters through Livy.

General Purpose Libraries

  • Succinct - Support for efficient queries on compressed data.

SQL Data Sources

Bioinformatics

  • ADAM - A set of tools designed to analyse genomics data.
  • Hail - A genetic analysis framework.

GIS

  • Magellan - Geospatial analytics using Spark.
  • GeoSpark - A cluster computing system for processing large-scale spatial data.

Time Series Analytics

  • Spark-Timeseries - A Scala / Java / Python library for interacting with time series data on Apache Spark.

Graph Processing

  • Mazerunner - Graph analytics platform on top of Neo4j and GraphX.
  • GraphFrames - DataFrame-based graph API.
  • neo4j-spark-connector - Bolt protocol-based Neo4j connector with RDD, DataFrame and GraphX / GraphFrames support.

Machine Learning Extension

Middleware

  • Livy - REST server with extensive language support (Python, R, Scala), the ability to maintain interactive sessions, and object sharing.
  • spark-jobserver - A simple Spark-as-a-Service that supports object sharing using so-called named objects. JVM only.
  • Mist - HTTP and MQTT API intended to expose Spark to external services.
  • Apache Toree - IPython protocol based middleware for interactive applications.

Utilities

  • silex - A collection of tools ranging from ML extensions to additional RDD methods.

Natural Language Processing

Streaming

  • Apache Bahir - A collection of the streaming connectors excluded from Spark 2.0 (Akka, MQTT, Twitter, ZeroMQ).

Resources

Books

MOOCS

Workshops

Projects Using Spark

  • Oryx 2 - A lambda architecture built on Apache Spark and Apache Kafka with specialization for real-time large scale machine learning.
  • Photon ML - A machine learning library supporting classical Generalized Mixed Model and Generalized Additive Mixed Effect Model.
  • PredictionIO - Machine Learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time.
  • Crossdata - Data integration platform with extended DataSource API and multi-user environment.

Blogs

  • Spark Technology Center - A great source of highly diverse posts related to the Spark ecosystem, from practical advice to Spark committer profiles.

Docker Images

Miscellaneous

License

Public Domain Mark
This work (Awesome Spark, by https://github.com/awesome-spark/awesome-spark), identified by Maciej Szymkiewicz, is free of known copyright restrictions.

Apache Spark, Spark, Apache, and the Spark logo are trademarks of The Apache Software Foundation. This compilation is not endorsed by The Apache Software Foundation.

Contributors

eliasah, zero323, cycorey, andypetrella, jimmyho, mbonaci
