
hhy5277 / native_spark


This project forked from rajasekarv/vega


A new arguably faster implementation of Apache Spark from scratch in Rust

License: Apache License 2.0

Rust 99.98% Cap'n Proto 0.02%

native_spark's Introduction

native_spark

Join the chat at https://gitter.im/fast_spark/community

A new arguably faster implementation of Apache Spark from scratch in Rust. WIP

Install Cap'n Proto and you are good to go. The code is tested only on Linux and requires a nightly Rust toolchain. It has been tested with version 1.39 only; specialization has breaking changes from version to version, so use 1.39 for now.

Use this command: cargo +nightly-2019-09-11 build --release

Refer to make_rdd.rs and the other examples in the example code to get the basic idea.

To run in distributed mode, place a hosts.conf file, in the format shown in the config folder, in the home directory of every machine, and make sure every machine is reachable over SSH from the master. The master port can be configured in hosts.conf, and port 10500 must be free on the executors. Ports 5000-6000 are reserved for the shuffle manager. This will be handled internally soon.
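Because the port requirements above are easy to get wrong on a shared machine, a quick check before starting an executor can help. The following is a minimal sketch using only the standard library; the helper names `port_is_free` and `busy_ports` are hypothetical and are not part of native_spark.

```rust
use std::net::TcpListener;

/// Returns true if a TCP listener can currently be bound to `port`
/// on all interfaces. (Hypothetical helper, not a native_spark API.)
fn port_is_free(port: u16) -> bool {
    // The listener is dropped immediately, which releases the port again.
    TcpListener::bind(("0.0.0.0", port)).is_ok()
}

/// Returns the ports from `ports` that are already in use.
fn busy_ports(ports: impl IntoIterator<Item = u16>) -> Vec<u16> {
    ports.into_iter().filter(|p| !port_is_free(*p)).collect()
}
```

On an executor, one might call `port_is_free(10500)` and `busy_ports(5000..=6000)` and refuse to start if either reports a conflict. The check is only advisory, since another process can still take a port between the check and the actual bind.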

Since file readers are not done yet, you have to read files manually for now (for example, by reading from S3 yourself, or by working around local files: distribute copies of all files to all machines and make an RDD from the list of filenames).
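The filename-list workaround can be pictured with plain standard-library code. This is a local sketch only: it does not call native_spark, and the function names `read_lines` and `lines_from_files` are hypothetical. In a real job, the filename list would be the input to make_rdd and something like `read_lines` would run inside a closure on each executor, assuming every machine holds a copy of the files at the same paths.

```rust
use std::fs;
use std::io;

/// Reads one file and returns its lines as owned `String`s.
/// Takes the path by value so both input and output are owned data.
fn read_lines(path: String) -> io::Result<Vec<String>> {
    let contents = fs::read_to_string(path)?;
    Ok(contents.lines().map(str::to_owned).collect())
}

/// Local stand-in for "make an RDD from a filename list, then expand
/// each filename into the lines of that file".
fn lines_from_files(paths: Vec<String>) -> io::Result<Vec<String>> {
    let mut all = Vec::new();
    for path in paths {
        // Stop at the first unreadable file and report the error.
        all.extend(read_lines(path)?);
    }
    Ok(all)
}
```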

Ctrl-C handling and panic handling are not done yet, so if a problem occurs at runtime, executors won't be shut down automatically and you have to kill the processes manually.

One limitation of the current implementation is that the input and return types of all closures, and all input to make_rdd, must be owned data.
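In practice this means converting borrowed values such as `&str` into owned ones such as `String` before handing them over. The sketch below models the restriction with a `'static` bound on a hypothetical `map_owned` function; that bound is an assumption made for illustration and is not taken from native_spark's actual signatures.

```rust
/// Stand-in for an RDD operation. The `'static` bounds (an assumption of
/// this sketch) rule out references to short-lived data, so the closure
/// has to take and return owned values.
fn map_owned<T, U, F>(input: Vec<T>, f: F) -> Vec<U>
where
    T: 'static,
    U: 'static,
    F: Fn(T) -> U + 'static,
{
    input.into_iter().map(f).collect()
}

/// Converts borrowed string slices into owned `String`s first.
fn to_owned_input(words: &[&str]) -> Vec<String> {
    words.iter().map(|w| w.to_string()).collect()
}

/// Takes `String` and returns `usize`: both sides of the closure are owned.
fn word_lengths(words: Vec<String>) -> Vec<usize> {
    map_owned(words, |w: String| w.len())
}
```

A closure typed as `|w: &str| w.len()` over locally borrowed strings would not satisfy these bounds, which is the kind of code the limitation excludes.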

Configuration

You can specify the local IP address using the environment variable SPARK_LOCAL_IP.
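As a rough illustration of how such a variable might be consumed, the sketch below reads SPARK_LOCAL_IP with the standard library and parses it into an `IpAddr`. The function names and the fallback to 127.0.0.1 are assumptions of this sketch; the README does not say what native_spark does when the variable is unset or invalid.

```rust
use std::env;
use std::net::{IpAddr, Ipv4Addr};

/// Parses an optional address string. Falls back to 127.0.0.1 when the
/// value is missing or not a valid IP address (fallback is an assumption).
fn parse_local_ip(raw: Option<String>) -> IpAddr {
    raw.and_then(|s| s.trim().parse::<IpAddr>().ok())
        .unwrap_or(IpAddr::V4(Ipv4Addr::LOCALHOST))
}

/// Reads SPARK_LOCAL_IP from the process environment.
fn local_ip_from_env() -> IpAddr {
    parse_local_ip(env::var("SPARK_LOCAL_IP").ok())
}
```

The variable would typically be set on the command line when launching a program, for example `SPARK_LOCAL_IP=192.168.1.10 cargo +nightly-2019-09-11 run --release`, where the address is a placeholder.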

ToDo

  • Error handling (priority)
  • Fault tolerance

RDD

Most of these except file reader and writer are trivial to implement

  • map
  • flat_map
  • filter
  • group_by
  • reduce_by
  • distinct
  • count
  • take_sample
  • union
  • glom
  • cartesian
  • pipe
  • map_partitions
  • for_each
  • collect
  • reduce
  • fold
  • aggregate
  • take
  • first
  • sample
  • save_as_text_file (can save only as a text file in each executor's local file system)
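For readers unfamiliar with the less common names in this list, the sketch below models a dataset as a plain `Vec` of partitions and shows what a few of the operations mean, following the usual Apache Spark semantics. It is a single-machine analogy only: the `Partitions` type and these free functions are hypothetical and are not native_spark's API.

```rust
/// A dataset modelled as a list of partitions, each holding owned elements.
type Partitions<T> = Vec<Vec<T>>;

/// `collect`: concatenates all partitions into one vector, in partition order.
fn collect<T>(parts: Partitions<T>) -> Vec<T> {
    parts.into_iter().flatten().collect()
}

/// `glom`: turns each partition into a single element that holds
/// all of that partition's items.
fn glom<T>(parts: Partitions<T>) -> Partitions<Vec<T>> {
    parts.into_iter().map(|p| vec![p]).collect()
}

/// `map_partitions`: applies `f` to each whole partition rather than
/// to each element.
fn map_partitions<T, U, F>(parts: Partitions<T>, f: F) -> Partitions<U>
where
    F: Fn(Vec<T>) -> Vec<U>,
{
    parts.into_iter().map(f).collect()
}

/// `cartesian`: pairs every element of `left` with every element of `right`.
fn cartesian<T: Clone, U: Clone>(left: &[T], right: &[U]) -> Vec<(T, U)> {
    left.iter()
        .flat_map(|l| right.iter().map(move |r| (l.clone(), r.clone())))
        .collect()
}
```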

Config Files

  • Replace hard coded values

native_spark's People

Contributors

0xflotus, alecmocatta, ariesdevil, edprince, gitter-badger, gzsombor, iduartgomez, marcelbuesing, nimpruda, rajasekarv, rodrigocfd, steven-joruk
