This project forked from twitter/summingbird

Streaming MapReduce with Scalding and Storm

Home Page: https://twitter.com/summingbird

License: Apache License 2.0

Summingbird

Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.

While a word-counting aggregation in pure Scala might look like this:

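  import scala.collection.mutable.{Map => MutableMap}

  // getOrElse supplies 0 for words not yet in the store (get returns an Option).
  def wordCount(source: Iterable[String], store: MutableMap[String, Long]) =
    source.flatMap { sentence =>
      toWords(sentence).map(_ -> 1L)
    }.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }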
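For reference, here is a self-contained variant of the plain-Scala word count that can be compiled and run on its own. The `toWords` helper is not defined in this README, so the whitespace tokenizer below is an assumption made for illustration, as is the `WordCountExample` object name. The sketch uses `getOrElse` so that words not yet in the store start from zero.

```scala
import scala.collection.mutable.{Map => MutableMap}

object WordCountExample {
  // Hypothetical tokenizer: lowercases the sentence and splits on whitespace.
  def toWords(sentence: String): Seq[String] =
    sentence.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Emits a (word, 1) pair per word and adds each pair into the mutable store.
  def wordCount(source: Iterable[String], store: MutableMap[String, Long]): Unit =
    source.flatMap { sentence =>
      toWords(sentence).map(_ -> 1L)
    }.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }

  def main(args: Array[String]): Unit = {
    val store = MutableMap.empty[String, Long]
    wordCount(Seq("the quick brown fox", "the lazy dog"), store)
    // Print the counts in alphabetical order of the words.
    println(store.toSeq.sortBy(_._1).mkString(", "))
  }
}
```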

Counting words in Summingbird looks like this:

  import com.twitter.summingbird._

  def wordCount[P <: Platform[P]]
    (source: Producer[P, String], store: P#Store[String, Long]) =
      source.flatMap { sentence =>
        toWords(sentence).map(_ -> 1L)
      }.sumByKey(store)

The logic is exactly the same, and the code is almost the same. The main difference is that you can execute the Summingbird program in "batch mode" (using Scalding), in "realtime mode" (using Storm), or on both Scalding and Storm in a hybrid batch/realtime mode that offers your application very attractive fault-tolerance properties.
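One way to see why the same aggregation can run in batch, in realtime, or in both at once: summing counts gives the same answer no matter how the input is split, as long as the partial results are added back together. The sketch below is plain Scala and does not use the Summingbird API. `HybridMergeSketch`, `count`, and `merge` are hypothetical names, and the split into "older" and "recent" data is an assumption made purely for illustration.

```scala
object HybridMergeSketch {
  type Counts = Map[String, Long]

  // Count words in a group of sentences (whitespace tokenizer is an assumption).
  def count(sentences: Seq[String]): Counts =
    sentences
      .flatMap(_.split("\\s+").filter(_.nonEmpty).toSeq)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Merge two partial results by adding the values key by key.
  // Long addition is associative and has an identity (0L), so the
  // grouping of the input does not affect the final totals.
  def merge(left: Counts, right: Counts): Counts =
    right.foldLeft(left) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0L) + n)
    }

  def main(args: Array[String]): Unit = {
    val older  = Seq("a b a", "b c") // stands in for data already processed in batch
    val recent = Seq("a c c")        // stands in for data seen only by a realtime layer
    val hybrid    = merge(count(older), count(recent))
    val allAtOnce = count(older ++ recent)
    // The merged partial results equal a single pass over all the data.
    println(hybrid == allAtOnce)
  }
}
```

The design point is that the per-key combine step is the only thing the two halves need to agree on, which is what `sumByKey` expresses in the Summingbird version above.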

Summingbird provides you with the primitives you need to build rock solid production systems.

Community and Documentation

To learn more and find links to tutorials and information around the web, check out the Summingbird Wiki.

The latest ScalaDocs are hosted on Summingbird's GitHub Project Page.

Discussion occurs primarily on the Summingbird mailing list. To join the mailing list, email [email protected]. The same address is used for posting once you've joined. Issues should be reported on the GitHub issue tracker. Simpler issues appropriate for first-time contributors looking to help out are tagged "newbie".

Follow @summingbird on Twitter for updates.

Maven

Summingbird modules are published on Maven Central. The current group ID and version for all modules are, respectively, "com.twitter" and 0.1.0-SNAPSHOT.

The currently published artifacts are:

  • summingbird-core_2.9.3
  • summingbird-core_2.10
  • summingbird-batch_2.9.3
  • summingbird-batch_2.10
  • summingbird-client_2.9.3
  • summingbird-client_2.10
  • summingbird-kryo_2.9.3
  • summingbird-kryo_2.10
  • summingbird-storm_2.9.3
  • summingbird-storm_2.10
  • summingbird-scalding_2.9.3
  • summingbird-scalding_2.10
  • summingbird-builder_2.9.3
  • summingbird-builder_2.10

The suffix denotes the Scala version.
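As a rough sketch, the coordinates above could be declared in an sbt build like this. The `scalaVersion` value and the choice of modules are assumptions for illustration. sbt's `%%` operator appends the Scala version suffix to the artifact name, so with a 2.10.x compiler it resolves `summingbird-core_2.10` and so on.

```scala
// build.sbt (illustrative fragment, not taken from the project)

// Assumed compiler version; any 2.10.x release selects the _2.10 artifacts.
scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  // %% appends the Scala version suffix (here: _2.10) to the artifact name.
  "com.twitter" %% "summingbird-core"     % "0.1.0-SNAPSHOT",
  "com.twitter" %% "summingbird-storm"    % "0.1.0-SNAPSHOT",
  "com.twitter" %% "summingbird-scalding" % "0.1.0-SNAPSHOT"
)

// Note: SNAPSHOT versions are often served from a separate snapshots
// repository, so an extra resolver may be required; which one is not
// stated in this README.
```

To pin the suffix by hand instead, the single `%` form with the full artifact name works as well, for example `"com.twitter" % "summingbird-core_2.10" % "0.1.0-SNAPSHOT"`.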

Authors

License

Copyright 2013 Twitter, Inc.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Contributors

johnynek, sritchie, singhala, bwallerstein, caniszczyk
