aalkilani / programmingwithscalding

This project is forked from scalding-io/programmingwithscalding

Programming Map-Reduce with Scalding

Home Page: http://scalding.io

License: Other

programmingwithscalding's Introduction

Source code for PACKT Book 'Programming MapReduce With Scalding'

Find more information at http://scalding.io/

The book consists of 9 chapters:

  • Introduction to Map-Reduce - Introduction to Hadoop, Map-Reduce, Pipelining, Cascading, Pig and Hive. The chapter presents the benefits of higher-level abstractions over Map-Reduce (concepts and capabilities).

  • Get ready for Scalding - Theory about Scalding, the Scala Domain Specific Language that utilises Cascading. Development environment setup, including a local Hadoop cluster. Executing the first Hello World Scalding example.

  • Scalding by example - The core capabilities of Scalding: i) map-like functions, ii) grouping/reducing functions, iii) join operations.

  • Intermediate examples - A Scalding log-processing flow for a news company that aggregates multiple sources. An example with multiple pipelines introduces some more advanced concepts.

  • Scalding Design Patterns - Interesting design patterns applicable to Scalding data processing applications. Using the 'External Operations' pattern enables us to perform unit testing and to structure our applications in a modular way.

  • Testing & TDD - Best practices of first defining behaviour (Behaviour Driven Development), then tests (Test Driven Development), and then completing the implementation. How to write unit and integration tests, and how to apply black-box testing methodologies in the context of Big Data.

  • Running Scalding in Production - Tips and tricks on how to execute and schedule jobs, and how to coordinate the execution of Scalding/Scala/Java and even external system processes. Finally, how to configure Scalding jobs using property files or Hadoop parameters, how to monitor and optimize jobs, and other useful tips.

  • Using external data stores - Interaction with external SQL, NoSQL and in-memory applications such as HBase, SQL, ElasticSearch etc.

  • Matrix Calculations and Machine Learning - Matrix calculations using the Matrix API and Algebird to calculate text similarity (TF-IDF) and set similarity (Jaccard). A further example covers Mahout K-Means clustering and outlier detection.
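
For readers unfamiliar with the set similarity measure named in the last chapter, here is a minimal sketch of Jaccard similarity in plain Scala. It is not taken from the book and does not use Scalding's Matrix API or Algebird; the object name `JaccardSketch`, the function `jaccard`, and the sample sets are hypothetical, and returning 0.0 for two empty sets is an assumption made here to avoid division by zero.

```scala
object JaccardSketch {
  // Jaccard similarity of two sets: |A intersect B| / |A union B|.
  // Assumption: two empty sets are given similarity 0.0.
  def jaccard[T](a: Set[T], b: Set[T]): Double = {
    val unionSize = (a union b).size
    if (unionSize == 0) 0.0
    else (a intersect b).size.toDouble / unionSize
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data: the terms that appear in two documents.
    val doc1 = Set("map", "reduce", "scalding")
    val doc2 = Set("map", "reduce", "cascading")
    // 2 shared terms out of 4 distinct terms overall
    println(jaccard(doc1, doc2)) // prints 0.5
  }
}
```

On a cluster the same ratio would be computed over distributed data rather than in-memory sets, but the arithmetic is identical.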

programmingwithscalding's People

Contributors

antwnis, galarragas
