dsdinter / scalding-tutorial

This project forked from cascading/scalding-tutorial

The Scalding tutorial as a standalone SBT project

Scalding Tutorial Project

Introduction

This is Twitter's tutorial for Scalding, adapted to run on Hadoop as a standalone job, i.e. without requiring scald.rb etc.

This was built as a Scala SBT project by the Concurrent Inc. team in order to integrate the Scalding tutorial into the Cascading SDK. It is based on the excellent work done by Snowplow Analytics on porting the WordCount example to SBT.
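
At its core, the WordCount example is a flatMap / group / count pipeline. As a hedged illustration of that logic only, the sketch below expresses it with plain Scala collections instead of Scalding's API, so it runs without Hadoop. The object name WordCountSketch and the tokenization rule (lower-case, split on whitespace) are assumptions, not code from this project.

```scala
// A plain-Scala sketch of word-count logic (no Hadoop, no Scalding).
// Assumption: words are split on runs of whitespace and compared
// case-insensitively; the tutorial's own tokenizer may differ.
object WordCountSketch {

  // Break one line of text into lower-cased words, dropping the empty
  // token that split produces for leading whitespace (e.g. "  hello").
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").toSeq.filter(_.nonEmpty)

  // Same shape as the distributed job: flatMap lines into words,
  // group identical words, and count the size of each group.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

In a Scalding job the same three steps run as a distributed pipeline; here they run in memory, which is only practical for small inputs.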

The project's version number follows the version of the Scalding release it is based on.

Please note that this tutorial uses Scala 2.10, not 2.9.

Prerequisites

To use this tutorial, you need SBT and the hadoop command installed. Cascading, and therefore Scalding, is compatible with a number of Hadoop distributions. If you are unsure whether your distribution is compatible, please check the compatibility page.

You do not need a full Hadoop cluster to run this tutorial; Hadoop's local mode is sufficient.

Building

Assuming you already have SBT installed:

$ git clone https://github.com/Cascading/scalding-tutorial.git
$ cd scalding-tutorial
$ sbt assembly

The 'fat jar' is now available as:

target/scalding-tutorial-0.11.2.jar
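
To confirm that the assembly step actually packaged the tutorial classes, you can list the jar's entries. The following is a minimal sketch using only the JDK's java.util.jar.JarFile API. The object name JarCheck is hypothetical, and the default assumption that compiled classes sit under tutorial/ inside the jar (mirroring the source directory) is illustrative, so the package directory is a parameter you can override.

```scala
import java.util.jar.JarFile

object JarCheck {
  // List the .class entries under a package directory inside a jar.
  // Assumption: the tutorial classes are packaged under "tutorial/".
  def classesUnder(jarPath: String, packageDir: String = "tutorial/"): List[String] = {
    val jar = new JarFile(jarPath)
    try {
      val entries = jar.entries()
      val found = List.newBuilder[String]
      while (entries.hasMoreElements) {
        val name = entries.nextElement().getName
        if (name.startsWith(packageDir) && name.endsWith(".class")) found += name
      }
      found.result().sorted
    } finally jar.close()
  }
}
```

For example, JarCheck.classesUnder("target/scalding-tutorial-0.11.2.jar") should return a non-empty list if the tutorial code made it into the fat jar.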

Project structure

Some modifications have been made to the code so that it works properly in an SBT-based build:

  • all code is now in src/main/scala/tutorial
  • the data files for the different parts now live in data
  • the classes in the matrix tutorial have been renamed to match the file names, so that the command-line invocation is similar to the original tutorial
  • the documentation of the examples has been adapted to match the new structure
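
Because class names now match their file names, the class you pass on the command line can be traced straight to its source file. The helper below sketches that mapping. It assumes each class's package mirrors its directory under src/main/scala; the object name TutorialLayout and the class name tutorial.Tutorial0 used in the usage note are hypothetical.

```scala
object TutorialLayout {
  // Root of the Scala sources in this SBT project.
  val SourceRoot = "src/main/scala"

  // Map a fully-qualified class name to the file that should define it.
  // Assumption: the package mirrors the directory and the class sits in a
  // file of the same name, e.g. "tutorial.Foo" -> "src/main/scala/tutorial/Foo.scala".
  def sourceFileFor(className: String): String =
    SourceRoot + "/" + className.replace('.', '/') + ".scala"
}
```

Under those assumptions, TutorialLayout.sourceFileFor("tutorial.Tutorial0") yields "src/main/scala/tutorial/Tutorial0.scala".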

Running the examples

Each part of the tutorial explains how to run it properly; the general form is always:

$ yarn jar target/scalding-tutorial-0.11.2.jar <TutorialPart> --local <additional arguments>
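
In this command, <TutorialPart> names the job to run, --local selects local execution, and further arguments follow Scalding's "--key value" convention. To show how such a command line breaks down, here is a small plain-Scala parser in that style. It is a hypothetical sketch, not Scalding's own com.twitter.scalding.Args, and the class name and paths in the test values are made up.

```scala
object ArgsSketch {
  // Group tokens into "--key value..." entries, in the style of Scalding's
  // command-line arguments. Tokens before the first --key (such as the
  // tutorial part's class name) are stored under the empty key "", and a
  // bare flag such as --local maps to an empty list.
  def parseArgs(argv: Seq[String]): Map[String, List[String]] = {
    val start = ("", Map("" -> List.empty[String]))
    val (_, parsed) = argv.foldLeft(start) {
      case ((_, acc), token) if token.startsWith("--") =>
        val key = token.drop(2)
        (key, acc + (key -> acc.getOrElse(key, List.empty[String])))
      case ((current, acc), token) =>
        (current, acc + (current -> (acc(current) :+ token)))
    }
    parsed
  }

  // Report which execution mode the flags request: --local runs the job
  // in a single local process, --hdfs runs it through Hadoop.
  def mode(args: Map[String, List[String]]): String =
    if (args.contains("local")) "local"
    else if (args.contains("hdfs")) "hdfs"
    else "unspecified"
}
```

Keeping positional tokens under the empty key means the job name and the key/value options come out of a single pass over the arguments.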

Copyright and license

Copyright 2012-2014 Concurrent Inc, with significant portions copyright 2012 Twitter, Inc. and Snowplow Analytics Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributors

alexanderdean, fs111, kadwanev, ajorgensen, epishkin
