airtoxin / glue-mapreduce

A Node.js MapReduce library built around a "write once, run anywhere" concept for the Hadoop framework.

License: MIT License

JavaScript 100.00%

glue-mapreduce

## Motivation

Scalable computing is increasingly important for analyzing and aggregating big data, and MapReduce is a scalable computing model for distributed platforms, used in Hadoop and other services.

Designing for scalability matters if a program is to keep running stably as data grows day by day. A very common request, however, is: "MapReduce is overkill for the current data size (on the order of MB to GB), but scalability must still be ensured."

This library addresses that problem by running MapReduce aggregation locally while keeping the script portable to Hadoop platforms.

## Install

    npm i glue-mapreduce

TODO: publish

## How to use

### Script

Important: The same script runs both locally and on Hadoop, so it should contain only the core of your MapReduce algorithm. Do not put any other logic in it.

var fs = require('fs');
var mr = new (require('glue-mapreduce'))();

// register the "local" input data
mr.input = function (callback) {
    var error = null;
    // data must be iterable
    var data = fs.readFileSync('somefile.txt').toString().split('\n');
    return callback(error, data);
};

// register the mapper
mr.mapper = function (mapLine, callback) {
    // the mapper is called once per item of the input data
    var error = null;
    var split = mapLine.split(' ');
    var key = split[0],
        val = split[1];

    // the callback can return multiple key-value pairs
    return callback(error, [{k: key, v: val}]);
};

// register the reducer
mr.reducer = function (key, values, callback) {
    // the reducer is called once per key
    var error = null;

    // the callback can return multiple key-value pairs
    return callback(error, [{k: key, v: values.length}]);
};

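// run the Map-Reduce job
// note: `data` must be defined in this scope; the example above only defines it inside mr.input
mr.run(data, function (results) {
    // this callback is not called on Hadoop
    /*
    results is an array of key-value pairs
    [{
        k: 'key',
        v: 100 (reduced value)
    }, ...]
    */
});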
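To make the data flow concrete, the following is a minimal sketch of what a local run conceptually does: map every input item, group the emitted pairs by key, then reduce each key. `runLocal` is a hypothetical helper written for illustration only; it is not glue-mapreduce's implementation, and it assumes the mapper and reducer invoke their callbacks synchronously.

```javascript
// Hypothetical illustration: NOT glue-mapreduce's implementation.
// Assumes mapper and reducer invoke their callbacks synchronously.
function runLocal(input, mapper, reducer) {
    // Null-prototype object so arbitrary strings are safe as keys.
    var groups = Object.create(null);

    // Map phase: call the mapper once per input item.
    input.forEach(function (item) {
        mapper(item, function (error, pairs) {
            if (error) { throw error; }
            // Shuffle phase: group emitted values by key.
            pairs.forEach(function (pair) {
                if (!groups[pair.k]) {
                    groups[pair.k] = [];
                }
                groups[pair.k].push(pair.v);
            });
        });
    });

    // Reduce phase: call the reducer once per key, in sorted key order.
    var results = [];
    Object.keys(groups).sort().forEach(function (key) {
        reducer(key, groups[key], function (error, pairs) {
            if (error) { throw error; }
            results = results.concat(pairs);
        });
    });
    return results;
}

// Mapper and reducer with the same shapes as in the script above.
function mapper(mapLine, callback) {
    var split = mapLine.split(' ');
    return callback(null, [{k: split[0], v: split[1]}]);
}

function reducer(key, values, callback) {
    return callback(null, [{k: key, v: values.length}]);
}

var results = runLocal(['a 1', 'b 2', 'a 3'], mapper, reducer);
console.log(results);
```

Because the mapper and reducer only see one line or one key at a time, the same two functions can be reused unchanged when a distributed framework performs the grouping step instead.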

### Run options

glue-mapreduce decides whether to run as a local MapReduce job, a Hadoop Streaming mapper, or a Hadoop Streaming reducer based on the command-line argument.

To run a local MapReduce job, run `node somemapreduce.js local`, or pass no arguments.

To run as a Hadoop Streaming mapper: `hadoop jar hadoop-streaming.jar -mapper 'somemapreduce.js mapper' ...`

To run as a Hadoop Streaming reducer: `hadoop jar hadoop-streaming.jar -reducer 'somemapreduce.js reducer' ...`

Important: The command must be quoted so that the mode argument is passed along with the script name.

This behavior can also be controlled through the `mr.mode` variable, which accepts 'local', 'mapper', or 'reducer'. For example, `mr.mode = 'local'` runs a local MapReduce aggregation.
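As a rough sketch of the mode selection described above, the logic could look like the following. `resolveMode` and `VALID_MODES` are hypothetical names introduced for illustration, not part of glue-mapreduce, and the precedence shown (an explicit override wins over the command-line argument) is an assumption rather than documented behavior.

```javascript
// Hypothetical helper: NOT part of glue-mapreduce.
var VALID_MODES = ['local', 'mapper', 'reducer'];

function resolveMode(argv, override) {
    // Assumption: an explicit override (like setting mr.mode) wins.
    // argv[0] is the node binary, argv[1] the script, argv[2] the first argument.
    var mode = override !== undefined ? override : argv[2];

    // No argument at all means a local run.
    if (mode === undefined) {
        return 'local';
    }
    if (VALID_MODES.indexOf(mode) === -1) {
        throw new Error('Unknown mode: ' + mode);
    }
    return mode;
}

var noArgument = resolveMode(['node', 'somemapreduce.js']);
var fromArgument = resolveMode(['node', 'somemapreduce.js', 'mapper']);
var fromOverride = resolveMode(['node', 'somemapreduce.js', 'mapper'], 'local');
console.log(noArgument, fromArgument, fromOverride);
```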

### Testing

To test your script in Hadoop mode, you can simulate the streaming pipeline locally with the following command:

node myscript.js map < myinput.txt | sort | node myscript.js red
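The pipeline above mirrors how Hadoop Streaming connects the two phases: the mapper writes lines to stdout, the lines are sorted by key, and the reducer reads them from stdin. By default Hadoop Streaming treats the text up to the first tab as the key and the rest of the line as the value. The sketch below illustrates that line format; `formatPairs` and `groupSortedLines` are hypothetical helpers, and whether glue-mapreduce serializes pairs exactly this way is an assumption.

```javascript
// Hypothetical helpers illustrating Hadoop Streaming's default text format
// (one "key<TAB>value" pair per line); NOT glue-mapreduce's implementation.

// Mapper side: turn {k, v} pairs into output lines.
function formatPairs(pairs) {
    return pairs.map(function (pair) {
        return pair.k + '\t' + pair.v;
    });
}

// Reducer side: group consecutive lines that share a key.
// This only works because the input lines arrive sorted by key.
function groupSortedLines(lines) {
    var groups = [];
    var current = null;
    lines.forEach(function (line) {
        var tab = line.indexOf('\t');
        // Without a tab, the whole line is the key and the value is empty.
        var key = tab === -1 ? line : line.slice(0, tab);
        var value = tab === -1 ? '' : line.slice(tab + 1);
        if (current === null || current.k !== key) {
            current = {k: key, values: []};
            groups.push(current);
        }
        current.values.push(value);
    });
    return groups;
}

var mapped = formatPairs([{k: 'b', v: '2'}, {k: 'a', v: '1'}, {k: 'a', v: '3'}]);
// The sort() call plays the role of `sort` in the shell pipeline.
var grouped = groupSortedLines(mapped.slice().sort());
console.log(grouped);
```

Grouping only consecutive lines keeps the reducer side streaming-friendly: it never has to hold more than one key's values in memory at a time.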

### Contribute

#### Testing

    npm test

#### Coverage

    npm run coverage
