mapreduce-deployment's Introduction

A Go implementation of MapReduce

This is our EE447 final project; the idea comes from the MIT 6.824 course project. Contributors: @sun-lingyu, @yifanlu0227, @Nicholas0228.

Required packages & how to install

  • golang 1.15+

  • crypto/ssh : go get golang.org/x/crypto/ssh

  • python3-dev : sudo apt-get install python3-dev -y

  • python2-dev : sudo apt-get install python2.7-dev -y

Optional packages & how to install

  • nltk (for the word count & inverted index examples): pip3 install nltk ; pip2 install nltk==3.0.0
  • numpy (for KNN example): pip install numpy

Usage

First, run git clone https://github.com/yifanlu0227/mapreduce.git to download this repository to each of your machines. Select one machine to be the coordinator and the others to be workers.

Edit the workers' IP addresses, username, and password in mapreduce/src/main/mrcoordinator.go as follows:

	hosts := []string{"192.168.0.132", "192.168.0.184", "192.168.0.33", "192.168.0.199"}
	command := "go run mrworker.go " + os.Args[1]
	mr.AwakenWorkers("root", "Ydhlw123", hosts, command)

Make sure that ports 1234 and 8081 are available, since we use them for our RPC and HTTP servers.
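A quick way to check the two ports before starting the coordinator is to try binding to them. The following is a minimal sketch, not part of the project: `port_is_free` is a hypothetical helper, and it only reports whether a TCP bind succeeds at the moment it runs.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port
            # (or when the current user is not allowed to bind it).
            return False
    return True


if __name__ == "__main__":
    # 1234 and 8081 are the ports named in this README.
    for port in (1234, 8081):
        print(port, "free" if port_is_free(port) else "in use")
```

Run it on the machine that will host the coordinator; a port reported as "in use" has to be released before launching.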

Python support

Our MapReduce supports Python development, i.e., you only need to provide a simple Python file containing a map function and a reduce function. See the provided examples, such as word count in mapreduce/src/main/wc.py.

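import string
import nltk

def map(name, contents):
	# Python 2 only: string.maketrans and the two-argument str.translate do not exist in Python 3.
	upper = contents.upper()  # normalize case
	identity = string.maketrans(string.punctuation, string.punctuation)  # maps each character to itself
	no_punct = upper.translate(identity, string.punctuation)  # second argument lists characters to delete
	cleaned = no_punct.translate(identity, string.digits)
	tokens = nltk.word_tokenize(cleaned)
	# Emit one single-entry dict {word: "1"} per token.
	kva = []
	for token in tokens:
		kva.append({token: "1"})
	return kva

def reduce(key, values):
	# values is the list of values collected for key; its length is the word count.
	return str(len(values))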

To run this word count example with the input files pg-*.txt, run the following in a terminal:

go run mrcoordinator.go wc pg-*.txt

To run the KNN example:

go run mrcoordinator.go knn dataset*.txt

To run the inverted index example:

go run mrcoordinator.go inverted_index pg-*.txt

To see the output files, run:

cat mr-out-* | sort | more
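If the results need to be consumed from a script rather than read in a pager, the same merge can be done in Python. This is a minimal sketch that assumes only what the command above shows, namely that results are in files matching mr-out-*; it makes no assumption about the line format. `sorted_output_lines` is a hypothetical helper, and it sorts by code point, which can differ from the locale-dependent order of `sort`.

```python
import glob


def sorted_output_lines(pattern="mr-out-*"):
    """Collect every line from files matching `pattern` and sort them."""
    lines = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            lines.extend(line.rstrip("\n") for line in handle)
    return sorted(lines)


if __name__ == "__main__":
    for line in sorted_output_lines():
        print(line)
```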

Experiment

word count: [figure: wordcount]

inverted index: [figure: invertedindex]

KNN large dataset: [figure: wordcount]

Visualization

worker perspective: [figure: worker]

file perspective: [two figures: file]

Acknowledgements

MIT 6.824
