
mupd8's Introduction

MUPD8


NOTICE:

This repository has been archived and is not supported.

No Maintenance Intended


NOTICE: SUPPORT FOR THIS PROJECT HAS ENDED

This project was owned and maintained by Walmart. This project has reached its end of life and Walmart no longer supports this project.

We will no longer be monitoring the issues for this project or reviewing pull requests. You are free to continue using this project under the license terms or forks of this project at your own risk. This project is no longer subject to Walmart's bug bounty program or other security monitoring.

Actions you can take

We recommend you take the following actions:

  • Review any configuration files used for build automation and make appropriate updates to remove or replace this project
  • Notify other members of your team and/or organization of this change
  • Notify your security team to help you evaluate alternative options

Forking and transition of ownership

For security reasons, Walmart does not transfer the ownership of our primary repos on Github or other platforms to other individuals/organizations. Further, we do not transfer ownership of packages for public package management systems.

If you would like to fork this package and continue development, you should choose a new name for the project and create your own packages, build automation, etc.

Please review the licensing terms of this project, which continue to be in effect even after decommission.


MUPD8 implements the MapUpdate framework, a MapReduce-style framework for processing fast/streaming data.

Run mvn site (using Maven 3+) to generate the file

target/docbkx/html/quickstart.html

which provides a step-by-step walkthrough to create and run a simple example MUPD8 application. A copy of this file is also available at

http://walmartlabs.github.com/mupd8/quickstart.html

for readers who do not yet have a MUPD8 checkout.

This paper provides additional technical background:

Wang Lam, Lu Liu, STS Prasad, Anand Rajaraman, Zoheb Vacheri, and AnHai Doan. Muppet: MapReduce-Style Processing of Fast Data. PVLDB 5(12):1814-1825, 2012. (available as http://vldb.org/pvldb/vol5/p1814_wanglam_vldb2012.pdf)

mupd8's People

Contributors

ambujsingh, cwstege, dalmaer, jhalterman, mmonto7, morshdy, psarda, sunweik, wlam, yiqianz, zheguang


mupd8's Issues

Writes to the backing database are batched, instead of being throttled

1432eb6

Copying the comment here:

This change, together with the previous Cassandra writer change to the background writer thread, creates a regression. The old code throttled writes to maintain a steady write rate to the backing database. These changes remove the sleep after every write and instead sleep once after all the writes have gone in together. This is a bug that can overload the database. Please revert to the earlier behavior.
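
The difference between the two behaviors can be sketched as follows. This is a minimal Java illustration, not the MUPD8 writer thread: the class `WritePacing`, its method names, and the `write` and `pause` hooks are all hypothetical, and the pause is injected so the ordering of writes and pauses is easy to see.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from MUPD8.
final class WritePacing {
    private WritePacing() {}

    // Throttled: pause after every single write, so the backing store
    // sees a steady rate of writes.
    static void writeThrottled(List<String> items, Consumer<String> write, Runnable pause) {
        for (String item : items) {
            write.accept(item);
            pause.run();
        }
    }

    // Batched: issue all writes back to back, then pause once.
    // The store receives the whole burst at full speed.
    static void writeBatched(List<String> items, Consumer<String> write, Runnable pause) {
        for (String item : items) {
            write.accept(item);
        }
        pause.run();
    }

    // A pause hook that really sleeps; it restores the interrupt flag
    // instead of swallowing an interruption.
    static Runnable sleepFor(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }
}
```

With a real store, `pause` would be something like `WritePacing.sleepFor(10)`; the delay value is an arbitrary placeholder.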

java.lang.InterruptedException: should it kill the source thread or not?

Please treat an interrupt as a kill signal for the source thread. In my opinion, it should stop the thread.

Thanks,
Bhavesh

21 Jan 2015 16:19:37,399 ERROR SourceReader:request_source:dare-msgq00.sv.walmartlabs.com:9091,dare-msgq01.sv.walmartlabs.com:9091,dare-msgq02.sv.walmartlabs.com:9091:mupd_statsd__metric_events:logmon.heartbeat:32:KafkaSource_HBs - SourceThread: exception during reads. Swallowed to continue next read.
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:63)
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at com.walmart.platform.logevent.common.source.AbstractBaseMupd8KafkaSource.hasNext(AbstractBaseMupd8KafkaSource.java:51)
at com.walmartlabs.mupd8.AppRuntime$SourceThread$1$$anonfun$run$1.apply$mcV$sp(AppRuntime.scala:424)
at scala.util.control.Breaks.breakable(Breaks.scala:37)
at com.walmartlabs.mupd8.AppRuntime$SourceThread$1.run(AppRuntime.scala:418)
at java.lang.Thread.run(Thread.java:744)
21 Jan 2015 16:19:37,401 ERROR SourceReader:request_source:KafkaSource_HBs:14:logmon.heartbeat:mupd_statsd__metric_events - SourceThread: exception during reads. Swallowed to continue next read.
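
The requested behavior could look roughly like the sketch below. It is an assumption-laden illustration in Java, not the code in AppRuntime: `InterruptibleSourceLoop`, its queue, and its handler are hypothetical stand-ins for the source thread and the blocking read shown in the stack trace.

```java
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical stand-in for a source thread; not MUPD8's SourceThread.
final class InterruptibleSourceLoop implements Runnable {
    private final BlockingQueue<String> queue;   // stands in for the blocking source
    private final Consumer<String> handler;      // stands in for event processing

    InterruptibleSourceLoop(BlockingQueue<String> queue, Consumer<String> handler) {
        this.queue = queue;
        this.handler = handler;
    }

    @Override
    public void run() {
        while (true) {
            try {
                // take() blocks and throws InterruptedException on interrupt.
                handler.accept(queue.take());
            } catch (InterruptedException e) {
                // Treat the interrupt as a stop request: restore the flag
                // so callers can observe it, then leave the loop.
                Thread.currentThread().interrupt();
                return;
            } catch (RuntimeException e) {
                // Other read failures are still swallowed so the next
                // read can proceed; a real loop would log them here.
            }
        }
    }
}
```

The design choice is that only `InterruptedException` ends the loop; every other exception keeps the existing "swallow and continue" behavior.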

NPE Again

Please fix the following exception when you get a chance.

Thanks,
Bhavesh

Here is the log:

21 Jan 2015 17:18:23,930 ERROR New I/O server worker #2-1 - HttpServerHandler exception.
java.lang.NullPointerException
at com.walmartlabs.mupd8.AppRuntime$$anonfun$22.apply(AppRuntime.scala:310)
at com.walmartlabs.mupd8.AppRuntime$$anonfun$22.apply(AppRuntime.scala:227)
at com.walmartlabs.mupd8.HttpServerHandler.handleHttpRequest(HttpServer.scala:117)
at com.walmartlabs.mupd8.HttpServerHandler.messageReceived(HttpServer.scala:100)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndFireMessageReceived(ReplayingDecoder.java:527)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
21 Jan 2015 17:18:25,102 ERROR New I/O server worker #2-2 - HttpServerHandler exception.
java.lang.NullPointerException
at com.walmartlabs.mupd8.AppRuntime$$anonfun$22.apply(AppRuntime.scala:310)
at com.walmartlabs.mupd8.AppRuntime$$anonfun$22.apply(AppRuntime.scala:227)
at com.walmartlabs.mupd8.HttpServerHandler.handleHttpRequest(HttpServer.scala:117)
at com.walmartlabs.mupd8.HttpServerHandler.messageReceived(HttpServer.scala:100)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndFireMessageReceived(ReplayingDecoder.java:527)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506)
at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351)
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Move 'org.codehaus' to 'com.fasterxml'

In my current project, we moved to using 'com.fasterxml'.
MUPD8 still refers to 'org.codehaus', and the two conflict.

I have made the code changes and am following up with Ning Zhang on next steps.

NPE Please FIX it.

Please fix the following NPE.

java.lang.NullPointerException
at com.walmartlabs.mupd8.AppRuntime$$anonfun$32.apply(AppRuntime.scala:364)
at com.walmartlabs.mupd8.AppRuntime$$anonfun$32.apply(AppRuntime.scala:364)
at scala.collection.LinearSeqOptimized$class.find(LinearSeqOptimized.scala:100)
at scala.collection.immutable.List.find(List.scala:84)

Thanks,
Bhavesh

Example for setup in a cluster

Hello,
I downloaded mupd8 and did everything in the quickstart tutorial which worked like a charm. Thanks for that. Now I want to continue and execute the example in a cluster environment, but I couldn't find any tutorials for that. So I don't know what I have to change in the config files or in the program itself.
Can anybody help me with this issue?!

Specify the number of task instances

I was able to specify and execute my own data source, map and update tasks. Now I was wondering how to specify the actual number of task instances of each task. I cannot find anything in the config file. Could you please explain this to me?

Load is distributed unequally between nodes

I'm currently running a mupd8 application on 25 nodes for testing purposes, and I noticed that all map instances are scheduled on only one of these 25 nodes. This results in very poor performance, because my map tasks are very CPU intensive. While most of the other nodes are executing update tasks, some of them aren't executing anything. I cannot find any reason for that. Also, no exceptions appear in my log files.

Further details about my application:
-the key of each tuple is selected randomly
-each node executes mupd8 with 4 worker threads

If you need more details, just let me know.

NPE at start-up following an unclean shutdown

In addition, the MUPD8 library throws an NPE, caused by duplicated rows in the DB or by a previous unclean shutdown (an issue in the mupd8 code).

10 Feb 2015 19:16:06,627 ERROR MessageServer - MessageServerThread exception.
java.lang.NullPointerException
at com.walmartlabs.mupd8.MessageServer$$anonfun$run$10.apply(MessageServer.scala:132)
at com.walmartlabs.mupd8.MessageServer$$anonfun$run$10.apply(MessageServer.scala:128)
at scala.collection.immutable.Map$Map4.foreach(Map.scala:181)
at com.walmartlabs.mupd8.MessageServer.run(MessageServer.scala:128)
10 Feb 2015 19:16:06,688 ERROR MessageServer - MessageServerThread exception.
java.lang.NullPointerException
at com.walmartlabs.mupd8.MessageServer$$anonfun$run$10.apply(MessageServer.scala:132)
at com.walmartlabs.mupd8.MessageServer$$anonfun$run$10.apply(MessageServer.scala:128)
at scala.collection.immutable.Map$Map4.foreach(Map.scala:181)
at com.walmartlabs.mupd8.MessageServer.run(MessageServer.scala:128)

Thanks,
Bhavesh

Example documentation broken: OS 10.9 and compilation errors

Due to changes in external dependencies, it's no longer possible to follow the quick start guide successfully.

Mac OS 10.9 Mavericks introduced changes in bundled applications and there are some deeper dependencies in the Maven POM that need to be modified.

What happens after the source reader fails to connect to a socket?

It seems the source reader gets stuck if it fails to connect to a socket. We need to define the behavior, and this failure should not stop mupd8 itself.

The following can be used as a test server:

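#!/bin/bash

# Emit the test data every 2 seconds and pipe it to a listener on port 1234.
# The inner loop restarts nc each time a client disconnects.
while true ; do
  cat ~/git/mupd8/src/test/resources/testapp/T10.data
  sleep 2
done | while true ; do
  nc -l 1234
done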

NPE in MessageServer Server Thread

2015-01-02 16:21:31,081 [ERROR] c.w.mupd8.MessageServer [MessageServer] - MessageServerThread exception.
java.lang.NullPointerException: null
at com.walmartlabs.mupd8.MessageServer$$anonfun$run$7.apply(MessageServer.scala:107) ~[classes/:na]
at com.walmartlabs.mupd8.MessageServer$$anonfun$run$7.apply(MessageServer.scala:107) ~[classes/:na]
at grizzled.slf4j.Logger.info(slf4j.scala:128) ~[grizzled-slf4j_2.10-1.0.1.jar:1.0.1]
at grizzled.slf4j.Logging$class.info(slf4j.scala:268) ~[grizzled-slf4j_2.10-1.0.1.jar:1.0.1]
at com.walmartlabs.mupd8.MessageServer.info(MessageServer.scala:44) ~[classes/:na]
at com.walmartlabs.mupd8.MessageServer.run(MessageServer.scala:107) ~[classes/:na]

Enable source reader throttling on hot conductor backlogs

I have code that does source reader throttling when a hot conductor is backlogged. However, it is not enabled, as it hasn't been tested.

A test will require 2 machines/JVMs, each reading from a source. One of the JVMs should backlog on a key, and the test must make sure that the source reader on the other JVM slows down.
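
As a rough illustration of what "slows down" might mean for the reader side, the sketch below turns a reported backlog size into a read delay. It is not the untested code mentioned above: `BacklogThrottle`, its parameters, and the linear policy are assumptions made for illustration, and how the backlog size reaches the other JVM is left out entirely.

```java
// Hypothetical policy object; the names and the linear rule are assumptions.
final class BacklogThrottle {
    private final int threshold;              // backlog size tolerated without delay
    private final long millisPerExcessItem;   // delay added per item above the threshold
    private final long maxDelayMillis;        // upper bound on the delay

    BacklogThrottle(int threshold, long millisPerExcessItem, long maxDelayMillis) {
        this.threshold = threshold;
        this.millisPerExcessItem = millisPerExcessItem;
        this.maxDelayMillis = maxDelayMillis;
    }

    // Delay the source reader should wait before its next read,
    // given the backlog size reported for the hot conductor.
    long delayMillis(int backlog) {
        if (backlog <= threshold) {
            return 0L;
        }
        long excess = (long) backlog - threshold;
        return Math.min(maxDelayMillis, excess * millisPerExcessItem);
    }
}
```

A test along the lines described above could then assert that the delay observed on the second JVM becomes nonzero once the first JVM's backlog crosses the threshold.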

JSONSource: exception in the socket reader initialization results in readLine() false-negative

If an exception is thrown during socket reader initialization (e.g. connection refused), the reader object is initialized to None. But readLine() assumes that the reader object was properly instantiated at the first initialization if the reader type is socket (later re-initialization seems to be handled correctly by the exponential-wait logic). As a result, readLine() produces a false negative for all subsequent hasNext() calls.

A long-term restructuring of the SourceReader is still underway, and as it is not coming out soon, I will propose a fix based on the current implementation of JSONSource.

The relevant log:

Jun 13, 2013 5:56:02 PM grizzled.slf4j.Logger error
SEVERE: JSONSource: socketReader hit exception
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:208)
    at com.walmartlabs.mupd8.JSONSource.socketReader(SourceReader.scala:75)
    at com.walmartlabs.mupd8.JSONSource.constructReader(SourceReader.scala:59)
    at com.walmartlabs.mupd8.JSONSource.com$walmartlabs$mupd8$JSONSource$$readLine(SourceReader.scala:119)
    at com.walmartlabs.mupd8.JSONSource$$anonfun$hasNext$1.apply(SourceReader.scala:95)
    at com.walmartlabs.mupd8.JSONSource$$anonfun$hasNext$1.apply(SourceReader.scala:94)
    at scala.Option.orElse(Option.scala:218)
    at com.walmartlabs.mupd8.JSONSource.hasNext(SourceReader.scala:93)
    at com.walmartlabs.mupd8.AppRuntime$SourceThread$1$$anonfun$run$3.apply$mcV$sp(Mupd8Main.scala:1221)
    at scala.util.control.Breaks.breakable(Breaks.scala:39)
    at com.walmartlabs.mupd8.AppRuntime$SourceThread$1.run(Mupd8Main.scala:1218)
    at java.lang.Thread.run(Thread.java:722)

Jun 13, 2013 5:56:02 PM grizzled.slf4j.Logger error
SEVERE: JSONSource: reader readLine failed
java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:274)
    at scala.None$.get(Option.scala:272)
    at com.walmartlabs.mupd8.JSONSource.com$walmartlabs$mupd8$JSONSource$$readLine(SourceReader.scala:108)
    at com.walmartlabs.mupd8.JSONSource$$anonfun$hasNext$1.apply(SourceReader.scala:95)
    at com.walmartlabs.mupd8.JSONSource$$anonfun$hasNext$1.apply(SourceReader.scala:94)
    at scala.Option.orElse(Option.scala:218)
    at com.walmartlabs.mupd8.JSONSource.hasNext(SourceReader.scala:93)
    at com.walmartlabs.mupd8.AppRuntime$SourceThread$1$$anonfun$run$3.apply$mcV$sp(Mupd8Main.scala:1221)
    at scala.util.control.Breaks.breakable(Breaks.scala:39)
    at com.walmartlabs.mupd8.AppRuntime$SourceThread$1.run(Mupd8Main.scala:1218)
    at java.lang.Thread.run(Thread.java:722)
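
The failure mode in the two traces above, a failed first connection followed by a read on a missing reader, can be avoided by treating "no reader yet" as a normal state. The Java sketch below shows one way to do that; it is not the proposed JSONSource fix. `ReconnectingLineSource`, the `connect` factory, and the backoff constants are hypothetical, and the caller is assumed to wait `backoffMillis()` before retrying.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical sketch; not the JSONSource implementation.
final class ReconnectingLineSource {
    private static final long INITIAL_BACKOFF_MILLIS = 100L;   // assumed value
    private static final long MAX_BACKOFF_MILLIS = 10_000L;    // assumed value

    private final Callable<BufferedReader> connect;  // e.g. opens a socket and wraps it
    private BufferedReader reader;                   // null means "not connected"
    private long backoffMillis = 0L;

    ReconnectingLineSource(Callable<BufferedReader> connect) {
        this.connect = connect;
    }

    // Returns the next line, or empty if no reader is available or the read failed.
    Optional<String> readLine() {
        if (reader == null) {
            try {
                reader = connect.call();   // first connection and reconnection share one path
                backoffMillis = 0L;
            } catch (Exception e) {
                // Connection failed: grow the suggested wait, never touch a missing reader.
                backoffMillis = Math.min(MAX_BACKOFF_MILLIS,
                        Math.max(INITIAL_BACKOFF_MILLIS, backoffMillis * 2));
                return Optional.empty();
            }
        }
        try {
            String line = reader.readLine();
            if (line == null) {            // end of stream: drop the reader, reconnect next time
                dropReader();
                return Optional.empty();
            }
            return Optional.of(line);
        } catch (IOException e) {
            dropReader();
            return Optional.empty();
        }
    }

    // Suggested wait before the next readLine() call; 0 when connected.
    long backoffMillis() {
        return backoffMillis;
    }

    private void dropReader() {
        try {
            reader.close();
        } catch (IOException ignored) {
            // Nothing useful to do if closing fails.
        }
        reader = null;
    }
}
```

Because the initial connection goes through the same branch as every later reconnection, a refused first connection leads to a retry rather than to a permanently failing reader.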

External data source

Hello all,

I am trying a mupd8 application on a cluster and it's working fine with the example given.

What I would like to do is have the data source of the application streamed from a different host. The stream of packets contains string-type variables. Could someone provide me with an example? I tried to edit the configuration file to specify a host that is sending a stream of data, but it didn't work.

Thanks

Tutorial example not working

mvn archetype:generate \
  -DinteractiveMode=false \
  -DarchetypeGroupId=org.apache.maven.archetypes \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DgroupId=com.walmartlabs.example \
  -DartifactId=mupd8_demo \
  -Dpackage=com.walmartlabs.example \
  -Dversion=1.0-SNAPSHOT

This step gives the following error:

Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:2.2:generate (default-cli) on project mupd8: Unable to add module to the current project as it is not of packaging type 'pom'
