Comments (7)

DarinJ commented on June 2, 2024

I hit the same thing. Removing the synchronized (this.scheduler) block in MesosTracker.idleCheck removed the deadlock. That block calls methods on the scheduler that are themselves synchronized on this (this being the MesosScheduler here), which is ripe for deadlock. @tarnfeld what are you trying to guard here? It looks like you're worried something could be added to the tracker between the idleCounter >= idleCheckMax check and the scheduler.killTracker call (maybe in assignTasks). I'm going to look into a better way to achieve this.
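
To make the lock-ordering problem concrete, here is a minimal sketch, not the project's code. It assumes two `ReentrantLock`s as stand-ins for the JobTracker and MesosScheduler monitors, so that the demo can time out with `tryLock` instead of hanging the way real `synchronized` blocks would. The class name `LockOrderDemo` and its methods are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: the two locks stand in for the JobTracker and
// MesosScheduler monitors seen in the thread dump.
final class LockOrderDemo {

    // Returns how many of the two threads managed to hold both locks at once.
    static int run(boolean consistentOrder) {
        ReentrantLock jobTracker = new ReentrantLock();
        ReentrantLock scheduler = new ReentrantLock();
        AtomicInteger gotBoth = new AtomicInteger();
        // In the inverted run the latches force the worst-case interleaving:
        // both threads hold their first lock before either tries the second,
        // and neither releases until both attempts are over.
        CountDownLatch holdingFirst = new CountDownLatch(consistentOrder ? 0 : 2);
        CountDownLatch finishedTry = new CountDownLatch(consistentOrder ? 0 : 2);

        // Like the heartbeat path: JobTracker first, then scheduler.
        Thread heartbeat = new Thread(() ->
            lockBoth(jobTracker, scheduler, holdingFirst, finishedTry, gotBoth));
        // Like the idle check: scheduler first, then JobTracker (unless fixed).
        Thread idleCheck = new Thread(() -> {
            if (consistentOrder) {
                lockBoth(jobTracker, scheduler, holdingFirst, finishedTry, gotBoth);
            } else {
                lockBoth(scheduler, jobTracker, holdingFirst, finishedTry, gotBoth);
            }
        });

        heartbeat.start();
        idleCheck.start();
        try {
            heartbeat.join();
            idleCheck.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gotBoth.get();
    }

    private static void lockBoth(ReentrantLock first, ReentrantLock second,
                                 CountDownLatch holdingFirst, CountDownLatch finishedTry,
                                 AtomicInteger gotBoth) {
        first.lock();
        try {
            holdingFirst.countDown();
            holdingFirst.await();
            // With synchronized this would block forever; tryLock gives up instead.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                gotBoth.incrementAndGet();
                second.unlock();
            }
            finishedTry.countDown();
            finishedTry.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

With `run(false)` the two threads take the locks in opposite orders and neither gets both; with `run(true)` they agree on one order and both succeed, one after the other.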

Also noticed a lot of synchronized methods that just do logging, is this necessary? Seems like a lot of unnecessary blocking.

@windancer055 I'd suggest trying one of the releases, say 0.0.9 or 0.1.0. I've had good luck with them, though they don't have framework auth; it's easy to backport that.

from hadoop.

RecursionTaoist commented on June 2, 2024

hadoop-mapreduce1-project (cdh5.3.2) & mesos-0.24.0

Output from jstack:
...

Found one Java-level deadlock:

"830282351@qtp-1012114812-14":
waiting to lock monitor 0x00002ad34c0294f8 (object 0x00000000fd74bcd8, a org.apache.hadoop.mapred.JobTracker),
which is held by "IPC Server handler 3 on 7676"
"IPC Server handler 3 on 7676":
waiting to lock monitor 0x00002ad350bedec8 (object 0x00000000fd854530, a org.apache.hadoop.mapred.MesosScheduler),
which is held by "pool-1-thread-1"
"pool-1-thread-1":
waiting to lock monitor 0x00002ad34c0294f8 (object 0x00000000fd74bcd8, a org.apache.hadoop.mapred.JobTracker),
which is held by "IPC Server handler 3 on 7676"

Java stack information for the threads listed above:

"830282351@qtp-1012114812-14":
at org.apache.hadoop.mapred.JobTracker.getMapTaskReports(JobTracker.java:3939)
- waiting to lock <0x00000000fd74bcd8> (a org.apache.hadoop.mapred.JobTracker)
at org.apache.hadoop.mapred.TaskGraphServlet.doGet(TaskGraphServlet.java:73)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1122)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
"IPC Server handler 3 on 7676":
at org.apache.hadoop.mapred.MesosScheduler.assignTasks(MesosScheduler.java:264)
- waiting to lock <0x00000000fd854530> (a org.apache.hadoop.mapred.MesosScheduler)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2969)
- locked <0x00000000fd74bcd8> (a org.apache.hadoop.mapred.JobTracker)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:483)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
"pool-1-thread-1":
at org.apache.hadoop.mapred.JobTracker.taskTrackers(JobTracker.java:2595)
- waiting to lock <0x00000000fd74bcd8> (a org.apache.hadoop.mapred.JobTracker)
at org.apache.hadoop.mapred.MesosTracker$3.run(MesosTracker.java:148)
- locked <0x00000000fd854530> (a org.apache.hadoop.mapred.MesosScheduler)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Found 1 deadlock.
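
The same kind of monitor cycle that jstack reports can also be checked for from inside the JVM. The sketch below uses the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call; the wrapper class `DeadlockProbe` and its report format are hypothetical and not part of Hadoop or the Mesos framework.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists threads stuck in an object-monitor deadlock.
final class DeadlockProbe {

    // Returns one line per deadlocked thread, or an empty list if there are none.
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null when no threads are deadlocked on object monitors.
        long[] ids = threads.findMonitorDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            report.add("\"" + info.getThreadName() + "\" waiting for "
                + info.getLockName() + " held by \"" + info.getLockOwnerName() + "\"");
        }
        return report;
    }
}
```

Calling `DeadlockProbe.describeDeadlocks()` from a monitoring thread and logging a non-empty result would surface a cycle like the one above without needing a manual thread dump.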

tarnfeld commented on June 2, 2024

@tarnfeld what are you trying to guard here? It looks like you're worried something could be added to the tracker between the idleCounter >= idleCheckMax check and the scheduler.killTracker call (maybe in assignTasks). I'm going to look into a better way to achieve this.

@DarinJ Hey! Thanks for taking a look into this. Yeah, the code and synchronized calls could definitely do with a tidy up. There's a bunch of old logging methods there too, as you rightly mentioned, that we can probably remove entirely.

Happy to help work on a patch, do you have something started already?

DarinJ commented on June 2, 2024

@tarnfeld I removed the synchronized (this.scheduler) block in MesosTracker.idleCheck and there's no more deadlock. I didn't spend a lot of time testing this and I may have introduced a race condition. I still had the bug I mentioned in #65, and my attempt to fix that ended up killing idle reducers that were waiting for the shuffle phase.

I've got some ideas on how to fix, and I can write them down for you. But I'm trying to spend more of my time working on Myriad these days.
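
On the race-condition concern: one general way to close a check-then-act window without holding the scheduler's monitor is to make the idle check and the kill a single atomic state transition. The sketch below is only an illustration under that assumption. `TrackerSlot` and its states are hypothetical names, not code from this project, and it does not address the shuffle-phase behaviour from #65.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical per-tracker state holder: assignment and kill race on one
// atomic field, so exactly one of them can win from the IDLE state.
final class TrackerSlot {
    enum State { IDLE, BUSY, KILLED }

    private final AtomicReference<State> state = new AtomicReference<>(State.IDLE);

    // Called on the task-assignment path; fails if the tracker was already killed or is busy.
    boolean tryAssign() {
        return state.compareAndSet(State.IDLE, State.BUSY);
    }

    // Called by the idle check; fails if a task was assigned in the meantime.
    boolean tryKillIfIdle() {
        return state.compareAndSet(State.IDLE, State.KILLED);
    }

    // Called when the running task completes; a killed tracker stays killed.
    void taskFinished() {
        state.compareAndSet(State.BUSY, State.IDLE);
    }

    State current() {
        return state.get();
    }
}
```

The idle check would only go on to kill the tracker when `tryKillIfIdle()` returns true, and the assignment path would skip any tracker for which `tryAssign()` returns false.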

tarnfeld commented on June 2, 2024

Sure, if you could share your thoughts, even just here in a comment, that'd be great.

hermansc commented on June 2, 2024

I stumbled upon this issue too. Did a kill -3 to expose the information about the deadlock. Removed the lines which @DarinJ talked about and it's now at least working.

Attached the diff file.
fix_deadlock_issue_66.txt

tarnfeld commented on June 2, 2024

I have a feeling that this issue may now be resolved on master, could you report back @hermansc? I think the commit that introduced this was removed.
