Comments (7)
I hit the same thing. Removing the synchronized (this.scheduler) block in MesosTracker.idleCheck removed the deadlock. Inside that block it calls methods on scheduler that are themselves synchronized on this (this being the MesosScheduler here), which is ripe for deadlock. @tarnfeld what are you trying to guard here? It looks like you're worried something could be added to the tracker between the idleCounter >= idleCheckMax check and the scheduler.killTracker call (maybe in assignTasks). I'm going to look into a better way to achieve this.
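One generic way to keep a check-then-kill guard without nesting the two monitors is to read whatever is needed from the JobTracker before taking the scheduler lock, and then hold the scheduler lock only for the counter check. The Java sketch below is a minimal illustration under assumptions: IdleCheckSketch, trackerIsIdle, onTaskAssigned and the killed flag are hypothetical names, not the actual hadoop-mesos code, and only idleCounter and idleCheckMax echo the fields mentioned above.

```java
import java.util.function.Supplier;

// Hypothetical sketch; names are illustrative and not taken from hadoop-mesos.
class IdleCheckSketch {
    private final Object schedulerLock = new Object();   // stands in for the MesosScheduler monitor
    private final Supplier<Boolean> trackerIsIdle;       // stands in for a call into JobTracker
    private final int idleCheckMax;
    private int idleCounter = 0;
    private boolean killed = false;

    IdleCheckSketch(int idleCheckMax, Supplier<Boolean> trackerIsIdle) {
        this.idleCheckMax = idleCheckMax;
        this.trackerIsIdle = trackerIsIdle;
    }

    // One round of the periodic idle check.
    void idleCheck() {
        // Step 1: query the JobTracker side while holding no scheduler lock.
        boolean idle = trackerIsIdle.get();
        // Step 2: short critical section that calls nothing else that locks.
        synchronized (schedulerLock) {
            idleCounter = idle ? idleCounter + 1 : 0;
            if (idleCounter >= idleCheckMax) {
                killed = true; // the real code would call scheduler.killTracker here
            }
        }
    }

    // Assumed hook: the task-assignment path resets the counter under the same lock.
    void onTaskAssigned() {
        synchronized (schedulerLock) {
            idleCounter = 0;
        }
    }

    boolean isKilled() {
        synchronized (schedulerLock) {
            return killed;
        }
    }
}
```

The idle snapshot from step 1 can be stale by the time step 2 runs, so this narrows the race window rather than closing it. The assignment path would still need to check the killed state before handing work to a tracker.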
Also noticed a lot of synchronized methods that just do logging. Is this necessary? It seems like a lot of unnecessary blocking.
@windancer055 I'd suggest trying one of the releases, say 0.0.9 or 0.1.0. I've had good luck with them. They don't have framework auth, but it's easy to backport that.
from hadoop.
hadoop-mapreduce1-project (cdh5.3.2) & mesos-0.24.0
Output from jstack:
...
Found one Java-level deadlock:
"830282351@qtp-1012114812-14":
waiting to lock monitor 0x00002ad34c0294f8 (object 0x00000000fd74bcd8, a org.apache.hadoop.mapred.JobTracker),
which is held by "IPC Server handler 3 on 7676"
"IPC Server handler 3 on 7676":
waiting to lock monitor 0x00002ad350bedec8 (object 0x00000000fd854530, a org.apache.hadoop.mapred.MesosScheduler),
which is held by "pool-1-thread-1"
"pool-1-thread-1":
waiting to lock monitor 0x00002ad34c0294f8 (object 0x00000000fd74bcd8, a org.apache.hadoop.mapred.JobTracker),
which is held by "IPC Server handler 3 on 7676"
Java stack information for the threads listed above:
"830282351@qtp-1012114812-14":
at org.apache.hadoop.mapred.JobTracker.getMapTaskReports(JobTracker.java:3939)
- waiting to lock <0x00000000fd74bcd8> (a org.apache.hadoop.mapred.JobTracker)
at org.apache.hadoop.mapred.TaskGraphServlet.doGet(TaskGraphServlet.java:73)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1122)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
"IPC Server handler 3 on 7676":
at org.apache.hadoop.mapred.MesosScheduler.assignTasks(MesosScheduler.java:264)
- waiting to lock <0x00000000fd854530> (a org.apache.hadoop.mapred.MesosScheduler)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2969)
- locked <0x00000000fd74bcd8> (a org.apache.hadoop.mapred.JobTracker)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:483)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
"pool-1-thread-1":
at org.apache.hadoop.mapred.JobTracker.taskTrackers(JobTracker.java:2595)
- waiting to lock <0x00000000fd74bcd8> (a org.apache.hadoop.mapred.JobTracker)
at org.apache.hadoop.mapred.MesosTracker$3.run(MesosTracker.java:148)
- locked <0x00000000fd854530> (a org.apache.hadoop.mapred.MesosScheduler)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Found 1 deadlock.
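Reading the trace, the cycle is between two threads that take the same two monitors in opposite order: "IPC Server handler 3" holds the JobTracker monitor and wants MesosScheduler, while "pool-1-thread-1" holds MesosScheduler and wants JobTracker. The Jetty thread is only queued behind the JobTracker monitor. A minimal Java sketch of the usual remedy, one fixed acquisition order for both paths, is below. LockOrderSketch, heartbeatPath, idleCheckPath and the two plain lock objects are hypothetical stand-ins, not the real hadoop-mesos classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the two monitors in the trace.
class LockOrderSketch {
    private final Object jobTrackerLock = new Object();  // plays the JobTracker monitor
    private final Object schedulerLock = new Object();   // plays the MesosScheduler monitor
    final AtomicInteger completed = new AtomicInteger();

    // Same order as JobTracker.heartbeat -> MesosScheduler.assignTasks in the trace.
    void heartbeatPath() {
        synchronized (jobTrackerLock) {
            synchronized (schedulerLock) {
                completed.incrementAndGet();
            }
        }
    }

    // The timer task in the trace took the scheduler first and JobTracker second.
    // Using the same order as heartbeatPath means no cycle can form.
    void idleCheckPath() {
        synchronized (jobTrackerLock) {
            synchronized (schedulerLock) {
                completed.incrementAndGet();
            }
        }
    }

    // Runs both paths concurrently and returns how many iterations finished.
    static int runBoth(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        Thread a = new Thread(() -> { for (int i = 0; i < iterations; i++) s.heartbeatPath(); });
        Thread b = new Thread(() -> { for (int i = 0; i < iterations; i++) s.idleCheckPath(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return s.completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runBoth(10_000));
    }
}
```

Swapping the two synchronized blocks in idleCheckPath would reproduce the inversion from the trace, and the run could then hang.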
from hadoop.
@tarnfeld what are you trying to guard here? It looks like you're worried something could be added to the tracker between the idleCounter >= idleCheckMax check and the scheduler.killTracker call (maybe in assignTasks). I'm going to look into a better way to achieve this.
@DarinJ Hey! Thanks for taking a look into this. Yeah, the code and its synchronized calls could definitely do with a tidy up. There's a bunch of old logging methods there too, as you rightly mentioned, that we can probably remove entirely.
Happy to help work on a patch, do you have something started already?
from hadoop.
@tarnfeld I removed the synchronized (this.scheduler) block in MesosTracker.idleCheck, and there's no more deadlock. I didn't spend a lot of time testing this, and I may have introduced a race condition. I still had the bug I mentioned in #65, and my attempt to fix that ended up killing idle reducers that were waiting for the shuffle phase.
I've got some ideas on how to fix, and I can write them down for you. But I'm trying to spend more of my time working on Myriad these days.
from hadoop.
Sure, if you could share your thoughts, even just here in a comment, that'd be great.
from hadoop.
I stumbled upon this issue too. Did a kill -3 to expose the information about the deadlock. Removed the lines which @DarinJ talked about and it's now at least working.
Attached the diff file.
fix_deadlock_issue_66.txt
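As a complement to the kill -3 thread dump, the JVM can report the same kind of cycle programmatically through the standard java.lang.management API. The sketch below is only an illustration: DeadlockProbe and findDeadlocks are hypothetical names, and it inspects the JVM it runs in, so it would have to be loaded into the JobTracker process to be useful there.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; reports deadlocked threads in the current JVM.
class DeadlockProbe {
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no thread is deadlocked
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            report.add(info.getThreadName() + " waiting on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = findDeadlocks();
        if (report.isEmpty()) {
            System.out.println("no deadlock detected");
        } else {
            report.forEach(System.out::println);
        }
    }
}
```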
from hadoop.
I have a feeling that this issue may now be resolved on master, could you report back @hermansc? I think the commit that introduced this was removed.
from hadoop.