Comments (5)
See: https://www.usenix.org/legacy/publications/compsystems/1990/sum_ruane.pdf
Any operating system kernel has some form of process synchronization, allowing a process to wait for a particular condition. The traditional choice for UNIX systems, the event-wait mechanism, leads to race conditions on multiprocessors.
This problem was initially solved in Amdahl's UTS multiprocessing kernel by replacing the event-wait mechanism with Dijkstra semaphores. The kernel, however, became noticeably more complicated and less reliable when based on semaphores.
This has led us to develop a race-free multiprocessor event-wait mechanism with some novel properties. A few common synchronization techniques have emerged, the most complex of which was verified correct with the supertrace protocol validation system spin. A new scheduling approach with per-CPU run queues reduces the number of unnecessary context switches due to awakening all waiting processes. The overall approach is claimed to be simple, efficient, and reliable.
from weave.
After modeling the Backoff mechanism in a formal language and verifying it at runtime by model checking, I believe I've narrowed down the deadlock.
Here,
weave/weave/channels/event_notifiers.nim
Lines 113 to 120 in b4c5a37
the parent worker loads its child worker's state into a register and exits if it is Busy or ShouldWakeup.
But while this happens, the child can send its intent to sleep
weave/weave/channels/event_notifiers.nim
Lines 88 to 95 in b4c5a37
And then follow up by actually sleeping
weave/weave/channels/event_notifiers.nim
Lines 97 to 106 in b4c5a37
The parent thread then sends its shutdown signal, but it is never received.
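The lost-wakeup interleaving described above can be replayed step by step. Below is a minimal sequential sketch in Python; the names (consumer_state, signaled, the parent/child steps) mirror the TLA+ spec further down, not the actual Nim code in event_notifiers.nim.

```python
# Sequential replay of the racy interleaving described above.
# Hypothetical naming, modeled on the TLA+ spec, not weave's Nim code.

consumer_state = "Busy"   # child worker's state, shared with the parent
signaled = False          # stands in for the condition variable

def parent_notify_racy():
    """Parent checks the child's state and exits early if it looks awake."""
    local = consumer_state              # load child state "into a register"
    if local in ("Busy", "ShouldWakeup"):
        return                          # returns without signaling: the race window

def child_intend_to_sleep():
    global consumer_state
    consumer_state = "IntendToSleep"    # prepare phase

def child_park():
    global consumer_state
    consumer_state = "Parked"           # commit phase: would now block on `signaled`

# The bad interleaving: the parent reads "Busy" *before* the child
# announces its intent to sleep, so the shutdown signal is never sent.
parent_notify_racy()      # sees "Busy", returns early
child_intend_to_sleep()
child_park()

# The child is parked and nothing will ever set `signaled`: deadlock.
assert consumer_state == "Parked" and not signaled
```

The fix direction is visible in the model: the parent's early-exit check and the child's prepare/commit phases must be ordered so that a notify landing between "IntendToSleep" and "Parked" aborts the sleep.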
Formal specification in TLA+
------------------- MODULE event_notifiers -----------------------
(*
  Formal specification of the event_notifiers data structure.

  It allows a single consumer to be put to sleep and woken up
  by multiple producers, so that the consumer is able to consume
  the incoming messages (steal requests).

  It combines a prepare phase "intendToSleep" with a commit phase "wait".
  In between, the consumer sends a message to its parent that it is
  going to sleep. The commit phase is aborted when any producer signals
  an incoming message.

  There should be no deadlock, i.e. no incoming message being signaled
  while the consumer stays sleeping, as in the runtime the main thread
  may be the only one awake and wouldn't be able to awaken its children
  in that case.
*)
EXTENDS Integers, TLC, Sequences, FiniteSets

CONSTANTS NumThreads, ConsumerTID

ASSUME NumThreads > 1
ASSUME ConsumerTID > 0
ASSUME ConsumerTID < NumThreads

MaxID == NumThreads-1
ParentTID == ConsumerTID \div 2
producers == (0..MaxID) \ {ConsumerTID, ParentTID}

(* PlusCal options (-termination) *)
(* --algorithm event_notifier

variables
  consumerState = "Busy";
  signaled = FALSE;      \* Simulate a condition variable
  msgToParent = "None";  \* Simulate a message to the parent, i.e. an opportunity for thread interleaving

macro intendToSleep() begin
  consumerState := "IntendToSleep"
end macro

\* Everything in a macro happens atomically
macro atomicCompareExchange(result, current, expected, newVal) begin
  if current = expected then
    current := newVal;
    result := TRUE;
  else
    result := FALSE;
  end if;
end macro;

\* The consumer waits until it is signaled to wake up
procedure wait()
  variables casSuccess = FALSE;
begin
  WCAS1: atomicCompareExchange(casSuccess, consumerState, "IntendToSleep", "Parked");
         if casSuccess then
  W2:      while consumerState = "Parked" do
             \* The while loop protects against spurious wakeups
  WCV3:      await signaled;
             signaled := FALSE;
           end while;
         end if;
  W4:    assert consumerState \in {"Parked", "ShouldWakeup"};
  W5:    consumerState := "Busy";
  W8:    return;
end procedure;

\* Notify the potentially waiting consumer that it should wake up
procedure notify()
  variables localConsumerState = "N/A"; \* local view of the consumer state
begin
  N1: localConsumerState := consumerState;
  N2: if localConsumerState \in {"Busy", "ShouldWakeup"} then
  N3:   return;
      end if;
  N4: consumerState := "ShouldWakeup";
  N5: while TRUE do
  NSIG6: signaled := TRUE;
  N7:    if consumerState /= "Busy" then
  N8:      skip;
         else
  N9:      return;
         end if;
       end while;
end procedure;

procedure mayRequestWork()
begin MaySteal:
  either
    \* Sometimes you have enough work
    NoSteal: skip;
  or
    \* Sometimes you don't and you steal
    Steal: call notify();
  end either;
  ReqRET: return;
end procedure;

procedure mayShareWork()
  \* A parent can also share work with
  \* a child that sent it "Waiting"
begin MayShare:
  either
    \* Sometimes the parent doesn't have work
    NoWork: skip;
  or
    \* Sometimes it has some
    Share0: if msgToParent = "Waiting" then
    Share1:   call notify();         \* wake up the child
    Share2:   msgToParent := "None"; \* dequeue the child steal request
            end if;
  end either;
  ShareRET: return;
end procedure;

\* Not fair because they might never steal
process producer \in producers
begin Coworkers:
  call mayRequestWork();
end process;

\* A parent will always run at least the termination
fair process parent = ParentTID
  \* The order of work sharing and work stealing is arbitrary
begin ParentWork:
  either
    PMayRW0: call mayRequestWork();
    PMaySW0: call mayShareWork();
  or
    PMaySW1: call mayShareWork();
    PMayRW1: call mayRequestWork();
  end either;
  \* But it will for sure tell the consumer to terminate at one point
  Terminate: call notify();
end process;

process consumer = ConsumerTID
begin ConsumerWork:
  either
    \* If we have work, we work on it
    FoundWork: skip;
  or
    \* We signal our intent to sleep, tell our parent, and then sleep
    Sleeping0: intendToSleep();
    Sleeping1: msgToParent := "Waiting";
    Sleeping2: call wait();
  end either;
end process;

end algorithm; *)
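For readers less familiar with PlusCal, the two-phase sleep protocol the spec models can be sketched in Python with a lock and a condition variable. This is a simplified sketch that follows the spec's state names; the class and method names are my own, not weave's Nim API, and the condition variable replaces the spec's explicit signaled-flag loop.

```python
import threading

class EventNotifier:
    """Single consumer parks itself; multiple producers may wake it.
    States follow the TLA+ spec: Busy | IntendToSleep | Parked | ShouldWakeup."""

    def __init__(self):
        self._cond = threading.Condition()
        self.state = "Busy"

    def intend_to_sleep(self):
        # Prepare phase: announce the intent before committing to sleep.
        with self._cond:
            self.state = "IntendToSleep"

    def wait(self):
        # Commit phase: park only if no producer intervened since
        # intend_to_sleep() (the CAS in the spec, WCAS1).
        with self._cond:
            if self.state == "IntendToSleep":
                self.state = "Parked"
                while self.state == "Parked":   # guards against spurious wakeups
                    self._cond.wait()
            assert self.state in ("Parked", "ShouldWakeup")
            self.state = "Busy"

    def notify(self):
        # Producer side: abort the consumer's sleep or wake it up.
        with self._cond:
            if self.state in ("Busy", "ShouldWakeup"):
                return
            self.state = "ShouldWakeup"
            self._cond.notify_all()

# A notify landing between the prepare and commit phases aborts the
# sleep instead of being lost:
n = EventNotifier()
n.intend_to_sleep()
n.notify()   # state -> ShouldWakeup
n.wait()     # CAS fails, returns immediately
assert n.state == "Busy"
```

Because both the state transition and the wakeup happen under the same lock, the racy window from the original event-wait mechanism (read state, then act on a stale value) cannot occur in this sketch.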
Error trace (keep an eye on the consumerState and localConsumerState)
from weave.
And that was not it, or it was hiding another one.
Unfortunately the bug disappears when adding full debug output, but when adding just debugTermination and looking into 8 occurrences, the stack trace is always the same:
Worker 1 parks
Worker 0 enters the barrier
Worker 0 receives the message
Deadlock
Note that, rarely, this does not lead to a deadlock. This, combined with the formal verification in #54, suggests that the deadlock is at the system level, probably some ordering issue between sending a steal request and receiving the waiting message.
For example, between trySteal and declineAll (declineAll also processes the children's wait status):
Lines 121 to 129 in a064692
from weave.
Okay, so first of all: it's not a deadlock, it's a livelock.
To begin the investigation, here is the stack trace sampling after a minute of livelock.
Note: these are assumptions to get started with and to test various hypotheses:
- Before the main thread enters the barrier and dispatches tasks, all threads are starved for tasks and so immediately send a "WAITING" request to their parent.
- The main thread enters the barrier and tries to pop tasks.
  Lines 89 to 113 in a064692
- nextTask() is not a simple proc, however; it also contains a while loop.
  Lines 133 to 156 in a064692
  dispatchTask() is called repeatedly because there are no splittable tasks in nqueens, so we can't conclude at this point whether the deque is empty or not.
  Another important point is that shareWork, which normally wakes up and calls the child worker, should be triggered there.
  Lines 333 to 355 in a064692
  We need to know more about that part. Since shareWork doesn't appear at all in the stack traces, either it has been inlined, or the call is cheap because the worksharingRequest queue is empty.
  This is the only place that can wake up the child worker in that loop. However, that still doesn't explain why the main thread doesn't work on tasks or detect termination.
- It tries to dispatchTasks to the thief.
  Lines 224 to 246 in a064692
  We know from the last branch that takeTasks failed to find a task. How come?
- Since it ran out of tasks, dispatchTasks calls decline and declineOwn.
  Lines 166 to 179 in a064692
  Lines 125 to 164 in a064692
  Either the request is not in the Stealing state, or the master thread didn't flag the worker as waiting yet. We are in the first case. This is proven by the call to recvProxy, which receives steal requests on behalf of the idling worker.
- Then we find a victim and relay the steal request. Looking into the underlying targeting, we know that this steal request will always target worker 1, as we take the myID() branch:
  Lines 44 to 69 in a064692
Now we need to know when recvProxy is called within this sync/barrier loop, and also where the main thread has the opportunity to tag runtimeIsQuiescent or change to the Stealing status. And it's in nextTask(), in while recv(req):
Lines 133 to 156 in a064692
In summary, here is the livelock as I understand it:
- Worker 0 sends a steal request with state Working. How? Why? That can only happen in forceFuture:
  Lines 301 to 317 in a064692
- In the meantime, Worker 1 resigned and asked to sleep.
- Worker 0 finishes all tasks and proceeds to the sync(Weave) barrier.
- It tries to pop the first nextTask and will be livelocked happily ever after:
  Lines 89 to 103 in a064692
- In nextTask
  Lines 133 to 156 in a064692
  the first stealRequest received should be the "Waiting" request of the child, which is consistent with the interleaving: Worker 1 sends a WAITING request -> Worker 0 enters the barrier -> Worker 0 receives the WAITING request. The master thread flags its child as waiting.
- The master thread now rechecks steal requests, but this time it can check steal requests in its child's stead, as it knows the child is sleeping, so it retrieves its own request from its child's channel.
- Remember, the request was sent in forceFuture with the state flagged as Working, so in declineOwn it's recirculated in the same state:
  Lines 125 to 164 in a064692
- A job well done, so now let's see if we still have other steal requests to handle, no? Let's look at our child then. Oh, there is one. Oh, our own? Oh, it's still not in the Stealing state; let's recirculate it then, over and over ...
Now the questions:
- Where does a worker change the state of its own requests from Working to Stealing?
  -> In declineAll:
  Lines 158 to 166 in a064692
- When does "runtimeBecomeQuiescent" happen / when does the lead thread send a termination signal?
  -> Only when it receives its own steal request in status Stealing:
  Lines 125 to 145 in a064692
- Are we doomed?
  -> Yes.
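The recirculating loop above can be reduced to a tiny deterministic simulation. This is a hedged sketch with hypothetical names (run_barrier_loop, child_channel), not weave's code: the master keeps dequeuing its own Working-state request from the sleeping child's channel and, like declineOwn, re-enqueues it unchanged, so quiescence is never detected.

```python
from collections import deque

# Minimal model of the livelock. A steal request carries its sender and a
# state; a request that is not in the "Stealing" state gets recirculated.

def run_barrier_loop(initial_state, max_steps=100):
    """Return the number of steps until quiescence is detected,
    or None if we give up (livelock)."""
    # The master's own request, relayed back via the sleeping child.
    child_channel = deque([("master", initial_state)])
    for step in range(1, max_steps + 1):
        sender, state = child_channel.popleft()
        if sender == "master" and state == "Stealing":
            return step   # own request seen in Stealing state: quiescent
        # declineOwn: recirculate the request in the same state
        child_channel.append((sender, state))
    return None

assert run_barrier_loop("Working") is None   # livelock: spins forever
assert run_barrier_loop("Stealing") == 1     # correct state: terminates
```

The simulation makes the fix direction concrete: either the request must not be emitted in the Working state from forceFuture, or the recirculation path must be able to promote it to Stealing (as declineAll does).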
Disclaimer: this analysis is backed by nothing but my VTune stack traces, the code, and the unreliable screenshot from the previous post (unreliable because we can't rely on stdout prints from interleaved thread execution). I may have missed something. In short, I have no logs or timestamped call sequences.
from weave.
Fixed in #55; now there is the glibc stuff from #56 left ...
from weave.