patrickbkr commented on August 11, 2024

I now consider rakudo/rakudo#5202 ready for review. That PR passes make test and make spectest, and it fixes the original deadlock in rakudo/rakudo#5141. I have turned the deadlocking code into a roast test in Raku/roast#833.

The PR contains one controversial change in behavior:
In patrickbkr/rakudo@5478392 the recursion-breaking logic and Lock::Async::protect-or-queue-on-recursion were originally implemented. The commit message states: "A holder's recursion competes fairly with outside messages, thanks to the queueing being through Lock::Async." This is no longer true, as the queued code objects are now called synchronously instead of being put on the thread pool.

This issue and the PR are now blocked on feedback / approval. @jnthn, I think you have the deepest insight into that area of Raku; thus I'd like to have your approval before going forward with my changes.

from problem-solving.

patrickbkr commented on August 11, 2024

I'd like to make sure the state of this issue is clear.

jnthn has explained that we have to live with occasional reordering. That's known and intentional, and we can't do much about it.

So the only problem left to be solved is the deadlock described in rakudo/rakudo#5141. I explored two approaches trying to solve the deadlock issue:

  1. The remove-the-BlockAddWheneverAwaiter-completely approach. This solves the deadlock, but has the effect that recursing emits during whenever setup do not block anymore but are queued. This causes one spec test failure in a test that turned out to depend on non-deterministic behavior. A positive side-effect of this approach is that locks in general do not cause reordering anymore, only recursion that would otherwise deadlock causes reordering. In my opinion this makes it easier to understand what's going on.
  2. The schedule-continuations-on-the-thread-pool approach. This solves the deadlock as well, but it introduces concurrency. One spec test failure made clear that this concurrency is problematic. I think this renders this approach unsuitable.

So to be able to continue working on this issue I'd like to either have:

  1. Some feedback on whether the remove-the-BlockAddWheneverAwaiter-completely approach is feasible. If it is, I could adapt the failing Supply.zip spec test and we could do a Blin run to see if there is more fallout.
  2. Some input on how else I could approach this problem.

So with the above summary I now ask for feedback. Ping @jnthn, @vrurg, @niner

jnthn commented on August 11, 2024

This is unexpected. The order seems reversed. Why is this so?

A supply block (and thus react) provides concurrency control by enforcing one-message-at-a-time semantics: that is, we must completely process a message (reach the end of the whenever block) before we process the next one. An await inside of a whenever block thus prevents execution of any other whenever block starting until the awaited event completes. (If one wants more concurrency, the answer is to instead use a nested whenever; were await to have different semantics then there'd not be a means to achieve back pressure).
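As a hypothetical illustration of that distinction (do-work is an assumed helper returning a Promise, not anything from the discussion above):

```raku
# Sketch: await vs. nested whenever inside a whenever block.
# do-work is a hypothetical helper that returns a Promise.
sub do-work($n) { start { $n * 2 } }

react {
    whenever Supply.from-list(1 .. 3) -> $n {
        # Blocks this react's message processing until the Promise is
        # kept, which is what gives us back pressure:
        say "serial: ", await do-work($n);

        # Alternatively, a nested whenever would let the next message
        # be processed while the work is still in flight:
        # whenever do-work($n) -> $doubled { say "concurrent: $doubled" }
    }
}
```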

The body of a supply or react block itself is much like an OO constructor: just as we expect the constructor to complete before we have a complete object ready to receive method calls, we expect the setup phase of the supply or react block to run to completion before it processes any other messages.

Thus, the behavior in that example seems as intended.

Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed.

Yes, it's important for correctness that the subscription is established right away.

The currently implemented solution for this deadlock problem in rakudo/rakudo@26a9c31 and rakudo/rakudo@5478392 works as follows. When an await happens in the setup phase, during the processing of a whenever tapping (be it a protect, acquire, lock, or plain await), a continuation starting at the root of the whenever tapping is taken. Only after the supply block itself has finished running is the continuation resumed. So the order in which the code is executed depends on whether there is any locking going on in code tapped by the whenever.

This behavior is difficult for an unsuspecting programmer to follow. Even a programmer who knows about this behavior will potentially have a hard time telling whether there is a lock somewhere in the tapped code that could cause this reordering. This is a case of action at a distance.
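A minimal hypothetical sketch of that action at a distance (the lock use is an assumption buried inside the tapped supply, invisible at the tap site; the exact ordering depends on the Rakudo version):

```raku
# The caller has no way to tell that tapping $source involves a
# Lock::Async.protect, yet exactly that can decide the execution order.
my $source = supply {
    my Lock::Async $lock .= new;
    $lock.protect: { emit "hello" };   # protect awaits internally
};

react {
    whenever $source { say "got $_" }
    say "setup done";   # with the continuation mechanism, this may
                        # run before the whenever body above
}
```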

The core of the issue really is that supply and react are primarily a tool for managing existing concurrency, rather than as a means to introduce concurrency. However, in an attempt to DWIM, we try to permit a supply to immediately emit in its body, rather than just subscribe to other underlying sources and emit based upon events from those. The BlockAddWheneverAwaiter approach, as noted in rakudo/rakudo#5141, seems to have shortcomings when one mixes supply blocks with use of a Supplier, for example.

I picked the continuation-juggling approach to resolve the conflicts over the alternative of scheduling the continuations on the normal thread pool scheduler and letting the standard await mechanism take over, primarily on the basis that if supply blocks really were going to have to get into the business of introducing concurrency, then at least they should avoid introducing threading too. Perhaps, however, the latter would have been the lesser evil; I'm curious whether that would solve the problem at hand here and be more robust generally.

patrickbkr commented on August 11, 2024

In rakudo/rakudo#5158 I tried replacing the continuation-based solution strategy with a much simpler approach: always executing whenever tapping after the supply block itself has run. That behavior is a lot easier to understand.

It does contradict the syntax though. Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed. Only executing them after having processed the supply block is thus highly counterintuitive.

Additionally it makes code like the following impossible to write:

# Simplified from t/spec/S17-promise/nonblocking-await.t
my $started = Promise.new;
my $ok = start react {
    whenever IO::Socket::Async.listen($s-address, $port) -> $conn {
        whenever $conn.Supply(:bin) -> $buf {
            await $conn.write: $buf;
            $conn.close;
        }
    }
    $started.keep;
}

In that piece of code a Promise is used to track the setup of the react block. When whenever tapping is delayed until after the react block has been processed, that $started.keep happens too early.

I thus consider rakudo/rakudo#5158 a non-solution. At the moment I have run out of ideas of how to solve this.

patrickbkr commented on August 11, 2024

Thanks for your input!

This is unexpected. The order seems reversed. Why is this so?

A supply block (and thus react) provides concurrency control by enforcing one-message-at-a-time semantics: that is, we must completely process a message (reach the end of the whenever block) before we process the next one. An await inside of a whenever block thus prevents execution of any other whenever block starting until the awaited event completes. (If one wants more concurrency, the answer is to instead use a nested whenever; were await to have different semantics then there'd not be a means to achieve back pressure).

That question was meant as the heading for the following paragraph. But thanks for explaining anyways! ;-)

Thus, the behavior in that example seems as intended.

Which example are you referring to? The one in the first post does trigger the await magic and manages to change the order in which the supply block and tapping are executed.

Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed.

Yes, it's important for correctness that the subscription is established right away.

I agree.

The core of the issue really is that supply and react are primarily a tool for managing existing concurrency, rather than as a means to introduce concurrency. However, in an attempt to DWIM, we try to permit a supply to immediately emit in its body, rather than just subscribe to other underlying sources and emit based upon events from those. The BlockAddWheneverAwaiter approach, as noted in rakudo/rakudo#5141, seems to have shortcomings when one mixes supply blocks with use of a Supplier, for example.

I picked the continuation-juggling approach to resolve the conflicts over the alternative of scheduling the continuations on the normal thread pool scheduler and letting the standard await mechanism take over, primarily on the basis that if supply blocks really were going to have to get into the business of introducing concurrency, then at least they should avoid introducing threading too. Perhaps, however, the latter would have been the lesser evil

I can try implementing that. So that I get it right: I keep the BAWA (BlockAddWheneverAwaiter), but instead of putting the continuations in a list for later processing, I'd just instantly schedule the continuations on the ThreadPoolScheduler (not on the outer $*AWAITER, which would be a BlockingAwaiter and thus block the BAWA itself).

I'm curious if that would solve the problem at hand here and be more robust generally.

I guess that depends on whether we want to be pedantic. If the aspiration is to never have surprising order reversals, then queuing things on the thread pool isn't a step forward. I am confident though that putting things on the thread pool will solve rakudo/rakudo#5141 (the issue that made me explore the supply / whenever implementation in the first place). In that deadlock situation a continuation earlier in the list blocks on a lock that a later continuation holds. But there is no "A waits on B, B waits on A" situation. So if the other continuations were allowed to run, things would work out.

It still feels wrong to give locks that are not involved in the supply setup the same treatment. I think I'll start another experiment: I'll remove the entirety of the BlockAddWheneverAwaiter and see if the queue-on-recursion logic implemented in rakudo/rakudo@5478392 suffices to prevent deadlocks during construction. I think I have not yet seen a situation where a lock we need to resolve during construction has not been a recursion lock.

patrickbkr commented on August 11, 2024

Just removing the BlockAddWheneverAwaiter logic basically means having recursing emits during supply construction return instantly, thus killing back pressure. All tests and spectests pass except for a single test:

tap-ok Supply.zip(
    Supply.from-list("a".."e"),
    Supply.from-list("f".."k"),
    Supply.from-list("l".."p")
  ),
  [<a f l>,<b g m>,<c h n>,<d i o>,<e j p>],
  "zipping with 3 supplies works";

The supply is done before a single value is emitted. That happens because the first supply gets to emit all its values and its done (each being put into the protect-or-queue-on-recursion queue) before the others. That queue is then processed in order, causing the zip to be done as soon as all the messages of the first supply have been processed.
Technically it still works as documented (the docs state: "The resulting supply is done as soon as any of the given supplies are done."), but I'm unsure if that behavior is acceptable.

patrickbkr commented on August 11, 2024

I have now also tried making the BlockAddWheneverAwaiter queue its continuations on the thread pool. With that, make test passes, while make spectest hangs in S17-supply/[flat|batch|unique|merge|zip].t but succeeds otherwise.

I have now pushed two branches to GitHub:

patrickbkr commented on August 11, 2024

Also to note: both branches solve the deadlock problem in rakudo/rakudo#5141.

patrickbkr commented on August 11, 2024

I'll see if I can get the put-continuations-on-thread-pool solution working (i.e. get the hanging tests to pass).

In general I like the solution removing the BlockAddWheneverAwaiter better. I believe it's easier to understand what's going on, as only locks of the supplies themselves are affected. Also, there are no continuations involved.

@jnthn What do you think?

patrickbkr commented on August 11, 2024

I found the reason for the hangs with the schedule-continuations-on-the-thread-pool approach. The hanging tests replace $*SCHEDULER with CurrentThreadScheduler, which purposely blocks on calls to cue. Since we rely on the scheduler to have our continuations executed concurrently to prevent deadlocks, the code can't resolve locks with a CurrentThreadScheduler.

The docs say the CurrentThreadScheduler is useful mostly for testing. So what's the point of having code run with the CurrentThreadScheduler? Is it forbidden to rely on the scheduler to resolve locks?
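For reference, this is roughly the pattern those tests use (a minimal sketch; the point is only that cued code runs synchronously on the calling thread):

```raku
# With CurrentThreadScheduler, everything cued - including the code
# behind start - runs synchronously on the calling thread. Nothing can
# run concurrently to release a lock another continuation waits on.
{
    my $*SCHEDULER = CurrentThreadScheduler;
    my $p = start { say "runs on the calling thread" };
    await $p;   # the Promise is already kept when start returns
}
```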

patrickbkr commented on August 11, 2024

With the schedule-continuations-on-the-thread-pool approach there is also a test failure in t/spec/S17-supply/flat.t

my $s = Supplier.new;
my $f = $s.Supply.flat;
my $seen1 = [];
my $t1 = $f.tap( { $seen1.push: $_ } );
$s.emit([1,2]);
is $seen1, [1,2], 'did we get the first emit (1)';

Cause: Emitting on a supply created via .flat involves setting up a whenever and processing values in src/core.c/Supply-factories.pm6:460. That in turn activates the BlockAddWheneverAwaiter, which causes the value to be emitted asynchronously, making the $s.emit and the is in the test racy.

Perhaps, however, the latter would have been the lesser evil; I'm curious if that would solve the problem at hand here and be more robust generally.

My conclusion from the above analysis is that the schedule-continuations-on-the-thread-pool approach as I implemented it brings its own set of problems. I don't think these issues can be alleviated.

So the only approach I still have hope in is the remove-the-BlockAddWheneverAwaiter-completely approach I also explored. See this comment for the issue surrounding that approach.

Unless other ideas of approaching this come up, I think the solution space has been exhaustively explored. I'm now dependent on feedback whether we should go forward with one of the two explored approaches.

vrurg commented on August 11, 2024

I'm not much into supplies from their implementation side, so I'm not much help on this. But removing a non-obvious deadlock cause does sound good to me. What's not good is that the spectest mentioned can barely be changed without breaking backward compatibility. Thus I wonder: if option 1 were eventually accepted, could it be a 6.e change? In that case the spectest would still define 6.c/d behavior and we'd have it different for 6.e.

Though it may also depend on the degree of change required for the spec to pass.

patrickbkr commented on August 11, 2024

What's not good is that the spec test mentioned can barely be changed without breaking backward compatibility.

I only partly agree.

tap-ok Supply.zip(
    Supply.from-list("a".."e"),
    Supply.from-list("f".."k"),
    Supply.from-list("l".."p")
  ),
  [<a f l>,<b g m>,<c h n>,<d i o>,<e j p>],
  "zipping with 3 supplies works";

The test strongly relies on the supplies being processed round robin, where each supply only gets to emit a single value each round. Supplies never gave any guarantee along those lines. That this works out in the current implementation is down to a delicate implementation detail.

The test looks as if zip is meant to process incoming values as long as at least one supply is still active. But that's different from the current implementation and the documentation.

So I'm confident to say that there is a mismatch between test and implementation. One of the two must be wrong.

I do agree though that the behavior of my proposed changes is suboptimal in this case (processing all values and the done from the first supply before working on the second).

niner commented on August 11, 2024

jnthn commented on August 11, 2024
  1. The remove-the-BlockAddWheneverAwaiter-completely approach. This solves the deadlock, but has the effect that recursing emits during whenever setup do not block anymore but are queued. This causes one spec test failure in a test that turned out to depend on non-deterministic behavior (#364 (comment)). A positive side-effect of this approach is that locks in general do not cause reordering anymore; only recursion that would otherwise deadlock causes reordering. In my opinion this makes it easier to understand what's going on.

  2. The schedule-continuations-on-the-thread-pool approach. This solves the deadlock as well, but it introduces concurrency. A spec test failure (#364 (comment)) made clear that this concurrency is problematic. I think this renders this approach unsuitable.

If I understand option 1 correctly, this is simply saying that it is left to Lock::Async to do the queueing? In that case I think it's effectively achieving what 2 would, but by far more elegant and well-exercised means (e.g. we don't have to build anything new).

So long as we process one message at a time in a given supply or react instance (where the running of the supply or react body is considered a message too), then it should be fine.

An alternative reading is that the spec really does require round-robin processing by Supply.zip, and the current implementation of this method is just very fragile and relies on current implementation details of other parts of Supply. So as part of the fix, method zip could be adjusted to deal with the new situation.

That's a reasonable reading in my opinion. I did a draft PR rakudo/rakudo#5200 that explores an alternative way to implement Supply.zip that passes all current spectests. It would be interesting to try it in combination with the deadlock fix; I think it should help.

patrickbkr commented on August 11, 2024

If I understand option 1 correctly, this is simply saying that it is left to Lock::Async to do the queueing?

Yes, that's exactly what happens.

So long as we process one message at a time in a given supply or react instance (where the running of the supply or react body is considered a message too), then it should be fine.

From all I know that invariant is not broken. Each supply block is still guarded by a lock and no code path enters that supply block without requesting that lock.

I did a draft PR rakudo/rakudo#5200 that explores an alternative way to implement Supply.zip that passes all current spectests. It would be interesting to try it in combination with the deadlock fix; I think it should help.

I have requested a change on that draft PR. With that change make test and make spectest both succeeded. I have now created a PR with those changes in rakudo/rakudo#5202.

patrickbkr commented on August 11, 2024

I've merged the proposed fix and the respective test in rakudo/rakudo#5202 and Raku/roast#833.
We now have a full month to find any regressions.

patrickbkr commented on August 11, 2024

Now that the changes have been merged, what do we do with this issue? Do I need to condense all the comments into some prose and do a problem-solving PR?
