The MPSC channel has a deadlock (weave, CLOSED)

mratsim commented on June 1, 2024
The MPSC channel has a deadlock

from weave.

Comments (5)

mratsim commented on June 1, 2024

Not sure what caused the deadlock, as it's just a streamlined implementation of the legacy channels. Could it have been false sharing?
Anyway, it is unused after #21 and e41c927.

aprell commented on June 1, 2024

Is this deadlock reproducible with the C version?

mratsim commented on June 1, 2024

No, it only happens with my own channel reimplementation.

When I switched to the 1-to-1 reimplementation of your Nim channels (04c355d) it disappeared.

mratsim commented on June 1, 2024

So I managed to trigger it again while changing the implementation of the MPSC queue to yet another lock-free queue (suitable for batching for my memory manager).

The main difference is that it uses compare-and-swap, which is heavier on cache lines and so may behave worse under contention. Steal backoff might help here.
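
To illustrate the backoff idea, here is a minimal sketch of a compare-and-swap retry loop that waits exponentially longer after each failed attempt. This is a generic pattern, not weave's steal backoff: `Backoff`, `initBackoff`, `pause` and `addWithBackoff` are hypothetical names made up for this example, and the empty spin loop stands in for whatever CPU pause hint or yield a real runtime would use.

```nim
import std/atomics

type Backoff = object
  spins: int      # how many iterations the next pause will spin
  maxSpins: int   # cap, so that a wait stays bounded

proc initBackoff(maxSpins = 1024): Backoff =
  Backoff(spins: 1, maxSpins: maxSpins)

proc pause(b: var Backoff) =
  ## Spin for the current budget, then double the budget up to the cap.
  for _ in 0 ..< b.spins:
    discard       # placeholder: a real runtime would issue a pause hint or yield
  b.spins = min(b.spins * 2, b.maxSpins)

proc addWithBackoff(counter: var Atomic[int], delta: int): int =
  ## Adds `delta` with a CAS retry loop.
  ## Returns the number of failed attempts (0 when uncontended).
  var backoff = initBackoff()
  var expected = counter.load(moRelaxed)
  # On failure, compareExchangeWeak refreshes `expected` with the current value.
  while not counter.compareExchangeWeak(expected, expected + delta,
                                        moAcquireRelease, moRelaxed):
    inc result
    backoff.pause()   # back off so contending threads stop hammering the cache line

var counter: Atomic[int]
discard addWithBackoff(counter, 5)
discard addWithBackoff(counter, 2)
echo counter.load()
```

The point of the design is that a thread that just lost a CAS race stays off the contended cache line for a while instead of retrying immediately. Whether that actually helps in this runtime is an open question.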

It's probably a livelock rather than a deadlock. It may be unrelated, but it's very annoying nonetheless.

It happens reliably, even on just a hello world:

weave/weave/async.nim

Lines 138 to 151 in e3a1f29

proc display_int(x: int) =
  stdout.write(x)
  stdout.write(" - SUCCESS\n")

proc main() =
  init(Runtime)

  spawn display_int(123456)
  spawn display_int(654321)

  sync(Runtime)
  exit(Runtime)

# main()

The main thread always gets stuck in this loop in the runtime barrier:

weave/weave/runtime.nim

Lines 124 to 138 in 9f4abc4

# 2. Run out-of-task, become a thief and help other threads
#    to reach the barrier faster
debug: log("Worker %d: globalsync 2 - becoming a thief\n", myID())
trySteal(isOutOfTasks = true)
ascertain: myThefts().outstanding > 0

var task: Task
profile(idle):
  while not recv(task, isOutOfTasks = true):
    ascertain: myWorker().deque.isEmpty()
    ascertain: myThefts().outstanding > 0
    declineAll()
    if localCtx.runtimeIsQuiescent:
      # Goto breaks profiling, but the runtime is still idle
      break EmptyLocalQueue

Adding some printing in the decline path or in the steal request relay (or uncommenting the debugTermination log) unsticks the runtime, at the cost of a 20% perf loss :P.

proc declineAll*() =
  var req: StealRequest

  profile_stop(idle)

  if recv(req):
    if req.thiefID == myID() and req.state == Working:
      req.state = Stealing
    decline(req)

  profile_start(idle)

weave/weave/thieves.nim

Lines 96 to 108 in 9f4abc4

proc findVictimAndRelaySteal*(req: sink StealRequest) =
  # Note:
  #   Nim manual guarantees left-to-right function evaluation.
  #   Hence in the following:
  #     `req.findVictim().relaySteal(req)`
  #   findVictim request update should be done before relaySteal
  #
  #   but C and C++ do not provides that guarantee
  #   and debugging that in a multithreading runtime
  #   would probably be very painful.
  let target = findVictim(req)
  debugTermination: log("Worker %d: relay steal request to %d from %d\n", myID(), target, req.thiefID)
  target.relaySteal(req)

mratsim commented on June 1, 2024

False alarm: it's my queue that has an issue. I can reproduce the bug in an isolated benchmark.

I think I have an ABA problem with the producers' side using exchange and the consumer side using compare-exchange.
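
For readers unfamiliar with the ABA problem, here is a minimal single-threaded sketch that replays one problematic interleaving by hand. It is not the queue from this issue: it uses a toy index-based stack, and every name in it (`A`, `B`, `C`, `nextOf`, `head`, `pack`, `indexOf`, `versionOf`, `taggedHead`) is made up for illustration. The second half shows one common mitigation, a version tag packed next to the index, purely as an assumption about what a fix could look like.

```nim
import std/atomics

const
  A = 0
  B = 1
  C = 2

# --- Plain CAS on a bare index: vulnerable to ABA ---
var nextOf = [B, C, -1]        # stack is A -> B -> C -> nil (-1)
var head: Atomic[int]
head.store(A)

# The consumer starts a pop: it snapshots the head and its successor...
var seenHead = head.load()     # A
let seenNext = nextOf[seenHead] # B

# ...and is preempted. Meanwhile other operations run to completion:
head.store(B)                  # pop A
head.store(C)                  # pop B (B now belongs to someone else)
nextOf[A] = C
head.store(A)                  # push A back: stack is now A -> C

# The consumer resumes. The head is A again, so the CAS succeeds
# and installs the stale successor B, which is no longer in the stack.
let plainSucceeded = head.compareExchange(seenHead, seenNext)
echo "plain CAS succeeded: ", plainSucceeded, ", head = ", head.load()

# --- Version-tagged head: the stale CAS is rejected ---
proc pack(index, version: int): uint64 =
  (uint64(version) shl 32) or uint64(index)

proc indexOf(packed: uint64): int = int(packed and 0xFFFF_FFFF'u64)
proc versionOf(packed: uint64): int = int(packed shr 32)

var taggedHead: Atomic[uint64]
taggedHead.store(pack(A, 0))
var seenTagged = taggedHead.load()

# Same interference as above, but every update bumps the version.
taggedHead.store(pack(B, 1))
taggedHead.store(pack(C, 2))
taggedHead.store(pack(A, 3))

# The index matches (A) but the version does not (0 vs 3), so the CAS fails.
let taggedSucceeded = taggedHead.compareExchange(
  seenTagged, pack(seenNext, versionOf(seenTagged) + 1))
echo "tagged CAS succeeded: ", taggedSucceeded
```

In this replay the plain CAS succeeds and leaves `head` pointing at the removed node `B`, while the tagged CAS fails and leaves the head untouched. A real multi-producer queue hits the same hazard nondeterministically, which is why it tends to show up only under contention.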
