While working on a GTK backend for McCLIM, I have been running into regular deadlocks. After investigating the root cause, I believe I understand what is going on. However, a fix is complicated, which is why I'm opening this issue: I'd like to fill in some of the blanks in my understanding before I start working on it.
The deadlock happens because I am creating gobject instances in one thread (in this case the REPL thread). This causes `*foreign-gobjects-lock*` to be held. While holding this lock, the thread then tries to acquire `*gobject-gc-hooks-lock*`. However, this second lock is already held by the GTK thread.

Meanwhile, while the GTK thread was holding `*gobject-gc-hooks-lock*`, the finaliser kicked in, and the first thing the finaliser does is try to acquire `*foreign-gobjects-lock*`, which is already held by the REPL thread. Each thread is now waiting for a lock the other one holds, resulting in a deadlock.
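To make the lock-order inversion concrete, here is a minimal Python model of the two threads. It is only a sketch under assumptions: `threading.Lock` stands in for the Lisp locks, the variable names are hypothetical stand-ins for `*foreign-gobjects-lock*` and `*gobject-gc-hooks-lock*`, and the finaliser is simulated by a direct acquire on the GTK thread. The acquires use a timeout so the demo reports the deadlock instead of hanging.

```python
import threading

# Hypothetical stand-ins for the two Lisp locks described above.
foreign_gobjects_lock = threading.Lock()   # models *foreign-gobjects-lock*
gobject_gc_hooks_lock = threading.Lock()   # models *gobject-gc-hooks-lock*

both_holding = threading.Barrier(2)  # both threads hold their first lock
both_tried = threading.Barrier(2)    # both threads have attempted their second lock
results = {}

def repl_thread():
    # Creating a gobject: holds the objects lock, then wants the hooks lock.
    with foreign_gobjects_lock:
        both_holding.wait()
        got = gobject_gc_hooks_lock.acquire(timeout=0.5)
        results["repl"] = got
        if got:
            gobject_gc_hooks_lock.release()
        both_tried.wait()  # keep holding until the other thread has given up too

def gtk_thread():
    # Holds the hooks lock when a (simulated) finaliser wants the objects lock.
    with gobject_gc_hooks_lock:
        both_holding.wait()
        got = foreign_gobjects_lock.acquire(timeout=0.5)
        results["gtk"] = got
        if got:
            foreign_gobjects_lock.release()
        both_tried.wait()

threads = [threading.Thread(target=repl_thread), threading.Thread(target=gtk_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both acquires time out: each thread waits on a lock the other one holds.
print(results)
```

With real blocking acquires (no timeout), both threads would wait forever.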
The simplest workaround I can think of, which I haven't tried yet, is to merge these two locks into a single one. This should fix the most common cause of this issue. However, it's not a proper solution, since the same deadlock could happen with any lock that is held while the finaliser runs. `*gobject-gc-hooks-lock*` just happens to be the most common one, since it's used very often.
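A minimal Python sketch of the merged-lock workaround, assuming the finaliser can fire on a thread that already holds the merged lock. Under that assumption the merged lock has to be re-entrant, otherwise the finaliser would deadlock against its own thread. `gobject_lock` and the thread functions are hypothetical names; in Lisp the analogue would be something like a Bordeaux Threads recursive lock (`make-recursive-lock`).

```python
import threading

# Hypothetical single lock replacing both *foreign-gobjects-lock* and
# *gobject-gc-hooks-lock*. It is re-entrant because the finaliser may run on a
# thread that already holds it; a plain threading.Lock would self-deadlock here.
gobject_lock = threading.RLock()
events = []

def finaliser():
    with gobject_lock:  # same thread re-acquires: allowed for an RLock
        events.append("finaliser ran")

def gtk_thread():
    with gobject_lock:
        events.append("gtk holds lock")
        finaliser()  # simulates the finaliser firing inside the critical section

def repl_thread():
    with gobject_lock:  # just waits its turn; there is no second lock to invert
        events.append("repl created gobject")

threads = [threading.Thread(target=gtk_thread), threading.Thread(target=repl_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=2)

print(events)
```

The order in which the two threads get the lock varies, but the finaliser always runs immediately after the GTK thread takes the lock, and neither thread blocks forever.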
The ideal solution would be to get rid of the lock in the finaliser. This is where I am not able to suggest a solution since I don't fully understand the architecture.
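One common pattern for lock-free finalisers (not necessarily one that fits this library's architecture) is for the finaliser to take no lock at all: it only records the dead object's pointer in a queue that is safe to use from a finaliser, and the real cleanup happens later at a safe point where the lock can be taken normally. Below is a Python sketch; `Wrapper`, `foreign_gobjects`, `pending_frees` and `drain_pending_frees` are hypothetical names, and `queue.SimpleQueue` (whose `put` is documented as reentrant and usable from weakref callbacks) plays the role that something like SBCL's `sb-ext:atomic-push` might play in Lisp.

```python
import gc
import queue
import threading
import weakref

# Hypothetical model of the "no lock in the finaliser" approach.
foreign_gobjects_lock = threading.Lock()
foreign_gobjects = {}                  # pointer -> bookkeeping (illustrative only)
pending_frees = queue.SimpleQueue()    # put() is safe to call from a finaliser

class Wrapper:
    def __init__(self, pointer):
        self.pointer = pointer
        with foreign_gobjects_lock:     # normal code path may take the lock
            foreign_gobjects[pointer] = "live"
        # The finaliser takes no lock: it only enqueues the pointer.
        weakref.finalize(self, pending_frees.put, pointer)

def drain_pending_frees():
    """Run at a safe point (e.g. from the main loop) with no other lock held."""
    with foreign_gobjects_lock:
        while True:
            try:
                pointer = pending_frees.get_nowait()
            except queue.Empty:
                break
            # Real code would release the foreign object here; we drop the entry.
            foreign_gobjects.pop(pointer, None)

w = Wrapper(0x1234)
del w          # the finaliser only enqueues the pointer
gc.collect()   # make sure the finaliser has run
drain_pending_frees()
print(foreign_gobjects)
```

The design choice is that the finaliser never competes for a lock, so it cannot deadlock regardless of which locks its thread happens to hold when it fires.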
Another issue I have noticed is that many of the global variables that control these things are accessed without holding any lock. This can lead to corrupted data, especially on non-Intel architectures, which have a much more relaxed memory model.
Have these issues been discussed in the past?