
Comments (7)

ak-mdufour commented on September 27, 2024

We noticed that on certain OSes, the acquiring thread has to yield so that another thread can release the lock. We made it work everywhere by transforming this loop (and similar ones): a first loop with a maximum iteration count, followed by a second loop that yields between attempts.
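A minimal sketch of that two-loop shape, using standard C11 atomics and sched_yield() rather than rpmalloc's own atomic wrappers; the names and the spin limit are illustrative, not the actual patch:

#include <sched.h>
#include <stdatomic.h>

#define SPIN_LIMIT 1000 /* illustrative; tune per platform */

static void acquire_lock(atomic_int* lock) {
	int expected = 0;
	/* First loop: bounded busy-wait, cheap when the hold time is short. */
	for (int i = 0; i < SPIN_LIMIT; ++i) {
		if (atomic_compare_exchange_weak(lock, &expected, 1))
			return;
		expected = 0; /* a failed CAS overwrites expected; reset it */
	}
	/* Second loop: yield between attempts so a lower-priority holder
	   pinned to this core can run and release the lock. */
	while (!atomic_compare_exchange_weak(lock, &expected, 1)) {
		expected = 0;
		sched_yield();
	}
}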


mjansson commented on September 27, 2024

I will look into this tonight, but yeah, a busy spin with a fallback to yield sounds like a reasonable option.


bbi-michaelanderson commented on September 27, 2024

I have confirmed it is as you suspected, @ak-mdufour. I was able to reproduce the deadlock and freeze the spinning thread in the debugger (didn't know I could do that till now!); that thread was blocking the lower-priority thread from releasing the lock, and core affinity kept the lower-priority thread from roaming to another core. With the high-priority thread frozen, I was able to break the deadlock.

Now I just have to find a reliable way to have the scheduler yield to lower-priority threads on all platforms. I tried a 1 µs sleep, but that doesn't seem to be enough time for the lower-priority thread to get scheduled (at least on Switch). I could also experiment with priority manipulation: bump the thread priority up when the lock is acquired and back down when spinning...

For the record, I changed the deferred free list to use an explicit atomic 32-bit lock (instead of using the pointer itself as the lock) and then changed all the spin locks to use the following function:

static inline void
_rpmalloc_yield(void) {
#if PLATFORM_PLAYSTATION || PLATFORM_SWITCH || PLATFORM_POSIX
	// Sleep briefly to give lower-priority threads a chance to run. This
	// arbitrary duration is not ideal; a more deterministic way to get the
	// lower-priority thread scheduled would be better.
	usleep(1);
#elif PLATFORM_WINDOWS
	// Windows will let lower-priority threads run for the remainder of
	// this thread's time slice when Sleep(0) is used.
	Sleep(0);
#else
#  error "Platform not supported."
#endif
}

static inline void
_rpmalloc_acquire_lock(atomic32_t* lock) {
	// NOTE: (manderson) We could get deadlocked if a thread with a higher
	// priority preempts a lower priority thread that is holding the lock and
	// has a core affinity mask limited to the same core. This should prevent
	// that case by attempting to acquire the lock and periodically yielding
	// to give the lower priority thread a chance to release the lock.
	int64_t const attempts_per_yield = 1000;
	int64_t yields = 0;
	int64_t attempts = 1;
	while (!atomic_cas32_acquire(lock, 1, 0)) {
		_rpmalloc_spin();
		if ((attempts % attempts_per_yield) == 0) {
			yields++;
			_rpmalloc_yield();
		}
		attempts++;
	}
#if ENABLE_STATISTICS
	atomic_add64(&_lock_calls, 1);
	atomic_add64(&_lock_attempts, attempts);
	atomic_add64(&_lock_yields, yields);
	if (attempts > atomic_load64(&_peak_lock_attempts)) {
		atomic_store64(&_peak_lock_attempts, attempts);
	}
#endif
}


mjansson commented on September 27, 2024

Yeah, I can see the issue with thread priority and core affinity... I'll have a think about how this could be generalized; I like the spin-then-yield approach.

(Also, be careful about discussing certain platforms and their specific internals; you don't want to break any NDAs here, which is also why a solution in the main repo would have to be platform-agnostic.)


ak-mdufour commented on September 27, 2024

Sadly, yielding offers no strong guarantee that the owning thread will be scheduled; a futex-based solution may be more robust on platforms that offer one.
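For reference, a futex lets the waiter block in the kernel instead of spinning, so the lower-priority owner can always be scheduled to release the lock, regardless of priority or affinity. A minimal Linux-only sketch, not rpmalloc code (glibc provides no futex() wrapper, so it goes through syscall()):

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock states: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
static void futex_lock(atomic_int* lock) {
	int c = 0;
	if (atomic_compare_exchange_strong(lock, &c, 1))
		return; /* fast path: lock was free */
	if (c != 2)
		c = atomic_exchange(lock, 2); /* mark as contended */
	while (c != 0) {
		/* Sleep until woken; the kernel rechecks that *lock == 2,
		   which guards against a wakeup racing with the unlock. */
		syscall(SYS_futex, lock, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
		c = atomic_exchange(lock, 2);
	}
}

static void futex_unlock(atomic_int* lock) {
	/* Only enter the kernel when someone may be waiting. */
	if (atomic_exchange(lock, 0) == 2)
		syscall(SYS_futex, lock, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}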


bbi-michaelanderson commented on September 27, 2024

I ended up using a mutex. I didn't like that spinning threads could take longer to acquire the lock by arbitrarily backing off until the blocked thread got a chance to release it. The mutex is per heap, so that still gives decent granularity. With that in place I haven't had any other problems. I also added some code to periodically cache each heap's deferred spans and trim the cache to tighten up memory usage. @mjansson, would you be interested in reviewing my changes? I could PM you.
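For reference, a minimal sketch of what a per-heap mutex could look like with pthreads; the heap_t fields and helper names here are hypothetical, not the actual changes under review:

#include <pthread.h>

typedef struct heap_t {
	pthread_mutex_t lock; /* one lock per heap keeps contention local;
	                         initialized with pthread_mutex_init() when
	                         the heap is created */
	/* ... span caches, deferred free list, etc. ... */
} heap_t;

static void heap_lock(heap_t* heap) {
	pthread_mutex_lock(&heap->lock);
}

static void heap_unlock(heap_t* heap) {
	pthread_mutex_unlock(&heap->lock);
}

Because a waiting thread blocks in the kernel instead of consuming its core, the lower-priority owner can always be scheduled to release the lock, which sidesteps the affinity deadlock entirely.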


mjansson commented on September 27, 2024

Would love to. Whatever method works: join the Discord at https://discord.gg/njzRV5Q9 or drop me an email at [email protected]

