
Comments (7)

ak-mdufour commented on September 27, 2024

We noticed that on certain OSes, the acquiring thread has to yield so that another thread can release the lock. We made it work everywhere by transforming this loop (and similar ones): a first loop with a maximum iteration count, followed by a second loop that yields between attempts.
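A minimal sketch of that two-loop shape, using standard C11 atomics and sched_yield() rather than rpmalloc's own atomic wrappers; the names and the spin limit are illustrative, not the actual patch:

#include <sched.h>
#include <stdatomic.h>

#define SPIN_LIMIT 1000 /* illustrative; tune per platform */

static void acquire_lock(atomic_int* lock) {
	int expected = 0;
	/* First loop: bounded busy-wait, cheap when the hold time is short. */
	for (int i = 0; i < SPIN_LIMIT; ++i) {
		if (atomic_compare_exchange_weak(lock, &expected, 1))
			return;
		expected = 0; /* a failed CAS overwrites expected; reset it */
	}
	/* Second loop: yield between attempts so a lower-priority holder
	   pinned to this core can run and release the lock. */
	while (!atomic_compare_exchange_weak(lock, &expected, 1)) {
		expected = 0;
		sched_yield();
	}
}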


mjansson commented on September 27, 2024

I will look into this tonight, but yeah, a busy spin with a fallback to yield sounds like a reasonable option.


bbi-michaelanderson commented on September 27, 2024

I have confirmed it is as you suspected, @ak-mdufour. I was able to reproduce the deadlock and freeze the spinning thread in the debugger (didn't know I could do that till now!); that thread was blocking the lower-priority thread from releasing the lock, and core affinity kept the lower-priority thread from roaming to another core. With the high-priority thread frozen, I was able to break the deadlock.

Now I just have to find a reliable way to have the scheduler yield to lower-priority threads on all platforms. I tried a 1 µs sleep, but that doesn't seem to be enough time for the lower-priority thread to get scheduled (at least on Switch). I could also experiment with priority manipulation: bump the thread priority up when the lock is acquired and back down when spinning...

For the record, I changed the deferred free list to use an explicit atomic 32-bit lock (instead of using the pointer itself as the lock) and then changed all the spin locks to use the following function:

static inline void
_rpmalloc_yield(void) {
#if PLATFORM_PLAYSTATION || PLATFORM_SWITCH || PLATFORM_POSIX
	// Sleep briefly to give lower-priority threads a chance to run. This
	// arbitrary duration is not ideal; a more deterministic way to get the
	// lower-priority thread scheduled would be better.
	usleep(1);
#elif PLATFORM_WINDOWS
	// Windows will let lower-priority threads run for the remainder of
	// this thread's time slice when Sleep(0) is used.
	Sleep(0);
#else
#  error "Platform not supported."
#endif
}

static inline void
_rpmalloc_acquire_lock(atomic32_t* lock) {
	// NOTE: (manderson) We could get deadlocked if a thread with a higher
	// priority preempts a lower priority thread that is holding the lock and
	// has a core affinity mask limited to the same core. This should prevent
	// that case by attempting to acquire the lock and periodically yielding
	// to give the lower priority thread a chance to release the lock.
	int64_t const attempts_per_yield = 1000;
	int64_t yields = 0;
	int64_t attempts = 1;
	while (!atomic_cas32_acquire(lock, 1, 0)) {
		_rpmalloc_spin();
		if ((attempts % attempts_per_yield) == 0) {
			yields++;
			_rpmalloc_yield();
		}
		attempts++;
	}
#if ENABLE_STATISTICS
	atomic_add64(&_lock_calls, 1);
	atomic_add64(&_lock_attempts, attempts);
	atomic_add64(&_lock_yields, yields);
	if (attempts > atomic_load64(&_peak_lock_attempts)) {
		atomic_store64(&_peak_lock_attempts, attempts);
	}
#endif
}


mjansson commented on September 27, 2024

Yeah, I can see the issue with thread priority and core affinity... I'll have a think about how this could be generalized; I like the spin-then-yield approach.

(Also, be careful about discussing certain platforms and their specific internals; you don't want to break any NDAs here, which is also why a solution in the main repo would have to be platform-agnostic.)


ak-mdufour commented on September 27, 2024

Sadly, yielding offers no strong guarantee that the owning thread will be scheduled; a futex-based solution may be more robust on platforms that offer one.
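For reference, a futex lets the waiter block in the kernel instead of spinning, so the lower-priority owner can always be scheduled to release the lock, regardless of priority or affinity. A minimal Linux-only sketch, not rpmalloc code (glibc provides no futex() wrapper, so it goes through syscall()):

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock states: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
static void futex_lock(atomic_int* lock) {
	int c = 0;
	if (atomic_compare_exchange_strong(lock, &c, 1))
		return; /* fast path: lock was free */
	if (c != 2)
		c = atomic_exchange(lock, 2); /* mark as contended */
	while (c != 0) {
		/* Sleep until woken; the kernel rechecks that *lock == 2,
		   which guards against a wakeup racing with the unlock. */
		syscall(SYS_futex, lock, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
		c = atomic_exchange(lock, 2);
	}
}

static void futex_unlock(atomic_int* lock) {
	/* Only enter the kernel when someone may be waiting. */
	if (atomic_exchange(lock, 0) == 2)
		syscall(SYS_futex, lock, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}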


bbi-michaelanderson commented on September 27, 2024

I ended up using a mutex. I didn't like that spinning threads could take longer to acquire the lock by arbitrarily backing off until the blocked thread got a chance to release it. The mutex is per heap, so that still gives decent granularity. With that in place I haven't had any other problems. I also added some code to periodically cache each heap's deferred spans and trim the cache to tighten up memory usage. @mjansson, would you be interested in reviewing my changes? I could PM you.
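For reference, a minimal sketch of what a per-heap mutex could look like with pthreads; the heap_t fields and helper names here are hypothetical, not the actual changes under review:

#include <pthread.h>

typedef struct heap_t {
	pthread_mutex_t lock; /* one lock per heap keeps contention local;
	                         initialized with pthread_mutex_init() when
	                         the heap is created */
	/* ... span caches, deferred free list, etc. ... */
} heap_t;

static void heap_lock(heap_t* heap) {
	pthread_mutex_lock(&heap->lock);
}

static void heap_unlock(heap_t* heap) {
	pthread_mutex_unlock(&heap->lock);
}

Because a waiting thread blocks in the kernel instead of consuming its core, the lower-priority owner can always be scheduled to release the lock, which sidesteps the affinity deadlock entirely.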


mjansson commented on September 27, 2024

Would love to. Whatever method works: join the Discord at https://discord.gg/njzRV5Q9 or drop me an email at [email protected]

