
Comments (2)

matthiasmullie avatar matthiasmullie commented on May 27, 2024

Thanks for spotting this, and apologies for not having seen this & responding earlier.

There was, however, a fix for the changed exists return value submitted long ago: 31f4578 (done in a way that remains compatible with older Redis versions)

And another nice catch with the StampedeProtector TTL! It's a bit of an annoying one to land, because there's no way to "fix it right".

Changing the constructor from milliseconds to seconds could have a big effect for those who already pass in a milliseconds value. For the most part, this is probably not a huge deal right now: the locks are held for a (too) long time, but once another process has completed & stored the value, the long lock becomes irrelevant. In other words, most users currently never notice.
If we were to change the unit to seconds, their current milliseconds input would have a significant effect: their code would now sleep for a really long time before polling again (e.g. 100 seconds rather than 0.1), and they may end up with too many concurrent processes. Basically: we can't move to seconds without expecting users to update their calling code, so this would effectively be a breaking change.
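To make the unit pitfall concrete, here's a small sketch (hypothetical function name, not scrapbook's API) of how the same numeric argument behaves under each interpretation:

```python
MS_PER_SECOND = 1000

def sleep_duration_seconds(sla, unit):
    """Convert a configured SLA value into the sleep (in seconds)
    used between cache polls, under either unit interpretation."""
    if unit == "milliseconds":
        return sla / MS_PER_SECOND
    return float(sla)  # interpreted as seconds

# A caller that configured 100, meaning "100 milliseconds":
assert sleep_duration_seconds(100, "milliseconds") == 0.1
# After a silent switch to seconds, that same 100 sleeps 1000x longer:
assert sleep_duration_seconds(100, "seconds") == 100.0
```

The 1000x factor is the whole problem: a value that used to mean a 0.1s polling interval would suddenly mean 100 seconds.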

OTOH, sticking with milliseconds is simply wrong.

I'll merge the fix now that I'm about to roll out a new major release.

from scrapbook.

matthiasmullie avatar matthiasmullie commented on May 27, 2024

After thinking things over some more last night, I think we should stick with milliseconds.
2 things are affected by that time:

  • it determines how long a protective lock is kept (to signal to other processes that the value is already being worked on)
  • it determines how long these other processes "sleep" in order to wait for that value to become available
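Both roles can be sketched in one polling loop. This is a minimal illustration with hypothetical in-memory cache/lock stores, not scrapbook's actual StampedeProtector:

```python
import time

def get_protected(cache, locks, key, compute, sla_ms=100, timeout_s=5.0):
    """Return the cached value for `key`, computing it under a lock if
    needed. The SLA is used both as the lock TTL and as the interval
    other processes sleep before re-polling the cache."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if key in cache:
            # Value is available (stored by us or by another process).
            return cache[key]
        lock_expiry = locks.get(key)
        if lock_expiry is None or lock_expiry < time.monotonic():
            # No live lock: claim the work. Because the SLA doubles as
            # the lock TTL, a crashed worker's lock eventually expires.
            locks[key] = time.monotonic() + sla_ms / 1000
            cache[key] = compute()
            return cache[key]
        # Someone else is computing the value: sleep for the SLA, poll again.
        time.sleep(sla_ms / 1000)
    raise TimeoutError(f"no value for {key!r} within {timeout_s}s")
```

Note that the cache is checked before the lock, which is why a lock that outlives the stored value is harmless: readers never wait on it once the value exists.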

Seconds have no benefit over milliseconds, but there is one the other way around: it becomes possible to acquire a shorter lock/protective time. The "ideal" lock time is essentially determined by 2 variable factors:

  • how fast another process is able to compute & cache the new value (too short of a lock and the stampede goes through)
  • how many other processes can remain open simultaneously until the value becomes available (too long = too many processes & they fail) - while those processes are mostly just idling (which is still a step up compared to no stampede protection - at least they're not all doing intensive work), there may end up being too many of them, depending on available infrastructure (e.g. Apache2 defaults to only allowing 150 concurrent connections)

When a stampede happens, it is very likely that there's a high number of incoming requests. And the average response time for a relatively responsive application is usually not over a couple hundred milliseconds. Ergo, being able to set a sub-second TTL may be important to achieve a good balance between both factors in certain applications. Usually, a 1-or-more-seconds TTL will be just fine, but that will remain possible with milliseconds support as well.

But the catch: we can't acquire a sub-second lock.
The only thing we can do is round up to the nearest second. And... that's just fine!
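As a sketch (hypothetical helper, assuming the backend only accepts whole-second TTLs), the rounding-up could look like:

```python
import math

def lock_ttl_seconds(sla_ms):
    """Round a milliseconds SLA up to the whole-second lock TTL the
    cache backend can actually enforce (never less than 1 second)."""
    return max(1, math.ceil(sla_ms / 1000))

assert lock_ttl_seconds(500) == 1   # a 500ms SLA yields a 1s lock
assert lock_ttl_seconds(1000) == 1
assert lock_ttl_seconds(1500) == 2  # anything over 1s rounds up
```

The polling interval keeps its millisecond precision; only the lock TTL gets rounded up.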

Imagine a sub-second lock TTL (e.g. 500ms).

  • 0ms: Request 1 comes in, data is not available, creates a lock for 1s.
  • 25ms: Request 2 comes in, data is not available, but there's a lock. Waits a bit.
  • 75ms: Request 2 polls the cache; still not available. Waits a bit.
  • 125ms: Request 2 polls the cache; still not available. Waits a bit.
  • 150ms: Request 3 comes in, data is not available, but there's a lock. Waits a bit.
  • 175ms: Request 2 polls the cache; still not available. Waits a bit.
  • 200ms: Request 3 polls the cache; still not available. Waits a bit.
  • 210ms: Request 1 completes & stores data in cache (lock remains). Wraps up remaining work & completes.
  • 225ms: Request 2 polls the cache; data is now available. Wraps up remaining work & completes.
  • 250ms: Request 3 polls the cache; data is now available. Wraps up remaining work & completes.
  • 400ms: Request 4 comes in, data is available. Wraps up remaining work & completes. Lock is still around, but didn't matter in this case.
  • 1000ms: Lock disappears.

The only case where "the lock sticking around for longer than it was supposed to" becomes a problem, is when request 1 (the one that created the lock & was supposed to store the new data) didn't complete its job (e.g. crashed).
If that happens, all new requests coming in still assume that some other process is working on it (because there is a lock file), when in fact that's not the case. It would be better for the lock file to be removed, so that another process can pick up the work.
That doesn't change with moving from milliseconds to seconds, though; then, too, those other processes would be stuck waiting out the remainder of the second.

(Of course, it's worse in the current incorrect implementation - the lock is held for much, much longer; request 1 not being able to fulfill its job has longer-lasting impact, and that needs to be fixed)

I'm going to stick with a milliseconds SLA, but will fix the lock time so that the lock is held for the "correct" (milliseconds rounded up to a whole second) time.

Does that make sense?

from scrapbook.
