
Comments (12)

ben-manes avatar ben-manes commented on June 7, 2024 1

My recollection is that this was added in one of my first bug fixes when learning that code. The original author was on sabbatical and no one else knew the code very well, but a commonly reported footgun was deadlocks in production due to recursive loads. I didn't want to use a ThreadLocal because those had a reputation for performance issues (multiple rewrites). The alternative was to stash the thread id in the loading entry as extra bookkeeping, but that needed care for proper cleanup. I advocated for the holdsLock trick because an uncontended lock is essentially free, it needed no extra memory (just the object header's lock bits), there were no observed degradations, and it was a very low-risk change to unfamiliar code. We then unofficially supported recursive loads despite discouraging them, and there were later discussions about ConcurrentHashMap facing the same problem. (My name might not be on the internal commit because my manager preferred that our team ship CLs to others, to avoid the perception that 20% projects might distract from the core project's timeline.)
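A minimal, self-contained sketch of the idea (the names here are illustrative, not Guava's actual internals): the loading thread holds the entry's monitor while the loader runs, so a thread that is about to wait on an entry whose monitor it already holds must be re-entering from its own loader, and it can fail fast instead of deadlocking.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    final class FailFastLoadingMap<K, V> {
      private static final class Entry<V> {
        private V value;
        synchronized V waitForValue() throws InterruptedException {
          while (value == null) {
            wait();
          }
          return value;
        }
        synchronized void set(V newValue) {
          value = newValue;
          notifyAll();
        }
      }

      private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
      private final Function<K, V> loader;

      FailFastLoadingMap(Function<K, V> loader) {
        this.loader = loader;
      }

      V get(K key) throws InterruptedException {
        Entry<V> entry = new Entry<>();
        Entry<V> existing = map.putIfAbsent(key, entry);
        if (existing == null) {
          // This thread owns the load; hold the entry's monitor while the loader runs.
          synchronized (entry) {
            entry.set(loader.apply(key));
          }
          return entry.waitForValue();
        }
        // Another thread owns the load. If we already hold its monitor, we are inside
        // our own loader, i.e. a recursive load: fail fast instead of deadlocking.
        if (Thread.holdsLock(existing)) {
          throw new IllegalStateException("Recursive load of " + key);
        }
        return existing.waitForValue();
      }
    }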

Virtual thread support in mutexes is ongoing work (latest status). It may have been premature to release the feature in Java 21 without it, as it is very easy to deadlock in JDK-only code due to the ample use of synchronization throughout the standard library. The answer isn't to remove all of that; switching to ReentrantLock was only meant as advice for those wanting to experiment and give feedback. I wouldn't use VTs in production until these footguns are resolved, which will likely happen before most are ready to upgrade to 21 anyway.
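For context, that advice is only about the shape of the change. A sketch (not Guava code) of a monitor-guarded blocking section rewritten with ReentrantLock, which lets a blocked virtual thread unmount instead of pinning its carrier on current JDKs:

    import java.util.concurrent.locks.ReentrantLock;
    import java.util.function.Supplier;

    class LazyHolder<V> {
      private final ReentrantLock lock = new ReentrantLock();
      private V value;

      V getOrCompute(Supplier<V> loader) {
        lock.lock();              // a virtual thread blocked here can unmount
        try {
          if (value == null) {
            value = loader.get(); // potentially slow, blocking work
          }
          return value;
        } finally {
          lock.unlock();
        }
      }
    }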

I still stand by my position in Caffeine that it is not worth the effort and that the JDK team will solve it soonish instead. It is good to be aware of, but primarily as a reason to avoid adopting VTs until they are more robust.


cpovirk avatar cpovirk commented on June 7, 2024 1

No guarantees, but I'd say that opening a PR is likely to be worth your time. I can have a look at it and run some tests. Thanks.


cpovirk avatar cpovirk commented on June 7, 2024 1

The fix is now released as part of 33.0.0.


cortlepp-intershop avatar cortlepp-intershop commented on June 7, 2024

I would be happy to work on a PR for this myself, if someone can explain to me what exactly the synchronized is currently trying to prevent. The comment suggests that otherwise something can load "recursively". How can one recursively call this method? And how does a synchronized prevent this? Can't it at best only delay it?

I have been trying to wrap my head around this for some time now (without success), and I can't find a test case that covers this.


cpovirk avatar cpovirk commented on June 7, 2024

The fix was made before common.cache was open-sourced, so I can't link to a specific commit for it.

The fix cited #369. It added the following test, which was removed as part of a switch to "white-box" tests, again before the code was open-sourced:

    public void testRecursiveDuplicateComputation() throws InterruptedException {
      final class RecursiveFunction implements Function<String, String> {
        Map<String, String> map;
        @Override public String apply(String key) {
          return map.get(key);
        }
      }
      final RecursiveFunction computeFunction = new RecursiveFunction();
      final ConcurrentMap<String, String> map = new MapMaker()
          .makeComputingMap(computeFunction);
      computeFunction.map = map;

      // tells the test when the computation has completed
      final CountDownLatch doneSignal = new CountDownLatch(1);

      Thread thread = new Thread() {
        @Override public void run() {
          try {
            map.get("foo");
          } finally {
            doneSignal.countDown();
          }
        }
      };
      thread.setUncaughtExceptionHandler(new UncaughtExceptionHandler() {
        @Override public void uncaughtException(Thread t, Throwable e) {}
      });
      thread.start();

      boolean done = doneSignal.await(1, TimeUnit.SECONDS);
      if (!done) {
        StringBuilder builder = new StringBuilder();
        for (StackTraceElement trace : thread.getStackTrace()) {
          builder.append("\tat ").append(trace).append('\n');
        }
        fail(builder.toString());
      }
    }

The diff to the prod code was:

@@ -17,6 +17,7 @@
 package com.google.common.collect;
 
 import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
 
 import com.google.common.base.Equivalence;
 import com.google.common.base.Function;
@@ -119,7 +120,13 @@
             // This thread solely created the entry.
             boolean success = false;
             try {
-              V value = computingValueReference.compute(key, hash);
+              V value = null;
+              // Synchronizes on the entry to allow failing fast when a
+              // recursive computation is detected. This is not full-proof
+              // since the entry may be copied when the segment is written to.
+              synchronized (entry) {
+                value = computingValueReference.compute(key, hash);
+              }
               checkNotNull(value, "compute() returned null unexpectedly");
               success = true;
               return value;
@@ -136,6 +143,7 @@
         try {
           while (true) {
             try {
+              checkState(!Thread.holdsLock(entry), "Recursive computation");
               ValueReference<K, V> valueReference = entry.getValueReference();
               V value = valueReference.waitForValue();
               if (value == null) {

The holdsLock check still exists. I assume that's what the lock is for.

Perhaps we should be using a ThreadLocal instead? (ThreadLocal also comes up in discussions of "things to avoid when using virtual threads." But I think the concern there is with things like ThreadLocal<byte[]> that can hold lots of memory. So if all we did was set a boolean and then remove it afterward, we'd likely be fine.)
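A rough sketch of that boolean approach (a hypothetical helper, not a proposed patch): mark the thread while its loader runs, fail fast if a load re-enters on the same thread, and remove the flag in a finally so nothing lingers on the thread.

    import java.util.function.Supplier;

    final class RecursionGuard {
      private static final ThreadLocal<Boolean> LOADING = new ThreadLocal<>();

      static <V> V runLoad(Supplier<V> loader) {
        if (Boolean.TRUE.equals(LOADING.get())) {
          // The current thread is already inside a load: a recursive load.
          throw new IllegalStateException("Recursive load");
        }
        LOADING.set(Boolean.TRUE);
        try {
          return loader.get();
        } finally {
          LOADING.remove();  // only a boolean, and removed immediately afterward
        }
      }
    }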

All that said: Our usual recommendation is to use Caffeine instead of our caches. Caffeine has so far taken the position that it's not worth restructuring the code to support virtual threads because eventually synchronized won't force threads to be pinned (ben-manes/caffeine#1018, ben-manes/caffeine#860).

I don't know whether we'd act on a PR here or not. On the one hand, I'm shocked that this is the only use of synchronized in common.cache, so eliminating it might actually provide meaningful improvements. (I gather from your comment about -Djdk.tracePinnedThreads=full that you haven't seen other pinning, such as inside JDK libraries that common.cache uses?) On the other hand, we really don't want to encourage people to move away from Caffeine, nor give the impression that Guava is virtual-thread-friendly on the whole. (Notably, we have one of those ThreadLocal<char[]> instances in escape.Platform.)


cortlepp-intershop avatar cortlepp-intershop commented on June 7, 2024

Ok a couple of things first just so I understand correctly:

  • the recursive call is not something that regularly happens. It is always an error; this synchronized is only a fail-fast mechanism and as such doesn't fix anything
  • If I were to remove both the synchronized and the Thread.holdsLock call, everything would work fine as long as I don't try to load a value recursively. If I did, though, I'd get a deadlock

Regarding your question about other pinning @cpovirk : So far I haven't discovered anything else, but I'm still working my way through it.

Regarding the possible fix using a ThreadLocal: I also think this would work, and I would also agree that it should not be a problem performance- or memory-wise with either platform or virtual threads (and since we are on the slow path anyway, I at least would not worry too much about it). Even though it pains me a bit to be using a ThreadLocal, I think in this case, since we can just clean everything up inside the finally, it should really be manageable (ThreadLocal creation and destruction would only be ~10 LOC apart). If we just put a static object inside the ThreadLocal (and then check whether it is there or not), the memory footprint should also be fairly minimal. For what it's worth, I think this would be a fairly small fix that would bring a lot of value to everyone who already wants to adopt virtual threads, all of their problems aside.

The only question that remains for me is: would you consider a PR for this? I will probably end up implementing either the ThreadLocal solution or just throwing out the synchronized in my fork (I am doing this for a thesis, so I don't absolutely need a real, released, permanent solution, but I would definitely prefer one).

But I also absolutely understand your point, @ben-manes. I just happen to need a solution now, so I kind of just need to deal with all the quirks that virtual threads currently have ^^.


cortlepp-intershop avatar cortlepp-intershop commented on June 7, 2024

I went ahead and implemented the ThreadLocal solution in my fork. Unlike what I said yesterday, though, I didn't put a static object inside the ThreadLocal but the entry itself, because otherwise we would also fail on the (maybe a bit weird) legal case where someone tries to load a different key in their loader.

An edge case that this implementation would not detect is when one tries to recursively load the same entry, but with another "normal" loader in the middle. In that case the ThreadLocal would be reset by the second loader, and the third one could no longer determine that the first loader already tried to load this entry. But this really seems very niche to me.

I also wrote a test for it. Please do let me know if you would like me to submit a PR, or if you have other suggestions or comments.


ben-manes avatar ben-manes commented on June 7, 2024

Unfortunately, circular recursive calls (a -> b -> a) are not unheard of for longer dependency chains. The recursion is rarely intentional but rather a side effect of calling an application API that circles back eventually. A fast fail is much better than a deadlock, since the developer won't understand at first what went wrong.

Thus, the ThreadLocal would need to capture the list of loading entries. That’s still not horrible or difficult to implement.
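A minimal sketch of what that could look like (hypothetical names): the ThreadLocal holds the stack of entries the current thread is loading, so a -> b -> a is caught while a -> b -> c is still allowed.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.function.Supplier;

    final class LoadStack {
      private static final ThreadLocal<Deque<Object>> LOADING =
          ThreadLocal.withInitial(ArrayDeque::new);

      static <V> V load(Object entry, Supplier<V> loader) {
        Deque<Object> stack = LOADING.get();
        if (stack.contains(entry)) {
          throw new IllegalStateException("Recursive load detected");
        }
        stack.push(entry);
        try {
          return loader.get();
        } finally {
          stack.pop();
          if (stack.isEmpty()) {
            LOADING.remove();  // avoid retaining the deque on the thread
          }
        }
      }
    }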

Alternatively, the loading reference entry would need to capture the thread. This entry type is temporary and swapped out once complete. That would avoid the entry-copying concern that the doc comment mentions, since the copy could capture the thread as needed. This is probably the ideal approach, just a bit more work and thinking time to avoid regressions.
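Roughly (an illustrative sketch, not the actual LocalCache types): the loading entry remembers its loader thread, and a waiter that turns out to be that same thread fails fast instead of waiting on itself.

    import java.util.concurrent.CompletableFuture;

    final class LoadingEntry<V> {
      private final Thread loader = Thread.currentThread();  // captured at creation
      private final CompletableFuture<V> future = new CompletableFuture<>();

      void complete(V value) {
        future.complete(value);
      }

      V waitForValue() {
        if (Thread.currentThread() == loader) {
          // We would be waiting on our own computation: a recursive load.
          throw new IllegalStateException("Recursive load");
        }
        return future.join();
      }
    }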

If you can use Caffeine, then AsyncCache is compatible, as it defers the compute to a future, so the monitor is held only briefly to establish the mapping. The executor could be set to a virtual-thread executor and the synchronous view used for convenience.
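Roughly what that looks like (an untested sketch, assuming Caffeine 3.x and Java 21; expensiveLoad stands in for the caller's loader): the load runs on a virtual thread via the executor, and the synchronous view keeps a blocking, Guava-like API.

    import java.util.concurrent.Executors;

    import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;

    class VirtualThreadCacheExample {
      static String expensiveLoad(String key) {  // stand-in for the real loader
        return key.toUpperCase();
      }

      public static void main(String[] args) {
        AsyncLoadingCache<String, String> asyncCache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .executor(Executors.newVirtualThreadPerTaskExecutor())
            .buildAsync(key -> expensiveLoad(key));

        LoadingCache<String, String> cache = asyncCache.synchronous();
        System.out.println(cache.get("foo"));
      }
    }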


cortlepp-intershop avatar cortlepp-intershop commented on June 7, 2024

@ben-manes I updated my implementation to remember the loading Thread like you suggested and added a test for the "proxied recursive load" scenario that this is supposed to fix (and it seems to be working for me locally). I didn't find any copying of the loading entry, though, so I didn't adjust anything there.
Also @cpovirk, what do you think about this now?

Thanks for the hint, I'm going to have a look at it. But if possible I would still prefer to fix the issue in Guava. We use Guava caches quite extensively, and I'm guessing that switching to Caffeine would require a pretty major rework.


ben-manes avatar ben-manes commented on June 7, 2024

Those changes look good. From a quick review, it doesn't seem this entry type needs to be copied (copyEntry / copyFor return itself), so presumably the original doc comment may have been an incorrect concern.

Caffeine is mostly a drop-in replacement for Guava's cache, with minor differences. The performance difference can be quite stark; see below for a zipfian workload on my MacBook with 16 threads and varying Guava's concurrencyLevel (which defaults to 4). Whether that translates into realizable gains is application-specific, so you might consider capturing a JFR profile to see whether the caches remain a bottleneck after your fix.

[Screenshot: benchmark chart of the zipfian workload, comparing Caffeine and Guava cache throughput at varying concurrencyLevel]
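For a sense of how small the migration usually is, a sketch (not taken from this thread) of an equivalent configuration in both APIs, differing mostly in the builder class and the loader shape:

    import java.util.concurrent.TimeUnit;

    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;

    class MigrationExample {
      // Guava equivalent, for comparison:
      // LoadingCache<String, String> cache = CacheBuilder.newBuilder()
      //     .maximumSize(10_000)
      //     .expireAfterWrite(10, TimeUnit.MINUTES)
      //     .build(CacheLoader.from(MigrationExample::load));

      // Caffeine:
      LoadingCache<String, String> cache = Caffeine.newBuilder()
          .maximumSize(10_000)
          .expireAfterWrite(10, TimeUnit.MINUTES)
          .build(MigrationExample::load);

      static String load(String key) {  // stand-in for the real loader
        return key.toUpperCase();
      }
    }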


cortlepp-intershop avatar cortlepp-intershop commented on June 7, 2024

Ok great. Can I open a PR for this now?

Cool. A migration to Caffeine is definitely on our radar and will happen at some point.

