Comments (12)
My recollection is that this was added in one of my first bug fixes when learning that code. The original author was on sabbatical and no one else knew the code very well, but a commonly reported footgun was deadlocks in production due to recursive loads. I didn't want to use a ThreadLocal because those had a reputation for performance issues (multiple rewrites). The alternative was to stash the thread id in the loading entry as extra bookkeeping, but that needed care for proper cleanup. I advocated for the holdsLock trick because an uncontended lock is free, it needed no extra memory (header bit), there were no observed degradations, and it was a very low-risk change to unfamiliar code. We then unofficially supported recursive loads despite discouraging them, and there were later discussions about ConcurrentHashMap facing the same problem. (My name might not be on the internal commit bit due to my manager having preferred that our team ships CLs to others to avoid perceptions that 20% projects might distract from the core project's timeline.)
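The holdsLock trick described above can be sketched in isolation. This is a minimal illustration, not Guava's LocalCache: `HoldsLockCache` and its nested `Entry` are hypothetical names, and the real code waits on a value reference rather than re-entering a monitor. In this toy version the monitor is reentrant, so without the check a recursive load would recurse without bound instead of blocking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a loading cache; not Guava's LocalCache. */
final class HoldsLockCache<K, V> {
  /** Hypothetical entry; its monitor doubles as the "currently loading" marker. */
  private static final class Entry<V> {
    volatile V value;
  }

  private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final Function<K, V> loader;

  HoldsLockCache(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    Entry<V> entry = entries.computeIfAbsent(key, k -> new Entry<>());
    // If this thread already owns the entry's monitor, it is in the middle of
    // loading this very key, so fail fast instead of waiting on itself.
    if (Thread.holdsLock(entry)) {
      throw new IllegalStateException("Recursive load of: " + key);
    }
    // The load runs while holding the monitor; no extra field is needed.
    synchronized (entry) {
      if (entry.value == null) {
        entry.value = loader.apply(key);
      }
      return entry.value;
    }
  }
}
```

A loader that calls back into `get` with the same key fails with an IllegalStateException, while loading a different key from inside a loader still works.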
Virtual thread support in mutexes is ongoing work (latest status). It may have been premature to have released the feature in Java 21 without it, as it is very easy to deadlock in JDK-only code due to ample usage of synchronization throughout the standard library. The answer isn't to remove all of that, and switching to ReentrantLock was only meant as advice for those wanting to experiment for feedback purposes. I wouldn't use VTs in production until these footguns are resolved, which will likely happen before most are ready to upgrade to 21 anyway.
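For those who do want to experiment, the ReentrantLock substitution could look roughly like the sketch below. `LockedEntry` is a hypothetical class, not part of Guava or Caffeine. It relies on `ReentrantLock.isHeldByCurrentThread()` as the counterpart of `Thread.holdsLock`, and on the fact that in Java 21 a virtual thread blocked on a ReentrantLock can unmount from its carrier, whereas one blocked inside `synchronized` is pinned.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical loading entry guarded by a ReentrantLock instead of its monitor. */
final class LockedEntry<V> {
  private final ReentrantLock lock = new ReentrantLock();
  private V value; // guarded by lock

  V getOrLoad(Supplier<V> loader) {
    // Counterpart of checkState(!Thread.holdsLock(entry), "Recursive computation").
    if (lock.isHeldByCurrentThread()) {
      throw new IllegalStateException("Recursive load");
    }
    lock.lock();
    try {
      if (value == null) {
        value = loader.get();
      }
      return value;
    } finally {
      lock.unlock();
    }
  }
}
```

The trade-off is one extra lock object per entry, which the monitor-based version avoids.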
I still stand by my position in Caffeine that it is not worth the effort and instead the JDK team will solve it soonish. It is good to be aware of, but primarily as a reason to avoid adoption of VTs until they are more robust.
from guava.
No guarantees, but I'd say that opening a PR is likely to be worth your time. I can have a look at it and run some tests. Thanks.
from guava.
The fix is now released as part of 33.0.0.
from guava.
I would be happy to work on a PR for this myself, if someone can explain to me what exactly the synchronized block is currently trying to prevent. The comment suggests that otherwise something can load "recursively". How can one recursively call this method? And how does a synchronized block prevent this? Can't it at best only delay it?
I have been trying to wrap my head around this for some time now (without success), and I can't find a test case that covers this.
from guava.
The fix was made before common.cache was open-sourced, so I can't link to a specific commit for it.
The fix cited #369. It added the following test, which was removed as part of a switch to "white-box" tests, again before the code was open-sourced:
public void testRecursiveDuplicateComputation() throws InterruptedException {
  final class RecursiveFunction implements Function<String, String> {
    Map<String, String> map;
    @Override public String apply(String key) {
      return map.get(key);
    }
  }
  final RecursiveFunction computeFunction = new RecursiveFunction();
  final ConcurrentMap<String, String> map = new MapMaker()
      .makeComputingMap(computeFunction);
  computeFunction.map = map;

  // tells the test when the computation has completed
  final CountDownLatch doneSignal = new CountDownLatch(1);

  Thread thread = new Thread() {
    @Override public void run() {
      try {
        map.get("foo");
      } finally {
        doneSignal.countDown();
      }
    }
  };
  thread.setUncaughtExceptionHandler(new UncaughtExceptionHandler() {
    @Override public void uncaughtException(Thread t, Throwable e) {}
  });
  thread.start();

  boolean done = doneSignal.await(1, TimeUnit.SECONDS);
  if (!done) {
    StringBuilder builder = new StringBuilder();
    for (StackTraceElement trace : thread.getStackTrace()) {
      builder.append("\tat ").append(trace).append('\n');
    }
    fail(builder.toString());
  }
}
The diff to the prod code was:
@@ -17,6 +17,7 @@
 package com.google.common.collect;

 import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;

 import com.google.common.base.Equivalence;
 import com.google.common.base.Function;
@@ -119,7 +120,13 @@
           // This thread solely created the entry.
           boolean success = false;
           try {
-            V value = computingValueReference.compute(key, hash);
+            V value = null;
+            // Synchronizes on the entry to allow failing fast when a
+            // recursive computation is detected. This is not full-proof
+            // since the entry may be copied when the segment is written to.
+            synchronized (entry) {
+              value = computingValueReference.compute(key, hash);
+            }
             checkNotNull(value, "compute() returned null unexpectedly");
             success = true;
             return value;
@@ -136,6 +143,7 @@
       try {
         while (true) {
           try {
+            checkState(!Thread.holdsLock(entry), "Recursive computation");
             ValueReference<K, V> valueReference = entry.getValueReference();
             V value = valueReference.waitForValue();
             if (value == null) {
The holdsLock check still exists. I assume that that's what the lock is for.
Perhaps we should be using a ThreadLocal instead? (ThreadLocal also comes up in discussions of "things to avoid when using virtual threads." But I think the concern there is with things like ThreadLocal<byte[]> that can hold lots of memory. So if all we did was set a boolean and then remove it afterward, we'd likely be fine.)
All that said: Our usual recommendation is to use Caffeine instead of our caches. Caffeine has so far taken the position that it's not worth restructuring the code to support virtual threads because eventually synchronized won't force threads to be pinned (ben-manes/caffeine#1018, ben-manes/caffeine#860).
I don't know whether we'd act on a PR here or not. On the one hand, I'm shocked that this is the only use of synchronized in common.cache, so eliminating it might actually provide meaningful improvements. (I gather from your comment about -Djdk.tracePinnedThreads=full that you haven't seen other pinning, such as inside JDK libraries that common.cache uses?) On the other hand, we really don't want to encourage people to move away from Caffeine, nor give the impression that Guava is virtual-thread-friendly on the whole. (Notably, we have one of those ThreadLocal<char[]> instances in escape.Platform.)
from guava.
OK, a couple of things first, just so I understand correctly:
- The recursive call is not something that regularly happens. It is always an error; the synchronized block is only a fail-fast mechanism and as such doesn't fix anything.
- If I were to remove both the synchronized block and the Thread.holdsLock call, everything would work fine as long as I don't try to load a value recursively. If I did, though, I'd get a deadlock.
Regarding your question about other pinning, @cpovirk: so far I haven't discovered anything else, but I'm still working my way through it.
Regarding the possible fix using a ThreadLocal: I also think this would work, and I would also agree that it should not be a problem performance- or memory-wise with either platform or virtual threads (and since we are on the slow path anyway, I at least would not worry too much about it). Even though it pains me a bit to be using a ThreadLocal, I think in this case it should really be manageable, since we can just clean everything up inside the finally block (ThreadLocal creation and destruction would only be ~10 LOC apart). If we just put a static object inside the ThreadLocal (and then check whether it is there or not), the memory footprint should also be fairly minimal. For what it's worth, I think this would be a fairly small fix that would bring a lot of value to everyone who already wants to adopt virtual threads, all of their problems aside.
The only question that remains for me is: would you consider a PR for this? I will probably end up either implementing the ThreadLocal solution or just throwing out the synchronized block in my fork (I am doing this for a thesis, so I don't absolutely need a real, released, and permanent solution, but I would definitely prefer one).
I also absolutely understand your point, @ben-manes. I just happen to need a solution now, so I kind of just need to deal with all the quirks that virtual threads currently have ^^.
from guava.
I went ahead and implemented the ThreadLocal solution in my fork. Unlike what I said yesterday, though, I didn't put a static object inside the ThreadLocal but the entry itself, because otherwise we would also fail on the legal (if maybe a bit weird) case where someone tries to load a different key in their loader.
An edge case that this implementation would not detect is if one tries to recursively load the same entry, but with another "normal" loader in the middle. In that case the ThreadLocal would be reset by the second loader, and the third one could no longer determine that loader no. 1 already tried to load this entry. But this really seems very niche to me.
I also wrote a test for it. Please do let me know if you would like me to submit a PR, or if you have other suggestions or comments.
from guava.
Unfortunately, circular recursive calls (a -> b -> a) are not unheard of for longer dependency chains. The recursion is rarely intentional but rather a side effect of calling an application API that eventually circles back. A fast fail is much better than a deadlock, since the developer won't understand at first what went wrong.
Thus, the ThreadLocal would need to capture the list of loading entries. That’s still not horrible or difficult to implement.
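A ThreadLocal that captures every entry the current thread is loading could be sketched as follows. `LoadingTracker` and its `load` method are hypothetical names for illustration; plain keys stand in for cache entries, and none of this is taken from the fork or from Guava.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical helper; the names are illustrative, not Guava's. */
final class LoadingTracker {
  // Keys (standing in for entries) that the current thread is in the middle of loading.
  private static final ThreadLocal<Set<Object>> LOADING =
      ThreadLocal.withInitial(HashSet::new);

  private LoadingTracker() {}

  static <V> V load(Object key, Supplier<V> loader) {
    Set<Object> loading = LOADING.get();
    // add() returns false when the key is already in the set, meaning this
    // thread is already loading it somewhere further up the call stack.
    if (!loading.add(key)) {
      throw new IllegalStateException("Recursive load of: " + key);
    }
    try {
      return loader.get();
    } finally {
      loading.remove(key);
      if (loading.isEmpty()) {
        LOADING.remove(); // leave nothing behind on pooled threads
      }
    }
  }
}
```

Because the set holds all in-flight loads rather than only the most recent one, an a -> b -> a chain is detected at the second attempt to load a, while a -> b on its own remains legal.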
Alternatively the loading reference entry would need to capture the thread. This entry type is temporary and swapped out once complete. That would avoid the entry copying concerns that the doc comment mentions since the copy could capture the thread as needed. This is probably the ideal approach, just a bit more work and think time to avoid regressions.
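As a rough sketch of that alternative, the temporary loading entry can remember the thread that created it and refuse to let that same thread wait on it. `LoadingReference` here is a hypothetical class, not Guava's LoadingValueReference; it uses a CountDownLatch for waiting, so no monitor or ThreadLocal is involved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.function.Supplier;

/** Hypothetical temporary entry that exists only while a value is being loaded. */
final class LoadingReference<V> {
  private final Thread loadingThread = Thread.currentThread(); // creator runs the load
  private final CountDownLatch done = new CountDownLatch(1);
  private volatile V value;

  /** Called by the creating thread to run the load and publish the result. */
  V compute(Supplier<V> loader) {
    try {
      value = loader.get();
      return value;
    } finally {
      done.countDown(); // release waiters even if the loader threw
    }
  }

  /** Called by any thread that finds this entry already installed. */
  V waitForValue() throws InterruptedException {
    // The loading thread waiting on its own unfinished load can never complete.
    if (Thread.currentThread() == loadingThread && done.getCount() > 0) {
      throw new IllegalStateException("Recursive load");
    }
    done.await();
    return value; // null if the load failed; a fuller version would rethrow
  }
}
```

Since the reference is swapped out once the load completes, the captured thread is not retained beyond the load itself.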
If you can use Caffeine, then AsyncCache is compatible as it defers the compute to a future so the monitor is held briefly for establishing the mapping. The executor could be set to a VT and the synchronous view used for convenience.
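The idea of deferring the compute to a future can be illustrated with JDK classes alone. The sketch below is not Caffeine's implementation or API; `FutureCache` is a hypothetical class that only shows why the map's internal lock is held briefly: the mapping function installs a future and returns, and the loader runs later on the executor. It assumes the executor is genuinely asynchronous (an inline executor would run the loader inside computeIfAbsent).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

/** Hypothetical future-backed cache; illustrates the idea, not Caffeine's code. */
final class FutureCache<K, V> {
  private final ConcurrentHashMap<K, CompletableFuture<V>> map = new ConcurrentHashMap<>();
  private final Function<K, V> loader;
  private final Executor executor; // assumed to run tasks on other threads

  FutureCache(Function<K, V> loader, Executor executor) {
    this.loader = loader;
    this.executor = executor;
  }

  CompletableFuture<V> getAsync(K key) {
    // Only the creation of the future happens under the map's bin lock;
    // the potentially slow loader runs on the executor afterwards.
    return map.computeIfAbsent(
        key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k), executor));
  }

  /** Synchronous view for convenience; blocks until the future completes. */
  V get(K key) {
    return getAsync(key).join();
  }
}
```

On Java 21 the executor could be `Executors.newVirtualThreadPerTaskExecutor()`. Note that this sketch addresses pinning only: a loader that synchronously asks for its own key would still wait on its own future.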
from guava.
@ben-manes I updated my implementation to remember the loading Thread like you suggested and added a test for the "proxied recursive load" scenario that this is supposed to fix (and it seems to be working for me locally). I didn't find any copying of the loading entry, though, so I didn't adjust anything there.
Also @cpovirk what do you think about this now?
Thanks for the hint, I'm going to have a look at it. But if possible I would still prefer to fix the issue in guava. We use guava caches quite extensively, and I'm guessing that switching to caffeine would require a pretty major rework.
from guava.
Those changes look good. From a quick review it doesn't seem this entry type needs to be copied (copyEntry / copyFor return itself), so presumably the original doc comment may have been an incorrect concern.
Caffeine is mostly a drop-in replacement for Guava's cache with minor differences. The performance difference can be quite stark, see below for a zipfian workload on my macbook with 16 threads and varying Guava's concurrencyLevel (defaults to 4). Whether that translates to realizable gains is application specific, so you might consider capturing a jfr profile to see if the caches remain a bottleneck after your fix.
from guava.
Ok great. Can I open a PR for this now?
Cool. A migration to Caffeine is definitely on our radar and will happen at some point.
from guava.