Fix concurrency bug in NamespacedHierarchicalStore.computeIfAbsent
#5209
Pankraz76
left a comment
Great fix. Precisely executed and documented without any flaw, thanks a lot.
+1
🔎 No tests executed 🔎 🏷️ Commit: 32faf28
You're welcome @Pankraz76, thanks as well for the praise and the review, I really appreciate it! :)
Pankraz76
left a comment
+1, feature complete.
Well done, and thanks again for the dedication that led to this increment.
Now it's just about polish: optional extra effort, striving for excellence.
But this is also dangerous territory; it might be better to extract this into a clean PR afterwards. The boy-scout principle is nice, but people still tend to tilt on it and are likely to be overwhelmed.
Pankraz76
left a comment
My 2 cents:
go fully functional, separating the concerns (SoC/SRP).
You’re welcome, and thanks again, @Pankraz76, for taking another careful look.
mpkorstanje
left a comment
I've given this a quick read through and left some comments to resolve open questions, but this is not a full review yet.
It seems that you've found and solved a different problem than described in #5171. And you claim the solutions overlap.
Unfortunately the original problem is quite tricky and the description for this pull request incredibly verbose. We'll need some time to go through the details. You can help us process this by writing a much more concise PR description.
block each other and temporarily see a missing or incorrectly initialized state
for values created via `computeIfAbsent`. The method now evaluates
`defaultCreator` outside the critical section using a memoizing supplier,
aligning its behavior with the deprecated `getOrComputeIfAbsent`.
Could you write this down more concisely? The release notes generally focus on a top-line understanding of what was fixed. You could express this as having solved the symptoms of #5209 rather than its root cause.
For clarity, you could also add a second item that describes how `computeIfAbsent` no longer deadlocks.
}

@Test
void computeIfAbsentCanDeadlockWithCollidingKeys() throws Exception {
The naming of this test suggests that computeIfAbsent can currently deadlock. But I assume that after your fix this is no longer the case?
Agreed. After the fix, this scenario should no longer deadlock, so I renamed the test to `computeIfAbsentDoesNotDeadlockWithCollidingKeys`.
mpkorstanje
left a comment
I think I've already found one reason this will not work as expected. See comment below.
`simulateRaceConditionInComputeIfAbsent` did not catch this issue because it only exercises contention on a single key and relies on `ConcurrentHashMap`'s per-key atomicity. It does not force different keys into the same bucket, nor does it run user code that re-enters the store while the map lock is held, so the problematic interaction never occurs in that test.
With this in mind, I would have expected to see a test like `simulateRaceConditionInComputeIfAbsent` that forces keys into the same bucket.
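A test along those lines could look roughly like the following standalone sketch. It is not taken from the PR: `CollidingKeyDemo` and its `CollidingKey` record are hypothetical names, and the constant `hashCode()` of 42 is an assumption chosen only to force every key into the same `ConcurrentHashMap` bin. One thread parks inside `compute()` for key `a`; a second thread then calls `computeIfAbsent()` for the distinct key `b` and, because both keys share the bin, it should stay blocked until the first thread is released. The 200 ms join is a timing heuristic, so treat this as an illustration rather than a flake-proof test.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class CollidingKeyDemo {

    // Hypothetical test helper: distinct keys (by name) that always share one hash bin.
    record CollidingKey(String name) {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    /** Returns true if a slow creator for key "a" held up an unrelated update for key "b". */
    static boolean collidingKeyIsBlockedWhileCreatorRuns() {
        var map = new ConcurrentHashMap<CollidingKey, String>();
        var creatorEntered = new CountDownLatch(1);
        var releaseCreator = new CountDownLatch(1);
        try {
            // Thread 1: a slow "creator" running inside compute(), i.e. while the bin is locked.
            Thread slowCreator = new Thread(() -> map.compute(new CollidingKey("a"), (k, old) -> {
                creatorEntered.countDown();
                awaitQuietly(releaseCreator);
                return "value-a";
            }));
            slowCreator.start();
            creatorEntered.await();

            // Thread 2: a different key that lands in the same bin.
            Thread otherKey = new Thread(() -> map.computeIfAbsent(new CollidingKey("b"), k -> "value-b"));
            otherKey.start();
            otherKey.join(200);                   // give it up to 200 ms to finish
            boolean blocked = otherKey.isAlive(); // still waiting behind thread 1's bin lock

            releaseCreator.countDown();           // let the creator finish
            slowCreator.join();
            otherKey.join();
            return blocked && "value-b".equals(map.get(new CollidingKey("b")));
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(collidingKeyIsBlockedWhileCreatorRuns());
    }
}
```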
return requireNonNull(newStoredValue.evaluate());
}
catch (Throwable t) {
storedValues.remove(compositeKey, newStoredValue);
There is a period of time between `storedValues.compute()` and `storedValues.remove()` where a different thread can, via `getStoredValue()`, briefly access the `newStoredValue` and encounter its stored exception. As such, the store's operations are not atomic.
And I think this invalidates any approach that tries to avoid execution of the defaultCreator outside the compute method.
And I think this invalidates any approach that tries to avoid execution of the defaultCreator outside the compute method.
Both implementations now address this concern:
- Approach 1 (`MemoizingSupplier`): Added a `transientFailures` flag. When `true`, `get()` returns `null` instead of throwing during the transient window.
- Approach 2 (`DeferredSupplier`): Uses type-based dispatch. `DeferredSupplier.get()` catches `ExecutionException` and returns `null`, while `getOrThrow()` rethrows for the original `computeIfAbsent` caller.

Both ensure:
- `defaultCreator` runs outside the bucket lock (avoiding blocking/deadlock)
- `get()` returns `null` during the transient window (not the exception)
- The original `computeIfAbsent` caller sees the exception and cleans up
- After cleanup, `get()` continues to return `null` (correct final state)

Tests added:
- `getDoesNotSeeTransientExceptionFromComputeIfAbsent`
- `getConcurrentWithFailingComputeIfAbsentDoesNotSeeException`
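To make Approach 2 concrete, here is a minimal, hypothetical sketch of a deferred supplier built on `java.util.concurrent.FutureTask`. The PR's actual `DeferredSupplier` may be structured differently, and `DeferredSupplierDemo`, `transientWindow`, and the method bodies below are assumptions. The sketch replays the transient window from the review: the failing entry is still in the map while another reader calls `get()`, which yields `null`, while the original caller's `getOrThrow()` surfaces the creator's exception before it removes its own entry with `ConcurrentMap.remove(key, value)`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Supplier;

public class DeferredSupplierDemo {

    // Hypothetical sketch of the "DeferredSupplier" idea, not the PR's actual class.
    static final class DeferredSupplier {
        private final FutureTask<Object> task;

        DeferredSupplier(Supplier<Object> creator) {
            this.task = new FutureTask<>(creator::get);
        }

        // What other readers see via get(): the value, or null if the creator failed.
        Object get() {
            try {
                return await();
            }
            catch (ExecutionException e) {
                return null;
            }
        }

        // What the original computeIfAbsent caller sees: the value or the creator's exception.
        Object getOrThrow() {
            try {
                return await();
            }
            catch (ExecutionException e) {
                Throwable cause = e.getCause();
                if (cause instanceof RuntimeException) throw (RuntimeException) cause;
                if (cause instanceof Error) throw (Error) cause;
                throw new IllegalStateException(cause);
            }
        }

        // FutureTask.run() returns immediately if the task is already running or done,
        // so the creator runs at most once; get() waits for whichever thread ran it.
        private Object await() throws ExecutionException {
            task.run();
            try {
                return task.get();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** Replays the transient window: the failed entry is still in the map when another reader arrives. */
    static String[] transientWindow() {
        var map = new ConcurrentHashMap<String, DeferredSupplier>();
        var failing = new DeferredSupplier(() -> {
            throw new IllegalStateException("boom");
        });
        map.putIfAbsent("key", failing); // installing the entry runs no user code

        String ownerSaw;
        try {
            ownerSaw = String.valueOf(failing.getOrThrow());
        }
        catch (IllegalStateException e) {
            ownerSaw = e.getMessage();
        }
        Object otherReaderSaw = map.get("key").get(); // entry not yet removed: null, not "boom"
        map.remove("key", failing);                   // the owner removes only its own entry
        return new String[] { ownerSaw, String.valueOf(otherReaderSaw), String.valueOf(map.get("key")) };
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", transientWindow())); // boom,null,null
    }
}
```

Relying on `FutureTask.run()` being a no-op after the first run is one way to get at-most-once evaluation without holding the map's bin lock; a hand-written memoizing supplier would serve the same purpose.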
var result = StoredValue.evaluateIfNotNull(storedValue);
if (result == null) {
StoredValue newStoredValue = this.storedValues.compute(compositeKey, (__, oldStoredValue) -> {
if (StoredValue.evaluateIfNotNull(oldStoredValue) == null) {
In your analysis you said:
In NamespacedHierarchicalStore#computeIfAbsent, the implementation previously relied on ConcurrentMap.computeIfAbsent, which provides the "one logical initialization per key" behavior. After the change to storedValues.compute(…), every call to NamespacedHierarchicalStore.computeIfAbsent for the same key can rerun the initialization logic and replace the existing StoredValue.
That means that even though each compute call is atomic, two threads calling NamespacedHierarchicalStore.computeIfAbsent for the same key can:
- Have Thread A initialize the stored value and start tracking statistics.
- Then have Thread B rerun the initialization and replace that value, effectively resetting the statistics.
But looking at the existing implementation, the `defaultCreator` is not applied until after the `oldStoredValue` has been checked. So when `defaultCreator` is applied for a given key, a value was either not set at all or was set but evaluates to `null`. So on the face of it, the `defaultCreator` should be applied at most once, and point 2 shouldn't happen.
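The reviewer's argument can be checked with a small experiment. The sketch below uses hypothetical names and a plain `ConcurrentHashMap` rather than the JUnit store: it races ten callers on one key through a `compute()` lambda that inspects the old value before running the creator. Because `compute()` is atomic per key, the creator should run exactly once. Since `ConcurrentHashMap` cannot store `null`, "absent" here stands in for the store's "absent or evaluates to `null`" case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeAtMostOnceDemo {

    /** Races {@code threads} callers on one key and returns how often the creator ran. */
    static int creatorInvocations(int threads) {
        var map = new ConcurrentHashMap<String, Object>();
        var invocations = new AtomicInteger();
        var start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    start.await(); // release all callers at once
                    // Check the old value first and only then run the "creator",
                    // mirroring the shape of the reviewed compute() lambda.
                    return map.compute("key",
                            (k, old) -> old != null ? old : "created-" + invocations.incrementAndGet());
                }));
            }
            start.countDown();
            for (Future<Object> result : results) {
                result.get();
            }
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        finally {
            pool.shutdownNow();
        }
        return invocations.get();
    }

    public static void main(String[] args) {
        System.out.println(creatorInvocations(10)); // 1
    }
}
```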
You're right that the previous implementation checked oldStoredValue before applying defaultCreator. The issue wasn't about re-running initialization for the same key, but about where defaultCreator executes.
The previous implementation ran defaultCreator.apply(key) inside the compute() lambda while holding the bucket lock. Even though the check prevents double-initialization, any thread trying to access a different key in the same bucket is blocked until defaultCreator completes.
The fix moves defaultCreator execution outside the lock, so the map operation is fast and doesn't block other bucket operations.
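The difference between running the creator inside `compute()` and running it afterwards can be shown in a single thread. The sketch below rests on a few assumptions: `CreatorOutsideLockDemo`, `Memo`, and `CollidingKey` are hypothetical names, and the creator for key `a` re-enters the map for a colliding key `b` (the re-entrant case raised earlier in the review). With the creator inside `compute()`, the nested call hits the bin that is still reserved for `a`; on JDK 9 and later `ConcurrentHashMap` reports this as an `IllegalStateException`. With only a cheap `Memo` installed under the lock and evaluated afterwards, the nested call succeeds.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

public class CreatorOutsideLockDemo {

    // Hypothetical helper: distinct keys that always share one hash bin.
    record CollidingKey(String name) {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    // Old shape: the creator runs inside compute(), i.e. while the bin is locked.
    static Object creatorInsideCompute(ConcurrentHashMap<CollidingKey, Object> map, CollidingKey key,
            Function<CollidingKey, Object> creator) {
        return map.compute(key, (k, old) -> old != null ? old : creator.apply(k));
    }

    // Hypothetical memoizing holder; in this sketch a failing creator is simply retried on the next call.
    static final class Memo {
        private final Supplier<Object> delegate;
        private boolean evaluated;
        private Object value;

        Memo(Supplier<Object> delegate) {
            this.delegate = delegate;
        }

        synchronized Object get() {
            if (!evaluated) {
                value = delegate.get();
                evaluated = true;
            }
            return value;
        }
    }

    // Fix shape: only a cheap Memo is installed under the lock; the creator runs afterwards.
    static Object creatorOutsideCompute(ConcurrentHashMap<CollidingKey, Memo> map, CollidingKey key,
            Function<CollidingKey, Object> creator) {
        Memo memo = map.computeIfAbsent(key, k -> new Memo(() -> creator.apply(k)));
        return memo.get();
    }

    // The creator for "a" re-enters the map for the colliding key "b".
    static String reentrantInside() {
        var map = new ConcurrentHashMap<CollidingKey, Object>();
        try {
            return String.valueOf(creatorInsideCompute(map, new CollidingKey("a"),
                    k -> creatorInsideCompute(map, new CollidingKey("b"), kb -> "b") + "+a"));
        }
        catch (IllegalStateException e) {
            return "IllegalStateException";
        }
    }

    static String reentrantOutside() {
        var map = new ConcurrentHashMap<CollidingKey, Memo>();
        return String.valueOf(creatorOutsideCompute(map, new CollidingKey("a"),
                k -> creatorOutsideCompute(map, new CollidingKey("b"), kb -> "b") + "+a"));
    }

    public static void main(String[] args) {
        System.out.println(reentrantInside());  // IllegalStateException (JDK 9+)
        System.out.println(reentrantOutside()); // b+a
    }
}
```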
return defaultCreator.apply(key);
})));
}
return storedValue.evaluate();
This should be the same problem as #5209 (comment): the result of a failing `defaultCreator` can be seen through `get`. This behaviour is guarded against for `computeIfAbsent` and verified with tests.
Note to self: Write some more tests to cover this.
Addressed together with #5209 (comment). Both implementations ensure that:
- The caller of `computeIfAbsent` sees the exception
- Other callers via `get()` see `null`

This preserves the expected semantics: after `computeIfAbsent` fails and removes the entry, `get()` returns `null`.
…fAbsent(Object, Object, Function)` junit-team#5171 junit-team#5209 Signed-off-by: Vincent Potucek <[email protected]>
Added
This test fails with the old implementation (creator called 10x instead of 1x) and passes with the fix. I also added
Thanks for the initial review @mpkorstanje! I apologize for being a bit verbose on both the release notes and the description of the PR. I now shortened both of them, I hope I didn't shorten them too much - let me know if I should make any changes there. I also addressed your comments by adding both tests to reproduce the issues you mentioned and also addressed them. I first implemented an approach but didn't like some aspects, so I tried another approach. Both approaches ended up working, and since I'm not sure which one you would prefer I decided to push both as separate commits, so I could later drop the one you don't like.
Both approaches:
Trade-offs:
Please let me know which approach you prefer, and I'll drop the other commit.
`computeIfAbsent` previously invoked `defaultCreator` while holding the store's internal map lock. Under parallel execution this could cause threads using the store to block each other and temporarily see missing or incorrectly initialized state for values created via `computeIfAbsent`. The implementation now wraps `defaultCreator` in a `MemoizingSupplier` and installs that supplier via the map operation, evaluating it only after the update has completed. This avoids running user code while holding the lock and aligns `computeIfAbsent`'s behavior with the deprecated `getOrComputeIfAbsent`, preserving the intended "one initialization per key" semantics. Issue: junit-team#5171 Signed-off-by: martinfrancois <[email protected]>
Force-pushed the branch from 81d673b to 32faf28.
Overall I think you're going in the right direction.
But I don't think the CAS loop is a good solution. It means that we're not taking advantage of the existing data structure.
Rather, I think it would be important to recognize and make explicit that there are four types of stored values: Absent (if nothing was stored yet), Constant (from `put`), Memoized (from `getOrComputeIfAbsent`), and Deferred (from `computeIfAbsent`). These are containers that may have the following contents:
| Type | Value | Null | Exception |
|---|---|---|---|
| Absent | No | No | No |
| Constant | Maybe | Maybe | No |
| Memoized | Maybe | Maybe | Maybe |
| Deferred | Maybe | No | Maybe |
These can be stored directly in the map and evaluated immediately afterwards. Then, depending on the retrieval function (`get`, `getOrComputeIfAbsent`, `computeIfAbsent`), different things happen. This means that if the evaluation of a value failed, the value need not be removed from the map.
| Method | Type | Contents | Action |
|---|---|---|---|
| get | Absent | - | return null |
| get | Constant | value | return value |
| get | Constant | null | return null |
| get | Memoized | value | return value |
| get | Memoized | null | return null |
| get | Memoized | exception | throw exception |
| get | Deferred | value | return value |
| get | Deferred | exception | return null |
| getOrComputeIfAbsent | Absent | - | compute new value |
| getOrComputeIfAbsent | Constant | value | return value |
| getOrComputeIfAbsent | Constant | null | return null |
| getOrComputeIfAbsent | Memoized | value | return value |
| getOrComputeIfAbsent | Memoized | null | return null |
| getOrComputeIfAbsent | Memoized | exception | throw exception |
| getOrComputeIfAbsent | Deferred | value | return value |
| getOrComputeIfAbsent | Deferred | exception | return null |
| computeIfAbsent | Absent | - | compute new value |
| computeIfAbsent | Constant | value | return value |
| computeIfAbsent | Constant | null | compute new value |
| computeIfAbsent | Memoized | value | return value |
| computeIfAbsent | Memoized | null | compute new value |
| computeIfAbsent | Memoized | exception | compute new value |
| computeIfAbsent | Deferred | value | return value |
| computeIfAbsent | Deferred | exception | compute new value |
You have addressed some of this implicitly already, but I think it would be good to make it explicit.
For example, one of the cases that was overlooked in the current situation happens when a call to `getOrComputeIfAbsent` puts a value in the store but `computeIfAbsent` wins the race to evaluate it. This would allow an exception to be thrown from `computeIfAbsent` that did not originate from its creator function. I think the solution would be to have both `computeIfAbsent` and `getOrComputeIfAbsent` use the `DeferredSupplier`.
Finally, I think the current solution is also very dependent on testing race conditions. This makes for rather verbose and potentially flaky tests. I think we can avoid most of that by letting the concurrent hash map do the heavy lifting and then figuring out what to do with its state afterwards.
Anyway, I'm going to have a look tomorrow and see if I can get all of those concepts into a pull request.
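The decision tables above can be encoded almost directly. The sketch below is a hypothetical model, not JUnit code: `StoredValueModel`, `Kind`, and the `Slot` record are invented names, and a `Slot` simply records its kind plus the outcome of evaluating it. `get` follows the get rows; `getOrComputeIfAbsent` computes only for an absent slot and otherwise behaves like `get`; `computeIfAbsent` recomputes unless the slot holds a non-null value.

```java
public class StoredValueModel {

    // The four container types from the review comment.
    enum Kind { ABSENT, CONSTANT, MEMOIZED, DEFERRED }

    // Hypothetical container: its kind plus the outcome of evaluating it (a value, null, or a failure).
    record Slot(Kind kind, Object value, RuntimeException failure) {
    }

    // The "get" rows of the table.
    static Object get(Slot slot) {
        switch (slot.kind()) {
            case ABSENT:
                return null;
            case CONSTANT:
                return slot.value();
            case MEMOIZED:
                if (slot.failure() != null) {
                    throw slot.failure(); // memoized failures are visible to every reader
                }
                return slot.value();
            case DEFERRED:
                // a failed deferred value is invisible to plain readers
                return slot.failure() != null ? null : slot.value();
            default:
                throw new AssertionError(slot.kind());
        }
    }

    // The "getOrComputeIfAbsent" rows: compute only when absent, otherwise behave like get().
    static boolean getOrComputeIfAbsentMustCompute(Slot slot) {
        return slot.kind() == Kind.ABSENT;
    }

    // The "computeIfAbsent" rows: compute unless the slot holds a non-null value.
    static boolean computeIfAbsentMustCompute(Slot slot) {
        return slot.kind() == Kind.ABSENT || slot.failure() != null || slot.value() == null;
    }

    public static void main(String[] args) {
        var boom = new IllegalStateException("boom");
        System.out.println(get(new Slot(Kind.DEFERRED, null, boom)));                      // null
        System.out.println(computeIfAbsentMustCompute(new Slot(Kind.CONSTANT, null, null))); // true
    }
}
```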
This PR fixes a concurrency bug in `NamespacedHierarchicalStore.computeIfAbsent` where `defaultCreator` was executed while holding the internal map's bucket lock, blocking other threads and risking deadlock when `defaultCreator` accesses other keys that collide in the same hash bucket.
Root Cause
The previous implementation called `defaultCreator.apply(key)` inside `ConcurrentHashMap.compute()`, which holds the bucket lock. Any thread trying to access a different key in the same bucket was blocked until `defaultCreator` completed.
Fix
The fix moves `defaultCreator` execution outside the lock:
- A wrapper supplier (`MemoizingSupplier` or `DeferredSupplier`) is installed via the map operation
- `get()` returns `null` (not the exception) during the transient window

Note: Two implementation approaches are provided as separate commits. See the comment below for details on each approach.
Changes
Implementation
- Changed `computeIfAbsent` to execute `defaultCreator` outside the bucket lock
- Ensured `get()` doesn't observe transient exceptions from failed `computeIfAbsent` calls

Tests
- `simulateRaceConditionInComputeIfAbsentWithCollidingKeys`: Verifies correct initialization under contention with colliding keys
- `computeIfAbsentWithCollidingKeysDoesNotBlockConcurrentAccess`: Verifies no blocking between colliding keys
- `computeIfAbsentDoesNotDeadlockWithCollidingKeys`: Verifies no deadlock with colliding keys
- `getOrComputeIfAbsentDoesNotDeadlockWithCollidingKeys`: Same for the deprecated method
- `getDoesNotSeeTransientExceptionFromComputeIfAbsent`: Verifies atomicity for transient failures
- `getConcurrentWithFailingComputeIfAbsentDoesNotSeeException`: Stress test for atomicity
- `computeIfAbsentOverridesParentNullValue`: Verifies parent/child store semantics
- `CollidingKey` helper to force hash collisions for testing

Fixes #5171
This should also fix the flakiness in AssertJ's `SoftAssertionsExtension_PER_CLASS_Concurrency_Test` (assertj/assertj#1996).

I hereby agree to the terms of the JUnit Contributor License Agreement.
Definition of Done
`@API` annotations