@martinfrancois martinfrancois commented Dec 7, 2025

This PR fixes a concurrency bug in NamespacedHierarchicalStore.computeIfAbsent where defaultCreator was executed while holding the internal map's bucket lock, causing:

  1. Threads accessing different keys in the same hash bucket to block each other during parallel test execution
  2. Potential deadlocks when defaultCreator accesses other keys that collide in the same hash bucket

Root Cause

The previous implementation called defaultCreator.apply(key) inside ConcurrentHashMap.compute(), which holds the bucket lock. Any thread trying to access a different key in the same bucket was blocked until defaultCreator completed.
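
To make the root cause concrete, here is a minimal sketch (not taken from the PR) that reproduces the blocking with a plain ConcurrentHashMap. The `BucketLockDemo` class, the `CollidingKey` record, and the 300 ms wait are all assumptions made for illustration. It relies on how current OpenJDK builds lock a hash bin during `compute()`; the Javadoc itself only says that other update operations "may be blocked".

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class BucketLockDemo {

    /** Hypothetical key: every instance reports the same hash, so all keys share one bin. */
    record CollidingKey(String name) {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    record Result(boolean blockedWhileComputing, boolean completedAfterRelease) {
    }

    static Result run() throws InterruptedException {
        var map = new ConcurrentHashMap<CollidingKey, String>();
        var insideCompute = new CountDownLatch(1);
        var release = new CountDownLatch(1);
        var putDone = new CountDownLatch(1);

        // Thread A: a slow "creator" that runs inside compute(), i.e. while the bin is locked.
        Thread a = new Thread(() -> map.compute(new CollidingKey("a"), (key, old) -> {
            insideCompute.countDown();
            try {
                release.await();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "A";
        }));

        // Thread B: a plain put() for a different key that lands in the same bin.
        Thread b = new Thread(() -> {
            map.put(new CollidingKey("b"), "B");
            putDone.countDown();
        });

        a.start();
        insideCompute.await();
        b.start();
        // B is expected to stay stuck for as long as A's function has not returned.
        boolean blocked = !putDone.await(300, TimeUnit.MILLISECONDS);
        release.countDown();
        boolean completed = putDone.await(5, TimeUnit.SECONDS);
        a.join();
        b.join();
        return new Result(blocked, completed);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

If the creator in thread A were waiting for thread B instead of for a latch released by the main thread, neither thread could make progress, which is the deadlock variant described above.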

Fix

The fix moves defaultCreator execution outside the lock:

  • A placeholder (MemoizingSupplier or DeferredSupplier) is installed via the map operation
  • The actual computation runs after the map operation completes
  • Failed computations are cleaned up, and get() returns null (not the exception) during the transient window
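
The three steps above can be sketched roughly as follows. This is a simplified illustration, not the code from either commit: `DeferredInitStore` and `Memo` are hypothetical names, and the sketch deliberately leaves out the handling of the transient window for get().

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

final class DeferredInitStore<K, V> {

    /** Hypothetical placeholder: runs its delegate at most once and remembers the outcome. */
    private static final class Memo<V> implements Supplier<V> {
        private final Supplier<V> delegate;
        private boolean done;
        private V value;
        private RuntimeException failure;

        Memo(Supplier<V> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized V get() {
            if (!done) {
                try {
                    value = delegate.get();
                }
                catch (RuntimeException e) {
                    failure = e;
                }
                done = true;
            }
            if (failure != null) {
                throw failure;
            }
            return value;
        }
    }

    private final ConcurrentHashMap<K, Memo<V>> entries = new ConcurrentHashMap<>();

    V computeIfAbsent(K key, Function<K, V> creator) {
        // Step 1: only a cheap placeholder is created while the map holds its bin lock.
        Memo<V> memo = entries.computeIfAbsent(key, k -> new Memo<V>(() -> creator.apply(k)));
        try {
            // Step 2: user code runs here, after the map operation has returned.
            return memo.get();
        }
        catch (RuntimeException e) {
            // Step 3: drop the failed placeholder, but only if it is still the one we saw.
            entries.remove(key, memo);
            throw e;
        }
    }

    boolean contains(K key) {
        return entries.containsKey(key);
    }
}
```

The placeholder's own monitor is still held while the creator runs, but it only serializes callers of the same key; threads working on other keys in the same bin are no longer affected.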

Note: Two implementation approaches are provided as separate commits. See the comment below for details on each approach.

Changes

Implementation

  • Reworked computeIfAbsent to execute defaultCreator outside the bucket lock
  • Ensured get() doesn't observe transient exceptions from failed computeIfAbsent calls
  • Preserved "one initialization per key" semantics

Tests

  • simulateRaceConditionInComputeIfAbsentWithCollidingKeys: Verifies correct initialization under contention with colliding keys
  • computeIfAbsentWithCollidingKeysDoesNotBlockConcurrentAccess: Verifies no blocking between colliding keys
  • computeIfAbsentDoesNotDeadlockWithCollidingKeys: Verifies no deadlock with colliding keys
  • getOrComputeIfAbsentDoesNotDeadlockWithCollidingKeys: Same for deprecated method
  • getDoesNotSeeTransientExceptionFromComputeIfAbsent: Verifies atomicity for transient failures
  • getConcurrentWithFailingComputeIfAbsentDoesNotSeeException: Stress test for atomicity
  • computeIfAbsentOverridesParentNullValue: Verifies parent/child store semantics
  • Added CollidingKey helper to force hash collisions for testing

Fixes #5171

This should also fix the flakiness in AssertJ's SoftAssertionsExtension_PER_CLASS_Concurrency_Test (assertj/assertj#1996).


I hereby agree to the terms of the JUnit Contributor License Agreement.


Definition of Done

Contributor

@Pankraz76 Pankraz76 left a comment

Great fix. Precisely executed and documented without any flaw, thanks a lot.

+1

@testlens-app

testlens-app bot commented Dec 8, 2025

🔎 No tests executed 🔎

🏷️ Commit: 32faf28
▶️ Tests: 0 executed
⚪️ Checks: 0/0 completed


Learn more about TestLens at testlens.app.

@martinfrancois
Author

martinfrancois commented Dec 8, 2025

You're welcome @Pankraz76, thanks as well for the praise and the review, I really appreciate it! :)

Contributor

@Pankraz76 Pankraz76 left a comment

+1 feat. complete.

Well done, and thanks again for the dedication that led to this improvement.

Now it's just about polish: optional extra effort, striving for excellence.

But this is also dangerous territory; it might be better to extract it into a clean PR afterwards. The Scout principle is nice, but people tend to tilt on this and are likely to be overwhelmed.

Contributor

@Pankraz76 Pankraz76 left a comment

My 2 cents: go fully functional, separating the concerns (SoC/SRP).

@martinfrancois
Author

You’re welcome, and thanks again, @Pankraz76, for taking another careful look.
Your suggestions are great, but as you pointed out, they mostly target existing code that could be cleaned up independently. I’d like to keep this PR focused on the bugfix so we do not drag out the review with additional refactoring.
Once this is merged, I am happy to follow up with a separate cleanup PR. I do not want to delay AssertJ being able to update to JUnit 6 again any longer than necessary 🙂

Contributor

@mpkorstanje mpkorstanje left a comment

I've given this a quick read through and left some comments to resolve open questions, but this is not a full review yet.

It seems that you've found and solved a different problem than described in #5171. And you claim the solutions overlap.

Unfortunately, the original problem is quite tricky and the description for this pull request is incredibly verbose. We'll need some time to go through the details. You can help us process this by writing a much more concise PR description.

block each other and temporarily see a missing or incorrectly initialized state
for values created via `computeIfAbsent`. The method now evaluates
`defaultCreator` outside the critical section using a memoizing supplier,
aligning its behavior with the deprecated `getOrComputeIfAbsent`.
Contributor

Could you write this down more concisely? The release notes generally focus on a top-line understanding of what was fixed. You could express this as having solved the symptoms of #5209 rather than its root cause.

You could for clarity also add a second item that describes how computeIfAbsent no longer deadlocks.

}

@Test
void computeIfAbsentCanDeadlockWithCollidingKeys() throws Exception {
Contributor

The naming of this test suggests that computeIfAbsent can currently deadlock. But I assume that after your fix this is no longer the case?

Author

Agreed. After the fix, this scenario should no longer deadlock. I renamed it to computeIfAbsentDoesNotDeadlockWithCollidingKeys.

Contributor

@mpkorstanje mpkorstanje left a comment

I think I've already found one reason this will not work as expected. See comment below.

simulateRaceConditionInComputeIfAbsent did not catch this issue because it only exercises contention on a single key and relies on ConcurrentHashMap's per-key atomicity. It does not force different keys into the same bucket or run user code in a way that re-enters the store while the map lock is held, so the problematic interaction never occurs in that test.

With this in mind, I would have expected to see a test like simulateRaceConditionInComputeIfAbsent that forces keys into the same bucket.

return requireNonNull(newStoredValue.evaluate());
}
catch (Throwable t) {
storedValues.remove(compositeKey, newStoredValue);
Contributor

@mpkorstanje mpkorstanje Dec 11, 2025

There is a period of time between storedValues.compute() and storedValues.remove() during which a different thread can, via getStoredValue(), briefly access the newStoredValue and encounter its stored exception. As such, the store's operations are not atomic.

And I think this invalidates any approach that tries to execute the defaultCreator outside the compute method.

Author

@martinfrancois martinfrancois Dec 14, 2025

And I think this invalidates any approach that tries to execute the defaultCreator outside the compute method.

Both implementations now address this concern:

Approach 1 (MemoizingSupplier): Added transientFailures flag. When true, get() returns null instead of throwing during the transient window.

Approach 2 (DeferredSupplier): Uses type-based dispatch. DeferredSupplier.get() catches ExecutionException and returns null, while getOrThrow() rethrows for the original computeIfAbsent caller.

Both ensure:

  • defaultCreator runs outside the bucket lock (avoiding blocking/deadlock)
  • get() returns null during the transient window (not the exception)
  • The original computeIfAbsent caller sees the exception and cleans up
  • After cleanup, get() continues to return null (correct final state)
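
For illustration, the type-based dispatch described for Approach 2 might look roughly like the sketch below. It is not the class from commit 32faf28: `DeferredSupplierSketch` is a hypothetical name, and letting both accessors trigger the computation is an assumption made so that the example cannot block forever.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Supplier;

final class DeferredSupplierSketch<V> {

    // FutureTask runs its callable at most once, even if run() is called repeatedly.
    private final FutureTask<V> task;

    DeferredSupplierSketch(Supplier<V> creator) {
        this.task = new FutureTask<V>(creator::get);
    }

    /** View for plain readers: a failed computation looks like "no value". */
    V get() {
        task.run();
        try {
            return task.get();
        }
        catch (ExecutionException e) {
            return null;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** View for the computeIfAbsent caller: the creator's failure is rethrown. */
    V getOrThrow() {
        task.run();
        try {
            return task.get();
        }
        catch (ExecutionException e) {
            if (e.getCause() instanceof RuntimeException runtimeException) {
                throw runtimeException;
            }
            throw new IllegalStateException(e.getCause());
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the distinction between the two kinds of callers lives in the method they call rather than in a boolean flag on the stored value.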

Tests added:

  • getDoesNotSeeTransientExceptionFromComputeIfAbsent
  • getConcurrentWithFailingComputeIfAbsentDoesNotSeeException

var result = StoredValue.evaluateIfNotNull(storedValue);
if (result == null) {
StoredValue newStoredValue = this.storedValues.compute(compositeKey, (__, oldStoredValue) -> {
if (StoredValue.evaluateIfNotNull(oldStoredValue) == null) {
Contributor

@mpkorstanje mpkorstanje Dec 11, 2025

In your analysis you said:

In NamespacedHierarchicalStore#computeIfAbsent, the implementation previously relied on ConcurrentMap.computeIfAbsent, which provides the "one logical initialization per key" behavior. After the change to storedValues.compute(…), every call to NamespacedHierarchicalStore.computeIfAbsent for the same key can rerun the initialization logic and replace the existing StoredValue.

That means that even though each compute call is atomic, two threads calling NamespacedHierarchicalStore.computeIfAbsent for the same key can:

  1. Have Thread A initialize the stored value and start tracking statistics.
  2. Then have Thread B rerun the initialization and replace that value, effectively resetting the statistics.

But looking at the existing implementation, the defaultCreator is not applied until after the oldStoredValue has been checked. So when defaultCreator is applied for a given key, a value was either not set at all or was set to null. On the face of it, the defaultCreator should therefore be applied at most once, and point 2 shouldn't happen.

Author

You're right that the previous implementation checked oldStoredValue before applying defaultCreator. The issue wasn't about re-running initialization for the same key, but about where defaultCreator executes.

The previous implementation ran defaultCreator.apply(key) inside the compute() lambda while holding the bucket lock. Even though the check prevents double-initialization, any thread trying to access a different key in the same bucket is blocked until defaultCreator completes.

The fix moves defaultCreator execution outside the lock, so the map operation is fast and doesn't block other bucket operations.

return defaultCreator.apply(key);
})));
}
return storedValue.evaluate();
Contributor

@mpkorstanje mpkorstanje Dec 12, 2025

This should have the same problem as #5209 (comment): the result of a failing defaultCreator can be seen through get. This behaviour is guarded against for computeIfAbsent and verified with tests.

Note to self: Write some more tests to cover this.

Author

Addressed together with #5209 (comment). Both implementations ensure that:

  • The caller of computeIfAbsent sees the exception
  • Other callers via get() see null

This preserves the expected semantics: after computeIfAbsent fails and removes the entry, get() returns null.

Pankraz76 pushed a commit to Pankraz76/junit5 that referenced this pull request Dec 12, 2025
Pankraz76 pushed a commit to Pankraz76/junit5 that referenced this pull request Dec 13, 2025
@martinfrancois
Author

I think I've already found one reason this will not work as expected. See comment below.

simulateRaceConditionInComputeIfAbsent did not catch this issue because it only exercises contention on a single key and relies on ConcurrentHashMap's per-key atomicity. It does not force different keys into the same bucket or run user code in a way that re-enters the store while the map lock is held, so the problematic interaction never occurs in that test.

With this in mind, I would have expected to see a test like simulateRaceConditionInComputeIfAbsent that forces keys into the same bucket.

Added simulateRaceConditionInComputeIfAbsentWithCollidingKeys which:

  • Uses 20 threads racing on two colliding keys
  • Verifies each key's defaultCreator is called exactly once
  • Verifies all threads see and share the same computed value

This test fails with the old implementation (creator called 10x instead of 1x) and passes with the fix.

I also added computeIfAbsentWithCollidingKeysDoesNotBlockConcurrentAccess which specifically tests that computeIfAbsent for one key doesn't block concurrent access to a different key in the same hash bucket.

@martinfrancois
Author

martinfrancois commented Dec 14, 2025

Thanks for the initial review @mpkorstanje!

I apologize for being a bit verbose in both the release notes and the PR description. I have now shortened both, and I hope I didn't shorten them too much. Let me know if I should make any changes there.

I also addressed your comments by adding tests that reproduce both issues you mentioned and by fixing them. I first implemented one approach but didn't like some aspects of it, so I tried another. Both approaches ended up working, and since I'm not sure which one you would prefer, I pushed both as separate commits so that I can later drop the one you don't like.

  1. Commit 1 (267e938): Uses MemoizingSupplier with a transientFailures flag

    • Adds a boolean flag to distinguish between callers that should see exceptions vs. those that shouldn't
    • Uses storedValues.compute() with the supplier installed inside the lambda
  2. Commit 2 (32faf28): Uses DeferredSupplier with FutureTask

    • Uses DeferredSupplier for computeIfAbsent, MemoizingSupplier for getOrComputeIfAbsent
    • Uses explicit CAS-retry loop (for (;;) with putIfAbsent/replace) instead of compute()
    • Leverages FutureTask which already provides "run once" semantics
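
As a rough illustration of the CAS-retry pattern mentioned for Commit 2, the sketch below installs a candidate value with `putIfAbsent`/`replace` inside a `for (;;)` loop. It is not the PR's code: `CasRetrySketch`, `Holder`, and `install` are hypothetical names, and treating a holder with a null value as "absent" is an assumption based on the computeIfAbsent semantics discussed in this thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class CasRetrySketch {

    /** Hypothetical holder; it keeps identity-based equals, so replace() matches only the instance that was read. */
    static final class Holder {
        final Object value; // null counts as "absent" in this sketch

        Holder(Object value) {
            this.value = value;
        }
    }

    /** Returns the holder that ends up in the map: an existing non-null one, or the candidate. */
    static Holder install(ConcurrentMap<String, Holder> map, String key, Holder candidate) {
        for (;;) {
            Holder current = map.get(key);
            if (current == null) {
                // No mapping yet: try to claim the slot.
                if (map.putIfAbsent(key, candidate) == null) {
                    return candidate;
                }
            }
            else if (current.value != null) {
                // Somebody else already stored a usable value.
                return current;
            }
            else if (map.replace(key, current, candidate)) {
                // Swapped out a holder whose value was null.
                return candidate;
            }
            // Lost a race with another thread: re-read and try again.
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Holder> map = new ConcurrentHashMap<>();
        Holder first = new Holder("v1");
        System.out.println(install(map, "k", first) == first);
    }
}
```

Neither `putIfAbsent` nor `replace` runs user code, so the map's internal lock is only held for the duration of the pointer swap.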

Both approaches:

  • Fix the bucket-lock contention issue (threads no longer block each other on colliding keys)
  • Fix the deadlock scenario (first computation waiting for second no longer deadlocks)
  • Ensure get() doesn't see transient exceptions from failed computeIfAbsent
  • Pass all existing and new tests

Trade-offs:

| Aspect | Approach 1 (MemoizingSupplier) | Approach 2 (DeferredSupplier) |
| --- | --- | --- |
| Complexity | Less code, adds flag to existing class | More code, new class |
| Dispatch | Boolean flag (more implicit) | Type-based (cleaner, but more verbose) |
| Pattern | Uses compute() | Explicit CAS-retry loop (matches ConcurrentHashMap internals) |
| Dependencies | None | Uses FutureTask |

Please let me know which approach you prefer, and I'll drop the other commit.

`computeIfAbsent` previously invoked `defaultCreator` while holding the
store's internal map lock. Under parallel execution this could cause
threads using the store to block each other and temporarily see missing
or incorrectly initialized state for values created via
`computeIfAbsent`.

The implementation now wraps `defaultCreator` in a `MemoizingSupplier`
and installs that supplier via the map operation, evaluating it only
after the update has completed. This avoids running user code while
holding the lock and aligns `computeIfAbsent`'s behavior with the
deprecated `getOrComputeIfAbsent`, preserving the intended "one
initialization per key" semantics.

Issue: junit-team#5171
Signed-off-by: martinfrancois <[email protected]>
Contributor

@mpkorstanje mpkorstanje left a comment

Overall I think you're going in the right direction.

But I don't think the CAS loop is a good solution. It means that we're not taking advantage of the existing data structure.

Rather, I think it would be important to recognize and make explicit that there are 4 types of stored values: Absent (if nothing was stored yet), Constant (from put), Memoized (from getOrComputeIfAbsent) and Deferred (from computeIfAbsent). These are containers that may have the following contents:

| Type | Value | Null | Exception |
| --- | --- | --- | --- |
| Absent | No | No | No |
| Constant | Maybe | Maybe | No |
| Memoized | Maybe | Maybe | Maybe |
| Deferred | Maybe | No | Maybe |

These can be directly stored in the map and evaluated immediately afterwards. Then, depending on the retrieval function (get, getOrComputeIfAbsent, computeIfAbsent), different things happen. This means that if the evaluation of a value failed, the value need not be removed from the map.

| Method | Type | Containing | Action |
| --- | --- | --- | --- |
| get | Absent | - | return null |
| get | Constant | value | return value |
| get | Constant | null | return null |
| get | Memoized | value | return value |
| get | Memoized | null | return null |
| get | Memoized | exception | throw exception |
| get | Deferred | value | return value |
| get | Deferred | exception | return null |
| getOrComputeIfAbsent | Absent | - | compute new value |
| getOrComputeIfAbsent | Constant | value | return value |
| getOrComputeIfAbsent | Constant | null | return null |
| getOrComputeIfAbsent | Memoized | value | return value |
| getOrComputeIfAbsent | Memoized | null | return null |
| getOrComputeIfAbsent | Memoized | exception | throw exception |
| getOrComputeIfAbsent | Deferred | value | return value |
| getOrComputeIfAbsent | Deferred | exception | return null |
| computeIfAbsent | Absent | - | compute new value |
| computeIfAbsent | Constant | value | return value |
| computeIfAbsent | Constant | null | compute new value |
| computeIfAbsent | Memoized | value | return value |
| computeIfAbsent | Memoized | null | compute new value |
| computeIfAbsent | Memoized | exception | compute new value |
| computeIfAbsent | Deferred | value | return value |
| computeIfAbsent | Deferred | exception | compute new value |
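
One way to make these rules explicit in code is sketched below. It is only an illustration of the rows above, not the eventual implementation: `StoredValueTable` and its nested types are hypothetical, the containers are modelled as already evaluated, and a null map entry stands in for Absent.

```java
final class StoredValueTable {

    /** Hypothetical, already-evaluated container; both accessors may return null. */
    interface StoredValue {
        Object value();

        RuntimeException failure();
    }

    record Constant(Object value) implements StoredValue {
        @Override
        public RuntimeException failure() {
            return null; // a Constant never holds an exception
        }
    }

    record Memoized(Object value, RuntimeException failure) implements StoredValue {
    }

    record Deferred(Object value, RuntimeException failure) implements StoredValue {
    }

    /** Rows for get: only a Memoized failure is rethrown; a Deferred failure reads as null. */
    static Object get(StoredValue stored) {
        if (stored == null) {
            return null; // Absent
        }
        if (stored instanceof Deferred deferred) {
            return deferred.failure() != null ? null : deferred.value();
        }
        if (stored.failure() != null) {
            throw stored.failure();
        }
        return stored.value();
    }

    /** Rows for getOrComputeIfAbsent: compute only when Absent, otherwise behave like get. */
    static boolean getOrComputeIfAbsentMustCompute(StoredValue stored) {
        return stored == null;
    }

    /** Rows for computeIfAbsent: compute when Absent, or when the content is null or an exception. */
    static boolean computeIfAbsentMustCompute(StoredValue stored) {
        return stored == null || stored.failure() != null || stored.value() == null;
    }
}
```

Because the decision depends only on the stored container, each row can be checked in a plain single-threaded test without staging a race.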

You have addressed some of this implicitly already, but I think it would be good to make this explicit.

For example, one of the cases that was overlooked in the current situation happens when a call to getOrComputeIfAbsent puts a value in the store but computeIfAbsent wins the race to evaluate it. This would allow an exception to be thrown from computeIfAbsent that did not originate from its creator function. I think the solution would be to have both computeIfAbsent and getOrComputeIfAbsent use the DeferredSupplier.

Finally, I think the current solution is also very dependent on testing race conditions. This makes for rather verbose and potentially flaky tests. I think we can avoid most of that by letting the concurrent hash map do the heavy lifting and then figuring out what to do with its state afterwards.

Anyway, I'm going to have a look tomorrow and see if I can get all of those concepts into a pull request.

Development

Successfully merging this pull request may close these issues.

Concurrency problem in NamespacedHierarchicalStore#computeIfAbsent

3 participants