8359827: Test runtime/Thread/ThreadCountLimit.java should run exclusively #26401

sendaoYan · 2025-07-19T02:08:59Z

Hi all,

The test runtime/Thread/ThreadCountLimit.java has been observed to fail when run concurrently with other tests. The test starts its subprocess with the shell prefix command ulimit -u 4096, which is intended to limit the number of threads. However, this causes the test to fail when it runs alongside other tests. I created a demo to demonstrate that.

I start a java process that creates 5k threads, and then I cannot start a new java process with the prefix ulimit -u 4096 on the same machine.

ManyThreads.java.txt

So it's necessary to run this test separately for it to succeed.
Change has been verified locally, test-fix only, risk is low.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8359827: Test runtime/Thread/ThreadCountLimit.java should run exclusively (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26401/head:pull/26401
$ git checkout pull/26401

Update a local copy of the PR:
$ git checkout pull/26401
$ git pull https://git.openjdk.org/jdk.git pull/26401/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26401

View PR using the GUI difftool:
$ git pr show -t 26401

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26401.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-07-19T02:09:46Z

👋 Welcome back syan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-07-19T02:10:33Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-07-19T02:11:12Z

@sendaoYan The following label will be automatically applied to this pull request:

hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-07-19T02:15:36Z

Webrevs

dholmes-ora · 2025-07-21T02:23:39Z

@sendaoYan exclusiveAccess.dirs does not work the way you expect/require. It simply indicates that only one test at a time may run from the given directory. It does not mean no other tests from any other directories may run.

dholmes-ora · 2025-07-21T02:29:37Z

FWIW we see no issue running this test, but we ensure we already have a high ulimit setting available in our test machines by default.

sendaoYan · 2025-07-21T09:00:03Z

FWIW we see no issue running this test, but we ensure we already have a high ulimit setting available in our test machines by default.

Maybe this test has been excluded by TEST.groups.
The error reported in this testcase should not be related to the ulimit configuration of the test environment, but may be related to the number of CPU cores of the machine. On a machine with a large number of CPU cores, each testcase starts more GC threads and JIT threads, and jtreg concurrency is also relatively high, so the total number of threads across all testcases can easily exceed 4096. For example, in the example below, ulimit -u in my environment is unlimited. I first start a background java process, which starts 5000 threads and does not exit; then I use the shell prefix ulimit -u to start a java process (similar to what this testcase does), and java cannot start.

ManyThreads.java.txt

sendaoYan · 2025-07-21T09:20:13Z

@sendaoYan exclusiveAccess.dirs does not work the way you expect/require. It simply indicates that only one test at a time may run from the given directory. It does not mean no other tests from any other directories may run.

Thanks for the correction, @dholmes-ora.
I have moved this test to test/hotspot/jtreg/resourcehogs, similar to JDK-8227645.

dholmes-ora · 2025-07-21T11:59:00Z

I first start a background java process, which starts 5000 threads and does not exit; then I use the shell prefix ulimit -u to start a java process (similar to what this testcase does), and java cannot start.

Okay, but in that scenario what is it you are actually running out of?

You are changing the test to suit the way you need to run it, but I'm not aware of anyone else reporting issues running this test.

sendaoYan · 2025-07-21T13:30:47Z

Okay, but in that scenario what is it you are actually running out of?

I think it's running out of "user processes", which are limited by ulimit -u 4096.

I think Java cannot start because the user processes allowed by ulimit -u are exhausted.
I created a small example in C to illustrate this problem. The create-thread program keeps creating threads until it can no longer create any, or until it has created 5000 threads.

Running bash -c "ulimit -u 1000 && ./create-thread" shows that the create-thread program can create about 580 threads.

First start the create-thread program directly (running in the background), and then run bash -c "ulimit -u 4096 && ./create-thread" to test how many threads the second create-thread process can create. The second create-thread process cannot create any threads. This shows that the max user processes limit set by ulimit -u 4096 counts the threads created by the first create-thread process.
This C example shows that this testcase is not suitable for running concurrently with other test cases; otherwise we may encounter the failure described in the issue.

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

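/* Thread body: sleep forever so each created thread stays alive. */
void *thread_func(void *arg) {
    (void)arg;  /* unused; pthread_create requires a void *(*)(void *) */
    while (1) {
        sleep(1);
    }
    return NULL;
}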

int main() {
    pthread_t tid;
    int thread_count = 0;
    int ret;

    while (thread_count < 5000) {
        ret = pthread_create(&tid, NULL, thread_func, NULL);
        if (ret != 0) {
            if (ret == EAGAIN) {
                printf("can not create thread(EAGAIN) anymore: %d\n", thread_count);
                break;
            } else {
                printf("pthread_create error: %s\n", strerror(ret));
                break;
            }
        }
        thread_count++;
        if (thread_count % 1000 == 0) {
            printf("already create thread number %d\n", thread_count);
        }
    }
    printf("total created thread number: %d\n", thread_count);

    while (1) {
        sleep(1);
    }

    return 0;
}

You are changing the test to suit the way you need to run it, but I'm not aware of anyone else reporting issues running this test.

I think the failure described in the issue only appears on machines with a huge number of CPU cores.

dholmes-ora · 2025-07-21T21:20:19Z

This shows that the max user processes limit set by ulimit -u 4096 counts the threads created by the first create-thread process.

That's not the way ulimit should work in different sub-shells. What is the ulimit in the parent shell? I think the subshells are limited by the parent.

sendaoYan · 2025-07-22T01:59:30Z

That's not the way ulimit should work in different sub-shells.

I initially thought, too, that ulimit shouldn't work like that in different sub-shells. But ulimit actually does behave in this unexpected way across sub-shells.
The testcase runtime/Thread/ThreadCountLimit.java attempts to limit the number of user processes to 4096 by starting the child process with the prefix "bash -c ulimit -u 4096", but in practice ulimit does not work as expected. If this testcase runs simultaneously with other tests, the number of threads it can create may be zero, and is always less than 4096; it depends on how many user processes have already been created on the test machine.

What is the ulimit in the parent shell? I think the subshells are limited by the parent.

The ulimit in the parent shell is unlimited. The fact that the first "./create-thread" process can create 5k threads shows that the parent shell has no limit.

dholmes-ora · 2025-07-22T05:28:28Z

There is definitely something unexpected/odd about the behaviour of ulimit when used in this way, though I do not observe the exact problems you describe unless I run a number of test processes concurrently - which is simply demonstrating machine overloading.

First, what does it even mean to use ulimit -u? The manpage says it limits the maximum number of processes the user can create - it doesn't say "per shell" (and setrlimit confirms this). But you can easily demonstrate that the user can create far more processes/threads than have been set by a ulimit command running in another shell. So perhaps there is something else that affects how ulimit works, and that something is different between our systems and yours. ?? (I know there are capabilities that disable the limit but I couldn't see any indication such capabilities were present.)

Second, I observe that with ulimit -u 1024 I can't even run java -version - which makes no sense in terms of number of threads created. Relatedly, with a 4096 limit the test typically can only create around 2500 threads - so where did the other 1500+ go?

The use of ulimit was added to the test, for Linux only, because we found we could exhaust other resources that could then cause fatal errors in the VM in unexpected places - rather than the failure of pthread_create that we were trying to induce.

I'm really not sure how to proceed here. The change you propose affects all platforms, but there is only an issue for you on Linux.

sendaoYan · 2025-07-22T09:58:53Z

Hi @dholmes-ora

which is simply demonstrating machine overloading.

I think it's not machine overloading, because the setting of 'ulimit -u' on my machine is 'unlimited'. I can create 5000 threads many times over.

you can easily demonstrate that the user can create far more processes/threads than have been set by a ulimit command running in another shell

I think 'ulimit -u' in a sub-shell takes effect in that sub-shell only; it's a temporary setting and does not affect the parent shell.

Relatedly with a 4096 limit the test typically can only create around 2500 threads - so where did the other 1500+ go?

It seems that the sub-shell with the 'ulimit -u 4096' prefix counts all of the user's processes. That's just my speculation. That's why this test is not suitable for running simultaneously with other tests.

Anyway, I changed this PR to use docker run --pids-limit 4096 instead of the original 'ulimit -u 4096'. It makes this test more complicated, but more elegant and more robust.

dholmes-ora · 2025-07-22T12:07:31Z

You can't just change the test to use docker! This is not a container test. We use special test tasks to run container tests in an environment where containers are enabled.

dholmes-ora · 2025-07-22T12:13:44Z

I think 'ulimit -u' in a sub-shell takes effect in that sub-shell only; it's a temporary setting and does not affect the parent shell.

I'm finding some of these statements to be contradictory to the problem being stated. If the ulimit setting only affects the sub-shell then it can't cause other concurrent tests to hit the limit and fail to create threads!

It seems that the sub-shell with the 'ulimit -u 4096' prefix counts all of the user's processes. That's just my speculation. That's why this test is not suitable for running simultaneously with other tests.

If the sub-shell counts all processes/threads belonging to the user and applies the new ulimit then that would make some sense. But again how does that then cause any problem in a different shell?

sendaoYan · 2025-07-22T12:14:47Z

You can't just change the test to use docker! This is not a container test. We use special test tasks to run container tests in an environment where containers are enabled.

Okay, I have reverted the docker commit.

sendaoYan · 2025-07-22T12:39:42Z

If the ulimit setting only affects the sub-shell then it can't cause other concurrent tests to hit the limit and fail to create threads!

Maybe some of my previous statements have caused some misunderstanding.
The use of ulimit in this testcase does not cause other concurrent tests to hit the limit; rather, it leaves this test itself without enough user processes to start java.
On a machine with a huge number of cores, every test creates more JIT compiler threads and more GC worker threads. So when this test runs simultaneously with other tests, it cannot start its java subprocess with the "ulimit -u" prefix; the java subprocess reports Failed to start thread "GC Thread#0", because the subprocess is limited by "ulimit -u 4096" and the user process resources may already be occupied by other tests running at the same time. The other tests run normally, because they do not set 'ulimit -u' explicitly.

8359827: Test runtime/Thread/ThreadCountLimit.java should run exclusively

2b42b7c

openjdk bot added the rfr (Pull request is ready for review) label Jul 19, 2025

openjdk bot added the hotspot-runtime label Jul 19, 2025

add test/hotspot/jtreg/runtime/Thread/stress/TEST.properties

12a1b4c

update TEST.groups

3c3c697

mv test/hotspot/jtreg/runtime/Thread/stress/ThreadCountLimit.java test/hotspot/jtreg/resourcehogs/runtime/Thread/

5d6c69b

Use 'docker run --pids-limit 4096' to instead the original 'ulimit -u 4096'

f993419

openjdk bot removed the rfr (Pull request is ready for review) label Jul 22, 2025

Remove extra whitespaces

374c297

openjdk bot added the rfr (Pull request is ready for review) label Jul 22, 2025

SendaoYan added 2 commits July 22, 2025 20:13

Revert "Remove extra whitespaces"

809d3b1

This reverts commit 374c297.

Revert "Use 'docker run --pids-limit 4096' to instead the original 'ulimit -u 4096'"

77884c2

This reverts commit f993419.
