8359827: Test runtime/Thread/ThreadCountLimit.java should run exclusively #26401

Open
wants to merge 8 commits into
base: master

Conversation

sendaoYan
Member

@sendaoYan sendaoYan commented Jul 19, 2025

Hi all,

The test runtime/Thread/ThreadCountLimit.java has been observed to fail when run concurrently with other tests. The test starts a subprocess with the shell prefix command ulimit -u 4096, which is intended to limit the number of threads. But this causes the test to fail when it runs alongside other tests. I created a demo to demonstrate that.

I start a Java process that creates 5k threads, and then I cannot start a new Java process with the prefix ulimit -u 4096 on the same machine.

image

ManyThreads.java.txt

So this test needs to run separately from other tests in order to pass.
The change has been verified locally; test-fix only, risk is low.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8359827: Test runtime/Thread/ThreadCountLimit.java should run exclusively (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26401/head:pull/26401
$ git checkout pull/26401

Update a local copy of the PR:
$ git checkout pull/26401
$ git pull https://git.openjdk.org/jdk.git pull/26401/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26401

View PR using the GUI difftool:
$ git pr show -t 26401

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26401.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jul 19, 2025

👋 Welcome back syan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 19, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 19, 2025
@openjdk

openjdk bot commented Jul 19, 2025

@sendaoYan The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented Jul 19, 2025

Webrevs

@dholmes-ora
Member

@sendaoYan exclusiveAccess.dirs does not work the way you expect/require. It simply indicates that only one test at a time may run from the given directory. It does not mean no other tests from any other directories may run.

@dholmes-ora
Member

FWIW we see no issue running this test, but we ensure we already have a high ulimit setting available in our test machines by default.

@sendaoYan
Member Author

sendaoYan commented Jul 21, 2025

> FWIW we see no issue running this test, but we ensure we already have a high ulimit setting available in our test machines by default.

  1. Maybe this test has been excluded by TEST.groups.
  2. The error reported in this testcase should not be related to the ulimit configuration of the test environment, but may be related to the number of CPU cores of the machine. On a machine with many CPU cores, each testcase starts more GC threads and JIT threads, and jtreg's concurrency is also relatively high, so the total number of threads across all testcases can easily exceed 4096. For example, in the example below, ulimit -u in my environment is unlimited. I first start a background Java process, which starts 5000 threads and does not exit; then I use the shell prefix ulimit -u to start a Java process (similar to what this testcase does), and Java cannot start.
image

ManyThreads.java.txt

…t/hotspot/jtreg/resourcehogs/runtime/Thread/
@sendaoYan
Member Author

> @sendaoYan exclusiveAccess.dirs does not work the way you expect/require. It simply indicates that only one test at a time may run from the given directory. It does not mean no other tests from any other directories may run.

Thanks for the correction, @dholmes-ora.
I have moved this test to test/hotspot/jtreg/resourcehogs, similar to JDK-8227645.

@dholmes-ora
Member

> I first start a background Java process, which starts 5000 threads and does not exit; then I use the shell prefix ulimit -u to start a Java process (similar to what this testcase does), and Java cannot start.

Okay, but in that scenario what is it you are actually running out of?

You are changing the test to suit the way you need to run it, but I'm not aware of anyone else reporting issues running this test.

@sendaoYan
Member Author

sendaoYan commented Jul 21, 2025

> Okay, but in that scenario what is it you are actually running out of?

I think it's running out of "user processes", which are limited by ulimit -u 4096. Because the user processes allowed by ulimit -u are exhausted, Java cannot start.

I created a small example in C to illustrate the problem. The create-thread program keeps creating threads until it can no longer create one, or until it has created 5000 threads.

  1. Running bash -c "ulimit -u 1000 && ./create-thread" shows that the create-thread program can create about 580 threads.
image
  2. First start the create-thread program directly (in the background), then run bash -c "ulimit -u 4096 && ./create-thread" to see how many threads a second create-thread process can create. The second process cannot create any threads. This shows that the max user processes count limited by ulimit -u 4096 includes the threads created by the first create-thread process.
    This C example shows that this testcase is not suitable for running concurrently with other test cases; otherwise we may encounter the failure described in the issue.
image
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

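/* Thread body: sleep forever so the thread stays alive and keeps counting against the limit. */
void *thread_func(void *arg) {
    (void)arg; /* unused */
    while (1) {
        sleep(1);
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    int thread_count = 0;
    int ret;

    /* Create threads until pthread_create fails or 5000 threads exist. */
    while (thread_count < 5000) {
        ret = pthread_create(&tid, NULL, thread_func, NULL);
        if (ret != 0) {
            if (ret == EAGAIN) {
                /* EAGAIN: insufficient resources, or a limit on the number of threads was hit. */
                printf("can not create thread(EAGAIN) anymore: %d\n", thread_count);
                break;
            } else {
                printf("pthread_create error: %s\n", strerror(ret));
                break;
            }
        }
        thread_count++;
        if (thread_count % 1000 == 0) {
            printf("already create thread number %d\n", thread_count);
        }
    }
    printf("total created thread number: %d\n", thread_count);

    /* Keep the process, and all of its threads, alive. */
    while (1) {
        sleep(1);
    }

    return 0;
}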

> You are changing the test to suit the way you need to run it, but I'm not aware of anyone else reporting issues running this test.

I think the failure described in the issue only appears on machines with a very large number of CPU cores.

@dholmes-ora
Member

> This shows that the max user processes count limited by ulimit -u 4096 includes the threads created by the first create-thread process.

That's not the way ulimit should work in different sub-shells. What is the ulimit in the parent shell? I think the subshells are limited by the parent.

@sendaoYan
Member Author

sendaoYan commented Jul 22, 2025

> That's not the way ulimit should work in different sub-shells.

I initially also thought that ulimit shouldn't work like that in different sub-shells. But ulimit actually does behave in this unexpected way across sub-shells.
The testcase runtime/Thread/ThreadCountLimit.java attempts to limit the number of user processes to 4096 by adding the prefix "bash -c ulimit -u 4096" when starting the child process, but in practice ulimit does not work as expected. If this testcase runs simultaneously with other tests, the number of threads it can create may be zero, and is in any case always less than 4096; it depends on how many user processes have already been created on the test machine.

> What is the ulimit in the parent shell? I think the subshells are limited by the parent.

The ulimit in the parent shell is unlimited. The fact that the first "./create-thread" process can create 5k threads shows that the parent shell has no limit.

image

@dholmes-ora
Member

There is definitely something unexpected/odd about the behaviour of ulimit when used in this way, though I do not observe the exact problems you describe unless I run a number of test processes concurrently - which is simply demonstrating machine overloading.

First, what does it even mean to use ulimit -u? The manpage says it limits the maximum number of processes the user can create - it doesn't say "per shell" (and setrlimit confirms this). But you can easily demonstrate that the user can create far more processes/threads than have been set by a ulimit command running in another shell. So perhaps there is something else that affects how ulimit works, and that something is different between our systems and yours. ?? (I know there are capabilities that disable the limit but I couldn't see any indication such capabilities were present.)

Second, I observe that with ulimit -u 1024 I can't even run java -version - which makes no sense in terms of number of threads created. Relatedly with a 4096 limit the test typically can only create around 2500 threads - so where did the other 1500+ go?

The use of ulimit was added to the test, for Linux only, because we found we could exhaust other resources that could then cause fatal errors in the VM in unexpected places - rather than the failure of pthread_create that we were trying to induce.

I'm really not sure how to proceed here. The change you propose affects all platforms, but there is only an issue for you on Linux.

@sendaoYan
Member Author

sendaoYan commented Jul 22, 2025

Hi @dholmes-ora

> which is simply demonstrating machine overloading.

I think it's not machine overloading, because the 'ulimit -u' setting on my machine is 'unlimited'. I can create 5000 threads many times over, as shown below:

image

> you can easily demonstrate that the user can create far more processes/threads than have been set by a ulimit command running in another shell

I think 'ulimit -u' in a sub-shell takes effect in that sub-shell only; it's a temporary setting and will not affect the parent shell.

> Relatedly with a 4096 limit the test typically can only create around 2500 threads - so where did the other 1500+ go?

It seems that a sub-shell with the 'ulimit -u 4096' prefix counts all of the user's processes against the limit. That's just my speculation. That's why this test is not suitable for running simultaneously with other tests.

Anyway, I changed this PR to use docker run --pids-limit 4096 instead of the original 'ulimit -u 4096'. It makes this test more complicated, but more elegant and more robust.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jul 22, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 22, 2025
@dholmes-ora
Member

You can't just change the test to use docker! This is not a container test. We use special test tasks to run container tests in an environment where containers are enabled.

SendaoYan added 2 commits July 22, 2025 20:13
@dholmes-ora
Member

> I think 'ulimit -u' in a sub-shell takes effect in that sub-shell only; it's a temporary setting and will not affect the parent shell.

I'm finding some of these statements to be contradictory to the problem being stated. If the ulimit setting only affects the sub-shell then it can't cause other concurrent tests to hit the limit and fail to create threads!

> It seems that a sub-shell with the 'ulimit -u 4096' prefix counts all of the user's processes against the limit. That's just my speculation. That's why this test is not suitable for running simultaneously with other tests.

If the sub-shell counts all processes/threads belonging to the user and applies the new ulimit then that would make some sense. But again how does that then cause any problem in a different shell?

@sendaoYan
Member Author

> You can't just change the test to use docker! This is not a container test. We use special test tasks to run container tests in an environment where containers are enabled.

Okay, I have reverted the docker commit.

@sendaoYan
Member Author

sendaoYan commented Jul 22, 2025

> If the ulimit setting only affects the sub-shell then it can't cause other concurrent tests to hit the limit and fail to create threads!

Maybe some of my previous statements have caused some misunderstandings.
The usage of ulimit in this testcase will not cause other concurrent tests to hit the limit, but it will leave this test itself without enough user processes to start Java.
On a machine with a very large number of cores, every test creates more JIT compiler threads and more GC worker threads. So when this test runs simultaneously with other tests, it cannot start the java subprocess with the "ulimit -u" prefix; the java subprocess reports Failed to start thread "GC Thread#0", because the subprocess is limited by "ulimit -u 4096" and the user-process resources may already be occupied by other tests running at the same time. The other tests run normally, because they do not set 'ulimit -u' explicitly.

Labels
hotspot-runtime, rfr (Pull request is ready for review)

2 participants