
Potential memory leak and race with the JVMTI wallclock sampler #234


Merged: 29 commits, Jul 9, 2025

@zhengyu123 (Contributor) commented Jun 30, 2025

What does this PR do?:
Release the jthread local references returned by JVMTI to prevent a memory leak.

The profiler uses jvmtiError GetAllThreads(jvmtiEnv* env, jint* threads_count_ptr, jthread** threads_ptr) to obtain a list of running threads. The JVMTI documentation states:

On return, the jthread* points to a newly allocated array of size *threads_count_ptr. The array should be freed with [Deallocate](https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#Deallocate). The objects returned by threads_ptr are JNI local references and must be [managed](https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#refs).

In other words, the returned array holds JNI local references to the Thread objects, and the caller is responsible for managing their lifetime. Here, that means deleting each of those local references when we are done with them (and freeing the array itself with Deallocate) to avoid the leak.
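A minimal sketch of that cleanup is below. To keep it compilable without a JDK, the JNI/JVMTI calls are replaced by hypothetical stand-ins: get_all_threads plays the role of GetAllThreads, delete_local_ref the role of JNIEnv::DeleteLocalRef, and deallocate the role of jvmtiEnv::Deallocate; FakeRef stands in for jthread, and the two counters exist only so the result can be checked. The point is the order: release every local reference the array holds, then free the array itself.

```cpp
// Hypothetical stand-in for jthread: a heap object plays the local reference.
using FakeRef = int*;

// Counters that mimic the VM's bookkeeping so the sketch can be checked.
static int live_local_refs = 0;
static int live_arrays = 0;

// Hypothetical stand-in for GetAllThreads: returns a newly allocated
// array filled with newly created "local references".
bool get_all_threads(int* count, FakeRef** threads) {
  *count = 3;
  *threads = new FakeRef[*count];
  ++live_arrays;
  for (int i = 0; i < *count; ++i) {
    (*threads)[i] = new int(i);
    ++live_local_refs;
  }
  return true;
}

// Hypothetical stand-in for JNIEnv::DeleteLocalRef.
void delete_local_ref(FakeRef ref) {
  delete ref;
  --live_local_refs;
}

// Hypothetical stand-in for jvmtiEnv::Deallocate.
void deallocate(FakeRef* array) {
  delete[] array;
  --live_arrays;
}

// One sampling pass: use the snapshot, then release everything it returned.
void sample_all_threads() {
  int count = 0;
  FakeRef* threads = nullptr;
  if (!get_all_threads(&count, &threads)) return;
  for (int i = 0; i < count; ++i) {
    // ... inspect threads[i] here ...
    delete_local_ref(threads[i]);  // 1. release each local reference
  }
  deallocate(threads);  // 2. then free the array that held them
}
```

Why this matters for a sampler: if it runs in a long-lived native thread attached to the JVM, its local references are not released until that thread detaches, so the references from every pass accumulate. In real JNI code, bracketing a pass with PushLocalFrame/PopLocalFrame is an alternative way to drop all local references created during the pass; the array still needs Deallocate either way.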

Also, JVMTI GetAllThreads() only takes a snapshot of the live threads. The returned JNI local references guarantee that the Thread objects are not reclaimed by the GC, but they do not prevent the underlying native threads from exiting. We therefore have to be extremely careful when examining a captured thread's native structures, as they may no longer be valid.
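One general way to cope with this race is sketched below; it is not necessarily what this PR does. The idea is that the snapshot only identifies threads, while a separate registry, updated when threads start and end (for example from JVMTI ThreadStart/ThreadEnd callbacks), decides whether a thread's native data may still be read. ThreadRegistry, NativeThreadInfo, and with_live_thread are hypothetical names, and a Java thread is identified here by a plain long id instead of a jthread.

```cpp
#include <mutex>
#include <unordered_map>

// Hypothetical per-thread native data the sampler wants to read.
struct NativeThreadInfo {
  int os_tid;
};

// Hypothetical registry of threads whose native data is still valid.
// A snapshot of ids can go stale; the registry is the authority.
class ThreadRegistry {
 public:
  // Called when a thread starts (e.g. from a ThreadStart callback).
  void on_thread_start(long java_id, int os_tid) {
    std::lock_guard<std::mutex> guard(lock_);
    live_[java_id] = NativeThreadInfo{os_tid};
  }

  // Called when a thread ends (e.g. from a ThreadEnd callback).
  void on_thread_end(long java_id) {
    std::lock_guard<std::mutex> guard(lock_);
    live_.erase(java_id);
  }

  // Runs fn on the thread's native info only if it is still registered.
  // Holding lock_ keeps on_thread_end from erasing the entry while fn runs.
  // Returns false if the thread has already gone.
  template <typename Fn>
  bool with_live_thread(long java_id, Fn&& fn) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = live_.find(java_id);
    if (it == live_.end()) return false;
    fn(it->second);
    return true;
  }

 private:
  std::mutex lock_;
  std::unordered_map<long, NativeThreadInfo> live_;
};
```

The design choice is to hold the registry lock for the whole read, so on_thread_end cannot remove the entry mid-read. If the end notification runs on the exiting thread before its native state is torn down, this also delays that teardown; the cost is that a slow read briefly blocks thread exit, so the work inside fn should stay short.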

Motivation:
Make the JVMTI wallclock sampler usable.

Additional Notes:

How to test the change?:
Run:

java -javaagent:/Users/zhengyu.gu/ws/dd-java-agent.jar -Ddd.profiling.enabled=true -Ddd.profiling.upload.period=10 -Ddd.profiling.start-force-first=true -Ddd.profiling.ddprof.debug.lib=/Users/zhengyu.gu/go/src/github.com/DataDog/java-profiler/ddprof-lib/build/lib/main/debug/macos/arm64/libjavaProfiler.dylib -Ddd.env=workspace-jb -Ddd.service=akka-uct -XX:NativeMemoryTracking=summary -Ddd.profiling.smap.aggregation.enabled=false -Ddd.profiling.experimental.ddprof.wall.jvmti=true -Ddd.profiling.ddprof.wall.context.filter=false -jar renaissance-gpl-0.16.0.jar akka-uct -r 500000

It crashes without this fix; there is no crash with the fix.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-10859

Unsure? Have a question? Request a review!


github-actions bot commented Jun 30, 2025

🔧 Report generated by pr-comment-cppcheck

CppCheck Report

Errors (2)

Warnings (4)

Style Violations (301)


github-actions bot commented Jun 30, 2025

🔧 Report generated by pr-comment-scanbuild

@zhengyu123 zhengyu123 marked this pull request as ready for review June 30, 2025 19:51
@zhengyu123 zhengyu123 marked this pull request as draft June 30, 2025 20:07
Collaborator

@jbachorik jbachorik left a comment


Looks good

@zhengyu123 zhengyu123 changed the title Potential memory leak with the JVMTI wallclock sampler Potential memory leak and race with the JVMTI wallclock sampler Jul 7, 2025
@zhengyu123 zhengyu123 marked this pull request as ready for review July 7, 2025 18:52
@zhengyu123 zhengyu123 requested a review from r1viollet July 7, 2025 18:53
```cpp
jint threads_count = 0;
jthread* threads_ptr = nullptr;
if (jvmti->GetAllThreads(&threads_count, &threads_ptr) != JVMTI_ERROR_NONE ||
    threads_count == 0) {
  // ... early return (the if-body is not shown in this review excerpt)
}
```
Collaborator


Not my domain, but I think returning early when threads_count == 0 is a functional change.
That said, I think it is OK to stop the profiler if there are no threads.

Collaborator


Yes, we should always have at least one thread reported by JVMTI.
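The early return discussed above is also a cleanup point: once GetAllThreads has succeeded, every exit path should release what it returned. The sketch below installs a scope guard immediately after a successful call, so the threads_count == 0 branch and any later return run the same cleanup. It uses hypothetical stand-ins (fake_get_all_threads, int* in place of jthread, and two counters) so it compiles without a JDK. The stand-in always allocates an array, even for zero threads, so the early-return path is exercised; the sketch does not claim a real JVM behaves that way.

```cpp
#include <functional>
#include <utility>

// Minimal scope guard: runs the stored cleanup when it goes out of scope,
// so every return path, including early returns, performs the cleanup.
class ScopeGuard {
 public:
  explicit ScopeGuard(std::function<void()> cleanup) : cleanup_(std::move(cleanup)) {}
  ~ScopeGuard() {
    if (cleanup_) cleanup_();
  }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

 private:
  std::function<void()> cleanup_;
};

// Counters that mimic outstanding VM allocations (hypothetical).
static int outstanding_refs = 0;
static int outstanding_arrays = 0;

// Hypothetical stand-in for GetAllThreads; always allocates an array,
// even when it reports zero threads.
bool fake_get_all_threads(int requested, int* count, int*** threads) {
  *count = requested;
  *threads = new int*[requested];  // a zero-length new[] must still be freed
  ++outstanding_arrays;
  for (int i = 0; i < requested; ++i) {
    (*threads)[i] = new int(i);
    ++outstanding_refs;
  }
  return true;
}

// Returns the number of threads sampled; returns early when there are none.
int sample(int requested) {
  int count = 0;
  int** threads = nullptr;
  if (!fake_get_all_threads(requested, &count, &threads)) return 0;
  // Installed right after the successful call, before any early return.
  ScopeGuard release([&] {
    for (int i = 0; i < count; ++i) {
      delete threads[i];  // stand-in for DeleteLocalRef
      --outstanding_refs;
    }
    delete[] threads;  // stand-in for Deallocate
    --outstanding_arrays;
  });
  if (count == 0) return 0;  // the guard still runs on this path
  int sampled = 0;
  for (int i = 0; i < count; ++i) {
    if (threads[i] != nullptr) ++sampled;  // ... inspect each thread ...
  }
  return sampled;
}
```

Splitting the error check from the zero-count check is what lets the guard sit between them; with a single combined condition, as in the snippet under review, the zero-count branch returns before any cleanup is in place.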

Collaborator

@r1viollet r1viollet left a comment


LGTM

@r1viollet
Collaborator

I took the liberty of merging main into your branch; I'm chasing down flaky tests and wanted to check whether some of them were fixed.

Collaborator

@jbachorik jbachorik left a comment


I found a few things I think are not right. Please re-check them and fix where needed. Thanks!

Collaborator

@jbachorik jbachorik left a comment


LGTM!

@zhengyu123 zhengyu123 force-pushed the zgu/release_local_ref branch from 2e7e43f to a1db674 July 9, 2025 17:30
@zhengyu123 zhengyu123 merged commit c76f65f into main Jul 9, 2025
92 of 94 checks passed
@zhengyu123 zhengyu123 deleted the zgu/release_local_ref branch July 9, 2025 20:18
@github-actions github-actions bot added this to the 1.29.0 milestone Jul 9, 2025

3 participants