Add the micro-benchmark for thread filtering #237

Merged 3 commits on Jul 4, 2025

jbachorik
Collaborator

What does this PR do?:
Adds a deliberately narrow micro-benchmark for the ThreadFilter.addThread/removeThread combination.

Motivation:
Reduce the noise seen in the more 'macro-benchmarky' benchmarks. This makes it possible to focus solely on the cost of adding a thread to the filter and removing it again, at a given parallelism and synthetic workload, and to see how that cost scales.
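For readers who want a feel for the shape of such a measurement, here is a minimal hand-rolled sketch in plain Java. It is not the JMH benchmark added by this PR: `StubFilter` is a hypothetical stand-in for the real ThreadFilter, `spin` is an assumed synthetic workload, and the timing loop has none of JMH's warm-up or dead-code protections, so its numbers are only indicative.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/** Hand-rolled sketch of an add/work/remove stress loop; not the PR's JMH benchmark. */
final class ThreadFilterStressSketch {

    /** Hypothetical stand-in for the profiler's ThreadFilter. */
    static final class StubFilter {
        private final Set<Long> ids = ConcurrentHashMap.newKeySet();
        void addThread(long tid) { ids.add(tid); }
        void removeThread(long tid) { ids.remove(tid); }
        int size() { return ids.size(); }
    }

    /** Assumed synthetic workload: a dependent arithmetic chain of the given length. */
    static long spin(long iterations) {
        long x = 42;
        for (long i = 0; i < iterations; i++) {
            x = x * 6364136223846793005L + 1442695040888963407L;
        }
        return x;
    }

    /** Runs `threads` workers doing `opsPerThread` cycles each; returns mean ns per cycle. */
    static double run(StubFilter filter, int threads, int opsPerThread, long workload) {
        CountDownLatch start = new CountDownLatch(1);
        long[] elapsed = new long[threads];
        long[] sink = new long[threads]; // keeps the workload result observable
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int idx = t;
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long tid = Thread.currentThread().getId();
                long acc = 0;
                long begin = System.nanoTime();
                for (int i = 0; i < opsPerThread; i++) {
                    filter.addThread(tid);
                    acc += spin(workload);
                    filter.removeThread(tid);
                }
                elapsed[idx] = System.nanoTime() - begin;
                sink[idx] = acc;
            });
            workers[t].start();
        }
        start.countDown();
        long total = 0;
        try {
            for (int t = 0; t < threads; t++) {
                workers[t].join(); // join makes each worker's writes visible here
                total += elapsed[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return (double) total / ((double) threads * opsPerThread);
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8}) {
            double ns = run(new StubFilter(), threads, 100_000, 7);
            System.out.printf("threads=%d workload=7 avg=%.1f ns/op%n", threads, ns);
        }
    }
}
```

Starting all workers from a shared latch keeps the contended phase aligned across threads, which is the condition the benchmark is meant to isolate.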

Additional Notes:
When this benchmark runs on a MacBook M1, the difference between JNI access and Unsafe access is almost non-existent.

There is a steep performance cliff when going from a single thread to multiple threads at very low workloads (10-100 ns). The suspected causes, and what I tried so far:

  1. A single contended _size variable that all benchmark threads mutate atomically. I tried sharding that variable and computing the actual size only when needed, but that improves the situation only marginally and makes the unsafe implementation quite difficult to maintain.
  2. I tried a more 'random' thread id mapping using a golden-ratio Fibonacci hash, but that also provides almost no improvement and breaks the unsafe implementation, which assumes the original mapping.
  3. There is the only remaining thing
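
The two mitigations described in the list can be sketched as follows. This is an illustrative Java sketch only, not the code that was tried in the PR: the class names, the padding stride, and the power-of-two shard count are all assumptions made for the example. As noted above, neither approach gave more than a marginal improvement in practice.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Illustrative sketch of the two mitigations; names and layout are assumptions. */
final class FilterContentionSketch {

    /** A size counter split across slots so that threads rarely touch the same one. */
    static final class ShardedSize {
        // 16 longs = 128 bytes between slots, assumed to be at least one cache line.
        private static final int STRIDE = 16;
        private final AtomicLongArray cells;
        private final int mask;

        ShardedSize(int shards) {
            if (shards <= 0 || Integer.bitCount(shards) != 1) {
                throw new IllegalArgumentException("shards must be a power of two");
            }
            this.cells = new AtomicLongArray(shards * STRIDE);
            this.mask = shards - 1;
        }

        /** Applies the delta to the slot selected by the low bits of the thread id. */
        void add(long tid, long delta) {
            cells.addAndGet(((int) tid & mask) * STRIDE, delta);
        }

        /** Collects the total only when needed; not an atomic snapshot under concurrent updates. */
        long sum() {
            long total = 0;
            for (int i = 0; i <= mask; i++) {
                total += cells.get(i * STRIDE);
            }
            return total;
        }
    }

    /**
     * Fibonacci hashing: multiply by 2^64 divided by the golden ratio and keep the
     * top `bits` bits, which scatters consecutive ids across the slot range.
     * `bits` must be between 1 and 31.
     */
    static int fibonacciSlot(long tid, int bits) {
        return (int) ((tid * 0x9E3779B97F4A7C15L) >>> (64 - bits));
    }

    public static void main(String[] args) {
        ShardedSize size = new ShardedSize(8);
        for (long tid = 0; tid < 100; tid++) {
            size.add(tid, 1);
        }
        System.out.println("size = " + size.sum());
        for (long tid = 1; tid <= 4; tid++) {
            System.out.println("tid " + tid + " -> slot " + fibonacciSlot(tid, 10));
        }
    }
}
```

The JDK's `java.util.concurrent.atomic.LongAdder` applies the same striping idea with dynamically sized cells, at the same cost of a non-atomic `sum()`.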

JNI access

| Benchmark | Workload | Mode | Score | Units |
|---|---|---|---|---|
| ThreadFilterBenchmark.threadFilterStress01 | 0 | avgt | 0.021 | us/op |
| ThreadFilterBenchmark.threadFilterStress01 | 7 | avgt | 0.023 | us/op |
| ThreadFilterBenchmark.threadFilterStress01 | 70000 | avgt | 147.206 | us/op |
| ThreadFilterBenchmark.threadFilterStress02 | 0 | avgt | 0.143 | us/op |
| ThreadFilterBenchmark.threadFilterStress02 | 7 | avgt | 0.151 | us/op |
| ThreadFilterBenchmark.threadFilterStress02 | 70000 | avgt | 149.653 | us/op |
| ThreadFilterBenchmark.threadFilterStress04 | 0 | avgt | 0.402 | us/op |
| ThreadFilterBenchmark.threadFilterStress04 | 7 | avgt | 0.449 | us/op |
| ThreadFilterBenchmark.threadFilterStress04 | 70000 | avgt | 166.627 | us/op |
| ThreadFilterBenchmark.threadFilterStress08 | 0 | avgt | 1.315 | us/op |
| ThreadFilterBenchmark.threadFilterStress08 | 7 | avgt | 1.302 | us/op |
| ThreadFilterBenchmark.threadFilterStress08 | 70000 | avgt | 167.421 | us/op |
| ThreadFilterBenchmark.threadFilterStress16 | 0 | avgt | 2.783 | us/op |
| ThreadFilterBenchmark.threadFilterStress16 | 7 | avgt | 2.772 | us/op |
| ThreadFilterBenchmark.threadFilterStress16 | 70000 | avgt | 304.041 | us/op |
| ThreadFilterBenchmark.threadFilterStress99 | 0 | avgt | 15.222 | us/op |
| ThreadFilterBenchmark.threadFilterStress99 | 7 | avgt | 15.599 | us/op |
| ThreadFilterBenchmark.threadFilterStress99 | 70000 | avgt | 1797.784 | us/op |

Unsafe access

| Benchmark | Workload | Mode | Score | Units |
|---|---|---|---|---|
| ThreadFilterBenchmark.threadFilterStress01 | 0 | avgt | 0.029 | us/op |
| ThreadFilterBenchmark.threadFilterStress01 | 7 | avgt | 0.032 | us/op |
| ThreadFilterBenchmark.threadFilterStress01 | 70000 | avgt | 145.954 | us/op |
| ThreadFilterBenchmark.threadFilterStress02 | 0 | avgt | 0.165 | us/op |
| ThreadFilterBenchmark.threadFilterStress02 | 7 | avgt | 0.171 | us/op |
| ThreadFilterBenchmark.threadFilterStress02 | 70000 | avgt | 150.606 | us/op |
| ThreadFilterBenchmark.threadFilterStress04 | 0 | avgt | 0.497 | us/op |
| ThreadFilterBenchmark.threadFilterStress04 | 7 | avgt | 0.574 | us/op |
| ThreadFilterBenchmark.threadFilterStress04 | 70000 | avgt | 163.668 | us/op |
| ThreadFilterBenchmark.threadFilterStress08 | 0 | avgt | 1.713 | us/op |
| ThreadFilterBenchmark.threadFilterStress08 | 7 | avgt | 1.690 | us/op |
| ThreadFilterBenchmark.threadFilterStress08 | 70000 | avgt | 165.619 | us/op |
| ThreadFilterBenchmark.threadFilterStress16 | 0 | avgt | 3.817 | us/op |
| ThreadFilterBenchmark.threadFilterStress16 | 7 | avgt | 3.953 | us/op |
| ThreadFilterBenchmark.threadFilterStress16 | 70000 | avgt | 303.807 | us/op |
| ThreadFilterBenchmark.threadFilterStress99 | 0 | avgt | 15.862 | us/op |
| ThreadFilterBenchmark.threadFilterStress99 | 7 | avgt | 16.451 | us/op |
| ThreadFilterBenchmark.threadFilterStress99 | 70000 | avgt | 1788.177 | us/op |
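
To put a number on the cliff, the workload-0 rows of both tables can be normalized against the single-thread score. The small sketch below does that arithmetic. The scores are copied from the tables; the thread counts are inferred from the benchmark name suffixes (01 to 99), which is an assumption. For JNI access, the step from one to two threads alone is 0.143 / 0.021, or about 6.8x per operation.

```java
/** Slowdown factors for the workload-0 rows, relative to the single-thread score. */
final class ScalingFactorsSketch {

    // Thread counts inferred from the benchmark name suffixes; this is an assumption.
    static final int[] THREADS = {1, 2, 4, 8, 16, 99};
    // Scores in us/op, copied from the workload-0 rows of the two tables.
    static final double[] JNI_US = {0.021, 0.143, 0.402, 1.315, 2.783, 15.222};
    static final double[] UNSAFE_US = {0.029, 0.165, 0.497, 1.713, 3.817, 15.862};

    /** Divides every score by the first (single-thread) score. */
    static double[] slowdown(double[] scores) {
        double[] factors = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            factors[i] = scores[i] / scores[0];
        }
        return factors;
    }

    public static void main(String[] args) {
        double[] jni = slowdown(JNI_US);
        double[] unsafe = slowdown(UNSAFE_US);
        for (int i = 0; i < THREADS.length; i++) {
            System.out.printf("threads=%2d  JNI x%.1f  Unsafe x%.1f%n",
                    THREADS[i], jni[i], unsafe[i]);
        }
    }
}
```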


@Param({"0", "7", "70000"})
public String workload;

private long workloadNum = 0;


🟠 Code Quality Violation

Suggested change: replace `private long workloadNum = 0;` with `private long workloadNum;`.

Remove the initializer; 0 is already the default value for a long field.

Avoid initializing fields to their default values. An explicit initializer adds bytecode instructions, and allocating many such objects may affect application performance. If a field is initialized to its default value, remove the initialization.
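
A short illustration of the point made by this rule, using a hypothetical class that is not part of the PR: a `long` field without an initializer already reads as 0, so the explicit `= 0` changes nothing observable.

```java
/** Hypothetical class showing that an explicit "= 0" matches the implicit default. */
final class DefaultFieldValueSketch {
    private long implicitDefault;     // the language guarantees 0L for an uninitialized long field
    private long explicitDefault = 0; // same value, written again by the initializer

    /** Returns true when both fields hold the same default value of zero. */
    static boolean sameValue() {
        DefaultFieldValueSketch s = new DefaultFieldValueSketch();
        return s.implicitDefault == 0L && s.implicitDefault == s.explicitDefault;
    }

    public static void main(String[] args) {
        System.out.println(sameValue()); // prints "true"
    }
}
```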



github-actions bot commented Jul 4, 2025

🔧 Report generated by pr-comment-cppcheck

CppCheck Report

Errors (2)

Warnings (4)

Style Violations (297)


github-actions bot commented Jul 4, 2025

🔧 Report generated by pr-comment-scanbuild

Collaborator

r1viollet left a comment


LGTM

jbachorik merged commit a358978 into main on Jul 4, 2025 (95 checks passed)
jbachorik deleted the jb/thread_filter_bench branch on July 4, 2025 12:30
github-actions bot added this to the 1.29.0 milestone on Jul 4, 2025
zhengyu123 pushed a commit that referenced this pull request Jul 9, 2025
* Add the micro-benchmark for thread filtering

* Do not test for obviously invalid thread id

* Relax threadfilter mem order
zhengyu123 added a commit that referenced this pull request Jul 9, 2025
* Potential memory leak with the JVMTI wallclock sampler

* v1

* Don't sample terminated thread

* v2

* v3

* v4

* Safe access

* Fix thread state

* v5

* Cleanup

* Cleanup

* safeFetch impl

* jdk11 support

* v6

* enhance and cleanup

* fix nullptr deference

* More cleanup

* Erwan's finding

* Fixed memory leak found by Erwan

* [Automated] Bump dev version to 1.29.0

* Update the sonatype repos (#235)

* Fix artifact download URL

* Split debug (#233)

* Split debug
Add build steps to store split debug information for release builds

* Add the micro-benchmark for thread filtering (#237)

* Add the micro-benchmark for thread filtering

* Do not test for obviously invalid thread id

* Relax threadfilter mem order

* Flaky test - j9 OSR (#239)

Skip zing and j9 flaky tests

* Fix flaky allocation test (#241)

Lower threshold for allocation test

* jbachorik's comments

* More jbachorik's comments

* Cleanup thread local references

---------

Co-authored-by: zhengyu.gu
Co-authored-by: Datadog Java Profiler
Co-authored-by: Jaroslav Bachorik
Co-authored-by: r1viollet