
Conversation

Contributor

@charlesmyu charlesmyu commented Oct 16, 2025

What Does This Do

  • Instruments the fromSparkPlan function to:
    • Parse the plan parameter into a map of String properties
    • Update the meta field of returned SparkPlanInfo with those properties
  • Creates a Spark21XPlanUtils class with an extractPlanProduct method that parses a SparkPlan object and returns its properties as a Map<String, String>
  • Creates an AbstractSparkPlanUtils class with a parsePlanProduct method that parses the various Objects extracted by extractPlanProduct into a comprehensible string representation
    • This is extended by Spark21XPlanUtils
  • Updates the toJson function in SparkSQLUtils to write a JSON object if possible, otherwise just write a string
  • Updates tests to reflect these changes & additions
  • Gates this feature with two feature flags (see the enabling example below):
    • dd.data.jobs.experimental_features.enabled: gates all experimental features before we GA; we should leave this on by default for all internal users
    • dd.data.jobs.parse_spark_plan.enabled: gates this feature specifically
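For illustration, both flags could be enabled for a job roughly as follows (flag names as listed above; the -D system-property form follows the tracer's usual configuration conventions):

-Ddd.data.jobs.experimental_features.enabled=true
-Ddd.data.jobs.parse_spark_plan.enabled=true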

Motivation

The SparkPlan houses additional details about its execution that are useful for operators to visualize. Extract these into spans so they can be ingested.

Additional Notes

This PR leverages the existing meta field in the SparkPlanInfo class. This should be safe, as we don't overwrite the field if any data exists, and it is currently only used for ScanExec node details. Furthermore, since this class appears to be primarily intended as an abstraction for informational purposes, any faulty updates to the object shouldn't result in any breaking issues.

Also note that we use the Product API to obtain the key names (using productElementName); however, this was only made available in Scala 2.13. As a result the Scala 2.12 instrumentation uses arbitrary _dd.unknown_key.X names for the keys, so the values can at least be extracted.
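As a rough illustration only (not the PR's actual code), extracting the product fields from a plan might look like the Java sketch below. SparkPlan implements scala.Product; the class name PlanProductSketch and the fallback key naming are assumptions for the example:

import java.util.HashMap;
import java.util.Map;
import scala.Product;

class PlanProductSketch {
  // Minimal sketch: treat the plan purely as a scala.Product and walk its constructor fields.
  static Map<String, String> extractPlanProduct(Product plan) {
    Map<String, String> result = new HashMap<>();
    for (int i = 0; i < plan.productArity(); i++) {
      Object value = plan.productElement(i);
      // On Scala 2.13+ the field name is available via plan.productElementName(i);
      // on Scala 2.12 that method does not exist, hence the placeholder keys below.
      String key = "_dd.unknown_key." + i;
      // In the real change the value goes through a guarded parse step rather than a blind toString().
      result.put(key, String.valueOf(value));
    }
    return result;
  }
}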

Worth mentioning that this PR does not introduce traversal of the physical plan itself into the tracer - that is left to Spark. Because the recursive fromSparkPlan method is instrumented, the tracer is invoked to parse each node as it is built, and we expressly filter out any potential QueryPlan nodes when performing the parsing.

Contributor Checklist

Jira ticket: DJM-974

@charlesmyu charlesmyu added the type: enhancement and inst: apache spark labels Oct 16, 2025

datadog-official bot commented Oct 16, 2025

🎯 Code Coverage
Patch Coverage: 0.00%
Total Coverage: 59.89%

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 34528dc


pr-commenter bot commented Oct 16, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761665050 1761664950
git_commit_sha dc6264e 34528dc
release_version 1.55.0-SNAPSHOT~dc6264eaa3 1.55.0-SNAPSHOT~34528dca71
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1761666808 1761666808
ci_job_id 1202338126 1202338126
ci_pipeline_id 80508255 80508255
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-ol3rim10 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-ol3rim10 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 2 performance regressions! Performance is the same for 50 metrics, 13 unstable metrics.

scenario | Δ mean execution_time | candidate mean execution_time | baseline mean execution_time
scenario:startup:petclinic:appsec:Debugger | worse: [+223.435µs; +371.647µs] or [+3.799%; +6.318%] | 6.180ms | 5.882ms
scenario:startup:petclinic:tracing:Remote Config | worse: [+14.702µs; +51.210µs] or [+2.165%; +7.542%] | 711.974µs | 679.018µs
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.035 s) : 0, 1035414
Total [baseline] (8.651 s) : 0, 8651094
Agent [candidate] (1.018 s) : 0, 1017556
Total [candidate] (8.692 s) : 0, 8691892
section iast
Agent [baseline] (1.162 s) : 0, 1162425
Total [baseline] (9.358 s) : 0, 9358245
Agent [candidate] (1.155 s) : 0, 1155392
Total [candidate] (9.329 s) : 0, 9329250
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.035 s -
Agent iast 1.162 s 127.011 ms (12.3%)
Total tracing 8.651 s -
Total iast 9.358 s 707.151 ms (8.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.018 s -
Agent iast 1.155 s 137.836 ms (13.5%)
Total tracing 8.692 s -
Total iast 9.329 s 637.358 ms (7.3%)
gantt
    title insecure-bank - break down per module: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.482 ms) : 0, 1482
crashtracking [candidate] (1.461 ms) : 0, 1461
BytebuddyAgent [baseline] (705.996 ms) : 0, 705996
BytebuddyAgent [candidate] (692.638 ms) : 0, 692638
GlobalTracer [baseline] (245.939 ms) : 0, 245939
GlobalTracer [candidate] (241.598 ms) : 0, 241598
AppSec [baseline] (32.512 ms) : 0, 32512
AppSec [candidate] (32.616 ms) : 0, 32616
Debugger [baseline] (6.449 ms) : 0, 6449
Debugger [candidate] (6.409 ms) : 0, 6409
Remote Config [baseline] (706.205 µs) : 0, 706
Remote Config [candidate] (710.978 µs) : 0, 711
Telemetry [baseline] (15.241 ms) : 0, 15241
Telemetry [candidate] (9.324 ms) : 0, 9324
Flare Poller [baseline] (5.791 ms) : 0, 5791
Flare Poller [candidate] (11.599 ms) : 0, 11599
section iast
crashtracking [baseline] (1.467 ms) : 0, 1467
crashtracking [candidate] (1.487 ms) : 0, 1487
BytebuddyAgent [baseline] (825.112 ms) : 0, 825112
BytebuddyAgent [candidate] (818.369 ms) : 0, 818369
GlobalTracer [baseline] (234.12 ms) : 0, 234120
GlobalTracer [candidate] (232.466 ms) : 0, 232466
AppSec [baseline] (28.867 ms) : 0, 28867
AppSec [candidate] (35.098 ms) : 0, 35098
Debugger [baseline] (6.05 ms) : 0, 6050
Debugger [candidate] (6.152 ms) : 0, 6152
Remote Config [baseline] (609.003 µs) : 0, 609
Remote Config [candidate] (618.619 µs) : 0, 619
Telemetry [baseline] (8.351 ms) : 0, 8351
Telemetry [candidate] (8.738 ms) : 0, 8738
Flare Poller [baseline] (4.167 ms) : 0, 4167
Flare Poller [candidate] (4.288 ms) : 0, 4288
IAST [baseline] (32.368 ms) : 0, 32368
IAST [candidate] (26.576 ms) : 0, 26576
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.035 s) : 0, 1034881
Total [baseline] (10.8 s) : 0, 10799541
Agent [candidate] (1.018 s) : 0, 1018116
Total [candidate] (10.745 s) : 0, 10745034
section appsec
Agent [baseline] (1.202 s) : 0, 1201614
Total [baseline] (10.909 s) : 0, 10909230
Agent [candidate] (1.201 s) : 0, 1201017
Total [candidate] (11.036 s) : 0, 11035667
section iast
Agent [baseline] (1.165 s) : 0, 1164556
Total [baseline] (11.114 s) : 0, 11113604
Agent [candidate] (1.158 s) : 0, 1157828
Total [candidate] (11.068 s) : 0, 11067745
section profiling
Agent [baseline] (1.181 s) : 0, 1180971
Total [baseline] (10.939 s) : 0, 10938766
Agent [candidate] (1.16 s) : 0, 1159841
Total [candidate] (11.035 s) : 0, 11035029
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.035 s -
Agent appsec 1.202 s 166.733 ms (16.1%)
Agent iast 1.165 s 129.676 ms (12.5%)
Agent profiling 1.181 s 146.09 ms (14.1%)
Total tracing 10.8 s -
Total appsec 10.909 s 109.689 ms (1.0%)
Total iast 11.114 s 314.063 ms (2.9%)
Total profiling 10.939 s 139.225 ms (1.3%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.018 s -
Agent appsec 1.201 s 182.901 ms (18.0%)
Agent iast 1.158 s 139.712 ms (13.7%)
Agent profiling 1.16 s 141.725 ms (13.9%)
Total tracing 10.745 s -
Total appsec 11.036 s 290.634 ms (2.7%)
Total iast 11.068 s 322.711 ms (3.0%)
Total profiling 11.035 s 289.995 ms (2.7%)
gantt
    title petclinic - break down per module: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.467 ms) : 0, 1467
crashtracking [candidate] (1.478 ms) : 0, 1478
BytebuddyAgent [baseline] (704.498 ms) : 0, 704498
BytebuddyAgent [candidate] (694.254 ms) : 0, 694254
GlobalTracer [baseline] (246.576 ms) : 0, 246576
GlobalTracer [candidate] (241.98 ms) : 0, 241980
AppSec [baseline] (32.634 ms) : 0, 32634
AppSec [candidate] (32.356 ms) : 0, 32356
Debugger [baseline] (6.487 ms) : 0, 6487
Debugger [candidate] (6.447 ms) : 0, 6447
Remote Config [baseline] (679.018 µs) : 0, 679
Remote Config [candidate] (711.974 µs) : 0, 712
Telemetry [baseline] (13.214 ms) : 0, 13214
Telemetry [candidate] (9.377 ms) : 0, 9377
Flare Poller [baseline] (8.005 ms) : 0, 8005
Flare Poller [candidate] (10.306 ms) : 0, 10306
section appsec
crashtracking [baseline] (1.485 ms) : 0, 1485
crashtracking [candidate] (1.487 ms) : 0, 1487
BytebuddyAgent [baseline] (725.281 ms) : 0, 725281
BytebuddyAgent [candidate] (722.555 ms) : 0, 722555
GlobalTracer [baseline] (235.838 ms) : 0, 235838
GlobalTracer [candidate] (235.852 ms) : 0, 235852
AppSec [baseline] (174.171 ms) : 0, 174171
AppSec [candidate] (175.764 ms) : 0, 175764
Debugger [baseline] (5.882 ms) : 0, 5882
Debugger [candidate] (6.18 ms) : 0, 6180
Remote Config [baseline] (622.151 µs) : 0, 622
Remote Config [candidate] (629.667 µs) : 0, 630
Telemetry [baseline] (8.367 ms) : 0, 8367
Telemetry [candidate] (8.418 ms) : 0, 8418
Flare Poller [baseline] (3.892 ms) : 0, 3892
Flare Poller [candidate] (3.882 ms) : 0, 3882
IAST [baseline] (24.943 ms) : 0, 24943
IAST [candidate] (25.018 ms) : 0, 25018
section iast
crashtracking [baseline] (1.47 ms) : 0, 1470
crashtracking [candidate] (1.466 ms) : 0, 1466
BytebuddyAgent [baseline] (826.435 ms) : 0, 826435
BytebuddyAgent [candidate] (820.163 ms) : 0, 820163
GlobalTracer [baseline] (234.35 ms) : 0, 234350
GlobalTracer [candidate] (232.738 ms) : 0, 232738
AppSec [baseline] (29.238 ms) : 0, 29238
AppSec [candidate] (35.181 ms) : 0, 35181
Debugger [baseline] (6.081 ms) : 0, 6081
Debugger [candidate] (6.161 ms) : 0, 6161
Remote Config [baseline] (605.309 µs) : 0, 605
Remote Config [candidate] (610.783 µs) : 0, 611
Telemetry [baseline] (8.444 ms) : 0, 8444
Telemetry [candidate] (8.721 ms) : 0, 8721
Flare Poller [baseline] (4.135 ms) : 0, 4135
Flare Poller [candidate] (4.294 ms) : 0, 4294
IAST [baseline] (32.52 ms) : 0, 32520
IAST [candidate] (26.747 ms) : 0, 26747
section profiling
crashtracking [baseline] (1.471 ms) : 0, 1471
crashtracking [candidate] (1.427 ms) : 0, 1427
BytebuddyAgent [baseline] (731.323 ms) : 0, 731323
BytebuddyAgent [candidate] (720.541 ms) : 0, 720541
GlobalTracer [baseline] (221.8 ms) : 0, 221800
GlobalTracer [candidate] (217.239 ms) : 0, 217239
AppSec [baseline] (32.424 ms) : 0, 32424
AppSec [candidate] (32.247 ms) : 0, 32247
Debugger [baseline] (11.387 ms) : 0, 11387
Debugger [candidate] (6.507 ms) : 0, 6507
Remote Config [baseline] (721.676 µs) : 0, 722
Remote Config [candidate] (812.761 µs) : 0, 813
Telemetry [baseline] (11.399 ms) : 0, 11399
Telemetry [candidate] (16.134 ms) : 0, 16134
Flare Poller [baseline] (4.142 ms) : 0, 4142
Flare Poller [candidate] (4.066 ms) : 0, 4066
ProfilingAgent [baseline] (110.538 ms) : 0, 110538
ProfilingAgent [candidate] (107.923 ms) : 0, 107923
Profiling [baseline] (111.185 ms) : 0, 111185
Profiling [candidate] (108.809 ms) : 0, 108809

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761665050 1761664950
git_commit_sha dc6264e 34528dc
release_version 1.55.0-SNAPSHOT~dc6264eaa3 1.55.0-SNAPSHOT~34528dca71
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1761666482 1761666482
ci_job_id 1202338127 1202338127
ci_pipeline_id 80508255 80508255
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-2-qjwiv03s 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-2-qjwiv03s 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 1 performance improvements and 2 performance regressions! Performance is the same for 9 metrics, 12 unstable metrics.

scenario | Δ mean http_req_duration | Δ mean throughput | candidate mean http_req_duration | candidate mean throughput | baseline mean http_req_duration | baseline mean throughput
scenario:load:petclinic:profiling:high_load | better: [-2.613ms; -1.616ms] or [-5.288%; -3.269%] | unstable: [-2.448op/s; +10.923op/s] or [-2.585%; +11.533%] | 47.309ms | 98.950op/s | 49.423ms | 94.713op/s
scenario:load:petclinic:no_agent:high_load | worse: [+0.830ms; +1.464ms] or [+2.257%; +3.983%] | unstable: [-12.061op/s; +4.386op/s] or [-9.486%; +3.450%] | 37.916ms | 123.300op/s | 36.769ms | 127.138op/s
scenario:load:petclinic:iast:high_load | worse: [+2.045ms; +2.904ms] or [+4.617%; +6.557%] | unstable: [-12.529op/s; +1.479op/s] or [-11.866%; +1.401%] | 46.768ms | 100.062op/s | 44.293ms | 105.588op/s
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3
    dateFormat X
    axisFormat %s
section baseline
no_agent (36.769 ms) : 36483, 37054
.   : milestone, 36769,
appsec (48.797 ms) : 48377, 49216
.   : milestone, 48797,
code_origins (43.683 ms) : 43298, 44067
.   : milestone, 43683,
iast (44.293 ms) : 43906, 44680
.   : milestone, 44293,
profiling (49.423 ms) : 48942, 49905
.   : milestone, 49423,
tracing (45.157 ms) : 44773, 45542
.   : milestone, 45157,
section candidate
no_agent (37.916 ms) : 37612, 38220
.   : milestone, 37916,
appsec (49.035 ms) : 48605, 49465
.   : milestone, 49035,
code_origins (44.17 ms) : 43791, 44550
.   : milestone, 44170,
iast (46.768 ms) : 46357, 47179
.   : milestone, 46768,
profiling (47.309 ms) : 46864, 47753
.   : milestone, 47309,
tracing (44.269 ms) : 43887, 44651
.   : milestone, 44269,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 36.769 ms [36.483 ms, 37.054 ms] -
appsec 48.797 ms [48.377 ms, 49.216 ms] 12.028 ms (32.7%)
code_origins 43.683 ms [43.298 ms, 44.067 ms] 6.914 ms (18.8%)
iast 44.293 ms [43.906 ms, 44.68 ms] 7.524 ms (20.5%)
profiling 49.423 ms [48.942 ms, 49.905 ms] 12.654 ms (34.4%)
tracing 45.157 ms [44.773 ms, 45.542 ms] 8.388 ms (22.8%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 37.916 ms [37.612 ms, 38.22 ms] -
appsec 49.035 ms [48.605 ms, 49.465 ms] 11.119 ms (29.3%)
code_origins 44.17 ms [43.791 ms, 44.55 ms] 6.255 ms (16.5%)
iast 46.768 ms [46.357 ms, 47.179 ms] 8.852 ms (23.3%)
profiling 47.309 ms [46.864 ms, 47.753 ms] 9.393 ms (24.8%)
tracing 44.269 ms [43.887 ms, 44.651 ms] 6.353 ms (16.8%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.395 ms) : 4346, 4445
.   : milestone, 4395,
iast (9.618 ms) : 9445, 9791
.   : milestone, 9618,
iast_FULL (14.045 ms) : 13763, 14328
.   : milestone, 14045,
iast_GLOBAL (10.878 ms) : 10685, 11072
.   : milestone, 10878,
profiling (8.964 ms) : 8823, 9104
.   : milestone, 8964,
tracing (7.9 ms) : 7776, 8024
.   : milestone, 7900,
section candidate
no_agent (4.306 ms) : 4256, 4355
.   : milestone, 4306,
iast (9.729 ms) : 9568, 9890
.   : milestone, 9729,
iast_FULL (14.052 ms) : 13775, 14330
.   : milestone, 14052,
iast_GLOBAL (10.992 ms) : 10794, 11190
.   : milestone, 10992,
profiling (9.031 ms) : 8877, 9185
.   : milestone, 9031,
tracing (7.704 ms) : 7595, 7814
.   : milestone, 7704,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.395 ms [4.346 ms, 4.445 ms] -
iast 9.618 ms [9.445 ms, 9.791 ms] 5.223 ms (118.8%)
iast_FULL 14.045 ms [13.763 ms, 14.328 ms] 9.65 ms (219.6%)
iast_GLOBAL 10.878 ms [10.685 ms, 11.072 ms] 6.483 ms (147.5%)
profiling 8.964 ms [8.823 ms, 9.104 ms] 4.569 ms (103.9%)
tracing 7.9 ms [7.776 ms, 8.024 ms] 3.505 ms (79.7%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.306 ms [4.256 ms, 4.355 ms] -
iast 9.729 ms [9.568 ms, 9.89 ms] 5.423 ms (126.0%)
iast_FULL 14.052 ms [13.775 ms, 14.33 ms] 9.747 ms (226.4%)
iast_GLOBAL 10.992 ms [10.794 ms, 11.19 ms] 6.686 ms (155.3%)
profiling 9.031 ms [8.877 ms, 9.185 ms] 4.726 ms (109.8%)
tracing 7.704 ms [7.595 ms, 7.814 ms] 3.399 ms (78.9%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master charles.yu/djm-974/extract-spark-plan-product
git_commit_date 1761665050 1761664950
git_commit_sha dc6264e 34528dc
release_version 1.55.0-SNAPSHOT~dc6264eaa3 1.55.0-SNAPSHOT~34528dca71
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1761667079 1761667079
ci_job_id 1202338128 1202338128
ci_pipeline_id 80508255 80508255
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-hcx8nltd 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-hcx8nltd 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.036 s) : 15036000, 15036000
.   : milestone, 15036000,
appsec (14.874 s) : 14874000, 14874000
.   : milestone, 14874000,
iast (18.504 s) : 18504000, 18504000
.   : milestone, 18504000,
iast_GLOBAL (18.268 s) : 18268000, 18268000
.   : milestone, 18268000,
profiling (15.27 s) : 15270000, 15270000
.   : milestone, 15270000,
tracing (15.151 s) : 15151000, 15151000
.   : milestone, 15151000,
section candidate
no_agent (14.912 s) : 14912000, 14912000
.   : milestone, 14912000,
appsec (15.05 s) : 15050000, 15050000
.   : milestone, 15050000,
iast (18.623 s) : 18623000, 18623000
.   : milestone, 18623000,
iast_GLOBAL (17.987 s) : 17987000, 17987000
.   : milestone, 17987000,
profiling (15.652 s) : 15652000, 15652000
.   : milestone, 15652000,
tracing (15.23 s) : 15230000, 15230000
.   : milestone, 15230000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.036 s [15.036 s, 15.036 s] -
appsec 14.874 s [14.874 s, 14.874 s] -162.0 ms (-1.1%)
iast 18.504 s [18.504 s, 18.504 s] 3.468 s (23.1%)
iast_GLOBAL 18.268 s [18.268 s, 18.268 s] 3.232 s (21.5%)
profiling 15.27 s [15.27 s, 15.27 s] 234.0 ms (1.6%)
tracing 15.151 s [15.151 s, 15.151 s] 115.0 ms (0.8%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 14.912 s [14.912 s, 14.912 s] -
appsec 15.05 s [15.05 s, 15.05 s] 138.0 ms (0.9%)
iast 18.623 s [18.623 s, 18.623 s] 3.711 s (24.9%)
iast_GLOBAL 17.987 s [17.987 s, 17.987 s] 3.075 s (20.6%)
profiling 15.652 s [15.652 s, 15.652 s] 740.0 ms (5.0%)
tracing 15.23 s [15.23 s, 15.23 s] 318.0 ms (2.1%)
Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.479 ms) : 1467, 1490
.   : milestone, 1479,
appsec (3.71 ms) : 3494, 3927
.   : milestone, 3710,
iast (2.224 ms) : 2160, 2288
.   : milestone, 2224,
iast_GLOBAL (2.254 ms) : 2190, 2318
.   : milestone, 2254,
profiling (2.067 ms) : 2015, 2119
.   : milestone, 2067,
tracing (2.048 ms) : 1998, 2098
.   : milestone, 2048,
section candidate
no_agent (1.483 ms) : 1471, 1494
.   : milestone, 1483,
appsec (3.724 ms) : 3507, 3940
.   : milestone, 3724,
iast (2.209 ms) : 2146, 2273
.   : milestone, 2209,
iast_GLOBAL (2.264 ms) : 2199, 2328
.   : milestone, 2264,
profiling (2.049 ms) : 1998, 2100
.   : milestone, 2049,
tracing (2.038 ms) : 1988, 2088
.   : milestone, 2038,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.479 ms [1.467 ms, 1.49 ms] -
appsec 3.71 ms [3.494 ms, 3.927 ms] 2.232 ms (151.0%)
iast 2.224 ms [2.16 ms, 2.288 ms] 745.476 µs (50.4%)
iast_GLOBAL 2.254 ms [2.19 ms, 2.318 ms] 775.627 µs (52.5%)
profiling 2.067 ms [2.015 ms, 2.119 ms] 588.55 µs (39.8%)
tracing 2.048 ms [1.998 ms, 2.098 ms] 569.307 µs (38.5%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.483 ms [1.471 ms, 1.494 ms] -
appsec 3.724 ms [3.507 ms, 3.94 ms] 2.241 ms (151.1%)
iast 2.209 ms [2.146 ms, 2.273 ms] 726.527 µs (49.0%)
iast_GLOBAL 2.264 ms [2.199 ms, 2.328 ms] 780.776 µs (52.7%)
profiling 2.049 ms [1.998 ms, 2.1 ms] 566.1 µs (38.2%)
tracing 2.038 ms [1.988 ms, 2.088 ms] 555.06 µs (37.4%)

@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch 3 times, most recently from dc41615 to d9d6213 on October 16, 2025 22:43
Comment on lines 28 to 29
// Should really only return valid JSON types (Array, Map, String, Boolean, Number, null)
public Object parsePlanProduct(Object value) {
Contributor Author

@charlesmyu charlesmyu Oct 16, 2025

I don't love that this method returns an Object instead of something definite like a JSON node (or even just a String). The end goal is to allow any JSON object (other than null, which we filter out) to be serialized into a string using writeObjectToString, and this seemed like the most straightforwards way to achieve that. There's probably some more idiomatic way I'm missing - happy to hear about it if anyone has ideas!

@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch from d9d6213 to 54ab1ad on October 16, 2025 22:47
@charlesmyu charlesmyu force-pushed the charles.yu/djm-974/extract-spark-plan-product branch from 54ab1ad to 0279fff on October 17, 2025 04:53
public static void exit(
    @Advice.Return(readOnly = false) SparkPlanInfo planInfo,
    @Advice.Argument(0) SparkPlan plan) {
  if (planInfo.metadata().size() == 0) {
Contributor Author

By using the existing metadata on the DataSourceScanExec nodes, we open ourselves to a bit of inconsistency in the JSON parsing:

"meta": {
    "Format": "Parquet",
    "Batched": true,
    ...,
    "DataFilters": "[CASE WHEN PULocationID#28 IN (236,132,161) THEN true ELSE isnotnull(PULocationID#28) END]"
},

Specifically, the list values are rendered as plain bracketed strings rather than as JSON arrays with quoted, escaped elements, which means when we read the field out it's treated as a string rather than a native JSON array. Ideally we would parse this ourselves and upsert it so we can control that formatting, but obviously there's a risk of the parsing going wrong and impacting something that actually uses the field. Leaning slightly towards keeping the formatting as-is in favour of not touching existing fields, but happy to hear any other thoughts on this...
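For contrast, if we parsed and upserted that field ourselves, a native JSON array form would look something like this (illustrative only):

"DataFilters": ["CASE WHEN PULocationID#28 IN (236,132,161) THEN true ELSE isnotnull(PULocationID#28) END"]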

// An extension of how Spark translates `SparkPlan`s to `SparkPlanInfo`, see here:
// https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala#L54
public class Spark213PlanUtils extends AbstractSparkPlanUtils {
public Map<String, String> extractPlanProduct(TreeNode plan) {
Contributor

In the OpenLineage connector we had a special facet for storing the serialized LogicalPlan of the query. This was the most problematic feature we ever had, because the plan can contain everything. For example, if a user creates an in-memory dataframe of a few gigabytes, that becomes a node in the logical plan, and the OpenLineage connector trying to serialize it brought down the whole Spark driver.

This PR seems to be doing same thing for the physical plan. I think we shouldn't serialize the object when we don't know what's inside.

Contributor Author

@charlesmyu charlesmyu Oct 17, 2025

Chatted about this over a call, summarizing for posterity:

  • Worth clarifying that this function does not traverse the tree itself; we leave that up to Spark because we instrument the recursive fromSparkPlan method
  • We should avoid serializing anything we don't know about arbitrarily, especially using toString(). Since we are taking the full product of the TreeNode we could get some enormous structure (e.g. improbable, but maybe an array of all the data) and toString() would then attempt to serialize all of that data
    • Instead we should lean solely on simpleString(), which is safe by default, and default to not serializing otherwise. We could then serialize only other TreeNodes and leave out any unknown or unexpected data structures (see the sketch after this list)
    • With this change it would even be safe to parse the child QueryPlan nodes because it would no longer output the long physical plan, and instead print the one line string
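A minimal method-level sketch of that rule (assumed shape and helper name, not the PR's actual parsePlanProduct implementation):

// Only values we positively recognize are kept; anything else is dropped instead of toString()'d.
protected Object parsePlanProduct(Object value) {
  if (value == null) {
    return null;
  }
  if (value instanceof String || value instanceof Number || value instanceof Boolean) {
    return value; // already JSON-friendly
  }
  if (value instanceof TreeNode) {
    // bounded one-line summary via simpleString(maxFields); safe even for child plan nodes
    return safeSimpleString((TreeNode<?>) value); // safeSimpleString is an assumed helper
  }
  return null; // unknown or unexpected structures are not serialized
}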

Comment on lines 177 to 185
// Parse any nested objects to a standard form that ignores the keys
// Right now we do this by just asserting the key set and none of the values
static Object parseNestedMetaObject(Object value) {
  if (value instanceof Map) {
    return value.keySet()
  } else {
    return value
  }
}
Contributor Author

This was driving me nuts - there must be a better way to accomplish this without a ton of additional code... The issue is that for the Spark32 suite of tests, the expectations for the meta fields use named keys, but when we run the tests using Scala 2.12 we expect those to all show up as _dd.unknown_key.*. I added a (not great) way around that in assertSQLPlanEquals, which worked fine until we started getting nested maps that can have unknown keys. e.g.:

"meta": {
  "_dd.unparsed" : "any",
  "outputPartitioning" : {
    "HashPartitioning" : {
      "numPartitions" : 2,
      "expressions" : [ "string_col#28" ]
    }
  },
  "shuffleOrigin" : "ENSURE_REQUIREMENTS"
},

Where the numPartitions and expressions keys would show up as _dd.unknown_key.* in Scala 2.12. Initially I went for a recursive approach but that ended up feeling very bloated, so I abandoned it in favour of a subpar keyset check (i.e. only check that HashPartitioning exists in the map).

No false impressions that this is any good - let me know if there's a better way I'm missing, if just the key check is okay (only applies to the test suite running Scala 2.12/Spark 3.2.0, the other two suites compare everything as expected), or if we just have to put up with the recursive approach...

Contributor Author

@charlesmyu charlesmyu Oct 20, 2025

Changed the approach to compare lists of values instead of whatever I had put before - a bit cleaner and simpler to follow. Has its own downsides (e.g. not perfect comparisons as some stable keys are eliminated, and the containsAll comparison can be fooled) but at least it attempts to compare values and is much easier to maintain. Given it's on an older version of Scala that will no longer be supported for new Spark versions, I think this should probably be fine.

c68b356 (#9783)
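A hedged Java sketch (assumed names, not the test's exact code) of that value-based comparison; on Scala 2.12 the meta keys are synthetic (_dd.unknown_key.*), so only the values are compared:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MetaValueAssertions {
  static void assertMetaValuesPresent(Map<String, ?> expected, Map<String, ?> actual) {
    List<String> expectedValues =
        expected.values().stream().map(String::valueOf).collect(Collectors.toList());
    List<String> actualValues =
        actual.values().stream().map(String::valueOf).collect(Collectors.toList());
    // containsAll can be fooled (e.g. by duplicated values), as noted above
    assert actualValues.containsAll(expectedValues);
  }
}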

Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Went through the first round of reading and left some comments.
Please let me know what you think about it.

@charlesmyu charlesmyu marked this pull request as ready for review October 21, 2025 21:20
@charlesmyu charlesmyu requested a review from a team as a code owner October 21, 2025 21:20
Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Overall this looks good to me. My primary concern is naturally to make sure this won't cause problems on any Spark version or with any physical plan the job is processing.

I think the PR does well to achieve this:

  • the feature is going to be rolled out first to users who explicitly turn it on,
  • it serializes only known nodes (serializing unknown nodes is a common pitfall),
  • the serializer is limited in recursion depth and max collection size,
  • the code introduced depends in a minimal way on Spark classes and methods, making it resilient to future updates on the Spark side.

A few minor comments added. Happy to approve the PR once they're resolved.

assert res.toString() == "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]"
}

def "unknown objects should return null"() {
Contributor

Thanks for creating a test for this. I think this is really important.

// in Spark v3+, the signature of `simpleString` includes an int parameter for `maxFields`
return TreeNode.class
    .getDeclaredMethod("simpleString", new Class[] {int.class})
    .invoke(value, MAX_LENGTH)
Contributor

Please make sure this doesn't throw a NullPointerException in case getDeclaredMethod returns null?

Contributor Author

Yes, I think that's true! Based on the signature of getDeclaredMethod it looks like we should expect NoSuchMethodException in that case:

public Method getDeclaredMethod(String name, Class<?>... parameterTypes) throws NoSuchMethodException, SecurityException

I've added NullPointerException to the catch just in case, though. 5527ad0

Contributor Author

Just kidding, the spotbugs job did not like that - reverted that change. I'm fairly confident based on the signature & impl that we should only get NoSuchMethodException, though, and not NullPointerException. Let me know if we'd still like to do a more explicit null check.

18e51d5

Contributor

If invoke returns null, our code will call toString on null, causing a NullPointerException.
Let me know if this is possible or never going to happen.

Contributor Author

Ah, understood - you're right, I was looking at the wrong call. Updated to be an explicit cast. de336b9 (#9783)
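For reference, the cast-based shape discussed here would look roughly like this (sketch only; a null result from invoke then propagates as null instead of triggering a toString() call):

return (String)
    TreeNode.class
        .getDeclaredMethod("simpleString", int.class)
        .invoke(value, MAX_LENGTH);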

Contributor

@PerfectSlayer PerfectSlayer left a comment

Left first comments. I will continue the review later and let @mhlidd do the full review 😉

planInfo.simpleString(),
planInfo.children(),
HashMap.from(
JavaConverters.asScala(planUtils.extractFormattedProduct(plan)).toList()),
Contributor

Suggested change
JavaConverters.asScala(planUtils.extractFormattedProduct(plan)).toList()),
JavaConverters.asScala(planUtils.extractFormattedProduct(plan))),

Do we need to convert to a List first before converting to a HashMap?

Contributor Author

Good point, updated

d4c8264

Comment on lines 110 to 111
args.$plus$plus(
JavaConverters.mapAsScalaMap(planUtils.extractFormattedProduct(plan))),
Contributor

Do we need args here? It seems like it would always be an empty map, so there isn't a need to concatenate.

Contributor Author

Same comment from above:

To be frank, I was struggling with this a lot - I was trying to convert scala.collection.mutable.Map to scala.collection.immutable.Map, but I didn't quite know how to do those with a lot of the Scala implicits. Updated now to use toMap instead (figured out how to get the <:< implicit sorted properly). Let me know if this looks better!

160558e

Contributor

Oops didn't see the review earlier. Thanks for the update!

protected static assertStringSQLPlanSubset(String expectedString, String actualString) {
System.err.println("Checking if expected $expectedString SQL plan is a super set of $actualString")

protected static assertStringSQLPlanSubset(String expectedString, String actualString, String name) {
Contributor

Is it possible to create a Util class that stores all Spark assertions? This way the test classes can be separated from the assertion definitions that are used.

Contributor Author

I think I know what you mean, but would you have an example of a util class that's similar to what you're looking for? My assumption is this would be useful if we decide to swap out the assertion framework used?

Contributor

Yep it's just slightly cleaner! Or just in general if there are new versions of the instrumentation that we decide to support, we can have all the assertions in a util file and refer to it easily. Just a suggestion, not blocking.

This was my inspiration, but tailored for the Spark assertions that are being made.

Contributor Author

Ah! Okay that makes sense. There is logic that's being re-used in AbstractSpark24SqlTest by AbstractSpark32SqlTest already, and agreed it should really be in its own class. I would prefer to do that in a different PR, though, just to allow this set of changes to keep the status quo in the test files, if that's okay.

Comment on lines +83 to +86
public static final String DATA_JOBS_PARSE_SPARK_PLAN_ENABLED =
"data.jobs.parse_spark_plan.enabled";
public static final String DATA_JOBS_EXPERIMENTAL_FEATURES_ENABLED =
"data.jobs.experimental_features.enabled";
Contributor

Can these be added to metadata/supported-configurations.json and documented in the Feature Parity Dashboard? I added some docs about this recently that can be referenced.

Contributor Author

@charlesmyu charlesmyu Oct 27, 2025

Added (link), thanks for mentioning. I forgot about the env vars - would it be correct to assume that the env var name (e.g. DD_DATA_JOBS_PARSE_SPARK_PLAN_ENABLED) is inferred by the tracer when mapping it to the actual config in Config.java? Just curious since we don't explicitly define the env var keys anywhere else

d4c8264

Contributor

Yes! There is a mapping that is defined here.
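For example, following that mapping, the config key data.jobs.parse_spark_plan.enabled is read from the system property dd.data.jobs.parse_spark_plan.enabled or the environment variable DD_DATA_JOBS_PARSE_SPARK_PLAN_ENABLED.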

@charlesmyu charlesmyu requested a review from a team as a code owner October 27, 2025 19:09
@charlesmyu charlesmyu requested review from dougqh and removed request for a team October 27, 2025 19:09
Contributor

@PerfectSlayer PerfectSlayer left a comment

Last comments, related to performance. Sorry for the split review, but we're having a summit with the department.
And thanks for your last updates following my comments 🙏

Contributor

@mhlidd mhlidd left a comment

LGTM after addressing Bruce's comments! Thanks for the fixes! 🚀

Contributor

@PerfectSlayer PerfectSlayer left a comment

👏 praise: Thanks for the follow-up changes 👍

@charlesmyu charlesmyu merged commit 81ec538 into master Oct 28, 2025
535 checks passed
@charlesmyu charlesmyu deleted the charles.yu/djm-974/extract-spark-plan-product branch October 28, 2025 17:30
@github-actions github-actions bot added this to the 1.55.0 milestone Oct 28, 2025