Extract SparkPlan product and append to trace #9783
Conversation
…ort more types and use JSON arrays
🎯 Code Coverage | Commit SHA: 34528dc | Docs
Benchmarks: Startup
Summary: Found 0 performance improvements and 2 performance regressions! Performance is the same for 50 metrics, 13 unstable metrics.
Startup time reports for insecure-bank:
[gantt chart: insecure-bank - global startup overhead: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3 | sections: tracing, iast]
[gantt chart: insecure-bank - break down per module: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3 | sections: tracing, iast]
Startup time reports for petclinic:
[gantt chart: petclinic - global startup overhead: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3 | sections: tracing, appsec, iast, profiling]
[gantt chart: petclinic - break down per module: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3 | sections: tracing, appsec, iast, profiling]
Benchmarks: Load
Summary: Found 1 performance improvement and 2 performance regressions! Performance is the same for 9 metrics, 12 unstable metrics.
Request duration reports for petclinic:
[gantt chart: petclinic - request duration [CI 0.99]: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3 | sections: baseline, candidate]
Request duration reports for insecure-bank:
[gantt chart: insecure-bank - request duration [CI 0.99]: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3 | sections: baseline, candidate]
Benchmarks: Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.
Execution time for biojava:
[gantt chart: biojava - execution time [CI 0.99]: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3 | sections: baseline, candidate]
Execution time for tomcat:
[gantt chart: tomcat - execution time [CI 0.99]: candidate=1.55.0-SNAPSHOT~34528dca71, baseline=1.55.0-SNAPSHOT~dc6264eaa3 | sections: baseline, candidate]
Force-pushed dc41615 to d9d6213
```java
// Should really only return valid JSON types (Array, Map, String, Boolean, Number, null)
public Object parsePlanProduct(Object value) {
```
I don't love that this method returns an Object instead of something definite like a JSON node (or even just a String). The end goal is to allow any JSON object (other than null, which we filter out) to be serialized into a string using writeObjectToString, and this seemed like the most straightforward way to achieve that. There's probably some more idiomatic way I'm missing - happy to hear about it if anyone has ideas!
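For illustration, a minimal sketch of the shape being described here: plain Java values that map onto JSON types pass through, and everything else is dropped so the caller's null filter removes it. The class name and the exact pass-through rules are my own assumptions, not the PR's actual implementation:

```java
import java.util.Collection;
import java.util.Map;

class PlanProductSketch {
  // Hypothetical sketch: pass through values that map cleanly onto JSON types
  // (object, array, string, boolean, number) and drop everything else as null,
  // relying on the caller to filter nulls before serialization.
  static Object parsePlanProduct(Object value) {
    if (value instanceof String || value instanceof Boolean || value instanceof Number) {
      return value; // JSON scalars pass through unchanged
    }
    if (value instanceof Map || value instanceof Collection) {
      return value; // serialized later as a JSON object or array
    }
    return null; // unknown types are dropped rather than stringified
  }
}
```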
Force-pushed d9d6213 to 54ab1ad
Force-pushed 54ab1ad to 0279fff
```java
public static void exit(
    @Advice.Return(readOnly = false) SparkPlanInfo planInfo,
    @Advice.Argument(0) SparkPlan plan) {
  if (planInfo.metadata().size() == 0) {
```
By using the existing metadata on the DataSourceScanExec nodes, we open ourselves to a bit of inconsistency in the JSON parsing:

```json
"meta": {
  "Format": "Parquet",
  "Batched": true,
  ...,
  "DataFilters": "[CASE WHEN PULocationID#28 IN (236,132,161) THEN true ELSE isnotnull(PULocationID#28) END]"
},
```

Specifically, the lists are not quoted & escaped, which means when we read out the field it's treated as a string rather than a native JSON array. Ideally we would parse this ourselves and upsert it so we can control that formatting, but obviously there's a risk of the parsing going wrong and impacting something that actually uses the field. Leaning slightly towards keeping the formatting as-is in favour of not touching existing fields, but happy to hear any other thoughts on this...
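To make the inconsistency concrete, here is a small standalone sketch, using Jackson directly purely for illustration (it is not necessarily the serializer the tracer uses), showing that the list arrives as one opaque string rather than a JSON array:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class MetaFieldShapeDemo {
  public static void main(String[] args) throws Exception {
    // The DataFilters value is a single quoted string, not a nested array.
    String meta = "{\"DataFilters\": \"[isnotnull(PULocationID#28)]\"}";
    JsonNode node = new ObjectMapper().readTree(meta);
    System.out.println(node.get("DataFilters").isArray());   // false
    System.out.println(node.get("DataFilters").isTextual()); // true
  }
}
```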
```java
// An extension of how Spark translates `SparkPlan`s to `SparkPlanInfo`, see here:
// https://github.com/apache/spark/blob/v3.5.0/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala#L54
public class Spark213PlanUtils extends AbstractSparkPlanUtils {
  public Map<String, String> extractPlanProduct(TreeNode plan) {
```
In the OpenLineage connector we had a special facet for storing the serialized LogicalPlan of the query. This was the most problematic feature we ever had, because the plan can contain anything. For example, if a user creates a few-gigabyte in-memory dataframe, that becomes a node in the logical plan, and the OpenLineage connector would try to serialize it and fail the whole Spark driver.
This PR seems to be doing the same thing for the physical plan. I think we shouldn't serialize the object when we don't know what's inside.
Chatted about this over a call, summarizing for posterity:
- Worth clarifying that this function does not traverse the tree itself; we leave that up to Spark because we instrument the recursive `fromSparkPlan` method
- We should avoid serializing anything we don't know about arbitrarily, especially using `toString()`. Since we are taking the full product of the `TreeNode` we could get some enormous structure (e.g. improbable, but maybe an array of all the data) and `toString()` would then attempt to serialize all of that data
- Instead we should lean solely on `simpleString()`, which is safe by default, and default to not serializing otherwise. We could then only serialize other `TreeNode`s and leave out any unknown or unexpected data structures (a minimal sketch of this shape follows below)
- With this change it would even be safe to parse the child `QueryPlan` nodes because it would no longer output the long physical plan, and instead print the one-line string
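The sketch referenced above; the class and constant names are mine, and the `simpleString(int)` signature is the Spark 3.x one discussed later in this thread:

```java
import org.apache.spark.sql.catalyst.plans.QueryPlan;
import org.apache.spark.sql.catalyst.trees.TreeNode;

class SafeNodeRenderSketch {
  private static final int MAX_FIELDS = 100; // assumed bound, not the PR's actual constant

  // Render only TreeNodes, and only via simpleString(), which is bounded by design;
  // anything unknown is skipped entirely instead of falling back to toString().
  static String safeRender(Object element) {
    if (element instanceof QueryPlan) {
      // Child plans are traversed by Spark's own recursion, so emit one line only.
      return ((QueryPlan<?>) element).simpleString(MAX_FIELDS);
    }
    if (element instanceof TreeNode) {
      return ((TreeNode<?>) element).simpleString(MAX_FIELDS);
    }
    return null; // unknown or unexpected structures are not serialized
  }
}
```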
```groovy
// Parse any nested objects to a standard form that ignores the keys
// Right now we do this by just asserting the key set and none of the values
static Object parseNestedMetaObject(Object value) {
  if (value instanceof Map) {
    return value.keySet()
  } else {
    return value
  }
}
```
This was driving me nuts - there must be a better way to accomplish this without a ton of additional code... The issue is that for the Spark32 suite of tests, the expectations for the meta fields use named keys, but when we run the tests using Scala 2.12 we expect those to all show up as `_dd.unknown_key.*`. I added a (not great) way around that in `assertSQLPlanEquals`, which worked fine until we started getting nested maps that can have unknown keys. e.g.:

```json
"meta": {
  "_dd.unparsed" : "any",
  "outputPartitioning" : {
    "HashPartitioning" : {
      "numPartitions" : 2,
      "expressions" : [ "string_col#28" ]
    }
  },
  "shuffleOrigin" : "ENSURE_REQUIREMENTS"
},
```

Here the `numPartitions` and `expressions` keys would show up as `_dd.unknown_key.*` in Scala 2.12. Initially I went for a recursive approach but that ended up feeling very bloated, so I abandoned it in favour of a subpar keyset check (i.e. only check that `HashPartitioning` exists in the map).
No false impressions that this is any good - let me know if there's a better way I'm missing, if just the key check is okay (it only applies to the test suite running Scala 2.12/Spark 3.2.0, the other two suites compare everything as expected), or if we just have to put up with the recursive approach...
Changed the approach to compare lists of values instead of whatever I had put before - a bit cleaner and simpler to follow. Has its own downsides (e.g. not perfect comparisons as some stable keys are eliminated, and the containsAll comparison can be fooled) but at least it attempts to compare values and is much easier to maintain. Given it's on an older version of Scala that will no longer be supported for new Spark versions, I think this should probably be fine.
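Roughly the spirit of that values-based comparison, as I understand it (the helper names here are mine, and the sketch inherits the same containsAll caveat mentioned above):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

class ValueOnlyComparisonSketch {
  // Flatten every leaf value of a nested structure, discarding key names so that
  // stable keys and _dd.unknown_key.* renames compare identically.
  static List<Object> leafValues(Object node) {
    List<Object> out = new ArrayList<>();
    if (node instanceof Map) {
      for (Object v : ((Map<?, ?>) node).values()) {
        out.addAll(leafValues(v));
      }
    } else if (node instanceof Collection) {
      for (Object v : (Collection<?>) node) {
        out.addAll(leafValues(v));
      }
    } else {
      out.add(node);
    }
    return out;
  }

  // Key-insensitive containment check; it can be fooled by duplicated values,
  // which is the trade-off accepted above.
  static boolean valuesMatch(Map<?, ?> expected, Map<?, ?> actual) {
    return leafValues(actual).containsAll(leafValues(expected));
  }
}
```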
Went through the first round of reading and left some comments.
Please let me know what you think about it.
Resolved review threads on:
- ...entation/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractSparkPlanUtils.java (outdated)
- ...ion/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractSparkPlanSerializer.java (outdated)
- ...rk/spark_2.12/src/main/java/datadog/trace/instrumentation/spark/Spark212Instrumentation.java
- ...ion/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractSparkPlanSerializer.java
- ...on/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractDatadogSparkListener.java (outdated)
Overall this looks good to me. My primary concern is naturally to make sure this won't cause problems on any Spark version nor on any physical plans the job is processing.
I think the PR does well to achieve this:
- the feature is going to be rolled out first to users who explicitly turn it on,
- it serializes only known nodes (serializing unknown nodes is a common pitfall),
- the serializer is limited in recursion depth and max collection sizes,
- the code introduced depends in a minimal way on Spark classes and methods, making it resilient to future updates on the Spark side.
A few minor comments added. Happy to approve the PR once they're resolved.
Resolved review thread on:
- ...rk/spark_2.12/src/main/java/datadog/trace/instrumentation/spark/Spark212Instrumentation.java (outdated)
```groovy
assert res.toString() == "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]"
}

def "unknown objects should return null"() {
```
Thanks for creating a test for this. I think this is really important.
```java
// in Spark v3+, the signature of `simpleString` includes an int parameter for `maxFields`
return TreeNode.class
    .getDeclaredMethod("simpleString", new Class[] {int.class})
    .invoke(value, MAX_LENGTH)
```
Please make sure this doesn't throw a NullPointerException in case getDeclaredMethod returns null.
Yes, I think that's true! Based on the signature of getDeclaredMethod it looks like we should expect NoSuchMethodException in that case:

```java
public Method getDeclaredMethod(String name, Class<?>... parameterTypes) throws NoSuchMethodException, SecurityException
```

I've added NullPointerException to the catch just in case, though. 5527ad0
Just kidding, the spotbugs job did not like that - reverted that change. I'm fairly confident based on the signature & impl that we should only get NoSuchMethodException, though, and not NullPointerException. Let me know if we'd still like to do a more explicit null check.
If invoke returns null, our code will call toString on null causing NullPointerException.
Let me know if this is possible or never going to happen.
Ah, understood - you're right, I was looking at the wrong call. Updated to be an explicit cast. de336b9 (#9783)
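Putting the resolution together, a sketch of what the version-tolerant call might look like; the Spark 2.x no-arg fallback path and the constant value are my assumptions based on the discussion above, not the PR's exact code:

```java
import java.lang.reflect.Method;
import org.apache.spark.sql.catalyst.trees.TreeNode;

class SimpleStringCallSketch {
  private static final int MAX_LENGTH = 100; // assumed value

  // Resolve simpleString across Spark versions via reflection; the explicit
  // String cast means a null return stays null instead of triggering an NPE
  // through an implicit toString().
  static String simpleStringOf(TreeNode<?> value) {
    try {
      // Spark 3.x signature: simpleString(int maxFields)
      Method m = TreeNode.class.getDeclaredMethod("simpleString", int.class);
      return (String) m.invoke(value, MAX_LENGTH);
    } catch (NoSuchMethodException e) {
      try {
        // Spark 2.x signature: simpleString() with no arguments
        return (String) TreeNode.class.getDeclaredMethod("simpleString").invoke(value);
      } catch (ReflectiveOperationException inner) {
        return null; // give up quietly rather than failing the job
      }
    } catch (ReflectiveOperationException e) {
      return null;
    }
  }
}
```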
Resolved review thread on:
- ...ion/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractSparkPlanSerializer.java (outdated)
This reverts commit 5527ad0.
Left first comments. I will continue the review later and let @mhlidd do the full review 😉
Resolved review threads on:
- ...rk/spark_2.12/src/main/java/datadog/trace/instrumentation/spark/Spark212Instrumentation.java (outdated)
- ...ark/spark_2.12/src/main/java/datadog/trace/instrumentation/spark/Spark212PlanSerializer.java (outdated)
- ...ion/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractSparkPlanSerializer.java (outdated)
```java
planInfo.simpleString(),
planInfo.children(),
HashMap.from(
    JavaConverters.asScala(planUtils.extractFormattedProduct(plan)).toList()),
```

Suggested change:

```diff
-    JavaConverters.asScala(planUtils.extractFormattedProduct(plan)).toList()),
+    JavaConverters.asScala(planUtils.extractFormattedProduct(plan))),
```
Do we need to convert to a List first before converting to a HashMap?
Good point, updated
```java
args.$plus$plus(
    JavaConverters.mapAsScalaMap(planUtils.extractFormattedProduct(plan))),
```
Do we need args here? It seems like it would always be an empty map, so there isn't a need to concatenate.
Same comment from above:
> To be frank, I was struggling with this a lot - I was trying to convert `scala.collection.mutable.Map` to `scala.collection.immutable.Map`, but I didn't quite know how to do those with a lot of the Scala implicits. Updated now to use `toMap` instead (figured out how to get the `<:<` implicit sorted properly). Let me know if this looks better!
Oops didn't see the review earlier. Thanks for the update!
```groovy
protected static assertStringSQLPlanSubset(String expectedString, String actualString) {
  System.err.println("Checking if expected $expectedString SQL plan is a super set of $actualString")

protected static assertStringSQLPlanSubset(String expectedString, String actualString, String name) {
```
Is it possible to create a Util class that stores all Spark assertions? This way this test classes can be separated from the assertion definitions that are used.
I think I know what you mean, but would you have an example of a util class that's similar to what you're looking for? My assumption is this would be useful if we decide to swap out the assertion framework used?
Yep it's just slightly cleaner! Or just in general if there are new versions of the instrumentation that we decide to support, we can have all the assertions in a util file and refer to it easily. Just a suggestion, not blocking.
This was my inspiration, but tailored for the Spark assertions that are being made.
Ah! Okay that makes sense. There is logic that's being re-used in AbstractSpark24SqlTest by AbstractSpark32SqlTest already, and agreed it should really be in its own class. I would prefer to do that in a different PR though, just to allow this set of changes to keep the status quo in the test files, if that's okay.
```java
public static final String DATA_JOBS_PARSE_SPARK_PLAN_ENABLED =
    "data.jobs.parse_spark_plan.enabled";
public static final String DATA_JOBS_EXPERIMENTAL_FEATURES_ENABLED =
    "data.jobs.experimental_features.enabled";
```
Can these be added to metadata/supported-configurations.json and documented in the Feature Parity Dashboard? I added some docs about this recently that can be referenced.
Added (link), thanks for mentioning. I forgot about the env vars - would it be correct to assume that the env var name (e.g. DD_DATA_JOBS_PARSE_SPARK_PLAN_ENABLED) is inferred by the tracer when mapping it to the actual config in Config.java? Just curious since we don't explicitly define the env var keys anywhere else
Yes! There is a mapping that is defined here.
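For reference, my understanding of that mapping is the conventional one: prefix the property key with `dd.`, upper-case it, and replace dots with underscores. A tiny sketch of the transformation (not the tracer's actual code, just an illustration of the convention):

```java
import java.util.Locale;

public class EnvVarNameSketch {
  // Derive the env var from a config key: DD_ prefix, upper case, '.' -> '_'.
  static String toEnvVar(String propertyKey) {
    return "DD_" + propertyKey.toUpperCase(Locale.ROOT).replace('.', '_');
  }

  public static void main(String[] args) {
    // Prints DD_DATA_JOBS_PARSE_SPARK_PLAN_ENABLED
    System.out.println(toEnvVar("data.jobs.parse_spark_plan.enabled"));
  }
}
```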
…ormat, add FF to supported-configurations.json properly
Last comments related to performances. Sorry for the split review but we’re having a summit with the department.
And thanks for your last updates following my comments 🙏
Resolved review threads on:
- ...ion/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractSparkPlanSerializer.java
- ...ion/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractSparkPlanSerializer.java (outdated)
LGTM after addressing Bruce's comments! Thanks for the fixes! 🚀
👏 praise: Thanks for the follow up changes 👍
What Does This Do
Instruments the `fromSparkPlan` function to:
- parse the `plan` parameter into a map of String properties
- populate the `meta` field of the returned `SparkPlanInfo` with those properties

Adds a `Spark21XPlanUtils` class with an `extractPlanProduct` method that parses a `SparkPlan` object and returns the properties as a <String, String> map.
Adds an `AbstractSparkPlanUtils` class with a `parsePlanProduct` method that parses the various Objects extracted by `extractPlanProduct` to return a comprehensible string representation for `Spark21XPlanUtils`.
Updates the `toJson` function in `SparkSQLUtils` to write a JSON object if possible, otherwise just write a string.
Adds two feature flags:
- `dd.data.jobs.experimental_features.enabled`: meant to gate all experimental features before we GA; we should leave this on by default for all internal users
- `dd.data.jobs.parse_spark_plan.enabled`: meant to gate this feature specifically

Motivation
The SparkPlan houses additional details about its execution that are useful for operators to visualize. Extract these into spans so they can be ingested.
Additional Notes
This PR leverages the existing `meta` field in the `SparkPlanInfo` class. This should be safe as we don't overwrite the field if any data exists, and it is currently only used for `ScanExec` node details. Furthermore, since this class appears to be primarily intended as an abstraction for informational purposes, any faulty updates to the object shouldn't result in any breaking issues.
Also note that we use the `Product` API to obtain the key names (using `productElementName`), however this was only made available in Scala 2.13. As a result the Scala 2.12 instrumentation uses arbitrary `_dd.unknown_key.X` names for the keys, so the values can at least be extracted.
Worth mentioning that this PR does not introduce traversal of the physical plan itself into the tracer - this is left to Spark itself. This is because the recursive `fromSparkPlan` method is instrumented, meaning as each node is built the tracer is invoked to parse it, and we expressly filter out any potential `QueryPlan` nodes when performing the parsing.
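To make that guard concrete, a stripped-down sketch of the exit-advice shape quoted earlier in the review; only the empty-metadata check comes from the PR itself, and the enrichment helper is hypothetical:

```java
import net.bytebuddy.asm.Advice;
import org.apache.spark.sql.execution.SparkPlan;
import org.apache.spark.sql.execution.SparkPlanInfo;

class FromSparkPlanAdviceSketch {
  @Advice.OnMethodExit(suppress = Throwable.class)
  public static void exit(
      @Advice.Return(readOnly = false) SparkPlanInfo planInfo,
      @Advice.Argument(0) SparkPlan plan) {
    // Only enrich nodes that carry no metadata yet, so existing
    // DataSourceScanExec details are never overwritten.
    if (planInfo.metadata().size() == 0) {
      // planInfo = rebuildWithExtractedProduct(planInfo, plan); // hypothetical helper
    }
  }
}
```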
Contributor Checklist
- Assign the `type:` and (`comp:` or `inst:`) labels in addition to any useful labels
- Avoid `close`, `fix` or any linking keywords when referencing an issue. Use `solves` instead, and assign the PR milestone to the issue

Jira ticket: DJM-974