-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38211][table-planner] Refactor explain for MultiJoin #26926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@gustavodemorais looks like a planner test is failing in the are you have changed. |
e0510eb
to
266e5bd
Compare
I've rebased it twice already so I'll rebase it after getting a review |
postJoinFilter != null) | ||
.item("select", String.join(",", getRowType().getFieldNames())) | ||
.item("rowType", getRowType()) | ||
.item("outputRowType", getRowType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make sure that this is insync with other explains, I would suggest to keep it rowType
} ], | ||
"outputType" : "ROW<`user_id` VARCHAR(2147483647), `name` VARCHAR(2147483647), `user_id0` VARCHAR(2147483647), `order_id` VARCHAR(2147483647), `user_id1` VARCHAR(2147483647), `payment_id` VARCHAR(2147483647)>", | ||
"description" : "MultiJoin(joinFilter=[true], joinTypes=[[INNER, LEFT, LEFT]], joinConditions=[[true, =($0, $2), =($0, $4)]], joinAttributeMap=[{1=[LeftInputId:0;LeftFieldIndex:0;RightInputId:1;RightFieldIndex:0;], 2=[LeftInputId:0;LeftFieldIndex:0;RightInputId:2;RightFieldIndex:0;]}], select=[user_id,name,user_id0,order_id,user_id1,payment_id], rowType=[RecordType(VARCHAR(2147483647) user_id, VARCHAR(2147483647) name, VARCHAR(2147483647) user_id0, VARCHAR(2147483647) order_id, VARCHAR(2147483647) user_id1, VARCHAR(2147483647) payment_id)])" | ||
"description" : "MultiJoin(commonJoinKey=[user_id], joinTypes=[INNER, LEFT, LEFT], inputUniqueKeys=[noUniqueKey, noUniqueKey, noUniqueKey], joinConditions=[true, (user_id = user_id0), (user_id = user_id1)], joinFilter=[true], select=[user_id,name,user_id0,order_id,user_id1,payment_id], outputRowType=[RecordType(VARCHAR(2147483647) user_id, VARCHAR(2147483647) name, VARCHAR(2147483647) user_id0, VARCHAR(2147483647) order_id, VARCHAR(2147483647) user_id1, VARCHAR(2147483647) payment_id)])" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove joinFilter=[true]
.collect(Collectors.joining(", "))) | ||
.item("inputUniqueKeys", formatInputUniqueKeysWithFieldNames()) | ||
.item("joinConditions", formatJoinConditionsWithFieldNames(pw)) | ||
.itemIf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove as it is contained in joinConditions
already
266e5bd
to
e45fc24
Compare
Thanks for the review, @twalthr! I've addressed your comments. I've also moved the only single plan test that was outside the MultiJoinTest file so that it's simpler to regenerate plans in the future. (The test wasn't changed). |
On another note, do you think it makes sense to remove the position 0 from all arrays related to joins? Right now, for N inputs, all arrays in the explain are of length N. However, for N inputs, we have N - 1 joins. The first position of all arrays in the MultiJoin node related to joins are a special case. Each position in the array is related to the join between inputs
Calcite creates as a default value for pos 0. The same thing with the joinTypes, the first position is always INNER. In general, I think it either makes sense keep the position 0 like above for all attributes so they have the same length or remove for it only for the join-related attributes. What do you think? |
I agree to simplify the explain here. A binary join should be:
So |
I've updated both joinTypes and joinConditions to skip the first position. It looks cleaner 👍 I've also moved up stateTtlHints to the beginning of the list next to inputUniqueKeys, since it's also related to inputs and is a short and a more relevant information than the lengthy outputType. I think we have everything now. Can you take a look @twalthr? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
84242ac
to
1182bbc
Compare
1182bbc
to
278ef9b
Compare
What is the purpose of the change
It'd be useful to have some changes to the explain to make debugging and looking at the job graph easier:
Brief change log
Verifying this change
Plan and restore tests updated (this is ok because the operator is in an experimental state)
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation