HIVE-29084: use nextAlias for the output schema of LV columns after AST Conversion #6014
…iases during AST conversion
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
@zabetak I'd greatly appreciate it if you could take a second look at this PR.
@konstantinb I will check tomorrow. Apologies for the delay but I was off for some time.
// Create schema that preserves base table columns with original alias,
// but gives new UDTF columns the unique lateral view alias
int baseFieldCount = tableFunctionSource.schema.size();
List<RelDataTypeField> allOutputFields = tfs.getRowType().getFieldList();
From the syntax definition, a LATERAL VIEW is a virtual table with a user-defined table alias. Conceptually, every column that is in the output of the lateral view has the same table alias so I would expect that all columns in the same schema should have the same alias.
For all conversions, inside the ASTConverter we should distinguish the input schema(s) from the output schema. Both are very important for correctly and unambiguously constructing the AST/SQL query. For the lateral view case, the input and output schemas are somewhat mixed together, and maybe they shouldn't be. Some code inside the createASTLateralView method operates on the input schema and some on the output schema. In other words, up to a certain point in the code, I think we could use the schema as is from the input/source, and once we are done we could simply generate the output (new) schema using a new (generated) table alias. The idea is outlined in the comment below.
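To make the suggestion above concrete, here is a minimal sketch of the input/output split. It is not the actual ASTConverter code: `Col`, `outputSchema`, and the alias `lv_out_0` are hypothetical names used only for illustration, and the real schema class carries more information than an alias and a column name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LateralViewSchemaSketch {
    // Hypothetical stand-in for one schema column (table alias + column name).
    // This is NOT Hive's schema class; it only models the aliasing idea.
    static final class Col {
        final String tableAlias;
        final String name;

        Col(String tableAlias, String name) {
            this.tableAlias = tableAlias;
            this.name = name;
        }
    }

    // The input schema is read as-is. The output schema exposes every column
    // (base table columns and new UDTF columns alike) under one generated alias.
    static List<Col> outputSchema(List<Col> inputSchema,
                                  List<String> udtfColumns,
                                  String generatedAlias) {
        List<Col> out = new ArrayList<>();
        for (Col c : inputSchema) {
            out.add(new Col(generatedAlias, c.name));
        }
        for (String name : udtfColumns) {
            out.add(new Col(generatedAlias, name));
        }
        return out;
    }

    public static void main(String[] args) {
        // Input schema as it comes from the source, under the base alias "t".
        List<Col> input = Arrays.asList(new Col("t", "id"), new Col("t", "val_array"));
        // "lv_out_0" is a made-up alias; a real converter would generate its own.
        List<Col> output = outputSchema(input, Arrays.asList("first_val"), "lv_out_0");
        for (Col c : output) {
            System.out.println(c.tableAlias + "." + c.name);
        }
    }
}
```

The point of the sketch is only that the input list is never mutated or re-tagged while the lateral view is being built; the new alias appears exactly once, when the output schema is produced.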
LATERAL VIEW explode(val_array) lv1 AS first_val
LATERAL VIEW explode(val_array) lv2 AS second_val
WHERE first_val != second_val
ORDER BY first_val, second_val;
I guess the choice between SORT_QUERY_RESULTS and explicit ORDER BY in the query is somewhat subjective.
Both can avoid test flakiness and each has its own advantages & disadvantages.
Putting an ORDER BY in every query makes the tests more verbose and expands their scope. The plans will have more operators, EXPLAIN outputs will contain more info than strictly necessary, and potentially more rules will match/apply and affect the output plan. On the positive side, it is a native way to enforce sorted output and avoid potential test flakiness.
SORT_QUERY_RESULTS applies to all queries inside the file, and it is a post-processing step completely independent of the query execution. Test inputs/outputs are less verbose, and the sorting does not interfere with the query execution or the actual testing scope.
Personally, for this case I feel that SORT_QUERY_RESULTS is a better choice but don't feel that strongly about it. I am OK to accept the ORDER BY approach if you prefer that. However, currently the test file contains both SORT_QUERY_RESULTS and ORDER BY clauses so we should remove one of them. I leave the final choice to you.
- clear separation between input & output schemas during LV AST conversion - simplified and minimized test queries
…ccurately show result accuracy after the fix
zabetak
left a comment
@konstantinb I am waiting for a final confirmation from your side regarding the changes that I pushed, but from my side everything looks good and this PR is ready to go in!
@zabetak thank you very much for the modifications. I was sure I had already removed the no longer used schema constructor; my apologies. Your refactoring is almost a 1:1 match with an intermediate working variant I had; I was concerned that the signature change of createASTLateralView() plus an early return in convertSource() might be harder to follow and could be frowned upon. I am fully comfortable with your refactoring, thank you!
@konstantinb Many thanks for the PR and your thorough analysis and explanations here and under the JIRA ticket.



What changes were proposed in this pull request?
HIVE-29084: Proposing changes to ASTConverter's logic of tableAlias assignment for Lateral View Queries
Why are the changes needed?
Before these changes, ASTConverter assigned the base table alias as the tableAlias of all columns of the query tree. Technically, however, LV columns belong to "separate" tables participating in an implicit join. Because of the shared alias, PPD processing treated filters with conditions between table columns and LV columns as conditions on the columns of the same table.
The following condition:
hive/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java
Line 262 in 5dddb6e
made these expressions count as "pushable candidates", while the subsequent processing logic has no knowledge of how to optimize/convert/process such expressions, so they are ultimately discarded during the LateralViewJoinerPPD.removeAllCandidates() call.
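As a rough illustration of the effect described above, the toy model below shows how a shared alias can make a cross-table predicate look like a single-table one. It is an assumption-laden sketch, not the condition in ExprWalkerProcFactory: `ColumnRef`, `looksSingleTable`, and the aliases `t` and `lv1` are hypothetical, and the rule "all referenced columns share one alias" is a simplification chosen for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AliasCandidateSketch {
    // Hypothetical column reference inside a filter predicate.
    static final class ColumnRef {
        final String tableAlias;
        final String name;

        ColumnRef(String tableAlias, String name) {
            this.tableAlias = tableAlias;
            this.name = name;
        }
    }

    // Toy rule (an assumption, not Hive's actual check): a predicate looks like
    // a single-table pushdown candidate when every column it references
    // carries the same table alias.
    static boolean looksSingleTable(List<ColumnRef> refs) {
        Set<String> aliases = new HashSet<>();
        for (ColumnRef r : refs) {
            aliases.add(r.tableAlias);
        }
        return aliases.size() == 1;
    }

    public static void main(String[] args) {
        // A base table column and an LV column both tagged with the base alias:
        // the predicate is mistaken for a condition on one table.
        List<ColumnRef> sharedAlias = Arrays.asList(
            new ColumnRef("t", "id"), new ColumnRef("t", "first_val"));
        // The LV column under its own alias: the predicate spans two "tables".
        List<ColumnRef> distinctAliases = Arrays.asList(
            new ColumnRef("t", "id"), new ColumnRef("lv1", "first_val"));

        System.out.println(looksSingleTable(sharedAlias));
        System.out.println(looksSingleTable(distinctAliases));
    }
}
```

The sketch does not model how the final fix assigns aliases; it only shows why alias assignment matters to any logic that classifies predicates by the aliases they reference.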
A very simple query to confirm the bug is
Does this PR introduce any user-facing change?
No
How was this patch tested?