HIVE-29084: use nextAlias for the output schema of LV columns after AST Conversion #6014

konstantinb · 2025-08-07T19:41:19Z

What changes were proposed in this pull request?

HIVE-29084: Proposing changes to ASTConverter's logic of tableAlias assignment for Lateral View Queries

Why are the changes needed?

Before these changes, ASTConverter assigned the base table alias as the tableAlias of every column in the query tree. Technically, LV columns belong to "separate" tables participating in an implicit join. As a result, PPD processing treated filters that combine table columns and LV columns as conditions on the columns of a single table.
The following condition:

hive/ql/src/java/org/apache/hadoop/hive/ql/ppd/ExprWalkerProcFactory.java

Line 262 in 5dddb6e

} else if (!chAlias.equalsIgnoreCase(alias)) {

caused these expressions to be treated as "pushable candidates", while the subsequent processing logic has no knowledge of how to optimize, convert, or process such expressions, so they are ultimately discarded during the LateralViewJoinerPPD.removeAllCandidates() call.
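To make the alias comparison concrete, here is a minimal sketch of the idea, not Hive's actual code. The class `AliasCheckSketch` and the method `sameAlias` are hypothetical names introduced only for illustration; the sketch assumes that a predicate is treated as belonging to one table when all of its column references carry the same table alias, compared case-insensitively as in the quoted condition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of an alias-based "single table" check.
// It is NOT the logic of ExprWalkerProcFactory, only an illustration.
final class AliasCheckSketch {

  // Returns true when every non-null alias in the list is the same
  // (ignoring case), i.e. the predicate looks like a single-table filter.
  static boolean sameAlias(List<String> columnAliases) {
    String alias = null;
    for (String chAlias : columnAliases) {
      if (chAlias == null) {
        continue; // constants carry no alias in this model
      }
      if (alias == null) {
        alias = chAlias;
      } else if (!chAlias.equalsIgnoreCase(alias)) {
        return false; // columns come from different aliases
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // LV column mislabeled with the base table alias: looks like one table.
    System.out.println(sameAlias(Arrays.asList("t", "t")));
    // LV column with its own alias: the mixed-alias case is detected.
    System.out.println(sameAlias(Arrays.asList("t", "lv")));
  }
}
```

In this model, giving LV columns an alias different from the base table alias is what lets the check tell the two sources apart.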

A very simple query that confirms the bug:

SELECT t.key, t.value, lv.col
FROM (SELECT '238' AS key, 'val_238' AS value) t
LATERAL VIEW explode(array('238', '86', '311')) lv AS col
WHERE t.key = '333' OR lv.col = '86'
ORDER BY t.key, lv.col;
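The intended semantics of this query can be checked by hand with a small simulation. The sketch below is plain Java with hypothetical names (`LateralViewFilterSketch`, `run`); it does not call Hive. It explodes the array against the single base row and applies the WHERE predicate. The unfiltered variant is included only to show what a lost predicate would look like; it is not a claim about the exact output of any particular Hive build.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hand simulation of the query's semantics; no Hive involved.
final class LateralViewFilterSketch {

  // Builds the result rows; when applyFilter is false the WHERE clause
  // is skipped, mimicking a predicate that was dropped.
  static List<String> run(boolean applyFilter) {
    String key = "238";
    String value = "val_238";
    List<String> exploded = Arrays.asList("238", "86", "311");

    List<String> rows = new ArrayList<>();
    for (String col : exploded) {
      // WHERE t.key = '333' OR lv.col = '86'
      boolean keep = key.equals("333") || col.equals("86");
      if (!applyFilter || keep) {
        rows.add(key + "\t" + value + "\t" + col);
      }
    }
    // ORDER BY t.key, lv.col (string comparison, all keys are equal here)
    Collections.sort(rows);
    return rows;
  }

  public static void main(String[] args) {
    System.out.println("with filter:    " + run(true));
    System.out.println("without filter: " + run(false));
  }
}
```

With the predicate applied, only the row whose `col` is '86' survives; without it, all three exploded rows are returned.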

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested locally primarily with TestMiniLlapLocalCliDriver
Applied the same patch to a custom Hive implementation based on Hive 4.0.1 and confirmed the accuracy of the results of impacted queries after the fix

🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

konstantinb · 2025-08-29T23:05:48Z

@zabetak I'd greatly appreciate it if you could take a second look at this PR

zabetak · 2025-09-01T14:02:42Z

@konstantinb I will check tomorrow. Apologies for the delay but I was off for some time.

zabetak · 2025-08-18T13:57:23Z

ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java

+      // Create schema that preserves base table columns with original alias,
+      // but gives new UDTF columns the unique lateral view alias
+      int baseFieldCount = tableFunctionSource.schema.size();
+      List<RelDataTypeField> allOutputFields = tfs.getRowType().getFieldList();


From the syntax definition, a LATERAL VIEW is a virtual table with a user-defined table alias. Conceptually, every column in the output of the lateral view has the same table alias, so I would expect all columns in the same schema to have the same alias.

For all conversions, inside the ASTConverter we should distinguish the input schema(s) from the output schema. Both are very important for correctly and unambiguously constructing the AST/SQL query. For the lateral view case, the input and output schemas are somewhat mixed together, and maybe they shouldn't be. Some code inside the createASTLateralView method operates on the input schema and some on the output schema. In other words, up to a certain point in the code, I think we could use the schema as is from the input/source, and once we are done we could simply generate the output (new) schema using a new (generated) table alias. The idea is outlined in the comment below.
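A rough sketch of the input/output separation described above, under stated assumptions: `ColumnInfo`, `outputSchema`, and the alias string `lv_out_0` are hypothetical names made up for this illustration and are not ASTConverter's real types or generated aliases. The input schema is read as is, and the output schema is built afterwards with one freshly generated alias shared by every column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; not the Schema/ColumnInfo classes used by Hive.
final class LateralViewSchemaSketch {

  static final class ColumnInfo {
    final String tableAlias;
    final String column;

    ColumnInfo(String tableAlias, String column) {
      this.tableAlias = tableAlias;
      this.column = column;
    }

    @Override
    public String toString() {
      return tableAlias + "." + column;
    }
  }

  // Output schema = input columns followed by the UDTF columns,
  // all relabeled with a single new alias.
  static List<ColumnInfo> outputSchema(
      List<ColumnInfo> input, List<String> udtfColumns, String newAlias) {
    List<ColumnInfo> out = new ArrayList<>();
    for (ColumnInfo c : input) {
      out.add(new ColumnInfo(newAlias, c.column));
    }
    for (String name : udtfColumns) {
      out.add(new ColumnInfo(newAlias, name));
    }
    return out;
  }

  public static void main(String[] args) {
    List<ColumnInfo> input = Arrays.asList(
        new ColumnInfo("t", "key"), new ColumnInfo("t", "value"));
    System.out.println(outputSchema(input, Arrays.asList("col"), "lv_out_0"));
  }
}
```

The input list is never mutated, so code that still needs the source aliases can keep using it while the new alias is exposed only to consumers of the lateral view's output.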

zabetak · 2025-09-02T10:10:04Z

ql/src/test/queries/clientpositive/lateral_view_cartesian_test.q

+LATERAL VIEW explode(val_array) lv1 AS first_val
+LATERAL VIEW explode(val_array) lv2 AS second_val
+WHERE first_val != second_val
+ORDER BY first_val, second_val;


I guess the choice between SORT_QUERY_RESULTS and explicit ORDER BY in the query is somewhat subjective.
Both can avoid test flakiness and each has its own advantages & disadvantages.

Putting an ORDER BY in every query makes the tests more verbose and expands their scope. The plans will have more operators, EXPLAIN outputs will contain more info than strictly necessary, and potentially more rules will match/apply and affect the output plan. On the positive side, it is a native way to enforce sorted output and avoid potential test flakiness.

SORT_QUERY_RESULTS applies to all queries inside the file and is a post-processing step completely independent of the query execution. Test inputs/outputs are less verbose, and avoiding flakiness does not interfere with the query execution or the actual testing scope.

Personally, for this case I feel that SORT_QUERY_RESULTS is a better choice but don't feel that strongly about it. I am OK to accept the ORDER BY approach if you prefer that. However, currently the test file contains both SORT_QUERY_RESULTS and ORDER BY clauses so we should remove one of them. I leave the final choice to you.
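The difference between the two approaches can be pictured with a small sketch. It assumes, for illustration only, that post-processing amounts to sorting the emitted result lines as text; `SortResultsSketch` and `sortLines` are hypothetical names and this is not the test framework's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative post-processing step: the query plan is untouched,
// only the textual output is sorted before comparison.
final class SortResultsSketch {

  static List<String> sortLines(String rawOutput) {
    List<String> lines = new ArrayList<>(Arrays.asList(rawOutput.split("\n")));
    Collections.sort(lines);
    return lines;
  }

  public static void main(String[] args) {
    // Two runs that emit the same rows in different orders
    // compare equal after sorting.
    String runA = "86\t238\n238\t86";
    String runB = "238\t86\n86\t238";
    System.out.println(sortLines(runA).equals(sortLines(runB)));
  }
}
```

Because the sort happens outside the query, no extra operator appears in the plan or in EXPLAIN output, which is the trade-off discussed above.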

sonarqubecloud · 2025-09-11T16:43:17Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

zabetak

@konstantinb I am waiting for a final confirmation from your side regarding the changes that I pushed, but from my side everything looks good and this PR is ready to go in!

konstantinb · 2025-09-12T22:03:30Z

> @konstantinb I am waiting for a final confirmation from your side regarding the changes that I pushed, but from my side everything looks good and this PR is ready to go in!

@zabetak thank you very much for the modifications. I was sure I had already removed the no longer used schema constructor; my apologies.

Your refactoring is almost a 1:1 match with an intermediate working variant I had; I was concerned that the signature change of createASTLateralView() plus an early return in convertSource() might be harder to follow and could be frowned upon. I am fully comfortable with your refactoring, thank you!

zabetak · 2025-09-15T11:36:33Z

@konstantinb Many thanks for the PR and your thorough analysis and explanations here and under the JIRA ticket.

HIVE-29084: test files confirming the bug

75b8eaa

asf-ci-hive added tests pending tests unstable and removed tests pending labels Aug 7, 2025

HIVE-29084: a proposed change to assign LV field columns different al…

73f64d4

…iases during AST conversion

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Aug 8, 2025

HIVE-29084: correcting lineage2.q.out to match the alias changes

682b9d8

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Aug 8, 2025

HIVE-29084: trigger re-testing

9057349

🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

asf-ci-hive added tests pending tests unstable and removed tests unstable tests pending labels Aug 9, 2025

HIVE-29084: trigger additional CI run

b883d8c

🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels Aug 11, 2025

HIVE-29084: comprehensive test queries

0cc5941

asf-ci-hive added tests pending and removed tests passed labels Aug 12, 2025

konstantinb marked this pull request as ready for review August 12, 2025 22:05

asf-ci-hive added tests unstable and removed tests pending labels Aug 13, 2025

asf-ci-hive added tests pending tests passed and removed tests passed tests pending labels Aug 15, 2025

zabetak reviewed Sep 2, 2025

HIVE-29084: refactoring for PR feedback:

f413e34

- clear separation between input & output schemas during LV AST conversion - simplified and minimized test queries

asf-ci-hive added tests pending and removed tests passed labels Sep 8, 2025

HIVE-29084: the single LV query needed 2+ rows in the base table to a…

ee0bff6

…ccurately show result accuracy after the fix

konstantinb changed the title ~~HIVE-29084: ensuring different tableAlias values between the base table and LV columns to avoid dropping filters during PPD~~ HIVE-29084: use nextAlias for the output schema of LV columns after AST Conversion Sep 8, 2025

asf-ci-hive added tests unstable tests pending tests passed and removed tests pending tests unstable labels Sep 8, 2025

konstantinb requested a review from zabetak September 9, 2025 03:08

zabetak added 2 commits September 11, 2025 17:00

HIVE-29084: Refactor Schema generation in createASTLateralView and ex…

2eaabef

…ploit QueryBlockInfo

HIVE-29084: Remove unused Schema constructor

829ecf9

asf-ci-hive added tests pending and removed tests passed labels Sep 11, 2025

asf-ci-hive added tests passed and removed tests pending labels Sep 11, 2025

zabetak approved these changes Sep 12, 2025

zabetak merged commit 3b3c1cf into apache:master Sep 15, 2025
4 checks passed
