-
Couldn't load subscription status.
- Fork 4.8k
HIVE-28280: SemanticException when querying VIEW with DISTINCT clause #6103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
51b5bc8 to
d399943
Compare
| case HiveProject hiveProject -> hiveProject; | ||
| case SingleRel singleRel when singleRel.getInput() instanceof HiveProject hiveProject -> hiveProject; | ||
| default -> throw new SemanticException("View " + subqAlias + " is corresponding to " | ||
| + relNode + ", rather than a HiveProject or a SingleRel with HiveProject as its child."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make the error message clearer? Maybe something like "Could not obtain a HiveProject from " + relNode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I have changed the message in the new commit
| + relNode.toString() + ", rather than a HiveProject."); | ||
| HiveProject project = switch (Objects.requireNonNull(relNode)) { | ||
| case HiveProject hiveProject -> hiveProject; | ||
| case SingleRel singleRel when singleRel.getInput() instanceof HiveProject hiveProject -> hiveProject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to descend into SingleRels recursively, until a HiveProject can be found? Maybe the view definition could have, e.g., a Sort(Filter(Project(...))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually earlier I tried to produce a plan where the second node is not a Project but couldn't. But just to be safe I have updated the code to recursively check for the first Project through the SingleRels.
| throw new SemanticException("View " + subqAlias + " is corresponding to " | ||
| + relNode.toString() + ", rather than a HiveProject."); | ||
| HiveProject project = switch (Objects.requireNonNull(relNode)) { | ||
| case HiveProject hiveProject -> hiveProject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Sonar warns about wrong indentation for the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I noticed that but there's no good solution to it. In the latest commit it shouldn't complain as the code is:
private Optional<HiveProject> extractFirstProject(RelNode rel) {
return switch (rel) {
case HiveProject hiveProject -> Optional.of(hiveProject);
case SingleRel sr -> extractFirstProject(sr.getInput());
case null, default -> Optional.empty();
};
}
However, I would have liked to see the cases indented to:
private Optional<HiveProject> extractFirstProject(RelNode rel) {
return switch (rel) {
case HiveProject hiveProject -> Optional.of(hiveProject);
case SingleRel sr -> extractFirstProject(sr.getInput());
case null, default -> Optional.empty();
};
}
Actually in checkstyle/checkstyle.xml we have defined the indentation for cases to be 0:
<module name="Indentation">
<property name="basicOffset" value="2" />
<property name="caseIndent" value="0" />
<property name="throwsIndent" value="4" />
</module>
which is fine when you have the older switch-case statement, as it doesn't look too bad:
switch(...) {
case a: ...
case b: ...
}
But when you have a switch expression, like the example shown at the beginning of this comment (i.e., with a return), it makes more sense to treat it as a basicOffset and not a caseIndent in my opinion. But we cannot simply change the value of caseIndent as there are older style switch statements in the repo.
Ideally I would like to have a value of 2 for case indents. Also by default the cases should be indented as seen here: https://checkstyle.sourceforge.io/checks/misc/indentation.html#caseIndent
Sorry for the long write-up, but we shouldn't see checkstyle warnings in the latest commit :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the long comment! I would support the motion to change caseIndent to 2. I guess that would be out-of-scope for this ticket. Until then I would prefer avoiding checkstyle warnings.
| } else { | ||
| throw new SemanticException("View " + subqAlias + " is corresponding to " | ||
| + relNode.toString() + ", rather than a HiveProject."); | ||
| HiveProject project = extractFirstProject(relNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a project and why the first project is important?
How do we use the project afterwards?
Does the code remain correct if we use a project that is not at the top of the plan?
Do we generally support views with LIMIT, ORDER BY, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the projects for authorization in HiveRelFieldTrimmer.trimFields here, and that's the only place where we use it. I believe that is why we need only the first project, that gives us the final list of fields.
I don't think we can use other projects in place of the top one, as the fields could be different.
I am not really sure if we officially support views with limits or order by, but right now it looks like we can support them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The viewToProjectTableSchema has been introduced in HIVE-13095 to determine whether a user has the permission to access fields of a view. There's a comment in https://reviews.apache.org/r/43834/.
It could work if we expand the logic to all kinds of RelNodes. The code in trimFields(Project, ...) does not really use the project. The code could be applied to any kind of RelNode. The code would then work even if no project could be found (e.g., if the top level node is a join or union). The cost: adding a call to each of the trimFields methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the approach so that it would work with any RelNode. I will look into the failed tests tomorrow, but let me know what your thoughts are regarding the new approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update, @soumyakanti3578! I think we can simplify it a bit, see my review comment.
| List<RexNode> projects = relNodeToTableAndProjects.get(rel).right; | ||
| List<FieldSchema> tableAllCols = table.getAllCols(); | ||
| for (Ord<RexNode> ord : Ord.zip(projects)) { | ||
| if (fieldsUsed.get(ord.i)) { | ||
| columnAccessInfo.add(table.getCompleteName(), tableAllCols.get(ord.i).getName()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the projects. We could just iterate over the fields from the row type of the RelNode. That would also simplify the map, i.e., just a Map<RelNode, Table>, instead of a Map<RelNode, Pair<Table, List<RexNode>>>.
| RelNode input, | ||
| final ImmutableBitSet fieldsUsed, | ||
| Set<RelDataTypeField> extraFields) { | ||
| setColumnAccessInfoForViews(rel, fieldsUsed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add a call for the root node somewhere as well.
f68ad6a to
4a1659c
Compare
|



What changes were proposed in this pull request?
Earlier we expected a view's logical plan to always have a
HiveProjectas its top node. But we can haveHiveSortLimittoo if the original view definition has a limit.We can add a case for
SingleReland check that itsgetInput()returns aHiveProject, and then the code can work as before.Why are the changes needed?
We get an error as described in https://issues.apache.org/jira/browse/HIVE-28280, https://issues.apache.org/jira/browse/HIVE-21163
Does this PR introduce any user-facing change?
No
How was this patch tested?