-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: plan-time SQL expression simplifying #19311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @theirix
| where | ||
| T: TryFrom<ScalarValue, Error = DataFusionError>, | ||
| { | ||
| match context.sql_to_expr(expr.clone(), &Arc::clone(schema)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a lot of ceremony to do this process
However, it also feels like this function is non trivial to use itself (it has 4 parameters and a type parameter)
I wonder if we added a method like SessionContext::simplify to invoke the simplifier more easily
Then extracting an expr as a particular type might look like
let expr = ...; // Expr to simplity
// cast to target type and simplify
let cast_expr = SessionContext::simplify(expr.cast(target_type)?);
// Check if it simplified to a type
let Expr::Literal(scalar) = cast_expr else {
// error parsing
};
// convertThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. Its purpose is to simplify expression and convert to a native type at the same time, hence the five parameters and the things behind the scenes (simplify, coerce, arrow cast and arrow-to-native cast) to shift it from client code.
For other usages of the optimiser's method - I agree that simplification could be done more easily in the client code via facade methods in context, as you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if there's a way to compact this signature; for example, having both T and target_type seem redundant since they'd need to be compatible anyway, but off the top of my head I can't think of a way to combine them 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
I've found a way to use an ArrowPrimitiveType and derive a native type from this trait. Now it looks much more compact with three parameters.
| } | ||
| } | ||
|
|
||
| /// Parse a SQL expression as a numeric value (supports basic arithmetic). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geoffreyclaude I wonder what your thoughts on this PR and the approach of parse_sql_literal?
|
Don't we have this already at the logical and physical expression layers? |
@adriangb please correct me if I'm wrong, but since we are accessing SQL ast, we cannot have it at a higher level, especially physical planning. The proposed function won't be used in higher-level layers, but only for planning. |
I think the question is if we already have an optimizer rule to simplify expressions (including evaluating literals), what benefit does this PR bring to do it at the SQL planning level? |
Yes that was my thought as well |
Does it only work if the optimizer is fed with an expression These lines with logic for AST expression rely on having a numeric as a physical value - check if it is in For other cases, of course, optimizer rules can properly work with expressions, and this function won't be needed. |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand the use case for this now, though the plumbing required does seem a bit gnarly 🤔
| let example: ExampleKind = std::env::args() | ||
| .nth(1) | ||
| .ok_or_else(|| DataFusionError::Execution(format!("Missing argument. {usage}")))? | ||
| .unwrap_or_else(|| ExampleKind::All.to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem consistent with the other example groups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this change shouldn't belong to this PR. I wanted to keep the original behaviour of running tests via classic cargo run --example relation_planner compared to cargo run --example relation_planner -- all. If it makes sense, I can make a consistent change for all examples in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate issue/PR for this sounds good
| where | ||
| T: TryFrom<ScalarValue, Error = DataFusionError>, | ||
| { | ||
| match context.sql_to_expr(expr.clone(), &Arc::clone(schema)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if there's a way to compact this signature; for example, having both T and target_type seem redundant since they'd need to be compatible anyway, but off the top of my head I can't think of a way to combine them 🤔
Which issue does this PR close?
Rationale for this change
Introduce a module for parsing and simplification of an SQL expression to a literal of a given type.
This module provides functionality to parse and simplify static SQL expressions
used in SQL constructs like
FROM TABLE SAMPLE (10 + 50 * 2). If they are requiredin a planning (not an execution) phase, they need to be reduced to literals of a given type.
What changes are included in this PR?
cargo run --package datafusion-examples --example relation_plannerinstead of
cargo run --package datafusion-examples --example relation_planner -- allAre these changes tested?
Are there any user-facing changes?
A new API is provided