Skip to content

Conversation

@theirix
Copy link
Contributor

@theirix theirix commented Dec 13, 2025

Which issue does this PR close?

Rationale for this change

Introduce a module for parsing and simplification of an SQL expression to a literal of a given type.

This module provides functionality to parse and simplify static SQL expressions
used in SQL constructs like FROM TABLE SAMPLE (10 + 50 * 2). If they are required
in a planning (not an execution) phase, they need to be reduced to literals of a given type.

What changes are included in this PR?

  1. New module with documentation and unit tests
  2. A table sample example is switched to use this module for SQL parsing
  3. A small fix-up to run
    cargo run --package datafusion-examples --example relation_planner
    instead of
    cargo run --package datafusion-examples --example relation_planner -- all

Are these changes tested?

  • Unit tests
  • Table sample example runs

Are there any user-facing changes?

A new API is provided

@github-actions github-actions bot added sql SQL Planner optimizer Optimizer rules labels Dec 13, 2025
@github-actions github-actions bot added the core Core DataFusion crate label Dec 13, 2025
@theirix theirix marked this pull request as ready for review December 13, 2025 23:30
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @theirix

where
T: TryFrom<ScalarValue, Error = DataFusionError>,
{
match context.sql_to_expr(expr.clone(), &Arc::clone(schema)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a lot of ceremony to do this process

However, it also feels like this function is non trivial to use itself (it has 4 parameters and a type parameter)

I wonder if we added a method like SessionContext::simplify to invoke the simplifier more easily

Then extracting an expr as a particular type might look like

let expr = ...; // Expr to simplity
// cast to target type and simplify
let cast_expr = SessionContext::simplify(expr.cast(target_type)?);

// Check if it simplified to a type
let Expr::Literal(scalar) = cast_expr  else {
  // error parsing
};

// convert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. Its purpose is to simplify expression and convert to a native type at the same time, hence the five parameters and the things behind the scenes (simplify, coerce, arrow cast and arrow-to-native cast) to shift it from client code.

For other usages of the optimiser's method - I agree that simplification could be done more easily in the client code via facade methods in context, as you suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if there's a way to compact this signature; for example, having both T and target_type seem redundant since they'd need to be compatible anyway, but off the top of my head I can't think of a way to combine them 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

I've found a way to use an ArrowPrimitiveType and derive a native type from this trait. Now it looks much more compact with three parameters.

}
}

/// Parse a SQL expression as a numeric value (supports basic arithmetic).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geoffreyclaude I wonder what your thoughts on this PR and the approach of parse_sql_literal?

@adriangb
Copy link
Contributor

Don't we have this already at the logical and physical expression layers?

@theirix
Copy link
Contributor Author

theirix commented Dec 19, 2025

Don't we have this already at the logical and physical expression layers?

@adriangb please correct me if I'm wrong, but since we are accessing SQL ast, we cannot have it at a higher level, especially physical planning. The proposed function won't be used in higher-level layers, but only for planning.

@Jefffrey
Copy link
Contributor

Don't we have this already at the logical and physical expression layers?

@adriangb please correct me if I'm wrong, but since we are accessing SQL ast, we cannot have it at a higher level, especially physical planning. The proposed function won't be used in higher-level layers, but only for planning.

I think the question is if we already have an optimizer rule to simplify expressions (including evaluating literals), what benefit does this PR bring to do it at the SQL planning level?

@adriangb
Copy link
Contributor

Don't we have this already at the logical and physical expression layers?

@adriangb please correct me if I'm wrong, but since we are accessing SQL ast, we cannot have it at a higher level, especially physical planning. The proposed function won't be used in higher-level layers, but only for planning.

I think the question is if we already have an optimizer rule to simplify expressions (including evaluating literals), what benefit does this PR bring to do it at the SQL planning level?

Yes that was my thought as well

@theirix
Copy link
Contributor Author

theirix commented Dec 24, 2025

Don't we have this already at the logical and physical expression layers?

@adriangb please correct me if I'm wrong, but since we are accessing SQL ast, we cannot have it at a higher level, especially physical planning. The proposed function won't be used in higher-level layers, but only for planning.

I think the question is if we already have an optimizer rule to simplify expressions (including evaluating literals), what benefit does this PR bring to do it at the SQL planning level?

Does it only work if the optimizer is fed with an expression datafusion_expr::Expr?

These lines with logic for AST expression rely on having a numeric as a physical value - check if it is in 0..1 range, float or int, etc. We don't pass it as an expression for the optimizer to process later.

For other cases, of course, optimizer rules can properly work with expressions, and this function won't be needed.

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand the use case for this now, though the plumbing required does seem a bit gnarly 🤔

let example: ExampleKind = std::env::args()
.nth(1)
.ok_or_else(|| DataFusionError::Execution(format!("Missing argument. {usage}")))?
.unwrap_or_else(|| ExampleKind::All.to_string())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem consistent with the other example groups

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this change shouldn't belong to this PR. I wanted to keep the original behaviour of running tests via classic cargo run --example relation_planner compared to cargo run --example relation_planner -- all. If it makes sense, I can make a consistent change for all examples in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate issue/PR for this sounds good

where
T: TryFrom<ScalarValue, Error = DataFusionError>,
{
match context.sql_to_expr(expr.clone(), &Arc::clone(schema)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if there's a way to compact this signature; for example, having both T and target_type seem redundant since they'd need to be compatible anyway, but off the top of my head I can't think of a way to combine them 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: plan-time SQL expression simplifying

4 participants