-
Notifications
You must be signed in to change notification settings - Fork 28.8k
[WIP] Preparser #52085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
srielau
wants to merge
49
commits into
apache:master
Choose a base branch
from
srielau:preparser
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
[WIP] Preparser #52085
+4,096
−289
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit moves the core parameter substitution parser implementation from the parmsubstitution branch to the preparser branch: - SubstituteParamsParser: Main parser for substituting parameter markers in SQL - SubstituteParmsAstBuilder: AST builder for extracting parameter locations The usage code (BindParameters rule) remains in the parmsubstitution branch as requested, only the core parser implementation is moved.
This commit implements parameter substitution as a pre-processing step in the SparkSqlParser, allowing parameter markers to be substituted before the main SQL parser executes. Key changes: 1. Added ParameterContext classes and ThreadLocal management for passing parameter values to the parser 2. Modified SparkSqlParser to perform parameter substitution before variable substitution and main parsing 3. Updated SparkSession sql methods to use new parameter substitution approach via ThreadLocal context The flow is now: 1. SparkSession sets parameter context in ThreadLocal 2. SparkSqlParser detects and substitutes parameter markers 3. Variable substitution happens on the result 4. Main ANTLR parsing proceeds with fully substituted SQL This eliminates the need for ParameterizedQuery LogicalPlan nodes and moves parameter handling to the parsing phase where it belongs.
- Remove unused Literal import - Fix trailing whitespace issues
Added fully qualified class names for: - SubstituteParamsParser - ThreadLocalParameterContext - ParameterContext - SubstitutionRule - NamedParameterContext - PositionalParameterContext This should resolve the compilation issues preventing parameter substitution from working.
This will help us understand why parameter substitution is not working: - Added debug logging in parse() method to show parameter context - Added debug logging in substituteParametersIfNeeded() to trace execution - Will help identify if context is being set or if substitution is failing
Removed trailing whitespace from lines in the parameter substitution integration code. Future commits will ensure all files are free of trailing whitespace.
- Broke long import statement across multiple lines (imports should be < 100 chars) - Split long ThreadLocalParameterContext.get() call across lines - All lines now comply with 100-character limit This establishes the coding style guideline: - Maximum line length: 100 characters - Break long imports and method calls appropriately - Use proper indentation for continuation lines
Created CODING_STYLE_GUIDELINES.md with rules for: - Maximum line length: 100 characters - No trailing whitespace - Proper import statement formatting - Method definition formatting - Pattern matching formatting - Pre-commit checks and tools - Examples of common fixes This establishes consistent standards for all future code changes and provides clear guidelines for maintaining code quality.
The parameter substitution integration has been verified to work correctly: - Named parameters are properly detected and substituted - Parameter values are correctly converted to SQL literals - Integration flow: SparkSession -> ThreadLocal -> SparkSqlParser -> SubstituteParamsParser Example working usage: spark.sql("create or replace view v(c1) AS VALUES (:parm)", Map("parm" -> "hello")) Successfully substitutes :parm with 'hello' before main parsing.
816a26f
to
e415f4f
Compare
cloud-fan
reviewed
Aug 21, 2025
cloud-fan
reviewed
Aug 21, 2025
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
Outdated
Show resolved
Hide resolved
cloud-fan
reviewed
Aug 21, 2025
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Show resolved
Hide resolved
…d named, unnamed errors.
This file was not related to the parameter substitution feature and should not be part of this branch.
- Remove unused PositionMapper import from SQLExecution.scala - Remove unused PositionMapper and SQLQueryContext imports from SparkSqlParser.scala - Fix import grouping in SparkSqlParser.scala to comply with scalastyle These changes resolve compilation errors introduced during the translateSqlContext method deduplication refactoring.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
We are proposing to greatly expand the availability of named parameter markers to virtually all places literals are allowed.
The way to do this is by adding a pre-processing parser which's sole purpose is to find named parameter markers and replace them with literals without having to impact the grammar or analyzer profoundly.
Why are the changes needed?
Users have shown a desire to parameterize not only expressions in queries, but also DDL scripts and utility commands.
Many such uses, such as COMMENT and TBLPROPERTIES do not get processed by Analyzer rules.
Does this PR introduce any user-facing change?
Yes, it expands the usage of named parameter markers.
How was this patch tested?
Numerous new testcases have been written. Existing testcases protect against regression.
We also have added an internal legacy config for safety to revert to teh original behavior.
Forthat reason teh original Rule based substitution cod remains avaiable.
Was this patch authored or co-authored using generative AI tooling?
Yes, it was co-authored by cursor.