@srielau srielau commented Aug 20, 2025

What changes were proposed in this pull request?

We propose to greatly expand the availability of named parameter markers to virtually all places where literals are allowed.
The way to do this is by adding a pre-processing parser whose sole purpose is to find named parameter markers and replace them with literals, without profoundly impacting the grammar or analyzer.
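To make the pre-processing idea concrete, here is a minimal toy sketch in Python (not the actual implementation, which is an ANTLR-based Scala parser): scan the SQL text for `:name` markers, render each bound value as a SQL literal, and splice it in before the main parser ever sees the statement. All function names here are illustrative, and this regex version deliberately ignores complications such as markers inside string literals or comments.

```python
import re

# Matches named parameter markers like :parm (illustrative; the real
# pre-parser tokenizes the statement instead of using a regex).
MARKER = re.compile(r":([A-Za-z_][A-Za-z_0-9]*)")

def to_sql_literal(value):
    """Render a Python value as a SQL literal (illustrative only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape embedded single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"

def substitute(sql, params):
    """Replace every bound :name marker with a literal; leave unbound ones."""
    def repl(m):
        name = m.group(1)
        return to_sql_literal(params[name]) if name in params else m.group(0)
    return MARKER.sub(repl, sql)

print(substitute("SELECT :greeting, :n", {"greeting": "it's", "n": 42}))
# SELECT 'it''s', 42
```

Because the substitution happens on the raw SQL text, it works uniformly for expressions, DDL, and utility commands alike, which is exactly what makes this approach attractive for places the Analyzer never visits.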

Why are the changes needed?

Users have expressed a desire to parameterize not only expressions in queries, but also DDL statements and utility commands.
Many such uses, such as COMMENT and TBLPROPERTIES, are not processed by Analyzer rules.

Does this PR introduce any user-facing change?

Yes, it expands the usage of named parameter markers.

How was this patch tested?

Numerous new testcases have been written. Existing testcases protect against regressions.
We have also added an internal legacy config as a safety net to revert to the original behavior.
For that reason the original rule-based substitution code remains available.

Was this patch authored or co-authored using generative AI tooling?

Yes, it was co-authored by Cursor.

This commit moves the core parameter substitution parser implementation
from the parmsubstitution branch to the preparser branch:

- SubstituteParamsParser: Main parser for substituting parameter markers in SQL
- SubstituteParmsAstBuilder: AST builder for extracting parameter locations

The usage code (the BindParameters rule) remains in the parmsubstitution branch
as requested; only the core parser implementation is moved.
This commit implements parameter substitution as a pre-processing step
in the SparkSqlParser, allowing parameter markers to be substituted
before the main SQL parser executes.

Key changes:
1. Added ParameterContext classes and ThreadLocal management for
   passing parameter values to the parser
2. Modified SparkSqlParser to perform parameter substitution before
   variable substitution and main parsing
3. Updated SparkSession sql methods to use new parameter substitution
   approach via ThreadLocal context

The flow is now:
1. SparkSession sets parameter context in ThreadLocal
2. SparkSqlParser detects and substitutes parameter markers
3. Variable substitution happens on the result
4. Main ANTLR parsing proceeds with fully substituted SQL
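The four-step hand-off above can be mocked in a few lines of Python. All names here are hypothetical stand-ins, not Spark's actual classes: the session layer stashes the parameter map in a thread-local slot, the parser picks it up during `parse()`, and the slot is cleared afterwards so later statements on the same thread are unaffected.

```python
import threading

_context = threading.local()  # stand-in for ThreadLocalParameterContext

def set_parameter_context(params):
    _context.params = params

def clear_parameter_context():
    _context.params = None

def current_parameters():
    return getattr(_context, "params", None)

def parse(sql):
    """Stand-in for SparkSqlParser: substitute markers, then 'parse'."""
    params = current_parameters()
    if params:  # step 2: substitute markers before the main parse
        for name, value in params.items():
            sql = sql.replace(":" + name, repr(value))
    return sql  # steps 3-4: variable substitution + ANTLR parsing go here

def session_sql(sql, params):
    set_parameter_context(params)      # step 1: session sets the context
    try:
        return parse(sql)
    finally:
        clear_parameter_context()      # avoid leaking into the next statement
```

The `try`/`finally` mirrors why clearing the context matters: without it, a failed parse would leave stale parameters bound for the next statement executed on that thread.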

This eliminates the need for ParameterizedQuery LogicalPlan nodes
and moves parameter handling to the parsing phase where it belongs.
- Remove unused Literal import
- Fix trailing whitespace issues
Added fully qualified class names for:
- SubstituteParamsParser
- ThreadLocalParameterContext
- ParameterContext
- SubstitutionRule
- NamedParameterContext
- PositionalParameterContext
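The class list above distinguishes named from positional parameter contexts. This toy sketch (illustrative names, not Spark's API) shows the difference between the two marker styles: `:name` markers look values up by key, while `?` markers consume an argument list in order.

```python
import re

def bind_named(sql, params):
    # Named markers: resolve each :name against a key/value map.
    return re.sub(r":(\w+)", lambda m: repr(params[m.group(1)]), sql)

def bind_positional(sql, args):
    # Positional markers: each ? consumes the next argument in order.
    it = iter(args)
    return re.sub(r"\?", lambda m: repr(next(it)), sql)

print(bind_named("VALUES (:parm)", {"parm": "hello"}))
# VALUES ('hello')
print(bind_positional("VALUES (?, ?)", [1, 2]))
# VALUES (1, 2)
```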

This should resolve the compilation issues preventing parameter substitution from working.
This will help us understand why parameter substitution is not working:
- Added debug logging in parse() method to show parameter context
- Added debug logging in substituteParametersIfNeeded() to trace execution
- Will help identify if context is being set or if substitution is failing
Removed trailing whitespace from lines in the parameter substitution integration code.
Future commits will ensure all files are free of trailing whitespace.
- Broke long import statement across multiple lines (imports should be < 100 chars)
- Split long ThreadLocalParameterContext.get() call across lines
- All lines now comply with 100-character limit

This establishes the coding style guideline:
- Maximum line length: 100 characters
- Break long imports and method calls appropriately
- Use proper indentation for continuation lines
Created CODING_STYLE_GUIDELINES.md with rules for:
- Maximum line length: 100 characters
- No trailing whitespace
- Proper import statement formatting
- Method definition formatting
- Pattern matching formatting
- Pre-commit checks and tools
- Examples of common fixes
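A minimal pre-commit check along the lines described above might look like the following Python sketch (the limit and messages match the guidelines; the function itself is illustrative, not part of the PR):

```python
MAX_LEN = 100  # maximum line length per the coding style guidelines

def check_style(text):
    """Return (line_number, message) pairs for style violations."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > MAX_LEN:
            problems.append((lineno, "line exceeds 100 characters"))
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
    return problems

sample = "val x = 1   \n" + "y" * 120 + "\n"
for lineno, msg in check_style(sample):
    print(lineno, msg)
# 1 trailing whitespace
# 2 line exceeds 100 characters
```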

This establishes consistent standards for all future code changes and provides
clear guidelines for maintaining code quality.
The parameter substitution integration has been verified to work correctly:
- Named parameters are properly detected and substituted
- Parameter values are correctly converted to SQL literals
- Integration flow: SparkSession -> ThreadLocal -> SparkSqlParser -> SubstituteParamsParser

Example working usage:
spark.sql("create or replace view v(c1) AS VALUES (:parm)", Map("parm" -> "hello"))
Successfully substitutes :parm with 'hello' before main parsing.
@srielau srielau changed the title Preparser [WIP] Preparser Aug 20, 2025
@github-actions github-actions bot added the SQL label Aug 20, 2025
@srielau srielau force-pushed the preparser branch 2 times, most recently from 816a26f to e415f4f on August 20, 2025 19:05
srielau added 30 commits August 21, 2025 18:36
This file was not related to the parameter substitution feature and should not be part of this branch.
- Remove unused PositionMapper import from SQLExecution.scala
- Remove unused PositionMapper and SQLQueryContext imports from SparkSqlParser.scala
- Fix import grouping in SparkSqlParser.scala to comply with scalastyle

These changes resolve compilation errors introduced during the translateSqlContext
method deduplication refactoring.