YuryChebiryak commented Mar 31, 2025

Goal: Add support for open-source Spark as a platform, for on-premise Spark clusters.
Changes: Most macros work the same way as for the Databricks or Snowflake platform. The main change is in the hub, satellite and link macros, whose default implementations use the QUALIFY keyword. Open-source Spark does not currently support QUALIFY (see https://issues.apache.org/jira/browse/SPARK-31561, open since 2020). As a workaround, these macros define a CTE named qualify_workaround that computes the window functions (ROW_NUMBER, LAG) first; the outer SELECT then filters on their results.
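To illustrate the workaround pattern, here is a minimal sketch, not the macro code from this PR. It uses Python's built-in sqlite3 as a stand-in for Spark (an assumption made only so the example runs anywhere; it needs SQLite 3.25+ for window functions). The table `stage` and the columns `hk`, `business_key` and `ldts` are hypothetical; only the CTE name `qualify_workaround` comes from the description above.

```python
import sqlite3

# Hypothetical staging rows: (hash key, business key, load timestamp).
ROWS = [
    ("h1", "cust-1", "2025-01-02"),
    ("h1", "cust-1", "2025-01-01"),
    ("h2", "cust-2", "2025-01-03"),
]

# With QUALIFY, this would be a single SELECT ending in
#   QUALIFY ROW_NUMBER() OVER (PARTITION BY hk ORDER BY ldts) = 1
# Without it, the window function is computed in a CTE and the
# outer SELECT filters on the resulting column.
DEDUP_SQL = """
WITH qualify_workaround AS (
    SELECT
        hk,
        business_key,
        ldts,
        ROW_NUMBER() OVER (PARTITION BY hk ORDER BY ldts) AS rn
    FROM stage
)
SELECT hk, business_key, ldts
FROM qualify_workaround
WHERE rn = 1
ORDER BY hk
"""


def earliest_per_key(rows):
    """Return the earliest-loaded row for each hash key."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE stage (hk TEXT, business_key TEXT, ldts TEXT)")
        conn.executemany("INSERT INTO stage VALUES (?, ?, ?)", rows)
        return conn.execute(DEDUP_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in earliest_per_key(ROWS):
        print(row)
```

The same shape should carry over to LAG: compute it as a named column inside the CTE, then compare against it in the outer WHERE clause.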

Testing:
Tested on Spark 3.5.0.

Next steps:
Once open-source Spark supports the QUALIFY keyword (e.g. if Databricks contributes it back to the open-source community), the macros can be greatly simplified by invoking the default implementations.

YuryChebiryak changed the title from "Adds support to open-source Spark." to "Add support for open-source Spark." on Apr 3, 2025