Revert "[SPARK-52576][SDP] Drop/recreate on full refresh and MV update" #51497

sryza · 2025-07-15T15:53:07Z

This reverts commit 8b43757.

What changes were proposed in this pull request?

Reverts SPARK-52576, i.e. goes back to truncate + alter instead of drop + recreate for materialized view updates and full refreshes.

Why are the changes needed?

Some pipeline runs result in wiping out and replacing all the data for a table:

- Every run of a materialized view
- Runs of streaming tables that have the "full refresh" flag

Prior to SPARK-52576, this "wipe out and replace" was implemented by:

- Truncating the table
- Altering the table to drop/update/add columns that don't match the columns in the DataFrame for the current run
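For illustration only, the "alter" step above amounts to diffing the table's current columns against the columns of the DataFrame for the current run. The sketch below is not the Spark implementation; `plan_column_changes` and the plain dict-of-types schema representation are hypothetical names and simplifications chosen for this example.

```python
from typing import Dict, List, Tuple


def plan_column_changes(
    existing: Dict[str, str], target: Dict[str, str]
) -> Dict[str, List]:
    """Compare two {column name: type} mappings and list the changes
    needed to turn `existing` into `target`.

    Hypothetical helper for illustration; not part of Spark.
    """
    # Columns present in the table but absent from the new DataFrame.
    drops: List[str] = [name for name in existing if name not in target]
    # Columns present in both whose type differs.
    updates: List[Tuple[str, str]] = [
        (name, typ)
        for name, typ in target.items()
        if name in existing and existing[name] != typ
    ]
    # Columns that only exist in the new DataFrame.
    adds: List[Tuple[str, str]] = [
        (name, typ) for name, typ in target.items() if name not in existing
    ]
    return {"drop": drops, "update": updates, "add": adds}


# Example schemas (made up for this sketch).
existing = {"id": "int", "name": "string", "legacy": "string"}
target = {"id": "bigint", "name": "string", "created_at": "timestamp"}

plan = plan_column_changes(existing, target)
print(plan)
```

With these example schemas, `legacy` is dropped, `id` is updated to `bigint`, and `created_at` is added, while `name` is left untouched.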

However, we discovered that this didn't work on Hive, so we moved to drop + recreate, which did work on Hive. Compared to truncate + alter, though, drop + recreate has some undesirable effects, e.g. it interrupts readers of the table and wipes away things like ACLs.
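As a rough illustration of the ACL point, the toy model below contrasts the two strategies against an in-memory stand-in for a catalog. `ToyTable`, `ToyCatalog`, and the ACL strings are all hypothetical; real catalogs differ in how they store permissions, so this only sketches why replacing the table object loses metadata that an in-place update keeps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyTable:
    """Hypothetical in-memory table: schema, data, and attached ACLs."""
    columns: Dict[str, str]
    rows: List[dict] = field(default_factory=list)
    acls: Set[str] = field(default_factory=set)


class ToyCatalog:
    """Hypothetical catalog used only to contrast the two strategies."""

    def __init__(self) -> None:
        self.tables: Dict[str, ToyTable] = {}

    def truncate_and_alter(self, name: str, columns: Dict[str, str]) -> None:
        # The table entry is updated in place, so metadata such as ACLs
        # stays attached to it.
        table = self.tables[name]
        table.rows.clear()
        table.columns = dict(columns)

    def drop_and_recreate(self, name: str, columns: Dict[str, str]) -> None:
        # The old entry is removed and a fresh one is created, so anything
        # attached to the old entry (here: ACLs) is gone.
        del self.tables[name]
        self.tables[name] = ToyTable(columns=dict(columns))


catalog = ToyCatalog()
catalog.tables["sales"] = ToyTable(
    columns={"id": "int"},
    rows=[{"id": 1}],
    acls={"analysts:SELECT"},
)

catalog.truncate_and_alter("sales", {"id": "bigint"})
acls_after_truncate = set(catalog.tables["sales"].acls)

catalog.drop_and_recreate("sales", {"id": "bigint"})
acls_after_recreate = set(catalog.tables["sales"].acls)

print(acls_after_truncate, acls_after_recreate)
```

In this toy model both strategies leave the table empty with the new schema; only truncate + alter keeps the grant.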

This Hive behavior was fixed here: #51007.

So now we can switch back to truncate + alter.

Does this PR introduce any user-facing change?

Yes, described above

How was this patch tested?

Existing tests

Was this patch authored or co-authored using generative AI tooling?

This reverts commit 8b43757.

dongjoon-hyun · 2025-07-15T15:59:43Z

cc @szehon-ho , too

szehon-ho

Great, it works.

szehon-ho · 2025-07-15T18:25:57Z

One minor note: the Hive change that allows replacing columns is actually in #51373.

cloud-fan · 2025-07-15T18:31:54Z

thanks, merging to master!

This reverts commit 8b43757.

### What changes were proposed in this pull request?

Reverts SPARK-52576. I.e. truncates + alters instead of drop + recreate, for materialized views and full refreshes.

### Why are the changes needed?

Some pipeline runs result in wiping out and replacing all the data for a table:

- Every run of a materialized view
- Runs of streaming tables that have the "full refresh" flag

Prior to SPARK-52576, this "wipe out and replace" was implemented by:

- Truncating the table
- Altering the table to drop/update/add columns that don't match the columns in the DataFrame for the current run

However, we discovered that this didn't work on Hive. So we moved to drop + recreate, which did work on Hive. However, compared to truncate + alter, drop + recreate has some undesirable effects. E.g. it interrupts readers of the table and wipes away things like ACLs.

This Hive behavior was fixed here: apache#51007.

So now we can switch back to truncate + alter.

### Does this PR introduce _any_ user-facing change?

Yes, described above

### How was this patch tested?

Existing tests

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#51497 from sryza/revert-drop-recreate.

Authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

Revert "[SPARK-52576][SDP] Drop/recreate on full refresh and MV update"

c62c4ed

This reverts commit 8b43757.

sryza requested review from gengliangwang and cloud-fan July 15, 2025 15:53

github-actions bot added the SQL label Jul 15, 2025

szehon-ho approved these changes Jul 15, 2025

dongjoon-hyun approved these changes Jul 15, 2025

cloud-fan approved these changes Jul 15, 2025

cloud-fan closed this in eaf2017 Jul 15, 2025
