Conversation

@slfan1989
Contributor

@slfan1989 slfan1989 commented Aug 23, 2025

What changes were proposed in this pull request?

This PR adds unit tests for truncate(...) partition transforms now that SPARK-40295 has been released. Truncate-based partitioning is not yet supported by storage-partitioned joins (SPJ), so this PR covers only non-SPJ scenarios (e.g., correctness of query results).

Why are the changes needed?

To validate that truncate transforms work as expected in Iceberg tables and Spark queries. Full SPJ support for truncate requires SPARK-50593; once that issue is resolved, we can extend the tests to cover shuffle reduction with SPJ enabled.
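
For readers unfamiliar with the transform, the following is a minimal sketch of what a truncate transform computes, following the definition in the Iceberg table spec: strings keep at most W leading characters, and integers are rounded down to a multiple of W. The class and method names are hypothetical, and this is an illustration rather than Iceberg's own implementation.

```java
public class TruncateSketch {
  // Hypothetical helper: keep at most `width` Unicode code points of the string.
  static String truncate(int width, String value) {
    int codePoints = value.codePointCount(0, value.length());
    if (codePoints <= width) {
      return value;
    }
    return value.substring(0, value.offsetByCodePoints(0, width));
  }

  // Hypothetical helper: round down to a multiple of `width`.
  // Math.floorMod keeps the remainder non-negative, so negative values round toward
  // negative infinity. Overflow near Integer.MIN_VALUE is ignored in this sketch.
  static int truncate(int width, int value) {
    return value - Math.floorMod(value, width);
  }

  public static void main(String[] args) {
    System.out.println(truncate(4, "software")); // soft
    System.out.println(truncate(4, "hr"));       // hr
    System.out.println(truncate(10, 37));        // 30
    System.out.println(truncate(10, -1));        // -10
  }
}
```

With PARTITIONED BY (truncate(4, dep)), a row whose dep is 'software' would therefore land in the partition keyed by 'soft'.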

@github-actions github-actions bot added the spark label Aug 23, 2025
@slfan1989
Contributor Author

@huaxingao Could you please help review this PR? Many thanks! While checking the SPJ unit tests, I noticed a TODO: add tests for truncate transforms once SPARK-40295 is released. SPARK-40295 has already been released, but after verification I found that truncate is still not fully supported in SPJ scenarios. Full support will only be available after SPARK-50593. I have added a new unit test to validate truncate and updated the corresponding TODO description.

@slfan1989 slfan1989 changed the title from "Spark. Add truncate transform tests (non-SPJ only)." to "Spark 4.0: Add truncate transform tests (non-SPJ only)." Aug 23, 2025

- // TODO: add tests for truncate transforms once SPARK-40295 is released
+ // TODO: Truncate is not supported by SPJ yet (even after SPARK-40295).
+ // Add tests for full support once SPARK-50593 is resolved.
Contributor

Shall we change this to something like:

// TODO: SPJ does not currently leverage truncate(...) partition transforms for partition alignment.
// SPARK-40295 improved related areas, but full truncate support is tracked in SPARK-50593.
// This test documents current behavior; update/extend once SPARK-50593 lands.

Contributor Author

Thank you for your suggestion! I will improve the code accordingly.

sql(
"CREATE TABLE %s (id BIGINT, int_col INT, dep STRING) "
+ "USING iceberg "
+ "PARTITIONED BY (truncate(4, dep)) "
Contributor

The test name says ...IncompatibleTruncateSpecs but both tables use the same partitioning (truncate(4, dep)). Could you change the test name?
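
To illustrate why identical specs on both tables matter, here is a small sketch of the idea behind a storage-partitioned join: when both sides are grouped by the same truncate(4, dep) key, matching rows can only sit in partitions with equal keys, so the join can proceed pair by pair without redistributing rows. All names are hypothetical, the data is made up, and the helper assumes ASCII strings; it does not reflect how Spark or Iceberg implement this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AlignedJoinSketch {
  // Hypothetical stand-in for truncate(4, dep); assumes ASCII input.
  static String partitionKey(String dep) {
    return dep.length() <= 4 ? dep : dep.substring(0, 4);
  }

  // Groups values by their truncated key, like files grouped by partition.
  static Map<String, List<String>> partition(List<String> deps) {
    Map<String, List<String>> parts = new TreeMap<>();
    for (String dep : deps) {
      parts.computeIfAbsent(partitionKey(dep), k -> new ArrayList<>()).add(dep);
    }
    return parts;
  }

  // Joins on dep by visiting only partition pairs that share the same key.
  static List<String> joinAligned(
      Map<String, List<String>> left, Map<String, List<String>> right) {
    List<String> matches = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : left.entrySet()) {
      List<String> other = right.get(entry.getKey());
      if (other == null) {
        continue; // no partition with this key on the other side
      }
      for (String l : entry.getValue()) {
        for (String r : other) {
          if (l.equals(r)) {
            matches.add(l);
          }
        }
      }
    }
    return matches;
  }
}
```

If the two tables used different truncate widths, the partition keys would no longer line up one-to-one and this pairwise approach would not apply directly.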

Contributor Author

I agree with your suggestion and have renamed the method to testJoinWithTruncatePartitioning.

sql("INSERT INTO %s VALUES (3L, 300, 'software')", tableName(OTHER_TABLE_NAME));

assertPartitioningAwarePlan(
3, /* expected num of shuffles with SPJ */
Contributor

Shall we add a comment?

// TODO(SPARK-50593): Once truncate transforms are leveraged by SPJ, expected shuffles with SPJ should drop to 1. Update expectedNumShufflesWithSPJ accordingly.
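
The difference between 3 and 1 shuffles can be pictured with a toy plan tree. The sketch below assumes the query is a two-table join followed by a global sort, which this thread does not show; PlanNode and the node names are hypothetical stand-ins, not Spark classes, and the real assertPartitioningAwarePlan helper inspects Spark's physical plan rather than a structure like this.

```java
import java.util.Arrays;
import java.util.List;

public class ShuffleCountSketch {
  // Hypothetical stand-in for a physical plan node.
  static final class PlanNode {
    final String name;
    final List<PlanNode> children;

    PlanNode(String name, PlanNode... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }
  }

  // Counts nodes named "Exchange" (shuffles) anywhere in the tree.
  static int countShuffles(PlanNode node) {
    int count = "Exchange".equals(node.name) ? 1 : 0;
    for (PlanNode child : node.children) {
      count += countShuffles(child);
    }
    return count;
  }

  // Assumed shape today: each join side is shuffled, plus one shuffle for the sort.
  static PlanNode currentPlan() {
    return new PlanNode("Sort",
        new PlanNode("Exchange",
            new PlanNode("SortMergeJoin",
                new PlanNode("Exchange", new PlanNode("Scan")),
                new PlanNode("Exchange", new PlanNode("Scan")))));
  }

  // Assumed shape once SPJ can use truncate: only the sort's shuffle remains.
  static PlanNode alignedPlan() {
    return new PlanNode("Sort",
        new PlanNode("Exchange",
            new PlanNode("SortMergeJoin", new PlanNode("Scan"), new PlanNode("Scan"))));
  }

  public static void main(String[] args) {
    System.out.println(countShuffles(currentPlan())); // 3
    System.out.println(countShuffles(alignedPlan())); // 1
  }
}
```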

- Renamed the method from testJoinsWithIncompatibleTruncateSpecs to testJoinWithTruncatePartitioning
  since both tables use the same truncate(4, dep).

- Added TODO comment referencing SPARK-40295 and SPARK-50593 to document current limitations
  and indicate future improvements.
// This test documents current behavior; update/extend once SPARK-50593 lands.
@TestTemplate
public void testJoinWithTruncatePartitioning() {

Contributor

nit: remove empty line?

Contributor Author

Thanks for your suggestion, I will improve this code.

Contributor

@huaxingao huaxingao left a comment

LGTM

sql("INSERT INTO %s VALUES (2L, 200, 'software')", tableName(OTHER_TABLE_NAME));
sql("INSERT INTO %s VALUES (3L, 300, 'software')", tableName(OTHER_TABLE_NAME));

// TODO(SPARK-50593): Once truncate transforms are leveraged by SPJ, expected shuffles with SPJ
Member

I'm not sure what we gain from these tests before SPARK-50593 goes in. I would probably hold off on this unless we think we are getting some other coverage here.

Contributor Author

Thank you for your feedback! I understand your point and agree to some extent. However, I believe we should at least improve the comments to indicate that, even though SPARK-40295 is ready, we still need to wait for SPARK-50593 before we can proceed with this TODO.

Let me briefly explain the reason for this improvement:

When I first discovered this TODO, I checked SPARK-40295 and found that it had been released. I then tested it locally and found that, even with SPARK-40295 in place, truncate still cannot take advantage of Spark's SPJ support. My initial thought was to simply update the TODO, but I was concerned that a reviewer might ask how I had verified the issue, so I decided to document my verification tests here. Currently, this test can only verify that, with SPARK-40295 alone, truncate still does not support SPJ.

@slfan1989
Contributor Author

Thank you very much to @huaxingao, @singhpk234, and @RussellSpitzer for reviewing this PR and for your time.

The comment below is no longer accurate, and I personally think it’s necessary to modify it.

// TODO: add tests for truncate transforms once SPARK-40295 is released

However, I was also thinking that submitting a PR just to modify a single comment line might seem a bit trivial.

If I can find another relevant change, I’d be happy to include that as well.

Additionally, I’ve added the JIRA link for SPARK-50593 in SPARK-40295 so that other developers can more easily find the related issue description.

@slfan1989
Contributor Author

After SPARK-50593 is addressed, it would be better to update and improve this unit test. Therefore, I will close this PR for now and follow up on SPARK-50593, with the aim of completing support for the truncate function in SPJ scenarios. Thanks again to @huaxingao, @singhpk234, and @RussellSpitzer for your attention and comments!

@slfan1989 slfan1989 closed this Sep 4, 2025