Conversation

Matt711
Contributor

@Matt711 Matt711 commented Sep 22, 2025

Description

Contributes to #19939.

Closes #20056. I don't think Cast nodes are something we want to support in the libcudf AST, so I think this PR, which adds the casts ahead of time, is the best we can do at the moment.
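
To illustrate what "adding casts ahead of time" could look like, here is a minimal pure-Python sketch. It is not the cudf-polars implementation: `DecimalType`, `common_decimal_type`, and `plan_casts` are hypothetical names, and the widening rule (keep the larger number of integer digits and the larger scale) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalType:
    """Hypothetical stand-in for a decimal dtype: total digits and fractional digits."""

    precision: int
    scale: int


def common_decimal_type(left: DecimalType, right: DecimalType) -> DecimalType:
    # Assumed widening rule: keep enough integer digits and enough
    # fractional digits to represent values of either input type.
    scale = max(left.scale, right.scale)
    integer_digits = max(left.precision - left.scale, right.precision - right.scale)
    return DecimalType(precision=integer_digits + scale, scale=scale)


def plan_casts(schema, predicate_pairs):
    """Map column name -> target type for columns compared in the predicate.

    Only columns whose type differs from the common type get an entry, so
    columns that already match are left untouched.
    """
    casts = {}
    for left_name, right_name in predicate_pairs:
        target = common_decimal_type(schema[left_name], schema[right_name])
        for name in (left_name, right_name):
            if schema[name] != target:
                casts[name] = target
    return casts


# Two decimal columns with different scales compared in a join predicate.
schema = {"a": DecimalType(15, 2), "b": DecimalType(18, 4)}
casts = plan_casts(schema, [("a", "b")])
```

In this sketch only column `a` needs a cast, because `b` already has the common type. The resulting `casts` mapping is the kind of name-to-target lookup that the reviewed code consults via `casts.get(col.name)`.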

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented Sep 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Sep 22, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Sep 22, 2025
@Matt711 Matt711 changed the title Fea/polars/pdsh decimals Align decimal dtypes in predicate before conditional join Sep 22, 2025
@Matt711 Matt711 added bug Something isn't working non-breaking Non-breaking change labels Sep 22, 2025
@Matt711
Contributor Author

Matt711 commented Sep 22, 2025

/ok to test d559636

@Matt711
Contributor Author

Matt711 commented Sep 29, 2025

/ok to test 8ff793a

@Matt711 Matt711 marked this pull request as ready for review September 30, 2025 20:33
@Matt711 Matt711 requested a review from a team as a code owner September 30, 2025 20:33
@Matt711 Matt711 requested review from vyasr and rjzamora September 30, 2025 20:33
@Matt711
Contributor Author

Matt711 commented Oct 1, 2025

/ok to test d8b24c9

Contributor

@TomAugspurger TomAugspurger left a comment

At a high level, I think this makes sense. Assuming libcudf doesn't support comparisons between decimals and floats here, then I don't think we have a better option than casting the predicates (and just the predicates, not the columns), and that will look a bit messy.

Do we need to add tests for this, or is it already covered?

    plc.traits.is_fixed_point(left.plc_type)
    and plc.traits.is_fixed_point(right.plc_type)
):
    raise ValueError("Requires inputs to be decimal types.")  # pragma: no cover
Contributor

Could you add a test that exercises this directly (not through the polars API) and remove the pragma? By directly using

DataType.common_decimal_type(DataType(pl.Float64()), DataType(pl.Float64()))

and the combinations with one, but not two decimals?

Contributor Author

Yup, but we only need to test for the raise
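
A direct test of the raise might be shaped like the sketch below. Everything here is a stand-in so the example runs on its own: `FakeDataType` is a hypothetical substitute for the cudf-polars `DataType`, and the local `common_decimal_type` only mirrors the assumed intent of the guard quoted above (both inputs must be decimals). A real test would call `DataType.common_decimal_type` instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeDataType:
    """Hypothetical stand-in for cudf-polars' DataType, for illustration only."""

    name: str
    is_decimal: bool


def common_decimal_type(left, right):
    # Assumed guard: both inputs must be decimal types.
    if not (left.is_decimal and right.is_decimal):
        raise ValueError("Requires inputs to be decimal types.")
    return left  # placeholder; the real resolution logic is out of scope here


def raises_value_error(left, right):
    """Return True if combining the two types raises ValueError."""
    try:
        common_decimal_type(left, right)
    except ValueError:
        return True
    return False


FLOAT = FakeDataType("Float64", is_decimal=False)
DECIMAL = FakeDataType("Decimal", is_decimal=True)

# Zero or one decimal input should raise; two decimals should not.
results = {
    "float, float": raises_value_error(FLOAT, FLOAT),
    "decimal, float": raises_value_error(DECIMAL, FLOAT),
    "float, decimal": raises_value_error(FLOAT, DECIMAL),
    "decimal, decimal": raises_value_error(DECIMAL, DECIMAL),
}
```

Covering both mixed orderings checks that the guard does not depend on which side is the non-decimal input.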

Comment on lines 1681 to 1684
(casted := col.astype(target))
and Column(casted.obj, dtype=casted.dtype, name=col.name)
if (target := casts.get(col.name)) is not None
else Column(col.obj, dtype=col.dtype, name=col.name)
Contributor

This is probably me being thick, but I'm struggling with this.

I think the main thing confusing me is the casted := col.astype(target) and Column(casted.obj, dtype=casted.dtype, name=col.name). I can't figure out what the and is doing there :)

columns = []
for col in df.columns:
    # Parenthesize the walrus so target holds the looked-up dtype, not a bool.
    if (target := casts.get(col.name)) is not None:
        casted = col.astype(target)
        columns.append(Column(casted.obj, dtype=casted.dtype, name=col.name))
    else:
        columns.append(Column(col.obj, dtype=col.dtype, name=col.name))
        # or just columns.append(col)  ?

Contributor Author

I was trying to do something cute with the walrus-and trick just to keep it in a one-liner, since we know the result of col.astype(...) is truthy, but I agree it's confusing and not worth it. :)
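
For readers unfamiliar with the trick, the following standalone sketch shows how `(x := f()) and g(x)` behaves in plain Python, and why it silently depends on the walrus result being truthy. It also shows why the parentheses in `(target := casts.get(...)) is not None` matter. The function and variable names are made up for illustration and do not come from the PR.

```python
def cast_with_and(value, target):
    # `(casted := ...) and f(casted)` evaluates the right-hand side only when
    # the walrus result is truthy; otherwise the falsy value itself is returned.
    return (casted := target(value)) and ("wrapped", casted)


truthy_case = cast_with_and("3", int)  # int("3") is truthy, so the tuple is built
falsy_case = cast_with_and("0", int)   # int("0") is falsy, so the `and` short-circuits

casts = {"a": "decimal"}

# Without parentheses, `:=` binds more loosely than `is not None`,
# so the name is bound to the comparison result, not the looked-up value.
if target_unparenthesized := casts.get("a") is not None:
    pass

# With parentheses, the name is bound to the looked-up value.
if (target_parenthesized := casts.get("a")) is not None:
    pass
```

The explicit `if`/`else` loop avoids both pitfalls, which is one reason it reads more safely than the one-liner.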

@Matt711
Contributor Author

Matt711 commented Oct 2, 2025

/ok to test 623ab7c

@Matt711
Contributor Author

Matt711 commented Oct 7, 2025

/ok to test 176654a

@Matt711
Contributor Author

Matt711 commented Oct 8, 2025

/ok to test 3752341

@Matt711 Matt711 requested a review from TomAugspurger October 8, 2025 17:20

if (
    left_type.id() != target.id() or left_type.scale() != target.scale()
):  # pragma: no cover; no test yet
Contributor Author

For the reviewer: We need this cast for Q11 to work, but I haven't been able to reproduce the failure outside of Q11 (in an actual test). I'm planning on leaving this for now and following up with a test.

Contributor Author

xref #20213

Contributor

@TomAugspurger TomAugspurger left a comment

Overall, I think this looks nice. It does a good job of scoping these special casts to the spots where they're needed.

@Matt711
Contributor Author

Matt711 commented Oct 8, 2025

/ok to test 6136d0d

@Matt711
Contributor Author

Matt711 commented Oct 8, 2025

/merge

@rapids-bot rapids-bot bot merged commit 01f4ad0 into rapidsai:branch-25.12 Oct 8, 2025
130 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Oct 8, 2025
@Matt711 Matt711 deleted the fea/polars/pdsh-decimals branch October 8, 2025 20:51
Labels
bug Something isn't working cudf-polars Issues specific to cudf-polars non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] Support Cast expressions/nodes in libcudf AST for conditional joins
2 participants