-
Notifications
You must be signed in to change notification settings - Fork 976
Align decimal dtypes in predicate before conditional join #20060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align decimal dtypes in predicate before conditional join #20060
Conversation
Signed-off-by: Matthew Murray <[email protected]>
/ok to test d559636 |
/ok to test 8ff793a |
/ok to test d8b24c9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, I think this makes sense. Assuming libcudf doesn't support comparisons between decimals and floats here, then I don't think we have a better option than casting the predicates (and just the predicates, not the columns), and that will look a bit messy.
Do we need to add tests for this, or is it already covered?
plc.traits.is_fixed_point(left.plc_type) | ||
and plc.traits.is_fixed_point(right.plc_type) | ||
): | ||
raise ValueError("Requires inputs to be decimal types.") # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test that exercises this directly (not through the polars API) and remove the pragma
? By directly using
DataType.common_decimal_type(DataType(pl.Float64()), DataType(pl.Float64())
and the combinations with one, but not two decimals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, but we only need to test for the raise
(casted := col.astype(target)) | ||
and Column(casted.obj, dtype=casted.dtype, name=col.name) | ||
if (target := casts.get(col.name)) is not None | ||
else Column(col.obj, dtype=col.dtype, name=col.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably me being thick, but I'm struggling with this.
I think the main thing confusing me is the casted := col.astype(target) and COlumn(casted.obj, dtype=casted.dtype, name=col.name
). I can't figure out what the and
is doing there :)
columns = []
for col in df.columns:
if target := casts.get(col.name) is not None:
casted = col.astype(obj)
columns.append(Column(casted.obj, dtype=casted.dtype, name=col.name))
else:
columns.append(Column(col.obj, dtype=col.dtype, name=col.name)
# or just columns.append(col) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to do something cute with the walrus-and trick just to keep it in a one-liner since we know col.astype(...)
is True, but I agree it’s confusing and not worth it. :)
/ok to test 623ab7c |
/ok to test 176654a |
/ok to test 3752341 |
|
||
if ( | ||
left_type.id() != target.id() or left_type.scale() != target.scale() | ||
): # pragma: no cover; no test yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the reviewer: We need this cast for Q11, to work. But I haven't been able to reproduce the failure outside of Q11 (in an actual test). I'm planning on leaving this for now and following up with a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref #20213
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this looks nice. It does a good job of scoping these special casts to the spots where they're needed.
/ok to test 6136d0d |
/merge |
Description
Contributes to #19939.
Closes #20056. I don't think supporting Cast nodes is something we want to support in libcudf AST. So I think this PR which adds casts ahead of time is the best we can do at the moment.
Checklist