@rich-t-kid-datadog rich-t-kid-datadog commented Jul 15, 2025

Which issue does this PR close?

This PR works towards closing the larger run-end encoded (REE) epic.

Rationale for this change

Adds the following operations for the REE data type:

  • Sum
  • Sum_checked
  • IS DISTINCT FROM
  • Max/Min
  • Distinct (Arrow already handles this apparently)
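To illustrate why these aggregates can run on the encoded form, here is a minimal sketch that computes them run by run instead of row by row. It is not the code in this PR and does not use the arrow-rs types: `ReeInt64` and its methods are hypothetical stand-ins, and run ends are assumed to be strictly increasing exclusive end offsets.

```rust
/// Hypothetical stand-in for a run-end encoded Int64 column (NOT the arrow-rs type).
/// `run_ends[i]` is the exclusive logical end index of run `i` (assumed strictly
/// increasing); `values[i]` is repeated across that run, `None` meaning a null run.
pub struct ReeInt64 {
    pub run_ends: Vec<usize>,
    pub values: Vec<Option<i64>>,
}

impl ReeInt64 {
    /// Yields `(run_length, value)` for every run without expanding the column.
    fn runs(&self) -> impl Iterator<Item = (usize, Option<i64>)> + '_ {
        let mut start = 0usize;
        self.run_ends
            .iter()
            .zip(self.values.iter())
            .map(move |(&end, &value)| {
                let len = end - start;
                start = end;
                (len, value)
            })
    }

    /// Wrapping sum: each non-null run contributes `value * run_length`.
    /// Returns `None` when there are no non-null rows.
    pub fn sum(&self) -> Option<i64> {
        let mut acc: Option<i64> = None;
        for (len, value) in self.runs() {
            if let Some(v) = value {
                let contribution = v.wrapping_mul(len as i64);
                acc = Some(acc.unwrap_or(0).wrapping_add(contribution));
            }
        }
        acc
    }

    /// Checked sum: reports an error instead of wrapping. Because it multiplies
    /// per run, it may flag overflow in cases where a row-by-row checked sum
    /// would still succeed.
    pub fn sum_checked(&self) -> Result<Option<i64>, String> {
        let mut acc: Option<i64> = None;
        for (len, value) in self.runs() {
            if let Some(v) = value {
                let len = i64::try_from(len).map_err(|_| "run length too large".to_string())?;
                let contribution = v
                    .checked_mul(len)
                    .ok_or_else(|| "overflow in run product".to_string())?;
                let total = acc
                    .unwrap_or(0)
                    .checked_add(contribution)
                    .ok_or_else(|| "overflow in sum".to_string())?;
                acc = Some(total);
            }
        }
        Ok(acc)
    }

    /// Min and max only need one look at each run's value; run lengths are irrelevant.
    pub fn min(&self) -> Option<i64> {
        self.runs().filter_map(|(_, value)| value).min()
    }

    pub fn max(&self) -> Option<i64> {
        self.runs().filter_map(|(_, value)| value).max()
    }
}
```

For a column with runs `[2, 2, 2]`, `[null, null]`, `[-1, -1, -1, -1]`, the sketch touches three values rather than nine rows, so the work scales with the number of runs rather than the logical length.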

What changes are included in this PR?

Allows REE columns to be used with the operations listed above correctly and efficiently.

Are these changes tested?

Yes, comprehensive tests have been added in arrow-ord/src/cmp.rs and arrow-arith/src/aggregate.rs:

  • Basic functionality tests: Same values, different values, mixed scenarios
  • Edge case tests: Empty arrays, single runs, all nulls, mixed nulls and values
  • Data type tests: Float64 and Timestamp types
  • Both operations: distinct and not_distinct

The tests verify that REE distinct operations work correctly without array expansion and handle null values properly.
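The null handling these tests exercise can be sketched as follows. This is an illustration under stated assumptions, not the implementation in arrow-ord/src/cmp.rs: `ReeCol`, `distinct`, and `not_distinct` here are hypothetical plain-Rust stand-ins, run ends are assumed strictly increasing, and the result is materialized as one `bool` per logical row. The two inputs are walked in lockstep so each overlapping segment of runs is compared once.

```rust
/// Hypothetical stand-in for a run-end encoded Int64 column (NOT the arrow-rs type).
/// `run_ends[i]` is the exclusive logical end of run `i`, assumed strictly increasing.
pub struct ReeCol {
    pub run_ends: Vec<usize>,
    pub values: Vec<Option<i64>>,
}

impl ReeCol {
    /// Logical (expanded) length of the column.
    fn len(&self) -> usize {
        self.run_ends.last().copied().unwrap_or(0)
    }
}

/// Shared walk over both columns; `want_distinct` selects which result to emit.
fn compare(lhs: &ReeCol, rhs: &ReeCol, want_distinct: bool) -> Result<Vec<bool>, String> {
    let len = lhs.len();
    if len != rhs.len() {
        return Err("columns have different logical lengths".to_string());
    }
    let mut out = Vec::with_capacity(len);
    let (mut i, mut j, mut pos) = (0usize, 0usize, 0usize);
    while pos < len {
        // The current segment ends where the shorter of the two active runs ends.
        let end = lhs.run_ends[i].min(rhs.run_ends[j]);
        // IS DISTINCT FROM: null vs null is false, null vs value is true.
        // `Option`'s `!=` already has exactly these semantics.
        let is_distinct = lhs.values[i] != rhs.values[j];
        out.extend(std::iter::repeat(is_distinct == want_distinct).take(end - pos));
        pos = end;
        if lhs.run_ends[i] == end {
            i += 1;
        }
        if rhs.run_ends[j] == end {
            j += 1;
        }
    }
    Ok(out)
}

/// Null-aware "IS DISTINCT FROM", one result per logical row.
pub fn distinct(lhs: &ReeCol, rhs: &ReeCol) -> Result<Vec<bool>, String> {
    compare(lhs, rhs, true)
}

/// Null-aware "IS NOT DISTINCT FROM", one result per logical row.
pub fn not_distinct(lhs: &ReeCol, rhs: &ReeCol) -> Result<Vec<bool>, String> {
    compare(lhs, rhs, false)
}
```

The number of value comparisons is bounded by the total number of runs in both inputs, which is where the saving over an expanded comparison would come from when values repeat.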

Are there any user-facing changes?

Performance improvement: REE distinct operations are now much faster for datasets with repeated values
No API changes: Existing distinct() and not_distinct() functions work the same way but are now more efficient for REE arrays
No breaking changes: All existing functionality is preserved

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

…ncludes helper functions for expanding REE into logical representation
@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 15, 2025
@rich-t-kid-datadog rich-t-kid-datadog force-pushed the baah/AggregationSupport branch from 4ded72e to 05fb3c0 Compare July 17, 2025 18:47
@rich-t-kid-datadog rich-t-kid-datadog force-pushed the baah/AggregationSupport branch from 05fb3c0 to 72bd81a Compare July 17, 2025 18:52
@rich-t-kid-datadog rich-t-kid-datadog force-pushed the baah/AggregationSupport branch from 5688ad3 to 9d00687 Compare July 24, 2025 15:01
@rich-t-kid-datadog rich-t-kid-datadog changed the title [Draft] implements Sum,sum_checked,min,max,is Distict,inverse for REE. Implements Sum,sum_checked,min,max,is Distict,inverse for REE. Jul 24, 2025
@rich-t-kid-datadog rich-t-kid-datadog marked this pull request as ready for review July 25, 2025 13:54
Labels

arrow Changes to the arrow crate

3 participants