@haohuaijin haohuaijin commented Jul 13, 2025

Which issue does this PR close?

Rationale for this change

Improve LiteralGuarantee to handle cases such as
(a=1 AND b=1) OR (a=2 AND b=3), or (a IN ("foo", "bar") AND b = 5) OR (a IN ("bar") AND b=6).

What changes are included in this PR?

Add the logic to extract guarantees from (a=1 AND b=1) OR (a=2 AND b=3), producing in_guarantee("a", [1, 2]) and in_guarantee("b", [1, 3]). The implementation:

  1. Splits each disjunction into its constituent conjunctions and keeps only the equality operations.
  2. Uses the find_common_columns function to identify the columns present in all term sets.
  3. Iterates through the common columns and builds a guarantee for each one.
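The three steps above can be sketched in a simplified, standalone form. This is only an illustration under stated assumptions, not the code from the PR: the `Term` enum, the `TermSet` alias, and the `equality_terms` and `in_guarantees` functions are hypothetical stand-ins for DataFusion's physical expressions and `LiteralGuarantee`, literals are restricted to `i64`, and `find_common_columns` is a re-implementation that shares only its name with the function mentioned in the description.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified predicate term (not a DataFusion type).
#[derive(Debug, Clone)]
pub enum Term {
    /// `column = literal`
    Eq(String, i64),
    /// `column IN (literals)`
    InList(String, Vec<i64>),
    /// Anything else (`!=`, `NOT IN`, `>`, ...); never yields a guarantee.
    Other,
}

/// The equality terms of one AND branch, as (column, allowed literals).
pub type TermSet = Vec<(String, Vec<i64>)>;

/// Step 1: keep only the equality-style terms of one AND branch.
pub fn equality_terms(branch: &[Term]) -> TermSet {
    branch
        .iter()
        .filter_map(|t| match t {
            Term::Eq(col, v) => Some((col.clone(), vec![*v])),
            Term::InList(col, vs) => Some((col.clone(), vs.clone())),
            Term::Other => None,
        })
        .collect()
}

/// Step 2: columns that appear in every term set.
pub fn find_common_columns(termsets: &[TermSet]) -> BTreeSet<String> {
    let mut per_branch = termsets
        .iter()
        .map(|ts| ts.iter().map(|(c, _)| c.clone()).collect::<BTreeSet<String>>());
    let first = match per_branch.next() {
        Some(cols) => cols,
        None => return BTreeSet::new(),
    };
    per_branch.fold(first, |acc, cols| acc.intersection(&cols).cloned().collect())
}

/// Step 3: for each common column, union its literals across all branches.
/// The input is an OR of AND branches. The union is a sound
/// over-approximation: if the whole predicate is true, one branch is true,
/// so the column equals one of that branch's literals.
pub fn in_guarantees(branches: &[Vec<Term>]) -> BTreeMap<String, BTreeSet<i64>> {
    let termsets: Vec<TermSet> = branches.iter().map(|b| equality_terms(b)).collect();
    let mut guarantees = BTreeMap::new();
    for col in find_common_columns(&termsets) {
        let values: BTreeSet<i64> = termsets
            .iter()
            .flatten()
            .filter(|(c, _)| *c == col)
            .flat_map(|(_, vs)| vs.iter().copied())
            .collect();
        guarantees.insert(col, values);
    }
    guarantees
}
```

In this sketch, passing the two branches of (a=1 AND b=1) OR (a=2 AND b=3) to `in_guarantees` yields the set {1, 2} for "a" and {1, 3} for "b". A column that is constrained only by a non-equality term in some branch is absent from that branch's term set, so it is not common to all branches and receives no guarantee.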

Are these changes tested?

Yes, this PR adds some test cases.

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jul 13, 2025
@haohuaijin
Contributor Author

cc @debajyoti-truefoundry @alamb

@alamb alamb changed the title from "feat: imporve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3)" to "feat: improve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3)" Jul 14, 2025
Contributor

@alamb alamb left a comment

Thanks @haohuaijin -- this looks like a great start to me

I think we need a few more tests to show it doesn't incorrectly pick up literal guarantees for NOT IN / != terms, but otherwise I think it is good

@haohuaijin
Copy link
Contributor Author

haohuaijin commented Jul 15, 2025

Thanks for your review @alamb, I addressed your comment in 89dc6be

@alamb
Contributor

alamb commented Jul 18, 2025

I am sorry @haohuaijin -- I will review this more carefully soon. I just need to sit down and think through the details to make sure it doesn't have any correctness problems

Contributor

@alamb alamb left a comment

Thank you @haohuaijin -- I reviewed the code and tests carefully and I think this PR looks good to me.

It is a very nice improvement

@alamb alamb added the performance Make DataFusion faster label Jul 21, 2025
@haohuaijin
Contributor Author

Thanks for your reviews @alamb

@alamb alamb merged commit 3c95281 into apache:main Jul 23, 2025
27 checks passed
@haohuaijin haohuaijin deleted the hj/guarantee-optimize branch July 23, 2025 15:22
adriangb pushed a commit to pydantic/datafusion that referenced this pull request Jul 28, 2025
…=2 AND b=3)` (apache#16762)

* feat: imporve LiteralGuarantee for the case like (a=1 AND b=1) OR (a=2 AND b=3)

* support inlist

* fmt and clippy

---------

Co-authored-by: Andrew Lamb <[email protected]>
Development

Successfully merging this pull request may close these issues.

Bloom filters are unused for certain where clause patterns (improve LiteralGuarantee)
2 participants