-
Notifications
You must be signed in to change notification settings - Fork 979
Avoid accessing range values in cudf::strings::contains_re logic #20122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test |
Benchmark results show up to 3x improvement
|
}; | ||
|
||
template <positional P> | ||
struct reljunk; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I'm missing something about this name :D
Is it rel_junk? If yes, why junk, and if not, what does it stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is re_lj_unk
where the re
is regex , lj
is long something, and unk
is unknown.
The name is from code that this is based on so I'm mostly keeping parity since I don't have a better one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I thought it might be unknown, but could not come up with any meaning for the lj part. Thanks!
Do we need a test for this? |
This is not a new function but an improvement for |
Description
Improves the performance for
cudf::strings::contains_re
by only setting necessary state values during matching.The regex state includes positional values where the match occurs (2 ints). These values are not needed by contains and so do not need to be written or read to/from memory. This saves some memory access overhead in the regex state engine.
Checklist