davidwendt
Contributor

@davidwendt davidwendt commented Sep 26, 2025

Description

Improves the performance of cudf::strings::contains_re by setting only the state values that are needed during matching.
The regex state includes positional values that record where the match occurs (2 ints). contains_re does not need these values, so they no longer have to be written to or read from memory. This saves some memory-access overhead in the regex state engine.
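
The idea can be sketched in plain C++ roughly as follows. This is a minimal illustration under stated assumptions, not the libcudf implementation: thread_state, range, set, and the enumerator names are hypothetical, and only the positional template parameter echoes the snippet discussed in the review below.

```cpp
#include <cassert>

// Hypothetical tag: does the caller need to know where the match occurred?
enum class positional : bool { no, yes };

// Begin/end offsets of a match (the "2 ints" mentioned above).
struct range {
  int begin;
  int end;
};

template <positional P>
struct thread_state;

// Variant for callers that need the match position.
template <>
struct thread_state<positional::yes> {
  int inst_id;
  range pos;
  void set(int id, int begin, int end)
  {
    assert(begin <= end);
    inst_id = id;
    pos     = range{begin, end};
  }
};

// Variant for a contains-style query: the range is never stored,
// so there is nothing extra to write to or read from memory.
template <>
struct thread_state<positional::no> {
  int inst_id;
  void set(int id, int /*begin*/, int /*end*/) { inst_id = id; }
};

static_assert(sizeof(thread_state<positional::no>) < sizeof(thread_state<positional::yes>),
              "the non-positional state should be the smaller one");
```

Selecting the variant at compile time means the contains path carries no runtime branch for the unused range values.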

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt self-assigned this Sep 26, 2025
@davidwendt davidwendt added labels Sep 26, 2025: 2 - In Progress (currently a work in progress); libcudf (affects libcudf C++/CUDA code); improvement (improvement / enhancement to an existing function); non-breaking (non-breaking change)

copy-pr-bot bot commented Sep 26, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@davidwendt
Contributor Author

/ok to test

@davidwendt davidwendt changed the title from "Improve performance of contains_re" to "Avoid accessing range values in cudf::strings::contains_re logic" Sep 26, 2025
@davidwendt
Contributor Author

Benchmark results show up to a 3.6x improvement (pattern 1, hit_rate 50, 2097152 rows):

## [0] NVIDIA RTX A6000

| row_width | num_rows | hit_rate | pattern |   Ref Time |   Cmp Time |           Diff |   %Diff |   x  |
|-----------|----------|----------|---------|------------|------------|----------------|---------|------|
|    32     |  32768   |    50    |    0    |  84.880 us |  66.226 us |     -18.654 us | -21.98% | 1.28 |
|    64     |  32768   |    50    |    0    |  82.817 us |  70.426 us |     -12.391 us | -14.96% | 1.18 |
|    128    |  32768   |    50    |    0    |  85.015 us |  72.737 us |     -12.279 us | -14.44% | 1.17 |
|    256    |  32768   |    50    |    0    |  89.628 us |  75.436 us |     -14.192 us | -15.83% | 1.19 |
|    32     |  262144  |    50    |    0    | 367.757 us | 237.065 us |    -130.692 us | -35.54% | 1.55 |
|    64     |  262144  |    50    |    0    | 472.409 us | 262.342 us |    -210.067 us | -44.47% | 1.80 |
|    128    |  262144  |    50    |    0    | 617.181 us | 332.456 us |    -284.724 us | -46.13% | 1.86 |
|    256    |  262144  |    50    |    0    | 675.445 us | 348.191 us |    -327.254 us | -48.45% | 1.94 |
|    32     | 2097152  |    50    |    0    |   2.549 ms |   1.555 ms |    -993.455 us | -38.98% | 1.64 |
|    64     | 2097152  |    50    |    0    |   3.668 ms |   1.689 ms |   -1978.677 us | -53.94% | 2.17 |
|    128    | 2097152  |    50    |    0    |   4.828 ms |   2.056 ms |   -2771.683 us | -57.41% | 2.35 |
|    256    | 2097152  |    50    |    0    |   5.154 ms |   2.162 ms |   -2992.024 us | -58.06% | 2.38 |
|    32     |  32768   |   100    |    0    |  57.110 us |  48.758 us |      -8.352 us | -14.62% | 1.17 |
|    64     |  32768   |   100    |    0    |  62.079 us |  52.679 us |      -9.401 us | -15.14% | 1.18 |
|    128    |  32768   |   100    |    0    |  64.030 us |  55.888 us |      -8.142 us | -12.72% | 1.15 |
|    256    |  32768   |   100    |    0    |  66.813 us |  57.991 us |      -8.822 us | -13.20% | 1.15 |
|    32     |  262144  |   100    |    0    | 249.820 us | 162.416 us |     -87.404 us | -34.99% | 1.54 |
|    64     |  262144  |   100    |    0    | 329.970 us | 196.571 us |    -133.399 us | -40.43% | 1.68 |
|    128    |  262144  |   100    |    0    | 494.158 us | 300.581 us |    -193.576 us | -39.17% | 1.64 |
|    256    |  262144  |   100    |    0    | 586.054 us | 347.662 us |    -238.392 us | -40.68% | 1.69 |
|    32     | 2097152  |   100    |    0    |   1.898 ms |   1.034 ms |    -863.843 us | -45.52% | 1.84 |
|    64     | 2097152  |   100    |    0    |   2.630 ms |   1.203 ms |   -1427.396 us | -54.26% | 2.19 |
|    128    | 2097152  |   100    |    0    |   3.939 ms |   2.174 ms |   -1765.225 us | -44.81% | 1.81 |
|    256    | 2097152  |   100    |    0    |   4.593 ms |   2.606 ms |   -1987.563 us | -43.27% | 1.76 |
|    32     |  32768   |    50    |    1    | 342.395 us | 219.890 us |    -122.505 us | -35.78% | 1.56 |
|    64     |  32768   |    50    |    1    | 636.982 us | 472.793 us |    -164.190 us | -25.78% | 1.35 |
|    128    |  32768   |    50    |    1    |   1.469 ms |   1.110 ms |    -359.114 us | -24.44% | 1.32 |
|    256    |  32768   |    50    |    1    |   2.874 ms |   2.119 ms |    -755.516 us | -26.28% | 1.36 |
|    32     |  262144  |    50    |    1    |   2.852 ms |   1.062 ms |   -1789.506 us | -62.75% | 2.69 |
|    64     |  262144  |    50    |    1    |   7.796 ms |   2.459 ms |   -5337.048 us | -68.46% | 3.17 |
|    128    |  262144  |    50    |    1    |  19.808 ms |   6.483 ms |  -13325.432 us | -67.27% | 3.06 |
|    256    |  262144  |    50    |    1    |  42.777 ms |  13.644 ms |  -29133.846 us | -68.11% | 3.14 |
|    32     | 2097152  |    50    |    1    |  23.584 ms |   7.517 ms |  -16066.176 us | -68.12% | 3.14 |
|    64     | 2097152  |    50    |    1    |  63.071 ms |  17.623 ms |  -45448.241 us | -72.06% | 3.58 |
|    128    | 2097152  |    50    |    1    | 164.530 ms |  45.835 ms | -118695.093 us | -72.14% | 3.59 |
|    256    | 2097152  |    50    |    1    | 357.181 ms | 100.121 ms | -257060.029 us | -71.97% | 3.57 |
|    32     |  32768   |   100    |    1    | 213.582 us | 158.035 us |     -55.548 us | -26.01% | 1.35 |
|    64     |  32768   |   100    |    1    | 408.740 us | 305.672 us |    -103.068 us | -25.22% | 1.34 |
|    128    |  32768   |   100    |    1    | 864.712 us | 642.476 us |    -222.236 us | -25.70% | 1.35 |
|    256    |  32768   |   100    |    1    |   1.792 ms |   1.272 ms |    -520.539 us | -29.04% | 1.41 |
|    32     |  262144  |   100    |    1    |   1.551 ms | 734.618 us |    -816.644 us | -52.64% | 2.11 |
|    64     |  262144  |   100    |    1    |   4.048 ms |   1.689 ms |   -2358.850 us | -58.28% | 2.40 |
|    128    |  262144  |   100    |    1    |  10.417 ms |   4.877 ms |   -5539.981 us | -53.18% | 2.14 |
|    256    |  262144  |   100    |    1    |  23.177 ms |  10.853 ms |  -12324.289 us | -53.17% | 2.14 |
|    32     | 2097152  |   100    |    1    |  12.294 ms |   5.432 ms |   -6862.297 us | -55.82% | 2.26 |
|    64     | 2097152  |   100    |    1    |  32.860 ms |  12.002 ms |  -20857.893 us | -63.47% | 2.74 |
|    128    | 2097152  |   100    |    1    |  86.033 ms |  37.852 ms |  -48180.712 us | -56.00% | 2.27 |
|    256    | 2097152  |   100    |    1    | 189.889 ms |  84.857 ms | -105031.737 us | -55.31% | 2.24 |
|    32     |  32768   |    50    |    2    |  54.310 us |  47.961 us |      -6.349 us | -11.69% | 1.13 |
|    64     |  32768   |    50    |    2    |  81.201 us |  74.307 us |      -6.894 us |  -8.49% | 1.09 |
|    128    |  32768   |    50    |    2    | 138.416 us | 120.827 us |     -17.589 us | -12.71% | 1.15 |
|    256    |  32768   |    50    |    2    | 235.375 us | 213.360 us |     -22.015 us |  -9.35% | 1.10 |
|    32     |  262144  |    50    |    2    | 230.575 us | 160.672 us |     -69.903 us | -30.32% | 1.44 |
|    64     |  262144  |    50    |    2    | 437.275 us | 270.724 us |    -166.551 us | -38.09% | 1.62 |
|    128    |  262144  |    50    |    2    | 804.462 us | 551.866 us |    -252.597 us | -31.40% | 1.46 |
|    256    |  262144  |    50    |    2    |   1.312 ms | 933.396 us |    -378.736 us | -28.86% | 1.41 |
|    32     | 2097152  |    50    |    2    |   1.591 ms | 990.718 us |    -600.409 us | -37.73% | 1.61 |
|    64     | 2097152  |    50    |    2    |   2.852 ms |   1.725 ms |   -1126.971 us | -39.52% | 1.65 |
|    128    | 2097152  |    50    |    2    |   5.190 ms |   3.371 ms |   -1819.045 us | -35.05% | 1.54 |
|    256    | 2097152  |    50    |    2    |   8.220 ms |   6.027 ms |   -2192.869 us | -26.68% | 1.36 |
|    32     |  32768   |   100    |    2    |  46.847 us |  41.622 us |      -5.225 us | -11.15% | 1.13 |
|    64     |  32768   |   100    |    2    |  51.117 us |  45.970 us |      -5.147 us | -10.07% | 1.11 |
|    128    |  32768   |   100    |    2    |  58.063 us |  52.556 us |      -5.507 us |  -9.48% | 1.10 |
|    256    |  32768   |   100    |    2    |  59.112 us |  53.186 us |      -5.926 us | -10.03% | 1.11 |
|    32     |  262144  |   100    |    2    | 187.527 us | 131.309 us |     -56.218 us | -29.98% | 1.43 |
|    64     |  262144  |   100    |    2    | 266.731 us | 164.929 us |    -101.802 us | -38.17% | 1.62 |
|    128    |  262144  |   100    |    2    | 443.275 us | 296.642 us |    -146.633 us | -33.08% | 1.49 |
|    256    |  262144  |   100    |    2    | 542.102 us | 357.287 us |    -184.815 us | -34.09% | 1.52 |
|    32     | 2097152  |   100    |    2    |   1.378 ms | 798.423 us |    -579.813 us | -42.07% | 1.73 |
|    64     | 2097152  |   100    |    2    |   2.057 ms |   1.030 ms |   -1027.071 us | -49.94% | 2.00 |
|    128    | 2097152  |   100    |    2    |   3.195 ms |   2.065 ms |   -1129.494 us | -35.35% | 1.55 |
|    256    | 2097152  |   100    |    2    |   3.973 ms |   2.530 ms |   -1443.464 us | -36.33% | 1.57 |
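
For readers checking the table, the derived columns appear to follow from the two timings as Diff = Cmp - Ref, %Diff = 100 * Diff / Ref, and x = Ref / Cmp. The sketch below recomputes them; comparison and compare are hypothetical helper names, not part of the benchmark tooling, and both times must be given in the same unit.

```cpp
#include <cassert>

// Derived benchmark columns for one table row.
struct comparison {
  double diff;      // Cmp Time - Ref Time (negative means faster)
  double pct_diff;  // diff as a percentage of Ref Time
  double speedup;   // Ref Time / Cmp Time (the "x" column)
};

// Both arguments must use the same time unit (e.g. microseconds).
inline comparison compare(double ref_time, double cmp_time)
{
  assert(ref_time > 0.0 && cmp_time > 0.0);
  double const diff = cmp_time - ref_time;
  return comparison{diff, 100.0 * diff / ref_time, ref_time / cmp_time};
}
```

Applied to the first row (84.880 us reference, 66.226 us compared), this gives a difference of about -18.654 us, about -21.98%, and a speedup of about 1.28, matching the table.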

@davidwendt davidwendt marked this pull request as ready for review October 3, 2025 13:52
@davidwendt davidwendt requested a review from a team as a code owner October 3, 2025 13:52
@davidwendt davidwendt added the 3 - Ready for Review (ready for review by team) label and removed the 2 - In Progress (currently a work in progress) label Oct 14, 2025
};

template <positional P>
struct reljunk;
Contributor

I feel like I'm missing something about this name :D
Is it rel_junk? If yes, why junk, and if not, what does it stand for?

Contributor Author

I believe it is re_lj_unk, where re is regex, lj is long-something, and unk is unknown.
The name comes from the code this is based on, so I'm mostly keeping parity since I don't have a better one.

Contributor

Ah, I thought it might be unknown, but could not come up with any meaning for the lj part. Thanks!

@mhaseeb123
Member

Do we need a test for this?

@davidwendt
Contributor Author

> Do we need a test for this?

This is not a new function but an improvement to contains_re, so the existing tests passing serves as verification.
The main difference is the performance boost, which is measured by the existing benchmarks; the results are posted above.

@GregoryKimball
Contributor

Here is a closer look at the results from Pattern 1 posted above:
[Image: Pattern 1 results]

Great to see much better scaling behavior at higher row counts for the new implementation!

@GregoryKimball GregoryKimball moved this to Burndown in libcudf Oct 17, 2025
