`<regex>`: Process greedy simple loops non-recursively #5790

muellerj2 · 2025-10-18T19:58:16Z

Towards #997 and #1528.

This implements processing of greedy simple loops in a non-recursive way. After this PR, we actually do greedy matching when greedy quantifiers are used. But in exchange, we have to store more frames in the _Frames vector. But I think we should now take the chance to actually process the regex as the drafter of the regex requested. (Before this PR, "more memory" would have meant "more stack", so the correct processing order would have resulted in many more stack overflows.)

The greedy processing is implemented as follows:

In _N_rep, we first try to match the repeated pattern, if the maximum number of reps is greater than zero. We also push a frame with opcode _Loop_simple_greedy in this case if the minimum number of reps is zero. If the maximum number of reps is zero, we instead try to match the remainder of the regex.
In _N_end_rep, we also first try to match the repeated pattern, if the maximum number of reps hasn't been reached yet, and push a frame with opcode _Loop_simple_greedy if the minimum number of reps has been reached. If the maximum number of reps has been reached, we instead try to match the remainder of the regex. We also avoid UB now by checking for potential overflow of _Loop_idx.
During backtracking, we try to match the tail when processing _Loop_simple_greedy if matching has failed, resetting the position in the input and the state of capturing groups appropriately. (We do not have to handle _Longest here because we always perform non-greedy matching when _Longest is true.) As already argued in <regex>: Process minimum number of reps in simple loops non-recursively #5762, we don't have to restore loop state here because simple loops are branchless and non-reentrant.

The increase in size of the _Frames vector is no longer accurately reflected by the preexisting stack usage counter. But this is deliberate choice to preserve backwards compatibility: If the stack usage count were increased as well, we might throw a regex_error(error_stack) on inputs that were previously accepted.

It is possible to avoid this increase in size of the _Frames vector. But this would majorly complicate this PR and a few more changes I consider more relevant, so I would like to defer this to a later PR.

I felt again that the existing test coverage was a bit insufficient, so I added a few more tests. This includes tests that greedy quantifiers and non-greedy quantifiers (and leftmost-longest mode) can yield matches of different length when searching.

stl/inc/regex

tests/std/tests/VSO_0000000_regex_use/test.cpp

StephanTLavavej · 2025-10-20T20:58:16Z

Thanks as always for the exceptionally clear writeup, careful changes, and added test coverage! 😻

I really appreciate the attention to avoiding disruption here:

The increase in size of the _Frames vector is no longer accurately reflected by the preexisting stack usage counter. But this is deliberate choice to preserve backwards compatibility: If the stack usage count were increased as well, we might throw a regex_error(error_stack) on inputs that were previously accepted.

I pushed a comment typo fix and minor test fixes. Also I noticed how you predict what the PR number is going to be, so your test mentions it correctly on the first try 😹

<regex>: Process greedy simple loops non-recursively

e2c6bc1

muellerj2 requested a review from a team as a code owner October 18, 2025 19:58

github-project-automation bot added this to STL Code Reviews Oct 18, 2025

github-project-automation bot moved this to Initial Review in STL Code Reviews Oct 18, 2025

StephanTLavavej added enhancement Something can be improved regex meow is a substring of homeowner labels Oct 20, 2025

StephanTLavavej self-assigned this Oct 20, 2025

StephanTLavavej added 4 commits October 20, 2025 13:19

Comment cleanups.

e341706

Add missing options.

f0f2943

Drop duplicated test.

142b9c1

Add test cases.

84dc9ae

StephanTLavavej reviewed Oct 20, 2025

View reviewed changes

StephanTLavavej approved these changes Oct 20, 2025

View reviewed changes

StephanTLavavej removed their assignment Oct 20, 2025

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Oct 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`<regex>`: Process greedy simple loops non-recursively #5790

`<regex>`: Process greedy simple loops non-recursively #5790

Uh oh!

muellerj2 commented Oct 18, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

StephanTLavavej commented Oct 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

<regex>: Process greedy simple loops non-recursively #5790

Are you sure you want to change the base?

<regex>: Process greedy simple loops non-recursively #5790

Uh oh!

Conversation

muellerj2 commented Oct 18, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

StephanTLavavej commented Oct 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`<regex>`: Process greedy simple loops non-recursively #5790

`<regex>`: Process greedy simple loops non-recursively #5790