Commit b7575c6

gh-136618: lookbehind assertions doc at regex
1 parent 8ac7613 commit b7575c6

1 file changed: +67 -1 lines changed

Doc/howto/regex.rst

Lines changed: 67 additions & 1 deletion
@@ -7,7 +7,6 @@
 :Author: A.M. Kuchling <[email protected]>
 
 .. TODO:
-   Document lookbehind assertions
    Better way of displaying a RE, a string, and what it matches
    Mention optional argument to match.groups()
    Unicode (at least a reference)
@@ -1061,6 +1060,73 @@ end in either ``bat`` or ``exe``:
``.*[.](?!bat$|exe$)[^.]*$``


Lookbehind Assertions
---------------------

Lookbehind assertions work like lookahead assertions, except that they examine
the text before the current position instead of the text after it.  They are
available in both positive and negative form, and look like this:

``(?<=...)``
   Positive lookbehind assertion.  This succeeds if the contained regular
   expression, represented here by ``...``, successfully matches text ending at
   the current location, and fails otherwise.  Like a lookahead, it consumes
   nothing: the matching engine doesn't advance, and the rest of the pattern is
   tried right where the assertion started.

``(?<!...)``
   Negative lookbehind assertion.  This is the opposite of the positive
   assertion; it succeeds if the contained expression *doesn't* match text
   ending at the current position in the string.
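
The two definitions above can be tried out with a short sketch like the
following. The sample strings and variable names are made up for illustration;
only the standard :mod:`re` functions are used.

```python
import re

# Positive lookbehind: match a word only when a hyphen comes right before it.
m = re.search(r"(?<=-)\w+", "spam-egg")
print(m.group())   # egg
print(m.span())    # (5, 8): the hyphen at index 4 is not part of the match

# Negative lookbehind: match "bar" only when it is NOT preceded by "foo".
neg = re.compile(r"(?<!foo)bar")
rejected = neg.search("foobar")   # "bar" follows "foo", so there is no match
accepted = neg.search("bazbar")   # "bar" follows "baz", so it matches
print(rejected)                   # None
print(accepted.span())            # (3, 6)
```

Note that the match span never includes the text checked by the assertion,
which is what "zero-width" means in practice.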

Here's a comparison of lookahead and lookbehind assertions:

+------------------+------------------+------------------+
| Type             | Lookahead        | Lookbehind       |
+==================+==================+==================+
| Positive         | ``(?=...)``      | ``(?<=...)``     |
+------------------+------------------+------------------+
| Negative         | ``(?!...)``      | ``(?<!...)``     |
+------------------+------------------+------------------+
| Direction        | Forward          | Backward         |
+------------------+------------------+------------------+
| Checks           | What comes after | What came before |
+------------------+------------------+------------------+

Examples
~~~~~~~~

*Positive assertions:*

- Lookahead: ``Python(?= )`` matches "Python" only when it is followed by a
  space.
- Lookbehind: ``(?<=Hello )Python`` matches "Python" only when it is preceded
  by "Hello ".

*Negative assertions:*

- Lookahead: ``Python(?! )`` matches "Python" only when it is *not* followed
  by a space.
- Lookbehind: ``(?<!Hello )Python`` matches "Python" only when it is *not*
  preceded by "Hello ".

*Practical examples:*

- Lookahead: ``\d+(?=\$)`` matches a run of digits that is followed by a
  dollar sign.
- Lookbehind: ``(?<=\$)\d+`` matches a run of digits that is preceded by a
  dollar sign.

In every case the asserted text is not part of the match: only "Python" or the
digits are returned.
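
The patterns listed above can be checked against sample input. This is a
minimal sketch; the strings ``text`` and ``prices`` are invented for the
demonstration and are not part of the original examples.

```python
import re

text = "Hello Python, goodbye Python!"

# Positive lookbehind: only the "Python" right after "Hello " qualifies.
pos = [m.start() for m in re.finditer(r"(?<=Hello )Python", text)]
# Negative lookbehind: only the "Python" NOT right after "Hello " qualifies.
neg = [m.start() for m in re.finditer(r"(?<!Hello )Python", text)]
print(pos, neg)        # [6] [22]

prices = "cost: $42, weight: 7kg, refund: 15$"

# Digits preceded by a dollar sign (lookbehind).
after_dollar = re.findall(r"(?<=\$)\d+", prices)
# Digits followed by a dollar sign (lookahead).
before_dollar = re.findall(r"\d+(?=\$)", prices)
print(after_dollar)    # ['42']
print(before_dollar)   # ['15']
```

Neither result contains the dollar sign itself, because assertions check the
surrounding text without consuming it.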

Key differences
~~~~~~~~~~~~~~~

1. **Direction**: A lookahead checks the text after the current position; a
   lookbehind checks the text before it.
2. **Limitations**: In the :mod:`re` module, the pattern inside a lookbehind
   must match strings of one fixed length.  Variable-width constructs such as
   ``*``, ``+``, ``?``, ``{m,n}`` (with ``m`` different from ``n``), or
   alternatives of different lengths are not allowed.  Lookahead has no such
   restriction.
3. **Performance**: A lookahead follows the engine's natural left-to-right
   scan.  A fixed-width lookbehind needs only one step back by a known number
   of characters, so it is normally inexpensive as well.  Variable-width
   lookbehind, which some other regex engines support or emulate, can be
   computationally expensive because the engine may have to try several
   starting points.

For example, the following assertion is valid as a lookahead but not as a
lookbehind:

- Lookahead: ``(?=a*)def`` ✓ (valid)
- Lookbehind: ``(?<=a*)def`` ✗ (error: look-behind requires fixed-width
  pattern)

This limitation exists because the engine has to know how far to step back
before it tries the contained pattern.  With a fixed-width pattern that
distance is known when the pattern is compiled; with a variable-width pattern
it is not, so the :mod:`re` module rejects the pattern at compile time.

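
The fixed-width restriction, and one common way around it, can be sketched as
follows. The ``USD`` sample string and the variable names are assumptions made
for this illustration; using a capturing group is only one possible
workaround, not the only one.

```python
import re

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=USD\s*)\d+")
    compiled = True
except re.error as exc:
    compiled = False
    message = str(exc)
print(compiled)   # False

# The same kind of variable-width pattern is accepted inside a lookahead.
lookahead = re.compile(r"\d+(?=\s*USD)")
amount_before = lookahead.search("pay 250   USD").group()

# Workaround: match the variable-width prefix normally and capture
# only the part of interest in a group.
m = re.search(r"USD\s*(\d+)", "total: USD   250")
amount_after = m.group(1)
print(amount_before, amount_after)   # 250 250
```

With the capturing group the prefix is consumed by the match, so this fits
cases where only the captured text is needed; it does not behave identically
to a lookbehind when matches may overlap.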

Modifying Strings
=================
