|
7 | 7 | :Author: A.M. Kuchling < [email protected]>
|
8 | 8 |
|
9 | 9 | .. TODO:
|
10 |
| - Document lookbehind assertions |
11 | 10 | Better way of displaying a RE, a string, and what it matches
|
12 | 11 | Mention optional argument to match.groups()
|
13 | 12 | Unicode (at least a reference)
|
@@ -1061,6 +1060,73 @@ end in either ``bat`` or ``exe``:
|
1061 | 1060 | ``.*[.](?!bat$|exe$)[^.]*$``
|
1062 | 1061 |
|
1063 | 1062 |
|
| 1063 | +Lookbehind Assertions |
| 1064 | +--------------------- |
| 1065 | + |
| 1066 | +Lookbehind assertions work similarly to lookahead assertions, but they look |
| 1067 | +backwards in the string instead of forwards. They are available in both |
| 1068 | +positive and negative form, and look like this: |
| 1069 | + |
| 1070 | +``(?<=...)`` |
| 1071 | + Positive lookbehind assertion. This succeeds if the contained regular |
| 1072 | + expression, represented here by ``...``, successfully matches ending at the |
| 1073 | + current location, and fails otherwise. The matching engine doesn't advance; |
| 1074 | + the rest of the pattern is tried right where the assertion started. |
| 1075 | + |
| 1076 | +``(?<!...)`` |
| 1077 | + Negative lookbehind assertion. This is the opposite of the positive assertion; |
| 1078 | + it succeeds if the contained expression *doesn't* match ending at the current |
| 1079 | + position in the string. |
| 1080 | + |
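The two definitions above can be tried out with a short sketch. The sample strings ``abcdef`` and ``xyzdef`` are illustrative assumptions, not part of the patch; the point is that the lookbehind is checked but never becomes part of the match.

```python
import re

# Positive lookbehind: "def" matches only when "abc" ends right before it.
m = re.search(r"(?<=abc)def", "abcdef")
found = m.group()  # the matched text is "def"; "abc" is checked, not consumed

# Negative lookbehind: "def" matches only when "abc" does NOT precede it.
blocked = re.search(r"(?<!abc)def", "abcdef")  # no match
allowed = re.search(r"(?<!abc)def", "xyzdef")  # matches "def"

print(found, m.span())
print(blocked)
print(allowed.group())
```

Because the assertion is zero-width, the span of the first match starts at the ``d``, not at the ``a``.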
| 1081 | +Here's a comparison of lookahead and lookbehind assertions: |
| 1082 | + |
| 1083 | ++------------------+------------------+------------------+ |
| 1084 | +| Type | Lookahead | Lookbehind | |
| 1085 | ++==================+==================+==================+ |
| 1086 | +| Positive | ``(?=...)`` | ``(?<=...)`` | |
| 1087 | ++------------------+------------------+------------------+ |
| 1088 | +| Negative | ``(?!...)`` | ``(?<!...)`` | |
| 1089 | ++------------------+------------------+------------------+ |
| 1090 | +| Direction | Forward | Backward | |
| 1091 | ++------------------+------------------+------------------+ |
| 1092 | +| Checks | What comes after | What came before | |
| 1093 | ++------------------+------------------+------------------+ |
| 1094 | + |
| 1095 | +Examples |
| 1096 | +~~~~~~~~ |
| 1097 | + |
| 1098 | +*Positive assertions:* |
| 1099 | +- Lookahead: ``Python(?= )`` matches "Python" only when followed by a space |
| 1100 | +- Lookbehind: ``(?<=Hello )Python`` matches "Python" only when preceded by "Hello " |
| 1101 | + |
| 1102 | +*Negative assertions:* |
| 1103 | +- Lookahead: ``Python(?! )`` matches "Python" only when NOT followed by a space |
| 1104 | +- Lookbehind: ``(?<!Hello )Python`` matches "Python" only when NOT preceded by "Hello " |
| 1105 | + |
| 1106 | +*Practical examples:* |
| 1107 | +- Lookahead: ``\d+(?=\$)`` matches digits that are followed by a dollar sign |
| 1108 | +- Lookbehind: ``(?<=\$)\d+`` matches digits that are preceded by a dollar sign |
| 1109 | + |
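The patterns listed above can be checked against sample input. This is a minimal sketch; the helper ``starts`` and both sample strings are hypothetical and chosen only so that each pattern matches exactly once.

```python
import re

def starts(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

text = "Hello Python is not Python."

# Positive forms: the first "Python" is followed by a space and preceded by "Hello ".
ahead_pos = starts(r"Python(?= )", text)
behind_pos = starts(r"(?<=Hello )Python", text)

# Negative forms: the second "Python" is followed by "." and preceded by "not ".
ahead_neg = starts(r"Python(?! )", text)
behind_neg = starts(r"(?<!Hello )Python", text)

money = "100$ and $200"
before_dollar = re.findall(r"\d+(?=\$)", money)   # digits followed by "$"
after_dollar = re.findall(r"(?<=\$)\d+", money)   # digits preceded by "$"

print(ahead_pos, behind_pos, ahead_neg, behind_neg)
print(before_dollar, after_dollar)
```

In every case the matched text is only ``Python`` or the digits; the space, the ``Hello`` prefix, and the dollar sign are tested but left out of the result.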
| 1110 | +Key differences |
| 1111 | +~~~~~~~~~~~~~~~ |
| 1112 | + |
| 1113 | +1. **Direction**: Lookahead checks the text after the current position; lookbehind checks the text before it. |
| 1114 | +2. **Limitations**: The pattern inside a lookbehind must match strings of a fixed |
| 1115 | + length, so variable quantifiers such as ``*``, ``+``, ``?``, or ``{m,n}`` are not allowed. |
| 1116 | +3. **Performance**: A fixed-width lookbehind is cheap, because the engine steps |
| 1117 | + back a known number of characters and tries the pattern from there. |
| 1118 | + Variable-width lookbehind, which some other regex engines support, can be |
| 1119 | + more expensive because several starting points may have to be tried. |
| 1120 | + |
| 1121 | +For example, ``a*`` is accepted inside a lookahead but rejected inside a lookbehind: |
| 1122 | +- Lookahead: ``(?=a*)def`` ✓ (valid) |
| 1123 | +- Lookbehind: ``(?<=a*)def`` ✗ (error: look-behind requires fixed-width pattern) |
| 1124 | + |
| 1125 | +This limitation exists because the engine checks a lookbehind by stepping back a |
| 1126 | +fixed number of characters from the current position and matching from there. With a |
| 1127 | +variable-width pattern that distance is not known in advance, so the pattern is rejected at compile time. |
| 1128 | + |
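The fixed-width rule can be observed at compile time. The sketch below assumes Python's ``re`` module; the alternation pattern at the end is an extra illustration, not part of the patch, showing that alternatives are accepted when they all have the same length.

```python
import re

# A variable-width pattern is fine inside a lookahead.
lookahead = re.compile(r"(?=a*)def")

# The same pattern inside a lookbehind is rejected when it is compiled.
try:
    re.compile(r"(?<=a*)def")
    message = None
except re.error as exc:
    message = str(exc)  # the message mentions the fixed-width requirement

# Alternatives of equal length still count as fixed-width.
same_width = re.compile(r"(?<=abc|xyz)def")
hits = same_width.findall("abcdef xyzdef uvwdef")

print(message)
print(hits)
```

When a variable-length prefix really is needed, one option is to match the prefix normally and capture only the part of interest in a group.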
| 1129 | + |
1064 | 1130 | Modifying Strings
|
1065 | 1131 | =================
|
1066 | 1132 |
|
|