⚡️ Speed up function should_use_regex by 16%
#285
📄 16% (0.16x) speedup for `should_use_regex` in `pandas/core/array_algos/replace.py`

⏱️ Runtime: 30.2 milliseconds → 25.9 milliseconds (best of 44 runs)

📝 Explanation and details
The optimization achieves a 16% speedup by eliminating redundant regex compilation and introducing short-circuits in the `should_use_regex` function.

**Key Optimizations Applied:**

1. **Fast-path for compiled Pattern objects**: When `to_replace` is already a compiled regex Pattern, the optimized code reads `to_replace.pattern` directly instead of calling `is_re_compilable` and `re.compile` again. This removes redundant work for Pattern inputs.
2. **Single compilation strategy**: For non-Pattern inputs, the original code called `re.compile` twice: once in `is_re_compilable` and again for the empty-pattern check. The optimized version compiles once and reuses the result, reducing compilation overhead by ~50% for string inputs.
3. **Early termination logic**: Added `if not regex or not is_re_compilable(to_replace): return False` to short-circuit when the conditions aren't met, avoiding unnecessary compilation in the main function.
4. **Fast-path in `is_re_compilable`**: Added `if isinstance(obj, Pattern): return True` to return immediately for already-compiled patterns without attempting recompilation.

**Performance Impact Analysis:**

The line profiler shows that the largest gains occur when processing compiled Pattern objects. For example:

- `test_basic_compiled_pattern_regex_false`: 2.89μs → 860ns (236% faster)
- `test_large_compiled_pattern`: 13.9μs → 296ns (4584% faster)

These improvements occur because the optimization eliminates the `re.compile` call entirely for Pattern inputs, which was the biggest bottleneck in the original implementation (82% of total time was spent in `is_re_compilable`).

**Workload Benefits:**

The optimization is most effective for workloads involving pre-compiled regex patterns or mixed input types, where avoiding redundant compilation yields the largest gains while preserving identical functionality.
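
The fast paths and the single-compilation idea described above can be sketched roughly as follows. This is a standalone illustration and not the PR's actual diff: it assumes the original pandas behaviour (a compiled pattern forces regex mode, only `TypeError` is treated as "not compilable", and an empty pattern disables regex), and it folds the early-termination check into a single `try`/`except` so that `re.compile` runs at most once per call.

```python
import re
from typing import Any


def is_re_compilable(obj: Any) -> bool:
    """Return True if obj can be passed to re.compile without a TypeError."""
    # Fast path: an already-compiled pattern needs no recompilation.
    if isinstance(obj, re.Pattern):
        return True
    try:
        re.compile(obj)
    except TypeError:
        return False
    return True


def should_use_regex(regex: bool, to_replace: Any) -> bool:
    """Decide whether to_replace should be treated as a regex (sketch)."""
    # Fast path: a compiled pattern implies regex mode, so only the
    # empty-pattern check remains, and .pattern is read directly.
    if isinstance(to_replace, re.Pattern):
        return to_replace.pattern != ""

    # Early termination: nothing to compile when regex mode is off.
    if not regex:
        return False

    # Single compilation: compile once and reuse the result for both the
    # "is it compilable?" check and the empty-pattern check.
    try:
        compiled = re.compile(to_replace)
    except TypeError:
        return False
    return compiled.pattern != ""


if __name__ == "__main__":
    print(should_use_regex(False, re.compile(r"a+")))
    print(should_use_regex(True, ""))
    print(should_use_regex(True, 123))
```

Under these assumptions, a compiled-pattern input never reaches `re.compile`, which is the case the profiler numbers above attribute the largest gains to.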
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-should_use_regex-mho8r9f1` and push.