⚡️ Speed up function regex_match by 150% #62

Open
wants to merge 1 commit into
base: main


codeflash-ai[bot]

@codeflash-ai codeflash-ai bot commented Jul 30, 2025

📄 150% (1.50x) speedup for regex_match in src/dsa/various.py

⏱️ Runtime: 3.50 milliseconds -> 1.40 milliseconds (best of 236 runs)

📝 Explanation and details

The optimized code achieves a roughly 150% speedup (3.50 ms down to 1.40 ms) through two key optimizations:

1. Pre-compilation of regex pattern
The original code calls re.match(pattern, s) inside the loop. The re module caches compiled patterns, so the pattern is not re-parsed on every call, but each call still pays for a cache lookup and an extra layer of Python function calls before matching begins. The optimized version calls re.compile(pattern) once before the loop and reuses the compiled pattern object's match method, which removes that per-string overhead.

2. List comprehension instead of explicit loop
The optimized code replaces the explicit for loop and its append calls with a list comprehension. List comprehensions are typically faster in CPython because they append through a dedicated bytecode instruction instead of looking up and calling the list's append method on every iteration.
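
The diff itself is not reproduced in this comment, so the following is only a minimal sketch of the two versions as described above, not the actual contents of src/dsa/various.py. The names `regex_match_original` and `regex_match_optimized` are hypothetical; the `(strings, pattern)` argument order is taken from the test calls further down.

```python
import re


def regex_match_original(strings, pattern):
    # Assumed shape of the original: module-level re.match in an explicit loop.
    matched = []
    for s in strings:
        if re.match(pattern, s):
            matched.append(s)
    return matched


def regex_match_optimized(strings, pattern):
    # Assumed shape of the optimized version: compile once, then filter
    # with a list comprehension using the compiled pattern's match method.
    compiled = re.compile(pattern)
    return [s for s in strings if compiled.match(s)]


sample = ['abc', 'def', 'abc', 'abcd', 'ab']
print(regex_match_original(sample, r'^abc$'))   # ['abc', 'abc']
print(regex_match_optimized(sample, r'^abc$'))  # ['abc', 'abc']
```

Both sketches return the same list for the same inputs; only the per-string overhead differs.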

Performance analysis from line profiler:

  • Original: 88.4% of time spent in re.match(pattern, s) calls (40.3ms total)
  • Optimized: 91.7% of time spent in one-time re.compile(pattern) call (17.3ms total)
  • The actual matching loop becomes much faster, taking only 8.3% of total time

Test case performance patterns:

  • Small lists (basic and edge tests): 18-64% faster, reflecting the reduced per-call overhead
  • Large lists (about 1000 items): 145-217% faster in most tests; tests whose time is dominated by matching long strings gain less (68-82%)
  • Empty lists: 72.7% slower (125ns -> 458ns), because the pattern is compiled even though there is nothing to match
  • Complex patterns: a speedup in every case regardless of pattern features (unicode, lookaheads, alternation)

The optimization is most effective for scenarios with multiple strings to match against the same pattern, where the one-time compilation cost is amortized across many match operations. For single-string matching or very small lists, the compilation overhead might not be worth it, but for typical use cases with multiple strings, this provides substantial performance gains.
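
To see how the cost amortizes on a given machine, a small timing harness along these lines may help. It is a sketch, not the benchmark Codeflash ran: the helper names `match_per_call`, `match_precompiled`, and `time_both` are hypothetical, and both variants use a list comprehension so that only the effect of pre-compilation is measured.

```python
import re
import timeit


def match_per_call(strings, pattern):
    # Module-level re.match: every call goes through the re module's pattern cache.
    return [s for s in strings if re.match(pattern, s)]


def match_precompiled(strings, pattern):
    # Compile once, then reuse the compiled object's match method.
    compiled = re.compile(pattern)
    return [s for s in strings if compiled.match(s)]


def time_both(strings, pattern, repeats=200):
    # Returns (seconds for the per-call version, seconds for the precompiled version).
    per_call = timeit.timeit(lambda: match_per_call(strings, pattern), number=repeats)
    precompiled = timeit.timeit(lambda: match_precompiled(strings, pattern), number=repeats)
    return per_call, precompiled


for size in (0, 5, 1000):
    strings = ['abc' if i % 2 == 0 else 'def' for i in range(size)]
    per_call, precompiled = time_both(strings, r'^abc$')
    print(f"{size:>5} strings: per-call {per_call:.6f}s, precompiled {precompiled:.6f}s")
```

Absolute numbers will vary by interpreter version and hardware, so no expected output is shown.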

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 73 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import random
import re
import string

# imports
import pytest  # used for our unit tests
from src.dsa.various import regex_match

# --------------------------
# Unit Tests for regex_match
# --------------------------

# 1. BASIC TEST CASES

def test_basic_exact_match():
    # Simple exact match, should return all strings that are exactly 'abc'
    strings = ['abc', 'def', 'abc', 'abcd', 'ab']
    pattern = r'^abc$'
    codeflash_output = regex_match(strings, pattern) # 1.58μs -> 1.08μs (46.2% faster)

def test_basic_startswith():
    # Pattern matches strings that start with 'a'
    strings = ['apple', 'banana', 'apricot', 'grape', 'aardvark']
    pattern = r'^a'
    codeflash_output = regex_match(strings, pattern) # 1.71μs -> 1.04μs (64.0% faster)

def test_basic_digit_match():
    # Pattern matches strings starting with a digit
    strings = ['1abc', 'abc', '2def', 'ghi', '3']
    pattern = r'^\d'
    codeflash_output = regex_match(strings, pattern) # 1.62μs -> 1.12μs (44.4% faster)

def test_basic_dot_wildcard():
    # Pattern matches any single character followed by 'bc'
    strings = ['abc', 'xbc', 'bc', 'aabc']
    pattern = r'^.bc$'
    codeflash_output = regex_match(strings, pattern) # 1.33μs -> 1.00μs (33.3% faster)

def test_basic_empty_list():
    # Should return empty list if input is empty
    codeflash_output = regex_match([], r'.*') # 125ns -> 458ns (72.7% slower)

def test_basic_no_match():
    # No string matches the pattern
    strings = ['foo', 'bar', 'baz']
    pattern = r'^qux$'
    codeflash_output = regex_match(strings, pattern) # 1.04μs -> 833ns (25.1% faster)

def test_basic_case_sensitive():
    # Should be case-sensitive
    strings = ['Test', 'test', 'TEST']
    pattern = r'^test$'
    codeflash_output = regex_match(strings, pattern) # 1.08μs -> 917ns (18.1% faster)

# 2. EDGE TEST CASES

def test_edge_empty_pattern():
    # Empty pattern should match every string (since re.match('', s) always matches at start)
    strings = ['abc', '', '123']
    pattern = ''
    codeflash_output = regex_match(strings, pattern) # 1.33μs -> 1.00μs (33.4% faster)

def test_edge_empty_strings_in_list():
    # Only empty string should match pattern for empty string
    strings = ['abc', '', 'def']
    pattern = r'^$'
    codeflash_output = regex_match(strings, pattern) # 1.17μs -> 875ns (33.3% faster)

def test_edge_special_characters():
    # Pattern with special regex characters
    strings = ['a.c', 'abc', 'a-c', 'a$c', 'a c']
    pattern = r'^a\.c$'
    codeflash_output = regex_match(strings, pattern) # 1.58μs -> 1.08μs (46.2% faster)

def test_edge_anchors():
    # Pattern with start (^) and end ($) anchors
    strings = ['foo', 'barfoo', 'foobar', 'foo\n', 'foo']
    pattern = r'^foo$'
    codeflash_output = regex_match(strings, pattern) # 1.58μs -> 1.12μs (40.7% faster)

def test_edge_unicode():
    # Unicode characters in strings and pattern
    strings = ['café', 'cafe', 'CAFÉ', 'café123']
    pattern = r'^café$'
    codeflash_output = regex_match(strings, pattern) # 1.50μs -> 1.00μs (50.0% faster)

def test_edge_greedy_quantifiers():
    # Pattern with greedy quantifier
    strings = ['aaaa', 'aa', 'a', '', 'aaaab']
    pattern = r'^a+$'
    codeflash_output = regex_match(strings, pattern) # 1.75μs -> 1.17μs (50.0% faster)

def test_edge_non_greedy_quantifiers():
    # Pattern with non-greedy quantifier (should match whole string if possible)
    strings = ['aaab', 'aab', 'ab', 'b']
    pattern = r'^a+?b$'
    codeflash_output = regex_match(strings, pattern) # 1.58μs -> 1.12μs (40.7% faster)

def test_edge_alternation():
    # Alternation (|) in pattern
    strings = ['cat', 'dog', 'bat', 'rat']
    pattern = r'^(cat|dog)$'
    codeflash_output = regex_match(strings, pattern) # 1.54μs -> 1.21μs (27.6% faster)

def test_edge_lookahead():
    # Lookahead assertion
    strings = ['foo1', 'foo2', 'foo', 'foo3bar']
    pattern = r'^foo(?=\d)$'
    codeflash_output = regex_match(strings, pattern) # 1.42μs -> 1.04μs (35.9% faster)

def test_edge_lookahead_with_digit():
    # Lookahead assertion, matching foo followed by a digit
    strings = ['foo1', 'foo2', 'foo', 'foo3bar']
    pattern = r'^foo(?=\d)'
    codeflash_output = regex_match(strings, pattern) # 1.46μs -> 1.04μs (40.1% faster)

def test_edge_lookbehind():
    # Lookbehind assertion
    strings = ['abc123', '123abc', 'def123', '123def']
    pattern = r'(?<=abc)123'
    # re.match only tries position 0, where nothing precedes '123' for the lookbehind to see, so no string matches
    codeflash_output = regex_match(strings, pattern) # 1.21μs -> 916ns (31.9% faster)

def test_edge_multiline():
    # Multiline string, should only match at start of string, not after newline
    strings = ['foo\nbar', 'bar\nfoo', 'foo', 'bar']
    pattern = r'^foo'
    codeflash_output = regex_match(strings, pattern) # 1.46μs -> 1.04μs (40.0% faster)

def test_edge_long_pattern():
    # Very long pattern
    strings = ['a' * 100, 'a' * 99 + 'b', 'b' + 'a' * 99]
    pattern = r'^a{100}$'
    codeflash_output = regex_match(strings, pattern) # 1.25μs -> 1.00μs (25.0% faster)

def test_edge_match_none_and_all():
    # Pattern that matches nothing and pattern that matches everything
    strings = ['abc', 'def', '', '123']
    pattern_none = r'(?!)'  # always fails
    pattern_all = r'.*'     # matches everything (including empty string)
    codeflash_output = regex_match(strings, pattern_none) # 1.21μs -> 958ns (26.1% faster)
    codeflash_output = regex_match(strings, pattern_all) # 1.04μs -> 666ns (56.5% faster)

def test_edge_escape_sequences():
    # Pattern with escape sequences
    strings = ['\t', '\\t', 't', '\n']
    pattern = r'^\t$'
    codeflash_output = regex_match(strings, pattern) # 1.29μs -> 959ns (34.7% faster)

def test_edge_non_ascii():
    # Non-ASCII characters
    strings = ['你好', 'hello', 'こんにちは', '안녕하세요']
    pattern = r'^[\u4e00-\u9fff]+$'  # Chinese characters
    codeflash_output = regex_match(strings, pattern) # 1.62μs -> 1.17μs (39.2% faster)

def test_edge_whitespace():
    # Pattern matching whitespace
    strings = [' ', '\t', '\n', 'a', '  ']
    pattern = r'^\s+$'
    codeflash_output = regex_match(strings, pattern) # 1.75μs -> 1.17μs (50.0% faster)

def test_edge_match_at_start_only():
    # re.match matches only at start, not in the middle
    strings = ['abc123', '123abc', 'xabc123']
    pattern = r'abc'
    codeflash_output = regex_match(strings, pattern) # 1.29μs -> 958ns (34.9% faster)

def test_edge_match_with_group():
    # Pattern with capturing group
    strings = ['ab12', 'ab34', 'cd12', 'ab']
    pattern = r'^(ab)(\d+)$'
    codeflash_output = regex_match(strings, pattern) # 1.58μs -> 1.21μs (31.0% faster)

def test_edge_match_with_optional():
    # Pattern with optional group
    strings = ['color', 'colour', 'colr']
    pattern = r'^colou?r$'
    codeflash_output = regex_match(strings, pattern) # 1.29μs -> 1.04μs (24.1% faster)

def test_edge_match_with_repetition():
    # Pattern with repetition
    strings = ['ha', 'haha', 'hahaha', 'haaa']
    pattern = r'^(ha)+$'
    codeflash_output = regex_match(strings, pattern) # 2.08μs -> 1.62μs (28.2% faster)

# 3. LARGE SCALE TEST CASES

def test_large_all_match():
    # Large list where all strings should match
    strings = ['abc'] * 1000
    pattern = r'^abc$'
    codeflash_output = regex_match(strings, pattern) # 179μs -> 66.8μs (168% faster)

def test_large_none_match():
    # Large list where no string should match
    strings = ['def'] * 1000
    pattern = r'^abc$'
    codeflash_output = regex_match(strings, pattern) # 154μs -> 51.2μs (201% faster)

def test_large_some_match():
    # Large list with a mix of matching and non-matching strings
    strings = ['abc' if i % 2 == 0 else 'def' for i in range(1000)]
    pattern = r'^abc$'
    expected = ['abc'] * 500
    codeflash_output = regex_match(strings, pattern) # 171μs -> 61.2μs (180% faster)

def test_large_random_strings_with_pattern():
    # Large list of random strings, only a few match the pattern
    random.seed(42)
    strings = [
        ''.join(random.choices(string.ascii_lowercase, k=5))
        for _ in range(995)
    ] + ['hello', 'hello', 'hello', 'hello', 'hello']
    pattern = r'^hello$'
    codeflash_output = regex_match(strings, pattern) # 151μs -> 52.1μs (191% faster)

def test_large_long_strings():
    # Large list of long strings, only some match
    base = 'a' * 500
    strings = [base, base + 'b', 'b' + base] * 333 + [base]
    pattern = r'^a{500}$'
    expected = [base] * 334
    codeflash_output = regex_match(strings, pattern) # 275μs -> 164μs (68.1% faster)

def test_large_varied_patterns():
    # Large list with varied patterns, ensure only correct matches
    strings = ['abc' * i for i in range(1, 1001)]
    pattern = r'^(abc){10}$'
    codeflash_output = regex_match(strings, pattern) # 228μs -> 125μs (81.7% faster)

def test_large_unicode_strings():
    # Large list of unicode strings, only some match
    strings = ['你好'] * 500 + ['hello'] * 500
    pattern = r'^[\u4e00-\u9fff]+$'
    codeflash_output = regex_match(strings, pattern) # 174μs -> 66.7μs (162% faster)

def test_large_edge_empty_strings():
    # Large list of empty strings, pattern should match all
    strings = [''] * 1000
    pattern = r'^$'
    codeflash_output = regex_match(strings, pattern) # 172μs -> 65.0μs (166% faster)

def test_large_alternation():
    # Large list, alternation pattern
    strings = ['cat', 'dog', 'bat'] * 333 + ['cat']
    pattern = r'^(cat|dog)$'
    expected = ['cat', 'dog'] * 333 + ['cat']
    # Remove 'bat's, which don't match
    expected = [s for s in strings if s in ('cat', 'dog')]
    codeflash_output = regex_match(strings, pattern) # 167μs -> 62.8μs (167% faster)

def test_large_performance_with_mixed_lengths():
    # Large list of strings with varying lengths
    strings = ['a' * i for i in range(1, 1001)]
    pattern = r'^a{1000}$'
    codeflash_output = regex_match(strings, pattern) # 150μs -> 50.2μs (200% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
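
Several of the edge cases above depend on how `re.match` anchors and on how `$` treats a trailing newline. The short sketch below illustrates those standard-library behaviors with inputs borrowed from the tests; it is not part of the generated test suite.

```python
import re

# re.match only tries position 0; re.search scans the whole string.
print(bool(re.match(r'abc', 'xabc123')))           # False
print(bool(re.search(r'abc', 'xabc123')))          # True

# A lookbehind cannot succeed at position 0, so re.match never matches here.
print(bool(re.match(r'(?<=abc)123', 'abc123')))    # False
print(bool(re.search(r'(?<=abc)123', 'abc123')))   # True

# Without re.MULTILINE, $ also matches just before a trailing newline; \Z does not.
print(bool(re.match(r'^foo$', 'foo\n')))           # True
print(bool(re.match(r'^foo\Z', 'foo\n')))          # False

# (?!) is an empty negative lookahead and always fails.
print(bool(re.match(r'(?!)', 'abc')))              # False
```

These behaviors are identical whether the pattern is passed to `re.match` directly or compiled first, which is why the optimization does not change any test result.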

import random
import re
import string

# imports
import pytest  # used for our unit tests
from src.dsa.various import regex_match

# unit tests

# ---------------------
# Basic Test Cases
# ---------------------

def test_basic_exact_match():
    # Should match strings that are exactly 'abc'
    strings = ['abc', 'abcd', 'zabc', 'abc']
    pattern = r'^abc$'
    codeflash_output = regex_match(strings, pattern) # 1.54μs -> 1.08μs (42.3% faster)

def test_basic_startswith():
    # Should match strings that start with 'foo'
    strings = ['foo', 'foobar', 'barfoo', 'foo123']
    pattern = r'^foo'
    codeflash_output = regex_match(strings, pattern) # 1.50μs -> 1.00μs (50.0% faster)

def test_basic_endswith():
    # Pattern requires 'xyz' at the end; since re.match also anchors at the start, only 'xyz' itself matches
    strings = ['abcxyz', 'xyz', 'xyza', 'axyz']
    pattern = r'xyz$'
    codeflash_output = regex_match(strings, pattern) # 1.46μs -> 1.00μs (45.8% faster)

def test_basic_digit_match():
    # Should match strings that are a single digit
    strings = ['1', 'a', '9', '10', '', '123']
    pattern = r'^\d$'
    codeflash_output = regex_match(strings, pattern) # 1.75μs -> 1.12μs (55.6% faster)

def test_basic_wildcard_dot():
    # Should match any 3-character string
    strings = ['abc', 'ab', 'abcd', '123', 'xyz']
    pattern = r'^...$'
    codeflash_output = regex_match(strings, pattern) # 1.58μs -> 1.04μs (51.9% faster)

def test_basic_alternation():
    # Should match 'cat' or 'dog'
    strings = ['cat', 'dog', 'bat', 'catalog', 'dogma']
    pattern = r'^(cat|dog)$'
    codeflash_output = regex_match(strings, pattern) # 1.79μs -> 1.21μs (48.3% faster)

# ---------------------
# Edge Test Cases
# ---------------------

def test_edge_empty_string():
    # Should match the empty string only
    strings = ['', ' ', 'a', '']
    pattern = r'^$'
    codeflash_output = regex_match(strings, pattern) # 1.38μs -> 1.00μs (37.5% faster)

def test_edge_empty_list():
    # Should return empty list if input list is empty
    strings = []
    pattern = r'.*'
    codeflash_output = regex_match(strings, pattern) # 125ns -> 458ns (72.7% slower)

def test_edge_empty_pattern():
    # Empty pattern matches every string at position 0
    strings = ['a', '', 'abc', ' ']
    pattern = r''
    codeflash_output = regex_match(strings, pattern) # 1.46μs -> 1.04μs (39.9% faster)

def test_edge_special_characters():
    # Should match strings with special regex characters
    strings = ['a.c', 'abc', 'a-c', 'a*c']
    pattern = r'^a\.c$'
    codeflash_output = regex_match(strings, pattern) # 1.54μs -> 1.12μs (37.0% faster)

def test_edge_unicode():
    # Should match unicode characters
    strings = ['café', 'cafe', 'caffè', 'cafè']
    pattern = r'^caf.\Z'
    codeflash_output = regex_match(strings, pattern) # 1.71μs -> 1.12μs (51.8% faster)

def test_edge_newline_in_string():
    # Should match strings that start with 'foo' even if they contain newlines
    strings = ['foo\nbar', 'foobar', 'barfoo', 'foo']
    pattern = r'^foo'
    codeflash_output = regex_match(strings, pattern) # 1.50μs -> 1.04μs (44.1% faster)

def test_edge_match_none():
    # Pattern that matches nothing
    strings = ['abc', 'def', 'ghi']
    pattern = r'(?!)'
    codeflash_output = regex_match(strings, pattern) # 1.12μs -> 916ns (22.8% faster)

def test_edge_match_everything():
    # Pattern that matches everything (including empty string)
    strings = ['abc', '', 'def']
    pattern = r'.*'
    codeflash_output = regex_match(strings, pattern) # 1.29μs -> 958ns (34.9% faster)

def test_edge_single_character():
    # Should match strings that are a single character
    strings = ['a', 'b', 'ab', '', '1']
    pattern = r'^.$'
    codeflash_output = regex_match(strings, pattern) # 1.54μs -> 1.04μs (48.0% faster)

def test_edge_lookahead():
    # Should match 'foo' only if followed by 'bar'
    strings = ['foobar', 'foobaz', 'foo', 'barfoo']
    pattern = r'^foo(?=bar)'
    codeflash_output = regex_match(strings, pattern) # 1.46μs -> 1.04μs (40.1% faster)

def test_edge_lookbehind():
    # Lookbehind for 'foo' before 'bar'; re.match only tries position 0, where nothing precedes, so no string matches
    strings = ['foobar', 'barfoo', 'foo bar', 'foobarbar']
    pattern = r'(?<=foo)bar'
    codeflash_output = regex_match(strings, pattern) # 1.33μs -> 916ns (45.5% faster)

def test_edge_greedy_vs_nongreedy():
    # Non-greedy quantifier; with both anchors, every string here still has to match in full
    strings = ['acb', 'a123b', 'ab', 'a b']
    pattern = r'^a.*?b$'
    codeflash_output = regex_match(strings, pattern) # 1.71μs -> 1.21μs (41.3% faster)

def test_edge_case_sensitive():
    # Should match only lowercase 'abc'
    strings = ['abc', 'ABC', 'Abc', 'aBc']
    pattern = r'^abc$'
    codeflash_output = regex_match(strings, pattern) # 1.42μs -> 958ns (47.8% faster)


def test_edge_multiline_anchor():
    # Should match only if string starts with 'foo'
    strings = ['foo\nbar', 'bar\nfoo', 'foo', 'barfoo']
    pattern = r'^foo'
    codeflash_output = regex_match(strings, pattern) # 1.83μs -> 1.38μs (33.4% faster)

def test_edge_word_boundary():
    # Should match 'cat' as a whole word
    strings = ['cat', 'concatenate', 'bobcat', 'the cat ', 'catnip']
    pattern = r'\bcat\b'
    codeflash_output = regex_match(strings, pattern) # 1.83μs -> 1.29μs (41.9% faster)

def test_edge_long_pattern():
    # Should match a string of 100 'a's
    s = 'a' * 100
    strings = [s, s + 'b', 'a' * 99, '']
    pattern = r'^a{100}$'
    codeflash_output = regex_match(strings, pattern) # 1.67μs -> 1.25μs (33.4% faster)

def test_edge_non_ascii():
    # Should match non-ASCII characters
    strings = ['你好', 'hello', 'こんにちは', '안녕하세요']
    pattern = r'^[^\x00-\x7F]+$'
    codeflash_output = regex_match(strings, pattern) # 1.88μs -> 1.42μs (32.3% faster)

def test_edge_escape_sequences():
    # Should match tab character
    strings = ['\t', ' ', 'a\tb', '\n']
    pattern = r'^\t$'
    codeflash_output = regex_match(strings, pattern) # 1.46μs -> 1.04μs (40.2% faster)

# ---------------------
# Large Scale Test Cases
# ---------------------

def test_large_all_match():
    # All strings match the pattern
    strings = ['abc'] * 1000
    pattern = r'^abc$'
    codeflash_output = regex_match(strings, pattern) # 182μs -> 67.3μs (171% faster)

def test_large_none_match():
    # No string matches the pattern
    strings = ['def'] * 1000
    pattern = r'^abc$'
    codeflash_output = regex_match(strings, pattern) # 162μs -> 51.2μs (217% faster)

def test_large_half_match():
    # Half the strings match the pattern
    strings = ['abc' if i % 2 == 0 else 'def' for i in range(1000)]
    pattern = r'^abc$'
    expected = ['abc'] * 500
    codeflash_output = regex_match(strings, pattern) # 178μs -> 61.4μs (190% faster)

def test_large_varied_strings():
    # Large input with varied strings and a digit pattern
    strings = [str(i) for i in range(1000)] + ['abc'] * 100
    pattern = r'^\d+$'
    expected = [str(i) for i in range(1000)]
    codeflash_output = regex_match(strings, pattern) # 201μs -> 82.0μs (145% faster)

def test_large_long_strings():
    # Match strings that are 500 'a's long
    long_a = 'a' * 500
    long_b = 'b' * 500
    strings = [long_a, long_b, long_a + 'b', 'a' * 499, long_a]
    pattern = r'^a{500}$'
    codeflash_output = regex_match(strings, pattern) # 2.25μs -> 1.71μs (31.7% faster)

def test_large_random_strings_and_pattern():
    # Random strings, only those starting with 'abc' should match
    random.seed(0)
    strings = ['abc' + ''.join(random.choices(string.ascii_letters, k=10)) for _ in range(500)]
    strings += ['def' + ''.join(random.choices(string.ascii_letters, k=10)) for _ in range(500)]
    pattern = r'^abc'
    codeflash_output = regex_match(strings, pattern) # 163μs -> 60.1μs (173% faster)

def test_large_unicode_strings():
    # Large list with unicode strings, match those starting with 'ü'
    strings = ['ü' + ''.join(random.choices(string.ascii_letters, k=5)) for _ in range(500)]
    strings += ['a' + ''.join(random.choices(string.ascii_letters, k=5)) for _ in range(500)]
    pattern = r'^ü'
    codeflash_output = regex_match(strings, pattern) # 165μs -> 59.2μs (179% faster)

def test_large_empty_strings():
    # Large list of empty strings, pattern should match only empty string
    strings = [''] * 1000 + ['a'] * 100
    pattern = r'^$'
    codeflash_output = regex_match(strings, pattern) # 196μs -> 70.5μs (179% faster)

def test_large_mixed_lengths():
    # Strings of varying lengths, match those of exactly length 3
    strings = ['a', 'ab', 'abc', 'abcd', 'abc'] * 200
    pattern = r'^...$'
    expected = ['abc', 'abc'] * 200
    codeflash_output = regex_match(strings, pattern) # 167μs -> 61.8μs (171% faster)

def test_large_special_characters():
    # Large list with special regex characters, match those with a literal '.'
    strings = ['a.c', 'abc', 'a-c', 'a.c'] * 250
    pattern = r'^a\.c$'
    expected = ['a.c', 'a.c'] * 250
    codeflash_output = regex_match(strings, pattern) # 180μs -> 65.2μs (176% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from src.dsa.various import regex_match

def test_regex_match():
    regex_match([''], '')

To edit these changes git checkout codeflash/optimize-regex_match-mdpcermt and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 30, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 July 30, 2025 02:24