Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Nov 7, 2025

📄 19% (0.19x) speedup for CanonicalStrategy._is_standalone_structural_token in python/sglang/srt/parser/harmony_parser.py

⏱️ Runtime : 1.14 milliseconds 961 microseconds (best of 221 runs)

📝 Explanation and details

The optimization replaces a costly O(n) list lookup with an O(1) set lookup by pre-computing a set of structural tokens during initialization.

Key Changes:

  1. Pre-computed set: Added self._structural_tokens_set = set(self.guard_tokens) in __init__() to create a hash set from the token list
  2. Eliminated redundant list creation: Removed the inline list creation structural_tokens = [...] that occurred on every function call
  3. Direct set membership: Changed from content_stripped in structural_tokens (O(n) list scan) to content.strip() in self._structural_tokens_set (O(1) hash lookup)

Why This Is Faster:

  • Avoided repeated work: The original code recreated the same 7-element list on every call (5,542 times), wasting CPU cycles
  • Better algorithmic complexity: Set membership uses hash table lookup (O(1) average case) vs. linear list scan (O(n))
  • Reduced memory allocations: Eliminates 5,542 temporary list allocations during execution

Performance Impact:
The optimization delivers consistent 12-67% speedups across all test scenarios, with particularly strong gains for:

  • Non-matching strings (32-67% faster) - early hash misses vs. full list scans
  • Large-scale operations (13-26% faster) - the O(1) advantage compounds with volume
  • Basic token matching (16-34% faster) - eliminates list recreation overhead

This optimization is especially valuable for parsing workloads where _is_standalone_structural_token is called frequently, as the constant-time lookup scales much better than linear search as call frequency increases.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 5577 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 2 Passed
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
import pytest
from sglang.srt.parser.harmony_parser import CanonicalStrategy

# unit tests

# Helper: get the actual function
@pytest.fixture
def strategy():
    return CanonicalStrategy()

# 1. BASIC TEST CASES

def test_exact_structural_tokens(strategy):
    # Test all structural tokens (should return True)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(token) # 1.97μs -> 1.52μs (29.4% faster)

def test_structural_tokens_with_whitespace(strategy):
    # Test tokens with leading/trailing whitespace (should return True)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"   {token}   ") # 2.23μs -> 1.92μs (16.0% faster)

def test_non_structural_token(strategy):
    # Test a random string (should return False)
    codeflash_output = strategy._is_standalone_structural_token("hello world") # 685ns -> 446ns (53.6% faster)
    codeflash_output = strategy._is_standalone_structural_token("not_a_token") # 291ns -> 219ns (32.9% faster)
    codeflash_output = strategy._is_standalone_structural_token("<|start|> extra") # 245ns -> 156ns (57.1% faster)
    codeflash_output = strategy._is_standalone_structural_token("extra <|start|>") # 208ns -> 149ns (39.6% faster)

def test_empty_string(strategy):
    # Empty string (should return False)
    codeflash_output = strategy._is_standalone_structural_token("") # 590ns -> 393ns (50.1% faster)

def test_whitespace_only(strategy):
    # String with only whitespace (should return False)
    codeflash_output = strategy._is_standalone_structural_token("   ") # 666ns -> 491ns (35.6% faster)

# 2. EDGE TEST CASES

def test_structural_token_with_newline(strategy):
    # Token with newline characters (should return True after stripping)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"\n{token}\n") # 2.19μs -> 1.94μs (13.0% faster)

def test_structural_token_with_tabs(strategy):
    # Token with tabs (should return True after stripping)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"\t{token}\t") # 2.08μs -> 1.81μs (14.6% faster)

def test_structural_token_case_sensitivity(strategy):
    # Token with different case (should return False)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(token.upper()) # 2.15μs -> 1.70μs (26.8% faster)

def test_structural_token_substring(strategy):
    # Substrings or superstrings of tokens (should return False)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(token[:-1]) # 1.86μs -> 1.53μs (21.1% faster)
        codeflash_output = strategy._is_standalone_structural_token(token + "x")
        codeflash_output = strategy._is_standalone_structural_token("x" + token) # 1.61μs -> 1.31μs (23.6% faster)

def test_structural_token_embedded(strategy):
    # Token embedded in other text (should return False)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"foo {token} bar") # 1.86μs -> 1.62μs (15.1% faster)

def test_structural_token_unicode_whitespace(strategy):
    # Token with unicode whitespace (should return True after stripping)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"\u2003{token}\u2003") # 2.81μs -> 2.45μs (14.7% faster)

def test_structural_token_with_multiple_whitespace_types(strategy):
    # Token with mixed whitespace (should return True)
    for token in strategy.guard_tokens:
        mixed_ws = f"\t \n{token}\t \n"
        codeflash_output = strategy._is_standalone_structural_token(mixed_ws) # 2.16μs -> 1.89μs (14.6% faster)

def test_structural_token_with_control_characters(strategy):
    # Token with control characters (should return True after stripping)
    for token in strategy.guard_tokens:
        control_ws = f"\x0b{token}\x0c"
        codeflash_output = strategy._is_standalone_structural_token(control_ws) # 2.10μs -> 1.82μs (15.6% faster)

def test_structural_token_with_surrounding_non_whitespace(strategy):
    # Token with non-whitespace surrounding (should return False)
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"abc{token}xyz") # 1.94μs -> 1.70μs (13.9% faster)

# 3. LARGE SCALE TEST CASES

def test_large_batch_of_structural_tokens(strategy):
    # Test a large list of valid tokens with whitespace (should all be True)
    tokens = [f"  {token}  " for token in strategy.guard_tokens for _ in range(50)]  # 350 tokens
    for t in tokens:
        codeflash_output = strategy._is_standalone_structural_token(t) # 71.0μs -> 62.1μs (14.4% faster)

def test_large_batch_of_non_structural_tokens(strategy):
    # Test a large list of invalid tokens (should all be False)
    tokens = [f"foo{n}" for n in range(500)]
    for t in tokens:
        codeflash_output = strategy._is_standalone_structural_token(t) # 101μs -> 82.0μs (24.2% faster)

def test_mixed_large_batch(strategy):
    # Mix valid and invalid tokens, check each result
    valid = [f"\n{token}\n" for token in strategy.guard_tokens for _ in range(40)]  # 280 valid
    invalid = [f"not{n}" for n in range(280)]
    mixed = valid + invalid
    results = [strategy._is_standalone_structural_token(x) for x in mixed]

def test_performance_large_scale(strategy):
    # Performance: ensure function is fast for large input
    import time
    tokens = [f"   {token}   " for token in strategy.guard_tokens for _ in range(100)]
    start = time.time()
    for t in tokens:
        codeflash_output = strategy._is_standalone_structural_token(t) # 142μs -> 124μs (14.5% faster)
    duration = time.time() - start

def test_structural_tokens_with_various_whitespace_large_scale(strategy):
    # Large batch with various whitespace characters
    whitespace_variants = [" ", "\t", "\n", "\u2003", "\x0b", "\x0c"]
    tokens = []
    for token in strategy.guard_tokens:
        for ws in whitespace_variants:
            tokens.append(f"{ws}{token}{ws}")
    for t in tokens:
        codeflash_output = strategy._is_standalone_structural_token(t) # 10.4μs -> 9.18μs (13.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest
from sglang.srt.parser.harmony_parser import CanonicalStrategy

# unit tests

@pytest.fixture
def strategy():
    # Fixture to create a CanonicalStrategy instance for all tests
    return CanonicalStrategy()

# 1. Basic Test Cases

def test_exact_structural_tokens(strategy):
    # Test each token exactly as defined
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(token) # 1.94μs -> 1.45μs (34.2% faster)

def test_structural_tokens_with_whitespace(strategy):
    # Test tokens with leading/trailing whitespace
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"  {token}  ") # 2.22μs -> 1.85μs (19.8% faster)

def test_non_structural_token(strategy):
    # Test a random string not in the structural tokens
    codeflash_output = strategy._is_standalone_structural_token("hello world") # 687ns -> 455ns (51.0% faster)
    codeflash_output = strategy._is_standalone_structural_token("start") # 279ns -> 168ns (66.1% faster)
    codeflash_output = strategy._is_standalone_structural_token("<|not_a_token|>") # 225ns -> 158ns (42.4% faster)

def test_empty_string(strategy):
    # Test empty string
    codeflash_output = strategy._is_standalone_structural_token("") # 576ns -> 434ns (32.7% faster)

def test_only_whitespace(strategy):
    # Test string containing only whitespace
    codeflash_output = strategy._is_standalone_structural_token("   ") # 689ns -> 488ns (41.2% faster)

# 2. Edge Test Cases

def test_structural_token_with_extra_characters(strategy):
    # Token with extra characters appended/prepended
    codeflash_output = strategy._is_standalone_structural_token("<|start|>extra") # 628ns -> 451ns (39.2% faster)
    codeflash_output = strategy._is_standalone_structural_token("extra<|start|>") # 286ns -> 232ns (23.3% faster)
    codeflash_output = strategy._is_standalone_structural_token("<|start|> extra") # 217ns -> 157ns (38.2% faster)
    codeflash_output = strategy._is_standalone_structural_token("extra <|start|>") # 210ns -> 155ns (35.5% faster)

def test_structural_token_case_sensitivity(strategy):
    # Token with different casing
    codeflash_output = strategy._is_standalone_structural_token("<|START|>") # 645ns -> 455ns (41.8% faster)
    codeflash_output = strategy._is_standalone_structural_token("<|Start|>") # 265ns -> 220ns (20.5% faster)

def test_structural_token_with_newlines(strategy):
    # Token with newlines around it
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"\n{token}\n") # 2.22μs -> 1.98μs (12.3% faster)

def test_structural_token_with_tabs(strategy):
    # Token with tabs around it
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"\t{token}\t") # 2.12μs -> 1.80μs (17.6% faster)

def test_structural_token_with_multiple_spaces_inside(strategy):
    # Token with extra spaces inside the token (should not match)
    codeflash_output = strategy._is_standalone_structural_token("<| start | >") # 636ns -> 412ns (54.4% faster)
    codeflash_output = strategy._is_standalone_structural_token("<|  start  |>") # 342ns -> 229ns (49.3% faster)

def test_partial_structural_token(strategy):
    # Partial token (missing angle brackets or pipes)
    codeflash_output = strategy._is_standalone_structural_token("start|>") # 658ns -> 394ns (67.0% faster)
    codeflash_output = strategy._is_standalone_structural_token("<|start|") # 317ns -> 246ns (28.9% faster)
    codeflash_output = strategy._is_standalone_structural_token("|start|") # 246ns -> 160ns (53.8% faster)

def test_structural_token_with_unicode_whitespace(strategy):
    # Token with unicode whitespace characters
    for token in strategy.guard_tokens:
        codeflash_output = strategy._is_standalone_structural_token(f"\u2003{token}\u2003") # 2.79μs -> 2.41μs (15.8% faster)

def test_structural_token_with_surrounding_non_whitespace(strategy):
    # Token with non-whitespace, non-token characters around
    codeflash_output = strategy._is_standalone_structural_token("foo<|start|>bar") # 593ns -> 395ns (50.1% faster)

# 3. Large Scale Test Cases

def test_large_list_of_structural_tokens(strategy):
    # Test a large list of tokens, all valid
    large_tokens = strategy.guard_tokens * 100  # 700 tokens
    for token in large_tokens:
        codeflash_output = strategy._is_standalone_structural_token(token) # 137μs -> 109μs (25.9% faster)

def test_large_list_of_mixed_tokens(strategy):
    # Test a large list of tokens, mix of valid and invalid
    valid = strategy.guard_tokens * 50  # 350 valid
    invalid = [f"{token}_invalid" for token in strategy.guard_tokens] * 50  # 350 invalid
    mixed = valid + invalid
    for i, token in enumerate(mixed):
        if i < 350:
            codeflash_output = strategy._is_standalone_structural_token(token)
        else:
            codeflash_output = strategy._is_standalone_structural_token(token)

def test_large_random_strings(strategy):
    # Test a large number of random strings that should not match
    for i in range(1000):
        s = f"random_string_{i}"
        codeflash_output = strategy._is_standalone_structural_token(s) # 204μs -> 171μs (19.3% faster)

def test_large_structural_tokens_with_whitespace(strategy):
    # Test a large number of valid tokens with whitespace
    large_tokens = [f"   {token}   " for token in strategy.guard_tokens for _ in range(100)]  # 700 tokens
    for token in large_tokens:
        codeflash_output = strategy._is_standalone_structural_token(token) # 144μs -> 127μs (12.9% faster)

def test_large_structural_tokens_with_newlines(strategy):
    # Test a large number of valid tokens with newlines
    large_tokens = [f"\n{token}\n" for token in strategy.guard_tokens for _ in range(100)]  # 700 tokens
    for token in large_tokens:
        codeflash_output = strategy._is_standalone_structural_token(token) # 142μs -> 125μs (13.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from sglang.srt.parser.harmony_parser import CanonicalStrategy

def test_CanonicalStrategy__is_standalone_structural_token():
    CanonicalStrategy._is_standalone_structural_token(CanonicalStrategy(), '')
🔎 Concolic Coverage Tests and Runtime

To edit these changes git checkout codeflash/optimize-CanonicalStrategy._is_standalone_structural_token-mhoosu6v and push.

Codeflash Static Badge

The optimization replaces a costly O(n) list lookup with an O(1) set lookup by pre-computing a set of structural tokens during initialization.

**Key Changes:**
1. **Pre-computed set**: Added `self._structural_tokens_set = set(self.guard_tokens)` in `__init__()` to create a hash set from the token list
2. **Eliminated redundant list creation**: Removed the inline list creation `structural_tokens = [...]` that occurred on every function call
3. **Direct set membership**: Changed from `content_stripped in structural_tokens` (O(n) list scan) to `content.strip() in self._structural_tokens_set` (O(1) hash lookup)

**Why This Is Faster:**
- **Avoided repeated work**: The original code recreated the same 7-element list on every call (5,542 times), wasting CPU cycles
- **Better algorithmic complexity**: Set membership uses hash table lookup (O(1) average case) vs. linear list scan (O(n))
- **Reduced memory allocations**: Eliminates 5,542 temporary list allocations during execution

**Performance Impact:**
The optimization delivers consistent 12-67% speedups across all test scenarios, with particularly strong gains for:
- Non-matching strings (32-67% faster) - early hash misses vs. full list scans
- Large-scale operations (13-26% faster) - the O(1) advantage compounds with volume
- Basic token matching (16-34% faster) - eliminates list recreation overhead

This optimization is especially valuable for parsing workloads where `_is_standalone_structural_token` is called frequently, as the constant-time lookup scales much better than linear search as call frequency increases.
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 7, 2025 10:02
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant