⚡️ Speed up function _infer_docstring_style by 31% #32

Open: wants to merge 1 commit into base: try-refinement

@codeflash-ai codeflash-ai bot commented Jul 22, 2025

📄 31% (0.31x) speedup for _infer_docstring_style in pydantic_ai_slim/pydantic_ai/_griffe.py

⏱️ Runtime : 6.08 milliseconds → 4.64 milliseconds (best of 109 runs)
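
The headline percentage appears to be the time saved expressed as a fraction of the optimized runtime. That formula is inferred from the numbers reported here, not stated anywhere in this PR, and the helper name `speedup` is hypothetical. A minimal sketch of the arithmetic:

```python
def speedup(original: float, optimized: float) -> float:
    """Time saved, as a fraction of the optimized runtime (assumed formula)."""
    return (original - optimized) / optimized


# Overall runtimes reported above, in milliseconds.
overall = speedup(6.08, 4.64)
print(f"{overall:.2%}")  # 31.03%
```

The same formula is consistent with the per-test annotations further down: for example, 36.6μs -> 4.50μs gives roughly 7.13, which is reported as "713% faster".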

📝 Explanation and details

Here’s how to optimize your _infer_docstring_style function for both speed and memory usage.

  • Avoid generator usage with any() for the inner loop: any() already stops at the first match, but a plain for loop with an early exit skips the overhead of creating and driving a generator expression, so it is slightly faster.
  • Pre-compile patterns: Formatting and compiling the regex patterns on every call wastes time. For maximum speed, each pattern should be compiled once. Since the _docstring_style_patterns data is a read-only module-level table, we compile on demand within the function and cache the result in a simple module-level dict for future calls (plain memoization with no eviction, not a true LRU cache).
  • Minimize .format calls: Each pattern is formatted once per replacement and the result is reused when the function is called many times.

Notes.

  • We introduced a module-level _regex_cache dict to ensure each compiled regex is re-used, speeding up repeated style checks.
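  • The nested loop is now more explicit and returns on the first match found. any() already short-circuited, so the number of regex searches is unchanged; the saving comes from skipping the generator and the per-call pattern formatting.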
  • All behaviors and types remain unchanged.

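This version helps most on repeated calls, where the cache is already populated. The measurements below show the largest gains on short docstrings and much smaller gains (often single-digit percentages) where regex search time over a very large docstring dominates.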

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 81 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import re
from typing import Literal

# imports
import pytest  # used for our unit tests
from pydantic_ai._griffe import _infer_docstring_style

DocstringStyle = Literal['google', 'numpy', 'sphinx']

# See https://github.com/mkdocstrings/griffe/issues/329#issuecomment-2425017804
_docstring_style_patterns: list[tuple[str, list[str], DocstringStyle]] = [
    (
        r'\n[ \t]*:{0}([ \t]+\w+)*:([ \t]+.+)?\n',
        [
            'param',
            'parameter',
            'arg',
            'argument',
            'key',
            'keyword',
            'type',
            'var',
            'ivar',
            'cvar',
            'vartype',
            'returns',
            'return',
            'rtype',
            'raises',
            'raise',
            'except',
            'exception',
        ],
        'sphinx',
    ),
    (
        r'\n[ \t]*{0}:([ \t]+.+)?\n[ \t]+.+',
        [
            'args',
            'arguments',
            'params',
            'parameters',
            'keyword args',
            'keyword arguments',
            'other args',
            'other arguments',
            'other params',
            'other parameters',
            'raises',
            'exceptions',
            'returns',
            'yields',
            'receives',
            'examples',
            'attributes',
            'functions',
            'methods',
            'classes',
            'modules',
            'warns',
            'warnings',
        ],
        'google',
    ),
    (
        r'\n[ \t]*{0}\n[ \t]*---+\n',
        [
            'deprecated',
            'parameters',
            'other parameters',
            'returns',
            'yields',
            'receives',
            'raises',
            'warns',
            'attributes',
            'functions',
            'methods',
            'classes',
            'modules',
        ],
        'numpy',
    ),
]


# unit tests

# --------------------
# 1. BASIC TEST CASES
# --------------------

def test_google_style_simple():
    """
    Test a simple Google-style docstring with 'Args:' section.
    """
    doc = """
    Does something.

    Args:
        x: The x value.
        y: The y value.
    """
    codeflash_output = _infer_docstring_style(doc) # 17.8μs -> 5.00μs (257% faster)

def test_google_style_with_returns():
    """
    Test a Google-style docstring with 'Returns:' section.
    """
    doc = """
    Computes the sum.

    Args:
        a: First value.
        b: Second value.

    Returns:
        The sum of a and b.
    """
    codeflash_output = _infer_docstring_style(doc) # 18.8μs -> 6.08μs (209% faster)

def test_numpy_style_simple():
    """
    Test a simple Numpy-style docstring with 'Parameters' and dashes.
    """
    doc = """
    Compute the sum.

    Parameters
    ----------
    a : int
        First value.
    b : int
        Second value.

    Returns
    -------
    int
        The sum of a and b.
    """
    codeflash_output = _infer_docstring_style(doc) # 49.0μs -> 23.1μs (112% faster)

def test_numpy_style_with_warns():
    """
    Test Numpy-style with 'Warns' section.
    """
    doc = """
    Does something.

    Warns
    -----
    UserWarning
        If something goes wrong.
    """
    codeflash_output = _infer_docstring_style(doc) # 44.8μs -> 15.4μs (190% faster)

def test_sphinx_style_param():
    """
    Test a Sphinx-style docstring with ':param:' fields.
    """
    doc = """
    Does something.

    :param x: The x value.
    :param y: The y value.
    :returns: The result.
    """
    codeflash_output = _infer_docstring_style(doc) # 3.21μs -> 1.25μs (157% faster)

def test_sphinx_style_raises():
    """
    Test Sphinx-style with ':raises:' field.
    """
    doc = """
    Divide numbers.

    :param a: Numerator.
    :param b: Denominator.
    :raises ZeroDivisionError: If b is zero.
    """
    codeflash_output = _infer_docstring_style(doc) # 3.21μs -> 1.17μs (175% faster)

def test_sphinx_style_alternate_keywords():
    """
    Sphinx-style using ':argument:' and ':return:'.
    """
    doc = """
    Does something.

    :argument foo: Foo argument.
    :return: The result.
    """
    codeflash_output = _infer_docstring_style(doc) # 6.04μs -> 2.08μs (190% faster)

def test_google_style_with_examples():
    """
    Google-style with 'Examples:' section.
    """
    doc = """
    Does something.

    Examples:
        >>> foo()
        bar
    """
    codeflash_output = _infer_docstring_style(doc) # 30.9μs -> 10.0μs (208% faster)

def test_numpy_style_deprecated():
    """
    Numpy-style with 'Deprecated' section.
    """
    doc = """
    Does something.

    Deprecated
    ----------
    This function will be removed in future versions.
    """
    codeflash_output = _infer_docstring_style(doc) # 36.9μs -> 11.7μs (215% faster)

# --------------------
# 2. EDGE TEST CASES
# --------------------

def test_empty_docstring():
    """
    Empty docstring should default to 'google'.
    """
    doc = ""
    codeflash_output = _infer_docstring_style(doc) # 36.6μs -> 4.50μs (713% faster)

def test_no_sections():
    """
    Docstring with no recognizable sections should default to 'google'.
    """
    doc = "This function does something."
    codeflash_output = _infer_docstring_style(doc) # 36.6μs -> 5.00μs (632% faster)

def test_only_whitespace():
    """
    Docstring with only whitespace should default to 'google'.
    """
    doc = "   \n\t  "
    codeflash_output = _infer_docstring_style(doc) # 36.0μs -> 5.00μs (621% faster)

def test_sphinx_style_with_tabs_and_spaces():
    """
    Sphinx-style with mixed tabs and spaces.
    """
    doc = """
    Does something.

    \t:param\tfoo:\tFoo value.
    \t:returns:\tResult.
    """
    codeflash_output = _infer_docstring_style(doc) # 3.25μs -> 1.29μs (152% faster)

def test_google_style_with_indented_args():
    """
    Google-style with indented 'Args:' section.
    """
    doc = """
    Does something.

        Args:
            foo: Foo value.
    """
    codeflash_output = _infer_docstring_style(doc) # 17.7μs -> 4.88μs (263% faster)

def test_numpy_style_with_short_dashes():
    """
    Numpy-style with short dashes (minimum 3 dashes).
    """
    doc = """
    Parameters
    ---
    foo : int
        Foo value.
    """
    codeflash_output = _infer_docstring_style(doc) # 37.2μs -> 11.2μs (233% faster)

def test_sphinx_style_case_insensitive():
    """
    Sphinx-style with uppercase ':PARAM:'.
    """
    doc = """
    Does something.

    :PARAM foo: Foo value.
    """
    # Should match regardless of case
    codeflash_output = _infer_docstring_style(doc) # 3.21μs -> 1.17μs (175% faster)

def test_google_style_case_insensitive():
    """
    Google-style with uppercase 'ARGS:'.
    """
    doc = """
    Does something.

    ARGS:
        foo: Foo value.
    """
    codeflash_output = _infer_docstring_style(doc) # 17.1μs -> 4.54μs (277% faster)

def test_numpy_style_case_insensitive():
    """
    Numpy-style with uppercase 'PARAMETERS' and dashes.
    """
    doc = """
    PARAMETERS
    ----------
    foo : int
        Foo value.
    """
    codeflash_output = _infer_docstring_style(doc) # 37.8μs -> 11.4μs (233% faster)

def test_sphinx_style_with_colons_in_description():
    """
    Sphinx-style with colon in parameter description.
    """
    doc = """
    :param foo: The value: must be an int.
    """
    codeflash_output = _infer_docstring_style(doc) # 3.29μs -> 1.12μs (193% faster)

def test_google_style_with_colons_in_description():
    """
    Google-style with colon in description.
    """
    doc = """
    Args:
        foo: The value: must be an int.
    """
    codeflash_output = _infer_docstring_style(doc) # 16.8μs -> 3.96μs (323% faster)

def test_numpy_style_with_colons_in_description():
    """
    Numpy-style with colon in description.
    """
    doc = """
    Parameters
    ----------
    foo : int
        The value: must be an int.
    """
    codeflash_output = _infer_docstring_style(doc) # 37.6μs -> 11.4μs (230% faster)

def test_multiple_styles_present_prefers_first_match():
    """
    If multiple styles are present, should return the first matching style by order in _docstring_style_patterns.
    """
    doc = """
    :param foo: Foo value.

    Parameters
    ----------
    foo : int
        Foo value.
    """
    # Sphinx pattern is checked before numpy, so should return 'sphinx'
    codeflash_output = _infer_docstring_style(doc) # 3.12μs -> 1.12μs (178% faster)

def test_no_match_fallback_google():
    """
    If no pattern matches, fallback is 'google'.
    """
    doc = """
    This is a docstring with no sections.
    """
    codeflash_output = _infer_docstring_style(doc) # 40.8μs -> 8.83μs (362% faster)

def test_sphinx_style_with_multiple_param_types():
    """
    Sphinx-style with :param, :ivar, :cvar, :vartype.
    """
    doc = """
    :param foo: Foo value.
    :ivar bar: Bar value.
    :cvar baz: Baz value.
    :vartype foo: int
    """
    codeflash_output = _infer_docstring_style(doc) # 3.12μs -> 1.12μs (178% faster)

def test_google_style_with_other_sections():
    """
    Google-style with 'Other Parameters:' section.
    """
    doc = """
    Other Parameters:
        foo: Foo value.
        bar: Bar value.
    """
    codeflash_output = _infer_docstring_style(doc) # 25.0μs -> 7.12μs (250% faster)

def test_numpy_style_with_other_parameters():
    """
    Numpy-style with 'Other Parameters' section.
    """
    doc = """
    Other Parameters
    ---------------
    foo : int
        Foo value.
    """
    codeflash_output = _infer_docstring_style(doc) # 38.3μs -> 11.7μs (227% faster)

def test_sphinx_style_with_colon_and_whitespace():
    """
    Sphinx-style with extra whitespace after colon.
    """
    doc = """
    :param    foo:    Foo value.
    """
    codeflash_output = _infer_docstring_style(doc) # 3.17μs -> 1.17μs (171% faster)

def test_google_style_with_keyword_arguments():
    """
    Google-style with 'Keyword Arguments:' section.
    """
    doc = """
    Keyword Arguments:
        foo: Foo value.
    """
    codeflash_output = _infer_docstring_style(doc) # 20.8μs -> 5.29μs (292% faster)

def test_numpy_style_with_functions_section():
    """
    Numpy-style with 'Functions' section.
    """
    doc = """
    Functions
    ---------
    foo
        Does something.
    """
    codeflash_output = _infer_docstring_style(doc) # 44.2μs -> 13.6μs (225% faster)

# --------------------
# 3. LARGE SCALE TEST CASES
# --------------------

def test_large_google_style_docstring():
    """
    Google-style docstring with many parameters and sections.
    """
    params = "\n".join([f"    param{i}: Description of param{i}." for i in range(500)])
    doc = f"""
    Does something big.

    Args:
{params}

    Returns:
        Something.
    """
    codeflash_output = _infer_docstring_style(doc) # 233μs -> 220μs (5.87% faster)

def test_large_numpy_style_docstring():
    """
    Numpy-style docstring with many parameters.
    """
    params = "\n".join([f"param{i} : int\n    Description of param{i}." for i in range(500)])
    doc = f"""
    Does something big.

    Parameters
    ----------
{params}

    Returns
    -------
    int
        Something.
    """
    codeflash_output = _infer_docstring_style(doc) # 1.05ms -> 1.02ms (2.38% faster)

def test_large_sphinx_style_docstring():
    """
    Sphinx-style docstring with many :param: fields.
    """
    params = "\n".join([f":param param{i}: Description of param{i}." for i in range(500)])
    doc = f"""
    Does something big.

{params}

    :returns: Something.
    """
    codeflash_output = _infer_docstring_style(doc) # 3.38μs -> 1.21μs (179% faster)

def test_performance_large_mixed_docstring():
    """
    Large docstring with lots of text and a Sphinx-style section at the end.
    Should still detect the correct style efficiently.
    """
    # Create a large block of unrelated text
    unrelated = "\n".join([f"This is line {i}." for i in range(800)])
    doc = f"""
    {unrelated}

    :param foo: Foo value.
    :returns: Result.
    """
    codeflash_output = _infer_docstring_style(doc) # 13.4μs -> 10.9μs (22.5% faster)

def test_large_docstring_no_match():
    """
    Large docstring with no matching style, should fallback to 'google'.
    """
    unrelated = "\n".join([f"This is line {i}." for i in range(900)])
    doc = f"""
    {unrelated}

    This is the end.
    """
    codeflash_output = _infer_docstring_style(doc) # 755μs -> 699μs (8.04% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
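
The comparison mentioned in the comment above can be pictured as running the original and the optimized function on the same inputs and requiring identical results. The sketch below is not Codeflash's actual harness; `assert_same_outputs` and the two stand-in detectors are hypothetical names used only for illustration.

```python
import re
from typing import Callable, Iterable


def assert_same_outputs(
    original: Callable[[str], str],
    optimized: Callable[[str], str],
    inputs: Iterable[str],
) -> int:
    """Run both implementations on every input and fail on the first mismatch."""
    checked = 0
    for doc in inputs:
        expected = original(doc)
        actual = optimized(doc)
        assert actual == expected, f"mismatch for {doc!r}: {expected!r} != {actual!r}"
        checked += 1
    return checked


# Hypothetical stand-ins: the same check written without and with a precompiled regex.
def detect_original(doc: str) -> str:
    return 'sphinx' if re.search(r'\n[ \t]*:param\b', doc) else 'google'


_PARAM_FIELD = re.compile(r'\n[ \t]*:param\b')


def detect_optimized(doc: str) -> str:
    return 'sphinx' if _PARAM_FIELD.search(doc) else 'google'
```

A call such as `assert_same_outputs(detect_original, detect_optimized, ['', '\n:param x: y\n'])` returns the number of inputs checked.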

import re
from typing import Literal

# imports
import pytest  # used for our unit tests
from pydantic_ai._griffe import _infer_docstring_style

# type alias used by the pattern table below
DocstringStyle = Literal['google', 'numpy', 'sphinx']

# See https://github.com/mkdocstrings/griffe/issues/329#issuecomment-2425017804
_docstring_style_patterns: list[tuple[str, list[str], DocstringStyle]] = [
    (
        r'\n[ \t]*:{0}([ \t]+\w+)*:([ \t]+.+)?\n',
        [
            'param',
            'parameter',
            'arg',
            'argument',
            'key',
            'keyword',
            'type',
            'var',
            'ivar',
            'cvar',
            'vartype',
            'returns',
            'return',
            'rtype',
            'raises',
            'raise',
            'except',
            'exception',
        ],
        'sphinx',
    ),
    (
        r'\n[ \t]*{0}:([ \t]+.+)?\n[ \t]+.+',
        [
            'args',
            'arguments',
            'params',
            'parameters',
            'keyword args',
            'keyword arguments',
            'other args',
            'other arguments',
            'other params',
            'other parameters',
            'raises',
            'exceptions',
            'returns',
            'yields',
            'receives',
            'examples',
            'attributes',
            'functions',
            'methods',
            'classes',
            'modules',
            'warns',
            'warnings',
        ],
        'google',
    ),
    (
        r'\n[ \t]*{0}\n[ \t]*---+\n',
        [
            'deprecated',
            'parameters',
            'other parameters',
            'returns',
            'yields',
            'receives',
            'raises',
            'warns',
            'attributes',
            'functions',
            'methods',
            'classes',
            'modules',
        ],
        'numpy',
    ),
]

# unit tests

# -------------------- BASIC TEST CASES --------------------

def test_google_style_basic():
    """
    Test a basic Google style docstring.
    """
    doc = """
    Summary line.

    Args:
        x (int): The x value.
        y (str): The y value.

    Returns:
        bool: True if successful.
    """
    codeflash_output = _infer_docstring_style(doc) # 19.5μs -> 6.50μs (199% faster)

def test_numpy_style_basic():
    """
    Test a basic Numpy style docstring.
    """
    doc = """
    Summary line.

    Parameters
    ----------
    x : int
        The x value.
    y : str
        The y value.

    Returns
    -------
    bool
        True if successful.
    """
    codeflash_output = _infer_docstring_style(doc) # 49.6μs -> 23.4μs (112% faster)

def test_sphinx_style_basic():
    """
    Test a basic Sphinx style docstring.
    """
    doc = """
    Summary line.

    :param x: The x value.
    :type x: int
    :param y: The y value.
    :type y: str
    :returns: True if successful.
    :rtype: bool
    """
    codeflash_output = _infer_docstring_style(doc) # 3.29μs -> 1.25μs (163% faster)

def test_google_style_variants():
    """
    Test Google style with different section names and indentation.
    """
    doc = """
    Function summary.

    Arguments:
        foo (str): foo arg

    Raises:
        ValueError: if bad input
    """
    codeflash_output = _infer_docstring_style(doc) # 19.6μs -> 6.42μs (206% faster)

def test_numpy_style_variants():
    """
    Test Numpy style with "Other Parameters" and mixed-case section headers.
    """
    doc = """
    Function summary.

    Other Parameters
    ---------------
    bar : float
        bar arg
    """
    codeflash_output = _infer_docstring_style(doc) # 40.8μs -> 13.8μs (195% faster)

def test_sphinx_style_variants():
    """
    Test Sphinx style with :raises: and :rtype: fields.
    """
    doc = """
    Does something.

    :raises ValueError: If something goes wrong.
    :rtype: None
    """
    codeflash_output = _infer_docstring_style(doc) # 13.7μs -> 4.00μs (242% faster)

# -------------------- EDGE TEST CASES --------------------

def test_empty_docstring():
    """
    Test empty docstring returns fallback style (google).
    """
    doc = ""
    codeflash_output = _infer_docstring_style(doc) # 36.5μs -> 4.42μs (725% faster)

def test_no_sections():
    """
    Test docstring with no recognizable sections.
    """
    doc = "Just a summary line with no sections."
    codeflash_output = _infer_docstring_style(doc) # 37.2μs -> 4.88μs (662% faster)

def test_only_summary_and_blank_lines():
    """
    Test docstring with only summary and blank lines.
    """
    doc = "\n\nA summary.\n\n"
    codeflash_output = _infer_docstring_style(doc) # 39.8μs -> 7.79μs (411% faster)

def test_multiple_styles_present_prefers_first_match():
    """
    Test docstring with both Sphinx and Numpy sections; should match Sphinx first.
    """
    doc = """
    Summary.

    :param x: X value.
    :returns: Result.

    Parameters
    ----------
    x : int
        Description.
    """
    # Sphinx pattern is checked before Numpy, so should return 'sphinx'
    codeflash_output = _infer_docstring_style(doc) # 3.38μs -> 1.21μs (179% faster)

def test_indented_sections():
    """
    Test docstring with indented section headers.
    """
    doc = """
    Summary.

        Args:
            foo (int): foo argument
    """
    codeflash_output = _infer_docstring_style(doc) # 17.7μs -> 5.04μs (250% faster)

def test_colon_in_text_not_section():
    """
    Test docstring with colons in text but not as section headers.
    """
    doc = """
    This function: does something.
    It is very: important.
    """
    codeflash_output = _infer_docstring_style(doc) # 42.8μs -> 10.6μs (303% faster)

def test_sphinx_section_with_extra_whitespace():
    """
    Test Sphinx style with extra whitespace and tabs.
    """
    doc = """
    Summary.

    :param    x   :   The x value.
    :returns  :   True if successful.
    """
    codeflash_output = _infer_docstring_style(doc) # 46.3μs -> 13.8μs (235% faster)

def test_google_section_with_extra_whitespace():
    """
    Test Google style with extra whitespace.
    """
    doc = """
    Summary.

    Args:    x (int): x value

        y (str): y value
    """
    codeflash_output = _infer_docstring_style(doc) # 47.2μs -> 15.5μs (205% faster)

def test_numpy_section_with_short_underline():
    """
    Test Numpy style with short underline.
    """
    doc = """
    Parameters
    ----
    x : int
        Description.
    """
    codeflash_output = _infer_docstring_style(doc) # 37.5μs -> 11.4μs (228% faster)

def test_numpy_section_with_long_underline_and_spaces():
    """
    Test Numpy style with long underline and leading spaces.
    """
    doc = """
    Returns
        -------
    bool
        Result.
    """
    codeflash_output = _infer_docstring_style(doc) # 39.6μs -> 12.4μs (219% faster)

def test_sphinx_section_case_insensitive():
    """
    Test Sphinx style section headers with different casing.
    """
    doc = """
    :PARAM x: value
    :RETURNS: result
    """
    codeflash_output = _infer_docstring_style(doc) # 3.25μs -> 1.21μs (169% faster)

def test_google_section_case_insensitive():
    """
    Test Google style section headers with different casing.
    """
    doc = """
    a summary.

    ARGS:
        foo (int): foo
    """
    codeflash_output = _infer_docstring_style(doc) # 17.2μs -> 4.75μs (262% faster)

def test_numpy_section_case_insensitive():
    """
    Test Numpy style section headers with different casing.
    """
    doc = """
    PARAMETERS
    ----------
    foo : int
        foo
    """
    codeflash_output = _infer_docstring_style(doc) # 37.6μs -> 11.4μs (230% faster)

def test_section_headers_in_middle_of_text():
    """
    Test docstring with section-like words in the middle of lines.
    """
    doc = """
    This function parameters are x and y.
    It returns the result.
    """
    codeflash_output = _infer_docstring_style(doc) # 43.2μs -> 10.8μs (301% faster)

def test_sphinx_with_multiple_colons():
    """
    Test Sphinx style with multiple colons in a line.
    """
    doc = """
    :param x: foo: bar: baz
    """
    codeflash_output = _infer_docstring_style(doc) # 3.12μs -> 1.12μs (178% faster)

def test_google_with_multiple_colons():
    """
    Test Google style with multiple colons in a line.
    """
    doc = """
    Args: foo: bar: baz
        x (int): description
    """
    codeflash_output = _infer_docstring_style(doc) # 16.8μs -> 4.08μs (311% faster)

def test_numpy_with_multiple_hyphens():
    """
    Test Numpy style with more than three hyphens in the underline.
    """
    doc = """
    Parameters
    -------------
    x : int
        description
    """
    codeflash_output = _infer_docstring_style(doc) # 37.2μs -> 11.8μs (216% faster)

def test_section_headers_with_tabs():
    """
    Test section headers with tabs instead of spaces.
    """
    doc = """
    \tArgs:
    \t\tx (int): description
    """
    codeflash_output = _infer_docstring_style(doc) # 16.7μs -> 4.08μs (309% faster)

def test_section_headers_with_mixed_whitespace():
    """
    Test section headers with deeper space indentation.
    """
    doc = """
        Parameters
        ----------
        x : int
            description
    """
    codeflash_output = _infer_docstring_style(doc) # 40.0μs -> 14.0μs (185% faster)

def test_sphinx_with_raise_and_raises():
    """
    Test Sphinx style with both :raise: and :raises:.
    """
    doc = """
    :raise ValueError: if bad
    :raises TypeError: if worse
    """
    codeflash_output = _infer_docstring_style(doc) # 14.3μs -> 4.08μs (250% faster)

def test_google_with_keyword_arguments():
    """
    Test Google style with 'Keyword Arguments' section.
    """
    doc = """
    Keyword Arguments:
        foo (str): description
    """
    codeflash_output = _infer_docstring_style(doc) # 20.5μs -> 5.42μs (278% faster)

def test_numpy_with_deprecated_section():
    """
    Test Numpy style with 'Deprecated' section.
    """
    doc = """
    Deprecated
    ----------
    This function will be removed.
    """
    codeflash_output = _infer_docstring_style(doc) # 35.4μs -> 9.46μs (274% faster)

def test_sphinx_with_exception_section():
    """
    Test Sphinx style with :exception: section.
    """
    doc = """
    :exception ValueError: if bad
    """
    codeflash_output = _infer_docstring_style(doc) # 15.8μs -> 3.92μs (303% faster)

def test_sphinx_with_except_section():
    """
    Test Sphinx style with :except: section.
    """
    doc = """
    :except Exception: on error
    """
    codeflash_output = _infer_docstring_style(doc) # 14.9μs -> 3.54μs (320% faster)

def test_google_with_examples_section():
    """
    Test Google style with 'Examples' section.
    """
    doc = """
    Examples:
        >>> foo()
        bar
    """
    codeflash_output = _infer_docstring_style(doc) # 29.4μs -> 8.62μs (241% faster)

def test_google_with_methods_section():
    """
    Test Google style with 'Methods' section.
    """
    doc = """
    Methods:
        foo: does foo
    """
    codeflash_output = _infer_docstring_style(doc) # 30.6μs -> 7.67μs (299% faster)

def test_numpy_with_methods_section():
    """
    Test Numpy style with 'Methods' section and underline.
    """
    doc = """
    Methods
    ------
    foo
        does foo
    """
    codeflash_output = _infer_docstring_style(doc) # 44.8μs -> 13.1μs (242% faster)

def test_sphinx_with_vartype_section():
    """
    Test Sphinx style with :vartype: section.
    """
    doc = """
    :vartype x: int
    """
    codeflash_output = _infer_docstring_style(doc) # 11.0μs -> 2.71μs (305% faster)

def test_sphinx_with_ivar_section():
    """
    Test Sphinx style with :ivar: section.
    """
    doc = """
    :ivar x: description
    """
    codeflash_output = _infer_docstring_style(doc) # 9.29μs -> 2.25μs (313% faster)

def test_sphinx_with_cvar_section():
    """
    Test Sphinx style with :cvar: section.
    """
    doc = """
    :cvar x: description
    """
    codeflash_output = _infer_docstring_style(doc) # 9.88μs -> 2.50μs (295% faster)

def test_sphinx_with_var_section():
    """
    Test Sphinx style with :var: section.
    """
    doc = """
    :var x: description
    """
    codeflash_output = _infer_docstring_style(doc) # 8.54μs -> 2.08μs (310% faster)

def test_google_with_warns_section():
    """
    Test Google style with 'Warns' section.
    """
    doc = """
    Warns:
        UserWarning: if something is odd
    """
    codeflash_output = _infer_docstring_style(doc) # 32.9μs -> 8.42μs (291% faster)

def test_numpy_with_warns_section():
    """
    Test Numpy style with 'Warns' section and underline.
    """
    doc = """
    Warns
    -----
    UserWarning
        If something is odd.
    """
    codeflash_output = _infer_docstring_style(doc) # 42.8μs -> 13.0μs (228% faster)

def test_sphinx_with_type_section():
    """
    Test Sphinx style with :type: section.
    """
    doc = """
    :type x: int
    """
    codeflash_output = _infer_docstring_style(doc) # 7.79μs -> 2.12μs (267% faster)

def test_sphinx_with_rtype_section():
    """
    Test Sphinx style with :rtype: section.
    """
    doc = """
    :rtype: int
    """
    codeflash_output = _infer_docstring_style(doc) # 12.6μs -> 2.88μs (339% faster)

# -------------------- LARGE SCALE TEST CASES --------------------

def test_large_google_docstring():
    """
    Test a large Google style docstring with many arguments.
    """
    args_section = "\n".join([f"    arg{i} (int): description" for i in range(500)])
    doc = f"""
    Summary.

    Args:
{args_section}

    Returns:
        int: result
    """
    codeflash_output = _infer_docstring_style(doc) # 215μs -> 202μs (6.41% faster)

def test_large_numpy_docstring():
    """
    Test a large Numpy style docstring with many parameters.
    """
    params_section = "\n".join([f"arg{i} : int\n    description" for i in range(500)])
    doc = f"""
    Summary.

    Parameters
    ----------
{params_section}

    Returns
    -------
    int
        result
    """
    codeflash_output = _infer_docstring_style(doc) # 974μs -> 937μs (3.91% faster)

def test_large_sphinx_docstring():
    """
    Test a large Sphinx style docstring with many :param: fields.
    """
    params_section = "\n".join([f":param arg{i}: description" for i in range(500)])
    doc = f"""
    Summary.

{params_section}

    :returns: result
    """
    codeflash_output = _infer_docstring_style(doc) # 3.38μs -> 1.21μs (179% faster)

def test_large_mixed_docstring_prefers_sphinx():
    """
    Test a large docstring with both Sphinx and Numpy sections; Sphinx is matched first.
    """
    sphinx_section = "\n".join([f":param arg{i}: description" for i in range(300)])
    numpy_section = "\n".join([f"arg{i} : int\n    description" for i in range(300)])
    doc = f"""
    Summary.

{sphinx_section}

    Parameters
    ----------
{numpy_section}

    :returns: result
    """
    # Sphinx pattern is checked before Numpy, so should return 'sphinx'
    codeflash_output = _infer_docstring_style(doc) # 3.33μs -> 1.12μs (196% faster)

def test_large_mixed_docstring_prefers_google():
    """
    Test a large docstring with only Numpy and Google sections; Google is matched first.
    """
    numpy_section = "\n".join([f"arg{i} : int\n    description" for i in range(300)])
    google_section = "\n".join([f"    arg{i} (int): description" for i in range(300)])
    doc = f"""
    Summary.

    Parameters
    ----------
{numpy_section}

    Args:
{google_section}

    Returns:
        int: result
    """
    # Google pattern is listed before Numpy in _docstring_style_patterns, so should return 'google'
    codeflash_output = _infer_docstring_style(doc) # 332μs -> 317μs (4.81% faster)

def test_large_unmatched_docstring():
    """
    Test a large docstring with no recognizable sections; should fallback to 'google'.
    """
    lines = "\n".join([f"This is line {i} of text." for i in range(800)])
    doc = f"""
    Summary.

{lines}
    """
    codeflash_output = _infer_docstring_style(doc) # 739μs -> 732μs (0.933% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from pydantic_ai._griffe import _infer_docstring_style

def test__infer_docstring_style():
    _infer_docstring_style('')

To edit these changes git checkout codeflash/optimize-_infer_docstring_style-mdeycsff and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 22, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 July 22, 2025 19:53