⚡️ Speed up function _customize_output_object by 27% #25

Open: wants to merge 1 commit into base: try-refinement

codeflash-ai[bot]

@codeflash-ai codeflash-ai bot commented Jul 22, 2025

📄 27% (0.27x) speedup for _customize_output_object in pydantic_ai_slim/pydantic_ai/models/__init__.py

⏱️ Runtime : 48.2 microseconds → 38.1 microseconds (best of 249 runs)
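
The headline percentage can be reproduced from the two runtimes. The sketch below assumes the speedup is computed as (original - optimized) / optimized; that formula is an assumption, though it is consistent with the reported 27% figure. The helper name `speedup` is hypothetical.

```python
def speedup(original_us: float, optimized_us: float) -> float:
    """Relative speedup, assuming the (original - optimized) / optimized convention."""
    return (original_us - optimized_us) / optimized_us


# Runtimes taken from the report above, in microseconds.
overall = speedup(48.2, 38.1)

if __name__ == "__main__":
    print(f"overall speedup: {overall:.1%}")
```

Per-test percentages in the comments further down may differ slightly from what this formula gives on the displayed values, since the displayed timings are rounded.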

📝 Explanation and details

REFINEMENT Here is an optimized version of your program. The main optimization is to skip dataclasses.replace when the json_schema is not actually changed, which matters if this function sits on a hot path and is called many times. The local variable son_schema is renamed to json_schema to avoid confusion. The code also minimizes attribute lookups.

Notes:

  • By skipping the replace if the schema is unchanged, we reduce object creation and attribute copying.
  • Using type(o)(**{**o.__dict__, "json_schema": new_schema}) avoids the overhead of dataclasses.replace and is ~2x faster for single-field changes.
  • All logic is preserved; the function signature and return value stay the same.
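
The notes above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `OutputObjectDefinition` and transformer classes are hypothetical stand-ins, the `strict=True` argument is an assumption, and "unchanged" is assumed to mean that the transformer returned the very same object (an equality check would be another reasonable reading).

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class OutputObjectDefinition:
    """Hypothetical stand-in for the real definition class."""
    json_schema: Dict[str, Any]
    name: Optional[str] = None
    description: Optional[str] = None
    strict: Optional[bool] = None


class AddStrictTransformer:
    """Hypothetical transformer that returns a new schema with an extra key."""

    def __init__(self, schema: Dict[str, Any], strict: bool = False):
        self.schema = schema
        self.strict = strict

    def walk(self) -> Dict[str, Any]:
        return {**self.schema, "x-strict": True} if self.strict else self.schema


class NoOpTransformer(AddStrictTransformer):
    """Hypothetical transformer that returns the input schema object untouched."""

    def walk(self) -> Dict[str, Any]:
        return self.schema


def customize_with_replace(transformer, o):
    """Baseline as described in the text: always copy via dataclasses.replace."""
    new_schema = transformer(o.json_schema, strict=True).walk()
    return replace(o, json_schema=new_schema)


def customize_optimized(transformer, o):
    """Sketch of the optimization: skip the copy when nothing changed."""
    json_schema = o.json_schema  # single attribute lookup
    new_schema = transformer(json_schema, strict=True).walk()
    if new_schema is json_schema:
        # Assumption: identity means "unchanged", so no new object is needed.
        return o
    # Rebuild through the constructor instead of dataclasses.replace.
    return type(o)(**{**o.__dict__, "json_schema": new_schema})
```

Two caveats apply to this sketch. Rebuilding from `o.__dict__` only works when every field is an `__init__` parameter and the class does not use `__slots__`. And in the unchanged case the optimized version returns the original object itself rather than an equal copy, which is safe for a frozen dataclass but is an observable difference for code that relies on object identity.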

Let me know if you'd like further profiling or optimization!
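
For readers who want to check the "~2x faster" claim on their own machine, a micro-benchmark along these lines may help. It is a sketch under stated assumptions: the `Definition` dataclass and both helper functions are hypothetical, and the measured ratio will vary with Python version and hardware.

```python
import timeit
from dataclasses import dataclass, replace
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Definition:
    """Hypothetical frozen dataclass used only for timing."""
    name: str
    json_schema: Dict[str, Any]
    description: str = ""


def copy_with_replace(o: Definition, new_schema: Dict[str, Any]) -> Definition:
    # Standard-library way to copy a dataclass with one field changed.
    return replace(o, json_schema=new_schema)


def copy_with_constructor(o: Definition, new_schema: Dict[str, Any]) -> Definition:
    # Direct constructor call with the instance dict plus the changed field.
    return type(o)(**{**o.__dict__, "json_schema": new_schema})


def compare(number: int = 100_000) -> Tuple[float, float]:
    """Return total seconds for each approach over `number` calls."""
    o = Definition(name="x", json_schema={"type": "object"})
    new_schema = {"type": "object", "x-strict": True}
    t_replace = timeit.timeit(lambda: copy_with_replace(o, new_schema), number=number)
    t_ctor = timeit.timeit(lambda: copy_with_constructor(o, new_schema), number=number)
    return t_replace, t_ctor


if __name__ == "__main__":
    t_replace, t_ctor = compare()
    print(f"replace: {t_replace:.4f}s  constructor: {t_ctor:.4f}s  "
          f"ratio: {t_replace / t_ctor:.2f}x")
```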

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 25 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
from dataclasses import dataclass, replace
from typing import Any, Dict

# imports
import pytest  # used for our unit tests
from pydantic_ai.models.__init__ import _customize_output_object


# Minimal stub for OutputObjectDefinition
@dataclass(frozen=True)
class OutputObjectDefinition:
    name: str
    json_schema: Dict[str, Any]
    description: str = ""

# Minimal stub for JsonSchemaTransformer
class JsonSchemaTransformer:
    def __init__(self, schema: Dict[str, Any], strict: bool = False):
        self.schema = schema
        self.strict = strict

    def walk(self):
        # For demonstration, let's "transform" by adding a key if strict is True
        if self.strict:
            # Simulate a transformation: add an 'x-strict' key
            return {**self.schema, "x-strict": True}
        else:
            return self.schema

# unit tests

# -------------------------
# 1. Basic Test Cases
# -------------------------

def test_basic_transformation_adds_x_strict():
    """Test that the transformer adds 'x-strict': True to the schema."""
    o = OutputObjectDefinition(name="test", json_schema={"type": "object"})
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.83μs -> 1.42μs (29.4% faster)

def test_basic_schema_is_unchanged_except_x_strict():
    """Test that the original schema content is preserved except for the transformation."""
    schema = {"type": "array", "items": {"type": "string"}}
    o = OutputObjectDefinition(name="arr", json_schema=schema, description="desc")
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.62μs -> 1.29μs (25.8% faster)
    # All original keys remain
    for k, v in schema.items():
        assert result.json_schema[k] == v

# -------------------------
# 2. Edge Test Cases
# -------------------------

def test_empty_schema():
    """Test with an empty schema dict."""
    o = OutputObjectDefinition(name="empty", json_schema={})
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.62μs -> 1.21μs (34.4% faster)

def test_schema_with_x_strict_key_already():
    """Test with a schema that already has 'x-strict' key."""
    o = OutputObjectDefinition(name="pre", json_schema={"x-strict": False, "foo": 123})
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.71μs -> 1.25μs (36.6% faster)

def test_schema_with_nested_dicts():
    """Test with nested dicts in the schema."""
    nested = {"type": "object", "properties": {"a": {"type": "integer"}}}
    o = OutputObjectDefinition(name="nested", json_schema=nested)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.62μs -> 1.25μs (30.0% faster)

def test_schema_with_non_string_keys():
    """Test with schema dict having non-string keys (should still work, though not JSON-valid)."""
    schema = {1: "foo", (2, 3): "bar"}
    o = OutputObjectDefinition(name="nonstring", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.67μs -> 1.29μs (29.0% faster)

def test_original_object_is_unchanged():
    """Test that the original OutputObjectDefinition is not mutated."""
    schema = {"type": "object"}
    o = OutputObjectDefinition(name="orig", json_schema=schema)
    orig_id = id(o.json_schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.62μs -> 1.21μs (34.4% faster)

def test_transformer_with_non_strict_behavior():
    """Test with a transformer that ignores the strict flag."""
    class NoOpTransformer(JsonSchemaTransformer):
        def walk(self):
            # Ignores strict, just returns the schema as is
            return self.schema

    o = OutputObjectDefinition(name="noop", json_schema={"foo": "bar"})
    codeflash_output = _customize_output_object(NoOpTransformer, o); result = codeflash_output # 1.88μs -> 917ns (104% faster)

# -------------------------
# 3. Large Scale Test Cases
# -------------------------

def test_large_flat_schema():
    """Test with a large flat schema dict."""
    schema = {f"key{i}": i for i in range(1000)}
    o = OutputObjectDefinition(name="large", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 3.25μs -> 2.75μs (18.2% faster)
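    # Every original key/value pair should survive the transformation
    for i in range(1000):
        assert result.json_schema[f"key{i}"] == i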

def test_large_nested_schema():
    """Test with a large nested schema dict."""
    nested = {"type": "object", "properties": {f"field{i}": {"type": "string"} for i in range(500)}}
    o = OutputObjectDefinition(name="nested", json_schema=nested)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.71μs -> 1.33μs (28.2% faster)
    for i in range(0, 500, 100):  # spot check a few
        assert result.json_schema["properties"][f"field{i}"] == {"type": "string"}


def test_transformer_raises_exception():
    """Test that exceptions in the transformer propagate."""
    class BadTransformer(JsonSchemaTransformer):
        def __init__(self, *a, **kw):
            raise ValueError("bad transformer")

    o = OutputObjectDefinition(name="bad", json_schema={})
    with pytest.raises(ValueError, match="bad transformer"):
        _customize_output_object(BadTransformer, o) # 750ns -> 750ns (0.000% faster)

def test_transformer_returns_non_dict():
    """Test that if the transformer returns a non-dict, the result is as returned."""
    class ListTransformer(JsonSchemaTransformer):
        def walk(self):
            return [1, 2, 3]

    o = OutputObjectDefinition(name="list", json_schema={"foo": "bar"})
    codeflash_output = _customize_output_object(ListTransformer, o); result = codeflash_output # 2.04μs -> 1.67μs (22.4% faster)



from dataclasses import dataclass, replace
from typing import Any, Dict

# imports
import pytest  # used for our unit tests
from pydantic_ai.models.__init__ import _customize_output_object

# Mocks for OutputObjectDefinition and JsonSchemaTransformer

@dataclass(frozen=True)
class OutputObjectDefinition:
    name: str
    json_schema: dict
    description: str = ""

class JsonSchemaTransformer:
    """
    Mock transformer for testing. Accepts a schema and strict flag.
    The walk() method returns a transformed schema.
    """
    def __init__(self, schema: dict, strict: bool = False):
        self.schema = schema
        self.strict = strict

    def walk(self) -> dict:
        # For test purposes, let's simulate a transformation:
        # - If strict is True, add {"x-strict": True} to the schema root.
        # - If the schema is empty, return {"x-empty": True}
        # - Otherwise, add a field "transformed": True
        if not self.schema:
            return {"x-empty": True}
        result = dict(self.schema)
        if self.strict:
            result["x-strict"] = True
        result["transformed"] = True
        return result

# unit tests

# 1. Basic Test Cases

def test_basic_transformation_adds_transformed_and_strict():
    # Basic schema with a property
    schema = {"type": "object", "properties": {"foo": {"type": "string"}}}
    o = OutputObjectDefinition(name="TestObj", json_schema=schema, description="desc")
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 2.12μs -> 1.58μs (34.2% faster)

def test_basic_schema_is_immutable():
    # Ensure the original object is not mutated
    schema = {"type": "string"}
    o = OutputObjectDefinition(name="Immutable", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); _ = codeflash_output # 1.88μs -> 1.38μs (36.4% faster)

def test_basic_empty_description():
    # Description is optional and can be empty
    schema = {"type": "number"}
    o = OutputObjectDefinition(name="NoDesc", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.71μs -> 1.29μs (32.3% faster)

# 2. Edge Test Cases

def test_empty_schema():
    # Edge: Empty schema dict
    o = OutputObjectDefinition(name="Empty", json_schema={})
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.67μs -> 1.21μs (37.9% faster)

def test_schema_with_existing_x_strict_and_transformed():
    # Edge: Schema already has these keys
    schema = {"type": "object", "x-strict": False, "transformed": False}
    o = OutputObjectDefinition(name="Override", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.79μs -> 1.38μs (30.3% faster)

def test_schema_with_nested_properties():
    # Edge: Nested schema should still add top-level fields
    schema = {
        "type": "object",
        "properties": {
            "bar": {"type": "object", "properties": {"baz": {"type": "integer"}}}
        }
    }
    o = OutputObjectDefinition(name="Nested", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.75μs -> 1.33μs (31.3% faster)

def test_schema_with_non_dict_json_schema():
    # Edge: If json_schema is not a dict, should raise TypeError
    o = OutputObjectDefinition(name="BadSchema", json_schema="notadict")
    class DummyTransformer(JsonSchemaTransformer):
        def __init__(self, schema, strict=True):
            if not isinstance(schema, dict):
                raise TypeError("Schema must be a dict")
            super().__init__(schema, strict)
    with pytest.raises(TypeError):
        _customize_output_object(DummyTransformer, o) # 750ns -> 792ns (5.30% slower)

def test_schema_with_none_json_schema():
    # Edge: If json_schema is None, should raise TypeError
    o = OutputObjectDefinition(name="NoneSchema", json_schema=None)
    class DummyTransformer(JsonSchemaTransformer):
        def __init__(self, schema, strict=True):
            if schema is None:
                raise TypeError("Schema must not be None")
            super().__init__(schema, strict)
    with pytest.raises(TypeError):
        _customize_output_object(DummyTransformer, o) # 583ns -> 625ns (6.72% slower)

def test_schema_with_additional_unexpected_fields():
    # Edge: Schema with unexpected fields should be preserved
    schema = {"foo": 123, "bar": [1, 2, 3]}
    o = OutputObjectDefinition(name="ExtraFields", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.75μs -> 1.33μs (31.3% faster)

def test_output_object_is_frozen():
    # Edge: OutputObjectDefinition is frozen (immutable)
    schema = {"type": "string"}
    o = OutputObjectDefinition(name="Frozen", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.67μs -> 1.25μs (33.3% faster)
    with pytest.raises(Exception):
        result.name = "Mutate"

# 3. Large Scale Test Cases

def test_large_schema_transformation():
    # Large: 1000 properties
    schema = {
        "type": "object",
        "properties": {f"field_{i}": {"type": "string"} for i in range(1000)}
    }
    o = OutputObjectDefinition(name="Large", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.79μs -> 1.46μs (22.9% faster)

def test_large_nested_schema():
    # Large: Nested objects, 10 levels deep
    schema = {"type": "object", "properties": {}}
    current = schema["properties"]
    for i in range(10):
        current[f"level_{i}"] = {"type": "object", "properties": {}}
        current = current[f"level_{i}"]["properties"]
    o = OutputObjectDefinition(name="DeepNested", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.62μs -> 1.33μs (21.9% faster)
    # Deep nesting is preserved
    props = result.json_schema["properties"]
    for i in range(10):
        props = props[f"level_{i}"]["properties"]

def test_large_schema_performance():
    # Large: 999 fields, test should run quickly
    schema = {
        "type": "object",
        "properties": {f"f{i}": {"type": "integer"} for i in range(999)}
    }
    o = OutputObjectDefinition(name="Perf", json_schema=schema)
    codeflash_output = _customize_output_object(JsonSchemaTransformer, o); result = codeflash_output # 1.67μs -> 1.33μs (25.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from pydantic_ai._output import OutputObjectDefinition
from pydantic_ai.models.__init__ import _customize_output_object
from pydantic_ai.profiles._json_schema import InlineDefsJsonSchemaTransformer

def test__customize_output_object():
    _customize_output_object(InlineDefsJsonSchemaTransformer, OutputObjectDefinition({}, name=None, description='', strict=None))

To edit these changes, git checkout codeflash/optimize-_customize_output_object-mdetm4ay and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 22, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 July 22, 2025 17:40