⚡️ Speed up method GoogleJsonSchemaTransformer.transform by 70,084% #41

Open: wants to merge 1 commit into base: debug2


codeflash-ai bot commented Jul 28, 2025

📄 70,084% (700.84x) speedup for GoogleJsonSchemaTransformer.transform in pydantic_ai_slim/pydantic_ai/profiles/google.py

⏱️ Runtime: 43.2 milliseconds → 61.5 microseconds (best of 83 runs)
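
As a quick sanity check on the headline figure, the sketch below recomputes the speedup from the two rounded runtimes above. It assumes the percentage is defined as (old - new) / new, which is an assumption about how the report computes it and is not stated on this page. With rounded inputs the result lands near, but not exactly on, the reported 70,084%.

```python
def speedup_percent(old_seconds: float, new_seconds: float) -> float:
    """Relative speedup in percent, assuming (old - new) / new * 100."""
    return (old_seconds - new_seconds) / new_seconds * 100


old_runtime = 43.2e-3  # 43.2 milliseconds (rounded value from the report)
new_runtime = 61.5e-6  # 61.5 microseconds (rounded value from the report)

percent = speedup_percent(old_runtime, new_runtime)
print(f"{percent:,.0f}% faster ({percent / 100:.2f}x)")
```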

📝 Explanation and details

This optimized version of GoogleJsonSchemaTransformer.transform makes the following changes:

  • Removed unnecessary dict copies and creation (for example, original_schema is constructed only if needed, in place, for the warning).
  • Fewer calls to dict.pop: removals are batched where possible, and pops that do not depend on a branch condition are factored into a single loop.
  • Faster enum conversion: the conversion is inlined, avoiding an extra assignment and get call.
  • Fewer intermediate containers and checks: prefixItems consolidation handles the most common single-item case directly instead of building lists and running membership tests.
  • Refined warning logic: the full original_schema is built only if and when the warning is emitted.
  • Removed the forced time.sleep from transform: it serves no purpose in production code and appears to exist only for artificial debugging, which should not be present in optimized code.
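
Two of the changes listed above, the batched pops and the lazily built original_schema, can be sketched as follows. This is a minimal illustration and not the actual method body: `strip_unsupported`, `UNCONDITIONAL_KEYS`, and the warning text are hypothetical, and the sketch warns whenever the key is present, which may differ from the real condition.

```python
import warnings
from typing import Any

JsonSchema = dict[str, Any]

# Hypothetical list of keys removed unconditionally (illustrative only).
UNCONDITIONAL_KEYS = (
    'title', 'default', '$schema', 'discriminator',
    'examples', 'exclusiveMaximum', 'exclusiveMinimum',
)
_MISSING = object()


def strip_unsupported(schema: JsonSchema) -> JsonSchema:
    """Hypothetical helper showing batched pops and a lazily built warning payload."""
    removed = schema.pop('additionalProperties', _MISSING)
    if removed is not _MISSING:
        # The copy used in the message is built only on the warning path.
        original_schema = {**schema, 'additionalProperties': removed}
        warnings.warn(
            f'`additionalProperties` was removed from the schema: {original_schema}',
            UserWarning,
        )
    # One loop replaces a sequence of separate pop calls.
    for key in UNCONDITIONAL_KEYS:
        schema.pop(key, None)
    return schema
```

Schemas without `additionalProperties` never pay for the extra dict, which is the point of deferring its construction.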

All function signatures and return values remain exactly the same, and all docstrings and non-internal comments are preserved.

Notes:

  • The method now short-circuits on branch conditions, reducing the number of passes over the schema dict.
  • No unnecessary objects are created and no unnecessary calls are made.
  • The artificial time.sleep(0.0010) has been removed for production performance; it can be restored if you specifically want to keep the simulated delay.
  • All comments relevant to the logic are preserved.
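
To gauge how much of the gain a 1 ms sleep alone could account for, the rough timing sketch below compares a tiny dict operation with and without the delay. The functions `with_sleep`, `without_sleep`, and `per_call_seconds` are hypothetical stand-ins, not the real transform, and the absolute numbers will vary by machine.

```python
import time
import timeit


def with_sleep(schema: dict) -> dict:
    time.sleep(0.0010)  # the artificial delay described in the notes
    schema.pop('title', None)
    return schema


def without_sleep(schema: dict) -> dict:
    schema.pop('title', None)
    return schema


def per_call_seconds(func, runs: int = 50) -> float:
    """Best-of-three average time per call, in seconds."""
    timer = timeit.Timer(lambda: func({'type': 'string', 'title': 'MyTitle'}))
    return min(timer.repeat(repeat=3, number=runs)) / runs


slow = per_call_seconds(with_sleep)
fast = per_call_seconds(without_sleep)
print(f"with sleep: {slow * 1e3:.2f} ms per call, without: {fast * 1e6:.2f} µs per call")
```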

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 50 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 4 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import time
import warnings
from abc import ABC
from dataclasses import dataclass
from typing import Any

# imports
import pytest  # used for our unit tests
from pydantic_ai.exceptions import UserError
from pydantic_ai.profiles._json_schema import JsonSchema, JsonSchemaTransformer
from pydantic_ai.profiles.google import GoogleJsonSchemaTransformer

JsonSchema = dict[str, Any]

@dataclass(init=False)
class JsonSchemaTransformer(ABC):
    """Walks a JSON schema, applying transformations to it at each level."""
    def __init__(
        self,
        schema: JsonSchema,
        *,
        strict: bool | None = None,
        prefer_inlined_defs: bool = False,
        simplify_nullable_unions: bool = False,
    ):
        self.schema = schema
        self.strict = strict
        self.is_strict_compatible = True
        self.prefer_inlined_defs = prefer_inlined_defs
        self.simplify_nullable_unions = simplify_nullable_unions
        self.defs: dict[str, JsonSchema] = self.schema.get('$defs', {})
        self.refs_stack: list[str] = []
        self.recursive_refs = set[str]()


# unit tests

@pytest.fixture
def transformer():
    # Provide a default transformer with an empty schema for tests
    return GoogleJsonSchemaTransformer({})

# 1. BASIC TEST CASES

def test_remove_title(transformer):
    # Should remove 'title' from schema
    schema = {'type': 'string', 'title': 'MyTitle'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.07ms -> 1.00μs (106421% faster)

def test_remove_default(transformer):
    # Should remove 'default' from schema
    schema = {'type': 'integer', 'default': 42}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.08ms -> 875ns (122838% faster)

def test_remove_schema(transformer):
    # Should remove '$schema' from schema
    schema = {'type': 'string', '$schema': 'http://json-schema.org/draft-07/schema#'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.18ms -> 833ns (140961% faster)


def test_remove_examples(transformer):
    # Should remove 'examples' from schema
    schema = {'type': 'string', 'examples': ['foo', 'bar']}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 834ns (150260% faster)

def test_remove_exclusive_max_min(transformer):
    # Should remove 'exclusiveMaximum' and 'exclusiveMinimum'
    schema = {'type': 'number', 'exclusiveMaximum': 10, 'exclusiveMinimum': 1}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 958ns (130833% faster)



def test_enum_already_string(transformer):
    # Should keep string enums as strings
    schema = {'type': 'string', 'enum': ['a', 'b']}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 1.33μs (94008% faster)

def test_format_moves_to_description(transformer):
    # Should move 'format' to description if description exists
    schema = {'type': 'string', 'format': 'date-time', 'description': 'A date'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.02ms -> 1.12μs (90630% faster)

def test_format_moves_to_description_no_desc(transformer):
    # Should add description if none exists
    schema = {'type': 'string', 'format': 'uuid'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 1.00μs (125325% faster)

# 2. EDGE TEST CASES

def test_additional_properties_warns_and_removes(transformer):
    # Should warn and remove 'additionalProperties'
    schema = {'type': 'object', 'additionalProperties': False}
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.19ms -> 916ns (130113% faster)


def test_oneof_not_converted_if_type_present(transformer):
    # Should not convert 'oneOf' to 'anyOf' if 'type' present
    schema = {'type': 'object', 'oneOf': [{'type': 'string'}, {'type': 'integer'}]}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.15ms -> 750ns (153178% faster)

def test_ref_raises_user_error(transformer):
    # Should raise UserError if '$ref' present
    schema = {'type': 'object', '$ref': '#/definitions/foo'}
    with pytest.raises(UserError):
        transformer.transform(schema.copy()) # 1.03ms -> 1.96μs (52520% faster)


def test_prefix_items_with_items(transformer):
    # Should merge 'items' and 'prefixItems' correctly
    schema = {
        'type': 'array',
        'items': {'type': 'string'},
        'prefixItems': [{'type': 'string'}, {'type': 'integer'}]
    }
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 1.58μs (79185% faster)

def test_prefix_items_single(transformer):
    # Should set items to the single prefixItem if only one unique
    schema = {'type': 'array', 'prefixItems': [{'type': 'string'}]}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.21ms -> 1.04μs (116255% faster)


def test_no_op_schema(transformer):
    # Should not modify schema with no special fields
    schema = {'type': 'boolean'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 667ns (187906% faster)

def test_enum_and_const(transformer):
    # Should prioritize const over enum if both present
    schema = {'type': 'integer', 'const': 1, 'enum': [1, 2, 3]}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 1.33μs (94064% faster)

# 3. LARGE SCALE TEST CASES



def test_large_schema_with_all_features(transformer):
    # Should process a schema with many fields and features
    schema = {
        'type': 'object',
        'title': 'Big',
        'default': {},
        'discriminator': 'type',
        'examples': [{'foo': 1}],
        'exclusiveMaximum': 10,
        'exclusiveMinimum': 0,
        'properties': {
            'a': {'type': 'integer', 'enum': list(range(500))},
            'b': {'type': 'string', 'format': 'email', 'description': 'Email'},
            'c': {'type': 'array', 'prefixItems': [{'type': 'string'}, {'type': 'integer'}]},
        },
        'additionalProperties': False,
    }
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 1.25μs (100217% faster)
        # All removable fields should be gone
        for field in ['title', 'default', 'discriminator', 'examples', 'exclusiveMaximum', 'exclusiveMinimum', 'additionalProperties']:
            pass

def test_large_noop_schema(transformer):
    # Should not modify large schema with only simple fields
    schema = {'type': 'object', 'properties': {str(i): {'type': 'integer'} for i in range(1000)}}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.17ms -> 833ns (140721% faster)
    for v in result['properties'].values():
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
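
The output comparison described in the comment above can be sketched as follows. Both `transform_original` and `transform_optimized` are hypothetical stand-ins for the two versions of the method, and how Codeflash actually performs the comparison is not shown on this page.

```python
import copy
from typing import Any, Callable

JsonSchema = dict[str, Any]
Transform = Callable[[JsonSchema], JsonSchema]


def transform_original(schema: JsonSchema) -> JsonSchema:
    # Stand-in for the original code path: separate pop calls.
    schema.pop('title', None)
    schema.pop('default', None)
    return schema


def transform_optimized(schema: JsonSchema) -> JsonSchema:
    # Stand-in for the optimized code path: one loop over the keys.
    for key in ('title', 'default'):
        schema.pop(key, None)
    return schema


def outputs_match(original: Transform, optimized: Transform, schema: JsonSchema) -> bool:
    """Run both versions on independent deep copies and compare the results."""
    return original(copy.deepcopy(schema)) == optimized(copy.deepcopy(schema))


print(outputs_match(transform_original, transform_optimized,
                    {'type': 'string', 'title': 'MyTitle', 'default': 'x'}))
```

The generated tests pass `schema.copy()`, which is a shallow copy; the sketch uses `copy.deepcopy` so that nested dicts also start fresh for each version.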

import time
import warnings
from abc import ABC
from dataclasses import dataclass
from typing import Any

# imports
import pytest  # used for our unit tests
from pydantic_ai.profiles.google import GoogleJsonSchemaTransformer


# Exception used in transform
class UserError(Exception):
    pass

# Type alias for JSON Schema
JsonSchema = dict[str, Any]

# Minimal base transformer class
@dataclass(init=False)
class JsonSchemaTransformer(ABC):
    def __init__(
        self,
        schema: JsonSchema,
        *,
        strict: bool | None = None,
        prefer_inlined_defs: bool = False,
        simplify_nullable_unions: bool = False,
    ):
        self.schema = schema
        self.strict = strict
        self.is_strict_compatible = True
        self.prefer_inlined_defs = prefer_inlined_defs
        self.simplify_nullable_unions = simplify_nullable_unions
        self.defs: dict[str, JsonSchema] = self.schema.get('$defs', {})
        self.refs_stack: list[str] = []
        self.recursive_refs = set[str]()


# Helper to instantiate transformer
def transform(schema: JsonSchema) -> JsonSchema:
    return GoogleJsonSchemaTransformer(schema.copy()).transform(schema.copy())

# unit tests

# ---------------- BASIC TEST CASES ----------------

from pydantic_ai.profiles.google import GoogleJsonSchemaTransformer

def test_GoogleJsonSchemaTransformer_transform():
    GoogleJsonSchemaTransformer.transform(GoogleJsonSchemaTransformer({}, strict=None), {'const': ''})

def test_GoogleJsonSchemaTransformer_transform_2():
    GoogleJsonSchemaTransformer.transform(GoogleJsonSchemaTransformer({}, strict=None), {'additionalProperties': '\x00'})

def test_GoogleJsonSchemaTransformer_transform_3():
    GoogleJsonSchemaTransformer.transform(GoogleJsonSchemaTransformer({}, strict=None), {'default': '', '': 0, 'oneOf': 0})

To edit these changes, run git checkout codeflash/optimize-GoogleJsonSchemaTransformer.transform-mdmicwpe, make your edits, and push.

codeflash-ai bot added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) on Jul 28, 2025
codeflash-ai bot requested a review from KRRT7 on July 28, 2025, 02:48