⚡️ Speed up method GoogleJsonSchemaTransformer.transform by 11,721% #40

Open: codeflash-ai[bot] wants to merge 1 commit into debug2

codeflash-ai[bot] commented on Jul 28, 2025

📄 11,721% (117.21x) speedup for GoogleJsonSchemaTransformer.transform in pydantic_ai_slim/pydantic_ai/profiles/google.py

⏱️ Runtime: 61.7 milliseconds → 522 microseconds (best of 150 runs)

📝 Explanation and details

Here is a faster version of your program with runtime optimizations and improved memory usage, especially around dict operations.
Key improvements:

  • Use dict.pop() only once where possible (avoid double lookups).
  • Avoid unnecessary dict copying (removal of original_schema dict copy).
  • Store get lookups in locals once if used several times.
  • Remove redundant and slow operations (e.g., popping keys that are likely not present).
  • Micro-optimize enum string conversion loop to avoid repeated method resolution.
  • Remove time.sleep(0.0010), a 1 ms pause on every call that artificially slows down the code unless it is there on purpose for throttling.
  • Minimize setdefault calls and branch deeper only if actually needed.
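
To make the single-pop and enum-conversion items concrete, here is a minimal sketch of both patterns on a plain dict. The helper names and the UNSUPPORTED_KEYS tuple are hypothetical, chosen for illustration; this is not the diff from this PR, only the general CPython idioms the list refers to.

```python
# Hypothetical helpers illustrating the "single pop" and "enum to string"
# patterns listed above; not the actual code from this PR.

UNSUPPORTED_KEYS = ("title", "default", "$schema", "discriminator", "examples")


def strip_keys_two_lookups(schema: dict) -> dict:
    """Membership test followed by del: two hash lookups per present key."""
    for key in UNSUPPORTED_KEYS:
        if key in schema:
            del schema[key]
    return schema


def strip_keys_single_pop(schema: dict) -> dict:
    """dict.pop with a default: one lookup, and no KeyError if the key is absent."""
    for key in UNSUPPORTED_KEYS:
        schema.pop(key, None)
    return schema


def stringify_enum(schema: dict) -> dict:
    """Convert enum values to strings, binding str to a local name once."""
    enum = schema.get("enum")
    if enum:
        to_str = str  # local binding avoids a builtin-name lookup per element
        schema["type"] = "string"
        schema["enum"] = [to_str(value) for value in enum]
    return schema


example = {"type": "integer", "title": "T", "default": 1, "enum": [1, 2, 3]}
a = stringify_enum(strip_keys_two_lookups(dict(example)))
b = stringify_enum(strip_keys_single_pop(dict(example)))
assert a == b == {"type": "string", "enum": ["1", "2", "3"]}
```

Both variants produce the same dict; the difference is only in how many lookups each present key costs.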

All functionality and output are preserved. Comments on changed code have been minimally updated to match the new code.
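
Because the changes listed above work by popping keys from the dict the transform receives, an equivalence check has to give each version its own copy of the input. A minimal sketch of such a differential check, assuming two callables with the same signature; outputs_match and the toy versions are hypothetical names, not part of Codeflash or pydantic_ai.

```python
import copy
from typing import Any, Callable, Iterable

Schema = dict[str, Any]


def outputs_match(
    original: Callable[[Schema], Schema],
    optimized: Callable[[Schema], Schema],
    inputs: Iterable[Schema],
) -> bool:
    """Run both versions on independent deep copies and compare the results."""
    for schema in inputs:
        # deepcopy rather than dict.copy(): a shallow copy shares nested dicts,
        # so an in-place edit by one version could leak into the other's input.
        expected = original(copy.deepcopy(schema))
        actual = optimized(copy.deepcopy(schema))
        if expected != actual:
            return False
    return True


def original_version(schema: Schema) -> Schema:
    if "title" in schema:
        del schema["title"]
    return schema


def optimized_version(schema: Schema) -> Schema:
    schema.pop("title", None)
    return schema


def broken_version(schema: Schema) -> Schema:
    return schema  # forgets to drop "title"


cases = [
    {"type": "string", "title": "T"},
    {"type": "object", "properties": {"a": {"title": "A"}}},
]
print(outputs_match(original_version, optimized_version, cases))  # True
print(outputs_match(original_version, broken_version, cases))  # False
```

The generated tests below pass schema.copy(), a shallow copy, which is enough when only top-level keys are edited; deepcopy is the safer default when nested dicts may change.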

Summary:

  • All hotspots (dict pops, unnecessary lookups, repeated deletes, avoidable object creation) have been streamlined, branching has been simplified, and redundant code has been removed to improve runtime and memory use.
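  • Removing time.sleep accounts for nearly all of the speedup: the original per-call times in the regression tests (about 1.0 to 1.4 ms) are dominated by the 1 ms sleep, and in test_performance_large_schema, where time.sleep is monkeypatched out, the gap shrinks to 1.08 μs vs 792 ns (36.9%). If the sleep is needed for backoff or throttling, re-insert it.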
  • Output and semantics are 100% preserved.
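
The sleep's share of the runtime is easy to reproduce in isolation. A minimal sketch using timeit, with a hypothetical stand-in function rather than the real GoogleJsonSchemaTransformer.transform; absolute numbers depend on the machine and the OS timer resolution.

```python
import time
import timeit


def transform_stub(schema: dict) -> dict:
    """Stand-in for a cheap per-level transform (hypothetical, not the real one)."""
    schema.pop("title", None)
    return schema


def transform_stub_with_sleep(schema: dict) -> dict:
    time.sleep(0.0010)  # the same 1 ms pause this PR removes
    return transform_stub(schema)


def per_call_seconds(func, number: int = 50) -> float:
    """Average wall-clock seconds per call, each call getting a fresh dict."""
    total = timeit.timeit(lambda: func({"type": "string", "title": "T"}), number=number)
    return total / number


slow = per_call_seconds(transform_stub_with_sleep)
fast = per_call_seconds(transform_stub)
print(f"with sleep: {slow * 1e3:.3f} ms/call, without: {fast * 1e6:.3f} us/call")
print(f"ratio: {slow / fast:,.0f}x")
```

Since time.sleep suspends for at least the requested duration, the "with sleep" figure cannot drop below roughly 1 ms per call, however cheap the dict work is.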

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 101 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 4 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import warnings

import pytest  # used for our unit tests
from pydantic_ai.exceptions import UserError
from pydantic_ai.profiles.google import GoogleJsonSchemaTransformer


# unit tests

@pytest.fixture
def transformer():
    # Provide a default transformer for tests; schema is not used in .transform
    return GoogleJsonSchemaTransformer(schema={})

# ---------------------------
# Basic Test Cases
# ---------------------------

def test_remove_title(transformer):
    # Should remove 'title' from schema
    schema = {'type': 'string', 'title': 'Test Title'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.06ms -> 834ns (126714% faster)

def test_remove_default(transformer):
    # Should remove 'default' from schema
    schema = {'type': 'integer', 'default': 42}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 833ns (150560% faster)

def test_remove_schema_keyword(transformer):
    # Should remove '$schema' from schema
    schema = {'type': 'number', '$schema': 'http://json-schema.org/draft-07/schema#'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 792ns (158412% faster)

def test_transform_const_to_enum(transformer):
    # Should convert 'const' to single-value 'enum' (as string)
    schema = {'type': 'integer', 'const': 5}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 1.50μs (83592% faster)

def test_remove_discriminator(transformer):
    # Should remove 'discriminator' from schema
    schema = {'type': 'object', 'discriminator': 'kind'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 833ns (150500% faster)

def test_remove_examples(transformer):
    # Should remove 'examples' from schema
    schema = {'type': 'string', 'examples': ['a', 'b']}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 792ns (158339% faster)

def test_remove_exclusive_maximum_and_minimum(transformer):
    # Should remove 'exclusiveMaximum' and 'exclusiveMinimum'
    schema = {'type': 'number', 'exclusiveMaximum': 10, 'exclusiveMinimum': 1}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 791ns (158518% faster)

def test_enum_values_are_strings(transformer):
    # Should convert enum values to strings and set type to string
    schema = {'type': 'integer', 'enum': [1, 2, 3]}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.19ms -> 1.38μs (86718% faster)

def test_string_format_moves_to_description(transformer):
    # Should move 'format' to 'description' if type is string
    schema = {'type': 'string', 'format': 'date'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 917ns (136996% faster)

def test_string_format_appends_to_description(transformer):
    # Should append format to existing description
    schema = {'type': 'string', 'format': 'uuid', 'description': 'Identifier'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 1.12μs (111522% faster)

# ---------------------------
# Edge Test Cases
# ---------------------------

def test_additional_properties_warning(transformer):
    # Should warn and remove 'additionalProperties'
    schema = {'type': 'object', 'additionalProperties': False}
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 4.00μs (31267% faster)

def test_additional_properties_true(transformer):
    # Should warn and remove 'additionalProperties' even if True
    schema = {'type': 'object', 'additionalProperties': True}
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.09ms -> 3.00μs (36304% faster)

def test_ref_raises_user_error(transformer):
    # Should raise UserError if $ref is present
    schema = {'type': 'object', '$ref': '#/$defs/SomeDef'}
    with pytest.raises(UserError) as excinfo:
        transformer.transform(schema.copy()) # 1.03ms -> 1.46μs (70559% faster)

def test_oneof_to_anyof(transformer):
    # Should convert 'oneOf' to 'anyOf' if 'type' not present
    schema = {'oneOf': [{'type': 'string'}, {'type': 'number'}]}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.01ms -> 1.04μs (97298% faster)

def test_prefix_items_to_items(transformer):
    # Should convert 'prefixItems' to 'items', set minItems/maxItems
    schema = {'type': 'array', 'prefixItems': [{'type': 'string'}, {'type': 'number'}]}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 1.50μs (83656% faster)

def test_prefix_items_with_existing_items(transformer):
    # Should merge prefixItems with existing items
    schema = {
        'type': 'array',
        'prefixItems': [{'type': 'string'}, {'type': 'number'}],
        'items': {'type': 'string'}
    }
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.12ms -> 1.54μs (72414% faster)

def test_enum_and_const(transformer):
    # Should prioritize const over enum and convert to string
    schema = {'type': 'integer', 'enum': [1, 2], 'const': 2}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.04ms -> 1.42μs (73015% faster)

def test_no_changes(transformer):
    # Should leave schema unchanged if no special keys
    schema = {'type': 'boolean'}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 708ns (177101% faster)

def test_empty_schema(transformer):
    # Should handle empty schema dict
    schema = {}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 833ns (150570% faster)

def test_enum_with_strings(transformer):
    # Should keep string enums as strings
    schema = {'type': 'string', 'enum': ['a', 'b', 'c']}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 1.29μs (97004% faster)

def test_enum_with_mixed_types(transformer):
    # Should convert all enum values to strings
    schema = {'type': 'string', 'enum': [1, 'b', True, None]}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 1.29μs (97017% faster)

# ---------------------------
# Large Scale Test Cases
# ---------------------------

def test_large_enum(transformer):
    # Should handle large enums efficiently
    schema = {'type': 'integer', 'enum': list(range(1000))}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.29ms -> 39.8μs (3154% faster)

def test_large_prefix_items(transformer):
    # Should handle large prefixItems list
    prefix_items = [{'type': 'integer', 'const': i} for i in range(100)]
    schema = {'type': 'array', 'prefixItems': prefix_items}
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.36ms -> 89.1μs (1428% faster)
    # transform() processes a single schema level, so the nested prefix items keep their
    # 'const' keys here; const -> enum conversion happens when the walker visits each item.

def test_large_object_with_additional_properties(transformer):
    # Should handle large object with additionalProperties
    schema = {'type': 'object', 'properties': {f'field{i}': {'type': 'string'} for i in range(500)}, 'additionalProperties': False}
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.14ms -> 88.4μs (1188% faster)

def test_many_formats(transformer):
    # Should handle many fields with format
    for fmt in ['date', 'uuid', 'email', 'uri']:
        schema = {'type': 'string', 'format': fmt}
        codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 4.66ms -> 2.21μs (211119% faster)

def test_many_types(transformer):
    # Should handle many different types in one schema
    schema = {
        'oneOf': [
            {'type': 'string', 'format': 'email'},
            {'type': 'integer'},
            {'type': 'boolean'}
        ]
    }
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.13ms -> 875ns (129076% faster)

def test_large_nested_schema(transformer):
    # Should handle deeply nested schemas (but not recursive)
    schema = {
        'type': 'object',
        'properties': {
            'a': {'type': 'object', 'properties': {
                'b': {'type': 'object', 'properties': {
                    'c': {'type': 'string', 'format': 'date'}
                }}
            }}
        }
    }
    # Only the top-level keys are processed, so inner formats remain unless transform is called recursively
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 750ns (167194% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import time
import warnings

import pytest  # used for our unit tests
from pydantic_ai.exceptions import UserError
from pydantic_ai.profiles.google import GoogleJsonSchemaTransformer

# unit tests

# BASIC TEST CASES

def test_remove_title_and_default():
    """Should remove 'title' and 'default' fields from the schema."""
    schema = {'type': 'string', 'title': 'My Title', 'default': 'abc'}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.03ms -> 917ns (111923% faster)

def test_remove_schema_and_examples():
    """Should remove '$schema' and 'examples' fields from the schema."""
    schema = {'type': 'number', '$schema': 'http://json-schema.org/draft-07/schema#', 'examples': [1, 2, 3]}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 958ns (130898% faster)

def test_remove_exclusive_maximum_minimum():
    """Should remove 'exclusiveMaximum' and 'exclusiveMinimum'."""
    schema = {'type': 'integer', 'exclusiveMaximum': 10, 'exclusiveMinimum': 1}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.24ms -> 834ns (148896% faster)

def test_const_to_enum():
    """Should convert 'const' to 'enum' with a single value."""
    schema = {'type': 'string', 'const': 'foo'}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 1.46μs (85968% faster)

def test_enum_to_string_enum():
    """Should convert enum values to strings and set type to 'string'."""
    schema = {'type': 'integer', 'enum': [1, 2, 3]}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 1.38μs (91164% faster)

def test_format_moves_to_description():
    """Should append format to description or create it if not present."""
    # With description
    schema = {'type': 'string', 'format': 'email', 'description': 'User email'}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.06ms -> 1.17μs (90584% faster)
    # Without description
    schema2 = {'type': 'string', 'format': 'date-time'}
    transformer = GoogleJsonSchemaTransformer(schema2.copy())
    codeflash_output = transformer.transform(schema2.copy()); result2 = codeflash_output # 1.03ms -> 667ns (154304% faster)

def test_remove_discriminator():
    """Should remove 'discriminator' from the schema."""
    schema = {'type': 'object', 'discriminator': {'propertyName': 'type'}}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.11ms -> 792ns (140104% faster)

# EDGE TEST CASES

def test_additional_properties_warning():
    """Should remove 'additionalProperties' and issue a warning."""
    schema = {'type': 'object', 'additionalProperties': False}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.18ms -> 3.79μs (31075% faster)

def test_oneof_to_anyof():
    """Should convert 'oneOf' to 'anyOf' if type is not present."""
    schema = {'oneOf': [{'type': 'string'}, {'type': 'number'}]}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 916ns (136786% faster)

def test_ref_raises_usererror():
    """Should raise UserError if '$ref' is present."""
    schema = {'type': 'object', '$ref': '#/$defs/Foo'}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    with pytest.raises(UserError) as exc:
        transformer.transform(schema.copy())

def test_prefix_items_to_items():
    """Should convert 'prefixItems' to 'items' and set minItems/maxItems."""
    # Case: items not present
    schema = {'type': 'array', 'prefixItems': [{'type': 'string'}, {'type': 'number'}]}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.08ms -> 1.75μs (61838% faster)

    # Case: items present and matches one prefix
    schema2 = {'type': 'array', 'prefixItems': [{'type': 'string'}], 'items': {'type': 'string'}}
    transformer2 = GoogleJsonSchemaTransformer(schema2.copy())
    codeflash_output = transformer2.transform(schema2.copy()); result2 = codeflash_output # 1.10ms -> 1.08μs (101426% faster)

def test_prefix_items_with_duplicate_items():
    """Should deduplicate items in prefixItems when converting to items."""
    schema = {'type': 'array', 'prefixItems': [{'type': 'string'}, {'type': 'string'}]}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.16ms -> 1.50μs (77314% faster)

def test_const_and_enum_both_present():
    """Should prioritize const over enum, converting const to enum and removing both original fields."""
    schema = {'type': 'string', 'const': 'foo', 'enum': ['foo', 'bar']}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.02ms -> 1.42μs (72089% faster)

def test_enum_with_non_string_types():
    """Should convert all enum values to strings, even if they're bool or None."""
    schema = {'type': 'boolean', 'enum': [True, False, None]}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.11ms -> 1.29μs (86191% faster)

def test_schema_with_no_transformable_fields():
    """Should leave schema unchanged if no transformable fields are present."""
    schema = {'type': 'number', 'minimum': 0}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.06ms -> 750ns (141089% faster)

def test_items_and_prefix_items_both_none():
    """Should handle prefixItems when items is None."""
    schema = {'type': 'array', 'prefixItems': [{'type': 'string'}]}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.26ms -> 1.33μs (94124% faster)

# LARGE SCALE TEST CASES

def test_large_enum_list():
    """Should handle large enums efficiently and convert all to strings."""
    enum_values = list(range(500))
    schema = {'type': 'integer', 'enum': enum_values}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.25ms -> 21.1μs (5804% faster)

def test_large_prefix_items():
    """Should handle large prefixItems arrays."""
    prefix_items = [{'type': 'integer', 'minimum': i} for i in range(100)]
    schema = {'type': 'array', 'prefixItems': prefix_items}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.35ms -> 88.0μs (1435% faster)

def test_large_schema_with_many_fields():
    """Should process a schema with many fields, removing all forbidden keys."""
    schema = {
        'type': 'object',
        'properties': {f'field_{i}': {'type': 'string', 'title': f'Title {i}', 'default': f'default_{i}'} for i in range(200)},
        'title': 'Big Object',
        'default': {},
        'examples': [],
        'exclusiveMaximum': 1000,
        'exclusiveMinimum': 0,
        'const': None,
        'additionalProperties': True,
    }
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.42ms -> 138μs (927% faster)

def test_performance_large_schema(monkeypatch):
    """Should not be quadratic or slow for large schemas (time.sleep is patched out)."""
    # Replace time.sleep with a no-op so the artificial delay does not dominate the timing
    monkeypatch.setattr(time, "sleep", lambda x: None)
    schema = {'type': 'object', 'properties': {f'k{i}': {'type': 'string'} for i in range(900)}}
    transformer = GoogleJsonSchemaTransformer(schema.copy())
    codeflash_output = transformer.transform(schema.copy()); result = codeflash_output # 1.08μs -> 792ns (36.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from pydantic_ai.profiles.google import GoogleJsonSchemaTransformer

def test_GoogleJsonSchemaTransformer_transform():
    GoogleJsonSchemaTransformer.transform(GoogleJsonSchemaTransformer({}, strict=None), {'const': ''})

def test_GoogleJsonSchemaTransformer_transform_2():
    GoogleJsonSchemaTransformer.transform(GoogleJsonSchemaTransformer({}, strict=None), {'additionalProperties': '\x00'})

def test_GoogleJsonSchemaTransformer_transform_3():
    GoogleJsonSchemaTransformer.transform(GoogleJsonSchemaTransformer({}, strict=None), {'default': '', '': 0, 'oneOf': 0})
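
Several of the regression tests above describe how prefixItems is folded into items: the prefix schemas are merged with any existing items, duplicates are dropped, and minItems/maxItems are set. A minimal sketch of that rule as the test comments describe it; merge_prefix_items is a hypothetical helper, and two details are assumptions rather than confirmed pydantic_ai behavior: several distinct item schemas are combined under anyOf, and maxItems is only set when there was no items schema to begin with.

```python
from typing import Any

Schema = dict[str, Any]


def merge_prefix_items(schema: Schema) -> Schema:
    """Fold 'prefixItems' into a single 'items' schema (hypothetical sketch).

    Assumptions: distinct item schemas are combined with 'anyOf', and
    'maxItems' is only fixed when no open-ended 'items' schema existed.
    """
    if "prefixItems" not in schema:
        return schema
    prefix_items: list[Schema] = schema.pop("prefixItems")
    items = schema.get("items")

    # Dict schemas are unhashable, so deduplicate with equality checks on a list.
    unique: list[Schema] = [items] if items is not None else []
    for item in prefix_items:
        if item not in unique:
            unique.append(item)

    if len(unique) > 1:
        schema["items"] = {"anyOf": unique}
    elif unique:
        schema["items"] = unique[0]

    schema.setdefault("minItems", len(prefix_items))
    if items is None:
        schema.setdefault("maxItems", len(prefix_items))
    return schema


print(merge_prefix_items({"type": "array", "prefixItems": [{"type": "string"}, {"type": "string"}]}))
# {'type': 'array', 'items': {'type': 'string'}, 'minItems': 2, 'maxItems': 2}
```
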

To edit these changes, run git checkout codeflash/optimize-GoogleJsonSchemaTransformer.transform-mdmf76mv and push.

codeflash-ai[bot] added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) on Jul 28, 2025
codeflash-ai[bot] requested a review from KRRT7 on July 28, 2025 at 01:19