
⚡️ Speed up method MistralStreamedResponse._validate_required_json_schema by 7% #17


Open: wants to merge 1 commit into base branch try-refinement


@codeflash-ai codeflash-ai bot commented Jul 22, 2025

📄 7% (0.07x) speedup for MistralStreamedResponse._validate_required_json_schema in pydantic_ai_slim/pydantic_ai/models/mistral.py

⏱️ Runtime: 144 microseconds → 134 microseconds (best of 291 runs)
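The percentages in this report, including the per-test annotations further down, appear to be computed relative to the optimized runtime, i.e. (original - optimized) / optimized. That formula is inferred from the reported numbers, not stated in the PR; the sketch below (with a hypothetical `speedup` helper) reproduces a few of them.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative gain measured against the optimized runtime (inferred formula)."""
    return (original - optimized) / optimized


# Overall runtime from the report: 144 us -> 134 us ("7% (0.07x)")
print(f"{speedup(144, 134):.1%}")  # 7.5%
# test_large_array_of_integers: 6.88 us -> 4.92 us
print(f"{speedup(6.88, 4.92):.1%}")  # 39.8%
# A negative value is reported as "slower": 583 ns -> 625 ns
print(f"{speedup(583, 625):.2%}")  # -6.72%
```

Dividing by the original runtime instead would give 28.5% for the large-array test, which does not match the annotation.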

📝 Explanation and details

Here is an optimized version of your program.
Major speedups:

  • Avoid repeated dictionary lookups (move them out of loops).
  • Avoid .get('items', {}), which evaluates the {} default and allocates a new empty dict on every call.
  • Minimize repeated name lookups, e.g., bind VALID_JSON_TYPE_MAPPING to a local variable at the top of the function.
  • Instead of recursing into every nested dict, recurse only if the nested schema contains 'required'. A schema without 'required' always validates, so the skipped calls could not change the result.
  • Cache json_dict[param] in a local variable to avoid repeated lookups.
  • Remove the unnecessary if/else structure for clearer short-circuiting.

All comments are preserved unless directly changed (see inline).

Summary of changes:

  • Uses local variables for repeated lookups (dict, type_mapping, param_schema, value).
  • Avoids repeated .get() with default {} that can create many temporary dicts.
  • Does not recurse for dicts unless there is actually a nested schema with required keys.
  • Processes params in a single for loop with short-circuiting, no else chaining.

All return values are unchanged and API is preserved.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 30 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
# imports
import pytest  # used for our unit tests
from pydantic_ai.models.mistral import MistralStreamedResponse

# -------------------- UNIT TESTS --------------------

# Basic Test Cases

def test_single_required_string_present():
    # Test with a single required string field, present and correct type
    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    }
    data = {"name": "Alice"}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 583ns -> 625ns (6.72% slower)

def test_single_required_string_missing():
    # Test with a single required string field, missing from data
    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    }
    data = {}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 291ns -> 292ns (0.342% slower)

def test_single_required_integer_correct_type():
    # Test with a required integer field, correct type
    schema = {
        "type": "object",
        "properties": {"age": {"type": "integer"}},
        "required": ["age"],
    }
    data = {"age": 25}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 541ns -> 583ns (7.20% slower)

def test_single_required_integer_wrong_type():
    # Test with a required integer field, but string provided
    schema = {
        "type": "object",
        "properties": {"age": {"type": "integer"}},
        "required": ["age"],
    }
    data = {"age": "25"}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 541ns -> 583ns (7.20% slower)

def test_multiple_required_fields_all_present():
    # Test with multiple required fields, all present and correct types
    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "active": {"type": "boolean"}
        },
        "required": ["name", "age", "active"],
    }
    data = {"name": "Bob", "age": 30, "active": True}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 916ns -> 1.00μs (8.40% slower)

def test_multiple_required_fields_one_missing():
    # Test with multiple required fields, one missing
    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "active": {"type": "boolean"}
        },
        "required": ["name", "age", "active"],
    }
    data = {"name": "Bob", "active": True}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 583ns -> 625ns (6.72% slower)

def test_array_of_integers_correct():
    # Test with a required array of integers, all correct
    schema = {
        "type": "object",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {"type": "integer"}
            }
        },
        "required": ["numbers"],
    }
    data = {"numbers": [1, 2, 3]}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 875ns -> 958ns (8.66% slower)

def test_array_of_integers_wrong_type_in_array():
    # Test with a required array of integers, but one element is wrong type
    schema = {
        "type": "object",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {"type": "integer"}
            }
        },
        "required": ["numbers"],
    }
    data = {"numbers": [1, "two", 3]}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 666ns -> 750ns (11.2% slower)

def test_array_field_is_not_list():
    # Test with a required array, but value is not a list
    schema = {
        "type": "object",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {"type": "integer"}
            }
        },
        "required": ["numbers"],
    }
    data = {"numbers": "not a list"}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 458ns -> 542ns (15.5% slower)

# Edge Test Cases

def test_empty_required_list():
    # Test with no required fields
    schema = {
        "type": "object",
        "properties": {
            "foo": {"type": "string"}
        },
        "required": [],
    }
    data = {}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 250ns -> 250ns (0.000% faster)

def test_null_type_field():
    # Test with a required null type field
    schema = {
        "type": "object",
        "properties": {
            "foo": {"type": "null"}
        },
        "required": ["foo"],
    }
    data = {"foo": None}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 542ns -> 625ns (13.3% slower)

def test_null_type_field_wrong_value():
    # Test with a required null type field, but value is not None
    schema = {
        "type": "object",
        "properties": {
            "foo": {"type": "null"}
        },
        "required": ["foo"],
    }
    data = {"foo": 0}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 500ns -> 542ns (7.75% slower)

def test_nested_object_all_required_present():
    # Test with nested object, all required fields present
    schema = {
        "type": "object",
        "properties": {
            "address": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "zip": {"type": "integer"}
                },
                "required": ["city", "zip"]
            }
        },
        "required": ["address"]
    }
    data = {"address": {"city": "NY", "zip": 10001}}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 1.33μs -> 1.38μs (3.05% slower)

def test_nested_object_missing_required():
    # Test with nested object, missing required field in nested object
    schema = {
        "type": "object",
        "properties": {
            "address": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "zip": {"type": "integer"}
                },
                "required": ["city", "zip"]
            }
        },
        "required": ["address"]
    }
    data = {"address": {"city": "NY"}}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 1.04μs -> 1.04μs (0.096% slower)

def test_nested_object_wrong_type():
    # Test with nested object, wrong type for nested field
    schema = {
        "type": "object",
        "properties": {
            "address": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "zip": {"type": "integer"}
                },
                "required": ["city", "zip"]
            }
        },
        "required": ["address"]
    }
    data = {"address": {"city": "NY", "zip": "not an int"}}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 1.08μs -> 1.17μs (7.11% slower)

def test_array_of_objects_nested_required():
    # Test with array of objects, each with required fields
    schema = {
        "type": "object",
        "properties": {
            "people": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer"}
                    },
                    "required": ["name", "age"]
                }
            }
        },
        "required": ["people"]
    }
    data = {
        "people": [
            {"name": "Alice", "age": 30},
            {"name": "Bob", "age": 25}
        ]
    }
    # The function as written does not recursively validate array items as objects, so this will pass.
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 916ns -> 916ns (0.000% faster)

def test_required_field_with_additional_unexpected_fields():
    # Test with required fields present, plus extra fields not in schema
    schema = {
        "type": "object",
        "properties": {
            "foo": {"type": "string"}
        },
        "required": ["foo"]
    }
    data = {"foo": "bar", "extra": 123}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 500ns -> 583ns (14.2% slower)

def test_required_field_wrong_type_but_optional_field_right():
    # Test with required field wrong type, optional field correct
    schema = {
        "type": "object",
        "properties": {
            "foo": {"type": "string"},
            "bar": {"type": "integer"}
        },
        "required": ["foo"]
    }
    data = {"foo": 123, "bar": 456}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 500ns -> 541ns (7.58% slower)

def test_required_field_type_boolean_true():
    # Test with required boolean field, True value
    schema = {
        "type": "object",
        "properties": {
            "flag": {"type": "boolean"}
        },
        "required": ["flag"]
    }
    data = {"flag": True}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 583ns -> 666ns (12.5% slower)

def test_required_field_type_boolean_false():
    # Test with required boolean field, False value
    schema = {
        "type": "object",
        "properties": {
            "flag": {"type": "boolean"}
        },
        "required": ["flag"]
    }
    data = {"flag": False}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 541ns -> 583ns (7.20% slower)

def test_required_field_type_boolean_wrong_type():
    # Test with required boolean field, but value is string
    schema = {
        "type": "object",
        "properties": {
            "flag": {"type": "boolean"}
        },
        "required": ["flag"]
    }
    data = {"flag": "True"}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 458ns -> 542ns (15.5% slower)

def test_required_field_type_number_float():
    # Test with required number field, float value
    schema = {
        "type": "object",
        "properties": {
            "score": {"type": "number"}
        },
        "required": ["score"]
    }
    data = {"score": 3.14}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 583ns -> 583ns (0.000% faster)

def test_required_field_type_number_int():
    # Test with required number field, integer value (valid in JSON Schema, but the function rejects it; see below)
    schema = {
        "type": "object",
        "properties": {
            "score": {"type": "number"}
        },
        "required": ["score"]
    }
    data = {"score": 5}
    # In Python, isinstance(5, float) is False, so this will fail, but in JSON Schema, number includes int.
    # The function as written will return False, so we expect False.
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 458ns -> 500ns (8.40% slower)

def test_required_field_type_integer_float_value():
    # Test with required integer field, but float value
    schema = {
        "type": "object",
        "properties": {
            "count": {"type": "integer"}
        },
        "required": ["count"]
    }
    data = {"count": 3.0}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 500ns -> 542ns (7.75% slower)

# Large Scale Test Cases

def test_large_number_of_required_fields_all_present():
    # Test with 500 required fields, all present and correct type
    schema = {
        "type": "object",
        "properties": {f"field{i}": {"type": "integer"} for i in range(500)},
        "required": [f"field{i}" for i in range(500)]
    }
    data = {f"field{i}": i for i in range(500)}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 57.2μs -> 53.4μs (7.18% faster)

def test_large_number_of_required_fields_one_missing():
    # Test with 500 required fields, one missing
    schema = {
        "type": "object",
        "properties": {f"field{i}": {"type": "integer"} for i in range(500)},
        "required": [f"field{i}" for i in range(500)]
    }
    data = {f"field{i}": i for i in range(499)}  # missing field499
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 56.3μs -> 52.3μs (7.65% faster)

def test_large_array_of_integers():
    # Test with a required array of 500 integers
    schema = {
        "type": "object",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {"type": "integer"}
            }
        },
        "required": ["numbers"]
    }
    data = {"numbers": list(range(500))}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 6.88μs -> 4.92μs (39.8% faster)

def test_large_array_of_integers_one_wrong_type():
    # Test with a required array of 500 integers, one is wrong type
    schema = {
        "type": "object",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {"type": "integer"}
            }
        },
        "required": ["numbers"]
    }
    numbers = list(range(500))
    numbers[250] = "not an int"
    data = {"numbers": numbers}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 3.92μs -> 2.75μs (42.4% faster)

def test_large_nested_object():
    # Test with a large nested object (depth 3), all required fields present
    schema = {
        "type": "object",
        "properties": {
            "level1": {
                "type": "object",
                "properties": {
                    "level2": {
                        "type": "object",
                        "properties": {
                            "level3": {"type": "string"}
                        },
                        "required": ["level3"]
                    }
                },
                "required": ["level2"]
            }
        },
        "required": ["level1"]
    }
    data = {"level1": {"level2": {"level3": "deep"}}}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 1.29μs -> 1.33μs (3.08% slower)

def test_large_nested_object_missing_deep_field():
    # Test with a large nested object (depth 3), missing the deepest required field
    schema = {
        "type": "object",
        "properties": {
            "level1": {
                "type": "object",
                "properties": {
                    "level2": {
                        "type": "object",
                        "properties": {
                            "level3": {"type": "string"}
                        },
                        "required": ["level3"]
                    }
                },
                "required": ["level2"]
            }
        },
        "required": ["level1"]
    }
    data = {"level1": {"level2": {}}}
    codeflash_output = MistralStreamedResponse._validate_required_json_schema(data, schema) # 1.08μs -> 1.04μs (4.03% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
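The comments in `test_required_field_type_number_int` point at a gap between Python's `isinstance` and JSON Schema semantics: JSON Schema's `number` includes integers, while a check against `float` rejects `5`. Two related quirks appear in the other tests: `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`, and from draft 6 onward JSON Schema treats `3.0` as an integer. The sketch below shows one stricter check; `matches_json_type` is a hypothetical name and none of this is part of the PR.

```python
from typing import Any

_SIMPLE_TYPES: dict[str, type] = {
    "string": str,
    "array": list,
    "object": dict,
    "null": type(None),
}


def matches_json_type(value: Any, json_type: str) -> bool:
    """Hypothetical JSON-Schema-style type check; illustration only, not part of this PR."""
    if json_type == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        # bool subclasses int in Python, but a JSON boolean matches only "boolean"
        return False
    if json_type == "integer":
        # JSON Schema draft 6+ counts numbers with a zero fractional part, e.g. 3.0, as integers
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if json_type == "number":
        # JSON Schema "number" covers integers as well as floats
        return isinstance(value, (int, float))
    return isinstance(value, _SIMPLE_TYPES[json_type])


print(isinstance(5, float))                # False, which is why the test expects False
print(matches_json_type(5, "number"))      # True
print(isinstance(True, int))               # True
print(matches_json_type(True, "integer"))  # False
print(matches_json_type(3.0, "integer"))   # True
```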


from pydantic_ai.models.mistral import MistralStreamedResponse

def test_MistralStreamedResponse__validate_required_json_schema():
    MistralStreamedResponse._validate_required_json_schema({'': 0, '\x00': '', '\x01': {}}, {'required': '\x01\x00'})

def test_MistralStreamedResponse__validate_required_json_schema_2():
    MistralStreamedResponse._validate_required_json_schema({}, {'required': '\x00'})
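Both concolic tests pass `required` as a `str` rather than a list. Assuming the function iterates over `required` directly, as these inputs suggest, each character of the string is treated as a parameter name. A short illustration in plain Python (no pydantic_ai import needed):

```python
# The concolic inputs pass "required" as a str rather than a list, and
# iterating over a str yields its characters one at a time.
schema = {"required": "\x01\x00"}
print(list(schema["required"]))  # ['\x01', '\x00']

data = {"": 0, "\x00": "", "\x01": {}}
# Each character is then looked up as a key in the data dict.
print(all(param in data for param in schema["required"]))  # True
```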

To edit these changes, run git checkout codeflash/optimize-MistralStreamedResponse._validate_required_json_schema-mddt7trc and push.

Codeflash

…chema` by 7%

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jul 22, 2025
@codeflash-ai codeflash-ai bot requested a review from aseembits93 July 22, 2025 00:42
@aseembits93 aseembits93 (Owner) left a comment


Concolic Coverage Tests not appearing in PR
