⚡️ Speed up function _simple_json_normalize by 44%
#302
📄 44% (0.44x) speedup for `_simple_json_normalize` in `pandas/io/json/_normalize.py`

⏱️ Runtime: 3.18 milliseconds → 2.21 milliseconds (best of 218 runs)

📝 Explanation and details
The optimization achieves a 43% speedup by eliminating redundant dictionary operations and improving memory allocation patterns in `_normalise_json_ordered`.

Key optimizations applied:
- Single-pass data partitioning: Instead of iterating over `data.items()` twice with dict comprehensions to separate flat values from nested ones, the optimized version uses a single `for` loop to partition `data` into `top_dict_` and `nested_dict_input`. This halves the number of `isinstance()` calls and dictionary iterations.
- In-place dictionary updates: Rather than building a new dictionary with `{**top_dict_, **nested_dict_}` (which allocates a new dict and copies every key-value pair), the optimized version merges the results in place with `top_dict_.update(nested_dict_)`, avoiding that allocation.
- Conditional processing: `_normalise_json` is called only when `nested_dict_input` is non-empty, so dictionaries with no nested structure skip the function call entirely.
- Simplified return logic: In `_simple_json_normalize`, the intermediate `normalised_json_object` variable is removed and the result is returned directly, saving a variable assignment.

Performance impact by test case type:
- … `_normalise_json` calls remain the bottleneck.

The optimizations are particularly effective for the common JSON normalization use case of processing many flat or lightly nested records. This matches typical data-processing workflows, where this function is called repeatedly in hot paths.
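The four changes listed above can be sketched in plain Python. This is a minimal, self-contained sketch modeled on the description, not the pandas source or this PR's diff: `normalise_json`, `normalise_json_ordered_before`, `normalise_json_ordered_after` and `simple_json_normalize` are hypothetical stand-ins for the private pandas helpers, and the recursive flattener is a simplified assumption about how `_normalise_json` builds separator-joined keys.

```python
from typing import Any


def normalise_json(
    data: Any, key_string: str, normalized_dict: dict[str, Any], separator: str
) -> dict[str, Any]:
    # Hypothetical stand-in for _normalise_json: recursively flattens nested
    # dicts into separator-joined keys such as "user.geo.lat".
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{key_string}{separator}{key}" if key_string else f"{key}"
            normalise_json(value, new_key, normalized_dict, separator)
    else:
        normalized_dict[key_string] = data
    return normalized_dict


def normalise_json_ordered_before(data: dict[str, Any], separator: str) -> dict[str, Any]:
    # Baseline shape: two comprehensions (two passes, two isinstance checks
    # per value), an unconditional recursive call, and a merged copy.
    top_dict_ = {k: v for k, v in data.items() if not isinstance(v, dict)}
    nested_dict_ = normalise_json(
        {k: v for k, v in data.items() if isinstance(v, dict)}, "", {}, separator
    )
    return {**top_dict_, **nested_dict_}


def normalise_json_ordered_after(data: dict[str, Any], separator: str) -> dict[str, Any]:
    # Optimized shape: a single pass partitions flat vs nested values.
    top_dict_: dict[str, Any] = {}
    nested_dict_input: dict[str, Any] = {}
    for k, v in data.items():
        if isinstance(v, dict):
            nested_dict_input[k] = v
        else:
            top_dict_[k] = v
    # Recurse only when something is nested, then merge in place.
    if nested_dict_input:
        top_dict_.update(normalise_json(nested_dict_input, "", {}, separator))
    return top_dict_


def simple_json_normalize(ds: Any, sep: str = ".") -> Any:
    # Hypothetical stand-in for _simple_json_normalize: each branch returns
    # its result directly instead of going through an intermediate variable.
    if isinstance(ds, dict):
        return normalise_json_ordered_after(ds, sep)
    if isinstance(ds, list):
        return [simple_json_normalize(row, sep=sep) for row in ds]
    return {}  # in this sketch, any other input falls through to an empty dict


if __name__ == "__main__":
    record = {"id": 1, "user": {"name": "a", "geo": {"lat": 1.0}}, "tags": ["x"]}
    before = normalise_json_ordered_before(record, ".")
    after = normalise_json_ordered_after(record, ".")
    assert list(before.items()) == list(after.items())  # same keys, values and order
    print(after)  # → {'id': 1, 'tags': ['x'], 'user.name': 'a', 'user.geo.lat': 1.0}
```

Both variants place the flat keys first and the flattened nested keys after them. Because `dict.update` overwrites a colliding key in place, exactly as `{**a, **b}` does, the key order and values should match between the two shapes.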
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
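The generated tests themselves are not reproduced here. As a hypothetical illustration of how a "best of N runs" figure could be measured for the partitioning step alone, the sketch below times a two-comprehension partition against a single-loop partition with `timeit.repeat` and keeps the minimum. The function names and the sample record are assumptions. The printed timings depend on the interpreter and the input, and they are not this PR's measurements.

```python
import timeit
from typing import Any, Callable


def partition_two_pass(data: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    # Baseline shape: two comprehensions, so every value is isinstance-checked twice.
    flat = {k: v for k, v in data.items() if not isinstance(v, dict)}
    nested = {k: v for k, v in data.items() if isinstance(v, dict)}
    return flat, nested


def partition_one_pass(data: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    # Optimized shape: one loop, one isinstance check per value.
    flat: dict[str, Any] = {}
    nested: dict[str, Any] = {}
    for k, v in data.items():
        if isinstance(v, dict):
            nested[k] = v
        else:
            flat[k] = v
    return flat, nested


def best_of(func: Callable[[Any], Any], arg: Any, repeat: int = 20, number: int = 1000) -> float:
    # Run `number` calls per trial, `repeat` trials; return the fastest per-call time.
    return min(timeit.repeat(lambda: func(arg), repeat=repeat, number=number)) / number


if __name__ == "__main__":
    # Hypothetical mostly-flat record: 20 scalar fields plus one small nested field.
    record: dict[str, Any] = {f"field_{i}": i for i in range(20)}
    record["meta"] = {"source": "api"}
    for fn in (partition_two_pass, partition_one_pass):
        print(f"{fn.__name__}: {best_of(fn, record) * 1e6:.2f} µs per call")
```

Taking the minimum rather than the mean follows the `timeit` documentation's advice: the lowest value is a lower bound on how fast the snippet can run, and higher values mostly reflect interference from other processes.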
To edit these changes, run `git checkout codeflash/optimize-_simple_json_normalize-mhopi3bl` and push.