⚡️ Speed up function remove_pattern by 366%
#95
📄 366% (3.66x) speedup for `remove_pattern` in `invokeai/app/util/controlnet_utils.py`
⏱️ Runtime: 22.2 milliseconds → 4.76 milliseconds (best of 152 runs)
📝 Explanation and details
The optimized code achieves a 366% speedup by eliminating expensive NumPy array operations and replacing them with more efficient alternatives.
Key Optimizations:
1. Eliminated `np.where()` for indexing: The original code used `objects = np.where(objects > 127)`, which creates a tuple of index arrays. This is expensive because `np.where` must scan the entire array and build coordinate arrays for all matching elements. The optimized version uses direct boolean masking with `mask = objects > 127`, which is much faster.
2. Replaced tuple introspection with direct counting: Instead of checking `objects[0].shape[0] > 0` to determine if any patterns were found, the optimization uses `np.count_nonzero(mask)`, which directly counts True values in the boolean mask. This avoids the overhead of creating index arrays entirely.
3. Conditional assignment optimization: The optimized version only performs the expensive array assignment `x[mask] = 0` when patterns are actually found (`if count:`), avoiding unnecessary work in cases where no patterns match.

Performance Impact Analysis:
- The original `np.where()` call took 68.4% of total execution time (19.5ms), while the optimized boolean mask creation takes only 15.4% (0.88ms)

Best Performance Gains:
The optimization maintains identical behavior while dramatically reducing computational overhead through more efficient NumPy operations.
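
For reviewers who want to see the two indexing styles side by side, here is a minimal sketch of the change described above. It is not the code from this PR: the helper names `clear_hits_where` and `clear_hits_mask` are hypothetical, the `(array, bool)` return shape is an assumption, and `objects` is passed in as a ready-made array because the step that produces it is not shown here.

```python
import numpy as np


def clear_hits_where(x, objects):
    """Index-array style: np.where builds one coordinate array per axis."""
    idx = np.where(objects > 127)        # tuple of index arrays
    x[idx] = 0                           # fancy-index assignment
    return x, idx[0].shape[0] > 0        # any matches found?


def clear_hits_mask(x, objects):
    """Boolean-mask style: no coordinate arrays are materialized."""
    mask = objects > 127                 # boolean array, same shape as objects
    count = np.count_nonzero(mask)       # number of True entries
    if count:                            # skip the assignment when nothing matched
        x[mask] = 0
    return x, count > 0


# Illustrative data only (assumed shapes and dtype).
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
objects = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

a, found_a = clear_hits_where(x.copy(), objects)
b, found_b = clear_hits_mask(x.copy(), objects)
print(np.array_equal(a, b), found_a == found_b)
```

Both helpers mutate `x` in place, so the sketch passes copies to compare results. Actual timings will depend on array size and match density, so the figures reported above should not be assumed to carry over to this toy input.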
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-remove_pattern-mhn8bgfb` and push.