⚡️ Speed up function manual_convolution_1d by 12,774%
#158
📄 12,774% (127.74x) speedup for `manual_convolution_1d` in `src/signal/filters.py`
⏱️ Runtime: 47.7 milliseconds → 371 microseconds (best of 135 runs)
📝 Explanation and details
The optimized code achieves a 127x speedup by replacing nested Python loops with vectorized NumPy operations using stride tricks.
Key optimizations:
- Eliminated nested loops: The original code uses two nested Python loops that perform 167,188 individual array access operations (63.8% of runtime). The optimized version removes these entirely.
- Used `as_strided` for sliding windows: Instead of manually indexing `signal[i + j]` in loops, `as_strided` creates a 2D view of the signal where each row represents a sliding window. This avoids copying data and enables vectorized operations.
- Vectorized computation with `np.dot`: Replaced the inner loop multiplication and accumulation (`result[i] += signal[i + j] * kernel[j]`) with a single `np.dot(windows, kernel)` operation that leverages optimized BLAS routines.
- Added edge case handling: The `if result_len <= 0` check prevents errors when the kernel is longer than the signal.

Performance characteristics from tests:
The optimization shines on larger inputs, where the gain from vectorized operations far outweighs the setup cost. The arithmetic is still O(n*k), but it moves from a nested Python loop into a single matrix-vector product executed in compiled code.
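The approach described above can be sketched roughly as follows. This is a minimal illustration reconstructed from the explanation, not the code from the PR: the function names `manual_convolution_1d_loops` and `manual_convolution_1d_strided`, the float conversion, and the empty-array return for a too-long kernel are all assumptions. The only details taken from the description are the inner update `result[i] += signal[i + j] * kernel[j]`, the `result_len <= 0` check, and the `np.dot(windows, kernel)` call.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def manual_convolution_1d_loops(signal, kernel):
    """Reference version using nested Python loops (hypothetical reconstruction)."""
    result_len = len(signal) - len(kernel) + 1
    result = np.zeros(max(result_len, 0))
    for i in range(result_len):
        for j in range(len(kernel)):
            # Note: the kernel is not flipped, matching the update quoted above.
            result[i] += signal[i + j] * kernel[j]
    return result


def manual_convolution_1d_strided(signal, kernel):
    """Vectorized version using a strided sliding-window view (a sketch)."""
    # Assumption: inputs are converted to contiguous float arrays so that
    # the element stride is well defined.
    signal = np.ascontiguousarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)

    result_len = len(signal) - len(kernel) + 1
    if result_len <= 0:
        # Assumption: return an empty array when the kernel is longer
        # than the signal.
        return np.zeros(0)

    step = signal.strides[0]
    # Each row of `windows` is signal[i : i + len(kernel)], built as a view
    # without copying. writeable=False guards against accidental writes
    # through the overlapping view.
    windows = as_strided(
        signal,
        shape=(result_len, len(kernel)),
        strides=(step, step),
        writeable=False,
    )
    # One matrix-vector product replaces the inner multiply-accumulate loop.
    return np.dot(windows, kernel)
```

Because `as_strided` performs no bounds checking, the shape and strides must be derived carefully from the input. `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 and later) builds the same kind of view with validation and may be a safer choice where that NumPy version is available.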
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-manual_convolution_1d-mheotswg` and push.