⚡️ Speed up function rope by 30%
#110
Open
📄 30% (0.30x) speedup for `rope` in `invokeai/backend/flux/math.py`
⏱️ Runtime: 5.28 milliseconds → 4.07 milliseconds (best of 186 runs)
📝 Explanation and details
The optimized code achieves a roughly 30% speedup by replacing expensive tensor operations with more efficient alternatives:
Key Optimizations
1. Eliminated the `torch.einsum` bottleneck (64% → 10% of runtime)
- Replaced `torch.einsum("...n,d->...nd", pos, omega)` with direct broadcasting: `pos.unsqueeze(-1) * omega`
2. Reduced trigonometric function calls
- Original: computed `cos` and `sin` twice each within the stack operation
- Optimized: compute `cos_out` and `sin_out` once, then reuse them
3. Faster tensor reshaping
- Replaced `einops.rearrange()` with direct `tensor.view()` for the common case
- `.view()` is a zero-copy operation that's faster than einops' more general reshaping logic
4. Reduced attribute access overhead
- Cached `pos.device` and `pos.dtype` in variables to avoid repeated attribute lookups
Performance Impact
The optimizations are particularly effective for:
Based on the test results, this optimization provides consistent speedups across all input sizes and configurations, making it a valuable improvement for any workload using rotary position embeddings.
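The four changes listed under Key Optimizations can be sketched side by side as follows. This is an illustrative reconstruction from the description above, not the code in the PR: the function names `rope_reference` and `rope_optimized` are hypothetical, the `float64` intermediate dtype and the explicit cast of `pos` are assumptions, and the baseline uses `reshape` where the description mentions `einops.rearrange()` so that the sketch depends only on PyTorch.

```python
import torch
from torch import Tensor


def rope_reference(pos: Tensor, dim: int, theta: int) -> Tensor:
    """Baseline sketch: einsum outer product, cos/sin each evaluated twice."""
    assert dim % 2 == 0
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
    omega = 1.0 / (theta**scale)
    # Assumption: cast pos to float64 so both einsum operands share a dtype.
    out = torch.einsum("...n,d->...nd", pos.to(torch.float64), omega)
    out = torch.stack(
        [torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1
    )
    # Stand-in for einops.rearrange(out, "b n d (i j) -> b n d i j", i=2, j=2).
    out = out.reshape(*out.shape[:-1], 2, 2)
    return out.to(dtype=pos.dtype, device=pos.device)


def rope_optimized(pos: Tensor, dim: int, theta: int) -> Tensor:
    """Optimized sketch: broadcasting, cached cos/sin, view, cached attributes."""
    assert dim % 2 == 0
    device, dtype = pos.device, pos.dtype  # (4) look the attributes up once
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=device) / dim
    omega = 1.0 / (theta**scale)
    # (1) broadcasting multiply instead of einsum: (..., n, 1) * (d,) -> (..., n, d)
    angles = pos.to(torch.float64).unsqueeze(-1) * omega
    # (2) evaluate each trigonometric function once and reuse the result
    cos_out = torch.cos(angles)
    sin_out = torch.sin(angles)
    out = torch.stack([cos_out, -sin_out, sin_out, cos_out], dim=-1)
    # (3) torch.stack returns a contiguous tensor, so view needs no copy
    out = out.view(*out.shape[:-1], 2, 2)
    return out.to(dtype=dtype, device=device)


if __name__ == "__main__":
    pos = torch.arange(8, dtype=torch.float32).reshape(2, 4)
    ref = rope_reference(pos, dim=16, theta=10_000)
    opt = rope_optimized(pos, dim=16, theta=10_000)
    print(opt.shape, torch.allclose(ref, opt))
```

Both versions return a tensor of shape `(..., n, dim // 2, 2, 2)` holding one 2×2 rotation matrix per position and frequency, so a check such as `torch.allclose(ref, opt)` is a quick way to confirm that the rewrite preserves behavior before comparing runtimes.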
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-rope-mhodz532` and push.