Optimize adaptive avg pool #2012
Pull Request Overview
This PR optimizes 2D adaptive average pooling by adding a vectorized kernel for the channels-last memory format. The vectorized path is intended to improve memory access patterns and, as a result, performance.
Key changes:
- Add vectorized kernel implementation for adaptive average pooling in channel-last format
- Replace the original channel-last kernel with the new optimized vectorized version
- Add necessary memory access utilities for vectorization support
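The channels-last approach in the list above can be illustrated with a plain-Python reference model. This is a minimal sketch, not the PR's kernel: `window`, `adaptive_avg_pool2d_nhwc` and `VEC_SIZE` are hypothetical names. The "vector load" is emulated by slicing `VEC_SIZE` adjacent channels, which are contiguous in memory in NHWC layout. Window bounds use the standard adaptive-pooling rule `start = floor(o * in / out)`, `end = ceil((o + 1) * in / out)`.

```python
VEC_SIZE = 4  # hypothetical vector width: channels handled per emulated "vector load"


def window(o, in_size, out_size):
    """Return the [start, end) input window for output index o."""
    start = (o * in_size) // out_size
    end = -((-(o + 1) * in_size) // out_size)  # integer ceil division
    return start, end


def adaptive_avg_pool2d_nhwc(x, out_h, out_w, vec=VEC_SIZE):
    """Adaptive average pooling over x[n][h][w][c] (channels-last nested lists)."""
    n_batch = len(x)
    in_h, in_w, channels = len(x[0]), len(x[0][0]), len(x[0][0][0])
    assert channels % vec == 0, "sketch assumes C is a multiple of vec"
    out = [[[[0.0] * channels for _ in range(out_w)] for _ in range(out_h)]
           for _ in range(n_batch)]
    for n in range(n_batch):
        for oh in range(out_h):
            h0, h1 = window(oh, in_h, out_h)
            for ow in range(out_w):
                w0, w1 = window(ow, in_w, out_w)
                count = (h1 - h0) * (w1 - w0)
                # Process `vec` neighbouring channels per step; in NHWC they
                # are adjacent in memory, so a real kernel could load them at once.
                for c0 in range(0, channels, vec):
                    acc = [0.0] * vec
                    for h in range(h0, h1):
                        for w in range(w0, w1):
                            chunk = x[n][h][w][c0:c0 + vec]
                            for i in range(vec):
                                acc[i] += chunk[i]
                    out[n][oh][ow][c0:c0 + vec] = [a / count for a in acc]
    return out
```

For example, pooling a 1x2x2x4 input down to 1x1 returns the per-channel mean of the four spatial positions. Because channels are the innermost axis, each inner step reads `vec` consecutive values from a single (h, w) position, which is the access pattern a vectorized channels-last kernel can exploit.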
Follows #1883. For the torchbench AlexNet case (input shape [4096, 256, 6, 6] in channels-last format, output size [6, 6]), this change gives roughly a 4x speedup on BMG.

Co-authored-by: Copilot <[email protected]>
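The benchmark shape quoted above can be checked against the adaptive-pooling window rule (`start = floor(o * in / out)`, `end = ceil((o + 1) * in / out)`). A minimal sketch; `window` is a hypothetical helper, not code from the PR. With a 6x6 input and a 6x6 output, every window covers exactly one element, so each output position is a copy of one input channel vector.

```python
def window(o, in_size, out_size):
    """Return the [start, end) input window for output index o."""
    start = (o * in_size) // out_size
    end = -((-(o + 1) * in_size) // out_size)  # integer ceil division
    return start, end


# Shape from the PR description: N, C, H, W with output size (6, 6).
N, C, H, W = 4096, 256, 6, 6
OUT_H, OUT_W = 6, 6

windows_h = [window(o, H, OUT_H) for o in range(OUT_H)]
windows_w = [window(o, W, OUT_W) for o in range(OUT_W)]
total_elems = N * C * H * W  # elements read (and written) by the pooling op

print(windows_h)
print(windows_w)
print(total_elems)
```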