add vectorization path on maxpool forward channel last #1883

jianyizh · 2025-07-28T08:56:58Z

Part 1 of #1861.
Tested on shapes from AlexNet training.
On BMG, scoreboard stalls decrease from 831,719 to 497,098. Instruction fetch and distance stalls also improve.

| shape | device | before opt | after opt |
| --- | --- | --- | --- |
| [4096, 64, 55, 55] | pvc | 8.02 ms | 5.44 ms |
| [4096, 64, 55, 55] | bmg | 12.45 ms | 8.89 ms |
| [4096, 192, 27, 27] | pvc | 5.72 ms | 3.85 ms |
| [4096, 192, 27, 27] | bmg | 9.00 ms | 5.06 ms |
| [4096, 256, 13, 13] | pvc | 1.68 ms | 1.12 ms |
| [4096, 256, 13, 13] | bmg | 2.83 ms | 1.35 ms |

Copilot

Pull Request Overview

This PR adds a vectorized code path for the max pooling forward operation when using channel-last memory layout, providing significant performance improvements on Intel GPU architectures. The optimization uses vectorized memory operations and SYCL kernels to improve throughput.
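To illustrate why channel-last layout lends itself to vectorization, here is a minimal host-side C++ sketch. It is not the SYCL kernel from this PR: the function name `max_pool2d_nhwc`, the single-image layout, and the restrictions (no padding, dilation 1, channel count divisible by `VEC`, no NaN propagation) are all assumptions made for the example. The point it shows is that the `VEC` channels of one pixel are contiguous in memory, so each window position can be read as one chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative CPU reference only (hypothetical helper, not code from the PR).
// Input is one image in NHWC (channel-last) order: index = (h * W + w) * C + c.
// Assumes no padding, dilation 1, and C divisible by VEC; NaN handling is omitted.
template <int VEC>
std::vector<float> max_pool2d_nhwc(const std::vector<float>& in,
                                   int H, int W, int C, int k, int stride) {
  assert(C % VEC == 0);
  const int OH = (H - k) / stride + 1;
  const int OW = (W - k) / stride + 1;
  std::vector<float> out(static_cast<std::size_t>(OH) * OW * C);
  for (int oh = 0; oh < OH; ++oh) {
    for (int ow = 0; ow < OW; ++ow) {
      // Each iteration handles VEC adjacent channels of one output pixel.
      for (int c0 = 0; c0 < C; c0 += VEC) {
        float acc[VEC];
        std::fill(acc, acc + VEC, -std::numeric_limits<float>::infinity());
        for (int kh = 0; kh < k; ++kh) {
          for (int kw = 0; kw < k; ++kw) {
            const int ih = oh * stride + kh;
            const int iw = ow * stride + kw;
            // VEC contiguous floats: the span a vector load would cover.
            const float* src =
                &in[(static_cast<std::size_t>(ih) * W + iw) * C + c0];
            for (int v = 0; v < VEC; ++v) acc[v] = std::max(acc[v], src[v]);
          }
        }
        std::copy(acc, acc + VEC,
                  &out[(static_cast<std::size_t>(oh) * OW + ow) * C + c0]);
      }
    }
  }
  return out;
}
```

Calling `max_pool2d_nhwc<2>(in, 4, 4, 2, 2, 2)` and `max_pool2d_nhwc<1>(in, 4, 4, 2, 2, 2)` on the same input should give identical results; only the chunk width per inner step differs.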

Key changes:

- Introduces a new vectorized kernel `MaxPool2dChannelLastVec` that processes multiple channels simultaneously
- Adds automatic vector size selection (8, 4, 2, or 1) based on data alignment and hardware capabilities
- Implements dynamic work group sizing based on hardware thread availability
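The vector size selection described above can be sketched roughly as follows. This is an assumption about how such a choice might be made, not the logic in `DilatedMaxPool2d.cpp`: the helper name `pick_vec_size` and its parameters are hypothetical, and it only considers pointer alignment and channel divisibility, ignoring the hardware-capability check the overview mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the PR): choose the widest vector width in
// {8, 4, 2, 1} such that the channel count is divisible by the width and the
// base pointer is aligned to the full vector size in bytes.
inline int pick_vec_size(const void* ptr, std::int64_t channels,
                         std::size_t elem_size) {
  assert(elem_size > 0 && channels > 0);
  const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  for (int vec = 8; vec > 1; vec /= 2) {
    const bool channels_ok = (channels % vec) == 0;
    const bool aligned = (addr % (vec * elem_size)) == 0;
    if (channels_ok && aligned) return vec;
  }
  return 1;  // Scalar fallback always works.
}
```

In channel-last layout the element offset of channel `c` at pixel `p` is `p * C + c`, so if the base pointer is aligned and `C` is a multiple of the chosen width, every per-pixel vector access starting at a multiple of that width stays aligned. For example, a 64-byte-aligned `float` buffer with 64 channels would select 8, while 6 channels would select 2.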

src/ATen/native/xpu/sycl/DilatedMaxPool2d.cpp

Co-authored-by: Copilot <[email protected]>

Follows #1883: the channel-last shape [4096, 256, 6, 6] with output shape [6, 6] in torchbench AlexNet gets a ~4x improvement on BMG.

Co-authored-by: Copilot <[email protected]>

save

bb6320a

Copilot AI review requested due to automatic review settings July 28, 2025 08:56

jianyizh added the kernel_optimization label Jul 28, 2025

Copilot AI reviewed Jul 28, 2025

save

a73416c

Co-authored-by: Copilot <[email protected]>

jianyizh requested review from EikanWang and toyxu July 28, 2025 09:15

Merge branch 'main' into jianyi/maxpool

510d563

jianyizh requested a review from liangan1 August 13, 2025 02:35

chuanqi129 linked an issue Aug 13, 2025 that may be closed by this pull request

Maxpooling takes too long on BMG #1861

Closed

liangan1 approved these changes Aug 19, 2025

chunhuanMeng approved these changes Aug 19, 2025

jianyizh added this pull request to the merge queue Aug 19, 2025

Merged via the queue into main with commit c091232 Aug 19, 2025
21 checks passed

jianyizh deleted the jianyi/maxpool branch August 19, 2025 05:42

jianyizh mentioned this pull request Sep 4, 2025

optimize adptive avg pool #2012

Merged
