
Conversation

@CuiYifeng
Contributor

To solve #1121.

The original usage of sycl::get_kernel_bundle should not be device-specific. Since the fix landed in oneAPI 2025.2, the kernel bundle is now created using only the kernel ID (kid) instead of both the device and the kernel ID, removing the workaround for device-specific kernel builds and its associated comments.
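For illustration, a minimal sketch of the before/after call. The kernel name `MyKernel` and the `ctx`/`dev` parameters are hypothetical; the two overloads shown are from the SYCL 2020 `get_kernel_bundle` API:

```cpp
#include <sycl/sycl.hpp>

class MyKernel; // hypothetical kernel name, for illustration only

void build_bundle(const sycl::context &ctx, const sycl::device &dev) {
  sycl::kernel_id kid = sycl::get_kernel_id<MyKernel>();

  // Before (workaround, needed on oneAPI < 2025.2): device-specific bundle.
  auto old_bundle =
      sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctx, {dev}, {kid});

  // After (this PR): bundle created from the kernel ID only.
  auto new_bundle =
      sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctx, {kid});
}
```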

Copilot AI review requested due to automatic review settings August 15, 2025 03:52
Contributor

Copilot AI left a comment


Pull Request Overview

This PR rolls back the usage of sycl::get_kernel_bundle to its original device-agnostic form, removing a workaround that was needed for oneAPI versions prior to 2025.2. The change eliminates device-specific kernel bundle creation and the associated comments explaining the workaround.

  • Reverts sycl::get_kernel_bundle call to use only kernel ID instead of both device and kernel ID
  • Removes workaround comments that referenced a specific GitHub issue


@CuiYifeng CuiYifeng added this pull request to the merge queue Aug 19, 2025
Merged via the queue into main with commit 83a1555 Aug 19, 2025
61 of 63 checks passed
@CuiYifeng CuiYifeng deleted the yifeng/get_kernel_bundle branch August 19, 2025 05:27
@newtdms

newtdms commented Sep 5, 2025

@CuiYifeng This patch is causing a regression in some of the E2E models for both single- and multi-rank distributed training. Does the new DLE version require removing the device ID from the kernel bundle? Do you have any documentation?

@CuiYifeng
Contributor Author

CuiYifeng commented Sep 8, 2025

> @CuiYifeng This patch is causing a regression in some of the E2E models for both single- and multi-rank distributed training. Does the new DLE version require removing the device ID from the kernel bundle? Do you have any documentation?

@newtdms The device ID parameter of sycl::get_kernel_bundle was a workaround to fix an issue, and this parameter should not have been passed originally. After the device ID is removed, some older DLE versions are incompatible. Could you check whether the DLE version in your environment is >= 2025.2.0 (2025.2.0.558 works on my local machine)?
BTW, could you share your error logs?
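For reference, a minimal sketch of one way to inspect the runtime versions in an environment, using standard SYCL 2020 info queries (the exact version-string format is implementation-defined, so this is a diagnostic aid rather than an authoritative DLE version check):

```cpp
#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  // Enumerate all SYCL devices and report the versions the runtime exposes.
  for (const auto &dev : sycl::device::get_devices()) {
    std::cout << "Device:   "
              << dev.get_info<sycl::info::device::name>() << '\n'
              << "Platform: "
              << dev.get_platform().get_info<sycl::info::platform::version>()
              << '\n'
              << "Driver:   "
              << dev.get_info<sycl::info::device::driver_version>() << "\n\n";
  }
}
```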

@newtdms

newtdms commented Sep 8, 2025

@CuiYifeng Yes, I am using LKG 2025.2.1, which is used by the PT upstream nightly.
There is no error at runtime or compile time, but we are observing a ~75% performance regression in CosmicTagger for both single-rank and distributed training between the 0831 and 0209 nightlies. Some details are in the internal JIRA: PYTORCHDGQ-7051
What is the original issue that you were using this workaround for?

@CuiYifeng
Contributor Author

CuiYifeng commented Sep 9, 2025

@newtdms Please refer to #745 and #1121. Could you provide profiling data in the internal JIRA, or sanitized data here? Then we can revert these changes quickly if needed.
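For context, a minimal sketch of how per-kernel timings could be gathered for such a report, using standard SYCL event profiling (the kernel here is a hypothetical stand-in for the real workload):

```cpp
#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  // A profiling-enabled queue makes kernel start/end timestamps available.
  sycl::queue q{sycl::property::queue::enable_profiling{}};

  constexpr size_t n = 1 << 20;
  int *data = sycl::malloc_shared<int>(n, q);

  // Hypothetical kernel standing in for a workload under investigation.
  sycl::event e = q.parallel_for(
      sycl::range<1>{n}, [=](sycl::id<1> i) { data[i] = int(i[0]) * 2; });
  e.wait();

  auto start =
      e.get_profiling_info<sycl::info::event_profiling::command_start>();
  auto end = e.get_profiling_info<sycl::info::event_profiling::command_end>();
  std::cout << "Kernel time: " << (end - start) / 1e6 << " ms\n";

  sycl::free(data, q);
}
```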

CuiYifeng added a commit that referenced this pull request Sep 10, 2025
github-merge-queue bot pushed a commit that referenced this pull request Sep 12, 2025
…" (#2026)

To solve #2025.
This reverts commit 83a1555 to quickly
bypass the performance regression caused by usage of
sycl::get_kernel_bundle .
CuiYifeng added a commit that referenced this pull request Sep 12, 2025
…" (#2026)

To solve #2025.
This reverts commit 83a1555 to quickly
bypass the performance regression caused by usage of sycl::get_kernel_bundle.
chuanqi129 pushed a commit that referenced this pull request Sep 15, 2025
…_bundle (#1935)" (#2026) (#2035)

Cherry-pick #2026 to solve #2025.
This reverts commit 83a1555 to quickly bypass the performance regression caused by the usage of sycl::get_kernel_bundle.