[moe training] integrate rowwise expert quant kernel #2698

Open
danielvegamyhre wants to merge 6 commits into base: danielvegamyhre/stack/31 from danielvegamyhre/stack/32

Conversation

…to improve perf

stack-info: PR: #2668, branch: danielvegamyhre/stack/27
stack-info: PR: #2669, branch: danielvegamyhre/stack/28

pytorch-bot bot commented Aug 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2698

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

danielvegamyhre added a commit that referenced this pull request Aug 5, 2025
stack-info: PR: #2698, branch: danielvegamyhre/stack/32
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/32 branch from 4813df8 to cec6365 Compare August 5, 2025 23:12
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/31 branch from 59e34b1 to 362cfb2 Compare August 5, 2025 23:12
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 5, 2025
@danielvegamyhre danielvegamyhre added the topic: not user facing Use this tag if you don't want this PR to show up in release notes label Aug 5, 2025
Contributor Author

danielvegamyhre commented Aug 5, 2025

@zou3519 here is a different problem with wrap_triton in this PR. I added a new Triton kernel, _triton_fp8_rowwise_3d_transpose_scales_rhs_kernel, wrapped it with wrap_triton, and wrapped that in a custom Triton op, triton_fp8_rowwise_3d_transpose_rhs.

It works in eager mode, but when I use torch.compile I now get an error about using a FunctionalTensor. This doesn't happen with the other Triton kernels used for this feature.

Repro

  1. Check out this PR
  2. Run the unit test with compile enabled: pytest test/prototype/moe_training/test_training.py -k "test_moe_float8_training[True-target_fqns0]"

Error

@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/31 to main August 5, 2025 23:44
danielvegamyhre added a commit that referenced this pull request Aug 5, 2025
stack-info: PR: #2698, branch: danielvegamyhre/stack/32
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/32 branch from cec6365 to 3d8f201 Compare August 5, 2025 23:44
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/31 August 5, 2025 23:44
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/31 to main August 5, 2025 23:56
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/32 branch from 3d8f201 to 79af5db Compare August 5, 2025 23:56
danielvegamyhre added a commit that referenced this pull request Aug 5, 2025
stack-info: PR: #2698, branch: danielvegamyhre/stack/32
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/31 August 5, 2025 23:57
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/31 to main August 6, 2025 00:00
danielvegamyhre added a commit that referenced this pull request Aug 6, 2025
stack-info: PR: #2698, branch: danielvegamyhre/stack/32
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/32 branch from 79af5db to 9f8a1da Compare August 6, 2025 00:00
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/31 August 6, 2025 00:01
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/31 to main August 6, 2025 00:13
danielvegamyhre added a commit that referenced this pull request Aug 6, 2025
stack-info: PR: #2698, branch: danielvegamyhre/stack/32
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/32 branch from 9f8a1da to 91bdcbc Compare August 6, 2025 00:13
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/31 August 6, 2025 00:13
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/31 to main August 6, 2025 00:56
danielvegamyhre added a commit that referenced this pull request Aug 6, 2025
stack-info: PR: #2698, branch: danielvegamyhre/stack/32
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/32 branch from 91bdcbc to a54939c Compare August 6, 2025 00:56
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/31 August 6, 2025 00:56
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/31 to main August 6, 2025 01:20
danielvegamyhre added a commit that referenced this pull request Aug 6, 2025
stack-info: PR: #2698, branch: danielvegamyhre/stack/32
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/32 branch from a54939c to 05b1d9a Compare August 6, 2025 01:20
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/31 August 6, 2025 01:20
stack-info: PR: #2671, branch: danielvegamyhre/stack/29
stack-info: PR: #2696, branch: danielvegamyhre/stack/30
…totune configs

stack-info: PR: #2697, branch: danielvegamyhre/stack/31
stack-info: PR: #2698, branch: danielvegamyhre/stack/32
@danielvegamyhre danielvegamyhre changed the base branch from danielvegamyhre/stack/31 to main August 6, 2025 01:36
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/32 branch from 05b1d9a to 3248229 Compare August 6, 2025 01:36
@danielvegamyhre danielvegamyhre changed the base branch from main to danielvegamyhre/stack/31 August 6, 2025 01:36
@danielvegamyhre danielvegamyhre force-pushed the danielvegamyhre/stack/31 branch from 1fb9ee1 to f124235 Compare August 6, 2025 17:24
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
topic: not user facing Use this tag if you don't want this PR to show up in release notes
2 participants