[moe training] add fp8 rowwise kernels for expert weights #2696

danielvegamyhre · 2025-08-05T23:11:50Z

Stacked PRs:

[moe training] add fp8 rowwise kernels for expert weights

Summary

torch.compile is too slow at quantizing expert weights along dim1 for the backward pass (see pytorch#159769, "Inductor codegen for float8 dynamic quantization ops for scaled_grouped_mm backward pass is slow").
I wrote a Triton kernel that does this with better performance. We can remove it once torch.compile codegen improves.
This PR focuses on numerical accuracy. Perf benchmarking is in the next PR in the stack.
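
For readers unfamiliar with the operation, here is a minimal, dependency-free sketch of dynamic scaling along dim1 of an (E, N, K) expert weight. It is not the Triton kernel from this PR. The function names, the nested-list representation, the `eps` guard, and the choice of float8_e4m3fn (max finite value 448) as the target format are assumptions for illustration. The final cast to float8 is omitted; only the scale computation and the saturation step are shown.

```python
# Illustrative only: plain-Python model of rowwise scaling along dim1.
# FP8_E4M3_MAX assumes float8_e4m3fn as the target dtype (an assumption).
FP8_E4M3_MAX = 448.0


def dim1_scales(weights, eps=1e-12):
    """Compute one scale per (expert, column) by reducing over dim1.

    weights: nested list shaped (E, N, K).
    Returns a nested list shaped (E, K).
    """
    scales = []
    for expert in weights:
        num_cols = len(expert[0])
        # Absolute maximum over the N dimension for each column k.
        col_amax = [max(abs(row[k]) for row in expert) for k in range(num_cols)]
        # eps (hypothetical guard) avoids division by zero for all-zero columns.
        scales.append([FP8_E4M3_MAX / max(a, eps) for a in col_amax])
    return scales


def scale_and_saturate(weights, scales):
    """Multiply by the scales and clamp into the representable range."""
    out = []
    for expert, expert_scales in zip(weights, scales):
        out.append(
            [
                [min(max(v * s, -FP8_E4M3_MAX), FP8_E4M3_MAX)
                 for v, s in zip(row, expert_scales)]
                for row in expert
            ]
        )
    return out


weights = [[[1.0, -2.0], [-4.0, 0.5]]]  # E=1, N=2, K=2
scales = dim1_scales(weights)
scaled = scale_and_saturate(weights, scales)
```

After scaling, the largest magnitude in each column maps to 448, so the subsequent float8 cast uses the full dynamic range of the format.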

Test plan

pytest test/prototype/moe_training/test_kernels.py

stack-info: PR: #2696, branch: danielvegamyhre/stack/30

pytorch-bot · 2025-08-05T23:11:54Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2696

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo · 2025-08-06T12:25:16Z

torchao/prototype/moe_training/kernels/float8_rowwise.py

+        tl.float32
+    )
+    if round_scales_to_power_of_2:
+        scales = tl.exp2(tl.floor(tl.log2(scales)))


This seems expensive. Can we just extract the bits instead?
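
One way to read the suggestion above: for a positive, normal float32, rounding down to a power of two is the same as clearing the 23 mantissa bits and keeping the sign and exponent bits. The scalar sketch below illustrates that equivalence in plain Python. The function names are hypothetical, and this is not the code that landed in the PR; a Triton version would need a bitcast to an integer type, which is not shown here.

```python
import math
import struct


def floor_pow2_via_log(x: float) -> float:
    """Reference: the exp2(floor(log2(x))) formulation from the diff."""
    return 2.0 ** math.floor(math.log2(x))


def floor_pow2_via_bits(x: float) -> float:
    """Round a positive, normal float32 down to a power of two by masking.

    float32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
    Zeroing the mantissa leaves exactly 2**exponent.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep only the exponent field (the sign bit is 0 for positive inputs).
    bits &= 0x7F800000
    # Reinterpret the masked bits as a float32 again.
    (result,) = struct.unpack("<f", struct.pack("<I", bits))
    return result


for value in [1.0, 1.5, 3.0, 448.0, 0.3]:
    assert floor_pow2_via_bits(value) == floor_pow2_via_log(value)
```

The two formulations diverge on edge cases: the mask maps subnormal inputs to 0 and leaves infinity unchanged, while the log-based version raises on 0 in Python. Any replacement would need to decide how those inputs should be handled.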

vkuzo · 2025-08-06T12:25:59Z

torchao/prototype/moe_training/utils.py

+
+    # Apply scales to tensor and convert to float8.
+    tensor_scaled = input_hp_t.to(torch.float32) * scales
+    float8_tensor = to_fp8_saturated(tensor_scaled, target_dtype)


This name is confusing because it sounds like Float8TrainingTensor. Maybe name it float8_data?

[moe training] add fp8 rowwise kernels for expert weights

c789281

stack-info: PR: #2696, branch: danielvegamyhre/stack/30

meta-cla bot added the CLA Signed label Aug 5, 2025

danielvegamyhre force-pushed the danielvegamyhre/stack/30 branch from af159db to f6688be August 5, 2025 23:12

danielvegamyhre added the topic: not user facing label Aug 5, 2025

danielvegamyhre changed the base branch from danielvegamyhre/stack/29 to main August 5, 2025 23:44

danielvegamyhre force-pushed the danielvegamyhre/stack/30 branch from f6688be to a6f8cbb August 5, 2025 23:44

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/29 August 5, 2025 23:44

danielvegamyhre changed the base branch from danielvegamyhre/stack/29 to main August 5, 2025 23:56

danielvegamyhre force-pushed the danielvegamyhre/stack/30 branch from a6f8cbb to ef4e25c August 5, 2025 23:56

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/29 August 5, 2025 23:57

danielvegamyhre changed the base branch from danielvegamyhre/stack/29 to main August 6, 2025 00:00

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/29 August 6, 2025 00:01

danielvegamyhre changed the base branch from danielvegamyhre/stack/29 to main August 6, 2025 00:13

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/29 August 6, 2025 00:13

danielvegamyhre changed the base branch from danielvegamyhre/stack/29 to main August 6, 2025 00:56

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/29 August 6, 2025 00:56

danielvegamyhre changed the base branch from danielvegamyhre/stack/29 to main August 6, 2025 01:19

danielvegamyhre force-pushed the danielvegamyhre/stack/30 branch from ef4e25c to 2ea3573 August 6, 2025 01:20

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/29 August 6, 2025 01:20

danielvegamyhre changed the base branch from danielvegamyhre/stack/29 to main August 6, 2025 01:36

danielvegamyhre force-pushed the danielvegamyhre/stack/30 branch from 2ea3573 to 6704fd3 August 6, 2025 01:36

danielvegamyhre changed the base branch from main to danielvegamyhre/stack/29 August 6, 2025 01:36

vkuzo reviewed Aug 6, 2025

vkuzo approved these changes Aug 6, 2025

danielvegamyhre force-pushed the danielvegamyhre/stack/29 branch from fad9062 to 241e9b7 August 6, 2025 17:24

danielvegamyhre force-pushed the danielvegamyhre/stack/30 branch from 6704fd3 to c789281 August 6, 2025 17:24
