[CPU] add Float8OpaqueTensor for dynamic float8 act float8 weight #3075
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3075. ✅ No failures as of commit d460134 with merge base 4013764. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
CC @mingfeima for review. Thanks.
Hi @mingfeima @jerryzh168 @andrewor14 Could you please review this PR? Thanks.
@common_utils.parametrize(
    "x_granularity",
    [PerTensor(), PerRow(), PerGroup(32), PerGroup(64), PerGroup(128)],
)
does torch.ao support per block quantization, e.g. deepseek style?
Thanks for the comment. The supported granularity varies among different quantization methods in Torchao. For float8 da8w8 on CPU, it does not support the block-wise quantization used in DeepSeek.
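As a rough illustration (a hypothetical helper, not torchao code), the granularities above imply differently shaped scale tensors for a 2-D tensor of shape [N, K]; DeepSeek-style block-wise quantization would instead need a 2-D grid of scales over both N and K:

```python
# Hypothetical sketch: the scale shapes implied by each granularity for a
# 2-D tensor of shape [N, K]. This is NOT torchao's implementation.
def scale_shape(n, k, granularity, group_size=None):
    if granularity == "per_tensor":
        return (1,)                  # one scale for the whole tensor
    if granularity == "per_row":
        return (n, 1)                # one scale per row (output channel)
    if granularity == "per_group":
        assert group_size is not None and k % group_size == 0
        return (n, k // group_size)  # one scale per group of K elements
    raise ValueError(f"unsupported granularity: {granularity}")

# DeepSeek-style block-wise would instead be (n // block_n, k // block_k),
# i.e. the case not supported by float8 da8w8 on CPU.
```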
class Float8OpaqueTensor(TorchAOBaseTensor):
    """
    Float8 dynamic activation float8 weight on CPU. The weight tensor is reordered to a blocked layout
should it be Float8 dynamic quantized float8 weight on CPU
The expression here parses as "Float8 dynamic activation" + "float8 weight" on CPU, meaning the activation is dynamically quantized to float8 and the weight is statically quantized to float8. This is aligned with the naming used elsewhere in torchao.
    [block_k, block_n] may be further reordered to VNNI layout depending on supported CPU ISA.

    Tensor Attributes:
        qdata: Reordered float8 weight on CPU with shape = [N/block_n, K/block_k, block_k, block_n].
are we computing with float32 here, as the weight is not packed in vnni2 format.
We are computing with bf16 or fp8, depending on ISA. The exposed shape does not have the VNNI dimension but the memory layout is VNNI-2 or VNNI-4 if ISA is supported.
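A small, purely illustrative sketch (not torchao's actual packing code) of the two reorderings discussed above: the [N, K] → [N/block_n, K/block_k, block_k, block_n] blocking from the docstring, and VNNI-2 packing of one [block_k, block_n] tile, where the two K-elements a dot-product instruction consumes together are stored contiguously. Block sizes below are illustrative.

```python
# Illustrative sketch only; not torchao's implementation.

def to_blocked(w, block_n, block_k):
    """Reorder an [N, K] weight (list of lists) to
    [N/block_n, K/block_k, block_k, block_n], as in the docstring."""
    n, k = len(w), len(w[0])
    assert n % block_n == 0 and k % block_k == 0
    return [
        [
            [
                [w[bi * block_n + nn][bj * block_k + kk] for nn in range(block_n)]
                for kk in range(block_k)
            ]
            for bj in range(k // block_k)
        ]
        for bi in range(n // block_n)
    ]

def vnni2_pack(tile):
    """Interleave pairs of rows along K in one [block_k, block_n] tile
    (VNNI-2; VNNI-4 would interleave four rows)."""
    bk, bn = len(tile), len(tile[0])
    assert bk % 2 == 0
    return [
        [tile[k2 + kk][nn] for nn in range(bn) for kk in range(2)]
        for k2 in range(0, bk, 2)
    ]

w = [[8 * i + j for j in range(8)] for i in range(8)]
blocked = to_blocked(w, block_n=4, block_k=4)
# w[7][2] lands in block (row 1, col 0) at in-block position [kk=2, nn=3].
assert blocked[1][0][2][3] == w[7][2]

tile = [[0, 1], [2, 3], [4, 5], [6, 7]]
assert vnni2_pack(tile) == [[0, 2, 1, 3], [4, 6, 5, 7]]
```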
Hi @mingfeima @jerryzh168 @andrewor14, though this PR depends on #3100, could you please review it? Thanks.
Summary
We split the original big PR #2505 into the following smaller ones:
Test plan