Conversation

@Xia-Weiwen Xia-Weiwen commented Sep 26, 2025

Summary
This PR is one of several smaller PRs split out from the original big PR #2505.

Test plan

pytest -sv test/quantization/quantize_/workflows/float8/test_float8_opaque_tensor.py


pytorch-bot bot commented Sep 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3075

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit d460134 with merge base 4013764:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 26, 2025
@Xia-Weiwen Xia-Weiwen added the topic: new feature Use this tag if this PR adds a new feature label Sep 26, 2025
@Xia-Weiwen
Collaborator Author

CC @mingfeima for review. Thanks.

@Xia-Weiwen
Collaborator Author

Hi @mingfeima @jerryzh168 @andrewor14, could you please review this PR? Thanks.

Comment on lines +66 to +69
@common_utils.parametrize(
    "x_granularity",
    [PerTensor(), PerRow(), PerGroup(32), PerGroup(64), PerGroup(128)],
)


Does torch.ao support per-block quantization, e.g. DeepSeek-style?

Collaborator Author

Thanks for the comment. The supported granularities vary among quantization methods in Torchao. For float8 da8w8 on CPU, the block-wise quantization used in DeepSeek is not supported.
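For context, here is a minimal usage sketch of how these granularities would be exercised end to end. It assumes torchao's public quantize_ API and Float8DynamicActivationFloat8WeightConfig; whether each granularity is actually accepted on the CPU path is exactly what this PR's tests verify.

import torch
from torchao.quantization import quantize_, Float8DynamicActivationFloat8WeightConfig
from torchao.quantization.granularity import PerGroup

# bfloat16 module, as is typical for the CPU float8 path (assumption).
m = torch.nn.Sequential(torch.nn.Linear(256, 512, dtype=torch.bfloat16)).eval()
# Dynamically quantize activations and statically quantize weights to float8,
# here with per-group scales of group size 32 (support depends on this PR).
quantize_(m, Float8DynamicActivationFloat8WeightConfig(granularity=PerGroup(32)))
y = m(torch.randn(8, 256, dtype=torch.bfloat16))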


class Float8OpaqueTensor(TorchAOBaseTensor):
    """
    Float8 dynamic activation float8 weight on CPU. The weight tensor is reordered to a blocked layout


Should it be "Float8 dynamic quantized float8 weight on CPU"?

Collaborator Author

The expression here should be parsed as "Float8 dynamic activation" + "float8 weight" on CPU: the activation is dynamically quantized to float8, and the weight is statically quantized to float8. This is consistent with other parts of Torchao.
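To make the distinction concrete, a toy sketch in plain PyTorch (not torchao's implementation): the weight scale is computed once at quantization time, while the activation scale is recomputed from each incoming batch.

import torch

def fp8_quant(t, scale):
    # 448.0 is the maximum representable value of float8_e4m3fn.
    return (t / scale).clamp(-448.0, 448.0).to(torch.float8_e4m3fn)

w = torch.randn(64, 128)
w_scale = w.abs().amax() / 448.0                       # static: computed once
wq = fp8_quant(w, w_scale)

x = torch.randn(8, 128)
x_scale = x.abs().amax(dim=-1, keepdim=True) / 448.0   # dynamic: per batch, per row
xq = fp8_quant(x, x_scale)

# Dequantize-then-matmul reference; real kernels fuse this.
y = (xq.to(torch.float32) * x_scale) @ (wq.to(torch.float32) * w_scale).T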

    [block_k, block_n] may be further reordered to VNNI layout depending on supported CPU ISA.

    Tensor Attributes:
        qdata: Reordered float8 weight on CPU with shape = [N/block_n, K/block_k, block_k, block_n].


Are we computing with float32 here, as the weight is not packed in VNNI-2 format?

Collaborator Author

We compute with bf16 or fp8, depending on the ISA. The exposed shape does not have a VNNI dimension, but the memory layout is VNNI-2 or VNNI-4 when the ISA supports it.
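An illustrative sketch of the blocked reorder from the docstring plus a VNNI-2 interleave (hypothetical shapes; the PR's actual packing is done in C++ and may differ in detail):

import torch

N, K, block_n, block_k = 128, 256, 32, 64
w = torch.randn(N, K)

# [N, K] -> [N/block_n, K/block_k, block_k, block_n]
blocked = (
    w.reshape(N // block_n, block_n, K // block_k, block_k)
    .permute(0, 2, 3, 1)
    .contiguous()
)
assert blocked.shape == (N // block_n, K // block_k, block_k, block_n)

# VNNI-2: interleave pairs along K so that the two K-elements feeding the same
# output column are adjacent in memory (for bf16 dot-product instructions).
vnni2 = (
    blocked.reshape(N // block_n, K // block_k, block_k // 2, 2, block_n)
    .transpose(-1, -2)
    .contiguous()
)
assert vnni2.shape == (N // block_n, K // block_k, block_k // 2, block_n, 2)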

@Xia-Weiwen Xia-Weiwen marked this pull request as draft September 30, 2025 01:28
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review September 30, 2025 01:35
@Xia-Weiwen
Collaborator Author

Hi @mingfeima @jerryzh168 @andrewor14, this PR depends on #3100, but could you please review it in the meantime? Thanks.
