[bc-breaking] Generalize FakeQuantizeConfig beyond intx #2628

Merged: 4 commits into main on Aug 1, 2025

Conversation

andrewor14
Contributor

@andrewor14 andrewor14 commented Jul 29, 2025

Stack from ghstack (oldest at bottom):


**Summary:** The existing `FakeQuantizeConfig` performs only
intx quantization, but we plan to extend QAT to other dtypes
such as fp8 and nvfp4 in the near future. This is the necessary
refactor before that. Specifically:

```
# New abstract class
FakeQuantizeConfigBase
# Rename
FakeQuantizeConfig -> IntxFakeQuantizeConfig
```

In the future, we will have other types of `FakeQuantizeConfigBase`
for float dtypes that users can pass in instead of the existing
Intx one.

**BC-breaking notes:** For BC, we keep around the old names to
reference the new ones. However, this commit is still BC-breaking
in the sense that a few APIs now accept the abstract
`FakeQuantizeConfigBase` instead. For the most part, this abstract
class will be hidden from the user.

Before:
```
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
```

After:
```
activation_config = IntxFakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = IntxFakeQuantizeConfig(torch.int4, group_size=32)
```

**Test Plan:**
python test/quantization/test_qat.py
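
To illustrate the BC approach described above, keeping the old name as a reference to the new class can be as simple as a module-level alias. This is a hedged sketch: the class bodies and constructor parameters below are placeholders, not the real torchao signatures.

```python
class FakeQuantizeConfigBase:
    """Abstract base class for fake-quantize configs (name from this PR)."""


class IntxFakeQuantizeConfig(FakeQuantizeConfigBase):
    """Intx-specific config; the fields here are illustrative only."""

    def __init__(self, dtype, granularity=None, is_symmetric=True, group_size=None):
        self.dtype = dtype
        self.granularity = granularity
        self.is_symmetric = is_symmetric
        self.group_size = group_size


# BC shim: the old public name resolves to the new class, so existing
# FakeQuantizeConfig(...) call sites keep working unchanged.
FakeQuantizeConfig = IntxFakeQuantizeConfig

old_style = FakeQuantizeConfig("int8", "per_token", is_symmetric=False)
assert isinstance(old_style, IntxFakeQuantizeConfig)
assert isinstance(old_style, FakeQuantizeConfigBase)
```

Since both names point at the same class, APIs typed against the abstract `FakeQuantizeConfigBase` accept configs built with either spelling.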


pytorch-bot bot commented Jul 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2628

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 8245cee with merge base 2f8fd69:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 29, 2025
@andrewor14 andrewor14 added the topic: bc-breaking Use this tag if this PR breaks backward compatibility label Jul 29, 2025
@andrewor14 andrewor14 requested review from jerryzh168 and drisspg July 29, 2025 20:03
@drisspg
Copy link
Contributor

drisspg commented Jul 30, 2025

Just to confirm: we are renaming `FakeQuantizeConfig` to `IntxFakeQuantizeConfig`, but we are also keeping the `FakeQuantizeConfig` name around as a reference to `IntxFakeQuantizeConfig`? And then in two releases we will remove it?

@andrewor14
Copy link
Contributor Author

Just to confirm: we are renaming `FakeQuantizeConfig` to `IntxFakeQuantizeConfig`, but we are also keeping the `FakeQuantizeConfig` name around as a reference to `IntxFakeQuantizeConfig`? And then in two releases we will remove it?

Yes, keeping `FakeQuantizeConfig` around is just for BC. I think many users are using it today (it's part of our recommended API). We will deprecate it in a future PR.
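
A common way to implement the future deprecation mentioned here is a thin subclass that warns on construction. This is a hypothetical sketch, not torchao's actual shim; the constructor parameters are placeholders.

```python
import warnings


class IntxFakeQuantizeConfig:
    """Stand-in for the renamed class; the real fields live in torchao."""

    def __init__(self, dtype, granularity=None):
        self.dtype = dtype
        self.granularity = granularity


class FakeQuantizeConfig(IntxFakeQuantizeConfig):
    """Deprecated old name, kept for BC; warns whenever it is constructed."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "FakeQuantizeConfig is deprecated, use IntxFakeQuantizeConfig instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Old call sites still work, but now surface a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cfg = FakeQuantizeConfig("int8", "per_token")

assert isinstance(cfg, IntxFakeQuantizeConfig)
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```

The subclass (rather than a plain alias) lets the old spelling keep working while giving users a concrete signal before removal.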

@dataclass
Contributor


I don't think you need this `@dataclass` decorator here.


def __init__(
self,
dtype: Union[torch.dtype, TorchAODType],
Contributor


I wonder if you can type this as Literal[...] so that it only allows for int inputs

Contributor Author


We actually allow a lot of dtypes, like all of int2-int8 and uint2-uint8, so I think that would be too verbose.

Contributor


Just define an allowed-types alias above and use it.

Contributor Author


Hmm, I just tried it but didn't really like it. I think I prefer a simpler signature, like just `torch.dtype` (we can drop `TorchAODType` soon, it's only needed for PyTorch 2.5 and before), and do the validation in `__init__`.

self.eps = eps

# Validate dtype
all_dtypes = [torch.int8, torch.uint8]
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to this one.

@andrewor14 andrewor14 changed the base branch from gh/andrewor14/13/base to main August 1, 2025 15:07
@andrewor14 andrewor14 merged commit 97b090d into main Aug 1, 2025
36 of 40 checks passed
andrewor14 added a commit that referenced this pull request Aug 7, 2025
**Summary:** Similar to #2628,
but for `FakeQuantizer`. It is cleaner to isolate the logic of
each quantizer in separate classes, e.g. intx vs nvfp4 vs fp8.
Naming change:

```
FakeQuantizer -> IntxFakeQuantizer
```

**BC-breaking notes:** This is technically not BC-breaking yet
since we are just deprecating the old APIs while keeping them
around. It will be when we do remove the old APIs in the future
according to #2630.

Before:
```
config = IntxFakeQuantizeConfig(torch.int8, "per_channel")
FakeQuantizer(config)
```

After:
```
config = IntxFakeQuantizeConfig(torch.int8, "per_channel")
IntxFakeQuantizer(config) # or
FakeQuantizerBase.from_config(config)
```

**Test Plan:**
```
python test/quantization/test_qat.py
```

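
The `FakeQuantizerBase.from_config(config)` pattern mentioned in this commit can be sketched as a classmethod that picks the concrete quantizer from the config type. The names follow the commit message, but the bodies are placeholders, not torchao's real implementation.

```python
class FakeQuantizeConfigBase:
    """Abstract base for fake-quantize configs (sketch)."""


class IntxFakeQuantizeConfig(FakeQuantizeConfigBase):
    def __init__(self, dtype, granularity=None):
        self.dtype = dtype
        self.granularity = granularity


class FakeQuantizerBase:
    """Abstract quantizer; dispatches to a concrete class per config type."""

    @classmethod
    def from_config(cls, config):
        if isinstance(config, IntxFakeQuantizeConfig):
            return IntxFakeQuantizer(config)
        # Future config types (e.g. fp8, nvfp4) would add branches here.
        raise ValueError(f"No fake quantizer for config {type(config).__name__}")


class IntxFakeQuantizer(FakeQuantizerBase):
    def __init__(self, config):
        self.config = config


config = IntxFakeQuantizeConfig("int8", "per_channel")
quantizer = FakeQuantizerBase.from_config(config)
assert isinstance(quantizer, IntxFakeQuantizer)
```

This keeps each dtype's quantization logic in its own class, as the commit describes, while callers only need the base-class factory.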
andrewor14 added a commit that referenced this pull request Aug 7, 2025
ghstack-source-id: 3867fab
Pull Request resolved: #2714