Build SmoothQuant release pipeline #3010

namgyu-youn · 2025-09-16T09:01:40Z

Summary:
Adds SMOOTHQUANT-W8A8 quantization method to the TorchAO model release pipeline.

- Adjusted defaults: increased calibration samples from 10 to 128 to ensure consistency
- Updated the old HF CLI command: `huggingface-cli login` to `hf auth login`

Test plan:

```bash
python quantize_and_upload.py --model_id Qwen/Qwen3-8B --quant SMOOTHQUANT-W8A8 --push_to_hub --task mmlu_pro --populate_model_card_template
```

Adds SMOOTHQUANT-W8A8 quantization method to the TorchAO model release pipeline.

- Adjusted defaults: Increased calibration samples from 10 to 128 to ensure consistency, reduced max sequence length (SeqLen) from 2048 to 1024
- Updated HF CLI command: `huggingface-cli login` to `hf auth login`

Test plan:

```bash
python quantize_and_upload.py --model_id Qwen/Qwen3-8B --quant SMOOTHQUANT-W8A8 --push_to_hub --task bbh
```

pytorch-bot · 2025-09-16T09:01:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3010

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

namgyu-youn · 2025-09-16T09:02:44Z

Checkpoint:
https://huggingface.co/namgyu-youn/Qwen3-8B-SMOOTHQUANT-W8A8

jerryzh168 · 2025-09-18T00:59:57Z

.github/scripts/torchao_model_releases/README.md


-### AWQ-INT4
-[AWQ](https://arxiv.org/abs/2306.00978) is a technique to improve accuracy for weight-only quantization. It preserves "salient" weight channels that have a high impact on the accuracy of the output by multiplying each such weight channel by a scale and applying the inverse scale to the corresponding activation. Since the activation is not quantized, this adds no quantization loss on the activation side, while the quantization loss from the weight is reduced.
+### SMOOTHQUANT-W8A8 & AWQ-INT4


can you add a separate section for smoothquant?

Yes, separating them and linking them seems better.
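The AWQ description in the diff relies on an exact identity: scaling input channel j of a linear layer's weight by s_j and dividing the matching activation channel by s_j leaves the output unchanged, so the trick costs nothing in floating point and only changes how quantization error is distributed. SmoothQuant uses the same equivalent transformation, though unlike weight-only AWQ it then quantizes the activations too. Below is a minimal plain-Python sketch of the identity; the tensors and the scale values are made up for illustration, and real implementations typically fold the 1/s factor into a preceding op instead of applying it at runtime.

```python
# Minimal sketch (illustrative data, not torchao code) of the per-channel
# scale equivalence: (x / s) @ (w * s).T == x @ w.T along the input-channel dim.

def matmul_xwt(x, w):
    """y[t][o] = sum_j x[t][j] * w[o][j]; w uses the nn.Linear layout [out, in]."""
    return [[sum(xj * wj for xj, wj in zip(row, w_row)) for w_row in w] for row in x]

def apply_channel_scales(x, w, s):
    """Return (x / s, w * s), scaling along the shared input-channel dimension."""
    x_scaled = [[xj / sj for xj, sj in zip(row, s)] for row in x]
    w_scaled = [[wj * sj for wj, sj in zip(w_row, s)] for w_row in w]
    return x_scaled, w_scaled

x = [[1.0, 20.0, 0.5], [2.0, -15.0, 1.0]]    # 2 tokens, 3 input channels
w = [[0.1, 0.02, -0.3], [0.05, -0.01, 0.2]]  # 2 output channels
s = [1.0, 4.0, 1.0]  # hypothetical scales: enlarge weight channel 1 before quantizing

y_ref = matmul_xwt(x, w)
x_s, w_s = apply_channel_scales(x, w, s)
y_scaled = matmul_xwt(x_s, w_s)  # same result as y_ref up to float rounding
```

The scales only move magnitude between the two operands; which operand ends up quantized (weights only for AWQ, both for SmoothQuant) decides where that helps.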

jerryzh168 · 2025-09-22T17:45:53Z

.github/scripts/torchao_model_releases/quantize_and_upload.py

                "model.embed_tokens": _int8_int4_embedding_config,
            }
        ),
+        "SMOOTHQUANT-W8A8": Int8DynamicActivationInt8WeightConfig(),


to standardize on naming, this should be: SmoothQuant-INT8-INT8 I think

Thanks for correcting it; I missed the standard name in this script.
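The diff maps the new quant name to torchao's `Int8DynamicActivationInt8WeightConfig()`, i.e. a W8A8 scheme where weights are quantized ahead of time and activations are quantized on the fly. As a rough illustration of that arithmetic, here is a plain-Python sketch assuming symmetric quantization with per-output-channel weight scales and per-token activation scales; those granularities are assumptions for the sketch, and this is not torchao's kernel.

```python
# Hypothetical sketch of INT8 dynamic-activation / INT8-weight (W8A8) numerics.
# Assumptions: symmetric int8, one scale per weight output channel,
# one scale per activation token computed at runtime ("dynamic").

def quantize_symmetric_int8(values):
    """Quantize a 1-D list to int8 codes with one symmetric scale; returns (codes, scale)."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def w8a8_linear(x, w):
    """Approximate y = x @ w.T using int8 x (per token) and int8 w (per output channel)."""
    w_q = [quantize_symmetric_int8(row) for row in w]  # done once, offline
    y = []
    for token in x:
        x_codes, x_scale = quantize_symmetric_int8(token)  # done every forward pass
        out_row = []
        for w_codes, w_scale in w_q:
            acc = sum(a * b for a, b in zip(x_codes, w_codes))  # integer accumulation
            out_row.append(acc * x_scale * w_scale)             # rescale to float
        y.append(out_row)
    return y

def linear_ref(x, w):
    """Float reference: y[t][o] = sum_j x[t][j] * w[o][j]."""
    return [[sum(a * b for a, b in zip(token, row)) for row in w] for token in x]

x = [[0.5, -1.2, 3.0, 0.1], [2.0, 0.3, -0.7, 1.5]]
w = [[0.2, -0.1, 0.05, 0.4], [-0.3, 0.25, 0.1, -0.05]]
y_ref = linear_ref(x, w)
y_q = w8a8_linear(x, w)
```

Because each token gets a single activation scale, one large outlier channel inflates that scale for every channel in the token; that is the error SmoothQuant's smoothing step is meant to reduce before this config is applied.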

jerryzh168 · 2025-09-22T17:46:37Z

.github/scripts/torchao_model_releases/README.md


 Note: for initial release, please include `--populate_model_card_template` to populate model card template.

+### SMOOTHQUANT-W8A8


can you add the command to generate smoothquant checkpoints as well? similar to AWQ-INT4

jerryzh168 · 2025-09-24T20:22:03Z

.github/scripts/torchao_model_releases/README.md

 # with some calibration_limit (number of samples)
 python quantize_and_upload.py --model_id Qwen/Qwen3-8B --quant AWQ-INT4 --push_to_hub --task bbh --calibration_limit 2
+
+# release SMOOTHQUANT-INT8-INT8 model, calibrated with a specific task


this should be added in SMOOTHQUANT-INT8-INT8 section I think

jerryzh168 · 2025-09-24T20:22:44Z

.github/scripts/torchao_model_releases/quantize_and_upload.py

        quantized_model = model
        quant_config = AWQConfig(base_config, step="prepare_for_loading")
        quantized_model.config.quantization_config = TorchAoConfig(quant_config)
+    elif quant == "SMOOTHQUANT-INT8-INT8":


nit: can you change to SmoothQuant-INT8-INT8? I feel that's slightly easier to read

But how about keeping uppercase for consistency with `quant_to_quant_code`? Uppercase seems like the right pattern, I think.

They're abbreviations, which is why they're uppercase. Maybe you could use SQ-INT8-INT8 then? But SmoothQuant would be clearer.

Okay, then SmoothQuant-INT8-INT8 looks best.
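The SmoothQuant-INT8-INT8 branch discussed here is the calibrated path (the README example calibrates it with a specific task). Conceptually, SmoothQuant calibration records per-input-channel activation maxima and turns them into smoothing factors s_j = max|X_j|^α / max|W_j|^(1-α), as in the SmoothQuant paper, so outlier-heavy activation channels are divided down and the matching weight columns scaled up before both are quantized to INT8. The plain-Python sketch below illustrates that computation; the observer class, function names, data, and α = 0.5 are illustrative assumptions, not torchao's API.

```python
# Hypothetical sketch of SmoothQuant-style calibration (not torchao's implementation).
# Smoothing factor per input channel j: s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)

class ActAbsMaxObserver:
    """Track the running per-input-channel max |activation| over calibration batches."""

    def __init__(self, num_channels):
        self.amax = [0.0] * num_channels

    def observe(self, batch):
        for token in batch:
            for j, v in enumerate(token):
                self.amax[j] = max(self.amax[j], abs(v))

def smoothing_scales(act_amax, w, alpha=0.5, eps=1e-5):
    """Per-input-channel smoothing factors; w has layout [out_features, in_features]."""
    w_amax = [max(abs(row[j]) for row in w) for j in range(len(act_amax))]
    return [max(a, eps) ** alpha / max(b, eps) ** (1 - alpha)
            for a, b in zip(act_amax, w_amax)]

calibration_batches = [
    [[0.2, 30.0, -0.5], [0.1, -25.0, 0.4]],  # channel 1 carries an activation outlier
    [[-0.3, 28.0, 0.6]],
]
w = [[0.5, 0.01, -0.2], [-0.4, 0.02, 0.3]]

obs = ActAbsMaxObserver(num_channels=3)
for batch in calibration_batches:
    obs.observe(batch)

s = smoothing_scales(obs.amax, w, alpha=0.5)
# Activation channel maxima after dividing by s (weights would be multiplied by s).
smoothed_act_amax = [a / sj for a, sj in zip(obs.amax, s)]
```

With α = 0.5 each channel's smoothed activation maximum and scaled weight maximum both become sqrt(max|X_j| · max|W_j|), so in this toy data the 100x spread between activation channels shrinks to 2x, which is far friendlier to a single per-token INT8 scale.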

jerryzh168 · 2025-10-08T01:37:46Z

.github/scripts/torchao_model_releases/README.md


 Note: for initial release, please include `--populate_model_card_template` to populate model card template.

+### SMOOTHQUANT-INT8-INT8


nit: can you update this to SmoothQuant-INT8-INT8 as well

jerryzh168 · 2025-10-08T01:37:52Z

.github/scripts/torchao_model_releases/README.md

+
+Examples:
+```
+# release SMOOTHQUANT-INT8-INT8 model, calibrated with a specific task


jerryzh168

looks good, see some nit comments inline

jerryzh168 · 2025-10-08T18:43:51Z

.github/scripts/torchao_model_releases/quantize_and_upload.py

        type=int,
-        default=2048,
-        help="Maximum sequence length of examples to calibrate and evaluate model on. Default is 2048",
+        default=1024,


actually for this one, can you keep as is? I remember some models even need larger like 4096

* Summary: Adds SMOOTHQUANT-W8A8 quantization method to the TorchAO model release pipeline.
  - Adjusted defaults: Increased calibration samples from 10 to 128 to ensure consistency, reduced max sequence length (SeqLen) from 2048 to 1024
  - Updated HF CLI command: `huggingface-cli login` to `hf auth login`

  Test plan:
  ```bash
  python quantize_and_upload.py --model_id Qwen/Qwen3-8B --quant SMOOTHQUANT-W8A8 --push_to_hub --task bbh
  ```
* add SmoothQuant uploader
* separate docs for AWQ & SmoothQuant
* rename SMOOTHQUANT-W8A8 to SMOOTHQUANT-INT8-INT8
* add SmoothQuant release example
* update example in docs
* rename SMOOTHQUANT-INT8-INT8 to SmoothQuant-INT8-INT8
* rename SMOOTHQUANT to SmoothQuant
* revert max_seq_length default to 2048
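After the review rounds, the CLI surface touched by this PR settles on a quant name of `SmoothQuant-INT8-INT8`, a max sequence length default reverted to 2048, and (per the PR summary) 128 calibration samples. The sketch below is a hypothetical `argparse` reconstruction of those flags for reference only; it is not the actual `quantize_and_upload.py` source. The `--max_seq_length` spelling, the optional `--task`, and the assumption that the 128-sample default applies to `--calibration_limit` are all guesses.

```python
import argparse

# Hypothetical reconstruction of the flags discussed in this PR (not the real script).
# Assumed: --max_seq_length spelling, optional --task, calibration_limit default of 128.

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Quantize a model and optionally upload it")
    parser.add_argument("--model_id", type=str, required=True)
    parser.add_argument("--quant", type=str, required=True,
                        help='e.g. "AWQ-INT4" or "SmoothQuant-INT8-INT8"')
    parser.add_argument("--push_to_hub", action="store_true")
    parser.add_argument("--populate_model_card_template", action="store_true")
    parser.add_argument("--task", type=str, default=None,
                        help="calibration/eval task, e.g. bbh or mmlu_pro (assumed optional)")
    parser.add_argument("--calibration_limit", type=int, default=128,
                        help="number of calibration samples (assumed default)")
    parser.add_argument("--max_seq_length", type=int, default=2048,
                        help="Maximum sequence length of examples to calibrate and "
                             "evaluate model on. Default is 2048")
    return parser

# Parse the same arguments as the PR's test plan, using the final quant name.
args = build_parser().parse_args(
    ["--model_id", "Qwen/Qwen3-8B", "--quant", "SmoothQuant-INT8-INT8",
     "--push_to_hub", "--task", "bbh"]
)
```

Keeping 2048 as the default matches the reviewer's point that some models need even longer calibration sequences, such as 4096, which can still be passed explicitly.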

meta-cla bot added the CLA Signed label Sep 16, 2025

namgyu-youn marked this pull request as draft September 17, 2025 09:28

namgyu-youn changed the title ~~Add SmoothQuant model release pipeline~~ feat: SmoothQuant release pipeline Sep 17, 2025

add SmoothQuant uploader

0b367db

jerryzh168 reviewed Sep 18, 2025


separate docs for AWQ & SmoothQuant

ede0448

namgyu-youn marked this pull request as ready for review September 18, 2025 08:54

namgyu-youn requested a review from jerryzh168 September 18, 2025 08:54

namgyu-youn changed the title ~~feat: SmoothQuant release pipeline~~ Build SmoothQuant release pipeline Sep 22, 2025

jerryzh168 reviewed Sep 22, 2025


namgyu-youn added 2 commits September 23, 2025 02:54

rename SMOOTHQUANT-W8A8 to SMOOTHQUANT-INT8-INT8

d3cc18a

add SmoothQuant release example

bdea42c

namgyu-youn requested a review from jerryzh168 September 22, 2025 17:56

jerryzh168 reviewed Sep 24, 2025


update example in docs

6c62463

namgyu-youn requested a review from jerryzh168 September 24, 2025 20:34

namgyu-youn added 2 commits September 25, 2025 15:49

rename SMOOTHQUANT-INT8-INT8 to SmoothQuant-INT8-INT8

cc58f52

Merge branch 'main' into smoothquant-serve

76bde99

jerryzh168 reviewed Oct 8, 2025


jerryzh168 approved these changes Oct 8, 2025


rename SMOOTHQUANT to SmoothQuant

557633b

namgyu-youn requested a review from jerryzh168 October 8, 2025 01:43

jerryzh168 approved these changes Oct 8, 2025


jerryzh168 reviewed Oct 8, 2025


revert max_seq_length default to 2048

b566810

namgyu-youn requested a review from jerryzh168 October 8, 2025 18:49

jerryzh168 approved these changes Oct 8, 2025


jerryzh168 merged commit 239e57a into pytorch:main Oct 8, 2025
3 checks passed

namgyu-youn deleted the smoothquant-serve branch October 9, 2025 03:21


