
Conversation

Contributor

@namgyu-youn namgyu-youn commented Sep 16, 2025

Summary:
Adds the SMOOTHQUANT-W8A8 quantization method to the TorchAO model release pipeline.

  • Adjusted defaults: increased calibration samples from 10 to 128 to ensure consistency, and decreased max_sequence_length (SeqLen) from 2048 to 1024
  • Updated the old HF CLI command: `huggingface-cli login` to `hf auth login`

Test plan:

```bash
python quantize_and_upload.py --model_id Qwen/Qwen3-8B --quant SMOOTHQUANT-W8A8 --push_to_hub --task mmlu_pro --populate_model_card_template
```


pytorch-bot bot commented Sep 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3010

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 16, 2025
Contributor Author

namgyu-youn commented Sep 16, 2025

Checkpoint:
https://huggingface.co/namgyu-youn/Qwen3-8B-SMOOTHQUANT-W8A8

@namgyu-youn namgyu-youn marked this pull request as draft September 17, 2025 09:28
@namgyu-youn namgyu-youn changed the title Add SmoothQuant model release pipeline feat: SmoothQuant release pipeline Sep 17, 2025

### AWQ-INT4
[AWQ](https://arxiv.org/abs/2306.00978) is a technique for improving the accuracy of weight-only quantization. It preserves "salient" weight channels that have a high impact on the output by multiplying each such weight channel by a scale and dividing the corresponding activation by the same scale. Since activations are not quantized, this introduces no additional loss from the activation, while the quantization loss from the weight is reduced.
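The channel-scaling trick can be sketched with NumPy. This is a toy illustration, not the torchao implementation: the fixed scale `s[0] = 2.0` and the round-to-nearest fake quantizer are assumptions for demonstration (real AWQ searches for the scales and uses group-wise quantization).

```python
import numpy as np

def rtn_quantize(w, n_bits=4):
    """Round-to-nearest symmetric fake quantization with one shared step size."""
    qmax = 2 ** (n_bits - 1) - 1
    step = np.abs(w).max() / qmax
    return np.round(w / step) * step

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))   # weight: (out_features, in_features)
x = rng.normal(size=8)        # one activation vector
x[0] *= 10.0                  # input channel 0 is "salient": large activations

# Multiply the salient weight column by s and divide the activation by s:
# the float product is mathematically unchanged, only quantization error moves.
s = np.ones(8)
s[0] = 2.0                    # hypothetical scale; AWQ searches for this value
assert np.allclose((W * s) @ (x / s), W @ x)

err_plain = np.abs(W @ x - rtn_quantize(W) @ x).mean()
err_awq = np.abs(W @ x - rtn_quantize(W * s) @ (x / s)).mean()
print(f"mean |error| without scaling: {err_plain:.4f}, with scaling: {err_awq:.4f}")
```

The key invariant is the first assertion: scaling weights up and activations down leaves the full-precision output unchanged, so any accuracy difference comes purely from where the quantization error lands.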
### SMOOTHQUANT-W8A8 & AWQ-INT4
Contributor


can you add a separate section for smoothquant?

Contributor Author


Yes, separating them and linking them seems better.

@namgyu-youn namgyu-youn marked this pull request as ready for review September 18, 2025 08:54
@namgyu-youn namgyu-youn changed the title feat: SmoothQuant release pipeline Build SmoothQuant release pipeline Sep 22, 2025
"model.embed_tokens": _int8_int4_embedding_config,
}
),
"SMOOTHQUANT-W8A8": Int8DynamicActivationInt8WeightConfig(),
Contributor


to standardize on naming, this should be: SmoothQuant-INT8-INT8 I think

Contributor Author


Thanks for correcting it; I missed the standard name in this script.


Note: for the initial release, please include `--populate_model_card_template` to populate the model card template.

### SMOOTHQUANT-W8A8
Contributor


can you add the command to generate smoothquant checkpoints as well? similar to AWQ-INT4
