Update SmoothQuant to use subtensor instead of external wrapper #3012
base: main
Conversation
Update SmoothQuant to use the SupportsActivationPreScaling protocol instead of a wrapper. Similar to AWQ (pytorch#2753), SmoothQuant now implements the protocol directly instead of going through the `to_weight_tensor_with_linear_activation_scale_metadata` wrapper.

Key changes:
- Add an `act_pre_scale` attribute to `LinearActivationQuantizedTensor`
- Apply pre-scaling in all dispatch methods
- Remove the external wrapper dependency

Test Plan:
```bash
python torchao/prototype/smoothquant/example.py
python test/prototype/test_smoothquant.py
python test/integration/test_integration.py
```
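For context, a minimal sketch of what "apply pre-scaling in dispatch" could look like; the `act_pre_scale` attribute comes from the PR description, while the function name and the `input_quant_func` / `original_weight_tensor` attributes are illustrative assumptions rather than the exact torchao internals:

```python
import torch
import torch.nn.functional as F

def _linear_with_pre_scale(input_tensor, weight_tensor, bias=None):
    # Sketch of a dispatch path for a LinearActivationQuantizedTensor-like weight.
    # If the weight carries a SmoothQuant activation pre-scale, fold it into the
    # activation before activation quantization and the linear op.
    if getattr(weight_tensor, "act_pre_scale", None) is not None:
        input_tensor = input_tensor * weight_tensor.act_pre_scale
    # Hypothetical continuation: quantize the activation with the tensor's stored
    # quantization function, then run linear against the inner weight tensor.
    quantized_input = weight_tensor.input_quant_func(input_tensor)
    return F.linear(quantized_input, weight_tensor.original_weight_tensor, bias)
```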
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3012
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Result of running the test plan:
Ran 358 tests in 477.933s
OK (skipped=205)
@namgyu-youn thanks, I think you probably want to migrate the Int8 tensor first, can you help with that? #2752. The relevant code is the plain_layout for int8_act_int8_weight in torchao/dtypes/uintx/plain_layout.py (lines 269 to 315 at 122b307).
I am definitely interested in it because affine transformation is still hard for me. I will look into it; thanks for your suggestion.
```python
# Apply pre-scaling if present
if (
    hasattr(weight_tensor, "act_pre_scale")
```
this should always be true?
oh actually this is not ready for review, we need to update int8 tensor instead, converting this to draft now
Summary
Similar to AWQ (#2753), SmoothQuant now uses a direct protocol implementation instead of an external wrapper (`to_weight_tensor_with_linear_activation_scale_metadata`). This PR was inspired by @jerryzh168 at #2728 (comment).

Key change:
- Add an `act_pre_scale` attribute to `LinearActivationQuantizedTensor` (see the usage sketch below)
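As a rough before/after sketch of the API difference (assuming, for illustration, that `act_pre_scale` can be set on an already-constructed `LinearActivationQuantizedTensor`; the PR may instead wire it through the constructor, and the helper name here is hypothetical):

```python
import torch

def attach_smoothing_scale(quantized_weight, smoothing_scale: torch.Tensor):
    # Before: the scale was attached by wrapping the quantized weight, e.g.
    #   to_weight_tensor_with_linear_activation_scale_metadata(quantized_weight, smoothing_scale)
    # After (sketch): the scale is stored directly on the tensor subclass and
    # is picked up by its dispatch methods at inference time.
    quantized_weight.act_pre_scale = smoothing_scale
    return quantized_weight
```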
Test Plan: run the commands listed in the PR description above.