Conversation

@realAsma (Contributor) commented Nov 20, 2025

What does this PR do?

Type of change: New Feature

Overview:

This PR extends AutoQuantize with KL Divergence Loss-based sensitivity measurement as an alternative to the existing gradient-based approach. The KL Divergence mode uses a binary searcher similar to the one in FastNAS (see the sketch below).
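
For intuition, here is a minimal sketch of a FastNAS-style binary search over a sensitivity threshold. This is illustrative only, not the ModelOpt implementation: layers whose KL sensitivity falls at or below the threshold are quantized, and the threshold is bisected until the compression target is just met.

```python
# Illustrative sketch only -- not the ModelOpt implementation. `sensitivity`
# maps layer name -> KL divergence score, `savings` maps layer name -> memory
# saved by quantizing that layer, `target_savings` is the compression goal.
def binary_search_threshold(sensitivity, savings, target_savings, max_iters=30):
    lo, hi = 0.0, max(sensitivity.values())
    for _ in range(max_iters):
        mid = (lo + hi) / 2
        # Quantize every layer whose KL sensitivity is at or below the threshold.
        quantized = [name for name, score in sensitivity.items() if score <= mid]
        achieved = sum(savings[name] for name in quantized)
        if achieved >= target_savings:
            hi = mid  # constraint met; try a smaller (less damaging) threshold
        else:
            lo = mid  # not enough compression; allow more sensitive layers in
    return hi, [name for name, score in sensitivity.items() if score <= hi]
```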

Gradient-based AutoQuantize is faster than the KL Divergence-based mode. However, KL Divergence does not require the model implementation to support a backward pass. In addition, the KL Divergence scores collected by AutoQuantize are useful for sensitivity analysis of the model, and KL Divergence is a more direct measure of sensitivity than gradient scores.
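
As a rough illustration of how a KL Divergence score can be collected with forward passes only (a sketch; `enable_quantizer`/`disable_quantizer` are assumed helpers, not the API added by this PR): quantize one layer at a time, run calibration data through the model, and compare the output distribution against the unquantized baseline.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def kl_sensitivity(model, layer_names, calib_batches, enable_quantizer, disable_quantizer):
    """Per-layer KL divergence sensitivity (illustrative sketch only)."""
    # Baseline logits with all quantizers disabled.
    for name in layer_names:
        disable_quantizer(model, name)
    baseline = [model(batch) for batch in calib_batches]

    scores = {}
    for name in layer_names:
        enable_quantizer(model, name)  # quantize only this layer
        kl = 0.0
        for ref_logits, batch in zip(baseline, calib_batches):
            q_logits = model(batch)
            # KL(p_ref || p_quant), averaged over the batch.
            kl += F.kl_div(
                F.log_softmax(q_logits, dim=-1),
                F.log_softmax(ref_logits, dim=-1),
                reduction="batchmean",
                log_target=True,
            ).item()
        scores[name] = kl / len(calib_batches)
        disable_quantizer(model, name)  # restore before scoring the next layer
    return scores
```

No backward pass is needed anywhere in this loop, which is what makes the mode usable on models that do not support gradient computation.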

Usage

see tests/unit/torch/quantization/test_autoquant.py
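For convenience, a hedged sketch of what the call might look like is below. The keyword that selects the KL Divergence scorer (`method="kl_div"`) is an assumption and may differ from the actual API; the unit test above is the authoritative reference.

```python
import modelopt.torch.quantization as mtq

# Hedged sketch -- argument names follow the existing mtq.auto_quantize API,
# but the `method="kl_div"` selector is an assumption introduced here for
# illustration only.
model, search_state = mtq.auto_quantize(
    model,
    constraints={"effective_bits": 4.8},
    quantization_formats=["NVFP4_DEFAULT_CFG", "FP8_DEFAULT_CFG"],
    data_loader=calib_loader,
    forward_step=lambda m, batch: m(**batch),
    method="kl_div",  # hypothetical flag: KL Divergence instead of gradient scores
)
```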

Testing

Tested with unit tests.

Result for Qwen3 8B (see attached image).

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@realAsma realAsma requested review from a team as code owners November 20, 2025 23:11
@realAsma realAsma requested review from Edwardf0t1 and ajrasane and removed request for a team November 20, 2025 23:11

copy-pr-bot bot commented Nov 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 197d4d6 to 9134ca9 on November 20, 2025 23:39
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch from 9ebd69f to b7bd107 on November 21, 2025 00:21
@realAsma realAsma requested a review from a team as a code owner November 21, 2025 00:21
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from dc15dae to 48b0423 on November 21, 2025 00:33
@realAsma realAsma force-pushed the asma/auto_quantize_improvements branch 3 times, most recently from 60a0f26 to 0275c61 on November 21, 2025 17:56
Signed-off-by: Asma Kuriparambil Thekkumpate <[email protected]>

minor

Signed-off-by: Asma Kuriparambil Thekkumpate <[email protected]>

cherry-picked final PR changes

changelog updates

Signed-off-by: realAsma <[email protected]>

minor

Signed-off-by: realAsma <[email protected]>
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 48b0423 to 73fc080 on November 21, 2025 21:22
@realAsma realAsma requested a review from meenchen November 21, 2025 21:44