Conversation

@realAsma
Contributor

What does this PR do?

Type of change: ?

Overview: ?

Usage

# Add a code snippet demonstrating how to use this
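The PR template's usage snippet was left unfilled, so the exact API is not shown here. As a purely hypothetical sketch of the save/restore workflow this PR series describes (the function names, the JSON format, and the `scores` layout below are all illustrative, not the modelopt API):

```python
# Hypothetical sketch only: persist AutoQuantize per-layer sensitivity scores
# so a later search run can skip the expensive re-scoring step.
# The real modelopt API may differ; all names here are illustrative.
import json
from pathlib import Path


def save_sensitivity_scores(scores: dict[str, float], path: str) -> None:
    """Write per-module sensitivity scores to a JSON file."""
    Path(path).write_text(json.dumps(scores, indent=2))


def restore_sensitivity_scores(path: str) -> dict[str, float]:
    """Reload previously computed scores; empty dict if none were saved."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


# Example: scores keyed by module name, as an AutoQuantize search might produce.
scores = {"model.layers.0.mlp": 0.12, "model.layers.0.self_attn": 0.45}
save_sensitivity_scores(scores, "autoquant_scores.json")
restored = restore_sensitivity_scores("autoquant_scores.json")
```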

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@realAsma realAsma requested review from a team as code owners November 20, 2025 18:54
@copy-pr-bot

copy-pr-bot bot commented Nov 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@codecov

codecov bot commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 88.97849% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.65%. Comparing base (a703e22) to head (c96d919).

Files with missing lines                     Patch %   Lines
modelopt/torch/quantization/algorithms.py    88.73%    41 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #588      +/-   ##
==========================================
+ Coverage   74.45%   74.65%   +0.19%     
==========================================
  Files         182      182              
  Lines       18250    18453     +203     
==========================================
+ Hits        13588    13776     +188     
- Misses       4662     4677      +15     

☔ View full report in Codecov by Sentry.

@realAsma realAsma changed the base branch from main to asma/auto_quantize_kd_loss_sensitivity November 20, 2025 23:19
@realAsma realAsma changed the title from "[3/N] Added autoquantize search state save/restore support" to "[3/N] Support for save/restoring AutoQuantize sensitivity scores" Nov 20, 2025
@realAsma realAsma requested review from Fridah-nv, ajrasane, cjluo-nv, kinjalpatel27 and mxinO and removed request for ChenhanYu and cjluo-nv November 20, 2025 23:20
@realAsma realAsma force-pushed the asma/auto_quantize_user_improvements branch from c96d919 to a489c5d Compare November 20, 2025 23:25
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 197d4d6 to 9134ca9 Compare November 20, 2025 23:39
@realAsma realAsma force-pushed the asma/auto_quantize_user_improvements branch from a489c5d to 35be9b2 Compare November 20, 2025 23:41
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from dc15dae to 48b0423 Compare November 21, 2025 00:33
@realAsma realAsma force-pushed the asma/auto_quantize_user_improvements branch from 35be9b2 to 63a034f Compare November 21, 2025 00:39
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 48b0423 to 73fc080 Compare November 21, 2025 21:22
@realAsma realAsma force-pushed the asma/auto_quantize_user_improvements branch from d169f2d to c0dd4cf Compare November 21, 2025 21:47
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 46670e1 to 09d8a29 Compare November 25, 2025 20:24
@realAsma realAsma requested a review from meenchen November 25, 2025 20:29
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 09d8a29 to 75f83da Compare November 25, 2025 22:00
@realAsma realAsma force-pushed the asma/auto_quantize_kd_loss_sensitivity branch from 75f83da to 1b52477 Compare November 25, 2025 23:18
Some improvements for KLDiv

Signed-off-by: realAsma <[email protected]>

changelog update

Signed-off-by: realAsma <[email protected]>

minor

Signed-off-by: realAsma <[email protected]>

doc updates

Signed-off-by: realAsma <[email protected]>
@realAsma realAsma force-pushed the asma/auto_quantize_user_improvements branch from c0dd4cf to 4853606 Compare November 25, 2025 23:30
@realAsma realAsma merged commit 6e3ad6f into asma/auto_quantize_kd_loss_sensitivity Nov 25, 2025
1 check passed
@realAsma realAsma deleted the asma/auto_quantize_user_improvements branch November 25, 2025 23:31
realAsma added a commit that referenced this pull request Nov 26, 2025
…AutoQuantizeGradientSearcher; separated quant modules and score modules (#586)

## What does this PR do?

**Type of change:** Refactor; minor new feature

**Overview:** ?

1. Refactored AutoQuantizeSearcher into _AutoQuantizeBaseSearcher and
AutoQuantizeGradientSearcher, preparing the architecture for additional
search methods.
2. Separated quant modules from score modules, enabling auto-quantization
to measure sensitivity at parent layers (e.g., the MLP output for MoE
experts) rather than at individual ops.
3. Also see #592 and #588.
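
The second point above can be illustrated with a small standalone PyTorch sketch (not the modelopt implementation): instead of scoring each child op in isolation, the sensitivity of a perturbed child (e.g., one MoE expert) is measured at the parent module's output. The function and the toy rounding perturbation below are assumptions for illustration only.

```python
# Illustrative sketch: score quantization sensitivity at a parent module's
# output rather than at the individual child op being quantized.
import torch
import torch.nn as nn


def parent_output_mse(parent: nn.Module, child: nn.Module, x: torch.Tensor) -> float:
    """MSE between the parent's clean output and its output when `child`
    (a sub-op of `parent`) has a fake-quantization perturbation applied."""
    with torch.no_grad():
        ref = parent(x)
        # A forward hook that returns a tensor replaces the child's output;
        # crude rounding stands in for a low-bit fake-quant here.
        hook = child.register_forward_hook(
            lambda m, inp, out: torch.round(out * 8) / 8
        )
        try:
            quant = parent(x)
        finally:
            hook.remove()
    return torch.mean((ref - quant) ** 2).item()


# Sensitivity of the first linear layer, measured at the MLP's output.
mlp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
score = parent_output_mse(mlp, mlp[0], torch.randn(4, 16))
```

Scoring at the parent output means the metric reflects how the perturbation actually propagates through the surrounding computation, which matters for routed sub-modules such as MoE experts.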

## Testing
See unit tests: `tests/unit/torch/quantization/test_autoquant.py` and
`tests/unit/torch/quantization/plugins/test_huggingface.py`.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Not Required

## Additional Information



## Summary by CodeRabbit

* **New Features**
  * Added support for score modules in quantization workflows.
  * Added optional naming for quantization recipes.

* **Bug Fixes**
  * Improved quantization grouping rules documentation with clearer configuration examples.

* **Refactor**
  * Renamed quantization module parameters for improved clarity.
  * Enhanced quantization search architecture for better scalability.


---------

Signed-off-by: realAsma <[email protected]>
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
inisis pushed a commit to inisis/TensorRT-Model-Optimizer that referenced this pull request Nov 26, 2025
jQizhang pushed a commit to jQizhang/TensorRT-Model-Optimizer that referenced this pull request Nov 26, 2025