NVIDIA / TensorRT-Model-Optimizer Public

Fork 195
Star 1.6k

Pull requests: NVIDIA/TensorRT-Model-Optimizer

45 Open 282 Closed

Pull requests list

Add L2NormHook and use it in megatron.py

#599 opened Nov 22, 2025 by danielkorzekwa

[OMNIML-2244] enable fp8 and int8 ONNX export

#594 opened Nov 21, 2025 by ajrasane

[2/N] Added KDLoss based AutoQuantize

#592 opened Nov 20, 2025 by realAsma

Convert compressed-tensor int4 format to GPTQ int4 format

#590 opened Nov 20, 2025 by Edwardf0t1

Yeyu/eagle embedding optional

#589 opened Nov 20, 2025 by yeyu-nvidia • Draft

[3/N] Support for save/restoring AutoQuantize sensitivity scores

#588 opened Nov 20, 2025 by realAsma

[1/N] Refactored AutoQuantizeSearcher to _AutoQuantizeBaseSearcher & AutoQuantizeGradientSearcher; separated quant modules and score modules

#586 opened Nov 20, 2025 by realAsma

Yeyu/remove embedding from eagle

#585 opened Nov 20, 2025 by yeyu-nvidia • Draft

Add sewing kit and utilities used for pruning scoring - pruning scoring is self-contained now

#584 opened Nov 20, 2025 by danielkorzekwa

Product Rename: TensorRT Model Optimizer to Model Optimizer

#583 opened Nov 20, 2025 by kevalmorabia97

1 of 2 tasks

support for newer checkpoints

#582 opened Nov 20, 2025 by binghanc • Draft

Added support to export for BF16 weight and amax for vLLM fakequant QAT

#579 opened Nov 19, 2025 by kinjalpatel27

Bump TRT-LLM docker to 1.2.0rc2 (CUDA 13)

#578 opened Nov 19, 2025 by kevalmorabia97

1 task

[OMNIML-2244] Implement the ONNX quantization exporter for INT4

#575 opened Nov 18, 2025 by ajrasane

Feat: SGL backend for online SD training

#564 opened Nov 14, 2025 by h-guo18

Fix hf_quant_config with kv cache type

#557 opened Nov 14, 2025 by jenchen13

GPTQ Lite implementation

#555 opened Nov 13, 2025 by sugunav14

1 of 2 tasks

Specdec Bench: vLLM reqid, SGL path, conc > 1 metric fix

#541 opened Nov 12, 2025 by IzzyPutterman

[OMNIML-2850] [3/n] Adds sparse attention calibration

#538 opened Nov 11, 2025 by kaix-nv

Optimize NVFP4 Triton kernel

#533 opened Nov 11, 2025 by mxinO

[OMNIML-2852] [2/n] Add Core Sparse Attention Infrastructure

#527 opened Nov 7, 2025 by kaix-nv

parallel eagle draft

#523 opened Nov 6, 2025 by yeyu-nvidia • Draft

[Bug #193] fix fp8 blockwise real quantization

#522 opened Nov 6, 2025 by meenchen

Support AWQ fake quant for vLLM MoE models

#521 opened Nov 6, 2025 by meenchen • Draft

[Draft] [5526696] Add kv cache quantization support for onnx quantization

#486 opened Oct 31, 2025 by zhanghaoc
