Commit 7e014ca

fix bug and add nvfp in alg-ext with slight improvement (#794)
1 parent d7d2efa commit 7e014ca

4 files changed: +26 -3 lines changed

README.md

Lines changed: 3 additions & 0 deletions
@@ -28,6 +28,9 @@ and [fbaldassarri](https://huggingface.co/fbaldassarri). Please check out [User
 
 ## 🆕 What's New
 
+[2025/09] AutoRound now includes experimental support for the mxfp4 and nvfp4 dtypes. For accuracy results, see the [documentation](./docs/mxnv_acc.md).
+We currently recommend exporting to the LLM-Compressor format.
+
 [2025/08] AutoRound now provides experimental support for an improved INT2 algorithm via `--enable_alg_ext`. See this [documentation](./docs/alg_202508.md)
 for some accuracy results.
 

20.1 KB
Binary file not shown.

auto_round/autoround.py

Lines changed: 8 additions & 3 deletions
@@ -2885,16 +2885,21 @@ def _quantize_blocks(
             self.sym
             and self.enable_alg_ext
             and self.super_group_size is None
-            and ((self.data_type.startswith("int") and self.act_bits >= 8) or self.data_type.startswith("mx"))
+            and (
+                (self.data_type.startswith("int") and self.act_bits >= 8)
+                or self.data_type.startswith("mx")
+                or self.data_type.startswith("nv")
+            )
         ):
             try:
                 from auto_round.alg_ext import quantize_block_ext
 
                 AutoRound.quantize_block_ext = quantize_block_ext
                 quantize_block = self.quantize_block_ext  # must use self.quantize_block_ext
-                if self.bits > 2 and not self.data_type.startswith("mx"):
+                if self.bits > 2 and not (self.data_type.startswith("mx") or self.data_type.startswith("nv")):
                     logger.warning(
-                        "algorithm extension has only undergone limited validation on INT2 and mxfp4; use with caution."
+                        "algorithm extension has only undergone limited validation on "
+                        "INT2, mxfp4 and nvfp4; use with caution."
                     )
                 else:
                     logger.info("using algorithm extension for quantization.")
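The gating logic in this hunk can be exercised in isolation. The following is a minimal standalone sketch, not AutoRound's API: the function names are hypothetical, and the dtype strings `"mx_fp"` and `"nv_fp"` are assumed only for illustration. It mirrors the condition that selects the algorithm extension and shows why the warning check should negate the two prefix tests jointly: a string cannot start with both `"mx"` and `"nv"`, so `not A or not B` holds for every dtype.

```python
def uses_alg_ext(sym, enable_alg_ext, super_group_size, data_type, act_bits):
    """Standalone mirror of the gate in the hunk (hypothetical helper)."""
    return bool(
        sym
        and enable_alg_ext
        and super_group_size is None
        and (
            (data_type.startswith("int") and act_bits >= 8)
            # str.startswith accepts a tuple of prefixes
            or data_type.startswith(("mx", "nv"))
        )
    )


def should_warn(bits, data_type):
    """Warn only outside the validated set: INT2, mx* and nv* dtypes."""
    return bits > 2 and not data_type.startswith(("mx", "nv"))


def or_of_negations(bits, data_type):
    """Pitfall: `not A or not B` is True whenever A and B are mutually exclusive."""
    return bits > 2 and (
        not data_type.startswith("mx") or not data_type.startswith("nv")
    )


if __name__ == "__main__":
    for dtype in ("int", "mx_fp", "nv_fp"):  # illustrative dtype strings
        print(dtype, should_warn(4, dtype), or_of_negations(4, dtype))
```

With the joint negation, a 4-bit `mx*` or `nv*` run takes the `logger.info` branch, while other dtypes above 2 bits get the caution message.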

docs/mxnv_acc.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+Average accuracy of hellaswag, lambada_openai, mmlu, piqa, winogrande.
+
+We evaluated using a fake model since we currently have no access to devices for running the real models. However, we have verified that in most cases the fake model closely matches the real model.
+
+| mxfp4 g32         | llama3.1-8B-Instruct | Qwen2-7.5-Instruct | Phi4    | Qwen3-32B |
+|-------------------|----------------------|--------------------|---------|-----------|
+| RTN               | 0.62124              | 0.65502            | 0.71674 | 0.69006   |
+| AutoRound         | 0.66862              | 0.67588            | 0.72472 | 0.72106   |
+| AutoRound+alg_ext | 0.6732               | 0.68094            | 0.72252 | 0.72012   |
+
+| nvfp4 g16         | llama3.1-8B-Instruct | Qwen2-7.5-Instruct | Phi4    | Qwen3-32B |
+|-------------------|----------------------|--------------------|---------|-----------|
+| RTN               | 0.68756              | 0.6906             | 0.72962 | 0.71636   |
+| AutoRound         | 0.69184              | 0.69728            | 0.73058 | 0.73062   |
+| AutoRound+alg_ext | 0.69648              | 0.6989             | 0.7318  |           |
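To compare methods against the RTN baseline, the averages in the two tables can be turned into per-model differences. This is a small sketch for reading the tables, not part of the commit: the helper name `gains_over_rtn` is hypothetical, the numbers are copied verbatim from the tables, and the blank nvfp4 Qwen3-32B cell is kept as `None` and skipped rather than filled in.

```python
MODELS = ["llama3.1-8B-Instruct", "Qwen2-7.5-Instruct", "Phi4", "Qwen3-32B"]

# Values copied from the tables; None marks the cell left blank in the source.
RESULTS = {
    "mxfp4 g32": {
        "RTN": [0.62124, 0.65502, 0.71674, 0.69006],
        "AutoRound": [0.66862, 0.67588, 0.72472, 0.72106],
        "AutoRound+alg_ext": [0.6732, 0.68094, 0.72252, 0.72012],
    },
    "nvfp4 g16": {
        "RTN": [0.68756, 0.6906, 0.72962, 0.71636],
        "AutoRound": [0.69184, 0.69728, 0.73058, 0.73062],
        "AutoRound+alg_ext": [0.69648, 0.6989, 0.7318, None],
    },
}


def gains_over_rtn(table, method):
    """Per-model accuracy difference (method minus RTN), skipping blank cells."""
    return {
        model: round(value - baseline, 5)
        for model, value, baseline in zip(MODELS, table[method], table["RTN"])
        if value is not None
    }


if __name__ == "__main__":
    for scheme, table in RESULTS.items():
        for method in ("AutoRound", "AutoRound+alg_ext"):
            print(scheme, method, gains_over_rtn(table, method))
```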
