Commit 13f6bcd (1 parent: 708630f)

Commit message: minor

Signed-off-by: Kinjal Patel <[email protected]>

2 files changed: +3 −0 lines changed

examples/vllm_serve/README.md
Lines changed: 1 addition & 0 deletions

@@ -89,3 +89,4 @@ torch.distributed.barrier()
 
 1. AWQ is not yet supported in vLLM.
 2. PTQ/QAT checkpoint doesn't work with KV Cache quantization enabled.
+3. Mixed precision checkpoint doesn't work currently.

modelopt/torch/export/unified_export_hf.py
Lines changed: 2 additions & 0 deletions

@@ -582,6 +582,8 @@ def export_hf_checkpoint(
         dtype: the weights data type to export the unquantized layers or the default model data type if None.
         export_dir: the target export path.
         save_modelopt_state: whether to save the modelopt state_dict.
+        export_bf16_weights_amax: whether to export the bf16 weights and amax values separately. This can be used for
+            vLLM fakequant serving.
     """
     export_dir = Path(export_dir)
     export_dir.mkdir(parents=True, exist_ok=True)
