
Conversation

yiliu30
Contributor

@yiliu30 yiliu30 commented Aug 29, 2025

```bash
QUANT_CONFIG=vllm-gaudi/tests/models/language/generation/inc_dynamic_quant.json VLLM_HPU_FORCE_CHANNEL_FP8=false \
HABANA_VISIBLE_DEVICES=all VLLM_CONTIGUOUS_PA=False VLLM_SKIP_WARMUP=true PT_HPU_LAZY_MODE=1 VLLM_USE_V1=1 \
lm_eval --model vllm --tasks gsm8k --num_fewshot 5 --batch_size 128 \
--model_args "pretrained=/mnt/disk8/Qwen/Qwen3-8B-FP8,tensor_parallel_size=1,trust_remote_code=true,max_model_len=4096,dtype=bfloat16"
```
```bash
vllm (pretrained=/mnt/disk8/Qwen/Qwen3-8B-FP8,tensor_parallel_size=1,trust_remote_code=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8817|±  |0.0089|
|     |       |strict-match    |     5|exact_match|↑  |0.8749|±  |0.0091|
```
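
For quick interactive checks of the same setup, here is a minimal serving sketch, assuming the standard `vllm serve` CLI; the env vars and model path simply mirror the lm_eval command above and are not an additional requirement of this PR.

```bash
# Hypothetical serving sketch (not part of this PR); env vars and model path
# mirror the lm_eval command above, the vllm serve flags are standard CLI options.
QUANT_CONFIG=vllm-gaudi/tests/models/language/generation/inc_dynamic_quant.json \
VLLM_HPU_FORCE_CHANNEL_FP8=false HABANA_VISIBLE_DEVICES=all \
VLLM_CONTIGUOUS_PA=False VLLM_SKIP_WARMUP=true PT_HPU_LAZY_MODE=1 VLLM_USE_V1=1 \
vllm serve /mnt/disk8/Qwen/Qwen3-8B-FP8 \
  --dtype bfloat16 --max-model-len 4096 --tensor-parallel-size 1 --trust-remote-code
```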

Signed-off-by: yiliu30 <[email protected]>
@yiliu30 yiliu30 marked this pull request as ready for review September 11, 2025 00:39
@xuechendi
Collaborator

What is the required engineering build version?

@yiliu30
Contributor Author

yiliu30 commented Sep 11, 2025

> What is the required engineering build version?

I tested it on build 1.23-279.
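
In case it helps with reproducing, a sketch of how to confirm the installed Gaudi software build on the host; which packages carry the build number is an assumption and may vary by release.

```bash
# Sketch: check the Habana/Gaudi SW stack on the host (not part of this PR).
hl-smi | head -n 10                      # header reports driver/firmware versions
pip list 2>/dev/null | grep -i habana    # habana-torch-plugin etc. carry the SW build number
```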

@yiliu30 yiliu30 requested a review from xuechendi September 12, 2025 00:38
@xuechendi
Collaborator

PR looks good to me. What is the status after this PR? We now support any model with `quant_method` set to `"fp8"` using dynamic quantization, is that right? (See the sketch below.)
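
To make the question concrete, a checkpoint would qualify when its config.json declares `quant_method` `"fp8"`. A hedged way to inspect that is sketched below; the `activation_scheme` field is an assumption about how dynamic FP8 checkpoints are commonly marked, not something this PR defines.

```bash
# Sketch (not part of this PR): inspect a checkpoint's quantization_config.
# Only quant_method == "fp8" is the point; activation_scheme is shown as a
# common-convention assumption for dynamic FP8 checkpoints.
jq '.quantization_config | {quant_method, activation_scheme}' \
  /mnt/disk8/Qwen/Qwen3-8B-FP8/config.json
```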

@xuechendi
Collaborator

/run-gaudi-tests

@xuechendi xuechendi enabled auto-merge (squash) September 12, 2025 01:15
@xuechendi xuechendi merged commit 56158a3 into vllm-project:main Sep 12, 2025
8 checks passed
kfojcik-intel pushed a commit to kfojcik-intel/vllm-gaudi that referenced this pull request Sep 12, 2025