
Conversation

yiliu30
Contributor

@yiliu30 yiliu30 commented Aug 29, 2025

```bash
QUANT_CONFIG=vllm-gaudi/tests/models/language/generation/inc_dynamic_quant.json VLLM_HPU_FORCE_CHANNEL_FP8=false \
HABANA_VISIBLE_DEVICES=all VLLM_CONTIGUOUS_PA=False VLLM_SKIP_WARMUP=true PT_HPU_LAZY_MODE=1 VLLM_USE_V1=1 \
lm_eval --model vllm --tasks gsm8k --num_fewshot 5 --batch_size 128 \
--model_args "pretrained=/mnt/disk8/Qwen/Qwen3-8B-FP8,tensor_parallel_size=1,trust_remote_code=true,max_model_len=4096,dtype=bfloat16"
```
```bash
vllm (pretrained=/mnt/disk8/Qwen/Qwen3-8B-FP8,tensor_parallel_size=1,trust_remote_code=true,max_model_len=4096,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8817|±  |0.0089|
|     |       |strict-match    |     5|exact_match|↑  |0.8749|±  |0.0091|
```
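
For quick interactive checks of the same setup, here is a minimal serving sketch, assuming the standard `vllm serve` CLI; the env vars and model path simply mirror the lm_eval command above and are not an additional requirement of this PR.

```bash
# Hypothetical serving sketch (not part of this PR); env vars and model path
# mirror the lm_eval command above, the vllm serve flags are standard CLI options.
QUANT_CONFIG=vllm-gaudi/tests/models/language/generation/inc_dynamic_quant.json \
VLLM_HPU_FORCE_CHANNEL_FP8=false HABANA_VISIBLE_DEVICES=all \
VLLM_CONTIGUOUS_PA=False VLLM_SKIP_WARMUP=true PT_HPU_LAZY_MODE=1 VLLM_USE_V1=1 \
vllm serve /mnt/disk8/Qwen/Qwen3-8B-FP8 \
  --dtype bfloat16 --max-model-len 4096 --tensor-parallel-size 1 --trust-remote-code
```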

Signed-off-by: yiliu30 <[email protected]>
@yiliu30 yiliu30 marked this pull request as ready for review September 11, 2025 00:39
@xuechendi
Collaborator

What is the required engineering build version?

@yiliu30
Contributor Author

yiliu30 commented Sep 11, 2025

> What is the required engineering build version?

I tested it on build 1.23-279.
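
In case it helps with reproducing, a sketch of how to confirm the installed Gaudi software build on the host; which packages carry the build number is an assumption and may vary by release.

```bash
# Sketch: check the Habana/Gaudi SW stack on the host (not part of this PR).
hl-smi | head -n 10                      # header reports driver/firmware versions
pip list 2>/dev/null | grep -i habana    # habana-torch-plugin etc. carry the SW build number
```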

@yiliu30 yiliu30 requested a review from xuechendi September 12, 2025 00:38
@xuechendi
Collaborator

PR looks good to me. What is the status after this PR? We now support any model with `quant_method` set to `"fp8"` using dynamic quantization, is that right? (See the sketch below.)
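
To make the question concrete, a checkpoint would qualify when its config.json declares `quant_method` `"fp8"`. A hedged way to inspect that is sketched below; the `activation_scheme` field is an assumption about how dynamic FP8 checkpoints are commonly marked, not something this PR defines.

```bash
# Sketch (not part of this PR): inspect a checkpoint's quantization_config.
# Only quant_method == "fp8" is the point; activation_scheme is shown as a
# common-convention assumption for dynamic FP8 checkpoints.
jq '.quantization_config | {quant_method, activation_scheme}' \
  /mnt/disk8/Qwen/Qwen3-8B-FP8/config.json
```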

@xuechendi
Collaborator

/run-gaudi-tests

@xuechendi xuechendi enabled auto-merge (squash) September 12, 2025 01:15
@xuechendi xuechendi merged commit 56158a3 into vllm-project:main Sep 12, 2025
8 checks passed
kfojcik-intel pushed a commit to kfojcik-intel/vllm-gaudi that referenced this pull request Sep 12, 2025