
Commit 2573bb7

feat: Add Phi-4-Mini-Instruct in PyTorch backend for LLM API accuracy tests (#6303)
Signed-off-by: moraxu <[email protected]>
1 parent 738ab61 commit 2573bb7

File tree

5 files changed: +6 lines, -7 lines

tests/integration/defs/accuracy/references/cnn_dailymail.yaml

Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,8 @@ microsoft/Phi-3-small-128k-instruct:
   - accuracy: 27.208
 microsoft/Phi-3.5-mini-instruct:
   - accuracy: 31.354
+microsoft/Phi-4-mini-instruct:
+  - accuracy: 32.921
 state-spaces/mamba-130m-hf:
   - accuracy: 19.470
 lmsys/vicuna-7b-v1.3:

tests/integration/defs/accuracy/references/gsm8k.yaml

Lines changed: 2 additions & 0 deletions
@@ -126,3 +126,5 @@ mistralai/Mistral-Small-3.1-24B-Instruct-2503:
   - accuracy: 89.23
 microsoft/Phi-4-multimodal-instruct:
   - accuracy: 81.19
+microsoft/Phi-4-mini-instruct:
+  - accuracy: 82.30

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 0 additions & 7 deletions
@@ -1920,10 +1920,6 @@ class TestPhi4MiniInstruct(LlmapiAccuracyTestHarness):
     MODEL_NAME = "microsoft/Phi-4-mini-instruct"
     MODEL_PATH = f"{llm_models_root()}/Phi-4-mini-instruct"
 
-    @pytest.mark.skip(
-        reason=
-        "Temporarily skipping test_auto_dtype while resolving Phi-4's architecture issue."
-    )
     def test_auto_dtype(self):
         with LLM(self.MODEL_PATH) as llm:
             task = CnnDailymail(self.MODEL_NAME)
@@ -1932,9 +1928,6 @@ def test_auto_dtype(self):
             task.evaluate(llm)
             task = GSM8K(self.MODEL_NAME)
             task.evaluate(llm)
-            task = GPQADiamond(self.MODEL_NAME)
-            task.evaluate(llm,
-                          extra_evaluator_kwargs=dict(apply_chat_template=True))
 
 
 class TestKanana_Instruct(LlmapiAccuracyTestHarness):
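
For context, the test class reads roughly as follows after this change. This is a sketch reconstructed from the two hunks above, not the verbatim file: the imports, the LlmapiAccuracyTestHarness base class, LLM, llm_models_root(), and the CnnDailymail/GSM8K task classes come from the existing test module, and a couple of unchanged lines that fall between the two hunks are not visible in this diff and are omitted here.

class TestPhi4MiniInstruct(LlmapiAccuracyTestHarness):
    MODEL_NAME = "microsoft/Phi-4-mini-instruct"
    MODEL_PATH = f"{llm_models_root()}/Phi-4-mini-instruct"

    def test_auto_dtype(self):
        # The skip marker and the GPQADiamond evaluation are removed; the test
        # now exercises CNN/DailyMail and GSM8K and compares against the
        # reference accuracies added in this commit (32.921 and 82.30).
        with LLM(self.MODEL_PATH) as llm:
            task = CnnDailymail(self.MODEL_NAME)
            task.evaluate(llm)
            task = GSM8K(self.MODEL_NAME)
            task.evaluate(llm)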

tests/integration/test_lists/qa/examples_test_list.txt

Lines changed: 1 addition & 0 deletions
@@ -495,6 +495,7 @@ accuracy/test_llm_api_pytorch.py::TestBielik11BInstruct::test_fp8
 accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_fp8
 accuracy/test_llm_api_pytorch.py::TestPhi4MM::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestPhi4MiniInstruct::test_auto_dtype
 
 test_e2e.py::test_llama_e2e[use_cpp_session-remove_input_padding-]
 test_e2e.py::test_llama_e2e[use_py_session-remove_input_padding-]

tests/integration/test_lists/qa/llm_sanity_test.txt

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_fp8_block_scales[laten
 accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[latency_moe_cutlass]
 accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_nvfp4[latency_moe_trtllm]
 accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_fp8_block_scales[latency]
+accuracy/test_llm_api_pytorch.py::TestPhi4MiniInstruct::test_auto_dtype
 disaggregated/test_disaggregated.py::test_disaggregated_cache_aware_balance[TinyLlama-1.1B-Chat-v1.0]
 disaggregated/test_disaggregated.py::test_disaggregated_cuda_graph[TinyLlama-1.1B-Chat-v1.0]
 disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one_mtp[DeepSeek-V3-Lite-fp8]
