Skip to content

Commit 7cdeef4

Browse files
2ez4bzdominicshanshan
authored andcommitted
[https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (NVIDIA#6909)
This commit lowers the GPU memory allocated for KV cache in accuracy tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8. Signed-off-by: William Zhang <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
1 parent 6aa0dfc commit 7cdeef4

File tree

2 files changed

+3
-2
lines changed

2 files changed

+3
-2
lines changed

tests/integration/defs/accuracy/references/cnn_dailymail.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -217,7 +217,7 @@ mistralai/Mistral-Small-3.1-24B-Instruct-2503:
217217
- accuracy: 29.20
218218
- quant_algo: FP8
219219
kv_cache_quant_algo: FP8
220-
accuracy: 29.0
220+
accuracy: 27.0
221221
mistralai/Mistral-Nemo-12b-Base:
222222
- accuracy: 28.906
223223
mistralai/Mistral-Nemo-Base-2407:

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -795,7 +795,8 @@ class TestMistralSmall24B(LlmapiAccuracyTestHarness):
795795
],
796796
)
797797
def test_auto_dtype(self, model_path, expected_quant_algo):
798-
with LLM(model_path) as llm:
798+
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.75)
799+
with LLM(model_path, kv_cache_config=kv_cache_config) as llm:
799800
assert llm.args.quant_config.quant_algo == expected_quant_algo
800801
task = CnnDailymail(self.MODEL_NAME)
801802
task.evaluate(llm)

0 commit comments

Comments
 (0)