Commit fd8f417

[None][fix] fix Llama3 eagle3 test case OOM (#6832)

Signed-off-by: Ivy Zhang <[email protected]>

1 parent 0958efd, commit fd8f417

5 files changed: +15 -11 lines

tests/integration/defs/accuracy/references/cnn_dailymail.yaml

Lines changed: 2 additions & 1 deletion
@@ -172,7 +172,8 @@ meta-llama/Llama-3.2-3B:
     kv_cache_quant_algo: FP8
     accuracy: 33.629
 meta-llama/Llama-3.3-70B-Instruct:
-  - spec_dec_algo: Eagle
+  - quant_algo: FP8
+    spec_dec_algo: Eagle
     accuracy: 33.244
   - quant_algo: NVFP4
     kv_cache_quant_algo: FP8
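
This file and mmlu.yaml below gain the same structure: the Llama-3.3-70B-Instruct entry that previously matched on spec_dec_algo alone now also carries quant_algo: FP8, because the renamed test loads an FP8-quantized checkpoint. A minimal sketch of how such a per-model reference list could be matched against a test configuration; the find_reference helper and the lookup below are hypothetical illustrations, not the harness's actual API:

import yaml

def find_reference(entries, quant_algo=None, spec_dec_algo=None):
    """Return the first reference entry whose attributes match.

    entries is the list parsed from a reference YAML under one model
    name; a key absent from an entry is treated as unset (None).
    """
    for entry in entries:
        if (entry.get("quant_algo") == quant_algo
                and entry.get("spec_dec_algo") == spec_dec_algo):
            return entry
    raise KeyError(f"no reference for quant_algo={quant_algo}, "
                   f"spec_dec_algo={spec_dec_algo}")

# Hypothetical usage against the patched reference file:
with open("tests/integration/defs/accuracy/references/cnn_dailymail.yaml") as f:
    refs = yaml.safe_load(f)
entry = find_reference(refs["meta-llama/Llama-3.3-70B-Instruct"],
                       quant_algo="FP8", spec_dec_algo="Eagle")
print(entry["accuracy"])  # 33.244 after this commit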

tests/integration/defs/accuracy/references/mmlu.yaml

Lines changed: 2 additions & 1 deletion
@@ -59,7 +59,8 @@ meta-llama/Llama-3.2-3B:
     accuracy: 60.60
 meta-llama/Llama-3.3-70B-Instruct:
   - accuracy: 81.31
-  - spec_dec_algo: Eagle
+  - quant_algo: FP8
+    spec_dec_algo: Eagle
     accuracy: 81.31
   - quant_algo: NVFP4
     kv_cache_quant_algo: FP8

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 7 additions & 5 deletions
@@ -383,25 +383,27 @@ def test_auto_dtype_tp8(self):
         task.evaluate(llm,
                       extra_evaluator_kwargs=dict(apply_chat_template=True))
 
+    @skip_pre_hopper
     @pytest.mark.skip_less_mpi_world_size(8)
     @parametrize_with_ids("eagle3_one_model", [True, False])
-    def test_eagle3_tp8(self, eagle3_one_model):
-        model_path = f"{llm_models_root()}/llama-3.3-models/Llama-3.3-70B-Instruct"
+    def test_fp8_eagle3_tp8(self, eagle3_one_model):
+        model_path = f"{llm_models_root()}/modelopt-hf-model-hub/Llama-3.3-70B-Instruct-fp8"
         eagle_model_dir = f"{llm_models_root()}/EAGLE3-LLaMA3.3-Instruct-70B"
         kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6)
         spec_config = EagleDecodingConfig(max_draft_len=4,
                                           speculative_model_dir=eagle_model_dir,
                                           eagle3_one_model=eagle3_one_model)
-        pytorch_config = dict(disable_overlap_scheduler=True, )
+        pytorch_config = dict(
+            disable_overlap_scheduler=True,
+            cuda_graph_config=CudaGraphConfig(max_batch_size=1))
         with LLM(model_path,
+                 max_batch_size=16,
                  tensor_parallel_size=8,
                  speculative_config=spec_config,
                  kv_cache_config=kv_cache_config,
                  **pytorch_config) as llm:
             task = CnnDailymail(self.MODEL_NAME)
             task.evaluate(llm)
-            task = MMLU(self.MODEL_NAME)
-            task.evaluate(llm)
 
     @pytest.mark.skip_less_device(4)
     @skip_pre_hopper
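
Taken together, the changes above are the OOM fix: the test now loads the pre-quantized FP8 checkpoint (roughly half the weight memory of BF16), caps the runtime batch size at 16, restricts CUDA graph capture to batch size 1, and drops the second (MMLU) evaluation pass. A standalone sketch of the patched configuration, assuming the tensorrt_llm LLM API classes this test module already imports; the <models-root> paths and the prompt are placeholders:

from tensorrt_llm import LLM
from tensorrt_llm.llmapi import (CudaGraphConfig, EagleDecodingConfig,
                                 KvCacheConfig)

model_path = "<models-root>/modelopt-hf-model-hub/Llama-3.3-70B-Instruct-fp8"
eagle_model_dir = "<models-root>/EAGLE3-LLaMA3.3-Instruct-70B"

# Cap the KV cache at 60% of free GPU memory, leaving headroom for
# activations and the EAGLE3 draft model.
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6)
spec_config = EagleDecodingConfig(max_draft_len=4,
                                  speculative_model_dir=eagle_model_dir,
                                  eagle3_one_model=True)

with LLM(model_path,
         max_batch_size=16,          # bound per-step runtime memory
         tensor_parallel_size=8,
         speculative_config=spec_config,
         kv_cache_config=kv_cache_config,
         disable_overlap_scheduler=True,
         # Capture CUDA graphs only for batch size 1 to limit the
         # extra memory pinned by graph capture.
         cuda_graph_config=CudaGraphConfig(max_batch_size=1)) as llm:
    outputs = llm.generate(["Summarize: speculative decoding lets a draft "
                            "model propose tokens that the target model "
                            "verifies in parallel."])
    print(outputs[0].outputs[0].text)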

tests/integration/test_lists/qa/llm_function_full.txt

Lines changed: 2 additions & 2 deletions
@@ -450,8 +450,8 @@ accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_guided_decoding_
 accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_guided_decoding_4gpus[llguidance]
 accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_tp4
 accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_nvfp4_tp4
-accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=True]
-accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=False]
+accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_eagle3_tp8[eagle3_one_model=True]
+accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_eagle3_tp8[eagle3_one_model=False]
 accuracy/test_llm_api_pytorch.py::TestMistral7B::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestMistralSmall24B::test_auto_dtype

tests/integration/test_lists/qa/llm_function_sanity.txt

Lines changed: 2 additions & 2 deletions
@@ -66,8 +66,8 @@ accuracy/test_llm_api_pytorch.py::TestLlama3_2_3B::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestLlama3_2_3B::test_fp8_prequantized
 accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_tp4
 accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_nvfp4_tp4
-accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=True]
-accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=False]
+accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_eagle3_tp8[eagle3_one_model=True]
+accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_eagle3_tp8[eagle3_one_model=False]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_auto_dtype[tp8-cuda_graph=False]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_auto_dtype[tp8ep4-cuda_graph=True]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_auto_dtype[tp8ep8-cuda_graph=True]
