
Commit 76736a1

crazydemo authored and dominicshanshan committed
[None][fix] fix Llama3 eagle3 test case OOM (NVIDIA#6832)
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Wangshanshan <[email protected]>
1 parent b7a7977 commit 76736a1

File tree

6 files changed: +15 -13 lines changed


tests/integration/defs/accuracy/references/cnn_dailymail.yaml

Lines changed: 2 additions & 1 deletion
@@ -186,7 +186,8 @@ meta-llama/Llama-3.2-3B:
     kv_cache_quant_algo: FP8
   accuracy: 33.629
 meta-llama/Llama-3.3-70B-Instruct:
-  - spec_dec_algo: Eagle
+  - quant_algo: FP8
+    spec_dec_algo: Eagle
     accuracy: 33.244
   - quant_algo: NVFP4
     kv_cache_quant_algo: FP8

tests/integration/defs/accuracy/references/mmlu.yaml

Lines changed: 2 additions & 1 deletion
@@ -59,7 +59,8 @@ meta-llama/Llama-3.2-3B:
     accuracy: 60.60
 meta-llama/Llama-3.3-70B-Instruct:
   - accuracy: 81.31
-  - spec_dec_algo: Eagle
+  - quant_algo: FP8
+    spec_dec_algo: Eagle
     accuracy: 81.31
   - quant_algo: NVFP4
     kv_cache_quant_algo: FP8
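
Both reference files share the same schema: a model name maps to a list of result entries, and an entry is identified by its algorithm fields (quant_algo, kv_cache_quant_algo, spec_dec_algo). Adding quant_algo: FP8 to the Eagle entries lets the renamed FP8 test below resolve the matching baseline. A minimal sketch of such a lookup, assuming this field-matching scheme; the select_reference helper is hypothetical, for illustration only, and is not the harness's actual code:

from typing import Optional

def select_reference(entries: list[dict], **run_config) -> Optional[dict]:
    # Return the first entry whose algorithm fields all match the run
    # configuration. The matching scheme is an assumption for illustration.
    keys = ("quant_algo", "kv_cache_quant_algo", "spec_dec_algo")
    for entry in entries:
        if all(entry.get(k) == run_config.get(k) for k in keys):
            return entry
    return None

# Entries for meta-llama/Llama-3.3-70B-Instruct in mmlu.yaml after this commit.
entries = [
    {"accuracy": 81.31},
    {"quant_algo": "FP8", "spec_dec_algo": "Eagle", "accuracy": 81.31},
]
ref = select_reference(entries, quant_algo="FP8", spec_dec_algo="Eagle")
assert ref is not None and ref["accuracy"] == 81.31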

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 7 additions & 5 deletions
@@ -543,25 +543,27 @@ def test_auto_dtype_tp8(self):
         task.evaluate(llm,
                       extra_evaluator_kwargs=dict(apply_chat_template=True))
 
+    @skip_pre_hopper
     @pytest.mark.skip_less_mpi_world_size(8)
     @parametrize_with_ids("eagle3_one_model", [True, False])
-    def test_eagle3_tp8(self, eagle3_one_model):
-        model_path = f"{llm_models_root()}/llama-3.3-models/Llama-3.3-70B-Instruct"
+    def test_fp8_eagle3_tp8(self, eagle3_one_model):
+        model_path = f"{llm_models_root()}/modelopt-hf-model-hub/Llama-3.3-70B-Instruct-fp8"
         eagle_model_dir = f"{llm_models_root()}/EAGLE3-LLaMA3.3-Instruct-70B"
         kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6)
         spec_config = EagleDecodingConfig(max_draft_len=4,
                                           speculative_model_dir=eagle_model_dir,
                                           eagle3_one_model=eagle3_one_model)
-        pytorch_config = dict(disable_overlap_scheduler=True, )
+        pytorch_config = dict(
+            disable_overlap_scheduler=True,
+            cuda_graph_config=CudaGraphConfig(max_batch_size=1))
         with LLM(model_path,
+                 max_batch_size=16,
                  tensor_parallel_size=8,
                  speculative_config=spec_config,
                  kv_cache_config=kv_cache_config,
                  **pytorch_config) as llm:
             task = CnnDailymail(self.MODEL_NAME)
             task.evaluate(llm)
-            task = MMLU(self.MODEL_NAME)
-            task.evaluate(llm)
 
     @pytest.mark.skip_less_device(4)
     @skip_pre_hopper
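
The OOM fix combines four memory-reducing knobs visible in the hunk above: a pre-quantized FP8 checkpoint (roughly half the weight memory of BF16), max_batch_size=16 to cap in-flight requests, CudaGraphConfig(max_batch_size=1) to bound CUDA-graph capture memory, and free_gpu_memory_fraction=0.6 so the KV cache leaves headroom for the EAGLE3 draft model. A standalone sketch of the same configuration, assuming the llmapi import paths used by the test module; both checkpoint paths are placeholders:

from tensorrt_llm import LLM
from tensorrt_llm.llmapi import (CudaGraphConfig, EagleDecodingConfig,
                                 KvCacheConfig)

# Reserve only 60% of free GPU memory for the KV cache, leaving headroom
# for activations, CUDA graphs, and the draft model.
kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.6)

# EAGLE3 speculative decoding with a 4-token draft length.
spec_config = EagleDecodingConfig(
    max_draft_len=4,
    speculative_model_dir="/path/to/EAGLE3-LLaMA3.3-Instruct-70B",  # placeholder
    eagle3_one_model=True)

llm = LLM("/path/to/Llama-3.3-70B-Instruct-fp8",  # placeholder FP8 checkpoint
          max_batch_size=16,  # cap concurrent requests to bound activations
          tensor_parallel_size=8,
          speculative_config=spec_config,
          kv_cache_config=kv_cache_config,
          disable_overlap_scheduler=True,
          # Capture CUDA graphs only for batch size 1 to limit graph memory.
          cuda_graph_config=CudaGraphConfig(max_batch_size=1))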

tests/integration/test_lists/qa/llm_function_full.txt

Lines changed: 2 additions & 2 deletions
@@ -471,8 +471,8 @@ accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_beam_search[
 accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_fp8_beam_search[enable_cuda_graph=True-enable_padding=True-disable_overlap_scheduler=True]
 accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_tp4
 accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_nvfp4_tp4
-accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=True]
-accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=False]
+accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_eagle3_tp8[eagle3_one_model=True]
+accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_eagle3_tp8[eagle3_one_model=False]
 accuracy/test_llm_api_pytorch.py::TestMistral7B::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestGemma3_1BInstruct::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestMistralSmall24B::test_auto_dtype

tests/integration/test_lists/qa/llm_function_sanity.txt

Lines changed: 2 additions & 2 deletions
@@ -86,10 +86,10 @@ accuracy/test_llm_api_pytorch.py::TestLlama3_2_1B::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestLlama3_2_1B::test_fp8_prequantized
 accuracy/test_llm_api_pytorch.py::TestLlama3_2_3B::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestLlama3_2_3B::test_fp8_prequantized
-accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=False]
-accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=True]
 accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_tp4
 accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_nvfp4_tp4
+accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_eagle3_tp8[eagle3_one_model=True]
+accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_fp8_eagle3_tp8[eagle3_one_model=False]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_auto_dtype[tp4-cuda_graph=False]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_auto_dtype[tp4ep2-cuda_graph=True]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_auto_dtype[tp4ep4-cuda_graph=True]

tests/integration/test_lists/waives.txt

Lines changed: 0 additions & 2 deletions
@@ -294,8 +294,6 @@ disaggregated/test_disaggregated.py::test_disaggregated_diff_max_tokens[TinyLlam
 disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_tp1_single_gpu_mtp[DeepSeek-V3-Lite-fp8] SKIP (https://nvbugs/5465642)
 examples/test_multimodal.py::test_llm_multimodal_general[Mistral-Small-3.1-24B-Instruct-2503-pp:1-tp:1-bfloat16-bs:1-cpp_e2e:False-nb:1] SKIP (https://nvbugs/5431146)
 accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::test_fp8_blockscale[latency] SKIP (https://nvbugs/5464461)
-full:H100/accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=True] SKIP (https://nvbugs/5467815)
-full:H100/accuracy/test_llm_api_pytorch.py::TestLlama3_3_70BInstruct::test_eagle3_tp8[eagle3_one_model=False] SKIP (https://nvbugs/5467815)
 full:H100/accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8[tp4-cuda_graph=True] SKIP (https://nvbugs/5467815)
 full:H100/accuracy/test_llm_api_pytorch.py::TestLlama4ScoutInstruct::test_fp8_chunked_prefill[tp4ep4-cuda_graph=True] SKIP (https://nvbugs/5467815)
 accuracy/test_disaggregated_serving.py::TestQwen3_30B_A3B::test_mixed_ctx_gen_model[ctxpp2gentp2] SKIP (https://nvbugs/5470769)
