
Commit 7b073a9

[None][infra] update feature_combination_matrix of disaggregated and Eagle3

Signed-off-by: leslie-fang25 <[email protected]>
1 parent 3a98789 commit 7b073a9

File tree

3 files changed: +8 additions, -5 deletions

docs/source/torch/features/feature_combination_matrix.md

Lines changed: 2 additions & 2 deletions

@@ -8,8 +8,8 @@
 | Disaggregated Serving | Yes | Yes | Yes | --- | | | | | | | | | | |
 | Chunked Prefill | Yes | Yes | Yes | Untested | --- | | | | | | | | | |
 | MTP | Yes | Yes | Yes | Yes | Untested | --- | | | | | | | | |
-| EAGLE-3(One Model Engine) | Yes | Yes | Yes | No | Yes | No | --- | | | | | | | |
-| EAGLE-3(Two Model Engine) | NO | Yes | Yes | No | Yes | No | No | --- | | | | | | |
+| EAGLE-3(One Model Engine) | Yes | Yes | Yes | Yes | Yes | No | --- | | | | | | | |
+| EAGLE-3(Two Model Engine) | NO | Yes | Yes | Yes | Yes | No | No | --- | | | | | | |
 | Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | |
 | TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No | --- | | | | |
 | KV Cache Reuse | Yes | Yes | Yes | Untested | Yes | Untested | Yes | No | Yes | Yes | --- | | | |

tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py

Lines changed: 4 additions & 2 deletions

@@ -349,13 +349,15 @@ def test_disaggregated_llama_context_capacity(model, enable_cuda_graph,
 @pytest.mark.parametrize("model", ["Llama-3.1-8B-Instruct"])
 @pytest.mark.parametrize("spec_dec_model_path", ["EAGLE3-LLaMA3.1-Instruct-8B"])
 @pytest.mark.parametrize("generation_overlap", [False])
+@pytest.mark.parametrize("eagle3_one_model", [True, False])
 def test_disaggregated_spec_dec_batch_slot_limit(model, spec_dec_model_path,
-                                                 generation_overlap):
+                                                 generation_overlap,
+                                                 eagle3_one_model):
     # Test whether the batch slots are properly released when using speculative decoding
     # with disaggregated serving.
     spec_dec_config = EagleDecodingConfig(
         speculative_model_dir=model_path(spec_dec_model_path),
-        eagle3_one_model=False,
+        eagle3_one_model=eagle3_one_model,
         max_draft_len=3)

     worker_pytorch_configs = []
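The two test entries added to l0_h100.yml below are the IDs pytest generates for this test once the extra `eagle3_one_model` parametrization is stacked on top. A minimal sketch (not pytest's actual implementation) of how those IDs arise, assuming pytest's usual behavior of taking the cross product of stacked `parametrize` decorators and joining values bottom-decorator-first:

```python
import itertools

# Parameter lists, copied from the decorators in the diff above.
models = ["Llama-3.1-8B-Instruct"]
spec_dec_paths = ["EAGLE3-LLaMA3.1-Instruct-8B"]
generation_overlap = [False]
eagle3_one_model = [True, False]

# Mimic the generated node IDs: the decorator closest to the function
# (eagle3_one_model) contributes the leftmost ID segment.
ids = [
    "-".join(str(v) for v in (e3, go, sd, m))
    for m, sd, go, e3 in itertools.product(
        models, spec_dec_paths, generation_overlap, eagle3_one_model)
]
for test_id in ids:
    print(f"test_disaggregated_spec_dec_batch_slot_limit[{test_id}]")
```

This reproduces the `[True-False-...]` and `[False-False-...]` suffixes referenced in the updated test list.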

tests/integration/test_lists/test-db/l0_h100.yml

Lines changed: 2 additions & 1 deletion

@@ -85,7 +85,8 @@ l0_h100:
   - disaggregated/test_workers.py::test_workers_kv_cache_aware_router[TinyLlama-1.1B-Chat-v1.0]
   - disaggregated/test_workers.py::test_workers_kv_cache_aware_router_eviction[TinyLlama-1.1B-Chat-v1.0]
   - disaggregated/test_disaggregated_single_gpu.py::test_disaggregated_llama_context_capacity[False-False-DeepSeek-V3-Lite-fp8/fp8]
-  - disaggregated/test_disaggregated_single_gpu.py::test_disaggregated_spec_dec_batch_slot_limit[False-EAGLE3-LLaMA3.1-Instruct-8B-Llama-3.1-8B-Instruct]
+  - disaggregated/test_disaggregated_single_gpu.py::test_disaggregated_spec_dec_batch_slot_limit[True-False-EAGLE3-LLaMA3.1-Instruct-8B-Llama-3.1-8B-Instruct]
+  - disaggregated/test_disaggregated_single_gpu.py::test_disaggregated_spec_dec_batch_slot_limit[False-False-EAGLE3-LLaMA3.1-Instruct-8B-Llama-3.1-8B-Instruct]
   - test_e2e.py::test_trtllm_bench_iteration_log[PyTorch-streaming-meta-llama/Llama-3.1-8B-llama-3.1-model/Meta-Llama-3.1-8B]
   - test_e2e.py::test_trtllm_bench_iteration_log[PyTorch-non-streaming-meta-llama/Llama-3.1-8B-llama-3.1-model/Meta-Llama-3.1-8B]
   - test_e2e.py::test_trtllm_bench_request_rate_and_concurrency[enable_concurrency-]
