[None][fix] update skip config #6891
Conversation
📝 Walkthrough: Move and adjust pytest skip/timeout markers across the disaggregated-serving tests; add MODEL_PATH and split a parameterized Mistral test into explicit test_auto_dtype and test_fp8 tests.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~15 minutes
Actionable comments posted: 0
🔭 Outside diff range comments (1)
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)
1-1
: Add required NVIDIA copyright headerPer repo guidelines, all source files require the NVIDIA copyright header with the current year at the top.
Apply this diff:
```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
 # I want to create accuracy tests for disaggregated serving.
 # I need to to this by creating a new class that mimics LLM class. Instead of implementing the
 # actual methods it will send OAI requests to the disaggregated serving endpoint.
 # Please take a look at the existing test_llm_api_pytorch.py file for reference.
```
🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)
335-337
: Consider adding a memory gate to ngram testGiven speculative decoding and KV cache settings, memory pressure can still be significant even with 2 GPUs. Add a memory threshold to avoid CI OOMs.
Apply this diff:
```diff
 @pytest.mark.skip_less_device(2)
+@pytest.mark.skip_less_device_memory(32000)
 @skip_pre_hopper
 def test_ngram(self):
```
387-387
: EAGLE3 test may also need memory gatingThis path uses large max_num_tokens and batch sizes; consider a memory gate similar to DeepSeek to prevent OOM on smaller-memory GPUs.
Apply this diff:
```diff
 @skip_pre_hopper
 @parametrize_with_ids("overlap_scheduler", [True, False])
 @parametrize_with_ids("eagle3_one_model", [True, False])
 @pytest.mark.skip_less_device(2)
+@pytest.mark.skip_less_device_memory(60000)
 def test_eagle3(self, overlap_scheduler, eagle3_one_model):
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/integration/defs/accuracy/test_disaggregated_serving.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
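The last two guidelines can be illustrated with a minimal sketch (the function and names here are illustrative, not from the repository):

```python
# Illustrative sketch of the narrow-except / minimal-try guidelines above.
def first_token(value):
    """Return the first element if value is indexable, else the value itself."""
    try:
        head = value[0]      # keep the try body minimal: one duck-typed access
    except TypeError:        # catch the narrowest exception possible
        return value         # not indexable: fall back to the value itself
    else:
        return head          # the success-path logic lives in else, not in try

print(first_token("abc"))   # a
print(first_token(42))      # 42
```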
Files:
tests/integration/defs/accuracy/test_disaggregated_serving.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tests/integration/defs/accuracy/test_disaggregated_serving.py
🔇 Additional comments (8)
tests/integration/defs/accuracy/test_disaggregated_serving.py (8)
305-305
: LGTM: add per-test device gating for Llama3.1 8B auto_dtypeThe device gate is appropriate and reduces flakiness on smaller runners.
468-471
: LGTM: strengthen gating for Llama-4-Scout and add timeoutThe combination of device and memory gates plus a per-test timeout is appropriate for this heavy model.
509-511
: LGTM: add device and memory gating to DeepSeek V3 Lite nixl backendAppropriate thresholds to avoid OOM and under-provisioned devices in CI.
548-548
: LGTM: enforce 8-device requirement for DeepSeek V3 Lite auto_dtypeMatches the model’s parallelism expectations and reduces CI variance.
592-592
: LGTM: add device gating for Gemma 3 1B auto_dtypeReasonable lower bound for this test.
644-644
: LGTM: add device gating for Qwen3 8B nixl backendThis will reduce flakiness on single-GPU runners.
682-684
: LGTM: add Hopper and device gating for Qwen3 8B auto_dtypeDecorator order is fine and consistent with other tests.
304-306
: No action required — custom pytest marks are already registeredThe markers skip_less_device and skip_less_device_memory are defined in tests/integration/defs/pytest.ini (markers block), so PytestUnknownMarkWarning is not expected for tests in that folder.
- File: tests/integration/defs/pytest.ini — contains marker registrations (skip_less_device, skip_less_device_memory, ...)
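For context, gating markers like these are typically enforced from a conftest.py collection hook. The sketch below is a hypothetical illustration of that pattern, not the repository's actual implementation; available_devices() stands in for a real GPU query (e.g. torch.cuda.device_count()):

```python
# Hypothetical conftest.py-style enforcement of a skip_less_device(N) mark.
# All names and the device count are illustrative assumptions.

def available_devices() -> int:
    """Stand-in device query; a real hook would ask the GPU runtime."""
    return 2

def devices_insufficient(required: int, available: int) -> bool:
    """Decide whether a test gated on `required` devices must be skipped."""
    return available < required

def pytest_collection_modifyitems(config, items):
    """Turn skip_less_device(N) marks into skip marks at collection time."""
    import pytest  # imported lazily so the helpers above stay dependency-free
    for item in items:
        mark = item.get_closest_marker("skip_less_device")
        if mark and devices_insufficient(mark.args[0], available_devices()):
            item.add_marker(
                pytest.mark.skip(reason=f"requires {mark.args[0]} devices"))
```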
Actionable comments posted: 0
🔭 Outside diff range comments (1)
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)
51-54
: Python 3.8 compatibility: avoid PEP 585 built-in genericsThis annotation uses list[...] which requires Python 3.9+. Our guidelines target Python 3.8+, so prefer typing.List.
Apply this diff:
```diff
-        self.futures: list[concurrent.futures.Future[RequestOutput]] = []
+        self.futures: List[concurrent.futures.Future[RequestOutput]] = []
```
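A runnable sketch of the 3.8-compatible spelling (RequestOutput is omitted here since it is project-specific; the field name mirrors the helper class, everything else is illustrative). An alternative is `from __future__ import annotations` at the top of the module, which defers evaluation so built-in generics parse on 3.8:

```python
import concurrent.futures
from typing import List

# typing.List is evaluated fine at runtime on Python 3.8,
# unlike the built-in list[...] generic (3.9+).
futures: List[concurrent.futures.Future] = []

futures.append(concurrent.futures.Future())
print(len(futures))  # 1
```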
🧹 Nitpick comments (4)
tests/integration/defs/accuracy/test_disaggregated_serving.py (4)
385-388
: Decorator order nit: place skip marks above parametrize for consistency/readabilityKeeping skip markers as the outermost decorators makes intent clearer and ensures consistent behavior across tests. Consider reordering here (mirroring other tests in this file).
Apply this diff to reorder:
```diff
-@skip_pre_hopper
-@parametrize_with_ids("overlap_scheduler", [True, False])
-@parametrize_with_ids("eagle3_one_model", [True, False])
-@pytest.mark.skip_less_device(2)
+@pytest.mark.skip_less_device(2)
+@skip_pre_hopper
+@parametrize_with_ids("overlap_scheduler", [True, False])
+@parametrize_with_ids("eagle3_one_model", [True, False])
```
545-549
: Decorator order nit: move skip mark above parametrize for uniformityMinor consistency/readability improvement. Also prevents any subtle mark/parametrize propagation surprises.
Apply this diff:
```diff
-@parametrize_with_ids("overlap_scheduler", [True, False])
-@parametrize_with_ids("mtp_nextn",
-                      [0, pytest.param(2, marks=skip_pre_hopper)])
-@pytest.mark.skip_less_device(8)
+@pytest.mark.skip_less_device(8)
+@parametrize_with_ids("overlap_scheduler", [True, False])
+@parametrize_with_ids("mtp_nextn",
+                      [0, pytest.param(2, marks=skip_pre_hopper)])
```
644-645
: Remove redundant no-op lines; gating addition looks fine
- The new skip_less_device(2) on the nixl test is appropriate.
- Lines 658-659 contain two no-op dictionary lookups that should be removed.
Apply this diff to clean up:
```diff
-    ctx_server_config["cache_transceiver_config"]
-    ctx_server_config["cache_transceiver_config"]
```

Also applies to: 658-659
1-1
: Missing NVIDIA copyright headerPer coding guidelines, prepend the current-year NVIDIA copyright header.
Apply this diff at the top of the file:
```diff
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- tests/integration/defs/accuracy/test_disaggregated_serving.py (9 hunks)
🔇 Additional comments (6)
tests/integration/defs/accuracy/test_disaggregated_serving.py (6)
305-306
: Good addition: enforce 2-GPU availability alongside memory gatingAdding skip_less_device(2) complements the existing memory guard and prevents noisy failures on undersized nodes.
335-337
: Targeted gating looks correct (device count + pre-Hopper exclusion)The combination of skip_less_device(2) and skip_pre_hopper is appropriate for ngram speculative decoding.
468-471
: LGTM: strong gating for heavy model (mem + devices) with a timeoutThe added memory gate, 8-GPU requirement, and a 1h timeout are appropriate for this class of test.
509-511
: LGTM: gating for nixl backend is reasonableThe per-test device and memory guards should reduce flakes on constrained CI workers.
592-593
: LGTM: add 2-GPU guard on Gemma 3 1B testAppropriate gating for the scenario covered.
681-684
: LGTM: decorator order improvedPlacing skip_less_device(2) before parametrize (alongside skip_pre_hopper) is consistent and clear.
/bot run --gpu-type "H100_PCIe"
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (2)
701-709
: De-parameterized auto-dtype test is clean and consistentSwitching to a single-path test and covering CnnDailymail, MMLU, and GSM8K aligns with the rest of this file. Memory gating at 80G is appropriate.
If these runs occasionally hang under CI stress, consider adding a timeout mark similar to other long-running tests in this file.
```diff
 @pytest.mark.skip_less_device_memory(80000)
 def test_auto_dtype(self):
+    # Optional: avoid rare CI hangs
+    # @pytest.mark.timeout(7200)
     with LLM(self.MODEL_PATH) as llm:
         ...
```
710-716
: FP8 coverage added — consider asserting KV cache quantization if applicableGreat to add a dedicated FP8 prequantized test with proper arch gating and memory guard. If the FP8 model ships FP8 KV cache too, asserting kv_cache_quant_algo can harden the test. If not guaranteed for this artifact, ignore.
```diff
 @skip_pre_ada
 @pytest.mark.skip_less_device_memory(80000)
 def test_fp8(self):
     model_path = f"{llm_models_root()}/Mistral-Small-3.1-24B-Instruct-2503-fp8"
     with LLM(model_path) as llm:
         assert llm.args.quant_config.quant_algo == QuantAlgo.FP8
+        # Optional: only if the artifact guarantees FP8 KV cache
+        # assert llm.args.quant_config.kv_cache_quant_algo == QuantAlgo.FP8
         task = CnnDailymail(self.MODEL_NAME)
         task.evaluate(llm)
```
📒 Files selected for processing (5)
- tests/integration/defs/accuracy/test_disaggregated_serving.py (9 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
- tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
- tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
- tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/test_lists/qa/llm_function_sanity.txt
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/test_lists/qa/llm_function_full.txt
tests/integration/test_lists/test-db/l0_h100.yml
tests/integration/defs/accuracy/test_llm_api_pytorch.py
🧬 Code Graph Analysis (1)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (5)
tensorrt_llm/llmapi/llm.py (1)
- LLM (1111-1127)

tests/integration/defs/accuracy/accuracy_core.py (3)
- CnnDailymail (202-219)
- evaluate (146-199)
- evaluate (678-688)

tests/integration/defs/accuracy/test_llm_api.py (4)
- test_fp8 (123-130)
- test_fp8 (185-191)
- test_fp8 (199-204)
- test_fp8 (271-276)

tensorrt_llm/llmapi/llm_args.py (2)
- quant_config (2121-2124)
- quant_config (2127-2128)

tensorrt_llm/quantization/mode.py (1)
- QuantAlgo (23-44)
🔇 Additional comments (12)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)
698-698
: Centralizing MODEL_PATH for MistralSmall24B looks goodDefining MODEL_PATH on the class removes the need for param-based path passing and keeps tests consistent with other suites.
tests/integration/defs/accuracy/test_disaggregated_serving.py (9)
305-306
: Device gating added to disaggregated Llama3.1-8B auto-dtypeAdding skip_less_device(2) reduces spurious CI failures on undersized runners.
335-337
: Per-test device gating for NGram is appropriateLocalizing the skip to the test improves clarity and avoids over-gating the entire class.
387-388
: EAGLE3 test: explicit device gatingThis mirrors resource needs for speculative decoding; looks good.
468-471
: Function-level gating for Scout Instruct auto-dtype is clearer
- Moving memory/device gating and timeout onto the test itself is an improvement.
- Using skip_less_device(8) matches the footprint of this model.
509-511
: NIXL backend test: added device and memory guardsGood addition to stabilize CI; the backend variations can be memory-sensitive.
548-549
: DeepSeekV3Lite auto-dtype now gated to 8 GPUsGiven the param sweep, this is a sensible constraint.
592-593
: Gemma3 1B: added device gatingFine-grained gating keeps the class accessible while protecting resource-heavy tests.
644-645
: Qwen3 8B NIXL backend: added device gatingConsistent with other NIXL tests; good.
682-684
: Decorator order: skip marks before parametrizePlacing skip decorators before parametrize avoids parametrizing tests that won’t run. Good call.
tests/integration/test_lists/qa/llm_function_full.txt (1)
457-458
: Explicit MistralSmall24B entries added — verification passed
- Replacing parameterized invocations with explicit tests and adding test_fp8 is correct.
- I searched for lingering parameterized MistralSmall24B auto-dtype invocations and hardcoded fp8-variant strings:
- Explicit entries found:
- tests/integration/test_lists/qa/llm_function_full.txt — lines 457–458 (test_auto_dtype, test_fp8)
- tests/integration/test_lists/qa/llm_function_sanity.txt — lines 49–50
- tests/integration/test_lists/test-db/l0_h100.yml — lines 208–209
- Definition and fp8 model paths in:
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (class TestMistralSmall24B and MODEL_PATH / -fp8 model_path)
- No bracketed/parameterized occurrence of TestMistralSmall24B::test_auto_dtype[...] was found; other tests do use parameterized ::test_auto_dtype[...] but none reference TestMistralSmall24B.
- No exact hardcoded pattern "Mistral-Small-3.1-24B-Instruct-2503-fp8-FP8" found; fp8 variants appear as expected (e.g., "-fp8" model paths) and test_fp8 entries are present across test lists.
Conclusion: changes are correct and there are no stale parameterized TestMistralSmall24B entries remaining.
tests/integration/test_lists/test-db/l0_h100.yml (1)
208-209
: H100 post-merge: added MistralSmall24B explicit testsThese entries align with the refactor and ensure both BF16 (auto-dtype) and FP8 variants are exercised in post-merge.
PR_Github #15367 [ run ] triggered by Bot
PR_Github #15367 [ run ] completed with state
/bot run
PR_Github #15409 [ run ] triggered by Bot
PR_Github #15409 [ run ] completed with state
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/integration/defs/accuracy/test_disaggregated_serving.py (3)
304-307
: Py38 typing and missing copyright header (repo guideline)Unrelated to this hunk, but:
- list[...] generics require Python 3.9+. Guidelines target 3.8+, so prefer typing.List[...] (or enable `from __future__ import annotations`).
- File is missing the standard NVIDIA copyright header.
Proposed fixes:
- Switch to typing.List for the futures field (Line 53):

```diff
-        self.futures: list[concurrent.futures.Future[RequestOutput]] = []
+        self.futures: List[concurrent.futures.Future[RequestOutput]] = []
```
- Add the repository’s standard NVIDIA header at the top of the file (use the canonical header used elsewhere in this repo).
I can auto-generate a patch for the header once you confirm the exact header stanza used in this repo (Apache-2.0 vs. internal variant).
468-473
: Bound the health-check loop to avoid indefinite hangsUnrelated to this hunk, but the health probe loop (Lines 193-201) can spin forever if servers fail to become healthy. Consider a total timeout aligned with --server_start_timeout and a per-request timeout to reduce flakiness.
Sketch:
```python
# Replace the while True block with a bounded loop
deadline = time.time() + 3600  # match --server_start_timeout
while time.time() < deadline:
    time.sleep(1)
    try:
        print("Checking health endpoint")
        response = requests.get("http://localhost:8000/health", timeout=5)
        if response.status_code == 200:
            break
    except requests.exceptions.RequestException:
        continue
else:
    raise RuntimeError("Disaggregated serving health check timed out")
```
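The bounded-polling idea can be factored into a small reusable helper; this is an illustrative, self-contained sketch (the probe, deadlines, and names are assumptions, not the test suite's code):

```python
import time

def wait_until(probe, deadline_s: float, interval_s: float = 0.01) -> bool:
    """Poll `probe()` until it returns True or `deadline_s` seconds elapse."""
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except Exception:  # a real health check would catch RequestException
            pass
        time.sleep(interval_s)
    return False  # caller decides whether to raise on timeout

# Example: a probe that succeeds on its third call.
calls = {"n": 0}
def flaky_probe():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until(flaky_probe, deadline_s=1.0))     # True
print(wait_until(lambda: False, deadline_s=0.05))  # False
```

Returning a bool instead of raising keeps the helper reusable; the caller can raise RuntimeError with a context-specific message, as the review sketch does.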
644-660
: Remove no-op lines; they do nothing and may confuse future readersThese two lines are redundant:
- ctx_server_config["cache_transceiver_config"]
- ctx_server_config["cache_transceiver_config"]
Apply:
```diff
@@ def test_nixl_backend(self):
     gen_server_config = {
         "disable_overlap_scheduler": True,
         "cache_transceiver_config": {
             "backend": "nixl"
         }
     }
-    ctx_server_config["cache_transceiver_config"]
-    ctx_server_config["cache_transceiver_config"]
     disaggregated_server_config = {
```
📒 Files selected for processing (5)
- tests/integration/defs/accuracy/test_disaggregated_serving.py (9 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
- tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
- tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
- tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/integration/test_lists/test-db/l0_h100.yml
- tests/integration/test_lists/qa/llm_function_sanity.txt
- tests/integration/test_lists/qa/llm_function_full.txt
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tests/integration/defs/accuracy/test_disaggregated_serving.py (8)
304-307
: Correct: add skip_less_device(2) for disaggregated auto_dtypeTwo servers (ctx + gen) each consume 1 GPU with default TP/PP=1, so requiring 2 devices is appropriate.
335-337
: Correct: add skip_less_device(2) for ngram speculative decodingThis test also spins ctx + gen servers (1 GPU each). The gating is consistent with the runtime needs.
384-389
: Correct: add skip_less_device(2) for EAGLE3Matches the one-ctx/one-gen server setup. Good call.
468-473
: Correct: function-level gating matches TP=4 (total 8 GPUs)You pass tensor_parallel_size=4 and run one ctx and one gen server -> 4 + 4 = 8 GPUs. The new device/memory/timeout markers are aligned with the workload.
509-512
: Correct: gating for nixl backend (DeepSeek V3 Lite)Requiring 2 devices and ≥60 GB GPU memory is reasonable for bf16 + disaggregated setup.
545-551
: Correct: adjust DeepSeek auto_dtype to 8 GPUsConsistent with tensor_parallel_size=4 across ctx/gen (4+4 GPUs).
592-596
: Correct: add skip_less_device(2) for Gemma3 1BMatches the 1 ctx + 1 gen server footprint.
681-685
: Correct: add skip_less_device(2) and place before parametrizeDecorator ordering is good: skip markers precede parametrize and avoid unnecessary param expansions on under-provisioned nodes.
/bot reuse-pipeline
PR_Github #15584 [ reuse-pipeline ] triggered by Bot
PR_Github #15584 [ reuse-pipeline ] completed with state
```diff
@@ -695,25 +695,26 @@ def test_auto_dtype(self):

 class TestMistralSmall24B(LlmapiAccuracyTestHarness):
     MODEL_NAME = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
     MODEL_PATH = f"{llm_models_root()}/Mistral-Small-3.1-24B-Instruct-2503"
```
Out of curiosity, why were these changes necessary? I understand they're achieving the same thing (well, except that the previous version was also checking quant_algo == None for test_auto_dtype).
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]> Signed-off-by: Wangshanshan <[email protected]>
Summary by CodeRabbit
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.

run

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.