
[None][fix] update skip config #6891


Merged · 2 commits into NVIDIA:release/1.0 · Aug 18, 2025

Conversation

@crazydemo (Collaborator) commented Aug 14, 2025

Summary by CodeRabbit

  • Tests
    • Expanded device- and memory-based gating to skip heavier tests on smaller hardware, reducing CI OOMs.
    • Moved several class-level skips to function-level for finer per-test control; adjusted and removed outdated gates and reordered decorators.
    • Added/updated targeted timeouts for long-running cases.
    • Consolidated parameterized model-paths, introduced a dedicated FP8 validation test, and updated test lists to use explicit test entries.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
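
For example, an illustrative invocation (the flag combination is hypothetical, but every flag is documented above) that runs only the PyTorch stages on H100 without reusing prior results:

/bot run --disable-reuse-test --gpu-type "H100_PCIe" --test-backend "pytorch"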

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping without user care and validation can break the top of tree.
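
For example (the reason text is hypothetical):

/bot skip --comment "Docs-only change; previous pipeline already passed on this code"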

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since reusing without user care and validation can break the top of tree.

@coderabbitai bot (Contributor) commented Aug 14, 2025

📝 Walkthrough

Move and adjust pytest skip/timeout markers across disaggregated-serving tests; add MODEL_PATH and split a parameterized Mistral test into explicit test_auto_dtype and new test_fp8; update QA and test-db lists to reference the explicit Mistral tests.

Changes

Disaggregated serving tests (tests/integration/defs/accuracy/test_disaggregated_serving.py):
Adjusted pytest markers: added @pytest.mark.skip_less_device(2) to multiple Llama3, Gemma3, and Qwen3 tests; removed some per-test skip_less_device(2/4) gates; moved class-level skips to function-level for Llama4ScoutInstruct (now skip_less_device_memory(140000), timeout(3600), and skip_less_device(8) on test_auto_dtype); DeepSeekV3Lite changes (test_nixl_backend: add skip_less_device(2) + skip_less_device_memory(60000); test_auto_dtype: skip_less_device(4) → skip_less_device(8)); reordered decorators so skip_less_device appears before parametrization where applicable.

LLM PyTorch accuracy tests (tests/integration/defs/accuracy/test_llm_api_pytorch.py):
In TestMistralSmall24B: add MODEL_PATH pointing to Mistral-Small-3.1-24B-Instruct-2503; convert test_auto_dtype from a parameterized model_path variant to a non-parameterized test using MODEL_PATH; add a new test_fp8 that loads the FP8 variant and asserts QuantAlgo.FP8; remove the previous path-parameterized permutations.

Test lists — QA & DB (tests/integration/test_lists/qa/llm_function_full.txt, tests/integration/test_lists/qa/llm_function_sanity.txt, tests/integration/test_lists/test-db/l0_h100.yml):
Replace path-parameterized TestMistralSmall24B entries with explicit entries: add accuracy/test_llm_api_pytorch.py::TestMistralSmall24B::test_auto_dtype and ...::test_fp8; remove prior per-path variants. No other list changes.
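
To make the function-level gating pattern concrete, here is a minimal sketch (class name and marker arguments taken from the summary above; test bodies are elided, and the timeout mark assumes the pytest-timeout plugin, so treat this as illustrative rather than the actual diff):

import pytest

class TestDeepSeekV3Lite:
    """Illustrative shape only; the real tests live in test_disaggregated_serving.py."""

    @pytest.mark.skip_less_device(2)             # skip on nodes with fewer than 2 GPUs
    @pytest.mark.skip_less_device_memory(60000)  # skip when per-GPU memory is below ~60 GB
    def test_nixl_backend(self):
        ...

    @pytest.mark.skip_less_device(8)             # device gate raised from 4 to 8
    def test_auto_dtype(self):
        ...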

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Possibly related PRs

Suggested labels

Disaggregated Serving, CI

Suggested reviewers

  • litaotju
  • yilin-void
  • syuoni
  • chzblych
  • LarryXFly
  • StanleySun639


@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

1-1: Add required NVIDIA copyright header

Per repo guidelines, all source files require the NVIDIA copyright header with the current year at the top.

Apply this diff:

+ # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
  # I want to create accuracy tests for disaggregated serving.
  # I need to to this by creating a new class that mimics LLM class. Instead of implementing the
  # actual methods it will send OAI requests to the disaggregated serving endpoint.
  # Please take a look at the existing test_llm_api_pytorch.py file for reference.
🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)

335-337: Consider adding a memory gate to ngram test

Given speculative decoding and KV cache settings, memory pressure can still be significant even with 2 GPUs. Add a memory threshold to avoid CI OOMs.

Apply this diff:

     @pytest.mark.skip_less_device(2)
+    @pytest.mark.skip_less_device_memory(32000)
     @skip_pre_hopper
     def test_ngram(self):

387-387: EAGLE3 test may also need memory gating

This path uses large max_num_tokens and batch sizes; consider a memory gate similar to DeepSeek to prevent OOM on smaller-memory GPUs.

Apply this diff:

 @skip_pre_hopper
 @parametrize_with_ids("overlap_scheduler", [True, False])
 @parametrize_with_ids("eagle3_one_model", [True, False])
 @pytest.mark.skip_less_device(2)
+@pytest.mark.skip_less_device_memory(60000)
 def test_eagle3(self, overlap_scheduler, eagle3_one_model):
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ebb770 and 81f02d8.

📒 Files selected for processing (1)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in __init__
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
🔇 Additional comments (8)
tests/integration/defs/accuracy/test_disaggregated_serving.py (8)

305-305: LGTM: add per-test device gating for Llama3.1 8B auto_dtype

The device gate is appropriate and reduces flakiness on smaller runners.


468-471: LGTM: strengthen gating for Llama-4-Scout and add timeout

The combination of device and memory gates plus a per-test timeout is appropriate for this heavy model.


509-511: LGTM: add device and memory gating to DeepSeek V3 Lite nixl backend

Appropriate thresholds to avoid OOM and under-provisioned devices in CI.


548-548: LGTM: enforce 8-device requirement for DeepSeek V3 Lite auto_dtype

Matches the model’s parallelism expectations and reduces CI variance.


592-592: LGTM: add device gating for Gemma 3 1B auto_dtype

Reasonable lower bound for this test.


644-644: LGTM: add device gating for Qwen3 8B nixl backend

This will reduce flakiness on single-GPU runners.


682-684: LGTM: add Hopper and device gating for Qwen3 8B auto_dtype

Decorator order is fine and consistent with other tests.


304-306: No action required — custom pytest marks are already registered

The markers skip_less_device and skip_less_device_memory are defined in tests/integration/defs/pytest.ini (markers block), so PytestUnknownMarkWarning is not expected for tests in that folder.

  • File: tests/integration/defs/pytest.ini — contains marker registrations (skip_less_device, skip_less_device_memory, ...)
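
Registration in pytest.ini only silences PytestUnknownMarkWarning; the skip behavior itself has to be implemented by a hook somewhere. As a hypothetical sketch (assuming GPU discovery via torch; the repository's actual conftest.py may differ), such a hook could look like:

import pytest
import torch  # assumption: the real implementation may discover devices differently

def pytest_collection_modifyitems(config, items):
    """Skip any test whose skip_less_device(n) requirement exceeds the visible GPU count."""
    device_count = torch.cuda.device_count()
    for item in items:
        mark = item.get_closest_marker("skip_less_device")
        if mark is not None and device_count < mark.args[0]:
            item.add_marker(pytest.mark.skip(
                reason=f"needs >= {mark.args[0]} GPUs, found {device_count}"))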

@crazydemo crazydemo marked this pull request as ready for review August 15, 2025 01:41
@crazydemo crazydemo requested a review from a team as a code owner August 15, 2025 01:41
@crazydemo crazydemo changed the title from [None][test] update testing skip to [None][fix] Update testing skip Aug 15, 2025
@crazydemo crazydemo changed the title from [None][fix] Update testing skip to [None][fix] update skip config Aug 15, 2025
@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

51-54: Python 3.8 compatibility: avoid PEP 585 built-in generics

This annotation uses list[...] which requires Python 3.9+. Our guidelines target Python 3.8+, so prefer typing.List.

Apply this diff:

-        self.futures: list[concurrent.futures.Future[RequestOutput]] = []
+        self.futures: List[concurrent.futures.Future[RequestOutput]] = []
🧹 Nitpick comments (4)
tests/integration/defs/accuracy/test_disaggregated_serving.py (4)

385-388: Decorator order nit: place skip marks above parametrize for consistency/readability

Keeping skip markers as the outermost decorators makes intent clearer and ensures consistent behavior across tests. Consider reordering here (mirroring other tests in this file).

Apply this diff to reorder:

-@skip_pre_hopper
-@parametrize_with_ids("overlap_scheduler", [True, False])
-@parametrize_with_ids("eagle3_one_model", [True, False])
-@pytest.mark.skip_less_device(2)
+@pytest.mark.skip_less_device(2)
+@skip_pre_hopper
+@parametrize_with_ids("overlap_scheduler", [True, False])
+@parametrize_with_ids("eagle3_one_model", [True, False])

545-549: Decorator order nit: move skip mark above parametrize for uniformity

Minor consistency/readability improvement. Also prevents any subtle mark/parametrize propagation surprises.

Apply this diff:

-@parametrize_with_ids("overlap_scheduler", [True, False])
-@parametrize_with_ids("mtp_nextn",
-                      [0, pytest.param(2, marks=skip_pre_hopper)])
-@pytest.mark.skip_less_device(8)
+@pytest.mark.skip_less_device(8)
+@parametrize_with_ids("overlap_scheduler", [True, False])
+@parametrize_with_ids("mtp_nextn",
+                      [0, pytest.param(2, marks=skip_pre_hopper)])

644-645: Remove redundant no-op lines; gating addition looks fine

  • The new skip_less_device(2) on the nixl test is appropriate.
  • Lines 658-659 contain two no-op dictionary lookups that should be removed.

Apply this diff to clean up:

-        ctx_server_config["cache_transceiver_config"]
-        ctx_server_config["cache_transceiver_config"]

Also applies to: 658-659


1-1: Missing NVIDIA copyright header

Per coding guidelines, prepend the current-year NVIDIA copyright header.

Apply this diff at the top of the file:

+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 81f02d8 and 7cd0ec0.

📒 Files selected for processing (1)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (9 hunks)
🔇 Additional comments (6)
tests/integration/defs/accuracy/test_disaggregated_serving.py (6)

305-306: Good addition: enforce 2-GPU availability alongside memory gating

Adding skip_less_device(2) complements the existing memory guard and prevents noisy failures on undersized nodes.


335-337: Targeted gating looks correct (device count + pre-Hopper exclusion)

The combination of skip_less_device(2) and skip_pre_hopper is appropriate for ngram speculative decoding.


468-471: LGTM: strong gating for heavy model (mem + devices) with a timeout

The added memory gate, 8-GPU requirement, and a 1h timeout are appropriate for this class of test.


509-511: LGTM: gating for nixl backend is reasonable

The per-test device and memory guards should reduce flakes on constrained CI workers.


592-593: LGTM: add 2-GPU guard on Gemma 3 1B test

Appropriate gating for the scenario covered.


681-684: LGTM: decorator order improved

Placing skip_less_device(2) before parametrize (alongside skip_pre_hopper) is consistent and clear.

@crazydemo (Collaborator, Author) commented:

/bot run --gpu-type "H100_PCIe"

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (2)

701-709: De-parameterized auto-dtype test is clean and consistent

Switching to a single-path test and covering CnnDailymail, MMLU, and GSM8K aligns with the rest of this file. Memory gating at 80G is appropriate.

If these runs occasionally hang under CI stress, consider adding a timeout mark similar to other long-running tests in this file.

+@pytest.mark.timeout(7200)  # optional: avoid rare CI hangs
 @pytest.mark.skip_less_device_memory(80000)
 def test_auto_dtype(self):
     with LLM(self.MODEL_PATH) as llm:
         ...

710-716: FP8 coverage added — consider asserting KV cache quantization if applicable

Great to add a dedicated FP8 prequantized test with proper arch gating and memory guard. If the FP8 model ships FP8 KV cache too, asserting kv_cache_quant_algo can harden the test. If not guaranteed for this artifact, ignore.

 @skip_pre_ada
 @pytest.mark.skip_less_device_memory(80000)
 def test_fp8(self):
     model_path = f"{llm_models_root()}/Mistral-Small-3.1-24B-Instruct-2503-fp8"
     with LLM(model_path) as llm:
         assert llm.args.quant_config.quant_algo == QuantAlgo.FP8
+        # Optional: only if the artifact guarantees FP8 KV cache
+        # assert llm.args.quant_config.kv_cache_quant_algo == QuantAlgo.FP8
         task = CnnDailymail(self.MODEL_NAME)
         task.evaluate(llm)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 7cd0ec0 and c4d4590.

📒 Files selected for processing (5)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (9 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
  • tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
  • tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/test_lists/qa/llm_function_sanity.txt
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/test_lists/test-db/l0_h100.yml
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
🧬 Code Graph Analysis (1)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (5)
tensorrt_llm/llmapi/llm.py (1)
  • LLM (1111-1127)
tests/integration/defs/accuracy/accuracy_core.py (3)
  • CnnDailymail (202-219)
  • evaluate (146-199)
  • evaluate (678-688)
tests/integration/defs/accuracy/test_llm_api.py (4)
  • test_fp8 (123-130)
  • test_fp8 (185-191)
  • test_fp8 (199-204)
  • test_fp8 (271-276)
tensorrt_llm/llmapi/llm_args.py (2)
  • quant_config (2121-2124)
  • quant_config (2127-2128)
tensorrt_llm/quantization/mode.py (1)
  • QuantAlgo (23-44)
🔇 Additional comments (12)
tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)

698-698: Centralizing MODEL_PATH for MistralSmall24B looks good

Defining MODEL_PATH on the class removes the need for param-based path passing and keeps tests consistent with other suites.

tests/integration/defs/accuracy/test_disaggregated_serving.py (9)

305-306: Device gating added to disaggregated Llama3.1-8B auto-dtype

Adding skip_less_device(2) reduces spurious CI failures on undersized runners.


335-337: Per-test device gating for NGram is appropriate

Localizing the skip to the test improves clarity and avoids over-gating the entire class.


387-388: EAGLE3 test: explicit device gating

This mirrors resource needs for speculative decoding; looks good.


468-471: Function-level gating for Scout Instruct auto-dtype is clearer

  • Moving memory/device gating and timeout onto the test itself is an improvement.
  • Using skip_less_device(8) matches the footprint of this model.

509-511: NIXL backend test: added device and memory guards

Good addition to stabilize CI; the backend variations can be memory-sensitive.


548-549: DeepSeekV3Lite auto-dtype now gated to 8 GPUs

Given the param sweep, this is a sensible constraint.


592-593: Gemma3 1B: added device gating

Fine-grained gating keeps the class accessible while protecting resource-heavy tests.


644-645: Qwen3 8B NIXL backend: added device gating

Consistent with other NIXL tests; good.


682-684: Decorator order: skip marks before parametrize

Placing skip decorators before parametrize avoids parametrizing tests that won’t run. Good call.

tests/integration/test_lists/qa/llm_function_full.txt (1)

457-458: Explicit MistralSmall24B entries added — verification passed

  • Replacing parameterized invocations with explicit tests and adding test_fp8 is correct.
  • I searched for lingering parameterized MistralSmall24B auto-dtype invocations and hardcoded fp8-variant strings:
    • Explicit entries found:
      • tests/integration/test_lists/qa/llm_function_full.txt — lines 457–458 (test_auto_dtype, test_fp8)
      • tests/integration/test_lists/qa/llm_function_sanity.txt — lines 49–50
      • tests/integration/test_lists/test-db/l0_h100.yml — lines 208–209
    • Definition and fp8 model paths in:
      • tests/integration/defs/accuracy/test_llm_api_pytorch.py (class TestMistralSmall24B and MODEL_PATH / -fp8 model_path)
    • No bracketed/parameterized occurrence of TestMistralSmall24B::test_auto_dtype[...] was found; other tests do use parameterized ::test_auto_dtype[...] but none reference TestMistralSmall24B.
    • No exact hardcoded pattern "Mistral-Small-3.1-24B-Instruct-2503-fp8-FP8" found; fp8 variants appear as expected (e.g., "-fp8" model paths) and test_fp8 entries are present across test lists.

Conclusion: changes are correct and there are no stale parameterized TestMistralSmall24B entries remaining.

tests/integration/test_lists/test-db/l0_h100.yml (1)

208-209: H100 post-merge: added MistralSmall24B explicit tests

These entries align with the refactor and ensure both BF16 (auto-dtype) and FP8 variants are exercised in post-merge.

@tensorrt-cicd (Collaborator): PR_Github #15367 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #15367 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #130 (Partly Tested) completed with status: 'SUCCESS'

@crazydemo (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator): PR_Github #15409 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #15409 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #135 completed with status: 'SUCCESS'

Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Ivy Zhang <[email protected]>
@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/integration/defs/accuracy/test_disaggregated_serving.py (3)

304-307: Py38 typing and missing copyright header (repo guideline)

Unrelated to this hunk, but:

  • list[...] generics require Python 3.9+. Guidelines target 3.8+, so prefer typing.List[...] (or enable from __future__ import annotations).
  • File is missing the standard NVIDIA copyright header.

Proposed fixes:

  • Switch to typing.List for the futures field (Line 53):
-        self.futures: list[concurrent.futures.Future[RequestOutput]] = []
+        self.futures: List[concurrent.futures.Future[RequestOutput]] = []
  • Add the repository’s standard NVIDIA header at the top of the file (use the canonical header used elsewhere in this repo).

I can auto-generate a patch for the header once you confirm the exact header stanza used in this repo (Apache-2.0 vs. internal variant).


468-473: Bound the health-check loop to avoid indefinite hangs

Unrelated to this hunk, but the health probe loop (Lines 193-201) can spin forever if servers fail to become healthy. Consider a total timeout aligned with --server_start_timeout and a per-request timeout to reduce flakiness.

Sketch:

# Replace the while True block with a bounded loop
deadline = time.time() + 3600  # match --server_start_timeout
while time.time() < deadline:
    time.sleep(1)
    try:
        print("Checking health endpoint")
        response = requests.get("http://localhost:8000/health", timeout=5)
        if response.status_code == 200:
            break
    except requests.exceptions.RequestException:
        continue
else:
    raise RuntimeError("Disaggregated serving health check timed out")

644-660: Remove no-op lines; they do nothing and may confuse future readers

These two lines are redundant:

  • ctx_server_config["cache_transceiver_config"]
  • ctx_server_config["cache_transceiver_config"]

Apply:

@@
     def test_nixl_backend(self):
@@
         gen_server_config = {
             "disable_overlap_scheduler": True,
             "cache_transceiver_config": {
                 "backend": "nixl"
             }
         }
-        ctx_server_config["cache_transceiver_config"]
-        ctx_server_config["cache_transceiver_config"]
         disaggregated_server_config = {
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between c94b60f and 569a025.

📒 Files selected for processing (5)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (9 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
  • tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
  • tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/integration/test_lists/test-db/l0_h100.yml
  • tests/integration/test_lists/qa/llm_function_sanity.txt
  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tests/integration/defs/accuracy/test_disaggregated_serving.py (8)

304-307: Correct: add skip_less_device(2) for disaggregated auto_dtype

Two servers (ctx + gen) each consume 1 GPU with default TP/PP=1, so requiring 2 devices is appropriate.


335-337: Correct: add skip_less_device(2) for ngram speculative decoding

This test also spins ctx + gen servers (1 GPU each). The gating is consistent with the runtime needs.


384-389: Correct: add skip_less_device(2) for EAGLE3

Matches the one-ctx/one-gen server setup. Good call.


468-473: Correct: function-level gating matches TP=4 (total 8 GPUs)

You pass tensor_parallel_size=4 and run one ctx and one gen server -> 4 + 4 = 8 GPUs. The new device/memory/timeout markers are aligned with the workload.


509-512: Correct: gating for nixl backend (DeepSeek V3 Lite)

Requiring 2 devices and ≥60 GB GPU memory is reasonable for bf16 + disaggregated setup.


545-551: Correct: adjust DeepSeek auto_dtype to 8 GPUs

Consistent with tensor_parallel_size=4 across ctx/gen (4+4 GPUs).


592-596: Correct: add skip_less_device(2) for Gemma3 1B

Matches the 1 ctx + 1 gen server footprint.


681-685: Correct: add skip_less_device(2) and place before parametrize

Decorator ordering is good: skip markers precede parametrize and avoid unnecessary param expansions on under-provisioned nodes.

@crazydemo (Collaborator, Author) commented:

/bot reuse-pipeline

@crazydemo crazydemo enabled auto-merge (squash) August 18, 2025 05:24
@tensorrt-cicd (Collaborator): PR_Github #15584 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd (Collaborator): PR_Github #15584 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #15409 for commit 569a025

@crazydemo crazydemo merged commit 055fdd9 into NVIDIA:release/1.0 Aug 18, 2025
5 checks passed
@@ -695,25 +695,26 @@ def test_auto_dtype(self):

class TestMistralSmall24B(LlmapiAccuracyTestHarness):
MODEL_NAME = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
MODEL_PATH = f"{llm_models_root()}/Mistral-Small-3.1-24B-Instruct-2503"
A Collaborator commented:

Out of curiosity, why were these changes necessary? I understand they're achieving the same thing (well, except that the previous version was also checking quant_algo == None for test_auto_dtype).

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 22, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 22, 2025
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Wangshanshan <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 22, 2025
Signed-off-by: Ivy Zhang <[email protected]>
Signed-off-by: Wangshanshan <[email protected]>