
Conversation

xinhe-nv
Collaborator

@xinhe-nv xinhe-nv commented Aug 22, 2025

Summary by CodeRabbit

  • Tests
  • Expanded sanity coverage across DeepSeek, Llama, Qwen and related families with new precision/back-end variants (FP8, NVFP4, W4/W4A16, bf16) and additional nixl/UCX entries.
    • Added multimodal end-to-end and quickstart scenarios (image True/False) and updated Gemma coverage.
    • Introduced new n-gram, TP/PP, mixed-ctx and other model variants; added TRT-LLM sampler and richer disaggregated/load-balance multi-GPU/backend tests.
    • Reorganized, reclassified and migrated many tests across suites.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
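
For example, a typical invocation that limits the pipeline to a specific stage and GPU type (both values are illustrative, taken from the examples above) might be:

/bot run --stage-list "A10-PyTorch-1" --gpu-type "H100_PCIe" --disable-fail-fast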

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous; skipping tests without careful validation can break the top of tree.
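
For example (the comment text is illustrative):

/bot skip --comment "Docs-only change, no tests affected"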

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous; reusing a pipeline without careful validation can break the top of tree.

@xinhe-nv xinhe-nv requested a review from LarryXFly August 22, 2025 08:27
Contributor

coderabbitai bot commented Aug 22, 2025

📝 Walkthrough


Reorganized and expanded the LLM sanity test manifest at tests/integration/test_lists/qa/llm_function_sanity.txt, adding many new model/backend variants (DeepSeekV3Lite, Llama families, Qwen, multimodal), nixl/disaggregated entries, NVFP4/FP8 variants, and relocating several disaggregated/e2e entries.

Changes

Cohort / File(s) Summary of Changes
Manifest reorg & variant additions
tests/integration/test_lists/qa/llm_function_sanity.txt
Major rewrite: reordered and reclassified test entries; added many new model/backend permutations (DeepSeekV3Lite, Llama3/Llama4/Eagle3, Qwen), NVFP4/FP8/FP4/NVFP variants, nixl_backend/ucx/disaggregated variants, new end-to-end multimodal quickstarts, and new disaggregated sampler/load-balance entries; removed or relocated several prior combinations.
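
For reference, each non-comment line in this manifest is a fully qualified pytest node id of the form path::[TestClass::]test_name[param-id]; two entries quoted from this review illustrate the format:

accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]
disaggregated/test_disaggregated.py::test_disaggregated_cache_aware_balance[TinyLlama-1.1B-Chat-v1.0]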

Sequence Diagram(s)

(omitted — changes are manifest/test-list edits without control-flow feature changes)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • LarryXFly
  • litaotju
  • StanleySun639


@xinhe-nv xinhe-nv enabled auto-merge (squash) August 22, 2025 08:27
@xinhe-nv xinhe-nv force-pushed the user/xinhe/gpt-oss branch from 25f3ed0 to b21ad00 Compare August 22, 2025 08:27
@xinhe-nv xinhe-nv changed the title [None][feat]add gpt-osss tests to sanity list → [None][feat] add gpt-osss tests to sanity list Aug 22, 2025
@xinhe-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #16161 [ run ] triggered by Bot

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d94cc3f and b21ad00.

📒 Files selected for processing (1)
  • tests/integration/test_lists/qa/llm_function_sanity.txt (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/qa/llm_function_sanity.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/integration/test_lists/qa/llm_function_sanity.txt (4)

154-159: Good move: covering disaggregated UCX and TRTLLM sampler in sanity.

Adding test_disaggregated_deepseek_v3_lite_fp8_ucx and test_disaggregated_trtllm_sampler improves early signal on transport and sampler regressions.


137-143: Naming consistency confirmed for W4A16 and W4A8 Qwen tests

I’ve checked the test_w4a16_mxfp4 and test_w4a8_mxfp4 definitions in tests/integration/defs/accuracy/test_llm_api_pytorch.py and their ids= decorators:

  • test_w4a16_mxfp4 is parameterized with ids=["latency-TRTLLM"], matching the [latency-TRTLLM] entry in the sanity list (Line 137).
  • In TestQwen3_30B_A3B, test_w4a8_mxfp4 uses a two‐stage parametrization:
    • Backends: ["CUTLASS","TRITON","TRTLLM"] for the static-fp8 path and ["CUTLASS","TRTLLM"] for the MXFP8 path, producing IDs like "latency-CUTLASS", "latency-TRITON", and "latency-TRTLLM".
    • Activation types ["static_fp8","mxfp8"] with ids=["fp8","mxfp8"].
      These combine to exactly the five entries shown:
      • [fp8-latency-CUTLASS]
      • [fp8-latency-TRITON]
      • [fp8-latency-TRTLLM]
      • [mxfp8-latency-CUTLASS]
      • [mxfp8-latency-TRTLLM]
        (Lines 138–142 in the sanity list.)

Everything lines up with the fixture expectations—no mismatches in “mxfp8” vs “fp8” or backend casing. No further action needed.
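
As a rough sketch (not the actual TensorRT-LLM source) of how such a five-entry id set can be produced with explicit pytest ids; the parameter names below are assumptions:

import pytest

class TestQwen3_30B_A3B:
    # Hypothetical sketch: a single parametrize over (activation, backend) tuples,
    # each with an explicit id, yields node ids such as
    # test_w4a8_mxfp4[fp8-latency-CUTLASS].
    @pytest.mark.parametrize(
        "activation,moe_backend",
        [
            pytest.param("static_fp8", "CUTLASS", id="fp8-latency-CUTLASS"),
            pytest.param("static_fp8", "TRITON", id="fp8-latency-TRITON"),
            pytest.param("static_fp8", "TRTLLM", id="fp8-latency-TRTLLM"),
            pytest.param("mxfp8", "CUTLASS", id="mxfp8-latency-CUTLASS"),
            pytest.param("mxfp8", "TRTLLM", id="mxfp8-latency-TRTLLM"),
        ],
    )
    def test_w4a8_mxfp4(self, activation, moe_backend):
        assert moe_backend in ("CUTLASS", "TRITON", "TRTLLM")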


181-182: Multimodal Gemma quickstarts added to sanity — confirm asset availability/skip markers

Our inspection of tests/integration/defs/test_e2e.py shows that the gemma-3-27b-it parameters for test_ptp_quickstart_multimodal do not have any skip/xfail guards for missing model artifacts (we only see memory‐based skips such as skip_less_device_memory(80000)).

Please verify one of the following:

  • Your CI environment guarantees that the gemma-3-27b-it artifacts live at $LLM_MODELS_ROOT/gemma/gemma-3-27b-it (so the quickstart can pull them at runtime),
    OR
  • Add a skip marker around the pytest.param("gemma-3-27b-it", …) entry in test_ptp_quickstart_multimodal—for example:
     @pytest.mark.parametrize("model_name,model_path", [
         # …
    -    ("gemma-3-27b-it", "gemma/gemma-3-27b-it"),
    +    pytest.param(
    +        "gemma-3-27b-it",
    +        "gemma/gemma-3-27b-it",
    +        marks=pytest.mark.skipif(
    +            not os.path.isdir(os.path.join(llm_models_root(), "gemma/gemma-3-27b-it")),
    +            reason="Gemma-3-27B-IT artifacts unavailable"
    +        )
    +    ),
         # …
     ])
     def test_ptp_quickstart_multimodal(llm_root, llm_venv, …):

This will ensure the sanity suite only runs the Gemma multimodal tests when those assets are present.


48-61: Verified — GPTOSS test parametrizations match the fixtures

I inspected tests/integration/defs/accuracy/test_llm_api_pytorch.py and confirmed the added GPT-OSS tests use the expected parameter names and enum values (no case mismatches).

Key locations:

  • tests/integration/defs/accuracy/test_llm_api_pytorch.py — class TestGPTOSS (around line 2680).
    • test_w4_1gpu parametrization: lines ~2692–2697 (moe_backend=["CUTLASS","TRTLLM","TRITON"], ids=["cutlass","trtllm","triton"]; cuda_graph/overlap_scheduler set to (True, True)).
    • test_w4_4gpus parametrization: lines ~2720–2726 (same moe_backend values; tp/pp/ep combos with ids like "tp4", "ep4", "dp4").
    • test_w4a16 signature/params: lines ~2753–2759 (presence confirmed).

Conclusion: enums and parameter names are correct as implemented; no changes required before merge.

@xinhe-nv xinhe-nv force-pushed the user/xinhe/gpt-oss branch from b21ad00 to 98efcf7 Compare August 22, 2025 09:19
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
tests/integration/test_lists/qa/llm_function_sanity.txt (2)

1-201: Blocking: All entries appear to reference files outside the repo-visible paths (need prefix correction).

Previous verification already showed unresolved paths for every entry. Unless the repo’s test discovery intentionally runs from tests/integration/, you likely need to prefix most paths:

  • accuracy/... → tests/integration/accuracy/...
  • disaggregated/... → tests/integration/disaggregated/...
  • test_e2e.py → tests/integration/test_e2e.py

Fix paths first; otherwise, the sanity stage will schedule 0 runnable tests or fail fast during collection.

Run this to auto-rewrite and re-verify:

#!/bin/bash
set -euo pipefail
f=tests/integration/test_lists/qa/llm_function_sanity.txt

# 1) Normalize paths
tmp=$(mktemp)
sed -E \
  -e 's@^accuracy/@tests/integration/accuracy/@' \
  -e 's@^disaggregated/@tests/integration/disaggregated/@' \
  -e 's@^test_e2e\.py@tests/integration/test_e2e.py@' \
  "$f" > "$tmp" && mv "$tmp" "$f"

# 2) Check file/class/test existence (ignoring param brackets)
missing=0
while IFS= read -r line || [[ -n "$line" ]]; do
  [[ -z "$line" || "$line" =~ ^# ]] && continue
  path="${line%%::*}"
  rest="${line#*::}"
  class="${rest%%::*}"
  test_with_params="${rest#*::}"
  test="${test_with_params%%[*}"

  if [[ ! -f "$path" ]]; then
    echo "MISSING FILE: $path  <-- $line"
    missing=1; continue
  fi
  if ! rg -nP "^\s*class\s+${class}\b" "$path" >/dev/null; then
    echo "MISSING CLASS: $class in $path  <-- $line"
    missing=1
  fi
  if ! rg -nP "^\s*def\s+${test}\b" "$path" >/dev/null; then
    echo "MISSING TEST: $test in $path  <-- $line"
    missing=1
  fi
done < "$f"

if [[ $missing -eq 1 ]]; then
  echo "One or more entries are unresolved." >&2
  exit 1
else
  echo "All sanity entries resolve to existing classes/tests."
fi

88-91: Remove duplicated manifest entries to avoid double scheduling.

These lines are exact duplicates of neighbors and will run twice:

  • Lines 88–91: TestLlama4MaverickInstruct::test_chunked_prefill for FLASHINFER and TRTLLM
  • Lines 114–117: TestMinistral8BInstruct::test_auto_dtype and ::test_fp8
  • Lines 146–147: disaggregated_cache_aware_balance for TinyLlama

Delete the second occurrence of each.

Apply:

--- a/tests/integration/test_lists/qa/llm_function_sanity.txt
+++ b/tests/integration/test_lists/qa/llm_function_sanity.txt
@@ -88,6 +88,4 @@
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
-accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]
-accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]

@@ -114,4 +112,2 @@
 accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_auto_dtype
-accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_fp8
-accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_fp8

@@ -146,2 +142,1 @@
 disaggregated/test_disaggregated.py::test_disaggregated_cache_aware_balance[TinyLlama-1.1B-Chat-v1.0]
-disaggregated/test_disaggregated.py::test_disaggregated_cache_aware_balance[TinyLlama-1.1B-Chat-v1.0]

To scan for any stragglers:
awk '{c[$0]++; if(c[$0]==2) print "DUP", NR, $0}' tests/integration/test_lists/qa/llm_function_sanity.txt

Also applies to: 114-117, 146-147
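
If that scan finds stragglers, a small Python sketch (path assumed, adjust as needed) can drop repeated entries while preserving order and comments:

from pathlib import Path

path = Path("tests/integration/test_lists/qa/llm_function_sanity.txt")
seen, deduped = set(), []
for line in path.read_text().splitlines():
    # Dedupe only real entries; keep blank lines and comments untouched.
    if line.strip() and not line.lstrip().startswith("#") and line in seen:
        continue
    seen.add(line)
    deduped.append(line)
path.write_text("\n".join(deduped) + "\n")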

🧹 Nitpick comments (10)
tests/integration/test_lists/qa/llm_function_sanity.txt (10)

18-26: Llama3.1 8B disaggregated: heavy tp/pp/eagle/ngram matrix may not be “sanity”.

This block explodes combinations across GSM8K/MMLU and tp/pp/eagle/ngram. Consider sampling 1–2 reps and moving the rest to extended suites to keep sanity time bounded.

I can propose a minimal subset that still exercises each axis (tp, pp, eagle, ngram) with ≤4 lines.


40-42: DeepSeekV3Lite BF16/FP8/NVFP4 variants: multi-GPU and block-scale paths are expensive.

These add 4-GPU and block-scale quant configs. If sanity budget is tight, prefer single-GPU BF16 + one NVFP4 variant, and leave 4-GPU/block-scales for throughput/accuracy stages.

Please confirm whether the sanity stage maps to 4-GPU nodes; if not, these will fail or be silently skipped by capacity.

Also applies to: 45-46


48-61: New GPT-OSS W4/W4A16 grid: strong coverage, but trim for sanity.

Nine 4-GPU grid points (dp/ep/tp × CUTLASS/TRITON/TRTLLM) + single-GPU tri-backend entries will balloon runtime. Keep a representative backend per parallelism mode in sanity.

If you want, I’ll provide a reduced set: {1×W4 single-GPU, 1×tp4, 1×ep4, 1×dp4} with alternating backends.


78-79: eagle3_tp8 on 70B: extremely heavy for a sanity lane.

tp8 on 70B typically requires large nodes and long warmup. Recommend moving to an extended/throughput lane; keep a tp4 smoke if needed.


85-101: Llama4MaverickInstruct TP8/EP8 grids: likely to exceed sanity SLAs.

Multiple tp8/ep8 combinations plus fp8 and chunked-prefill—great coverage, but risky for the short sanity lane. Suggest:

  • Keep one tp4 and one tp8 smoke (no ep).
  • Keep exactly one fp8 case.
  • Move the rest to accuracy/throughput.

Confirm target runtime budget for sanity on your CI (minutes per PR).


105-113: Llama4ScoutInstruct: validate FP4 and FP8 combos; ensure artifacts exist.

  • test_fp4_chunked_prefill and test_fp4 appear; confirm both are supported in Scout’s test module and artifacts are available for tp8ep8.
  • Similar trimming advice as above if sanity runtime is a concern.

120-120: MistralSmall24B added to sanity: sanity budget check.

24B auto-dtype is fine, but if nodes are constrained, consider keeping 7B/8B families in sanity and moving 24B up-stack.


124-124: Phi4MiniInstruct: ensure artifacts are in cache for sanity pool.

Phi variants sometimes fetch larger weights; cache misses will spike runtime. Warm cache or swap to a tiny variant for sanity.


181-182: Multimodal Gemma3 27B additions: verify data assets and CI node types.

These tests often need image/video sample assets and sometimes larger GPU mem. Confirm assets are packaged and nodes meet requirements in the sanity pool.


1-201: Right-size the sanity manifest; propose split into sanity vs extended sanity.

Given the volume of multi-GPU (tp8/ep8), block-scale, overlap-scheduler, and disaggregated transports, consider:

  • Keep 1–2 per family/back-end axis in sanity.
  • Move the rest into a new tests/integration/test_lists/qa/llm_function_extended.txt and add a CI stage (or reuse existing “accuracy/throughput”) triggered via /bot run --stage-list.

This keeps PR turnaround fast while preserving coverage elsewhere.

If you want, I’ll draft a trimmed sanity list plus an extended list derived from this manifest.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b21ad00 and 98efcf7.

📒 Files selected for processing (1)
  • tests/integration/test_lists/qa/llm_function_sanity.txt (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/qa/llm_function_sanity.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tests/integration/test_lists/qa/llm_function_sanity.txt (6)

2-2: DeepSeekV3Lite: verify scheduler/backends are supported in sanity environment.

  • overlap_scheduler=True entries can shift runtime; ensure the runner nodes enable and exercise overlap logic.
  • test_nixl_backend requires NIXL plumbing; gate or skip if the sanity pool lacks NIXL.

Would you like a list-level marker or pytest skipif to avoid NIXL on non-supported agents?

Also applies to: 4-5


68-68: Confirm presence of LLM sampler test.

Ensure TestLlama3_1_8BInstruct::test_fp8_llm_sampler exists and isn’t an artifact of a renamed test (common: test_fp8_sampler or test_llm_sampler_fp8).

I can extend the verification script to assert exact method names for these specific classes only.


114-117: Ministral vs Mistral: verify class naming is intentional.

If the class is actually TestMistral8BInstruct, these entries will not resolve. If “Ministral” is a deliberate internal model, ignore.

I can auto-check for both spellings and report which exists.


137-137: Qwen3_30B_A3B “w4a8_mxfp4” entries param mismatch suspicion.

Function name suggests “mxfp4” but params include mxfp8/fp8 in brackets:

  • test_w4a16_mxfp4[latency-TRTLLM] is fine.
  • test_w4a8_mxfp4[mxfp8-…] seems contradictory.

If the bracket arg is a separate dimension (e.g., kv dtype), it’s OK. Otherwise, this may select the wrong config.

I can grep the test’s param ids to confirm the intended tuples.

Also applies to: 142-142


154-154: UCX/NIXL disaggregated variants: gate by environment.

test_disaggregated_deepseek_v3_lite_fp8_ucx and _nixl should be conditionally included only when the agent supports these transports; otherwise they’ll fail/skip in sanity.

I can add a tiny wrapper list or mark to toggle these via /bot run --stage-list or a pytest skipif.
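
One possible shape for such a guard, as a sketch only; the environment variable below is an assumption, not an existing repo helper:

import os
import pytest

# Hypothetical guard: run transport-specific tests only when the CI lane
# advertises the transport, e.g. via a runner-provided environment variable.
requires_nixl = pytest.mark.skipif(
    os.environ.get("TLLM_HAS_NIXL") != "1",
    reason="NIXL transport not available on this agent",
)

@requires_nixl
def test_disaggregated_deepseek_v3_lite_fp8_nixl():
    ...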


159-159: New TRT-LLM sampler test: confirm stability in sanity.

Samplers can be flaky with aggressive settings; ensure it’s already in passing state to avoid PR noise.

@xinhe-nv xinhe-nv force-pushed the user/xinhe/gpt-oss branch from 98efcf7 to e1dbb54 Compare August 22, 2025 11:27
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
tests/integration/test_lists/qa/llm_function_sanity.txt (4)

1-6: Confirm path resolution for all manifest entries before merge

Past validation indicated these relative paths did not resolve to real files/classes/methods. Please re-run a resolver and fix any misses (e.g., missing tests/integration/ prefix or moved files) before merging.

Run this script from repo root to verify file/class/test existence and catch whitespace/dup issues:

#!/bin/bash
set -euo pipefail
file="tests/integration/test_lists/qa/llm_function_sanity.txt"
missing=0

while IFS= read -r line || [[ -n "$line" ]]; do
  [[ -z "$line" || "$line" =~ ^# ]] && continue
  line="${line%%[[:space:]]}"           # rstrip
  path="${line%%::*}"
  rest="${line#*::}"
  class="${rest%%::*}"
  test_with_params="${rest#*::}"
  test="${test_with_params%%[*}"

  if [[ ! -f "$path" ]]; then
    echo "MISSING FILE: $path  <--  $line"
    missing=1; continue
  fi
  if ! rg -nP "^\s*class\s+${class}\b" "$path" >/dev/null; then
    echo "MISSING CLASS: $class in $path  <--  $line"
    missing=1
  fi
  if ! rg -nP "^\s*def\s+${test}\b" "$path" >/dev/null; then
    echo "MISSING TEST: $test in $path  <--  $line"
    missing=1
  fi
done < "$file"

if [[ $missing -eq 1 ]]; then
  echo "✖ One or more entries unresolved." >&2; exit 1
else
  echo "✓ All entries resolve."
fi

85-101: Remove duplicated Llama4Maverick chunked_prefill entries

These two lines duplicate existing entries and will double-run:

  • Line 88 duplicates Line 89 (FLASHINFER)
  • Line 91 duplicates Line 90 (TRTLLM)

Apply this minimal diff:

-accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
-accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]

After editing, run:
awk '{cnt[$0]++; if (cnt[$0]==2) printf("DUP:%6d %s\n", NR, $0)}' tests/integration/test_lists/qa/llm_function_sanity.txt | sort


114-117: Drop duplicate Ministral8BInstruct entries

  • Line 114 duplicates Line 115 (auto_dtype)
  • Line 117 duplicates Line 116 (fp8)

Apply:

-accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_auto_dtype
-accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_fp8

Also, confirm the class name spelling “Ministral” is intentional (there are also “Mistral” classes in this file).


146-147: Remove duplicate disaggregated_cache_aware_balance entry

Line 146 duplicates Line 147. Remove one occurrence to avoid double-run.

-disaggregated/test_disaggregated.py::test_disaggregated_cache_aware_balance[TinyLlama-1.1B-Chat-v1.0]
🧹 Nitpick comments (7)
tests/integration/test_lists/qa/llm_function_sanity.txt (7)

18-26: Large new Llama3.1-8B disaggregated matrix—consider trimming for sanity stage

EAGLE3, NGRAM, and multiple tp/pp symmetric variants can materially increase runtime. For sanity, prefer 1-2 representative configs and move the rest to extended/accuracy suites.


78-79: EAGLE3 tp8 for 70B will extend runtime considerably

For the sanity stage, consider deferring tp8 variants to throughput/extended. Keep a single tp4 representative here.


85-101: Many new Maverick FP8/auto-dtype variants—trim for sanity

Consider keeping a minimal subset (e.g., one tp4 no-cudagraph, one tp8 with cudagraph, one fp8 eagle3) and move the grid to accuracy/throughput.


105-113: Scout FP4/FP8 grid likely heavy for sanity

Same suggestion: keep 1–2 representatives (e.g., fp4 tp4ep4 with cudagraph, fp8 tp8ep8) and relocate the remainder.


128-128: Qwen3 235B NVFP4 4-GPU eagle3/moe: too heavy for sanity

Recommend moving to extended/perf; keep a lighter NVFP4 single-node variant in sanity.


181-182: New multimodal Gemma e2e entries—check asset availability and runtime

Confirm image/video fixtures are small and bundled/cached; otherwise these can bloat sanity stage. If large, move to multimodal-quickstart or extended sanity.


1-201: Automate duplicate and resolution checks in CI to prevent regressions

Consider adding a lightweight presubmit that fails on unresolved entries and duplicate lines for all test-list manifests.

Here’s a simple guard you can drop under a repo script and call from CI:

#!/bin/bash
set -euo pipefail
rc=0
shopt -s globstar nullglob
for f in tests/integration/test_lists/**/*.txt; do
  echo "Checking $f"
  # duplicates
  if awk '{cnt[$0]++} END{for (l in cnt) if (cnt[l]>1 && l !~ /^#/ && l !~ /^[[:space:]]*$/) {print "DUP: " cnt[l] "x  " l; exit 1}}' "$f"; then :; else rc=1; fi
  # resolution
  while IFS= read -r line || [[ -n "$line" ]]; do
    [[ -z "$line" || "$line" =~ ^# ]] && continue
    path="${line%%::*}"; rest="${line#*::}"; cls="${rest%%::*}"; test="${rest#*::}"; test="${test%%[*}"
    [[ -f "$path" ]] || { echo "MISSING FILE in $f: $path"; rc=1; continue; }
    rg -nP "^\s*class\s+${cls}\b" "$path" >/dev/null || { echo "MISSING CLASS in $f: $cls @ $path"; rc=1; }
    rg -nP "^\s*def\s+${test}\b" "$path" >/dev/null || { echo "MISSING TEST in $f: $test @ $path"; rc=1; }
  done < "$f"
done
exit $rc
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 98efcf7 and e1dbb54.

📒 Files selected for processing (1)
  • tests/integration/test_lists/qa/llm_function_sanity.txt (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/qa/llm_function_sanity.txt
🔇 Additional comments (9)
tests/integration/test_lists/qa/llm_function_sanity.txt (9)

30-30: nixl backend addition: confirm infra readiness

Make sure the CI agents have nixl runtime and required env vars; otherwise these will be skipped or flake.


40-46: DeepSeekV3Lite additions span FP8/NVFP4/bfloat16 with various toggles—validate param names

Please verify all parametrization keys (attention_dp, cuda_graph, torch_compile, overlap_scheduler, fp8kv, moe_backend, mtp_nextn/mtp) match the pytest parameter names to avoid silent deselection.


48-60: Nice: GPT-OSS coverage added across 1-GPU and 4-GPU backends

These hit the PR objective directly and look well-scoped (w4, w4a16). LGTM from manifest perspective.


68-68: LLM sampler in FP8: ensure determinism/tolerance fit sanity

Sampler randomness can cause flakes in short sanity. Pin seeds and/or use tolerant asserts in the target test if not already.


120-120: Add Mistral-Small-24B auto-dtype to sanity—looks good

Reasonable lightweight coverage. No further action.


124-124: Phi4MiniInstruct auto-dtype addition—LGTM

Adds breadth without much runtime impact.


137-142: W4A16/W4A8-MXFP4 TRTLLM combos—verify backend support and artifacts

Please ensure quantized artifacts are available in CI cache and TRT-LLM backend knobs are valid for these exact tuples.


154-154: UCX variant: confirm CI UCX stack and network fabric are present

UCX tests can be environment-sensitive. Verify UCX packages, env flags (e.g., UCX_TLS), and runner permissions are set on sanity lanes.


159-159: TRTLLM sampler sanity: ensure seed/control to avoid flakiness

If assertions compare token sequences, enforce deterministic seeds or compare distributions rather than exact strings.
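
A minimal illustration of pinning seeds before such a test runs; this assumes a torch-based runtime and is not the project's actual harness hook:

import random

import numpy as np
import torch

def pin_seeds(seed: int = 0) -> None:
    # Make the Python, NumPy and Torch RNGs deterministic for the test run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)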

Signed-off-by: Xin He (SW-GPU) <[email protected]>
@xinhe-nv xinhe-nv force-pushed the user/xinhe/gpt-oss branch from e1dbb54 to 20a9525 Compare August 25, 2025 01:23
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
tests/integration/test_lists/qa/llm_function_sanity.txt (4)

1-201: All entries likely unresolved: paths appear to miss the tests/integration/ prefix

Based on repo conventions, test files typically live under tests/integration/**. Entries like accuracy/..., disaggregated/..., and test_e2e.py will not resolve unless the path prefix is included. Please verify and auto-fix.

Run this verification + auto-fix helper. It first checks resolution, then offers a preview rewrite that prepends tests/integration/ for any entry not already under tests/:

#!/bin/bash
set -euo pipefail
list="tests/integration/test_lists/qa/llm_function_sanity.txt"

echo "== Checking resolution =="
missing=0
while IFS= read -r line || [[ -n "$line" ]]; do
  [[ -z "$line" || "$line" =~ ^# ]] && continue
  path="${line%%::*}"
  rest="${line#*::}"; class="${rest%%::*}"; test_with_params="${rest#*::}"; test="${test_with_params%%[*}"
  if [[ ! -f "$path" ]]; then
    echo "MISSING FILE: $path  <-- $line"
    missing=1
  else
    # Class and test definitions
    if ! rg -nP "^\s*class\s+${class}\b" "$path" >/dev/null; then
      echo "MISSING CLASS: $class in $path  <-- $line"; missing=1
    fi
    if ! rg -nP "^\s*def\s+${test}\b" "$path" >/dev/null; then
      echo "MISSING TEST: $test in $path  <-- $line"; missing=1
    fi
  fi
done < "$list"
[[ $missing -eq 0 ]] && echo "All entries resolve."

echo -e "\n== Preview rewrite (no files changed) =="
awk '{
  if ($0 ~ /^[[:space:]]*#/ || $0 ~ /^[[:space:]]*$/) {print; next}
  split($0, a, "::"); p=a[1]
  if (p !~ /^tests\//) {print "tests/integration/" $0} else {print $0}
}' "$list" | head -n 60

# To apply rewrite (make a backup first), uncomment:
# cp "$list" "$list.bak"
# awk '{
#   if ($0 ~ /^[[:space:]]*#/ || $0 ~ /^[[:space:]]*$/) {print; next}
#   split($0, a, "::"); p=a[1]
#   if (p !~ /^tests\//) {print "tests/integration/" $0} else {print $0}
# }' "$list" > "$list.tmp" && mv "$list.tmp" "$list"

88-91: Duplicate Llama4MaverickInstruct::test_chunked_prefill entries (FLASHINFER, TRTLLM)

These lines appear twice and will double-schedule in sanity.

Apply this diff to drop the second occurrences:

--- a/tests/integration/test_lists/qa/llm_function_sanity.txt
+++ b/tests/integration/test_lists/qa/llm_function_sanity.txt
@@ -88,7 +88,6 @@
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
-accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=FLASHINFER]
-accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]
+accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]
 accuracy/test_llm_api_pytorch.py::TestLlama4MaverickInstruct::test_chunked_prefill[attn_backend=TRTLLM]

114-117: Duplicate Ministral8BInstruct auto_dtype and fp8 entries

Two duplicates were introduced; remove the extras to avoid redundant runs.

--- a/tests/integration/test_lists/qa/llm_function_sanity.txt
+++ b/tests/integration/test_lists/qa/llm_function_sanity.txt
@@ -114,7 +114,5 @@
 accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_auto_dtype
-accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_fp8
-accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_fp8

146-147: Duplicate disaggregated_cache_aware_balance entry

This test is listed twice.

--- a/tests/integration/test_lists/qa/llm_function_sanity.txt
+++ b/tests/integration/test_lists/qa/llm_function_sanity.txt
@@ -146,7 +146,6 @@
 disaggregated/test_disaggregated.py::test_disaggregated_cache_aware_balance[TinyLlama-1.1B-Chat-v1.0]
-disaggregated/test_disaggregated.py::test_disaggregated_cache_aware_balance[TinyLlama-1.1B-Chat-v1.0]
🧹 Nitpick comments (6)
tests/integration/test_lists/qa/llm_function_sanity.txt (6)

5-5: NIXL/UCX backends: ensure environment guards exist or gate by stage

These entries can fail on CI hosts lacking NIXL/UCX runtime. Ensure the underlying tests have pytest.skipif guards or only include them in a stage that provisions those components.

Also applies to: 30-30, 154-154


95-101: Large FP8/FP4 matrices added for Llama4 families; prefer representative set in sanity

The exhaustive tp/ep/cuda_graph/tc combinations will inflate runtime. Keep one per family per backend in sanity; move the matrix to accuracy/throughput suites.

Also applies to: 108-113


181-182: New multimodal quickstarts: confirm asset availability and caching

E2E multimodal tests often fetch media assets and models. Please confirm they use cached sample data and models to avoid network dependency in sanity.


18-26: Nice addition: eagle3/ngram/tp_pp_symmetric coverage on Llama3.1 8B

Good targeted coverage; please keep only one symmetric tp/pp combo per benchmark in sanity to stay lean.

Also applies to: 65-73


1-201: Final hygiene: detect any remaining duplicates automatically in CI

Let’s add a quick CI check to fail on duplicates in sanity manifests to prevent regressions.

Drop this script under tools/ci/check_no_dups.sh and wire it in your sanity job:

#!/bin/bash
set -euo pipefail
rc=0
for f in tests/integration/test_lists/qa/*.txt; do
  dups=$(awk '{cnt[$0]++} END {for (k in cnt) if (cnt[k]>1 && k !~ /^[[:space:]]*#/ && length(k)>0) print cnt[k] "x " k}' "$f")
  if [[ -n "$dups" ]]; then
    echo "Duplicate entries in $f:"
    echo "$dups"
    rc=1
  fi
done
exit $rc

40-46: Trim down sanity-stage tests for faster CI cycles

Running the provided helper script uncovered 106 “heavy” entries in tests/integration/test_lists/qa/llm_function_sanity.txt matching the multi-GPU/backend patterns (tp4/tp8, ep4/ep8, 4gpus, eagle3, nvfp4, fp8, block_scale, ucx, nixl). To keep the short sanity lane fast and representative:

  • Pick a single, minimal variant per model/backend (e.g. default dtype on 1 GPU) and move all other heavy variants to an extended-sanity list.
  • Create tests/integration/test_lists/qa/llm_function_extended_sanity.txt and relocate tests such as:
    • TestDeepSeekV3Lite’s fp8_block_scales_4gpus_static_eplb, nvfp4_4gpus_online_eplb[…]
    • All tp4/tp8 and ep4/ep8 variants across LLama3, LLama4, Qwen, Mixtral, etc.
    • Eagle3, NIXL, UCX backends in accuracy/test_disaggregated_serving and elsewhere.
  • You can re-use this command to list candidates anytime:
    awk 'BEGIN{IGNORECASE=1}
        /tp[48]|ep[48]|4gpus|eagle3|nvfp4|fp8|block_scale|ucx|nixl/ {
          printf("%5d  %s\n", NR, $0)
        }' tests/integration/test_lists/qa/llm_function_sanity.txt

Also applies to other sections of the file (e.g. lines 78–87, 105–113, 120, 128, 137–142, 154, 159, 181–182).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e1dbb54 and 20a9525.

📒 Files selected for processing (1)
  • tests/integration/test_lists/qa/llm_function_sanity.txt (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/qa/llm_function_sanity.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/test_lists/qa/llm_function_sanity.txt (2)

48-61: Audit Complete: No Duplicates & Parameter Matrix Is Correct

  • All 13 TestGPTOSS entries on lines 48–60 are unique—no exact duplicates detected.
  • Coverage aligns with the intended matrix:
    • W4 1-GPU: CUTLASS, TRITON, TRTLLM (True-True)
    • W4 4-GPU: dp4/ep4/tp4 × CUTLASS, TRITON, TRTLLM
    • W4A16: dp4
  • Sanity slice remains minimal and balanced across backends.

No further changes needed.


137-142: Param/value mismatch: test_w4a8_mxfp4[...] shows mxfp8 param — please verify

Entries like test_w4a8_mxfp4[mxfp8-...] mix the function name “mxfp4” with the param value mxfp8. Please confirm whether mxfp8 is a valid/intentional dtype id for this test; if not, either fix the param value or rename the test (e.g. test_w4a8_mx[...]) to avoid confusion.

Files/locations to check:

  • tests/integration/test_lists/qa/llm_function_sanity.txt — lines ~137-142 (listed test entries).
  • accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_w4a8_mxfp4 — verify pytest parametrization/ids.
  • tests/unittest/_torch/thop/test_w4a8_mxfp4_mxfp8_gemm.py — contains related test names (ensure naming/ids consistent).

Quick check command (example): rg -n "mxfp8|mxfp4|test_w4a8_mxfp4" -S tests

@LarryXFly LarryXFly disabled auto-merge August 25, 2025 02:22
@LarryXFly LarryXFly merged commit 3ba9afc into NVIDIA:main Aug 25, 2025
3 checks passed
@xinhe-nv xinhe-nv deleted the user/xinhe/gpt-oss branch August 25, 2025 02:55