Skip to content

Conversation

Funatiq
Copy link
Collaborator

@Funatiq Funatiq commented Aug 5, 2025

Summary by CodeRabbit

  • New Features

    • Optional FlashInfer integration added with a central availability flag, PDL toggle via env var, and clearer install/error messaging; optimized ops exposed when available.
  • Tests

    • PyTorch test suites reorganized from broad filters to more granular, explicit listings; several test paths corrected and per-suite durations updated.
    • A sampler test simplified to use a fixed configuration instead of external JSON.
  • Chores

    • Updated test ownership mappings to adjust PR review routing.

Description

  • Move all unit test files to sub-directories.
  • Update YAML files to run separate CI job for each unit test directory.
  • Update test durations to ensure more even test distribution.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Copy link
Contributor

coderabbitai bot commented Aug 5, 2025

📝 Walkthrough

Walkthrough

Refactored FlashInfer availability into a new flashinfer_utils module and switched imports to it across attention, custom_ops, and modules; added guarded FlashInfer wrappers; updated multiple PyTorch test-list YAMLs and durations (path fixes and explicit subdirectory listings); small unit-test and CODEOWNERS path adjustments.

Changes

Cohort / File(s) Change Summary
Test-list YAML edits
tests/integration/test_lists/test-db/l0_a30.yml, tests/integration/test_lists/test-db/l0_b200.yml, tests/integration/test_lists/test-db/l0_gb202.yml, tests/integration/test_lists/test-db/l0_h100.yml, tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml
Fixed several test file paths (e.g., beam search → unittest/_torch/sampler/test_beam_search.py, attention MLA relocated under attention/) and replaced broad unittest/_torch selectors with explicit subdirectory listings (attention, compilation, debugger, executor, misc, modules, multimodal, sampler, speculative, thop); modeling filters made path-qualified.
Test durations mapping
tests/integration/defs/.test_durations
Replaced aggregated unittest/_torch duration entry with per-subsystem duration keys and updated durations; adjusted key/path for attention test to unittest/_torch/attention/test_attention_mla.py.
CODEOWNERS
.github/CODEOWNERS
Re-routed ownership: removed /tests/unittest/_torch/test_custom_ops.py and /tests/unittest/_torch/test_autotuner.py; added /tests/unittest/_torch/thop/test_custom_ops.py and /tests/unittest/_torch/misc/test_autotuner.py (ownership group unchanged).
New FlashInfer utility module
tensorrt_llm/_torch/flashinfer_utils.py
Added centralized FlashInfer integration helpers: get_env_enable_pdl(), ENABLE_PDL, and IS_FLASHINFER_AVAILABLE; attempts optional import of flashinfer on non-Windows and logs/falls back when unavailable.
Attention / custom-ops import refactors
tensorrt_llm/_torch/attention_backend/__init__.py, tensorrt_llm/_torch/attention_backend/utils.py, tensorrt_llm/_torch/custom_ops/__init__.py
Switched sourcing of IS_FLASHINFER_AVAILABLE to ..flashinfer_utils; re-exported attn_custom_op_inplace and mla_custom_op_inplace from modules.attention; updated __all__ to include the new exports; gating logic preserved.
FlashInfer custom ops behavior
tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
Replaced per-platform probing and ad-hoc ImportError handling with centralized IS_FLASHINFER_AVAILABLE gating; added guarded imports (including rope-inplace); implemented FlashInfer-backed custom-op wrappers and their register_fake fallbacks under the availability guard.
Module import updates
tensorrt_llm/_torch/modules/rms_norm.py, tensorrt_llm/_torch/modules/rotary_embedding.py
Changed import source of IS_FLASHINFER_AVAILABLE to ..flashinfer_utils; existing runtime gating and behavior unchanged.
Unit test adjustments
tests/unittest/_torch/executor/test_resource_manager.py, tests/unittest/_torch/sampler/test_trtllm_sampler.py
test_resource_manager.py: root_dir moved up one level (now repo root) and added comments. test_trtllm_sampler.py: removed JSON fixture and file I/O, hard-coded sampling config, and updated test signature to drop the fixture.

Sequence Diagram(s)

sequenceDiagram
    participant Consumer as Module (attention/custom_ops/modules)
    participant Utils as flashinfer_utils
    participant FlashInfer as flashinfer (optional)

    Consumer->>Utils: import IS_FLASHINFER_AVAILABLE, ENABLE_PDL
    Utils->>Utils: get_env_enable_pdl() / read TRTLLM_ENABLE_PDL
    alt Non-Windows
        Utils->>FlashInfer: try import flashinfer
        FlashInfer-->>Utils: success / ImportError
        Utils->>Consumer: set IS_FLASHINFER_AVAILABLE True/False
    else Windows
        Utils->>Consumer: set IS_FLASHINFER_AVAILABLE False
    end
    Consumer->>Consumer: if IS_FLASHINFER_AVAILABLE: import/use flashinfer-backed ops (attn/mla/rope/rmsnorm) else use pure-PyTorch fallbacks
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

Community want to contribute

Suggested reviewers

  • litaotju
  • chzblych

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Funatiq Funatiq requested a review from omera-nv August 5, 2025 15:02
@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 5, 2025

/bot run

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/integration/test_lists/test-db/l0_h100.yml (1)

17-26: Enumerating every sub-directory hurts maintainability

Hard-coding the current list of _torch sub-folders means:

  1. Any newly-added sub-folder (e.g. a future distributed, training, …) will silently be skipped on H100 CI until someone remembers to touch every list file.
  2. The same list is duplicated across several YAMLs (see l0_b200.yml, l0_rtx_pro_6000.yml, …) so drift is likely.

Prefer an inclusive pattern with explicit exclusions, e.g.

-  - unittest/_torch/attention
-  - unittest/_torch/compilation
-  ...
-  - unittest/_torch/thop
+  # Run all _torch tests except the heavy groups we really want to avoid
+  - unittest/_torch -k "not modeling and not multi_gpu and not auto_deploy"

or keep the enumeration in a single YAML anchor that other configs can reference.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcbfa7e and a7ef9a1.

📒 Files selected for processing (5)
  • tests/integration/test_lists/test-db/l0_a30.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_b200.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_gb202.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/test-db/l0_gb202.yml
  • tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml
  • tests/integration/test_lists/test-db/l0_a30.yml
  • tests/integration/test_lists/test-db/l0_h100.yml
  • tests/integration/test_lists/test-db/l0_b200.yml
🔇 Additional comments (3)
tests/integration/test_lists/test-db/l0_a30.yml (1)

23-23: Confirmed test_beam_search.py is present

The beam‐search unit test was found at tests/unittest/_torch/sampler/test_beam_search.py, so the CI selection will include it as expected. No further changes are needed.

tests/integration/test_lists/test-db/l0_gb202.yml (1)

23-23: Corrected test path matches new directory structure – looks good
The update to unittest/_torch/attention/test_attention_mla.py accurately reflects the sub-directory re-org and keeps this list in sync with the other YAML files. No further action needed.

tests/integration/test_lists/test-db/l0_b200.yml (1)

57-67: Granular PyTorch unittest selection looks good

Moving from the previous catch-all entry (unittest/_torch … -k "not …" TIMEOUT (90)) to explicit sub-directory targets should make future maintenance easier and avoids the brittle negative-keyword filter. Nice clean-up.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14161 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14161 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10687 completed with status: 'FAILURE'

@Funatiq Funatiq force-pushed the dev/ci/restructure_unit_tests branch from 1c7edd3 to c1e67e4 Compare August 6, 2025 08:46
@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 6, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14279 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14279 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10784 completed with status: 'FAILURE'

@Funatiq Funatiq force-pushed the dev/ci/restructure_unit_tests branch from c1e67e4 to 5072b39 Compare August 7, 2025 10:47
@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 7, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14463 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14463 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #10928 completed with status: 'FAILURE'

@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 7, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14474 [ run ] triggered by Bot

@Funatiq Funatiq force-pushed the dev/ci/restructure_unit_tests branch from 45208ff to 1be2649 Compare August 7, 2025 15:23
@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 7, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14489 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14474 [ run ] completed with state ABORTED

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14489 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10944 completed with status: 'SUCCESS'

@Funatiq Funatiq force-pushed the dev/ci/restructure_unit_tests branch from 1be2649 to ccb6a05 Compare August 8, 2025 08:38
@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 8, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14593 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14593 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11024 completed with status: 'SUCCESS'

@Funatiq Funatiq force-pushed the dev/ci/restructure_unit_tests branch from ccb6a05 to a3a1d2a Compare August 11, 2025 08:22
@Funatiq Funatiq marked this pull request as ready for review August 11, 2025 08:25
@Funatiq Funatiq requested review from a team as code owners August 11, 2025 08:25
@tensorrt-cicd
Copy link
Collaborator

PR_Github #15152 [ run ] triggered by Bot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
tests/unittest/_torch/executor/test_resource_manager.py (1)

47-56: Avoid re-generating LoRA test data on every class run; short-circuit if artifacts exist

Guard the generation step by checking for the expected .npy artifacts first. This cuts redundant work across re-runs and reduces CI flakiness.

Apply this diff:

         cpp_script_dir = os.path.join(cls.CPP_RESOURCES_DIR, "scripts")

-        # No reason to run this script for each test.
-        # TODO: move this to a fixture that runs once.
+        # No reason to run this script more than once per environment.
+        # Short-circuit if generated artifacts already exist.
+        if os.path.exists(cls.TP1_WEIGHTS_PATH) and os.path.exists(cls.TP1_CONFIG_PATH):
+            return
+
         generate_lora_data_args_tp1 = [

Additionally, consider making the timeout configurable and a bit more generous for busy CI nodes (e.g., via env TRTLLM_TEST_GEN_TIMEOUT, default 300s).

♻️ Duplicate comments (1)
tests/integration/test_lists/test-db/l0_h100.yml (1)

28-32: Quote entire YAML scalars that contain embedded quotes for safer parsing

Wrap the whole list item in quotes and flip inner quotes to single quotes. This avoids edge-cases across YAML parsers.

Apply these diffs:

-  - unittest/_torch/modeling -k "modeling_llama"
+  - "unittest/_torch/modeling -k 'modeling_llama'"
-  - unittest/_torch/modeling -k "modeling_mixtral"
+  - "unittest/_torch/modeling -k 'modeling_mixtral'"
-  - unittest/_torch/modeling -k "modeling_nemotron"
+  - "unittest/_torch/modeling -k 'modeling_nemotron'"
-  - unittest/_torch/modeling -k "modeling_gemma3"
+  - "unittest/_torch/modeling -k 'modeling_gemma3'"
-  - unittest/_torch/modeling -k "modeling_gpt_oss"
+  - "unittest/_torch/modeling -k 'modeling_gpt_oss'"
🧹 Nitpick comments (2)
tensorrt_llm/_torch/flashinfer_utils.py (1)

10-16: Minor API polish: add docstrings and export list; consistent logging

Add a short docstring for get_env_enable_pdl, expose all, and keep messages consistent. Also, consider returning a bool explicitly.

Apply this diff:

-def get_env_enable_pdl():
-    return os.environ.get("TRTLLM_ENABLE_PDL", "0") == "1"
+def get_env_enable_pdl() -> bool:
+    """Return True if Partial Decoding Layer (PDL) is enabled via env.
+
+    Controlled by environment variable: TRTLLM_ENABLE_PDL=1
+    """
+    return os.environ.get("TRTLLM_ENABLE_PDL", "0") == "1"
@@
-ENABLE_PDL = get_env_enable_pdl()
+ENABLE_PDL: bool = get_env_enable_pdl()
 if ENABLE_PDL:
     logger.info("PDL is enabled")
+
+__all__ = ["IS_FLASHINFER_AVAILABLE", "ENABLE_PDL", "get_env_enable_pdl"]
tests/unittest/_torch/sampler/test_trtllm_sampler.py (1)

43-46: Be explicit about top_p rather than passing None

Some SamplingParams implementations treat None differently than omitting the argument. To avoid ambiguity, either omit top_p (use default) or set it explicitly to 1.0.

Apply this minimal change:

-    temperature = 1.0
-    top_p = None
+    temperature = 1.0
+    top_p = 1.0

Alternatively, drop top_p from SamplingParams(...) entirely to inherit the library default.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f27f141 and 59da3ea.

📒 Files selected for processing (16)
  • .github/CODEOWNERS (1 hunks)
  • tensorrt_llm/_torch/attention_backend/__init__.py (1 hunks)
  • tensorrt_llm/_torch/attention_backend/utils.py (1 hunks)
  • tensorrt_llm/_torch/custom_ops/__init__.py (2 hunks)
  • tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (2 hunks)
  • tensorrt_llm/_torch/flashinfer_utils.py (1 hunks)
  • tensorrt_llm/_torch/modules/rms_norm.py (1 hunks)
  • tensorrt_llm/_torch/modules/rotary_embedding.py (1 hunks)
  • tests/integration/defs/.test_durations (3 hunks)
  • tests/integration/test_lists/test-db/l0_a30.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_b200.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_gb202.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml (1 hunks)
  • tests/unittest/_torch/executor/test_resource_manager.py (2 hunks)
  • tests/unittest/_torch/sampler/test_trtllm_sampler.py (2 hunks)
✅ Files skipped from review due to trivial changes (2)
  • tensorrt_llm/_torch/modules/rotary_embedding.py
  • .github/CODEOWNERS
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml
  • tests/integration/test_lists/test-db/l0_a30.yml
  • tests/integration/test_lists/test-db/l0_gb202.yml
  • tests/integration/test_lists/test-db/l0_b200.yml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • tensorrt_llm/_torch/flashinfer_utils.py
  • tensorrt_llm/_torch/attention_backend/__init__.py
  • tensorrt_llm/_torch/attention_backend/utils.py
  • tensorrt_llm/_torch/modules/rms_norm.py
  • tests/unittest/_torch/executor/test_resource_manager.py
  • tensorrt_llm/_torch/custom_ops/__init__.py
  • tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
  • tests/unittest/_torch/sampler/test_trtllm_sampler.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tensorrt_llm/_torch/flashinfer_utils.py
  • tensorrt_llm/_torch/attention_backend/__init__.py
  • tensorrt_llm/_torch/attention_backend/utils.py
  • tensorrt_llm/_torch/modules/rms_norm.py
  • tests/unittest/_torch/executor/test_resource_manager.py
  • tensorrt_llm/_torch/custom_ops/__init__.py
  • tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
  • tests/unittest/_torch/sampler/test_trtllm_sampler.py
🧠 Learnings (4)
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tests/unittest/_torch/executor/test_resource_manager.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • tests/unittest/_torch/executor/test_resource_manager.py
📚 Learning: 2025-08-11T20:09:24.389Z
Learnt from: achartier
PR: NVIDIA/TensorRT-LLM#6763
File: tests/integration/defs/triton_server/conftest.py:16-22
Timestamp: 2025-08-11T20:09:24.389Z
Learning: In the TensorRT-LLM test infrastructure, the team prefers simple, direct solutions (like hard-coding directory traversal counts) over more complex but robust approaches when dealing with stable directory structures. They accept the maintenance cost of updating tests if the layout changes.

Applied to files:

  • tests/unittest/_torch/executor/test_resource_manager.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/_torch/sampler/test_trtllm_sampler.py
  • tests/integration/test_lists/test-db/l0_h100.yml
🪛 Gitleaks (8.27.2)
tests/integration/test_lists/test-db/l0_h100.yml

27-27: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (15)
tests/unittest/_torch/executor/test_resource_manager.py (1)

26-26: Integration paths verified – root_dir bump is safe

  • Confirmed that the following exist in the repo root:
    • tests/integration
    • cpp/tests/resources/scripts/generate_test_lora_weights.py
    • cpp/tests/resources/data

All required directories and files are present, so the updated root_dir will resolve imports correctly in CI.

tests/integration/test_lists/test-db/l0_h100.yml (2)

33-33: Reintroducing perf bench filter is fine in this band

Explicit spec via node::test syntax is consistent with other entries. No issues.


17-26: All unittest/_torch selectors resolve to valid test directories

I’ve verified that each of the following paths under tests/unittest/torch exists and contains at least one test*.py file, so there are no no-op selectors:

  • tests/unittest/_torch/attention
  • tests/unittest/_torch/compilation
  • tests/unittest/_torch/debugger
  • tests/unittest/_torch/executor
  • tests/unittest/_torch/misc
  • tests/unittest/_torch/modules
  • tests/unittest/_torch/multimodal
  • tests/unittest/_torch/sampler
  • tests/unittest/_torch/speculative
  • tests/unittest/_torch/thop
tests/unittest/_torch/sampler/test_trtllm_sampler.py (1)

32-32: Test signature simplification is fine

Dropping the JSON-driven fixture in favor of inline config keeps this test focused; no correctness issues spotted.

tests/integration/defs/.test_durations (3)

141-150: Granular durations for unittest/_torch sub-suites: good change

This will improve scheduling and predictability for the split test lists.


250-250: Modeling key path updated: consistent with YAML changes

The move to unittest/torch/modeling -k "modeling_llama" aligns with l0* test lists.


292-292: Attention MLA path fix acknowledged

Duration key updated to new location tests/unittest/_torch/attention/test_attention_mla.py; no issues.

tensorrt_llm/_torch/attention_backend/__init__.py (1)

1-1: LGTM! Import source refactored to centralized module.

The import of IS_FLASHINFER_AVAILABLE has been correctly updated to use the new centralized flashinfer_utils module. This improves maintainability by having a single source of truth for FlashInfer availability checks across the codebase.

tensorrt_llm/_torch/attention_backend/utils.py (1)

4-4: LGTM! Import refactored to use centralized FlashInfer utils.

The import has been correctly updated to source IS_FLASHINFER_AVAILABLE from the new centralized flashinfer_utils module, maintaining consistency with the broader refactoring effort.

tensorrt_llm/_torch/modules/rms_norm.py (1)

22-22: LGTM! Import source updated to centralized FlashInfer utilities.

The import change correctly aligns with the centralization of FlashInfer availability checks. The module now uses the shared flashinfer_utils module as the source of truth for IS_FLASHINFER_AVAILABLE.

tensorrt_llm/_torch/custom_ops/__init__.py (2)

1-2: LGTM! Import updates align with centralization and API expansion.

The changes correctly:

  • Import IS_FLASHINFER_AVAILABLE from the centralized flashinfer_utils module
  • Import the new attention custom ops from the modules layer

This maintains consistency with the broader refactoring effort while expanding the public API.


16-17: LGTM! Public API expanded with new attention operations.

The addition of attn_custom_op_inplace and mla_custom_op_inplace to the __all__ list properly exposes these operations as part of the public API.

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (3)

3-3: LGTM! Import updated to use centralized FlashInfer utilities.

The import correctly sources both ENABLE_PDL and IS_FLASHINFER_AVAILABLE from the new centralized flashinfer_utils module, replacing the previous platform-specific availability checks.


8-8: LGTM! Import follows centralized gating pattern.

The import of apply_rope_with_cos_sin_cache_inplace is now properly guarded by the centralized IS_FLASHINFER_AVAILABLE flag, ensuring consistent availability checking across the codebase.


48-48: LGTM! Function call updated to use imported wrapper.

The call has been correctly updated to use the imported apply_rope_with_cos_sin_cache_inplace function instead of the fully qualified module path, which aligns with the new import structure.

Copy link
Collaborator

@tburt-nv tburt-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewing CODEOWNERs here, looks good to me.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15152 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11443 completed with status: 'FAILURE'

@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 13, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15180 [ run ] triggered by Bot

@Funatiq Funatiq force-pushed the dev/ci/restructure_unit_tests branch from 59da3ea to c691c0b Compare August 19, 2025 13:11
@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 19, 2025

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15789 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15789 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11867 completed with status: 'FAILURE'

- Moved all unittest files to sub-directories.
- Updated test lists in YAML files to include new test paths.

This commit improves the testing framework for unittest files, ensuring comprehensive validation of the new implementations.

Signed-off-by: Robin Kobus <[email protected]>
Signed-off-by: Robin Kobus <[email protected]>
- Included `attn_custom_op_inplace` and `mla_custom_op_inplace` in the `__all__` list of the custom ops module.
- This change fixes unittest/_torch/compilation/test_add_norm.py::test_add_norm_fusion.

Signed-off-by: Robin Kobus <[email protected]>
- Added `flashinfer_utils.py` to manage flashinfer availability and PDL environment variable.
- Updated imports across multiple modules to source `IS_FLASHINFER_AVAILABLE` and `ENABLE_PDL` from the new utility file.
- This refactor improves code organization and maintains clarity regarding flashinfer's integration.
- This change fixes circular import issues in tensorrt_llm/_torch/attention_backend/__init__.py.

Signed-off-by: Robin Kobus <[email protected]>
…ixture

- Removed the test_case fixture and its associated JSON loading, simplifying the test setup.
- Hardcoded sampling parameters for clarity and consistency in the test execution.

Signed-off-by: Robin Kobus <[email protected]>
- Changed paths for `test_custom_ops.py` and `test_autotuner.py` to their new sub-directory locations.
- This update ensures proper ownership and visibility for the relocated test files.

Signed-off-by: Robin Kobus <[email protected]>
- Added new test duration entries for various unittest cases in the `_torch` module.
- Adjusted existing entries for clarity and consistency in naming conventions.
- This update enhances the tracking of test performance metrics.

Signed-off-by: Robin Kobus <[email protected]>
@Funatiq Funatiq force-pushed the dev/ci/restructure_unit_tests branch from c691c0b to 52eee4b Compare August 19, 2025 18:52
@Funatiq
Copy link
Collaborator Author

Funatiq commented Aug 19, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15817 [ run ] triggered by Bot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/custom_ops/__init__.py (1)

20-31: Ensure tests guard against missing flashinfer support

No circular imports involving modules.attention were found. However, the test in tests/unittest/_torch/compilation/test_add_norm.py unconditionally does:

from tensorrt_llm._torch.custom_ops import flashinfer_rmsnorm

This will break when IS_FLASHINFER_AVAILABLE=False. Please update the test to skip or guard this import, for example:

• File: tests/unittest/_torch/compilation/test_add_norm.py (line 5)
Replace:

- from tensorrt_llm._torch.custom_ops import flashinfer_rmsnorm

With something like:

+ import pytest
+ pytest.importorskip("tensorrt_llm._torch.custom_ops", reason="requires flashinfer support")
+ from tensorrt_llm._torch.custom_ops import flashinfer_rmsnorm

This ensures tests pass regardless of flashinfer availability.

♻️ Duplicate comments (2)
.github/CODEOWNERS (1)

53-54: Prefer directory-level CODEOWNERS to cover all thop/misc tests and future additions

The path updates are correct after the test move. To avoid having to maintain per-file entries and to ensure all existing and future tests under these subdirectories are owned, replace the two file-level rules with directory-level rules.

Proposed diff:

-/tests/unittest/_torch/thop/test_custom_ops.py @NVIDIA/trt-llm-torch-graph-compiler
-/tests/unittest/_torch/misc/test_autotuner.py @NVIDIA/trt-llm-torch-graph-compiler
+/tests/unittest/_torch/thop/ @NVIDIA/trt-llm-torch-graph-compiler
+/tests/unittest/_torch/misc/ @NVIDIA/trt-llm-torch-graph-compiler

Optional: run this script to confirm there are multiple tests under these dirs (justifying directory-level ownership) and to show current CODEOWNERS matches:

#!/bin/bash
set -euo pipefail

echo "Counting tests under thop/ and misc/:"
thop_count=$(fd -t f -e py . tests/unittest/_torch/thop 2>/dev/null | wc -l | tr -d ' ')
misc_count=$(fd -t f -e py . tests/unittest/_torch/misc 2>/dev/null | wc -l | tr -d ' ')
echo "thop: ${thop_count} files"
echo "misc: ${misc_count} files"
echo

echo "Listing thop tests:"
fd -t f -e py . tests/unittest/_torch/thop 2>/dev/null || true
echo

echo "Listing misc tests:"
fd -t f -e py . tests/unittest/_torch/misc 2>/dev/null || true
echo

echo "Current CODEOWNERS entries for thop/misc:"
rg -nE '^\s*/tests/unittest/_torch/(thop|misc)(/|/.*|\.|$)' .github/CODEOWNERS || true
tests/integration/test_lists/test-db/l0_h100.yml (1)

28-28: Quote entire YAML scalar when embedding quotes in -k filters

Safer across YAML parsers and matches prior guidance. Use outer double-quotes and inner single-quotes for the -k expression.

-  - unittest/_torch/modeling -k "modeling_llama"
+  - "unittest/_torch/modeling -k 'modeling_llama'"

Consider applying the same quoting pattern to the adjacent modeling filter lines for consistency.

🧹 Nitpick comments (4)
tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (4)

10-10: Fix wording/typo in comment (“Warp” → “Wrap”).

Minor clarity improvement in a user-facing code comment.

-    # Warp this into custom op since flashinfer didn't warp it properly and we want to avoid graph break between mlp layer for user buffer optimization
+    # Wrap this into a custom op since FlashInfer didn't expose it properly, and we want to avoid graph breaks between MLP layers for user-buffer optimization

19-19: Fix wording and grammar in comment.

Minor clarity/grammar improvements.

-    # Warp this into custom op since flashinfer provides default value for eps with would produce two different graphs depends on the eps value.
+    # Wrap this into a custom op since FlashInfer provides a default value for eps,
+    # which would produce two different graphs depending on the eps value.

30-36: Provide a fake (meta) kernel for fused_add_rmsnorm.

Without a registered fake, torch.compile/meta passes may fail when encountering this custom op. Add a lightweight validator that returns None and checks shapes.

     @torch.library.custom_op("trtllm::flashinfer_fused_add_rmsnorm",
                              mutates_args=("input", "residual"))
     def flashinfer_fused_add_rmsnorm(input: torch.Tensor,
                                      residual: torch.Tensor,
                                      weight: torch.Tensor, eps: float) -> None:
         fused_add_rmsnorm(input, residual, weight, eps, enable_pdl=ENABLE_PDL)
+
+    @flashinfer_fused_add_rmsnorm.register_fake
+    def _(input: torch.Tensor,
+          residual: torch.Tensor,
+          weight: torch.Tensor,
+          eps: float) -> None:
+        # Meta: validate shapes/dtypes; op is in-place and returns None.
+        assert input.shape == residual.shape, "input and residual must have the same shape"
+        assert weight.dim() == 1 and weight.numel() == input.shape[-1], \
+            "weight must be 1D and match the last dimension of input"
+        return

57-66: Strengthen the fake (meta) kernel for ROPE in-place op.

Add shape checks to catch misuses during tracing/compile; keep return type None for in-place semantics.

-    @flashinfer_apply_rope_with_cos_sin_cache_inplace.register_fake
-    def _(
-        positions: torch.Tensor,
-        query: torch.Tensor,
-        key: torch.Tensor,
-        head_size: int,
-        cos_sin_cache: torch.Tensor,
-        is_neox: bool = True,
-    ):
-        return
+    @flashinfer_apply_rope_with_cos_sin_cache_inplace.register_fake
+    def _(
+        positions: torch.Tensor,
+        query: torch.Tensor,
+        key: torch.Tensor,
+        head_size: int,
+        cos_sin_cache: torch.Tensor,
+        is_neox: bool = True,
+    ) -> None:
+        # Meta: validate shapes; op is in-place on query/key.
+        assert query.shape == key.shape, "query and key must have the same shape"
+        assert cos_sin_cache.dim() >= 2, "cos_sin_cache expected to have at least 2 dims"
+        # positions is typically [B, T] or [T]; rely on runtime for exact layout.
+        return
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c691c0b and 52eee4b.

📒 Files selected for processing (16)
  • .github/CODEOWNERS (1 hunks)
  • tensorrt_llm/_torch/attention_backend/__init__.py (1 hunks)
  • tensorrt_llm/_torch/attention_backend/utils.py (1 hunks)
  • tensorrt_llm/_torch/custom_ops/__init__.py (2 hunks)
  • tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (2 hunks)
  • tensorrt_llm/_torch/flashinfer_utils.py (1 hunks)
  • tensorrt_llm/_torch/modules/rms_norm.py (1 hunks)
  • tensorrt_llm/_torch/modules/rotary_embedding.py (1 hunks)
  • tests/integration/defs/.test_durations (3 hunks)
  • tests/integration/test_lists/test-db/l0_a30.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_b200.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_gb202.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml (1 hunks)
  • tests/unittest/_torch/executor/test_resource_manager.py (2 hunks)
  • tests/unittest/_torch/sampler/test_trtllm_sampler.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml
  • tensorrt_llm/_torch/attention_backend/init.py
  • tensorrt_llm/_torch/flashinfer_utils.py
  • tensorrt_llm/_torch/attention_backend/utils.py
  • tests/integration/test_lists/test-db/l0_a30.yml
  • tests/unittest/_torch/executor/test_resource_manager.py
  • tensorrt_llm/_torch/modules/rms_norm.py
  • tests/integration/defs/.test_durations
  • tests/integration/test_lists/test-db/l0_b200.yml
  • tensorrt_llm/_torch/modules/rotary_embedding.py
  • tests/integration/test_lists/test-db/l0_gb202.yml
  • tests/unittest/_torch/sampler/test_trtllm_sampler.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
  • tensorrt_llm/_torch/custom_ops/__init__.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
  • tensorrt_llm/_torch/custom_ops/__init__.py
🧠 Learnings (4)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • .github/CODEOWNERS
  • tests/integration/test_lists/test-db/l0_h100.yml
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • .github/CODEOWNERS
📚 Learning: 2025-08-11T20:09:24.389Z
Learnt from: achartier
PR: NVIDIA/TensorRT-LLM#6763
File: tests/integration/defs/triton_server/conftest.py:16-22
Timestamp: 2025-08-11T20:09:24.389Z
Learning: In the TensorRT-LLM test infrastructure, the team prefers simple, direct solutions (like hard-coding directory traversal counts) over more complex but robust approaches when dealing with stable directory structures. They accept the maintenance cost of updating tests if the layout changes.

Applied to files:

  • .github/CODEOWNERS
📚 Learning: 2025-08-14T15:43:23.067Z
Learnt from: MatthiasKohl
PR: NVIDIA/TensorRT-LLM#6904
File: tensorrt_llm/_torch/attention_backend/trtllm.py:259-262
Timestamp: 2025-08-14T15:43:23.067Z
Learning: In TensorRT-LLM's attention backend, tensor parameters in the plan() method are assigned directly without validation (dtype, device, contiguity checks). This maintains consistency across all tensor inputs and follows the pattern of trusting callers to provide correctly formatted tensors.

Applied to files:

  • .github/CODEOWNERS
🧬 Code Graph Analysis (2)
tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (1)
tensorrt_llm/_torch/modules/swiglu.py (1)
  • silu_and_mul (54-79)
tensorrt_llm/_torch/custom_ops/__init__.py (1)
tensorrt_llm/_torch/modules/attention.py (2)
  • attn_custom_op_inplace (72-98)
  • mla_custom_op_inplace (534-541)
🪛 Gitleaks (8.27.2)
tests/integration/test_lists/test-db/l0_h100.yml

27-27: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tests/integration/test_lists/test-db/l0_h100.yml (2)

17-26: Verified: PyTorch sub-directory buckets exist

The sanity check confirms that all directories listed in tests/integration/test_lists/test-db/l0_h100.yml (lines 17–26) under tests/unittest/_torch/ are present. CI path-not-found failures should be resolved. Approved.


33-33: Verified: Test path and symbol exist

The test file and function have been confirmed:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py::test_trtllm_bench_backend_comparison

No further action required.

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (3)

5-9: Centralized FlashInfer gating looks good.

Import-time side effects are minimized and the availability check is now unified via flashinfer_utils. This will simplify maintenance and avoid noisy ImportErrors on unsupported platforms.


37-55: ROPE wrapper: interface alignment looks correct.

Signature and mutates_args align with the underlying in-place behavior; passthrough of is_neox and head_size is consistent with typical FlashInfer ROPE APIs.


41-47: is_neox default is never relied upon—call sites pass it explicitly
The only invocation of flashinfer_apply_rope_with_cos_sin_cache_inplace in rotary_embedding.py supplies self.is_neox as the sixth argument, so the wrapper’s default (True) is never used. No positional shifts or default mismatches will occur.

tensorrt_llm/_torch/custom_ops/__init__.py (1)

16-17: Good addition to public API.

Exposing attn_custom_op_inplace and mla_custom_op_inplace here improves discoverability and keeps users on the stable import path.

@Funatiq Funatiq enabled auto-merge (squash) August 20, 2025 06:54
@tensorrt-cicd
Copy link
Collaborator

PR_Github #15817 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11889 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@Funatiq Funatiq merged commit b95cab2 into NVIDIA:main Aug 20, 2025
5 checks passed
@Funatiq Funatiq deleted the dev/ci/restructure_unit_tests branch August 20, 2025 10:12
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Aug 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants