[None][ci] move unittests to sub-directories #6635

Funatiq · 2025-08-05T15:02:06Z

Summary by CodeRabbit

New Features
- Optional FlashInfer integration added with a central availability flag, PDL toggle via env var, and clearer install/error messaging; optimized ops exposed when available.
Tests
- PyTorch test suites reorganized from broad filters to more granular, explicit listings; several test paths corrected and per-suite durations updated.
- A sampler test simplified to use a fixed configuration instead of external JSON.
Chores
- Updated test ownership mappings to adjust PR review routing.

Description

Move all unit test files to sub-directories.
Update YAML files to run separate CI job for each unit test directory.
Update test durations to ensure more even test distribution.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

coderabbitai · 2025-08-05T15:02:13Z

📝 Walkthrough

Walkthrough

Refactored FlashInfer availability into a new flashinfer_utils module and switched imports to it across attention, custom_ops, and modules; added guarded FlashInfer wrappers; updated multiple PyTorch test-list YAMLs and durations (path fixes and explicit subdirectory listings); small unit-test and CODEOWNERS path adjustments.

Changes

Cohort / File(s)	Change Summary
Test-list YAML edits `tests/integration/test_lists/test-db/l0_a30.yml`, `tests/integration/test_lists/test-db/l0_b200.yml`, `tests/integration/test_lists/test-db/l0_gb202.yml`, `tests/integration/test_lists/test-db/l0_h100.yml`, `tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml`	Fixed several test file paths (e.g., beam search → `unittest/_torch/sampler/test_beam_search.py`, attention MLA relocated under `attention/`) and replaced broad `unittest/_torch` selectors with explicit subdirectory listings (attention, compilation, debugger, executor, misc, modules, multimodal, sampler, speculative, thop); modeling filters made path-qualified.
Test durations mapping `tests/integration/defs/.test_durations`	Replaced aggregated `unittest/_torch` duration entry with per-subsystem duration keys and updated durations; adjusted key/path for attention test to `unittest/_torch/attention/test_attention_mla.py`.
CODEOWNERS `.github/CODEOWNERS`	Re-routed ownership: removed `/tests/unittest/_torch/test_custom_ops.py` and `/tests/unittest/_torch/test_autotuner.py`; added `/tests/unittest/_torch/thop/test_custom_ops.py` and `/tests/unittest/_torch/misc/test_autotuner.py` (ownership group unchanged).
New FlashInfer utility module `tensorrt_llm/_torch/flashinfer_utils.py`	Added centralized FlashInfer integration helpers: `get_env_enable_pdl()`, `ENABLE_PDL`, and `IS_FLASHINFER_AVAILABLE`; attempts optional import of `flashinfer` on non-Windows and logs/falls back when unavailable.
Attention / custom-ops import refactors `tensorrt_llm/_torch/attention_backend/__init__.py`, `tensorrt_llm/_torch/attention_backend/utils.py`, `tensorrt_llm/_torch/custom_ops/__init__.py`	Switched sourcing of `IS_FLASHINFER_AVAILABLE` to `..flashinfer_utils`; re-exported `attn_custom_op_inplace` and `mla_custom_op_inplace` from `modules.attention`; updated `__all__` to include the new exports; gating logic preserved.
FlashInfer custom ops behavior `tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py`	Replaced per-platform probing and ad-hoc ImportError handling with centralized `IS_FLASHINFER_AVAILABLE` gating; added guarded imports (including rope-inplace); implemented FlashInfer-backed custom-op wrappers and their register_fake fallbacks under the availability guard.
Module import updates `tensorrt_llm/_torch/modules/rms_norm.py`, `tensorrt_llm/_torch/modules/rotary_embedding.py`	Changed import source of `IS_FLASHINFER_AVAILABLE` to `..flashinfer_utils`; existing runtime gating and behavior unchanged.
Unit test adjustments `tests/unittest/_torch/executor/test_resource_manager.py`, `tests/unittest/_torch/sampler/test_trtllm_sampler.py`	`test_resource_manager.py`: root_dir moved up one level (now repo root) and added comments. `test_trtllm_sampler.py`: removed JSON fixture and file I/O, hard-coded sampling config, and updated test signature to drop the fixture.

Sequence Diagram(s)

sequenceDiagram
    participant Consumer as Module (attention/custom_ops/modules)
    participant Utils as flashinfer_utils
    participant FlashInfer as flashinfer (optional)

    Consumer->>Utils: import IS_FLASHINFER_AVAILABLE, ENABLE_PDL
    Utils->>Utils: get_env_enable_pdl() / read TRTLLM_ENABLE_PDL
    alt Non-Windows
        Utils->>FlashInfer: try import flashinfer
        FlashInfer-->>Utils: success / ImportError
        Utils->>Consumer: set IS_FLASHINFER_AVAILABLE True/False
    else Windows
        Utils->>Consumer: set IS_FLASHINFER_AVAILABLE False
    end
    Consumer->>Consumer: if IS_FLASHINFER_AVAILABLE: import/use flashinfer-backed ops (attn/mla/rope/rmsnorm) else use pure-PyTorch fallbacks

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

[https://nvbugs/5340941][https://nvbugs/5375785] - fix: Wrap attentio… #6355 — Related changes introducing/integrating in-place attention custom op and attention/custom_ops refactors.
[None][fix] update skip config #6891 — Overlapping edits to integration test lists (l0_h100) and test-list restructuring.
[TRTLLM-5500][infra] Update CODEOWNERS with new ownership rules for additional paths #6564 — Overlapping .github/CODEOWNERS edits affecting test ownership mappings.

Suggested labels

Community want to contribute

Suggested reviewers

litaotju
chzblych

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Funatiq · 2025-08-05T15:04:40Z

/bot run

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

tests/integration/test_lists/test-db/l0_h100.yml (1)
17-26: Enumerating every sub-directory hurts maintainability

Hard-coding the current list of _torch sub-folders means:

Any newly-added sub-folder (e.g. a future distributed, training, …) will silently be skipped on H100 CI until someone remembers to touch every list file.

The same list is duplicated across several YAMLs (see l0_b200.yml, l0_rtx_pro_6000.yml, …) so drift is likely.

Prefer an inclusive pattern with explicit exclusions, e.g.
-  - unittest/_torch/attention
-  - unittest/_torch/compilation
-  ...
-  - unittest/_torch/thop
+  # Run all _torch tests except the heavy groups we really want to avoid
+  - unittest/_torch -k "not modeling and not multi_gpu and not auto_deploy"
or keep the enumeration in a single YAML anchor that other configs can reference.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcbfa7e and a7ef9a1.

📒 Files selected for processing (5)

tests/integration/test_lists/test-db/l0_a30.yml (1 hunks)
tests/integration/test_lists/test-db/l0_b200.yml (1 hunks)
tests/integration/test_lists/test-db/l0_gb202.yml (1 hunks)
tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

tests/integration/test_lists/test-db/l0_gb202.yml
tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml
tests/integration/test_lists/test-db/l0_a30.yml
tests/integration/test_lists/test-db/l0_h100.yml
tests/integration/test_lists/test-db/l0_b200.yml

🔇 Additional comments (3)

tests/integration/test_lists/test-db/l0_a30.yml (1)

23-23: Confirmed test_beam_search.py is present

The beam‐search unit test was found at tests/unittest/_torch/sampler/test_beam_search.py, so the CI selection will include it as expected. No further changes are needed.

tests/integration/test_lists/test-db/l0_gb202.yml (1)

23-23: Corrected test path matches new directory structure – looks good
The update to unittest/_torch/attention/test_attention_mla.py accurately reflects the sub-directory re-org and keeps this list in sync with the other YAML files. No further action needed.

tests/integration/test_lists/test-db/l0_b200.yml (1)

57-67: Granular PyTorch unittest selection looks good

Moving from the previous catch-all entry (unittest/_torch … -k "not …" TIMEOUT (90)) to explicit sub-directory targets should make future maintenance easier and avoids the brittle negative-keyword filter. Nice clean-up.

tests/integration/test_lists/test-db/l0_b200.yml

tests/integration/test_lists/test-db/l0_h100.yml

tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml

tensorrt-cicd · 2025-08-05T15:10:02Z

PR_Github #14161 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-05T16:25:13Z

PR_Github #14161 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10687 completed with status: 'FAILURE'

Funatiq · 2025-08-06T08:47:00Z

/bot run

tensorrt-cicd · 2025-08-06T08:52:37Z

PR_Github #14279 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-06T14:55:42Z

PR_Github #14279 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10784 completed with status: 'FAILURE'

Funatiq · 2025-08-07T10:47:42Z

/bot run

tensorrt-cicd · 2025-08-07T10:52:55Z

PR_Github #14463 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-07T11:23:34Z

PR_Github #14463 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #10928 completed with status: 'FAILURE'

Funatiq · 2025-08-07T13:14:05Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-08-07T13:19:15Z

PR_Github #14474 [ run ] triggered by Bot

Funatiq · 2025-08-07T15:23:21Z

/bot run

tensorrt-cicd · 2025-08-07T15:28:45Z

PR_Github #14489 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-07T15:28:46Z

PR_Github #14474 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-08-07T21:34:03Z

PR_Github #14489 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10944 completed with status: 'SUCCESS'

Funatiq · 2025-08-08T08:39:10Z

/bot run

tensorrt-cicd · 2025-08-08T08:44:15Z

PR_Github #14593 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-09T02:53:33Z

PR_Github #14593 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11024 completed with status: 'SUCCESS'

tensorrt-cicd · 2025-08-13T14:39:04Z

PR_Github #15152 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 2

🔭 Outside diff range comments (1)

tests/unittest/_torch/executor/test_resource_manager.py (1)
47-56: Avoid re-generating LoRA test data on every class run; short-circuit if artifacts exist

Guard the generation step by checking for the expected .npy artifacts first. This cuts redundant work across re-runs and reduces CI flakiness.

Apply this diff:
         cpp_script_dir = os.path.join(cls.CPP_RESOURCES_DIR, "scripts")

-        # No reason to run this script for each test.
-        # TODO: move this to a fixture that runs once.
+        # No reason to run this script more than once per environment.
+        # Short-circuit if generated artifacts already exist.
+        if os.path.exists(cls.TP1_WEIGHTS_PATH) and os.path.exists(cls.TP1_CONFIG_PATH):
+            return
+
         generate_lora_data_args_tp1 = [
Additionally, consider making the timeout configurable and a bit more generous for busy CI nodes (e.g., via env TRTLLM_TEST_GEN_TIMEOUT, default 300s).

♻️ Duplicate comments (1)

tests/integration/test_lists/test-db/l0_h100.yml (1)
28-32: Quote entire YAML scalars that contain embedded quotes for safer parsing

Wrap the whole list item in quotes and flip inner quotes to single quotes. This avoids edge-cases across YAML parsers.

Apply these diffs:
-  - unittest/_torch/modeling -k "modeling_llama"
+  - "unittest/_torch/modeling -k 'modeling_llama'"
-  - unittest/_torch/modeling -k "modeling_mixtral"
+  - "unittest/_torch/modeling -k 'modeling_mixtral'"
-  - unittest/_torch/modeling -k "modeling_nemotron"
+  - "unittest/_torch/modeling -k 'modeling_nemotron'"
-  - unittest/_torch/modeling -k "modeling_gemma3"
+  - "unittest/_torch/modeling -k 'modeling_gemma3'"
-  - unittest/_torch/modeling -k "modeling_gpt_oss"
+  - "unittest/_torch/modeling -k 'modeling_gpt_oss'"

🧹 Nitpick comments (2)

tensorrt_llm/_torch/flashinfer_utils.py (1)
10-16: Minor API polish: add docstrings and export list; consistent logging

Add a short docstring for get_env_enable_pdl, expose all, and keep messages consistent. Also, consider returning a bool explicitly.

Apply this diff:
-def get_env_enable_pdl():
-    return os.environ.get("TRTLLM_ENABLE_PDL", "0") == "1"
+def get_env_enable_pdl() -> bool:
+    """Return True if Partial Decoding Layer (PDL) is enabled via env.
+
+    Controlled by environment variable: TRTLLM_ENABLE_PDL=1
+    """
+    return os.environ.get("TRTLLM_ENABLE_PDL", "0") == "1"
@@
-ENABLE_PDL = get_env_enable_pdl()
+ENABLE_PDL: bool = get_env_enable_pdl()
 if ENABLE_PDL:
     logger.info("PDL is enabled")
+
+__all__ = ["IS_FLASHINFER_AVAILABLE", "ENABLE_PDL", "get_env_enable_pdl"]
tests/unittest/_torch/sampler/test_trtllm_sampler.py (1)
43-46: Be explicit about top_p rather than passing None

Some SamplingParams implementations treat None differently than omitting the argument. To avoid ambiguity, either omit top_p (use default) or set it explicitly to 1.0.

Apply this minimal change:
-    temperature = 1.0
-    top_p = None
+    temperature = 1.0
+    top_p = 1.0
Alternatively, drop top_p from SamplingParams(...) entirely to inherit the library default.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f27f141 and 59da3ea.

📒 Files selected for processing (16)

.github/CODEOWNERS (1 hunks)
tensorrt_llm/_torch/attention_backend/__init__.py (1 hunks)
tensorrt_llm/_torch/attention_backend/utils.py (1 hunks)
tensorrt_llm/_torch/custom_ops/__init__.py (2 hunks)
tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (2 hunks)
tensorrt_llm/_torch/flashinfer_utils.py (1 hunks)
tensorrt_llm/_torch/modules/rms_norm.py (1 hunks)
tensorrt_llm/_torch/modules/rotary_embedding.py (1 hunks)
tests/integration/defs/.test_durations (3 hunks)
tests/integration/test_lists/test-db/l0_a30.yml (1 hunks)
tests/integration/test_lists/test-db/l0_b200.yml (1 hunks)
tests/integration/test_lists/test-db/l0_gb202.yml (1 hunks)
tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml (1 hunks)
tests/unittest/_torch/executor/test_resource_manager.py (2 hunks)
tests/unittest/_torch/sampler/test_trtllm_sampler.py (2 hunks)

✅ Files skipped from review due to trivial changes (2)

tensorrt_llm/_torch/modules/rotary_embedding.py
.github/CODEOWNERS

🚧 Files skipped from review as they are similar to previous changes (4)

tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml
tests/integration/test_lists/test-db/l0_a30.yml
tests/integration/test_lists/test-db/l0_gb202.yml
tests/integration/test_lists/test-db/l0_b200.yml

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

tensorrt_llm/_torch/flashinfer_utils.py
tensorrt_llm/_torch/attention_backend/__init__.py
tensorrt_llm/_torch/attention_backend/utils.py
tensorrt_llm/_torch/modules/rms_norm.py
tests/unittest/_torch/executor/test_resource_manager.py
tensorrt_llm/_torch/custom_ops/__init__.py
tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
tests/unittest/_torch/sampler/test_trtllm_sampler.py

**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

tensorrt_llm/_torch/flashinfer_utils.py
tensorrt_llm/_torch/attention_backend/__init__.py
tensorrt_llm/_torch/attention_backend/utils.py
tensorrt_llm/_torch/modules/rms_norm.py
tests/unittest/_torch/executor/test_resource_manager.py
tensorrt_llm/_torch/custom_ops/__init__.py
tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
tests/unittest/_torch/sampler/test_trtllm_sampler.py

🧠 Learnings (4)

📚 Learning: 2025-08-01T15:14:45.673Z

Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

tests/unittest/_torch/executor/test_resource_manager.py

📚 Learning: 2025-08-06T13:58:07.506Z

Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

tests/unittest/_torch/executor/test_resource_manager.py

📚 Learning: 2025-08-11T20:09:24.389Z

Learnt from: achartier
PR: NVIDIA/TensorRT-LLM#6763
File: tests/integration/defs/triton_server/conftest.py:16-22
Timestamp: 2025-08-11T20:09:24.389Z
Learning: In the TensorRT-LLM test infrastructure, the team prefers simple, direct solutions (like hard-coding directory traversal counts) over more complex but robust approaches when dealing with stable directory structures. They accept the maintenance cost of updating tests if the layout changes.

Applied to files:

tests/unittest/_torch/executor/test_resource_manager.py

📚 Learning: 2025-07-28T17:06:08.621Z

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

tests/unittest/_torch/sampler/test_trtllm_sampler.py
tests/integration/test_lists/test-db/l0_h100.yml

🪛 Gitleaks (8.27.2)

tests/integration/test_lists/test-db/l0_h100.yml

27-27: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (15)

tests/unittest/_torch/executor/test_resource_manager.py (1)

26-26: Integration paths verified – root_dir bump is safe

Confirmed that the following exist in the repo root:

tests/integration

cpp/tests/resources/scripts/generate_test_lora_weights.py

cpp/tests/resources/data

All required directories and files are present, so the updated root_dir will resolve imports correctly in CI.

tests/integration/test_lists/test-db/l0_h100.yml (2)

33-33: Reintroducing perf bench filter is fine in this band

Explicit spec via node::test syntax is consistent with other entries. No issues.

17-26: All unittest/_torch selectors resolve to valid test directories

I’ve verified that each of the following paths under tests/unittest/torch exists and contains at least one test*.py file, so there are no no-op selectors:

tests/unittest/_torch/attention

tests/unittest/_torch/compilation

tests/unittest/_torch/debugger

tests/unittest/_torch/executor

tests/unittest/_torch/misc

tests/unittest/_torch/modules

tests/unittest/_torch/multimodal

tests/unittest/_torch/sampler

tests/unittest/_torch/speculative

tests/unittest/_torch/thop

tests/unittest/_torch/sampler/test_trtllm_sampler.py (1)

32-32: Test signature simplification is fine

Dropping the JSON-driven fixture in favor of inline config keeps this test focused; no correctness issues spotted.

tests/integration/defs/.test_durations (3)

141-150: Granular durations for unittest/_torch sub-suites: good change

This will improve scheduling and predictability for the split test lists.

250-250: Modeling key path updated: consistent with YAML changes

The move to unittest/torch/modeling -k "modeling_llama" aligns with l0* test lists.

292-292: Attention MLA path fix acknowledged

Duration key updated to new location tests/unittest/_torch/attention/test_attention_mla.py; no issues.

tensorrt_llm/_torch/attention_backend/__init__.py (1)

1-1: LGTM! Import source refactored to centralized module.

The import of IS_FLASHINFER_AVAILABLE has been correctly updated to use the new centralized flashinfer_utils module. This improves maintainability by having a single source of truth for FlashInfer availability checks across the codebase.

tensorrt_llm/_torch/attention_backend/utils.py (1)

4-4: LGTM! Import refactored to use centralized FlashInfer utils.

The import has been correctly updated to source IS_FLASHINFER_AVAILABLE from the new centralized flashinfer_utils module, maintaining consistency with the broader refactoring effort.

tensorrt_llm/_torch/modules/rms_norm.py (1)

22-22: LGTM! Import source updated to centralized FlashInfer utilities.

The import change correctly aligns with the centralization of FlashInfer availability checks. The module now uses the shared flashinfer_utils module as the source of truth for IS_FLASHINFER_AVAILABLE.

tensorrt_llm/_torch/custom_ops/__init__.py (2)

1-2: LGTM! Import updates align with centralization and API expansion.

The changes correctly:

Import IS_FLASHINFER_AVAILABLE from the centralized flashinfer_utils module

Import the new attention custom ops from the modules layer

This maintains consistency with the broader refactoring effort while expanding the public API.

16-17: LGTM! Public API expanded with new attention operations.

The addition of attn_custom_op_inplace and mla_custom_op_inplace to the __all__ list properly exposes these operations as part of the public API.

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (3)

3-3: LGTM! Import updated to use centralized FlashInfer utilities.

The import correctly sources both ENABLE_PDL and IS_FLASHINFER_AVAILABLE from the new centralized flashinfer_utils module, replacing the previous platform-specific availability checks.

8-8: LGTM! Import follows centralized gating pattern.

The import of apply_rope_with_cos_sin_cache_inplace is now properly guarded by the centralized IS_FLASHINFER_AVAILABLE flag, ensuring consistent availability checking across the codebase.

48-48: LGTM! Function call updated to use imported wrapper.

The call has been correctly updated to use the imported apply_rope_with_cos_sin_cache_inplace function instead of the fully qualified module path, which aligns with the new import structure.

tensorrt_llm/_torch/flashinfer_utils.py

tburt-nv

Only reviewing CODEOWNERs here, looks good to me.

tensorrt-cicd · 2025-08-13T20:15:47Z

PR_Github #15152 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11443 completed with status: 'FAILURE'

Funatiq · 2025-08-13T20:20:15Z

/bot run

tensorrt-cicd · 2025-08-13T20:25:49Z

PR_Github #15180 [ run ] triggered by Bot

Funatiq · 2025-08-19T13:12:00Z

/bot run

tensorrt-cicd · 2025-08-19T13:17:42Z

PR_Github #15789 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-19T15:43:53Z

PR_Github #15789 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11867 completed with status: 'FAILURE'

- Moved all unittest files to sub-directories. - Updated test lists in YAML files to include new test paths. This commit improves the testing framework for unittest files, ensuring comprehensive validation of the new implementations. Signed-off-by: Robin Kobus <[email protected]>

Signed-off-by: Robin Kobus <[email protected]>

- Included `attn_custom_op_inplace` and `mla_custom_op_inplace` in the `__all__` list of the custom ops module. - This change fixes unittest/_torch/compilation/test_add_norm.py::test_add_norm_fusion. Signed-off-by: Robin Kobus <[email protected]>

- Added `flashinfer_utils.py` to manage flashinfer availability and PDL environment variable. - Updated imports across multiple modules to source `IS_FLASHINFER_AVAILABLE` and `ENABLE_PDL` from the new utility file. - This refactor improves code organization and maintains clarity regarding flashinfer's integration. - This change fixes circular import issues in tensorrt_llm/_torch/attention_backend/__init__.py. Signed-off-by: Robin Kobus <[email protected]>

Signed-off-by: Robin Kobus <[email protected]>

…ixture - Removed the test_case fixture and its associated JSON loading, simplifying the test setup. - Hardcoded sampling parameters for clarity and consistency in the test execution. Signed-off-by: Robin Kobus <[email protected]>

- Changed paths for `test_custom_ops.py` and `test_autotuner.py` to their new sub-directory locations. - This update ensures proper ownership and visibility for the relocated test files. Signed-off-by: Robin Kobus <[email protected]>

- Added new test duration entries for various unittest cases in the `_torch` module. - Adjusted existing entries for clarity and consistency in naming conventions. - This update enhances the tracking of test performance metrics. Signed-off-by: Robin Kobus <[email protected]>

Funatiq · 2025-08-19T18:52:47Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-08-19T18:57:52Z

PR_Github #15817 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tensorrt_llm/_torch/custom_ops/__init__.py (1)
20-31: Ensure tests guard against missing flashinfer support

No circular imports involving modules.attention were found. However, the test in tests/unittest/_torch/compilation/test_add_norm.py unconditionally does:
from tensorrt_llm._torch.custom_ops import flashinfer_rmsnorm
This will break when IS_FLASHINFER_AVAILABLE=False. Please update the test to skip or guard this import, for example:

• File: tests/unittest/_torch/compilation/test_add_norm.py (line 5)
Replace:
- from tensorrt_llm._torch.custom_ops import flashinfer_rmsnorm
With something like:
+ import pytest
+ pytest.importorskip("tensorrt_llm._torch.custom_ops", reason="requires flashinfer support")
+ from tensorrt_llm._torch.custom_ops import flashinfer_rmsnorm
This ensures tests pass regardless of flashinfer availability.

♻️ Duplicate comments (2)

.github/CODEOWNERS (1)
53-54: Prefer directory-level CODEOWNERS to cover all thop/misc tests and future additions

The path updates are correct after the test move. To avoid having to maintain per-file entries and to ensure all existing and future tests under these subdirectories are owned, replace the two file-level rules with directory-level rules.

Proposed diff:
-/tests/unittest/_torch/thop/test_custom_ops.py @NVIDIA/trt-llm-torch-graph-compiler
-/tests/unittest/_torch/misc/test_autotuner.py @NVIDIA/trt-llm-torch-graph-compiler
+/tests/unittest/_torch/thop/ @NVIDIA/trt-llm-torch-graph-compiler
+/tests/unittest/_torch/misc/ @NVIDIA/trt-llm-torch-graph-compiler
Optional: run this script to confirm there are multiple tests under these dirs (justifying directory-level ownership) and to show current CODEOWNERS matches:
#!/bin/bash
set -euo pipefail

echo "Counting tests under thop/ and misc/:"
thop_count=$(fd -t f -e py . tests/unittest/_torch/thop 2>/dev/null | wc -l | tr -d ' ')
misc_count=$(fd -t f -e py . tests/unittest/_torch/misc 2>/dev/null | wc -l | tr -d ' ')
echo "thop: ${thop_count} files"
echo "misc: ${misc_count} files"
echo

echo "Listing thop tests:"
fd -t f -e py . tests/unittest/_torch/thop 2>/dev/null || true
echo

echo "Listing misc tests:"
fd -t f -e py . tests/unittest/_torch/misc 2>/dev/null || true
echo

echo "Current CODEOWNERS entries for thop/misc:"
rg -nE '^\s*/tests/unittest/_torch/(thop|misc)(/|/.*|\.|$)' .github/CODEOWNERS || true
tests/integration/test_lists/test-db/l0_h100.yml (1)
28-28: Quote entire YAML scalar when embedding quotes in -k filters

Safer across YAML parsers and matches prior guidance. Use outer double-quotes and inner single-quotes for the -k expression.
-  - unittest/_torch/modeling -k "modeling_llama"
+  - "unittest/_torch/modeling -k 'modeling_llama'"
Consider applying the same quoting pattern to the adjacent modeling filter lines for consistency.

🧹 Nitpick comments (4)

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (4)

10-10: Fix wording/typo in comment (“Warp” → “Wrap”).

Minor clarity improvement in a user-facing code comment.

-    # Warp this into custom op since flashinfer didn't warp it properly and we want to avoid graph break between mlp layer for user buffer optimization
+    # Wrap this into a custom op since FlashInfer didn't expose it properly, and we want to avoid graph breaks between MLP layers for user-buffer optimization

19-19: Fix wording and grammar in comment.

Minor clarity/grammar improvements.

-    # Warp this into custom op since flashinfer provides default value for eps with would produce two different graphs depends on the eps value.
+    # Wrap this into a custom op since FlashInfer provides a default value for eps,
+    # which would produce two different graphs depending on the eps value.

30-36: Provide a fake (meta) kernel for fused_add_rmsnorm.

Without a registered fake, torch.compile/meta passes may fail when encountering this custom op. Add a lightweight validator that returns None and checks shapes.

     @torch.library.custom_op("trtllm::flashinfer_fused_add_rmsnorm",
                              mutates_args=("input", "residual"))
     def flashinfer_fused_add_rmsnorm(input: torch.Tensor,
                                      residual: torch.Tensor,
                                      weight: torch.Tensor, eps: float) -> None:
         fused_add_rmsnorm(input, residual, weight, eps, enable_pdl=ENABLE_PDL)
+
+    @flashinfer_fused_add_rmsnorm.register_fake
+    def _(input: torch.Tensor,
+          residual: torch.Tensor,
+          weight: torch.Tensor,
+          eps: float) -> None:
+        # Meta: validate shapes/dtypes; op is in-place and returns None.
+        assert input.shape == residual.shape, "input and residual must have the same shape"
+        assert weight.dim() == 1 and weight.numel() == input.shape[-1], \
+            "weight must be 1D and match the last dimension of input"
+        return

57-66: Strengthen the fake (meta) kernel for ROPE in-place op.

Add shape checks to catch misuses during tracing/compile; keep return type None for in-place semantics.

-    @flashinfer_apply_rope_with_cos_sin_cache_inplace.register_fake
-    def _(
-        positions: torch.Tensor,
-        query: torch.Tensor,
-        key: torch.Tensor,
-        head_size: int,
-        cos_sin_cache: torch.Tensor,
-        is_neox: bool = True,
-    ):
-        return
+    @flashinfer_apply_rope_with_cos_sin_cache_inplace.register_fake
+    def _(
+        positions: torch.Tensor,
+        query: torch.Tensor,
+        key: torch.Tensor,
+        head_size: int,
+        cos_sin_cache: torch.Tensor,
+        is_neox: bool = True,
+    ) -> None:
+        # Meta: validate shapes; op is in-place on query/key.
+        assert query.shape == key.shape, "query and key must have the same shape"
+        assert cos_sin_cache.dim() >= 2, "cos_sin_cache expected to have at least 2 dims"
+        # positions is typically [B, T] or [T]; rely on runtime for exact layout.
+        return

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c691c0b and 52eee4b.

📒 Files selected for processing (16)

.github/CODEOWNERS (1 hunks)
tensorrt_llm/_torch/attention_backend/__init__.py (1 hunks)
tensorrt_llm/_torch/attention_backend/utils.py (1 hunks)
tensorrt_llm/_torch/custom_ops/__init__.py (2 hunks)
tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (2 hunks)
tensorrt_llm/_torch/flashinfer_utils.py (1 hunks)
tensorrt_llm/_torch/modules/rms_norm.py (1 hunks)
tensorrt_llm/_torch/modules/rotary_embedding.py (1 hunks)
tests/integration/defs/.test_durations (3 hunks)
tests/integration/test_lists/test-db/l0_a30.yml (1 hunks)
tests/integration/test_lists/test-db/l0_b200.yml (1 hunks)
tests/integration/test_lists/test-db/l0_gb202.yml (1 hunks)
tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml (1 hunks)
tests/unittest/_torch/executor/test_resource_manager.py (2 hunks)
tests/unittest/_torch/sampler/test_trtllm_sampler.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (12)

tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml
tensorrt_llm/_torch/attention_backend/init.py
tensorrt_llm/_torch/flashinfer_utils.py
tensorrt_llm/_torch/attention_backend/utils.py
tests/integration/test_lists/test-db/l0_a30.yml
tests/unittest/_torch/executor/test_resource_manager.py
tensorrt_llm/_torch/modules/rms_norm.py
tests/integration/defs/.test_durations
tests/integration/test_lists/test-db/l0_b200.yml
tensorrt_llm/_torch/modules/rotary_embedding.py
tests/integration/test_lists/test-db/l0_gb202.yml
tests/unittest/_torch/sampler/test_trtllm_sampler.py

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
tensorrt_llm/_torch/custom_ops/__init__.py

**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py
tensorrt_llm/_torch/custom_ops/__init__.py

🧠 Learnings (4)

📚 Learning: 2025-07-28T17:06:08.621Z

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

.github/CODEOWNERS
tests/integration/test_lists/test-db/l0_h100.yml

📚 Learning: 2025-08-06T13:58:07.506Z

Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

.github/CODEOWNERS

📚 Learning: 2025-08-11T20:09:24.389Z

Learnt from: achartier
PR: NVIDIA/TensorRT-LLM#6763
File: tests/integration/defs/triton_server/conftest.py:16-22
Timestamp: 2025-08-11T20:09:24.389Z
Learning: In the TensorRT-LLM test infrastructure, the team prefers simple, direct solutions (like hard-coding directory traversal counts) over more complex but robust approaches when dealing with stable directory structures. They accept the maintenance cost of updating tests if the layout changes.

Applied to files:

.github/CODEOWNERS

📚 Learning: 2025-08-14T15:43:23.067Z

Learnt from: MatthiasKohl
PR: NVIDIA/TensorRT-LLM#6904
File: tensorrt_llm/_torch/attention_backend/trtllm.py:259-262
Timestamp: 2025-08-14T15:43:23.067Z
Learning: In TensorRT-LLM's attention backend, tensor parameters in the plan() method are assigned directly without validation (dtype, device, contiguity checks). This maintains consistency across all tensor inputs and follows the pattern of trusting callers to provide correctly formatted tensors.

Applied to files:

.github/CODEOWNERS

🧬 Code Graph Analysis (2)

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (1)

tensorrt_llm/_torch/modules/swiglu.py (1)

silu_and_mul (54-79)

tensorrt_llm/_torch/custom_ops/__init__.py (1)

tensorrt_llm/_torch/modules/attention.py (2)

attn_custom_op_inplace (72-98)

mla_custom_op_inplace (534-541)

🪛 Gitleaks (8.27.2)

tests/integration/test_lists/test-db/l0_h100.yml

27-27: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (6)

tests/integration/test_lists/test-db/l0_h100.yml (2)

17-26: Verified: PyTorch sub-directory buckets exist

The sanity check confirms that all directories listed in tests/integration/test_lists/test-db/l0_h100.yml (lines 17–26) under tests/unittest/_torch/ are present. CI path-not-found failures should be resolved. Approved.

33-33: Verified: Test path and symbol exist

The test file and function have been confirmed:

tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py::test_trtllm_bench_backend_comparison

No further action required.

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py (3)

5-9: Centralized FlashInfer gating looks good.

Import-time side effects are minimized and the availability check is now unified via flashinfer_utils. This will simplify maintenance and avoid noisy ImportErrors on unsupported platforms.

37-55: ROPE wrapper: interface alignment looks correct.

Signature and mutates_args align with the underlying in-place behavior; passthrough of is_neox and head_size is consistent with typical FlashInfer ROPE APIs.

41-47: is_neox default is never relied upon—call sites pass it explicitly
The only invocation of flashinfer_apply_rope_with_cos_sin_cache_inplace in rotary_embedding.py supplies self.is_neox as the sixth argument, so the wrapper’s default (True) is never used. No positional shifts or default mismatches will occur.

tensorrt_llm/_torch/custom_ops/__init__.py (1)

16-17: Good addition to public API.

Exposing attn_custom_op_inplace and mla_custom_op_inplace here improves discoverability and keeps users on the stable import path.

tensorrt_llm/_torch/custom_ops/__init__.py

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py

tensorrt-cicd · 2025-08-20T09:42:06Z

PR_Github #15817 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11889 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

Signed-off-by: Robin Kobus <[email protected]> Signed-off-by: Yuxin <[email protected]>

Funatiq requested a review from omera-nv August 5, 2025 15:02

coderabbitai bot reviewed Aug 5, 2025

View reviewed changes

tests/integration/test_lists/test-db/l0_b200.yml Show resolved Hide resolved

tests/integration/test_lists/test-db/l0_h100.yml Show resolved Hide resolved

tests/integration/test_lists/test-db/l0_rtx_pro_6000.yml Show resolved Hide resolved

Funatiq force-pushed the dev/ci/restructure_unit_tests branch from 1c7edd3 to c1e67e4 Compare August 6, 2025 08:46

Funatiq force-pushed the dev/ci/restructure_unit_tests branch from c1e67e4 to 5072b39 Compare August 7, 2025 10:47

Funatiq force-pushed the dev/ci/restructure_unit_tests branch from 45208ff to 1be2649 Compare August 7, 2025 15:23

Funatiq force-pushed the dev/ci/restructure_unit_tests branch from 1be2649 to ccb6a05 Compare August 8, 2025 08:38

Funatiq force-pushed the dev/ci/restructure_unit_tests branch from ccb6a05 to a3a1d2a Compare August 11, 2025 08:22

Funatiq marked this pull request as ready for review August 11, 2025 08:25

Funatiq requested review from a team as code owners August 11, 2025 08:25

coderabbitai bot reviewed Aug 13, 2025

View reviewed changes

tensorrt_llm/_torch/flashinfer_utils.py Show resolved Hide resolved

tensorrt_llm/_torch/flashinfer_utils.py Show resolved Hide resolved

chzblych approved these changes Aug 13, 2025

View reviewed changes

tburt-nv approved these changes Aug 13, 2025

View reviewed changes

liji-nv approved these changes Aug 14, 2025

View reviewed changes

Funatiq force-pushed the dev/ci/restructure_unit_tests branch from 59da3ea to c691c0b Compare August 19, 2025 13:11

Funatiq added 8 commits August 19, 2025 18:52

ci: remove duplicated tests

679654c

Signed-off-by: Robin Kobus <[email protected]>

fixup! [refactor] move unittests to sub-directories

ae9261c

Signed-off-by: Robin Kobus <[email protected]>

Funatiq force-pushed the dev/ci/restructure_unit_tests branch from c691c0b to 52eee4b Compare August 19, 2025 18:52

coderabbitai bot reviewed Aug 19, 2025

View reviewed changes

tensorrt_llm/_torch/custom_ops/__init__.py Show resolved Hide resolved

tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py Show resolved Hide resolved

Funatiq enabled auto-merge (squash) August 20, 2025 06:54

Funatiq merged commit b95cab2 into NVIDIA:main Aug 20, 2025
5 checks passed

Funatiq deleted the dev/ci/restructure_unit_tests branch August 20, 2025 10:12

zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Aug 21, 2025

[None][ci] move unittests to sub-directories (NVIDIA#6635)

d271157

Signed-off-by: Robin Kobus <[email protected]> Signed-off-by: Yuxin <[email protected]>

coderabbitai bot mentioned this pull request Aug 26, 2025

[TRTLLM-7207][feat] Chat completions API for gpt-oss #7261

Merged

[None][ci] move unittests to sub-directories #6635

[None][ci] move unittests to sub-directories #6635

Uh oh!

Conversation

Funatiq commented Aug 5, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

GitHub Bot Help

kill

skip

reuse-pipeline

Uh oh!

coderabbitai bot commented Aug 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

Status, Documentation and Community

Uh oh!

Funatiq commented Aug 5, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tensorrt-cicd commented Aug 5, 2025

Uh oh!

tensorrt-cicd commented Aug 5, 2025

Uh oh!

Funatiq commented Aug 6, 2025

Uh oh!

tensorrt-cicd commented Aug 6, 2025

Uh oh!

tensorrt-cicd commented Aug 6, 2025

Uh oh!

Funatiq commented Aug 7, 2025

Uh oh!

tensorrt-cicd commented Aug 7, 2025

Uh oh!

tensorrt-cicd commented Aug 7, 2025

Uh oh!

Funatiq commented Aug 7, 2025

Uh oh!

tensorrt-cicd commented Aug 7, 2025

Uh oh!

Funatiq commented Aug 7, 2025

Uh oh!

tensorrt-cicd commented Aug 7, 2025

Uh oh!

tensorrt-cicd commented Aug 7, 2025

Uh oh!

tensorrt-cicd commented Aug 7, 2025

Uh oh!

Funatiq commented Aug 8, 2025

Uh oh!

tensorrt-cicd commented Aug 8, 2025

Uh oh!

tensorrt-cicd commented Aug 9, 2025

Uh oh!

tensorrt-cicd commented Aug 13, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tburt-nv left a comment

Choose a reason for hiding this comment

Funatiq commented Aug 5, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 5, 2025 •

edited

Loading