[None][chore] Use llm args in create_py_executor #7239
base: main
Conversation
📝 Walkthrough
Executor creation was refactored to accept LLM-args objects (BaseLlmArgs / TorchLlmArgs / AutoDeploy LlmArgs) across the worker, AutoDeploy, and PyTorch paths; max_seq_len is computed from the model engine and propagated through sampler creation into PyExecutor; several public signatures were updated accordingly.
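To make the new flow concrete, here is a small, self-contained sketch of the dispatch described above; the classes and helpers are illustrative stand-ins, not TensorRT-LLM's actual types, and only the backend names and the llm_args/max_seq_len plumbing are taken from this PR.

from dataclasses import dataclass

@dataclass
class ToyLlmArgs:
    """Illustrative stand-in for BaseLlmArgs/TorchLlmArgs; only the fields used below."""
    backend: str          # "pytorch" or "_autodeploy"
    max_seq_len: int

def create_py_executor_toy(llm_args: ToyLlmArgs) -> dict:
    # Stand-in for py_executor_creator.create_py_executor: executor settings are
    # derived from llm_args instead of a separately constructed executor_config.
    return {"kind": "PyExecutor", "max_seq_len": llm_args.max_seq_len}

def create_autodeploy_executor_toy(ad_config: ToyLlmArgs) -> dict:
    # Stand-in for the AutoDeploy path, which receives the AD config object directly.
    return {"kind": "AutoDeployExecutor", "max_seq_len": ad_config.max_seq_len}

def worker_create_executor(llm_args: ToyLlmArgs) -> dict:
    # Mirrors the worker's backend dispatch shown in the first diagram below.
    if llm_args.backend == "pytorch":
        return create_py_executor_toy(llm_args)
    if llm_args.backend == "_autodeploy":
        return create_autodeploy_executor_toy(llm_args)
    raise ValueError(f"Unsupported backend config: {llm_args.backend}")

print(worker_create_executor(ToyLlmArgs(backend="pytorch", max_seq_len=4096)))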
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Worker as GenerationExecutorWorker
participant Creator as py_executor_creator.create_py_executor
participant Util as _util.create_py_executor_instance
participant PyEx as PyExecutor
participant AD as AutoDeploy
Client->>Worker: start(llm_args: BaseLlmArgs, ...)
alt llm_args.backend == "pytorch"
Worker->>Creator: create_py_executor(llm_args, checkpoint_dir, tokenizer, lora_config, logits_post_processor_config, parallel_config)
note right of Creator #E6F4EA: build executor_config from llm_args\ncompute max_seq_len
Creator->>Util: create_py_executor_instance(..., max_seq_len)
Util->>PyEx: __init__(..., max_seq_len)
PyEx-->>Creator: PyExecutor instance
Creator-->>Worker: PyExecutor
else llm_args.backend == "autodeploy"
Worker->>Worker: ad_config = llm_args.to_ad_config()
Worker->>AD: create_autodeploy_executor(ad_config)
note right of AD #FFF4E6: uses ad_config directly (signature change)
AD-->>Worker: Executor
end
Worker-->>Client: ready
sequenceDiagram
autonumber
participant Engine as ModelEngine
participant Util as _util.instantiate_sampler
participant Sampler as TorchSampler
Engine->>Util: instantiate_sampler(engine, pytorch_backend_config, ...)
note right of Util #E8F0FF: create_torch_sampler_args(max_seq_len=engine.max_seq_len, enable_mixed_sampler=cfg.enable_mixed_sampler)
Util->>Sampler: Args(max_seq_len, ...)
Sampler-->>Util: Sampler instance
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (1)
1-1: Missing NVIDIA copyright header. Per repo guidelines, prepend the NVIDIA copyright header (2025) to all Python sources.
I can generate a standardized header if you confirm the preferred form.
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
1-1: Missing NVIDIA copyright header. Please add the 2025 NVIDIA copyright header at the top.
tensorrt_llm/executor/executor.py (1)
1-1: Missing NVIDIA copyright header. Please prepend the standard 2025 NVIDIA header.
tensorrt_llm/_torch/pyexecutor/_util.py (1)
1-1: Missing NVIDIA copyright header. Please add the 2025 NVIDIA header.
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
1-1: Missing NVIDIA copyright header. Please add the 2025 NVIDIA header.
tensorrt_llm/executor/worker.py (1)
1-1: Add NVIDIA copyright header (2025) per repo guidelines. This source file is missing the required NVIDIA copyright header.
Apply at the very top:
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🧹 Nitpick comments (15)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (4)
261-266
: Update docstring to match new signature (no checkpoint_dir). Docstring still mentions "checkpoint directory" even though the function now only accepts ad_config: LlmArgs.
Apply this diff to align the docstring:
-def create_autodeploy_executor(ad_config: LlmArgs):
-    """Create an AutoDeploy executor from the given configuration and checkpoint directory.
-
-    This is the entrypoint API to the _autodeploy backend.
-    """
+def create_autodeploy_executor(ad_config: LlmArgs):
+    """Create an AutoDeploy executor from the given LlmArgs configuration.
+
+    This is the entrypoint API to the _autodeploy backend.
+    """
276-279
: Assertion message is stale; prefer explicit type check error. The message references "pytorch_backend_config" but the parameter is now ad_config. Consider raising a TypeError for clarity.
Apply this diff:
-    msg = "pytorch_backend_config must be an AD LlmArgs object"
-    assert isinstance(ad_config, LlmArgs), msg
+    if not isinstance(ad_config, LlmArgs):
+        raise TypeError("ad_config must be an AutoDeploy LlmArgs object")
330-343
: Propagate garbage_collection_gen0_threshold and max_seq_len to PyExecutor. The PyExecutor now accepts max_seq_len and garbage_collection_gen0_threshold. In the Torch path (_util.create_py_executor_instance), both are passed; mirror that here for consistency and to avoid silently using defaults.
Apply this diff:
     py_executor = PyExecutor(
         resource_manager,
         scheduler,
         model_engine=engine,
         sampler=sampler,
         dist=mpi_dist,
         max_num_sequences=max_num_sequences,
         disable_overlap_scheduler=ad_config.disable_overlap_scheduler,
         max_input_len=ad_config.max_input_len,
         max_batch_size=ad_config.max_batch_size,
         max_draft_len=max_draft_len,
         max_beam_width=ad_config.max_beam_width,
+        garbage_collection_gen0_threshold=ad_config.garbage_collection_gen0_threshold,
+        max_seq_len=ad_config.max_seq_len,
     )
267-275
: Optional: Avoid calling get_free_port() on every rank. Currently, all ranks call dist.get_free_port() and then broadcast. Minor improvement: only rank 0 should acquire the port; other ranks pass a placeholder to broadcast.
Example:
port = dist.get_free_port() if rank == 0 else None
port = mpi_dist.broadcast(port, root=0)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
156-158
: New max_seq_len argument: document intent and usage. The constructor stores max_seq_len but does not reference it elsewhere. If this is intentionally for external consumers (e.g., tooling/tests), add a short docstring note; otherwise, consider wiring it to where sequence-length decisions are made in executor code.
Would you like me to draft a concise constructor docstring including the new parameter?
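For instance, a short note along these lines might be enough (wording is only a suggestion; the class below is a toy stand-in for PyExecutor, not the real constructor):

class _PyExecutorDocSketch:
    def __init__(self, max_seq_len: int) -> None:
        """Toy stand-in used only to illustrate the suggested docstring wording.

        Args:
            max_seq_len: Maximum sequence length (prompt plus generated tokens) the
                executor was configured for. Stored for external consumers such as
                tooling and tests; the executor loop itself does not read it yet.
        """
        self.max_seq_len = max_seq_len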
tensorrt_llm/_torch/pyexecutor/_util.py (2)
1-20
: Duplicate/ambiguous imports (minor cleanup). ModelConfig is imported twice (tensorrt_llm._torch.model_config and ..model_config). Also, is_mla is imported from both ._util and .config_utils in other files. Consider deduplicating to avoid confusion.
592-601
: Confirm max_seq_len semantics in instantiate_sampler
Audit of the two sampler-instantiation paths reveals a divergence:
- In the AutoDeploy shim (tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py:321–323), the sampler is created with max_seq_len=ad_config.max_seq_len.
- In the PyExecutor util (tensorrt_llm/_torch/pyexecutor/_util.py:592–601), it uses max_seq_len=engine.max_seq_len.
Since create_py_executor adjusts executor_config.max_seq_len (adding 1 for overlap scheduling and max_draft_len for speculative decoding), you'll want to verify whether the sampler should use that inflated limit rather than the raw engine.max_seq_len. If the sampler must account for those extra tokens, update the util path to pass executor_config.max_seq_len; otherwise, leave it using engine.max_seq_len as originally intended.
• File: tensorrt_llm/_torch/pyexecutor/_util.py, lines 592–601
• Suggested diff:
-    sampler_args = create_torch_sampler_args(
-        executor_config,
-        mapping,
-        max_seq_len=engine.max_seq_len,
+    sampler_args = create_torch_sampler_args(
+        executor_config,
+        mapping,
+        max_seq_len=executor_config.max_seq_len,
         enable_mixed_sampler=pytorch_backend_config.enable_mixed_sampler)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2)
210-217
: Public API shift to llm_args is clear; consider adding a short docstring. Given the signature change, a brief docstring noting that executor_config is now derived from llm_args would help downstream users.
I can draft a Google-style docstring if helpful.
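A possible Google-style sketch, assuming the parameter list discussed in this review (the stub below is illustrative and not the PR's implementation):

def create_py_executor_doc_sketch(llm_args, checkpoint_dir=None, tokenizer=None,
                                  lora_config=None,
                                  logits_post_processor_config=None,
                                  parallel_config=None):
    """Create a PyExecutor from LLM arguments.

    The executor configuration is derived internally from llm_args; callers no
    longer build or pass an ExecutorConfig themselves.

    Args:
        llm_args: Torch LLM arguments that drive executor construction.
        checkpoint_dir: Optional model checkpoint directory.
        tokenizer: Optional tokenizer instance.
        lora_config: Optional LoRA configuration.
        logits_post_processor_config: Optional logits post-processor settings.
        parallel_config: Optional parallelism settings.

    Returns:
        A configured PyExecutor instance.
    """
    raise NotImplementedError("Illustrative stub; see py_executor_creator.create_py_executor.")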
13-20
: Imports: ensure minimal necessary and consistent types. Adding LogitsPostProcessorConfig and ParallelConfig is fine. Minor nit: in this file, is_mla is imported twice (from ._util and .config_utils) elsewhere; dedupe when convenient.
tensorrt_llm/executor/worker.py (6)
93-99
: Type hint for engine contradicts runtime handling of list. Runtime accepts a list of engines (per-rank), but the type hint is Union[Path, Engine]. Consider reflecting this in the signature to avoid confusion for callers and static analyzers.
Apply:
-    engine: Union[Path, Engine],
+    engine: Union[Path, Engine, List[Union[Path, Engine]]],
96-104
: Guard against zero CUDA devices to avoid ZeroDivisionError. device_id modulo torch.cuda.device_count() will crash on systems without visible GPUs (e.g., CI, CPU-only dev nodes).
Apply:
-        def _get_comm_ranks_device_id():
-            device_id = self.global_rank % torch.cuda.device_count()
+        def _get_comm_ranks_device_id():
+            num_devices = torch.cuda.device_count()
+            if num_devices <= 0:
+                raise RuntimeError("CUDA device not found. Ensure GPUs are visible before initializing executor.")
+            device_id = self.global_rank % num_devices
             torch.cuda.set_device(device_id)
107-115
: Replace asserts with explicit exceptions (production code path) and fix long-line lint. Asserts may be stripped with -O and are not ideal for input validation; also flagged by E501 later. Use explicit checks and raise a clear exception.
Apply:
-            assert hasattr(
-                self.llm_args, "backend"
-            ), "llm_args should be with backend in _create_py_executor"
+            if not hasattr(self.llm_args, "backend"):
+                raise ValueError("llm_args must define a 'backend' attribute in _create_py_executor()")
@@
-            else:
-                raise ValueError(
-                    f"Unsupported backend config: {self.llm_args.backend}")
+            else:
+                raise ValueError(f"Unsupported backend config: {self.llm_args.backend}")
Also applies to: 131-135
126-133
: Type check for AutoDeploy args should be explicit exception (not assert). Avoid assert for runtime type-checking; raise TypeError with a concise message.
Apply:
-            assert isinstance(self.llm_args, ADLlmArgs)
-            args["ad_config"] = self.llm_args.get_pytorch_backend_config()
+            if not isinstance(self.llm_args, ADLlmArgs):
+                raise TypeError("For backend '_autodeploy', llm_args must be an instance of AutoDeploy LlmArgs.")
+            args["ad_config"] = self.llm_args.get_pytorch_backend_config()
653-666
: Use explicit exception instead of assert during shutdown; also resolves long line lint. Asserts may be stripped at runtime; make the contract explicit and fix E501 on line 654.
Apply:
-        if self.llm_args is not None:
-            assert self._executor_config is None, "An empty executor_config is expected in shutdown when LLM arguments are defined."
+        if self.llm_args is not None:
+            if self._executor_config is not None:
+                raise RuntimeError("executor_config must be None during shutdown when LLM arguments are defined.")
             if (self.llm_args.backend == "pytorch"
                     and hasattr(self, "checkpoint_loader")
                     and self.checkpoint_loader is not None):
                 self.checkpoint_loader.cleanup()
                 self.checkpoint_loader = None
694-712
: Consider documenting new parameters (llm_args) in public interfaces. worker_main is a public entrypoint. Add a concise Google-style docstring describing llm_args expectations and backend behavior.
Example:
 def worker_main(
@@
-    llm_args: Optional[BaseLlmArgs] = None,
+    llm_args: Optional[BaseLlmArgs] = None,
 ) -> None:
+    """Worker process main loop.
+
+    Args:
+        engine: Engine path or Engine object for TRT-LLM runtime.
+        worker_queues: IPC addresses for worker-proxy communication.
+        log_level: Logging level string for the leader process.
+        executor_config: Legacy executor config; must be None when llm_args is provided.
+        batched_logits_processor: Optional batched logits processor.
+        worker_cls: Worker class to instantiate.
+        tracer_init_kwargs: Optional VizTracer initialization kwargs.
+        _torch_model_class_mapping: Optional mapping to augment model registry.
+        postproc_worker_config: Post-processing worker settings.
+        ready_signal: Message to send on successful init.
+        is_llm_executor: True if this worker handles request submission.
+        lora_config: LoRA configuration when enabling LoRA.
+        hf_model_dir: HuggingFace model directory for checkpoints.
+        tokenizer: Tokenizer instance.
+        llm_args: BaseLlmArgs-derived configuration. When provided, selects the backend
+            via llm_args.backend ('pytorch' or '_autodeploy') and must be used instead
+            of executor_config.
+    """
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (4 hunks)
tensorrt_llm/executor/executor.py (2 hunks)
tensorrt_llm/executor/worker.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py
: Code must target Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Preserve module namespaces when importing; import modules/packages and access members via the module (e.g., from package.subpackage import foo; foo.SomeClass())
Python file names should be snake_case
Python class names should be PascalCase
Python functions/methods and local variables should be snake_case; variables beginning with a number should be prefixed with k_ (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE prefixed with G_ (e.g., G_MY_GLOBAL); constants should be UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes; initialize all externally visible members in __init__
Prefer docstrings for interfaces used outside a file; comments should be reserved for in-function or file-local interfaces
Use Google-style docstrings for classes and functions; attributes and variables may be documented inline with trailing string literals
Avoid reflection when simpler, explicit code suffices (e.g., avoid dict(**locals()) patterns)
In try/except, catch the narrowest exceptions possible
For duck-typing patterns, keep the try body minimal and move logic to else to avoid masking unrelated failures
Files:
tensorrt_llm/executor/executor.py
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
tensorrt_llm/_torch/pyexecutor/py_executor.py
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
tensorrt_llm/executor/worker.py
tensorrt_llm/_torch/pyexecutor/_util.py
**/*.{c,cc,cpp,cxx,h,hh,hpp,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA copyright header (current year) to all source files (.cpp, .h, .cu, .py, etc.)
Files:
tensorrt_llm/executor/executor.py
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
tensorrt_llm/_torch/pyexecutor/py_executor.py
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
tensorrt_llm/executor/worker.py
tensorrt_llm/_torch/pyexecutor/_util.py
🧠 Learnings (1)
📚 Learning: 2025-08-20T06:56:02.889Z
Learnt from: eopXD
PR: NVIDIA/TensorRT-LLM#6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:577-579
Timestamp: 2025-08-20T06:56:02.889Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, maxSequenceLength is now enforced as a non-optional argument in the BlockManager constructor, so concerns about std::nullopt defaulting to 0 are not applicable. When windowSize > maxSequenceLength, a warning should be added instead of handling optional parameter cases.
Applied to files:
tensorrt_llm/_torch/pyexecutor/py_executor.py
tensorrt_llm/_torch/pyexecutor/_util.py
🧬 Code graph analysis (5)
tensorrt_llm/executor/executor.py (1)
  tensorrt_llm/llmapi/llm_args.py (1)
    BaseLlmArgs (1136-1913)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (1)
  tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
    LlmArgs (237-351)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
  tensorrt_llm/_torch/attention_backend/trtllm.py (2)
    max_seq_len (558-568)
    max_seq_len (571-575)
tensorrt_llm/executor/worker.py (5)
  tensorrt_llm/llmapi/llm_args.py (5)
    BaseLlmArgs (1136-1913)
    TorchLlmArgs (2122-2544)
    get_pytorch_backend_config (2479-2544)
    parallel_config (1356-1357)
    to_mapping (282-294)
  tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
    create_py_executor (210-493)
  tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
    LlmArgs (237-351)
  tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (1)
    create_autodeploy_executor (261-344)
  tensorrt_llm/_torch/pyexecutor/config.py (1)
    _construct_checkpoint_loader (175-197)
tensorrt_llm/_torch/pyexecutor/_util.py (2)
  tensorrt_llm/_torch/attention_backend/trtllm.py (2)
    max_seq_len (558-568)
    max_seq_len (571-575)
  tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
    PyExecutor (138-1784)
🪛 Ruff (0.12.2)
tensorrt_llm/executor/worker.py
494-494: Line too long (140 > 120)
(E501)
654-654: Line too long (136 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (17)
tensorrt_llm/executor/executor.py (1)
24-24
: Verified downstream compatibility for BaseLlmArgs
- In executor.py, the create_executor factory (lines 362–364) now takes llm_args: Optional[BaseLlmArgs] as expected.
- In worker.py, GenerationExecutorWorker's constructor signature (lines 64–67) has been updated to accept BaseLlmArgs.
- In proxy.py, all accesses to worker_kwargs["llm_args"] remain valid against the BaseLlmArgs API (e.g. garbage_collection_gen0_threshold).
No further changes required; everything downstream already supports BaseLlmArgs.
tensorrt_llm/_torch/pyexecutor/_util.py (3)
426-428
: Forwarding max_seq_len into create_py_executor_instance is correct. Good addition to keep PyExecutor aware of the configured sequence length; aligns with the new constructor contract.
574-576
: PyExecutor construction passes new knobs as intended. garbage_collection_gen0_threshold and max_seq_len are forwarded, consistent with PR goals. LGTM.
578-590
: All create_torch_sampler_args calls use keyword-only arguments
I've confirmed that the only invocation of create_torch_sampler_args sits at line 596 in tensorrt_llm/_torch/pyexecutor/_util.py and explicitly passes both max_seq_len= and enable_mixed_sampler=. There are no other call sites in the repo, so this API tightening is safe and won't break positional usages.
- Call site: _util.py:596 uses keyword args for both parameters
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (4)
219-224
: Wiring logits_post_processor_config and parallel_config through executor_config looks correct. No functional issues spotted.
299-313
: max_seq_len plumbing: matches design described in PR summary. The computed (possibly inflated) max_seq_len is stored in executor_config and used downstream. Good.
443-444
: Pass-through of computed max_seq_len to create_py_executor_instance is correct. Both the first and the post-estimation code paths correctly forward max_seq_len.
Also applies to: 487-488
210-217
: Verify all callers use the new create_py_executor signature
The signature of create_py_executor in tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (now taking llm_args first and no longer accepting executor_config) has changed. Please audit and update every call site:
• tensorrt_llm/executor/worker.py
– Implementation of _create_py_executor (around line 105): ensure it calls create_py_executor(llm_args, …) rather than passing an executor_config directly.
– Invocation at line 175: replace any use of _create_py_executor() that relies on the old parameter order.
• Search for any other calls to create_py_executor( across the codebase to confirm none still pass executor_config first.
tensorrt_llm/executor/worker.py (9)
21-21
: Import set looks correct for new LLM-args pathway. Importing BaseLlmArgs and TorchLlmArgs is appropriate for the mixed backend code paths; PybindMirror is used later for lookahead_config. No action needed.
66-66
: Constructor now accepts BaseLlmArgs (good generalization). Broader type improves reuse across backends and keeps worker decoupled from Torch-specific args.
105-153
: LLM-args-based executor creation is well structured. Clean backend dispatch to PyTorch and AutoDeploy, proper propagation of tokenizer/lora/logits config, and capturing mapping/checkpoint_loader for downstream use all look solid.
137-146
: Nice: Prepare mapping/checkpoint_loader and capture max_seq_len from executor. Storing mapping and harmonizing max_seq_len with model engine avoids duplicated inference elsewhere. Good forward-compat hook.
Also applies to: 148-152
175-177
: Engine selection logic is clear and preserves legacy path. Using llm_args gate to choose PyExecutor vs TRT C++ executor keeps behavior consistent.
199-211
: LoRA integration for PyTorch backend is correct. Fetching PEFT cache manager from resource manager and wiring through LoraManager is appropriate. Asserts presence of lora_model_config, which is fine given guarded path.
522-528
: Forwarding llm_args into _deduce_max_tokens is correct. Ensures new pathway drives token deduction. Good.
494-494
: Resolve Ruff E501 (line too long). Long assert/message lines are flagged. The diffs above replace asserts and wrap messages, addressing these lint errors.
Also applies to: 654-654
695-713
: Review Python version requirements and PEP 604 union usage
It looks like you've applied PEP 604 unions (e.g. Path | Engine) in many places across the repo. That syntax is only valid under Python 3.10+ grammar, but the project's setup still indicates Python 3.8+ support. Please verify which path you intend and adjust accordingly:
• If you are now targeting Python 3.10 or newer, update your python_requires in setup.py to at least ">=3.10" so the use of A | B parses correctly.
• If you must maintain Python 3.8 compatibility, you'll need to replace all occurrences of PEP 604 unions with typing.Union[...] (or Optional[...]) project-wide; fixing only the one signature in worker.py won't be sufficient.
In tensorrt_llm/executor/worker.py (lines 695–713), for example, you could apply:
-    def worker_main(
-        engine: Path | Engine,
+    def worker_main(
+        engine: Union[Path, Engine],
         worker_queues: WorkerCommIpcAddrs,
         log_level: str,
         executor_config: Optional[tllm.ExecutorConfig] = None,
         …
     ) -> None:
         mpi_comm().barrier()
But please confirm your overall strategy: either bump the minimum Python version in setup.py or convert all PEP 604 unions across the codebase.
/bot run |
PR_Github #16515 [ run ] triggered by Bot |
PR_Github #16515 [ run ] completed with state |
37a6357
to
1c25084
Compare
/bot run |
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tensorrt_llm/executor/worker.py (3)
1-1
: Add NVIDIA copyright header (2025). Required by repo guidelines; missing in this file.
Apply:
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
56-67
: Validate mutually exclusive inputs early: llm_args vs executor_config. Fail fast if both are provided; avoids later asserts and ambiguous paths.
 def __init__(
     self,
     engine: Union[Path, Engine],
     executor_config: Optional[tllm.ExecutorConfig] = None,
@@
     tokenizer: Optional[TokenizerBase] = None,
     llm_args: Optional[BaseLlmArgs] = None,
 ) -> None:
+    if llm_args is not None and executor_config is not None:
+        raise ValueError("Pass either llm_args or executor_config, not both.")
96-104
: Guard against zero CUDA devices in rank/device setup
Check torch.cuda.device_count() before using it in a modulo to avoid a ZeroDivisionError, and raise a clear error when no GPUs are available:
 def _get_comm_ranks_device_id():
-    device_id = self.global_rank % torch.cuda.device_count()
+    device_count = torch.cuda.device_count()
+    if device_count == 0:
+        raise RuntimeError("No CUDA devices found; a GPU is required to create the executor.")
+    device_id = self.global_rank % device_count
     torch.cuda.set_device(device_id)
     # Make sure C++ executor would use same devices/ranks as py_executor
     global_rank = global_mpi_rank()
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2)
1-1
: Add NVIDIA copyright header (2025). Required by repo guidelines; missing in this file.
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
246-247
: Fix logging call: incorrect usage of logger.info causes formatting error. Use %s placeholder; current code passes extra arg without placeholder.
-    logger.info("ATTENTION RUNTIME FEATURES: ", attn_runtime_features)
+    logger.info("ATTENTION RUNTIME FEATURES: %s", attn_runtime_features)
♻️ Duplicate comments (1)
tensorrt_llm/executor/worker.py (1)
482-516
: Fix deduction of max_tokens: handle None max_seq_len, ceil split per CP, and cp_size validity. Prevents arithmetic with None, divide-by-zero, and underestimation due to floor division. Also shortens long lines (Ruff E501).
- def _deduce_max_tokens(request: GenerationRequest, - executor_config: tllm.ExecutorConfig, - llm_args: Optional[BaseLlmArgs] = None) -> int: + def _deduce_max_tokens( + request: GenerationRequest, + executor_config: Optional[tllm.ExecutorConfig], + llm_args: Optional[BaseLlmArgs] = None) -> int: @@ - cp_size = 1 - max_seq_len = None + cp_size = 1 + max_seq_len = None if llm_args is not None: - # deduce max_tokens by llm args - assert executor_config is None, "An empty executor_config in _deduce_max_tokens is expected when LLM arguments are defined." - if hasattr(self, - "mapping") and self.mapping.cp_size is not None: + # deduce max_tokens by llm args + if executor_config is not None: + raise ValueError( + "executor_config must be None when llm_args are provided to _deduce_max_tokens().") + if hasattr(self, "mapping") and getattr(self.mapping, "cp_size", None) is not None: cp_size = self.mapping.cp_size - max_seq_len = getattr(self, "max_seq_len", None) + if not hasattr(self, "max_seq_len") or self.max_seq_len is None: + raise RuntimeError( + "max_tokens is not set and cannot be deduced: " + "max_seq_len unavailable from llm_args/executor.") + max_seq_len = self.max_seq_len else: # deduce max_tokens by executor config - if hasattr(executor_config, "mapping" - ) and executor_config.mapping.cp_size is not None: - cp_size = executor_config.mapping.cp_size - max_seq_len = getattr(executor_config, "max_seq_len", None) - if max_seq_len is None: - logger.warning("`default_max_tokens` cannot be deduced") - if max_tokens is None: - raise ValueError( - "`max_tokens` must be set when `default_max_tokens` cannot be deduced" - ) - else: - # use max_tokens if can't deduce default_max_tokens - return max_tokens - splited_prompt_len = int(len(prompt_token_ids) / cp_size) - default_max_tokens = max_seq_len - splited_prompt_len - query_token_len + if hasattr(executor_config, "mapping") and getattr(executor_config.mapping, "cp_size", None) is not None: + cp_size = executor_config.mapping.cp_size + if not hasattr(executor_config, "max_seq_len") or executor_config.max_seq_len is None: + raise RuntimeError( + "max_tokens is not set and cannot be deduced: " + "executor_config.max_seq_len is missing.") + max_seq_len = executor_config.max_seq_len + if not isinstance(cp_size, int) or cp_size < 1: + raise ValueError(f"Invalid cp_size={cp_size}; expected integer >= 1.") + # ceil(len(prompt_token_ids) / cp_size) + split_prompt_len = (len(prompt_token_ids) + cp_size - 1) // cp_size + default_max_tokens = max_seq_len - split_prompt_len - query_token_len if default_max_tokens <= 0: - logger.warning( - f"`default_max_tokens` ({default_max_tokens}) should be greater than 0, " - f"`default_max_tokens` ({default_max_tokens}) = max_seq_len ({max_seq_len})" - f" - `splited_prompt_len` ({splited_prompt_len}) - `query_token_len` ({query_token_len})" - ) + logger.warning( + "default_max_tokens (%s) <= 0; max_seq_len(%s) - split_prompt_len(%s) - " + "query_token_len(%s).", + default_max_tokens, max_seq_len, split_prompt_len, query_token_len) if max_tokens is None: raise ValueError( "`max_tokens` must be set when `default_max_tokens` is illegal" )
🧹 Nitpick comments (2)
tensorrt_llm/executor/worker.py (1)
199-211
: LoRA setup may NPE if PEFT cache manager missing. Defensive None-check prevents AttributeError when lora_config is given but manager isn't present.
         if self.llm_args and getattr(
                 self.llm_args, "backend", "") == "pytorch" and lora_config is not None:
             from tensorrt_llm._torch.pyexecutor.resource_manager import \
                 ResourceManagerType
             peft_cache_manager = self.engine.resource_manager.resource_managers.get(
                 ResourceManagerType.PEFT_CACHE_MANAGER)
-            self._lora_manager = LoraManager(
-                cpp_peft_cache_manager=peft_cache_manager.impl)
+            if peft_cache_manager is None:
+                raise RuntimeError("PEFT cache manager not available; cannot enable LoRA.")
+            self._lora_manager = LoraManager(cpp_peft_cache_manager=peft_cache_manager.impl)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
30-35
: Duplicate import of is_mla; remove one to avoid shadowing/confusion. Keep a single source (config_utils or _util), not both.
-from ._util import (KvCacheCreator, _adjust_torch_mem_fraction,
-                    create_py_executor_instance, instantiate_sampler, is_mla)
+from ._util import (KvCacheCreator, _adjust_torch_mem_fraction,
+                    create_py_executor_instance, instantiate_sampler)
 from .config_utils import is_mla
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (4 hunks)
tensorrt_llm/executor/executor.py (2 hunks)
tensorrt_llm/executor/worker.py (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- tensorrt_llm/executor/executor.py
- tensorrt_llm/_torch/pyexecutor/py_executor.py
- tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
- tensorrt_llm/_torch/pyexecutor/_util.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py
: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs
Preserve module namespaces in imports: import the subpackage/module, not the symbol (from package.subpackage import foo; foo.SomeClass())
Naming: files snake_case; classes PascalCase; functions/methods snake_case; local variables snake_case (k_ prefix if starting with a number); globals G_ + UPPER_SNAKE_CASE; constants UPPER_SNAKE_CASE
Avoid shadowing outer-scope variables; initialize all externally visible members in __init__
Prefer docstrings for interfaces used outside a file; reserve comments for function-internal or file-local interfaces
Use Google-style docstrings for classes and functions; inline docstrings for attributes/variables are allowed
Avoid reflection when straightforward code suffices (e.g., prefer explicit parameters over dict(**locals()))
Use narrow except clauses (e.g., catch FileNotFoundError instead of bare except)
For duck-typing try/except, keep try body minimal and use else for the main logic
Files:
tensorrt_llm/executor/worker.py
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
**/*.{cpp,cc,cxx,cu,h,hpp,hh,hxx,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header with current year to all source files
Files:
tensorrt_llm/executor/worker.py
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
🧠 Learnings (2)
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
PR: NVIDIA/TensorRT-LLM#7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which can contain default `cuda_graph_config` values, so `llm_args` may already have this config before the extra options processing.
Applied to files:
tensorrt_llm/executor/worker.py
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
PR: NVIDIA/TensorRT-LLM#7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM's bench configuration, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which is a Dict[str, Any] that can contain default values including `cuda_graph_config`, making the fallback `llm_args["cuda_graph_config"]` safe to use.
Applied to files:
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
🧬 Code graph analysis (1)
tensorrt_llm/executor/worker.py (5)
  tensorrt_llm/llmapi/llm_args.py (5)
    BaseLlmArgs (1172-1949)
    PybindMirror (579-723)
    TorchLlmArgs (2158-2586)
    get_pytorch_backend_config (2521-2586)
    parallel_config (1392-1393)
  tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
    create_py_executor (210-493)
  tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
    LlmArgs (237-351)
  tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (1)
    create_autodeploy_executor (261-344)
  tensorrt_llm/_torch/pyexecutor/config.py (1)
    _construct_checkpoint_loader (175-197)
🪛 Ruff (0.12.2)
tensorrt_llm/executor/worker.py
494-494: Line too long (140 > 120)
(E501)
669-669: Line too long (136 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tensorrt_llm/executor/worker.py (4)
137-146
: Checkpoint loader construction: limit to pytorch backend (LGTM), but prefer explicit None check. Current logic is fine; clear separation by backend.
148-152
: Propagate max_seq_len from executor (LGTM). Good: respects engine-updated max_seq_len.
541-543
: Passing llm_args into _deduce_max_tokens (LGTM). Correctly routes through new deduction path.
709-727
: worker_main: new llm_args arg (LGTM). Signature change consistent with worker.__init__ and overall flow.
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (4)
13-20
: Imports extended for new surface (LGTM). New types (LogitsPostProcessorConfig, ParallelConfig, TorchLlmArgs, TokenizerBase) are appropriate here.
210-217
: Public API refactor of create_py_executor (LGTM). Clean switch to llm_args; parameters read clearly and reflect the new surface.
219-224
: ExecutorConfig derivation and GC threshold wiring (LGTM). Correctly derives executor_config from llm_args and plumbs additional configs.
443-444
: Pass max_seq_len to create_py_executor_instance (LGTM). Keeps runtime in sync with engine-updated sequence length in both paths.
Also applies to: 487-488
PR_Github #16831 [ run ] triggered by Bot |
PR_Github #16831 [ run ] completed with state |
/bot run |
PR_Github #16838 [ run ] triggered by Bot |
1c25084
to
c11aad7
Compare
/bot run |
PR_Github #16840 [ run ] triggered by Bot |
PR_Github #16838 [ run ] completed with state |
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (1)
261-279
: Add explicit return type and enforce runtime validation
In tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py:
- Change
  def create_autodeploy_executor(ad_config: LlmArgs):
  to
  def create_autodeploy_executor(ad_config: LlmArgs) -> PyExecutor:
- Replace the type-check assert
  msg = "pytorch_backend_config must be an AD LlmArgs object"
  assert isinstance(ad_config, LlmArgs), msg
  with
  if not isinstance(ad_config, LlmArgs):
      raise TypeError("ad_config must be an AutoDeploy LlmArgs instance")
- Replace the beam-width assert
  assert ad_config.max_beam_width <= 1, "_autodeploy + beam_search is not supported"
  with
  if ad_config.max_beam_width > 1:
      raise ValueError("beam_search is not supported in AutoDeploy")
Optional: to avoid out-of-range GPU errors on multi-node setups, use torch.cuda.set_device(rank % torch.cuda.device_count())
♻️ Duplicate comments (3)
tensorrt_llm/executor/worker.py (3)
482-536
: Fix max_tokens deduction: None-handling, CP ceil split, and robust validation. Keep logic consistent with llm-args flow; avoid asserts and floor-division underestimation.
- def _deduce_max_tokens(request: GenerationRequest, - executor_config: tllm.ExecutorConfig, - llm_args: Optional[BaseLlmArgs] = None) -> int: + def _deduce_max_tokens( + request: GenerationRequest, + executor_config: Optional[tllm.ExecutorConfig], + llm_args: Optional[BaseLlmArgs] = None) -> int: @@ - cp_size = 1 - max_seq_len = None + cp_size = 1 + max_seq_len = None if llm_args is not None: - # deduce max_tokens by llm args - assert executor_config is None, "An empty executor_config in _deduce_max_tokens is expected when LLM arguments are defined." - if hasattr(self, - "mapping") and self.mapping.cp_size is not None: + # deduce max_tokens by llm args + if executor_config is not None: + raise ValueError("executor_config must be None when llm_args are provided to _deduce_max_tokens().") + if hasattr(self, "mapping") and getattr(self.mapping, "cp_size", None) is not None: cp_size = self.mapping.cp_size - max_seq_len = getattr(self, "max_seq_len", None) + max_seq_len = getattr(self, "max_seq_len", None) else: # deduce max_tokens by executor config - if hasattr(executor_config, "mapping" - ) and executor_config.mapping.cp_size is not None: + if hasattr(executor_config, "mapping") and getattr(executor_config.mapping, "cp_size", None) is not None: cp_size = executor_config.mapping.cp_size - max_seq_len = getattr(executor_config, "max_seq_len", None) - if max_seq_len is None: - logger.warning("`default_max_tokens` cannot be deduced") - if max_tokens is None: - raise ValueError( - "`max_tokens` must be set when `default_max_tokens` cannot be deduced" - ) - else: - # use max_tokens if can't deduce default_max_tokens - return max_tokens - splited_prompt_len = int(len(prompt_token_ids) / cp_size) - default_max_tokens = max_seq_len - splited_prompt_len - query_token_len + max_seq_len = getattr(executor_config, "max_seq_len", None) + if not isinstance(cp_size, int) or cp_size < 1: + raise ValueError(f"Invalid cp_size={cp_size}; expected integer >= 1.") + if max_seq_len is None: + if max_tokens is None: + raise ValueError("max_tokens is required: max_seq_len unavailable to deduce a default.") + return max_tokens + # ceil(len(prompt_token_ids)/cp_size) without importing math + split_prompt_len = (len(prompt_token_ids) + cp_size - 1) // cp_size + default_max_tokens = max_seq_len - split_prompt_len - query_token_len if default_max_tokens <= 0: - logger.warning( - f"`default_max_tokens` ({default_max_tokens}) should be greater than 0, " - f"`default_max_tokens` ({default_max_tokens}) = max_seq_len ({max_seq_len})" - f" - `splited_prompt_len` ({splited_prompt_len}) - `query_token_len` ({query_token_len})" - ) + logger.warning( + f"default_max_tokens ({default_max_tokens}) <= 0; " + f"per-CP prompt length {split_prompt_len} + query length {query_token_len} " + f"exceeds max_seq_len {max_seq_len}." + ) if max_tokens is None: raise ValueError( - "`max_tokens` must be set when `default_max_tokens` is illegal" + "max_tokens must be set when deduced default_max_tokens is invalid." ) # default_max_tokens is the biggest available value if max_tokens is None: return default_max_tokens elif max_tokens > default_max_tokens: logger.warning( - f"User-specified `max_tokens` ({max_tokens}) is greater than deduced " - f"`default_max_tokens` ({default_max_tokens}), using default_max_tokens instead." + f"User max_tokens ({max_tokens}) > deduced default_max_tokens ({default_max_tokens}); using default." ) return default_max_tokens return max_tokensThis also addresses Ruff E501 at the previous assert line.
105-137
: Don't use asserts for runtime validation in executor creation. Replace both asserts with explicit errors; keeps checks in optimized runs and fixes Ruff E501 risk.
-    def _create_py_executor():
+    def _create_py_executor():
         args = {}
-        assert hasattr(
-            self.llm_args, "backend"
-        ), "llm_args should be with backend in _create_py_executor"
+        if not hasattr(self.llm_args, "backend"):
+            raise ValueError("llm_args must have 'backend' in _create_py_executor().")
@@
-        elif self.llm_args.backend == "_autodeploy":
+        elif self.llm_args.backend == "_autodeploy":
             from tensorrt_llm._torch.auto_deploy.llm_args import \
                 LlmArgs as ADLlmArgs
@@
-            assert isinstance(self.llm_args, ADLlmArgs)
+            if not isinstance(self.llm_args, ADLlmArgs):
+                raise TypeError("llm_args must be AutoDeploy LlmArgs for '_autodeploy' backend.")
668-675
: Replace assert in shutdown with explicit error; wrap long condition (E501). Prevents assert stripping and fixes long line.
-        if self.llm_args is not None:
-            assert self._executor_config is None, "An empty executor_config is expected in shutdown when LLM arguments are defined."
+        if self.llm_args is not None:
+            if self._executor_config is not None:
+                raise ValueError(
+                    "executor_config must be None in shutdown when LLM arguments are defined."
+                )
+            if (
+                self.llm_args.backend == "pytorch"
+                and hasattr(self, "checkpoint_loader")
+                and self.checkpoint_loader is not None
+            ):
                 self.checkpoint_loader.cleanup()
                 self.checkpoint_loader = None
🧹 Nitpick comments (2)
tensorrt_llm/executor/worker.py (1)
66-67
: Constructor change: ensure docstring and param validation. Add a brief docstring note that llm_args drives executor creation when provided.
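A minimal sketch combining the docstring note with the early validation suggested above (the class and attribute names are illustrative stand-ins, not the worker's real code):

class _WorkerInitSketch:
    def __init__(self, engine, executor_config=None, llm_args=None) -> None:
        """Toy stand-in for GenerationExecutorWorker.__init__.

        Args:
            engine: Engine path or Engine object.
            executor_config: Legacy configuration path; mutually exclusive with llm_args.
            llm_args: When provided, drives executor creation and backend selection.
        """
        if llm_args is not None and executor_config is not None:
            raise ValueError("Pass either llm_args or executor_config, not both.")
        self.engine = engine
        self.llm_args = llm_args
        self._executor_config = executor_config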
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
219-224
: executor_config.parallel_config is ignored by mapping
Mapping in _get_mapping only reads executor_config.mapping, so executor_config.parallel_config (set at py_executor_creator.py:221) has no effect. Either remove this dead assignment or update _get_mapping to use parallel_config.to_mapping() (e.g., set executor_config.mapping = parallel_config.to_mapping()) before deriving the mapping.
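If the second option is preferred, a minimal sketch could look like this (assuming parallel_config exposes to_mapping(), as the code-graph section above indicates; the helper name is hypothetical):

def _apply_parallel_config(executor_config, parallel_config):
    # Derive executor_config.mapping from parallel_config before _get_mapping() runs,
    # so the parallel settings are not silently ignored.
    if parallel_config is not None and getattr(executor_config, "mapping", None) is None:
        executor_config.mapping = parallel_config.to_mapping()
    return executor_config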
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2 hunks)
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (4 hunks)
tensorrt_llm/executor/executor.py (2 hunks)
tensorrt_llm/executor/worker.py (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tensorrt_llm/executor/executor.py
- tensorrt_llm/_torch/pyexecutor/py_executor.py
- tensorrt_llm/_torch/pyexecutor/_util.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py
: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs
Preserve module namespaces in imports: import the subpackage/module, not the symbol (from package.subpackage import foo; foo.SomeClass())
Naming: files snake_case; classes PascalCase; functions/methods snake_case; local variables snake_case (k_ prefix if starting with a number); globals G_ + UPPER_SNAKE_CASE; constants UPPER_SNAKE_CASE
Avoid shadowing outer-scope variables; initialize all externally visible members in __init__
Prefer docstrings for interfaces used outside a file; reserve comments for function-internal or file-local interfaces
Use Google-style docstrings for classes and functions; inline docstrings for attributes/variables are allowed
Avoid reflection when straightforward code suffices (e.g., prefer explicit parameters over dict(**locals()))
Use narrow except clauses (e.g., catch FileNotFoundError instead of bare except)
For duck-typing try/except, keep try body minimal and use else for the main logic
Files:
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
tensorrt_llm/executor/worker.py
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
**/*.{cpp,cc,cxx,cu,h,hpp,hh,hxx,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header with current year to all source files
Files:
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
tensorrt_llm/executor/worker.py
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
🧠 Learnings (2)
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
PR: NVIDIA/TensorRT-LLM#7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which can contain default `cuda_graph_config` values, so `llm_args` may already have this config before the extra options processing.
Applied to files:
tensorrt_llm/executor/worker.py
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
PR: NVIDIA/TensorRT-LLM#7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM's bench configuration, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which is a Dict[str, Any] that can contain default values including `cuda_graph_config`, making the fallback `llm_args["cuda_graph_config"]` safe to use.
Applied to files:
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
🧬 Code graph analysis (2)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (1)
tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
LlmArgs
(237-351)
tensorrt_llm/executor/worker.py (5)
tensorrt_llm/llmapi/llm_args.py (5)
BaseLlmArgs
(1172-1949)TorchLlmArgs
(2158-2586)get_pytorch_backend_config
(2521-2586)parallel_config
(1392-1393)to_mapping
(282-294)tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (1)
create_py_executor
(210-493)tensorrt_llm/auto_parallel/parallelization.py (1)
ParallelConfig
(51-77)tensorrt_llm/_torch/auto_deploy/llm_args.py (1)
LlmArgs
(237-351)tensorrt_llm/_torch/pyexecutor/config.py (1)
_construct_checkpoint_loader
(175-197)
🪛 Ruff (0.12.2)
tensorrt_llm/executor/worker.py
494-494: Line too long (140 > 120)
(E501)
669-669: Line too long (136 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/executor/worker.py (3)
21-21
: Import surface looks good. Switch to BaseLlmArgs/TorchLlmArgs/PybindMirror aligns with the refactor.
199-202
: LoRA manager gating per backend is correct. Condition ensures PyTorch-only LoRA path. LGTM.
726-727
: worker_main now accepts llm_args: OK. Signature aligns with constructor; proxy wiring remains consistent.
tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2)
210-217
: API change: create_py_executor(llm_args, …): Good migration. Signature is clear and typed; matches worker usage.
443-444
: Propagating max_seq_len to PyExecutor: LGTM. This keeps worker.max_seq_len in sync with engine-derived limits.
Also applies to: 487-488
PR_Github #16840 [ run ] completed with state |
c11aad7
to
a2b4248
Compare
LGTM
/bot run |
PR_Github #16967 [ run ] triggered by Bot |
PR_Github #16967 [ run ] completed with state |
Signed-off-by: leslie-fang25 <[email protected]>
de88835
to
ec3ed20
Compare
/bot run |
PR_Github #16982 [ run ] triggered by Bot |
PR_Github #16982 [ run ] completed with state |
/bot run |
PR_Github #17025 [ run ] triggered by Bot |
PR_Github #17025 [ run ] completed with state |
/bot run |
PR_Github #17029 [ run ] triggered by Bot |
PR_Github #17029 [ run ] completed with state |
/bot run |
PR_Github #17053 [ run ] triggered by Bot |
PR_Github #17053 [ run ] completed with state |
LGTM
LGTM
Hi @lucaslie, @suyoggupta could you help take a look at this PR? It seems to need your kind approval to get landed. |
Summary by CodeRabbit
New Features
Refactor
Description
This PR mainly refactors the interface of create_py_executor by replacing executor config with llm args.
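Roughly, the call-site change looks like this (a simplified sketch; the exact keyword set is an assumption, not the PR's literal code):

# Before: the worker built an ExecutorConfig and handed it to the creator.
#   executor = create_py_executor(executor_config, checkpoint_dir, ...)
# After: the worker passes llm_args and the creator derives the config itself.
#   executor = create_py_executor(llm_args, checkpoint_dir=hf_model_dir,
#                                 tokenizer=tokenizer, lora_config=lora_config)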
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
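For example, a typical invocation combining a few of the flags below (the stage and GPU names are placeholders taken from the flag descriptions):
/bot run --disable-fail-fast --stage-list "A10-PyTorch-1" --gpu-type "H100_PCIe"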
--reuse-test (optional)pipeline-id
(OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test
(OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast
(OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test
(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx"
(OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe"
(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp"
(OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test
(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test
(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test
(OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
--post-merge
(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"
(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log
(OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug
(OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.