[TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. #6579
Conversation
Signed-off-by: Yuxian Qiu <[email protected]>
📝 Walkthrough

The changes update the pre-commit configuration to exclude Markdown files from the trailing whitespace check and introduce a comprehensive Markdown guide detailing deployment, testing, evaluation, and benchmarking procedures for running the DeepSeek R1 model on TensorRT-LLM.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Docker
    participant Server as TensorRT-LLM Server
    participant GPU
    participant Client as HTTP Client
    User->>Docker: Launch TensorRT-LLM container with DeepSeek R1
    Docker->>Server: Start server with config and model
    Server->>GPU: Load model, allocate resources
    Client->>Server: Send inference request (prompt)
    Server->>GPU: Run inference
    GPU-->>Server: Return output tokens
    Server-->>Client: Respond with completion
```
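The HTTP Client step in the diagram corresponds to an OpenAI-compatible completion request. A minimal sketch, assuming the default `trtllm-serve` port (8000) and the `/v1/completions` route; since no server is running here, the snippet only builds and validates the payload:

```shell
# Build a completion request payload for the server started above.
# Endpoint path, port, and prompt text are illustrative assumptions.
cat > /tmp/req.json <<'EOF'
{
  "model": "deepseek-ai/DeepSeek-R1",
  "prompt": "What is the capital of France?",
  "max_tokens": 32
}
EOF

# Validate that the payload is well-formed JSON.
python3 -m json.tool /tmp/req.json

# With a live server, the request would be sent as:
#   curl -s http://localhost:8000/v1/completions \
#     -H "Content-Type: application/json" -d @/tmp/req.json
```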
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~7 minutes
Actionable comments posted: 0
🧹 Nitpick comments (5)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (5)
Lines 3-4: Fix heading level hierarchy

`# Introduction` should be an H2 (`##`) so that heading levels increment by one from the main title, eliminating the MD001 linter error.

```diff
-# Introduction
+## Introduction
```
Lines 194-195: Remove stray back-tick at end of sentence

The closing back-tick after `extra_llm_api_options` is orphaned and appears to be a typo.

```diff
-... which can be used in the extra_llm_api_options`.`
+... which can be used in the `extra_llm_api_options`.
```
Lines 251-256: Specify a language for fenced code block

Linter MD040 flags this block. Add a language (e.g., `shell`) after the opening back-ticks.

````diff
-```
+```shell
 MODEL_PATH=deepseek-ai/DeepSeek-R1
 ...
````
Lines 259-264: Add language hint to result table block

The fenced block containing the sample results lacks a language tag. Use `text` to silence MD040 and preserve monospace alignment.

````diff
-```
+```text
 |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
 ...
````
Lines 337-346: Add language spec to benchmark output block

Same MD040 issue for the benchmark sample.

````diff
-```
+```text
 ============ Serving Benchmark Result ============
 ...
````
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `.pre-commit-config.yaml` (1 hunks)
- `examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md` (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-01T07:34:42.734Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.
Learnt from: yiqingy0
PR: NVIDIA/TensorRT-LLM#5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.109Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
🪛 LanguageTool
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
[style] ~128-~128: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ch** backend. #### --max_batch_size
Description: The maximum number of ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~132-~132: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...r processing. #### --max_num_tokens
Description: The maximum total numb...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...scheduled batch. #### --max_seq_len
Description: The maximum possible s...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~140-~140: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ut tokens. #### --trust_remote_code
Description: Allows TensorRT-LLM to...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~155-~155: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...: Sets the data type for the KV cache. Default: auto (uses the data ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~159-~159: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...checkpoint). #### cuda_graph_config
Description: A section for configur...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~161-~161: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...g CUDA graphs to optimize performance. Options: enable_paddi...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~169-~169: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...or which a CUDA graph will be created. Default: 0 **Rec...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~173-~173: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...-max_batch_size command-line option. batch_sizes: A specific list of ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~175-~175: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...batch sizes to create CUDA graphs for. Default: None #### `moe_conf...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~181-~181: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...n for Mixture-of-Experts (MoE) models. Options: backend: The ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~185-~185: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...The backend to use for MoE operations. Default: CUTLASS #### `atten...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~191-~191: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...end to use for attention calculations. Default: TRTLLM See the [https://g...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~316-~316: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...l,e2el" done EOF chmod +x bench.sh ``` To benchmark the FP4 model, replace --mod...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.17.2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
109-109: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4
(MD001, heading-increment)
251-251: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
259-259: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
337-337: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
.pre-commit-config.yaml (1)
Line 30: Trailing-whitespace hook exclusion looks correct

Expanding the regex to `\.(patch|md)$` cleanly prevents markdown docs (like the one added in this PR) from being flagged while still keeping other file types covered. No further action needed.
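For context, the approved hunk would sit in `.pre-commit-config.yaml` roughly as sketched below; the hook layout and `rev` pin are illustrative, and only the `exclude` regex is taken from the review:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0   # illustrative pin, not from the PR
    hooks:
      - id: trailing-whitespace
        # Skip patch files and markdown docs; markdown relies on trailing
        # double-spaces for hard line breaks, so stripping them is unsafe.
        exclude: '\.(patch|md)$'
```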
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 2
🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
Lines 190-196: Remove stray back-tick at the end of the hyperlink sentence

The dangling back-tick breaks inline-code formatting and renders a literal "`" in GitHub preview.

```diff
-... list of options which can be used in the extra_llm_api_options`.
+... list of options which can be used in the `extra_llm_api_options`.
```
Lines 249-256: Add language identifiers to fenced code blocks

markdownlint (MD040) warns when a fence lacks a language. Specify `shell` for commands and `text` for sample outputs/tables to enable proper syntax highlighting.

````diff
-```
+```shell
 MODEL_PATH=deepseek-ai/DeepSeek-R1
@@
-```
+```text
 |Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
@@
-```
+```text
 ============ Serving Benchmark Result ============
@@
````

Repeat for every unlabeled block (three occurrences in this range). Also applies to: 259-264, 337-364

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 423692ce6f5e3024cc16494b17b507265d125439 and ecbd20b8c3cea90c5d7ed4d0e7bd33fa3b40d686.

📒 Files selected for processing (1)

- `examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md` (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. (1)

- GitHub Check: Pre-commit Check
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)
Lines 251-256: Add language identifiers to fenced code blocks to satisfy markdownlint (MD040)

The three blocks below lack a language tag:

- Lines 251-255 (`lm_eval …`) – use `bash`
- Lines 259-264 (table output) – use `text`
- Lines 337-363 (benchmark sample output) – use `text`

````diff
-```
+```bash  # for the command block
-```
+```text  # for plain text / table output
````

This removes the current MD040 violations and enables proper syntax highlighting.

Also applies to: 259-264, 337-363
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md` (1 hunks)
🔇 Additional comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)
Lines 106-144: Heading hierarchy now correct – nice cleanup

The section was demoted to `###` and option headings to `####`, fixing the previous MD001 complaint. No further action needed.
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
Lines 106-188: Heading levels still skip H3 (MD001)

All option headings (`####`) are children of an H2 section, so they should be `###`. Same feedback was given earlier and remains unresolved.
Lines 55-66: Config file is overwritten & FP4 config missing – creates runtime confusion

You still create `/tmp/config.yml` twice and both times hard-code `dtype: fp8`. The second `cat` clobbers the first one and there is no standalone FP4 config, so the FP4 workflow silently falls back to FP8.

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml  # dedicated file for FP4
@@
-kv_cache_config:
-  dtype: fp8
+kv_cache_config:
+  dtype: fp4
```

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml  # dedicated file for FP8
@@
 kv_cache_config:
   dtype: fp8
```

Remember to pass the correct file in `trtllm-serve --extra_llm_api_options`. Without this fix users will get unexpected accuracy/perf results.

Also applies to: 70-83
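A minimal sketch of the separation the review asks for; the file names follow the suggested diff, and the `fp4` dtype value is taken from the review's proposed change rather than verified against TensorRT-LLM's accepted KV-cache options:

```shell
# Write one config per precision so the second heredoc no longer
# clobbers the first.
cat > /tmp/config_fp8.yml <<'EOF'
kv_cache_config:
  dtype: fp8
EOF

cat > /tmp/config_fp4.yml <<'EOF'
kv_cache_config:
  dtype: fp4
EOF

# Each serve invocation then points at its own file, e.g.:
#   trtllm-serve ... --extra_llm_api_options /tmp/config_fp4.yml
grep -H dtype /tmp/config_fp8.yml /tmp/config_fp4.yml
```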
🧹 Nitpick comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)
Lines 249-264: Specify a language for fenced code blocks (MD040)

Add `shell` or `text` after the opening back-ticks to silence markdown-lint and enable syntax highlighting.

````diff
-```
+```shell
````

Apply to the GSM8K result snippets and the benchmark sample too.

Also applies to: 266-283, 337-364
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md` (1 hunks)
Context: ...r processing. #### --max_num_tokens
Description: The maximum total numb...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...scheduled batch. #### --max_seq_len
Description: The maximum possible s...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~140-~140: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ut tokens. #### --trust_remote_code
Description: Allows TensorRT-LLM to...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~155-~155: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...: Sets the data type for the KV cache. Default: auto (uses the data ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~159-~159: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...checkpoint). #### cuda_graph_config
Description: A section for configur...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~169-~169: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...or which a CUDA graph will be created. Default: 0 **Rec...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~173-~173: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...-max_batch_size command-line option. batch_sizes: A specific list of ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~175-~175: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...batch sizes to create CUDA graphs for. Default: None #### `moe_conf...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~185-~185: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...The backend to use for MoE operations. Default: CUTLASS #### `atten...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~191-~191: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...end to use for attention calculations. Default: TRTLLM See the [https://g...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~316-~316: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...l,e2el" done EOF chmod +x bench.sh ``` To benchmark the FP4 model, replace --mod...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.17.2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
251-251: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
259-259: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
337-337: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
114-154: Heading levels still skip H3 under “Configs and Parameters”

`Configs and Parameters` is an H2, yet the option names start at H4 (`####`). Promote them to H3 (`###`) to satisfy markdown-lint MD001 and keep a logical hierarchy.

```diff
-#### `--tp_size`
-#### `--ep_size`
-#### `--kv_cache_free_gpu_memory_fraction`
+### `--tp_size`
+### `--ep_size`
+### `--kv_cache_free_gpu_memory_fraction`
```

Apply the same promotion to every option heading in this section.
55-90: Config file is still overwritten and NVFP4 dtype hard-coded to fp8

The two `cat << EOF > ${EXTRA_LLM_API_FILE}` blocks write to the same `/tmp/config.yml`; the second block fully clobbers the first. Additionally, the first block keeps `kv_cache_config.dtype: fp8`, so NVFP4 runs will silently fall back to FP8.

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml
 ...
-kv_cache_config:
-  dtype: fp8
+kv_cache_config:
+  dtype: fp4
```

And for the FP8 case:

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml
```

Update the subsequent `trtllm-serve ... --extra_llm_api_options` invocation(s) to reference the correct file.
🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
259-272: Add language identifiers to fenced code blocks (MD040)

Markdown-lint flags these blocks because no language is specified. Use `shell`, `text`, or another suitable identifier:

````diff
-```
+```shell
 # command
````

and

````diff
-```
+```text
 |Tasks|Version|…
````

Do the same for the benchmark sample output at Lines 345-372.

Also applies to: 345-372
200-203: Dangling back-tick breaks Markdown rendering

The closing sentence ends with a stray back-tick after the period, which leaves the inline code span open.

```diff
-…extra_llm_api_options`.`
+…`extra_llm_api_options`.
```
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
114-131: Heading level skips break Markdown-lint (MD001)

`Configs and Parameters` is an H2, but option names start at H4 (`####`). Promote them to H3 to maintain a one-level increment and keep autogenerated TOCs correct.

```diff
-#### `--tp_size`
-#### `--ep_size`
-#### `--kv_cache_free_gpu_memory_fraction`
-#### `--backend pytorch`
-#### `--max_batch_size`
-#### `--max_num_tokens`
-#### `--max_seq_len`
-#### `--trust_remote_code`
+### `--tp_size`
+### `--ep_size`
+### `--kv_cache_free_gpu_memory_fraction`
+### `--backend pytorch`
+### `--max_batch_size`
+### `--max_num_tokens`
+### `--max_seq_len`
+### `--trust_remote_code`
```
56-70: Config file is overwritten & NVFP4 still uses `fp8` dtype

The same `${EXTRA_LLM_API_FILE}` path (`/tmp/config.yml`) is re-used twice. The second `cat << EOF > …` completely clobbers the first YAML, so the FP4 settings are lost. In addition, `kv_cache_config.dtype` is hard-coded to `fp8`, meaning the “FP4 guide” still runs FP8 under the hood.

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
-…
-kv_cache_config:
-  dtype: fp8  # <- wrong for FP4
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml
+…
+kv_cache_config:
+  dtype: fp4
 …
-EXTRA_LLM_API_FILE=/tmp/config.yml  # second block now becomes FP8-specific
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml
```

Then launch the server with the matching file:

```shell
--extra_llm_api_options ${EXTRA_LLM_API_FILE}  # adjust per model
```

Failing to separate the files silently produces wrong results and makes debugging painful. Please split the configs or use the `>>` append operator if you truly intend a single file.

Also applies to: 75-91
🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
200-203: Stray back-tick breaks the sentence

The line ends with an unmatched back-tick after `extra_llm_api_options`. Remove the trailing back-tick (or move it before the period).

```diff
-… extra_llm_api_options`.`
+… extra_llm_api_options.
```
259-273: Add language identifiers to fenced code blocks (MD040)

Markdown-lint flags code blocks without a language. Use `text` for plain console output to silence the warning.

````diff
-```
+```text
 |Tasks|Version| …
````

Apply the same change to the benchmark sample block below.

Also applies to: 345-351
/bot run
PR_Github #13928 [ run ] triggered by Bot
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)
55-90: Config file is overwritten and FP4 dtype is wrong – create two distinct YAMLs

Both heredoc blocks write to `/tmp/config.yml`, so the second block clobbers the first and you never get an FP4-specific config. In addition, the “FP4” block still hard-codes `dtype: fp8`. Readers who copy-paste will silently end up running FP8 in all cases.

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml
 …
-kv_cache_config:
-  dtype: fp8
+kv_cache_config:
+  dtype: fp4
 …
 # For FP8 model
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml
```

Update the subsequent `trtllm-serve … --extra_llm_api_options` examples to point at the correct file (`config_fp4.yml` or `config_fp8.yml`). This avoids the overwrite, sets the right dtype for FP4, and makes the guide copy-paste-safe.
🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
259-272: Specify a language for the fenced code block (MD040)

Markdown-lint flags the evaluation result block because it lacks a language tag.

````diff
-```
+```text
````

345-372: Add language identifier for the sample benchmark output

Same MD040 issue here: declare the block as plain text so linters pass and syntax highlighting is disabled.

````diff
-```
+```text
````
PR_Github #13928 [ run ] completed with state
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
55-91: Config file is still overwritten; fp4 config remains impossible

The two `cat << EOF > ${EXTRA_LLM_API_FILE}` blocks target the same `/tmp/config.yml`, so the first YAML is discarded when the second block executes. The fp4 use-case therefore never sees its intended settings and the KV-cache dtype stays `fp8`.

This was flagged in a previous review and is still unresolved.

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml  # first block
 # … fp4-specific YAML …
 # -------------------------------------------------
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml  # second block
 # … fp8-specific YAML …
```

Ensure the subsequent `trtllm-serve` command references the correct file for each quantization mode.
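The overwrite is easy to reproduce in isolation. A minimal sketch (scratch file via `mktemp`; the YAML keys are just placeholders taken from the guide):

```shell
# Two `>` heredocs to the same path: the second truncates the file,
# so only its contents survive.
CFG=$(mktemp)

cat << EOF > "${CFG}"
enable_attention_dp: true
EOF

cat << EOF > "${CFG}"
kv_cache_config:
  dtype: fp8
EOF

# The first block's key is gone; only the second block remains.
cat "${CFG}"
```

Appending with `>>` or writing each block to a distinct file avoids the loss.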
114-147: Heading level jumps still violate MD001

`Configs and Parameters` is an H2 (`##`) but the option headings start at H4 (`####`). They must be promoted to H3 (`###`) to maintain a one-level increment.

This exact issue was raised earlier but the markdown remains unchanged.

```diff
-#### `--tp_size`
+### `--tp_size`
```

Apply the same change to every option heading in this section.
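The promotion can also be applied mechanically. A sketch assuming GNU `sed`; the sample file below stands in for the real guide:

```shell
# Stand-in for the guide: one H2 section heading and one H4 option heading.
f=$(mktemp)
printf '## Configs and Parameters\n#### `--tp_size`\n' > "$f"

# Promote every H4 heading to H3 (GNU sed, in-place edit).
sed -i 's/^#### /### /' "$f"
cat "$f"
```

On BSD/macOS `sed`, the in-place flag needs an explicit suffix argument (`sed -i ''`).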
🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
259-264: Specify language for fenced code block (MD040)

Markdown-lint flags code blocks without a language hint. Add `shell` here.

````diff
-```
+```shell
 MODEL_PATH=deepseek-ai/DeepSeek-R1-0528
````

345-372: Second unlabeled code block needs a language tag

The sample benchmark output block also violates MD040. Use `text` or `console`.

````diff
-```
+```text
 ============ Serving Benchmark Result ============
````
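Unlabeled fences like the two flagged above can be found mechanically. A rough heuristic sketch (it treats every line starting with a triple backtick as a fence toggle):

```shell
# Sample markdown: line 2 opens a bare fence, line 5 opens a labeled one.
f=$(mktemp)
printf 'text\n```\ncode\n```\n```shell\nok\n```\n' > "$f"

# Report opening fences that carry no language tag.
awk '/^```/ { in_fence = !in_fence
              if (in_fence && $0 == "```") print NR ": fence without language" }' "$f"
```

The heuristic assumes fences start at column 0 and are balanced; indented or nested fences would need a real markdown parser.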
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-04T02:12:17.582Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
🪛 LanguageTool
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ch** backend. #### --max_batch_size
Description: The maximum number of ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~140-~140: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...r processing. #### --max_num_tokens
Description: The maximum total numb...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~144-~144: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...scheduled batch. #### --max_seq_len
Description: The maximum possible s...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~148-~148: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ut tokens. #### --trust_remote_code
Description: Allows TensorRT-LLM to...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~163-~163: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...: Sets the data type for the KV cache. Default: auto (uses the data ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~167-~167: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...checkpoint). #### cuda_graph_config
Description: A section for configur...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~169-~169: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...g CUDA graphs to optimize performance. Options: enable_paddi...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~177-~177: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...or which a CUDA graph will be created. Default: 0 **Rec...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~181-~181: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...-max_batch_size command-line option. batch_sizes: A specific list of ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~183-~183: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...batch sizes to create CUDA graphs for. Default: None #### `moe_conf...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~189-~189: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...n for Mixture-of-Experts (MoE) models. Options: backend: The ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~193-~193: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...The backend to use for MoE operations. Default: CUTLASS #### `atten...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~199-~199: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...end to use for attention calculations. Default: TRTLLM See the [TorchLlmA...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~324-~324: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...l,e2el" done EOF chmod +x bench.sh ``` To benchmark the FP4 model, replace --mod...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.17.2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
259-259: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
345-345: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)
55-66: Config file is overwritten & NVFP4 still hard-coded to FP8

Both `cat << EOF > ${EXTRA_LLM_API_FILE}` blocks target the same `/tmp/config.yml`. The second block replaces the first, so the NVFP4-intended settings are lost and `kv_cache_config.dtype` remains `fp8`, which will silently disable FP4 execution.

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml  # first block
 ...
-kv_cache_config:
-  dtype: fp8
+kv_cache_config:
+  dtype: fp4
 ...
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml  # second block
```

Then launch the server with the appropriate file:

```shell
--extra_llm_api_options ${EXTRA_LLM_API_FILE}  # point to *_fp4.yml or *_fp8.yml
```

Without this separation, FP4 users will unknowingly run an FP8 config and MoE users lose the extra settings.
Also applies to: 75-91
118-150: Heading levels still skip H3, violating MD001

`Configs and Parameters` is an H2. Option headings start at `####`, jumping two levels. Promote them uniformly to H3 (`###`) to restore semantic hierarchy and avoid markdown-lint failures.

```diff
-#### `--tp_size`
+### `--tp_size`
```

Apply to every option heading in this section.
🧹 Nitpick comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)
260-264: Specify the language on fenced code blocks (MD040)

Add `shell` (or `none`) after the triple back-ticks so tools render and lint correctly:

````diff
-```
+```shell
````

Do the same for the benchmark result block (`` ``` `` → `` ```text ``).

Also applies to: 346-373
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
(1 hunks)
Signed-off-by: Yuxian Qiu <[email protected]>
Actionable comments posted: 2
🧹 Nitpick comments (4)
examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md (4)
214-215: Clarify container hostname vs. host access

Inside the container the server binds to `0.0.0.0:8000`, but from the host you must use `localhost:8000` only if the `docker run` used `-p 8000:8000`. If users change the port mapping, the curl example will fail. Add a brief reminder such as:

“Replace `8000` with the host-side port you mapped in `docker run`.”
260-266: Specify language for fenced code block – fixes MD040

Markdown-lint flags this block because it lacks a language hint.

````diff
-```
+```shell
 MODEL_PATH=deepseek-ai/DeepSeek-R1-0528
 ...
````

Do the same for other shell blocks to keep lint clean.
346-373: Missing language spec on sample benchmark output

Add `text` (or `none`) to silence MD040 and keep syntax highlighters from mis-detecting numbers as code.

````diff
-```
+```text
 ============ Serving Benchmark Result ============
 ...
 ==================================================
````
137-150: Stylistic: consecutive sentences start with “The”

LanguageTool flags several clusters (Lines 137-150, etc.). While not critical, varying openings improves readability; e.g., replace with “This option…”, “It sets…”, etc.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
```shell
cat << EOF > ${EXTRA_LLM_API_FILE}
enable_attention_dp: true
cuda_graph_config:
  enable_padding: true
  max_batch_size: 128
kv_cache_config:
  dtype: fp8
stream_interval: 10
speculative_config:
  decoding_type: MTP
  num_nextn_predict_layers: 1
EOF
```
Config file is silently overwritten – FP8 example wipes the generic settings
Both `cat << EOF > ${EXTRA_LLM_API_FILE}` blocks target the same `/tmp/config.yml`. Running the second block (FP8 + `moe_config`) discards the first block’s contents, so users following the tutorial verbatim will only keep the FP8-specific YAML and lose the common settings.

```diff
-EXTRA_LLM_API_FILE=/tmp/config.yml
-cat << EOF > ${EXTRA_LLM_API_FILE}
+# General config
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml
+cat << EOF > "${EXTRA_LLM_API_FILE}"
 ...
 EOF
 ...
-EXTRA_LLM_API_FILE=/tmp/config.yml
-cat << EOF > ${EXTRA_LLM_API_FILE}
+# FP8-specific config
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml
+cat << EOF > "${EXTRA_LLM_API_FILE}"
 ...
 EOF
```

Then pass the appropriate file via `--extra_llm_api_options`. At minimum, call out in text that the second block replaces the first.
Also applies to: 75-92
🤖 Prompt for AI Agents
In
examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md
around lines 58 to 70, the second cat command overwrites the same config file as
the first, causing the initial generic settings to be lost. To fix this, either
merge the contents of both config blocks into a single file before writing or
write to separate files and clearly document that the second file replaces the
first when passed via --extra_llm_api_options. Also add a note in the tutorial
text explaining this replacement behavior to avoid confusion. Repeat the same
fix for lines 75 to 92.
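The “merge the contents of both config blocks” option from the prompt above can be sketched in plain Python. The dict literals stand in for the two YAML fragments (key names are taken from the guide; the `deep_merge` helper is hypothetical, not part of TensorRT-LLM):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; `override` wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# First heredoc's settings (generic).
generic = {
    "cuda_graph_config": {"enable_padding": True, "max_batch_size": 128},
    "kv_cache_config": {"dtype": "fp8"},
}
# Second heredoc's settings (FP8 + MoE specific).
fp8_specific = {
    "kv_cache_config": {"dtype": "fp8"},
    "moe_config": {"backend": "CUTLASS"},
}

merged = deep_merge(generic, fp8_specific)
print(merged["cuda_graph_config"]["max_batch_size"], merged["moe_config"]["backend"])
# → 128 CUTLASS
```

Serializing `merged` to a single YAML file and pointing `--extra_llm_api_options` at it would preserve both blocks.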
```shell
trtllm-serve deepseek-ai/DeepSeek-R1-0528 \
    --host 0.0.0.0 \
    --port 8000 \
    --backend pytorch \
    --max_batch_size 1024 \
    --max_num_tokens 3200 \
    --max_seq_len 2048 \
    --kv_cache_free_gpu_memory_fraction 0.8 \
    --tp_size 8 \
    --ep_size 8 \
    --trust_remote_code \
    --extra_llm_api_options ${EXTRA_LLM_API_FILE}
```
🛠️ Refactor suggestion
`--tp_size 8` and `--ep_size 8` imply 64 GPUs – highlight or lower defaults

Setting both flags to 8 requires 8 × 8 = 64 GPUs for one model instance. Most users running a “quick-start” will not have that scale, and the server will abort at runtime.

Recommend either:

```diff
---tp_size 8 \
---ep_size 8 \
+# Adjust parallelism for your GPU count (e.g. 2 GPUs → --tp_size 2 --ep_size 1)
+--tp_size 1 \
+--ep_size 1 \
```
or add an explicit note explaining the requirement.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
# Adjust parallelism for your GPU count (e.g. 2 GPUs → --tp_size 2 --ep_size 1).
trtllm-serve deepseek-ai/DeepSeek-R1-0528 \
    --host 0.0.0.0 \
    --port 8000 \
    --backend pytorch \
    --max_batch_size 1024 \
    --max_num_tokens 3200 \
    --max_seq_len 2048 \
    --kv_cache_free_gpu_memory_fraction 0.8 \
    --tp_size 1 \
    --ep_size 1 \
    --trust_remote_code \
    --extra_llm_api_options ${EXTRA_LLM_API_FILE}
```
🤖 Prompt for AI Agents
In
examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md
around lines 99 to 111, the flags --tp_size 8 and --ep_size 8 imply a total of
64 GPUs, which is likely beyond the capacity of most users running a
quick-start. To fix this, either lower the default values of --tp_size and
--ep_size to reflect a smaller GPU count or add a clear note explicitly stating
that these settings require 64 GPUs and the server will abort if insufficient
GPUs are available.
/bot run |
PR_Github #14276 [ run ] triggered by Bot |
PR_Github #14276 [ run ] completed with state
Bypass and merging. The doc change won't affect any CI. The failures are known in existing CI. |
Signed-off-by: Yuxian Qiu <[email protected]>
Summary by CodeRabbit
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill`

Kill all running builds associated with pull request.

skip

`skip --comment COMMENT`

Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.