
Conversation


@yuxianq yuxianq commented Aug 3, 2025

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive deployment and usage guide for running the DeepSeek R1 model on NVIDIA GPUs with TensorRT-LLM, including setup, configuration, testing, evaluation, and benchmarking instructions.
    • Updated configuration to exclude Markdown files from trailing whitespace checks.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Supported backends: pytorch, cpp, tensorrt, triton. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can break the top of tree.
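For illustration, typical /bot comments using only the flags documented above might look like this (the stage and GPU names are the examples given in the help text):

```text
/bot run
/bot run --disable-fail-fast --skip-test
/bot run --stage-list "A10-PyTorch-1" --detailed-log
/bot run --gpu-type "A30, H100_PCIe" --post-merge
/bot kill
/bot skip --comment "Docs-only change"
/bot reuse-pipeline
```

Each command is posted as a standalone PR comment; a new /bot run kills any previously running jobs for the pull request.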

@yuxianq yuxianq requested a review from nv-guomingz August 3, 2025 16:10

coderabbitai bot commented Aug 3, 2025

📝 Walkthrough

Walkthrough

The changes update the pre-commit configuration to exclude Markdown files from the trailing whitespace check and introduce a comprehensive Markdown guide detailing deployment, testing, evaluation, and benchmarking procedures for running the DeepSeek R1 model on TensorRT-LLM.

Changes

  • Pre-commit Hook Configuration (.pre-commit-config.yaml): Expanded the exclusion pattern for the trailing-whitespace hook to skip both .patch and .md files.
  • Documentation: DeepSeek R1 on TensorRT-LLM (examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md): Added a detailed deployment, usage, troubleshooting, evaluation, and benchmarking guide for DeepSeek R1 on TensorRT-LLM.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Docker
    participant TensorRT-LLM Server
    participant GPU
    participant HTTP Client

    User->>Docker: Launch TensorRT-LLM container with DeepSeek R1
    Docker->>TensorRT-LLM Server: Start server with config and model
    TensorRT-LLM Server->>GPU: Load model, allocate resources
    HTTP Client->>TensorRT-LLM Server: Send inference request (prompt)
    TensorRT-LLM Server->>GPU: Run inference
    GPU-->>TensorRT-LLM Server: Return output tokens
    TensorRT-LLM Server-->>HTTP Client: Respond with completion
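The "Send inference request" step in the diagram maps to a plain HTTP call. As a sketch (assuming the OpenAI-compatible API that trtllm-serve exposes on its default port 8000; the model name and prompt are illustrative, and a running server is required):

```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1",
        "prompt": "Explain what a KV cache is in one sentence.",
        "max_tokens": 64
      }'
```

The server responds with a JSON completion object containing the generated tokens, corresponding to the final "Respond with completion" arrow above.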

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Possibly related PRs

Suggested labels

Community want to contribute

Suggested reviewers

  • nv-guomingz
  • kaiyux



🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (5)

3-4: Fix heading level hierarchy

# Introduction should be an H2 (##) so that heading levels increment by one from the main title, eliminating the MD001 linter error.

-# Introduction
+## Introduction

194-195: Remove stray back-tick at end of sentence

The closing back-tick after extra_llm_api_options is orphaned and appears to be a typo.

-... which can be used in the extra_llm_api_options`.`
+... which can be used in the `extra_llm_api_options`.

251-256: Specify a language for fenced code block

Linter MD040 flags this block. Add a language (e.g., shell) after the opening back-ticks.

-```
+```shell
 MODEL_PATH=deepseek-ai/DeepSeek-R1
 ...

259-264: Add language hint to result table block

The fenced block containing the sample results lacks a language tag. Use text to silence MD040 and preserve monospace alignment.

-```
+```text
 |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
 ...

337-346: Add language spec to benchmark output block

Same MD040 issue for the benchmark sample.

-```
+```text
 ============ Serving Benchmark Result ============
 ...
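The MD040 fixes above can also be checked mechanically. A small sketch (not the markdownlint implementation, just an illustrative scan with the same intent) that flags opening fences lacking a language tag:

```python
def unlabeled_fences(markdown: str):
    """Return line numbers of opening ``` fences that have no language tag."""
    flagged = []
    in_fence = False
    for i, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_fence:
                # Opening fence: any text after the backticks is the language.
                if stripped == "```":
                    flagged.append(i)
                in_fence = True
            else:
                # Closing fence.
                in_fence = False
    return flagged

doc = "# Title\n```\nMODEL_PATH=deepseek-ai/DeepSeek-R1\n```\n```shell\nls\n```\n"
print(unlabeled_fences(doc))  # the bare fence on line 2 is flagged
```

Running markdownlint itself (or the pre-commit hook) remains the authoritative check; this only shows why lines 251, 259, and 337 were reported.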
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f7abf8 and 423692c.

📒 Files selected for processing (2)
  • .pre-commit-config.yaml (1 hunks)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-01T07:34:42.734Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.
Learnt from: yiqingy0
PR: NVIDIA/TensorRT-LLM#5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.109Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
🪛 LanguageTool
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md

[style] ~128-~128: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ch** backend. #### --max_batch_sizeDescription: The maximum number of ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~132-~132: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...r processing. #### --max_num_tokensDescription: The maximum total numb...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~136-~136: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...scheduled batch. #### --max_seq_lenDescription: The maximum possible s...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~140-~140: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ut tokens. #### --trust_remote_codeDescription: Allows TensorRT-LLM to...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~155-~155: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...: Sets the data type for the KV cache.   Default: auto (uses the data ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~159-~159: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...checkpoint). #### cuda_graph_configDescription: A section for configur...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~161-~161: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...g CUDA graphs to optimize performance.  Options:   enable_paddi...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~169-~169: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...or which a CUDA graph will be created.   Default: 0   **Rec...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~173-~173: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...-max_batch_size command-line option.   batch_sizes: A specific list of ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~175-~175: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...batch sizes to create CUDA graphs for.   Default: None #### `moe_conf...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~181-~181: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...n for Mixture-of-Experts (MoE) models.  Options:   backend: The ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~185-~185: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...The backend to use for MoE operations.   Default: CUTLASS #### `atten...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~191-~191: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...end to use for attention calculations.  Default: TRTLLM See the [https://g...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~316-~316: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...l,e2el" done EOF chmod +x bench.sh ``` To benchmark the FP4 model, replace --mod...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.17.2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md

109-109: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)


251-251: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


259-259: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


337-337: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
.pre-commit-config.yaml (1)

30-30: Trailing-whitespace hook exclusion looks correct

Expanding the regex to \.(patch|md)$ cleanly prevents markdown docs (like the one added in this PR) from being flagged while still keeping other file types covered. No further action needed.
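To see why the widened pattern behaves as described, here is a quick check of the regex against a few filenames (a sketch; pre-commit applies exclude patterns with Python's re module in the same search-anywhere fashion):

```python
import re

# The exclude pattern from .pre-commit-config.yaml discussed above.
exclude = re.compile(r"\.(patch|md)$")

for name in [
    "docs/guide.md",   # excluded: ends with .md
    "fix.patch",       # excluded: ends with .patch
    "README.md.bak",   # still checked: suffix is .bak, not .md
    "src/main.py",     # still checked: neither suffix matches
]:
    status = "excluded" if exclude.search(name) else "checked"
    print(f"{name}: {status}")
```

Because the pattern is anchored with $, only true .patch and .md suffixes are skipped; files that merely contain those strings mid-path are still checked for trailing whitespace.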

Signed-off-by: Yuxian Qiu <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

190-196: Remove stray back-tick at the end of the hyperlink sentence

The dangling back-tick breaks inline-code formatting and renders a literal “`” in GitHub preview.

-... list of options which can be used in the extra_llm_api_options`.
+... list of options which can be used in the `extra_llm_api_options`.

249-256: Add language identifiers to fenced code blocks

markdownlint (MD040) warns when a fence lacks a language.
Specify shell for commands and text for sample outputs/tables to enable proper syntax highlighting.

-```
+```shell
 MODEL_PATH=deepseek-ai/DeepSeek-R1
@@
-```
+```text
 |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
@@
-```
+```text
 ============ Serving Benchmark Result ============
@@
Repeat for every unlabeled block (three occurrences in this range).  



Also applies to: 259-264, 337-364


📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 423692ce6f5e3024cc16494b17b507265d125439 and ecbd20b8c3cea90c5d7ed4d0e7bd33fa3b40d686.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

Signed-off-by: Yuxian Qiu <[email protected]>
@nv-guomingz nv-guomingz requested a review from chenopis August 4, 2025 02:40

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)

251-256: Add language identifiers to fenced code blocks to satisfy markdownlint (MD040)

The three blocks below lack a language tag:

  • Lines 251-255 (lm_eval …) – use bash
  • Lines 259-264 (table output) – use text
  • Lines 337-363 (benchmark sample output) – use text
-```
+```bash      # for the command block

-```
+```text      # for plain text / table output

This removes the current MD040 violations and enables proper syntax highlighting.

Also applies to: 259-264, 337-363

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecbd20b and 531a7ca.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)
Context: ...or which a CUDA graph will be created.   Default: 0   **Rec...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~173-~173: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...-max_batch_size command-line option.   batch_sizes: A specific list of ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~175-~175: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...batch sizes to create CUDA graphs for.   Default: None #### `moe_conf...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~181-~181: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...n for Mixture-of-Experts (MoE) models.  Options:   backend: The ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~185-~185: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...The backend to use for MoE operations.   Default: CUTLASS #### `atten...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~191-~191: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...end to use for attention calculations.  Default: TRTLLM See the [https://g...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~316-~316: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...l,e2el" done EOF chmod +x bench.sh ``` To benchmark the FP4 model, replace --mod...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.17.2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md

251-251: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


259-259: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


337-337: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)

106-144: Heading hierarchy now correct – nice cleanup

The section was demoted to ### and option headings to ####, fixing the previous MD001 complaint.
No further action needed.

Signed-off-by: Yuxian Qiu <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

106-188: Heading levels still skip H3 (MD001)

All option headings (####) are children of an H2 section, so they should be ###.
Same feedback was given earlier and remains unresolved.
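What MD001 enforces can be illustrated with a small stand-in check (a sketch only, not markdownlint itself; the sample headings echo this guide's own section names):

```shell
# Flag any ATX heading that jumps more than one level past its predecessor --
# the condition markdownlint reports as MD001.
printf '%s\n' \
  '## Configs and Parameters' \
  '#### `--tp_size`' \
  '### `--ep_size`' > /tmp/doc.md

awk '/^#+ /{
  level = length($1)            # $1 is the run of leading "#"
  if (prev && level > prev + 1)
    printf "line %d: H%d follows H%d (MD001)\n", NR, level, prev
  prev = level
}' /tmp/doc.md
# prints: line 2: H4 follows H2 (MD001)
```

Promoting the option headings one level (H4 to H3) clears the warning.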


55-66: Config file is overwritten & FP4 config missing – creates runtime confusion

You still create /tmp/config.yml twice and both times hard-code dtype: fp8.
The second cat clobbers the first one and there is no standalone FP4 config, so the FP4 workflow silently falls back to FP8.

-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml        # dedicated file for FP4
@@
-kv_cache_config:
-  dtype: fp8
+kv_cache_config:
+  dtype: fp4
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml        # dedicated file for FP8
@@
 kv_cache_config:
   dtype: fp8

Remember to pass the correct file in trtllm-serve --extra_llm_api_options.
Without this fix users will get unexpected accuracy/perf results.

Also applies to: 70-83
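The fix described above can be sketched as two separate heredocs (the file names and the fp4 dtype come from the suggested diff; the YAML keys mirror the guide, and the `trtllm-serve` invocation is elided):

```shell
# One config file per quantization, so neither heredoc clobbers the other.
cat <<'EOF' > /tmp/config_fp4.yml
kv_cache_config:
  dtype: fp4
EOF

cat <<'EOF' > /tmp/config_fp8.yml
kv_cache_config:
  dtype: fp8
EOF

# Each launch then points at its own file, e.g.
#   trtllm-serve ... --extra_llm_api_options /tmp/config_fp4.yml
grep 'dtype' /tmp/config_fp4.yml /tmp/config_fp8.yml
```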

🧹 Nitpick comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)

249-264: Specify a language for fenced code blocks (MD040)

Add shell or text after the opening back-ticks to silence markdown-lint and enable syntax highlighting.

-```
+```shell

Apply to the GSM8K result snippets and the benchmark sample too.

Also applies to: 266-283, 337-364

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 531a7ca and d94211e.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)

@yuxianq yuxianq changed the title doc: Add DeepSeek R1 deployment guide. doc: [TRTLLM-6859] Add DeepSeek R1 deployment guide. Aug 4, 2025
Signed-off-by: Yuxian Qiu <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

114-154: Heading levels still skip H3 under “Configs and Parameters”

Configs and Parameters is an H2, yet the option names start at H4 (####).
Promote them to H3 (###) to satisfy markdown-lint MD001 and keep a logical hierarchy.

-#### `--tp_size`
-#### `--ep_size`
-#### `--kv_cache_free_gpu_memory_fraction`
+### `--tp_size`
+### `--ep_size`
+### `--kv_cache_free_gpu_memory_fraction`

Apply the same promotion to every option heading in this section.


55-90: Config file is still overwritten and NVFP4 dtype hard-coded to fp8

The two cat << EOF > ${EXTRA_LLM_API_FILE} blocks write to the same /tmp/config.yml; the second block fully clobbers the first.
Additionally, the first block keeps kv_cache_config.dtype: fp8, so NVFP4 runs will silently fall back to FP8.

-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml
 ...
-kv_cache_config:
-  dtype: fp8
+kv_cache_config:
+  dtype: fp4

And for the FP8 case:

-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml

Update the subsequent trtllm-serve ... --extra_llm_api_options invocation(s) to reference the correct file.

🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

259-272: Add language identifiers to fenced code blocks (MD040)

Markdown-lint flags these blocks because no language is specified.
Use shell, text, or another suitable identifier:

-```
+```shell
   # command

and

-```
+```text
   |Tasks|Version|…

Do the same for the benchmark sample output at Lines 345-372.

Also applies to: 345-372


200-203: Dangling back-tick breaks Markdown rendering

The closing sentence ends with a stray back-tick after the final period, which leaves the inline code span around `extra_llm_api_options` unclosed.

-…extra_llm_api_options`.`
+…`extra_llm_api_options`.
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between d94211e and 37f5d3c.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)

Signed-off-by: Yuxian Qiu <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

114-131: Heading level skips break Markdown-lint (MD001)

Configs and Parameters is an H2, but option names start at H4 (####).
Promote them to H3 to maintain a one-level increment and keep autogenerated TOCs correct.

-#### `--tp_size`
-#### `--ep_size`
-#### `--kv_cache_free_gpu_memory_fraction`
-#### `--backend pytorch`
-#### `--max_batch_size`
-#### `--max_num_tokens`
-#### `--max_seq_len`
-#### `--trust_remote_code`
+### `--tp_size`
+### `--ep_size`
+### `--kv_cache_free_gpu_memory_fraction`
+### `--backend pytorch`
+### `--max_batch_size`
+### `--max_num_tokens`
+### `--max_seq_len`
+### `--trust_remote_code`

56-70: Config file is overwritten & NVFP4 still uses fp8 dtype

The same ${EXTRA_LLM_API_FILE} path (/tmp/config.yml) is re-used twice.
The second cat <<EOF > … completely clobbers the first YAML, so the FP4 settings are lost.
In addition, kv_cache_config.dtype is hard-coded to fp8, meaning the “FP4 guide” still runs FP8 under the hood.

-EXTRA_LLM_API_FILE=/tmp/config.yml
-
-kv_cache_config:
-  dtype: fp8          # <- wrong for FP4
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml
+
+kv_cache_config:
+  dtype: fp4

-EXTRA_LLM_API_FILE=/tmp/config.yml   # second block now becomes FP8-specific
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml

Then launch the server with the matching file:

--extra_llm_api_options ${EXTRA_LLM_API_FILE}   # adjust per model

Failing to separate the files silently produces wrong results and makes debugging painful.
Please split the configs or use the -a >> append operator if you truly intend a single file.

Also applies to: 75-91
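For completeness, the single-file alternative mentioned above relies on `>>` append redirection (standard shell; the section names mirror the guide's YAML and are otherwise illustrative):

```shell
# The first heredoc creates /tmp/config.yml; the second APPENDS with >>
# instead of truncating with >, so both YAML sections survive.
cat <<'EOF' > /tmp/config.yml
cuda_graph_config:
  enable_padding: true
EOF

cat <<'EOF' >> /tmp/config.yml
kv_cache_config:
  dtype: fp8
EOF

cat /tmp/config.yml   # both sections are present
```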

🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

200-203: Stray back-tick breaks the sentence

The line ends with "...options which can be used in the `extra_llm_api_options`" followed by a stray back-tick.
Remove the trailing back-tick (or move it before the period).

-… extra_llm_api_options`.`
+… extra_llm_api_options.

259-273: Add language identifiers to fenced code blocks (MD040)

Markdown-lint flags code blocks without a language.
Use text for plain console output to silence the warning.

-```
+```text
|Tasks|Version| …

Apply the same change to the benchmark sample block below.

Also applies to: 345-351

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 37f5d3c and 2ee475a.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)

@yuxianq yuxianq changed the title doc: [TRTLLM-6859] Add DeepSeek R1 deployment guide. [TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. Aug 4, 2025
@yuxianq
Collaborator Author

yuxianq commented Aug 4, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #13928 [ run ] triggered by Bot

Signed-off-by: Yuxian Qiu <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)

55-90: Config file is overwritten and FP4 dtype is wrong – create two distinct YAMLs

Both `cat` heredoc blocks write to /tmp/config.yml, so the second block clobbers the first and you never get an FP4-specific config. In addition, the “FP4” block still hard-codes dtype: fp8. Readers who copy-paste will silently end up running FP8 in all cases.

-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml

-kv_cache_config:
-  dtype: fp8
+kv_cache_config:
+  dtype: fp4
…
 # For FP8 model
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml

and update the subsequent trtllm-serve … --extra_llm_api_options examples to point at the correct file (config_fp4.yml or config_fp8.yml).
This avoids the overwrite, sets the right dtype for FP4, and makes the guide copy-paste-safe.
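The suggested split can be sketched end-to-end. This is a hypothetical sketch: the file names and the `fp4` dtype value are taken from the review diff, not verified against TensorRT-LLM's accepted config keys.

```shell
# Hypothetical layout: write each quantization mode to its own file so
# neither heredoc clobbers the other. Values mirror the review diff.
CONFIG_DIR=$(mktemp -d)

cat > "${CONFIG_DIR}/config_fp4.yml" <<'EOF'
kv_cache_config:
  dtype: fp4
EOF

cat > "${CONFIG_DIR}/config_fp8.yml" <<'EOF'
kv_cache_config:
  dtype: fp8
EOF

# Each serve command then points at its own file, e.g.:
#   trtllm-serve ... --extra_llm_api_options "${CONFIG_DIR}/config_fp4.yml"
#   trtllm-serve ... --extra_llm_api_options "${CONFIG_DIR}/config_fp8.yml"
grep -H dtype "${CONFIG_DIR}"/config_*.yml
```

Because the two files never share a path, re-running either block is idempotent and copy-paste order no longer matters.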

🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

259-272: Specify a language for the fenced code block (MD040)

Markdown-lint flags the evaluation result block because it lacks a language tag.

-```
+```text

345-372: Add language identifier for the sample benchmark output

Same MD040 issue here—declare the block as plain text so linters pass and syntax highlighting is disabled.

-```
+```text
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ee475a and e6696e1.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-04T02:12:17.582Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md

@tensorrt-cicd
Copy link
Collaborator

PR_Github #13928 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10490 completed with status: 'FAILURE'

Signed-off-by: Yuxian Qiu <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

55-91: Config file is still overwritten; fp4 config remains impossible
The two cat << EOF > ${EXTRA_LLM_API_FILE} blocks target the same /tmp/config.yml, so the first YAML is discarded when the second block executes. The fp4 use-case therefore never sees its intended settings and the KV-cache dtype stays fp8.

This was flagged in a previous review and is still unresolved.

-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml        # first block

 # … fp4-specific YAML …

 # -------------------------------------------------
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml        # second block

 # … fp8-specific YAML …

Ensure the subsequent trtllm-serve command references the correct file for each quantization mode.


114-147: Heading level jumps still violate MD001
Configs and Parameters is an H2 (##) but the option headings start at H4 (####). They must be promoted to H3 (###) to maintain a one-level increment.

This exact issue was raised earlier but the markdown remains unchanged.

-#### `--tp_size`
+### `--tp_size`

Apply the same change to every option heading in this section.

🧹 Nitpick comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

259-264: Specify language for fenced code block (MD040)
Markdown-lint flags code blocks without a language hint. Add shell here.

-```
+```shell
 MODEL_PATH=deepseek-ai/DeepSeek-R1-0528

345-372: Second unlabeled code block needs a language tag
The sample benchmark output block also violates MD040. Use text or console.

-```
+```text
 ============ Serving Benchmark Result ============
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6696e1 and 9ecd493.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)
Signed-off-by: Yuxian Qiu <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (2)

55-66: Config file is overwritten & NVFP4 still hard-coded to FP8

Both cat <<EOF > ${EXTRA_LLM_API_FILE} blocks target the same /tmp/config.yml.
The second block replaces the first, so the NVFP4-intended settings are lost and kv_cache_config.dtype remains fp8, which will silently disable FP4 execution.

-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml   # first block

 ...
-kv_cache_config:
-  dtype: fp8
+kv_cache_config:
+  dtype: fp4
 ...
-EXTRA_LLM_API_FILE=/tmp/config.yml
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml   # second block

Then launch the server with the appropriate file:

--extra_llm_api_options ${EXTRA_LLM_API_FILE}   # point to *_fp4.yml or *_fp8.yml

Without this separation, FP4 users will unknowingly run an FP8 config and MoE users lose the extra settings.

Also applies to: 75-91


118-150: Heading levels still skip H3, violating MD001

Configs and Parameters is an H2. Option headings start at ####, jumping two levels.
Promote them uniformly to H3 (###) to restore semantic hierarchy and avoid markdown-lint failures.

-#### `--tp_size`
+### `--tp_size`

Apply to every option heading in this section.

🧹 Nitpick comments (1)
examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1)

260-264: Specify the language on fenced code blocks (MD040)

Add shell (or none) after the triple back-ticks so tools render and lint correctly:

-```
+```shell

Do the same for the benchmark result block (use `text`).

Also applies to: 346-373
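For context, the MD040 check being flagged can be approximated in a few lines. This is a simplified sketch of the rule's intent, not markdownlint's actual implementation (it ignores indented and `~~~` fences):

```python
def fences_missing_language(markdown: str) -> list[int]:
    """Return line numbers of opening ``` fences that lack a language tag."""
    missing = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_fence:
                # An opening fence that is exactly ``` has no language hint.
                if stripped == "```":
                    missing.append(lineno)
                in_fence = True
            else:
                in_fence = False
    return missing

sample = "intro\n```\ncode\n```\n```text\nok\n```\n"
print(fences_missing_language(sample))  # -> [2]
```

Running a check like this locally before pushing avoids the lint round-trip on every review iteration.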

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ecd493 and b3f7dfe.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/deployment-guide-for-trt-llm-deepseek-r1.md (1 hunks)

Signed-off-by: Yuxian Qiu <[email protected]>
@yuxianq yuxianq requested a review from a team as a code owner August 6, 2025 08:22
@yuxianq yuxianq requested review from QiJune and kevinch-nv August 6, 2025 08:22
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md (4)

214-215: Clarify container hostname vs. host access

Inside the container the server binds to 0.0.0.0:8000, but from the host, localhost:8000 works only if the docker run command used -p 8000:8000.
If users change the port mapping, the curl example will fail. Add a brief reminder such as:

“Replace 8000 with the host-side port you mapped in docker run.”


260-266: Specify language for fenced code block – fixes MD040

Markdown-lint flags this block because it lacks a language hint.

-```
+```shell
 MODEL_PATH=deepseek-ai/DeepSeek-R1-0528
 ...

Do the same for other shell blocks to keep lint clean.


346-373: Missing language spec on sample benchmark output

Add text (or none) to silence MD040 and keep syntax highlighters from mis-detecting numbers as code.

-```
+```text
 ============ Serving Benchmark Result ============
 ...
==================================================

137-150: Stylistic: consecutive sentences start with “The”

LanguageTool flags several clusters (lines 137-150, etc.). While not critical, varying sentence openings improves readability; for example, replace some with “This option…” or “It sets…”.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3f7dfe and efa0cf4.

📒 Files selected for processing (1)
  • examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md (1 hunks)

260-260: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


346-346: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

Comment on lines +58 to +70
cat << EOF > ${EXTRA_LLM_API_FILE}
enable_attention_dp: true
cuda_graph_config:
enable_padding: true
max_batch_size: 128
kv_cache_config:
dtype: fp8
stream_interval: 10
speculative_config:
decoding_type: MTP
num_nextn_predict_layers: 1
EOF
```
⚠️ Potential issue

Config file is silently overwritten – FP8 example wipes the generic settings

Both cat << EOF > ${EXTRA_LLM_API_FILE} blocks target the same /tmp/config.yml.
Running the second block (FP8 + moe_config) discards the first block’s contents, so users following the tutorial verbatim will only keep the FP8-specific YAML and lose the common settings.

-EXTRA_LLM_API_FILE=/tmp/config.yml
-cat << EOF > ${EXTRA_LLM_API_FILE}
+# General config
+EXTRA_LLM_API_FILE=/tmp/config_fp4.yml
+cat << EOF > "${EXTRA_LLM_API_FILE}"
   ...
 EOF
 ...
-EXTRA_LLM_API_FILE=/tmp/config.yml
-cat << EOF > ${EXTRA_LLM_API_FILE}
+# FP8-specific config
+EXTRA_LLM_API_FILE=/tmp/config_fp8.yml
+cat << EOF > "${EXTRA_LLM_API_FILE}"
   ...
 EOF

Then pass the appropriate file via --extra_llm_api_options.
At minimum, call out in text that the second block replaces the first.

Also applies to: 75-92

🤖 Prompt for AI Agents
In
examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md
around lines 58 to 70, the second cat command overwrites the same config file as
the first, causing the initial generic settings to be lost. To fix this, either
merge the contents of both config blocks into a single file before writing or
write to separate files and clearly document that the second file replaces the
first when passed via --extra_llm_api_options. Also add a note in the tutorial
text explaining this replacement behavior to avoid confusion. Repeat the same
fix for lines 75 to 92.
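The separate-files variant of the suggested fix can be sketched in shell. The file names follow the reviewer's example (`config_fp4.yml` / `config_fp8.yml`), and the YAML bodies below are abbreviated placeholders, not the full settings from the recipe:

```shell
# Sketch of the suggested fix: write each config to its own file so the
# second heredoc no longer clobbers the first. YAML bodies abbreviated.
CONFIG_FP4=/tmp/config_fp4.yml
CONFIG_FP8=/tmp/config_fp8.yml

cat << 'EOF' > "${CONFIG_FP4}"
enable_attention_dp: true
cuda_graph_config:
  enable_padding: true
  max_batch_size: 128
EOF

cat << 'EOF' > "${CONFIG_FP8}"
enable_attention_dp: true
kv_cache_config:
  dtype: fp8
EOF

# Later, point trtllm-serve at whichever file matches the checkpoint:
#   --extra_llm_api_options "${CONFIG_FP8}"
echo "wrote ${CONFIG_FP4} and ${CONFIG_FP8}"
```

With both files on disk, neither tutorial step can silently wipe the other's settings.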

Comment on lines +99 to +111
trtllm-serve deepseek-ai/DeepSeek-R1-0528 \
--host 0.0.0.0 \
--port 8000 \
--backend pytorch \
--max_batch_size 1024 \
--max_num_tokens 3200 \
--max_seq_len 2048 \
--kv_cache_free_gpu_memory_fraction 0.8 \
--tp_size 8 \
--ep_size 8 \
--trust_remote_code \
--extra_llm_api_options ${EXTRA_LLM_API_FILE}
```

🛠️ Refactor suggestion

--tp_size 8 and --ep_size 8 imply 64 GPUs – highlight or lower defaults

Setting both flags to 8 requires 8 × 8 = 64 GPUs for one model instance. Most users running a “quick-start” will not have that scale, and the server will abort at runtime.

Recommend either:

---tp_size 8 \
---ep_size 8 \
+# Adjust parallelism for your GPU count (e.g. 2 GPUs → --tp_size 2 --ep_size 1)
+--tp_size 1 \
+--ep_size 1 \

or add an explicit note explaining the requirement.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
trtllm-serve deepseek-ai/DeepSeek-R1-0528 \
--host 0.0.0.0 \
--port 8000 \
--backend pytorch \
--max_batch_size 1024 \
--max_num_tokens 3200 \
--max_seq_len 2048 \
--kv_cache_free_gpu_memory_fraction 0.8 \
--tp_size 8 \
--ep_size 8 \
--trust_remote_code \
--extra_llm_api_options ${EXTRA_LLM_API_FILE}
```
# Adjust parallelism for your GPU count (e.g. 2 GPUs → --tp_size 2 --ep_size 1)
trtllm-serve deepseek-ai/DeepSeek-R1-0528 \
--host 0.0.0.0 \
--port 8000 \
--backend pytorch \
--max_batch_size 1024 \
--max_num_tokens 3200 \
--max_seq_len 2048 \
--kv_cache_free_gpu_memory_fraction 0.8 \
--tp_size 1 \
--ep_size 1 \
--trust_remote_code \
--extra_llm_api_options ${EXTRA_LLM_API_FILE}
🤖 Prompt for AI Agents
In
examples/models/core/deepseek_v3/quick-start-recipe-for-deepseek-r1-on-trt-llm.md
around lines 99 to 111, the flags --tp_size 8 and --ep_size 8 imply a total of
64 GPUs, which is likely beyond the capacity of most users running a
quick-start. To fix this, either lower the default values of --tp_size and
--ep_size to reflect a smaller GPU count or add a clear note explicitly stating
that these settings require 64 GPUs and the server will abort if insufficient
GPUs are available.
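One way to make the defaults resilient, sketched here under the assumption that `--tp_size` should not exceed the number of visible GPUs (the variable names are illustrative, not part of the recipe):

```shell
# Pick a tensor-parallel size no larger than the visible GPU count.
# Falls back to 1 when nvidia-smi is unavailable (e.g. CPU-only hosts).
NGPU="$(nvidia-smi -L 2>/dev/null | wc -l)"
[ "${NGPU}" -gt 0 ] || NGPU=1
TP_SIZE=$(( NGPU < 8 ? NGPU : 8 ))
echo "visible GPUs: ${NGPU}; using --tp_size ${TP_SIZE}"
```

The computed value can then be passed as `--tp_size ${TP_SIZE}` (together with an `--ep_size` that is valid for it) instead of a hard-coded 8.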

@yuxianq
Collaborator Author

yuxianq commented Aug 6, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #14276 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14276 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10781 completed with status: 'FAILURE'

@litaotju
Collaborator

litaotju commented Aug 6, 2025

Bypass and merging. The doc change won't affect any CI. The failures are known in existing CI.

@litaotju litaotju merged commit 3a71ddf into NVIDIA:main Aug 6, 2025
3 of 4 checks passed
jain-ria pushed a commit to jain-ria/TensorRT-LLM that referenced this pull request Aug 7, 2025