[doc] Update perf_overview.md for release 0.21 #6270

zbpatel · 2025-07-22T21:58:06Z

Summary by CodeRabbit

Documentation
- Updated performance overview with new reference link to NVIDIA benchmarking blog.
- Revised and reformatted throughput tables for Llama models, reflecting updated GPU hardware and configurations.
- Added notes on recent performance issues and version recommendations.
- Clarified benchmark data and runtime details in tables.
- Updated benchmark data version references.

Description

This MR updates the perf_overview.md document with benchmarks from the 0.21 release.
In addition to updating the numbers, I added a link to @FrankD412 's blog post on benchmarking with trtllm-bench

Test Coverage

N/A doc

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

coderabbitai · 2025-07-22T21:58:15Z

Walkthrough

The documentation for performance benchmarking was updated to include a new NVIDIA blog reference, revised throughput tables with updated GPU models and tensor parallel sizes, and adjusted performance values. Tables were reformatted for compactness, notes added about Llama 4 performance issues on short sequences, version references updated, and minor clarifications made to benchmark details.

Changes

File(s)	Change Summary
Performance Documentation `docs/source/performance/perf-overview.md`	Added NVIDIA blog link; revised throughput tables for Llama models with updated GPUs (GB200, GH200, H100), TP sizes, and throughput; reformatted tables; added notes on Llama 4 performance issues and version updates; clarified runtime input/output lengths.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

Possibly related PRs

doc: update release notes #6324: Updates documentation on LLaMA4 performance and known issues in version 0.21, complementing benchmark table revisions and notes in this PR.
doc: update release notes #6438: Clarifies scope of Llama 4 performance regression in release notes, related to the performance and hardware details updated here.

Suggested labels

Documentation

Suggested reviewers

laikhtewari
FrankD412

Poem

🐇 In docs where numbers dance and play,
New GPUs and tables lead the way.
Llama strides with speed anew,
Notes to guide what we must do.
Through benchmarks bright, the rabbits cheer—
Performance tales, now crystal clear!

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Signed-off-by: zpatel <[email protected]>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (8)

docs/source/performance/perf-overview.md (8)
34-38: Table header is hard to parse – consider collapsing the multi-row header

The current five rows of “empty” cells render as a single table with several nearly-blank rows, which looks broken in most Markdown renderers and makes copy-pasting awkward. A single-row header is clearer:
-|                          | GPU:   | B200      | GB200 180GB   |
-|:-----------------------------|:---|:----------|:--------------|
-|    | TP Size   | 1      | 1          |
-| ISL, OSL |    |           |               |
-|                              |    |           |               |
+| ISL | OSL | TP Size | B200 | GB200 180GB |
+|:--:|:--:|:------:|------:|------------:|
This keeps the same information while eliminating the “blank” rows.

53-57: Same multi-row-header issue as above

The 405 B FP4 table repeats the pattern that causes rendering problems. Consider applying the same single-row-header refactor for consistency.

80-96: Large 8 B FP8 table: readability could be improved

With three GPUs in one table, a short caption or footnote explaining why all TP sizes are “1” would help readers unfamiliar with the context.

99-115: 70 B FP8 table header spacing

Minor: “GPU:” is only useful if there is a second descriptor. Dropping the colon keeps the heading tighter:
-|                        | GPU:     | H100 80GB HBM3   | H200 141GB HBM3   |
+|                        | GPU      | H100 80GB HBM3   | H200 141GB HBM3   |
117-133: Lint failure (MD058) – add a blank line before the table

markdownlint flags this table because it is not surrounded by blank lines.
-#### Llama 3.1 405B FP8
-|                         | GPU:    | H100 80GB HBM3   | H200 141GB HBM3   |
+#### Llama 3.1 405B FP8
+
+|                         | GPU     | H100 80GB HBM3   | H200 141GB HBM3   |
136-136: Style: turn the long sentence into a blockquote or note for better emphasis
-*Performance for Llama 4 on sequence lengths less than 8,192 tokens is affected by an issue introduced in v0.21. To reproduce the Llama 4 performance noted here, please use v0.20
+> **NOTE:** An issue introduced in v0.21 affects Llama 4 performance for sequence lengths **< 8 192** tokens. To reproduce the numbers below, run **v0.20**.
227-249: YAML options block: shortcode for long lists

Showing the full cuda_graph_batch_sizes list clutters the doc. Consider collapsing with an ellipsis to keep focus on the relevant flags:
use_cuda_graph: true
cuda_graph_padding_enabled: true
cuda_graph_batch_sizes: [1, 2, 4, 8, 16, … , 4096, 8192]
This still conveys the idea while shortening ~20 lines.

10-12: Typo: “situaiton” → “situation”
-Tuning batch sizes, parallelism configurations, and other options may lead to improved performance depending on your situaiton.
+Tuning batch sizes, parallelism configurations, and other options may lead to improved performance depending on your situation.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a55c631 and b8d1a6d.

📒 Files selected for processing (1)

docs/source/performance/perf-overview.md (4 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.17.2)

docs/source/performance/perf-overview.md

117-117: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

docs/source/performance/perf-overview.md (1)

15-16: Blog link addition looks good

The cross-reference to FrankD412’s benchmarking post is a valuable pointer for readers who want deeper guidance. Nothing to change here.

zbpatel · 2025-07-24T05:19:08Z

I just pushed the tables with v0.21 numbers, there are 2 minor TODOs left:

If we have an issue tracking the Maverick TP8 work, I would like to link that here.
Update the dataset table with the increased values for v0.21 (working on this now)

zbpatel · 2025-07-24T05:36:22Z

I just pushed the tables with v0.21 numbers, there are 2 minor TODOs left:

If we have an issue tracking the Maverick TP8 work, I would like to link that here.

Update the dataset table with the increased values for v0.21 (working on this now)

Ok turns out I was wrong about 2, we kept these values the same.

docs/source/performance/perf-overview.md

juney-nvidia

Overall looks good to me.

Signed-off-by: zpatel <[email protected]>

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (2)

docs/source/performance/perf-overview.md (2)

34-49: Missing * marker requested in prior review
A previous reviewer (see “laikhtewari” comment) asked for an asterisk next to every updated number. These FP4-70B figures were changed but still lack the marker.

53-68: Same asterisk request applies here
The 405B-FP4 table is also updated and still omits the requested asterisks.

🧹 Nitpick comments (4)

docs/source/performance/perf-overview.md (4)

80-96: Insert a blank line before the table to satisfy MD058
markdownlint flags this because the table immediately follows the heading.

-#### Llama 3.1 8B FP8
-|                          | GPU:   | GH200   | H100 80GB HBM3   | H200 141GB HBM3   |
+#### Llama 3.1 8B FP8
+
+|                          | GPU:   | GH200   | H100 80GB HBM3   | H200 141GB HBM3   |

99-115: Blank line needed before this table as well
Same MD058 issue as above.

-#### Llama 3.3 70B FP8
-|                        | GPU:     | H100 80GB HBM3   | H200 141GB HBM3   |
+#### Llama 3.3 70B FP8
+
+|                        | GPU:     | H100 80GB HBM3   | H200 141GB HBM3   |

117-133: Blank line missing before FP8-405B table
This is the specific line markdownlint reported (MD058).

-#### Llama 3.1 405B FP8
-|                         | GPU:    | H100 80GB HBM3   | H200 141GB HBM3   |
+#### Llama 3.1 405B FP8
+
+|                         | GPU:    | H100 80GB HBM3   | H200 141GB HBM3   |

136-138: Add blank line between the note and the table
Keeps consistency and resolves MD058 for this section.

-Note: Performance for Llama 4 on sequence lengths less than 8,192 tokens is affected by an issue introduced in v0.21. To reproduce the Llama 4 performance noted here, please use v0.20
-|                          | GPU    | H200 141GB HBM3   | H100 80GB HBM3   |
+Note: Performance for Llama 4 on sequence lengths less than 8,192 tokens is affected by an issue introduced in v0.21. To reproduce the Llama 4 performance noted here, please use v0.20
+
+|                          | GPU    | H200 141GB HBM3   | H100 80GB HBM3   |

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8d1a6d and c676791.

📒 Files selected for processing (1)

docs/source/performance/perf-overview.md (4 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.17.2)

docs/source/performance/perf-overview.md

117-117: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (2)

docs/source/performance/perf-overview.md (2)

15-16: Nice contextual link added
The extra reference to the trtllm-bench blog post is helpful for users who want deeper tuning guidance.

227-249: Appreciate the concrete llm_options.yml example
Including the exact CUDA-graph batch-size list makes reproduction straightforward.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

docs/source/performance/perf-overview.md (1)
10-12: Fix typo in introductory paragraph

situaiton ➜ situation
Minor, but it appears early in the doc and slightly detracts from polish.
-Tuning batch sizes, parallelism configurations, and other options may lead to improved performance depending on your situaiton.
+Tuning batch sizes, parallelism configurations, and other options may lead to improved performance depending on your situation.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c676791 and 9613485.

📒 Files selected for processing (1)

docs/source/performance/perf-overview.md (4 hunks)

🧰 Additional context used

🪛 markdownlint-cli2 (0.17.2)

docs/source/performance/perf-overview.md

117-117: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

docs/source/performance/perf-overview.md

zbpatel · 2025-07-29T06:17:34Z

@laikhtewari do you mind reviewing / approving changes? I think the PR is blocked waiting for an approval from you.

docs/source/performance/perf-overview.md

Signed-off-by: zpatel <[email protected]>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

docs/source/performance/perf-overview.md (1)
125-142: Surround the 405B-FP8 table with blank lines (MD058)

The table starting on Line 126 is not preceded and followed by blank lines, causing the MD058 blanks-around-tables linter failure.
-#### Llama 3.1 405B FP8
-|                         | GPU:    | H100   | H200   |
+#### Llama 3.1 405B FP8
+
+|                         | GPU:    | H100   | H200   |
@@
-| 20000, 2000                  |    | 363.27           | 509.87            |
+| 20000, 2000                  |    | 363.27           | 509.87            |
+

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9613485 and b1d010b.

📒 Files selected for processing (1)

docs/source/performance/perf-overview.md (6 hunks)

🧰 Additional context used

🧠 Learnings (1)

docs/source/performance/perf-overview.md (1)

Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

🪛 markdownlint-cli2 (0.17.2)

docs/source/performance/perf-overview.md

26-26: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)

126-126: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

docs/source/performance/perf-overview.md

Signed-off-by: zpatel <[email protected]>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

docs/source/performance/perf-overview.md (1)
126-133: Insert a blank line before the table to satisfy MD058

The table following “#### Llama 3.1 405B FP8” is not preceded by a blank line, triggering MD058.
-#### Llama 3.1 405B FP8
-|                         | GPU:    | H100   | H200   |
+#### Llama 3.1 405B FP8
+
+|                         | GPU:    | H100   | H200   |
No other adjustments are needed—there’s already a trailing blank line after the table.

🧹 Nitpick comments (1)

docs/source/performance/perf-overview.md (1)

26-35: Clarify GH200 memory figures to avoid reader confusion

Line 30 mixes GPU memory (96 GB HBM3) with system memory (480 GB LPDDR5X) in a single parenthetical, which previously generated questions.
Consider making the split explicit (e.g. “GH200 96 GB HBM3 GPU / 480 GB LPDDR5X system”) to prevent misinterpretation.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1d010b and 9271f09.

📒 Files selected for processing (1)

docs/source/performance/perf-overview.md (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

docs/source/performance/perf-overview.md (1)

Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

🪛 markdownlint-cli2 (0.17.2)

docs/source/performance/perf-overview.md

37-37: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

127-127: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

docs/source/performance/perf-overview.md (1)

15-16: Blog-post reference looks good

Adds useful context for users wanting deeper benchmarking guidance.

docs/source/performance/perf-overview.md

Signed-off-by: zpatel <[email protected]>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

docs/source/performance/perf-overview.md (1)
128-145: Surround the table with blank lines to satisfy MD058

Markdown-lint requires a blank line before and after tables. Insert one above line 129 and one after line 144.
-#### Llama 3.1 405B FP8
-|                         | GPU:    | H100   | H200   |
+#### Llama 3.1 405B FP8
+
+|                         | GPU:    | H100   | H200   |
 ...
-| 20000, 2000                  |    | 363.27           | 509.87            |
+
+| 20000, 2000                  |    | 363.27           | 509.87            |
+
This unblocks CI and improves readability.

🧹 Nitpick comments (1)

docs/source/performance/perf-overview.md (1)
26-35: Clarify GH200 memory spec to avoid renewed confusion

The parenthetical “(480GB LPDDR5X)” re-introduces the same ambiguity that was removed in an earlier commit (see Juney’s prior comment). Readers may again assume 480 GB is GPU HBM rather than system RAM.
- - GH200 96GB HBM3 (480GB LPDDR5X)
+ - GH200 96 GB HBM3 (Grace-CPU 480 GB LPDDR5X system RAM)
Adding a brief note or splitting GPU vs. system memory explicitly will prevent misinterpretation.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9271f09 and 79daeb1.

📒 Files selected for processing (1)

docs/source/performance/perf-overview.md (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

docs/source/performance/perf-overview.md (1)

Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

🪛 markdownlint-cli2 (0.17.2)

docs/source/performance/perf-overview.md

38-38: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

129-129: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

docs/source/performance/perf-overview.md (1)

15-16: 👍 Good contextual reference added

The link to the NVIDIA blog post gives readers a clear next step for deeper benchmarking guidance. No further action needed here.

docs/source/performance/perf-overview.md

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

docs/source/performance/perf-overview.md (1)
15-16: Nice contextual reference – tiny readability polish suggested

The added blog link is great for readers who want deeper benchmarking guidance.
Consider inserting a comma after “trtllm-bench” so the sentence flows a bit better:
-For more information on benchmarking with `trtllm-bench` see this NVIDIA
+For more information on benchmarking with `trtllm-bench`, see this NVIDIA

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79daeb1 and 6f44a99.

📒 Files selected for processing (1)

docs/source/performance/perf-overview.md (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

docs/source/performance/perf-overview.md (1)

Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

🪛 markdownlint-cli2 (0.17.2)

docs/source/performance/perf-overview.md

129-129: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

docs/source/performance/perf-overview.md (1)

26-32: GH200 bullet re-introduces the 480 GB figure that previously caused confusion

Past discussion removed the “480 GB” notation because it mixes host LPDDR5X capacity
with GPU HBM capacity and led to mis-interpretation (see Juney’s prior comment).
The new bullet adds it back:
- GH200 96GB HBM3 (480GB LPDDR5X)
Unless the host-memory number is essential here (it is not referenced anywhere in
the tables), please drop the parenthetical to stay consistent with earlier fixes
and the other bullets:
- - GH200 96GB HBM3 (480GB LPDDR5X)
+ - GH200 96GB HBM3

docs/source/performance/perf-overview.md

FrankD412 · 2025-07-30T21:29:14Z

/bot skip --comment "Documentation update."

tensorrt-cicd · 2025-07-30T22:01:39Z

PR_Github #13575 [ skip ] triggered by Bot

tensorrt-cicd · 2025-07-30T22:14:16Z

PR_Github #13575 [ skip ] completed with state SUCCESS
Skipping testing for commit 6f44a99

Signed-off-by: zpatel <[email protected]>

Signed-off-by: zpatel <[email protected]> Signed-off-by: Lanyu Liao <[email protected]>

Signed-off-by: zpatel <[email protected]>

zbpatel requested review from a team as code owners July 22, 2025 21:58

zbpatel requested review from juney-nvidia, FrankD412 and liji-nv July 22, 2025 21:58

zbpatel changed the base branch from main to release/0.21 July 22, 2025 21:58

zbpatel requested a review from a team as a code owner July 22, 2025 21:58

zbpatel added 3 commits July 23, 2025 21:02

add link to Frank's trtllm-bench blog post to perf_overview.md

af51914

Signed-off-by: zpatel <[email protected]>

add initial 0.21 perf tables

ab941ca

Signed-off-by: zpatel <[email protected]>

add 405b maverick perf data and note

b8d1a6d

Signed-off-by: zpatel <[email protected]>

zbpatel force-pushed the user/zpatel/perf_overview_0.21 branch from 4052b26 to b8d1a6d Compare July 24, 2025 04:57

coderabbitai bot reviewed Jul 24, 2025

View reviewed changes

zbpatel changed the title ~~[doc] WIP - Update perf_overview.md for release 0.21~~ [doc] Update perf_overview.md for release 0.21 Jul 24, 2025

zbpatel requested a review from laikhtewari July 24, 2025 05:20

laikhtewari requested changes Jul 24, 2025

View reviewed changes

docs/source/performance/perf-overview.md Outdated Show resolved Hide resolved

juney-nvidia reviewed Jul 24, 2025

View reviewed changes

docs/source/performance/perf-overview.md Outdated Show resolved Hide resolved

juney-nvidia approved these changes Jul 24, 2025

View reviewed changes

zbpatel added 2 commits July 23, 2025 23:39

clarify working on maverick perf note

c676791

Signed-off-by: zpatel <[email protected]>

change GH200 memory to reflect only the GPU vram

9613485

Signed-off-by: zpatel <[email protected]>

coderabbitai bot reviewed Jul 24, 2025

View reviewed changes

docs/source/performance/perf-overview.md Outdated Show resolved Hide resolved

zbpatel requested a review from laikhtewari July 25, 2025 20:26

FrankD412 approved these changes Jul 29, 2025

View reviewed changes

QiJune reviewed Jul 29, 2025

View reviewed changes

docs/source/performance/perf-overview.md Show resolved Hide resolved

zbpatel added 2 commits July 29, 2025 18:55

add hardware heading with information about benchmarked gpu variants

a688926

Signed-off-by: zpatel <[email protected]>

add hardware heading with information about benchmarked gpu variants

b1d010b

Signed-off-by: zpatel <[email protected]>

coderabbitai bot reviewed Jul 30, 2025

View reviewed changes

docs/source/performance/perf-overview.md Outdated Show resolved Hide resolved

zbpatel added 2 commits July 29, 2025 19:12

add gh200 information, remove some colons

67d20a7

Signed-off-by: zpatel <[email protected]>

remove 20k/2k data point from llama 4 maverick benchmarks

9271f09

Signed-off-by: zpatel <[email protected]>

coderabbitai bot reviewed Jul 30, 2025

View reviewed changes

docs/source/performance/perf-overview.md Outdated Show resolved Hide resolved

zbpatel mentioned this pull request Jul 30, 2025

[Performance issue] Unable to Reproduce the Throughput of Llama 3.1 8B FP8 on H100 #6294

Closed

zbpatel added 2 commits July 29, 2025 19:27

add space after headers based on code rabbit lint feedback

79daeb1

Signed-off-by: zpatel <[email protected]>

add text header to code block

6f44a99

Signed-off-by: zpatel <[email protected]>

coderabbitai bot reviewed Jul 30, 2025

View reviewed changes

docs/source/performance/perf-overview.md Outdated Show resolved Hide resolved

zbpatel self-assigned this Jul 30, 2025

zbpatel added the Doc <NV>TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity. label Jul 30, 2025

coderabbitai bot reviewed Jul 30, 2025

View reviewed changes

docs/source/performance/perf-overview.md Show resolved Hide resolved

FrankD412 enabled auto-merge (squash) July 30, 2025 21:30

juney-nvidia disabled auto-merge July 31, 2025 04:13

juney-nvidia merged commit 3bf405f into NVIDIA:release/0.21 Jul 31, 2025
3 checks passed

dc3671 pushed a commit to dc3671/TensorRT-LLM that referenced this pull request Aug 1, 2025

[doc] Update perf_overview.md for release 0.21 (NVIDIA#6270)

7569eb0

Signed-off-by: zpatel <[email protected]>

dc3671 pushed a commit to dc3671/TensorRT-LLM that referenced this pull request Aug 1, 2025

[doc] Update perf_overview.md for release 0.21 (NVIDIA#6270)

cc256b4

Signed-off-by: zpatel <[email protected]>

dc3671 pushed a commit to dc3671/TensorRT-LLM that referenced this pull request Aug 4, 2025

[doc] Update perf_overview.md for release 0.21 (NVIDIA#6270)

3b63938

Signed-off-by: zpatel <[email protected]>

dc3671 pushed a commit that referenced this pull request Aug 4, 2025

[doc] Update perf_overview.md for release 0.21 (#6270)

18d1941

Signed-off-by: zpatel <[email protected]>

lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request Aug 6, 2025

[doc] Update perf_overview.md for release 0.21 (NVIDIA#6270)

ea770f1

Signed-off-by: zpatel <[email protected]> Signed-off-by: Lanyu Liao <[email protected]>

jain-ria pushed a commit to jain-ria/TensorRT-LLM that referenced this pull request Aug 7, 2025

[doc] Update perf_overview.md for release 0.21 (NVIDIA#6270)

61c5d29

Signed-off-by: zpatel <[email protected]>

[doc] Update perf_overview.md for release 0.21 #6270

[doc] Update perf_overview.md for release 0.21 #6270

Uh oh!

Conversation

zbpatel commented Jul 22, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

GitHub Bot Help

kill

skip

reuse-pipeline

Uh oh!

coderabbitai bot commented Jul 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

zbpatel commented Jul 24, 2025

Uh oh!

zbpatel commented Jul 24, 2025

Uh oh!

Uh oh!

Uh oh!

juney-nvidia left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zbpatel commented Jul 29, 2025

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

FrankD412 commented Jul 30, 2025

Uh oh!

tensorrt-cicd commented Jul 30, 2025

Uh oh!

tensorrt-cicd commented Jul 30, 2025

Uh oh!

Uh oh!

Uh oh!

zbpatel commented Jul 22, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 22, 2025 •

edited

Loading