[https://nvbugs/5423962][fix] Address broken links #6531
Conversation
Signed-off-by: Andrew Chen <[email protected]>
📝 Walkthrough
This update makes minor corrections and adjustments to documentation. Changes include updating or correcting hyperlinks to models, repositories, and images, simplifying references and removing outdated or redundant links, and making small textual edits in release notes and developer guides. No code or logic changes are involved.
Sequence Diagram(s)
No sequence diagram generated as changes are limited to documentation and do not affect control flow or feature logic.
Estimated code review effort
🎯 1 (Trivial) | ⏱️ ~2 minutes
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/source/release-notes.md (1)
1048-1048: Clarify the “OPT” reference and tighten phrasing
The trailing “and OPT” is ambiguous (is it the model, the optimizer, or something else?) and the sentence now scans awkwardly. Consider an explicit, parallel construction:
- - New Python builder API and `trtllm-build` command and OPT
+ - Added Python builder API, `trtllm-build` command, and OPT support
docs/source/advanced/speculative-decoding.md (2)
63-63: Link is now too generic; point users directly to the example
Dropping the deep link makes the reference harder to follow. Suggest linking to a stable location such as the dedicated README or example script:
- An example of this orchestration process can be found in the [TensorRT-LLM Triton backend](https://github.com/triton-inference-server/tensorrtllm_backend).
+ An example orchestration script is available in the Triton backend repository’s [draft-target-model client example](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/client/python/draft_target_model_client.py).
Ensure the chosen path is future-proof (e.g., link to a README anchor) to avoid another broken link cycle.
175-175: Fix casing of “PyTorch” and tighten wording
Minor style/grammar tweak:
-[Disaggregated Serving](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/advanced/disaggregated-service.md) with EAGLE3 using the two model approach is supported in the Pytorch backend.
+[Disaggregated Serving](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/advanced/disaggregated-service.md) with EAGLE-3 using the two-model approach is supported in the PyTorch backend.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
docs/source/advanced/expert-parallelism.md (1 hunks)
docs/source/advanced/speculative-decoding.md (2 hunks)
docs/source/blogs/Falcon180B-H200.md (1 hunks)
docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md (1 hunks)
docs/source/performance/perf-benchmarking.md (1 hunks)
docs/source/release-notes.md (2 hunks)
docs/source/torch.md (0 hunks)
💤 Files with no reviewable changes (1)
- docs/source/torch.md
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
docs/source/blogs/Falcon180B-H200.md
docs/source/performance/perf-benchmarking.md
docs/source/advanced/speculative-decoding.md
docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md
docs/source/release-notes.md
📚 Learning: in tensorrt-llm's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()...
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
Applied to files:
docs/source/advanced/speculative-decoding.md
📚 Learning: in the tensorrt-llm waive list merging system, removed lines are always located at the end of the me...
Learnt from: yiqingy0
PR: NVIDIA/TensorRT-LLM#5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.109Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
Applied to files:
docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md
📚 Learning: in tensorrt_llm/executor/worker.py, the lora adapter cache optimization logic that checks `is_adapte...
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Applied to files:
docs/source/release-notes.md
🪛 LanguageTool
docs/source/advanced/expert-parallelism.md
[grammar] ~7-~7: Ensure spelling is correct
Context: ...ns arrive, the router layer selects the TopK experts for each token. The correspondi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
docs/source/blogs/Falcon180B-H200.md (1)
36-36: Image may not render – /blob/<sha>/... requires ?raw=true
Unlike the earlier image on Line 20, this new <img> tag still uses the /blob/… path but omits ?raw=true. GitHub serves an HTML page for /blob/* URLs; browsers therefore fail to treat the response as an image unless ?raw=true is appended (or the path is rewritten to /raw/…). That would defeat the purpose of this “broken-link” fix.
-<img src="https://github.com/NVIDIA/TensorRT-LLM/blob/5aec7af45fc0abd876fa68a9ae8c8cae084f3af3/docs/source/blogs/media/Falcon180B-H200_acc.png"
+<img src="https://github.com/NVIDIA/TensorRT-LLM/blob/5aec7af45fc0abd876fa68a9ae8c8cae084f3af3/docs/source/blogs/media/Falcon180B-H200_acc.png?raw=true"
Please verify all newly-pinned image links follow the same pattern.
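One quick way to follow up on that verification request might be a small throwaway script like the one below (a hypothetical helper, not part of this PR; the URL list holds just the example image from this comment):

```python
# Hypothetical check that pinned GitHub image links resolve to an actual image
# rather than an HTML "blob" page. Requires the `requests` package.
import requests

image_urls = [
    "https://github.com/NVIDIA/TensorRT-LLM/blob/5aec7af45fc0abd876fa68a9ae8c8cae084f3af3/docs/source/blogs/media/Falcon180B-H200_acc.png?raw=true",
]

for url in image_urls:
    resp = requests.head(url, allow_redirects=True, timeout=10)
    content_type = resp.headers.get("Content-Type", "")
    status = "OK  " if resp.ok and content_type.startswith("image/") else "FAIL"
    print(f"{status} {resp.status_code} {content_type or '?'} {url}")
```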
docs/source/advanced/expert-parallelism.md (1)
7-7: LGTM – link now lands on the model card instead of the JSON file
The new Hugging Face URL is clearer for readers and mirrors the pattern used elsewhere in the docs.
docs/source/performance/perf-benchmarking.md (1)
82-82: Consistent naming but double-check availability of the instruct checkpoint
mistralai/Mixtral-8x7B-Instruct-v0.1 looks correct, but both the model card and its FP16 weights were only recently published. Please run an external check (or Hugging Face CLI) before merging to be sure the path is live; otherwise the link checker will flag it again.
docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md (1)
128-128: Header file reference is clearer – confirm file exists in main
Good catch switching to RoutingKernelTopK.cuh; that’s the public header that actually defines the Top-K routing kernel. Just ensure the file path (cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingKernelTopK.cuh) really exists on main; an extra directory level or case-mismatch will re-introduce a broken link.
docs/source/release-notes.md (1)
642-642: LGTM – trimmed note reads cleanly
The simplified Windows-specific known-issue line is accurate and concise.
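For the two availability checks requested above (the Mixtral Instruct checkpoint and the pinned RoutingKernelTopK.cuh path), a throwaway script along these lines could be run before merging (a rough sketch, not part of the PR; both URLs are assembled from the paths quoted in the comments):

```python
# Hypothetical pre-merge availability check for the links flagged above.
# A 200 status means the Hugging Face repo / GitHub file is reachable.
import requests

urls = [
    # Hugging Face Hub API: returns 200 only if the model repo exists and is public.
    "https://huggingface.co/api/models/mistralai/Mixtral-8x7B-Instruct-v0.1",
    # Raw GitHub URL: returns 200 only if the file exists at this exact path and casing on main.
    "https://raw.githubusercontent.com/NVIDIA/TensorRT-LLM/main/cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingKernelTopK.cuh",
]

for url in urls:
    status = requests.get(url, timeout=10).status_code
    print(f"{status} {url}")
```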
d1f3186 to 0c2a183 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/source/advanced/expert-parallelism.md (3)
7-7: Tighten wording & avoid duplicate brand name in reference text
Minor polish will read more cleanly and match the official model name:
-Mixture of Experts (MoE) architectures have been used widely recently, such as [Mistral Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).
+Mixture of Experts (MoE) architectures have become widely used recently, such as [Mixtral 8×7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) from Mistral AI.
26-26: Correct script name typo (convert_coneckpoint.py → convert_checkpoint.py)
-when calling `convert_coneckpoint.py`.
+when calling `convert_checkpoint.py`.
30-31: Remove stray closing parenthesis that breaks Markdown rendering
-.../config.json).
-)
+.../config.json).
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
docs/source/advanced/expert-parallelism.md (1 hunks)
docs/source/advanced/speculative-decoding.md (2 hunks)
docs/source/blogs/Falcon180B-H200.md (1 hunks)
docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md (1 hunks)
docs/source/performance/perf-benchmarking.md (1 hunks)
docs/source/release-notes.md (2 hunks)
docs/source/torch.md (0 hunks)
💤 Files with no reviewable changes (1)
- docs/source/torch.md
✅ Files skipped from review due to trivial changes (5)
- docs/source/release-notes.md
- docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md
- docs/source/advanced/speculative-decoding.md
- docs/source/blogs/Falcon180B-H200.md
- docs/source/performance/perf-benchmarking.md
🧰 Additional context used
🪛 LanguageTool
docs/source/advanced/expert-parallelism.md
[grammar] ~7-~7: Ensure spelling is correct
Context: ...ns arrive, the router layer selects the TopK experts for each token. The correspondi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
0c2a183 to cfa2d42 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/source/advanced/expert-parallelism.md (1)
7-7: Polish wording & acronym consistency in updated paragraph
The new Hugging Face link is correct, but the sentence can be tightened and made stylistically consistent:
- Prefer the acronym “MoE” over “MOE”.
- Use the conventional “top-k” (or “Top-k”) instead of “TopK”.
- “have been used widely recently” reads awkwardly; “have become widespread recently” flows better.
Optional patch:
-Mixture of Experts (MoE) architectures have been used widely recently, such as [Mistral Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1). Specifically, MOE’s structure supports multiple parallel Feedforward Neural Network (FFN) layers (called experts) to replace the single FFN layer in the dense model. When tokens arrive, the router layer selects the TopK experts for each token. The corresponding hidden state of the token is then dispatched to the selected TopK experts, respectively.
+Mixture-of-Experts (MoE) architectures have become widespread recently, with models such as [Mistral Mixtral 8×7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1). Specifically, MoE’s structure supports multiple parallel feed-forward neural-network (FFN) layers (called experts) in place of the single FFN layer in a dense model. When tokens arrive, the router layer selects the top-k experts for each token, and the corresponding hidden state of each token is dispatched to those experts.
Purely editorial—take it or leave it.
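For readers skimming this thread, the routing step that paragraph describes can be sketched roughly as follows (illustrative only; this is plain PyTorch, not TensorRT-LLM's MoE implementation, and all tensor names are made up):

```python
# Toy illustration of top-k expert routing in a Mixture-of-Experts layer.
import torch

num_tokens, hidden_size, num_experts, top_k = 4, 16, 8, 2

hidden_states = torch.randn(num_tokens, hidden_size)
router = torch.nn.Linear(hidden_size, num_experts)   # the "router layer"

routing_probs = torch.softmax(router(hidden_states), dim=-1)      # [num_tokens, num_experts]
topk_weights, topk_expert_ids = torch.topk(routing_probs, top_k)  # top-k experts per token

# Each token's hidden state would then be dispatched to its selected experts
# and the expert outputs combined using the routing weights.
for token_idx, expert_ids in enumerate(topk_expert_ids.tolist()):
    print(f"token {token_idx} -> experts {expert_ids}")
```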
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
docs/source/advanced/expert-parallelism.md (1 hunks)
docs/source/advanced/speculative-decoding.md (2 hunks)
docs/source/blogs/Falcon180B-H200.md (1 hunks)
docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md (1 hunks)
docs/source/performance/perf-benchmarking.md (1 hunks)
docs/source/release-notes.md (2 hunks)
docs/source/torch.md (0 hunks)
💤 Files with no reviewable changes (1)
- docs/source/torch.md
✅ Files skipped from review due to trivial changes (4)
- docs/source/blogs/Falcon180B-H200.md
- docs/source/blogs/tech_blog/blog1_Pushing_Latency_Boundaries_Optimizing_DeepSeek-R1_Performance_on_NVIDIA_B200_GPUs.md
- docs/source/performance/perf-benchmarking.md
- docs/source/release-notes.md
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/source/advanced/speculative-decoding.md
🧰 Additional context used
🪛 LanguageTool
docs/source/advanced/expert-parallelism.md
[grammar] ~7-~7: Ensure spelling is correct
Context: ...ns arrive, the router layer selects the TopK experts for each token. The correspondi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
Accept feedback from coderabbitai
Signed-off-by: Andrew Chen <[email protected]>
cfa2d42 to 8bfdb31 (Compare)
@coderabbitai review
✅ Actions performed
Review triggered.
LGTM
@nv-guomingz does someone in @NVIDIA/trtllm-bench-reviewers need to approve this?
@NVIDIA/trtllm-bench-reviewers How do I get blossom-ci to run?
/bot run
@FrankD412 @venkywonka How do I get this PR merged? It always seems to get stuck on the
/bot run --stage-list "A10-Build_Docs"
/bot skip --comment "Updates to documentation, skipping full CI"
@FrankD412 I don't think I'm on the authorized person list. Is this something you can run for me?
/bot skip --comment "Updates to documentation, skipping full CI"
Let me give it a shot.
@chenopis -- I've set it to automerge pending the bot.
Thank you!
PR_Github #14175 [ skip ] triggered by Bot
PR_Github #14175 [ skip ] completed with state
@FrankD412 can you try merging this again?
Done -- it's set to auto merge
/bot skip --comment "Updates to documentation, skipping full CI"
PR_Github #14510 [ skip ] triggered by Bot
PR_Github #14510 [ skip ] completed with state
Fix some broken links from the bug, using the link check report.
Dead Link Checker report - https___nvidia.github.io_TensorRT-LLM_index.html.pdf
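For context, the kind of scan behind a report like that can be approximated with a short script (a rough sketch assuming the docs live under docs/source; this is not the actual Dead Link Checker tool):

```python
# Rough dead-link scan over the Markdown docs; prints anything that does not return 200.
import pathlib
import re
import requests

url_pattern = re.compile(r"https?://[^\s)\"'<>]+")

for md_file in pathlib.Path("docs/source").rglob("*.md"):
    for url in set(url_pattern.findall(md_file.read_text(encoding="utf-8"))):
        try:
            status = requests.get(url, timeout=10).status_code
        except requests.RequestException as exc:
            status = exc.__class__.__name__
        if status != 200:
            print(f"{md_file}: {status} {url}")
```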
Summary by CodeRabbit