build: update trtllm to v1.1.0rc5 to enable trtllm + KVBM integration #3119
Conversation
Force-pushed from 57c0cbd to 97ec7f6
Walkthrough
This PR updates TensorRT-LLM references from 1.1.0rc3 to 1.1.0rc5 across docs and config, bumps the default experimental commit hash in the build script, and simplifies the KVBM guide's build instructions to a single command without commit pinning.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ❌ 1 failed check (warning), ✅ 2 passed
Signed-off-by: richardhuo-nv <[email protected]>
Force-pushed from 97ec7f6 to c62a71e
Actionable comments posted: 0
🧹 Nitpick comments (10)
docs/support_matrix.md (1)
51-53: Call out the Python 3.12 constraint for KVBM more explicitly.
You already note KVBM is Python 3.12-only. Consider adding "and thus requires the Ubuntu 24.04 image" in this file to reduce user footguns when following the guide.
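As a sketch of the constraint this note asks to surface, a version guard could look like the following (the function name and its placement are hypothetical, not part of the repo):

```python
import sys

def kvbm_supported(version_info=None):
    """Return True only on Python 3.12, per the KVBM note in the support matrix."""
    vi = version_info or sys.version_info
    return vi[:2] == (3, 12)

print(kvbm_supported((3, 12, 4)))  # True
print(kvbm_supported((3, 10, 0)))  # False
```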
README.md (2)
200-204: Container version mapping example updated correctly to rc5.
LGTM. Consider adding a brief note that future TRT-LLM releases may change the required NGC base tag, and to always check BASE_TAG in Dockerfile.multi.
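For illustration, the BASE_TAG check suggested above could be a one-liner; the file below is a stand-in sample, not the actual container/Dockerfile.multi, and the ARG layout is an assumption:

```shell
# Create a sample file mimicking a Dockerfile.multi-style ARG declaration.
cat > /tmp/Dockerfile.multi.sample <<'EOF'
ARG BASE_IMAGE="nvcr.io/nvidia/pytorch"
ARG BASE_TAG="25.06-py3"
EOF

# Extract the tag value so users can confirm it before building.
sed -n 's/^ARG BASE_TAG="\([^"]*\)".*/\1/p' /tmp/Dockerfile.multi.sample
```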
208-217: Optional: surface the cuda-python pin in package extras instead of README-only.
You pin cuda-python in the README instructions; moving this into the trtllm extra would make pip installs more reproducible. The proposed change to pyproject.toml is shown in that file's comment.
pyproject.toml (1)
49-52: Add a cuda-python pin to the trtllm extra to match the README prerequisites.
This keeps installations consistent across docs and packaging.
Apply:

```diff
 trtllm = [
     "uvloop",
-    "tensorrt-llm==1.1.0rc5",
+    "tensorrt-llm==1.1.0rc5",
+    "cuda-python>=12,<13",
 ]
```

container/build.sh (3)
92-96: Document which TRT-LLM tag the rc5 commit corresponds to.
Add a short comment tying DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT to the rc5 cut to aid future bumps.
Apply:

```diff
-DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT="0c9430e5a530ba958fc9dca561a3ad865ad9f492"
+DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT="0c9430e5a530ba958fc9dca561a3ad865ad9f492"
+# ^ Commit known-good for TensorRT-LLM v1.1.0rc5 (used when --use-default-experimental-tensorrtllm-commit is passed)
```
101-103: Guard against providing both --tensorrtllm-commit and --tensorrtllm-pip-wheel.
Right now, if both are set (without the "use default" flag), the commit is silently ignored. Fail fast to avoid confusion.
Apply near the end of option parsing (after get_options) or at the start of the TRTLLM block:

```diff
 if [[ $FRAMEWORK == "TRTLLM" ]]; then
+    if [ -n "$TRTLLM_COMMIT" ] && [ -n "$TENSORRTLLM_PIP_WHEEL" ]; then
+        echo "ERROR: Provide only one of --tensorrtllm-commit or --tensorrtllm-pip-wheel, not both."
+        exit 1
+    fi
```
540-575: Minor: echo the chosen default when neither commit nor wheel is provided.
Helps users see that the rc5 wheel path was selected.
Apply:

```diff
 if [ -z "$TENSORRTLLM_PIP_WHEEL" ] && [ -z "$TRTLLM_COMMIT" ]; then
     TENSORRTLLM_PIP_WHEEL="$DEFAULT_TENSORRTLLM_PIP_WHEEL"
+    echo "INFO: Neither --tensorrtllm-commit nor --tensorrtllm-pip-wheel provided; using default: ${TENSORRTLLM_PIP_WHEEL}"
 fi
```
docs/guides/run_kvbm_in_trtllm.md (3)
20-22: Fix typo: "mange" → "manage".
Small doc polish.
Apply:

```diff
-This guide explains how to leverage KVBM (KV Block Manager) to mange KV cache and do KV offloading in TensorRT-LLM (trtllm).
+This guide explains how to leverage KVBM (KV Block Manager) to manage KV cache and do KV offloading in TensorRT-LLM (trtllm).
```
41-46: Nice simplification to a single build command; add a Python 3.12 reminder inline.
Since KVBM is Python 3.12-only (per the support matrix), add a one-liner so users don't miss it.
Apply:

```diff
-# Build a container that includes TensorRT-LLM and KVBM.
+# Build a container that includes TensorRT-LLM and KVBM.
+# Note: KVBM requires Python 3.12 (Ubuntu 24.04 base).
 ./container/build.sh --framework trtllm --enable-kvbm
```
86-97: Fix typos in the example request payload.
A few misspellings in the sample prompt.
Apply:

```diff
-        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
+        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time. You are an intrepid explorer who has stumbled upon an ancient map hinting that Aeloria holds a secret so profound it could reshape reality. Your task: Develop a detailed character background, motivations, skills and weaknesses, and any personal connections to the city or its legends."
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
- README.md (1 hunks)
- container/build.sh (2 hunks)
- docs/guides/run_kvbm_in_trtllm.md (2 hunks)
- docs/support_matrix.md (1 hunks)
- pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-30T00:34:35.810Z
Learnt from: dmitry-tokarev-nv
PR: ai-dynamo/dynamo#2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.
Applied to files:
docs/support_matrix.md
📚 Learning: 2025-08-28T20:43:54.701Z
Learnt from: bhuvan002
PR: ai-dynamo/dynamo#2702
File: components/backends/trtllm/src/dynamo/trtllm/logits_processing/adapter.py:7-13
Timestamp: 2025-08-28T20:43:54.701Z
Learning: CI import failures for optional dependencies like tensorrt_llm can be resolved by removing __init__.py files and restructuring imports, rather than wrapping imports in try/except blocks.
Applied to files:
pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and Test - dynamo
- GitHub Check: Build and Test - vllm
- GitHub Check: Build and Test - sglang
🔇 Additional comments (3)
pyproject.toml (1)
49-52: Pin bump to tensorrt-llm==1.1.0rc5 is correct and minimal. No issues spotted.
docs/guides/run_kvbm_in_trtllm.md (1)
30-32: Good: explicit minimum version (rc5) for KVBM. Nice simplification vs. pinning a specific commit hash.
docs/support_matrix.md (1)
70-74: Version bump to TensorRT-LLM rc5 looks good; repo-wide consistency verified. No matches for 1.1.0rc3; 1.1.0rc5 appears in pyproject.toml, README.md, docs/guides/run_kvbm_in_trtllm.md, docs/support_matrix.md, and container/build.sh.
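The consistency check described above could be reproduced with a recursive grep; the snippet below runs against a throwaway sample tree (paths and file contents are stand-ins for the real repo):

```shell
# Build a tiny sample tree with two files referencing the new version.
dir=$(mktemp -d)
printf 'tensorrt-llm==1.1.0rc5\n' > "$dir/pyproject.toml"
printf 'pip3 install "tensorrt-llm==1.1.0rc5"\n' > "$dir/README.md"

# Fail loudly on any stale rc3 reference, then count files mentioning rc5.
if grep -rq '1\.1\.0rc3' "$dir"; then
    echo "stale rc3 reference found"
else
    echo "no rc3 references"
fi
grep -rl '1\.1\.0rc5' "$dir" | wc -l
```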
LGTM assuming gitlab CI TRTLLM tests pass
https://gitlab-master.nvidia.com/dl/ai-dynamo/dynamo-ci/-/pipelines/35145191
Overview:
Update the default trtllm version to v1.1.0rc5 so that the trtllm + KVBM integration is available by default.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)