
Commit fb29bd5

build: update trtllm to v1.1.0rc5 to enable trtllm + KVBM integration (#3119)
Signed-off-by: richardhuo-nv <[email protected]>
Parent: a8fd127

File tree

5 files changed: +8 −12 lines


README.md

Lines changed: 1 addition & 1 deletion
@@ -199,7 +199,7 @@ It is recommended to use [NGC PyTorch Container](https://catalog.ngc.nvidia.com/
 
 > [!Note]
 > Ensure that you select a PyTorch container image version that matches the version of TensorRT-LLM you are using.
-> For example, if you are using `tensorrt-llm==1.1.0rc3`, use the PyTorch container image version `25.06`.
+> For example, if you are using `tensorrt-llm==1.1.0rc5`, use the PyTorch container image version `25.06`.
 > To find the correct PyTorch container version for your desired `tensorrt-llm` release, visit the [TensorRT-LLM Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi) on GitHub. Switch to the branch that matches your `tensorrt-llm` version, and look for the `BASE_TAG` line to identify the recommended PyTorch container tag.
 
 > [!Important]
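The `BASE_TAG` lookup that the README note describes can be scripted. A minimal sketch: the heredoc below stands in for `docker/Dockerfile.multi` from a TensorRT-LLM checkout (its contents are illustrative, not copied from the repository), and a `sed` pull of the `BASE_TAG` build argument yields the recommended PyTorch container tag.

```shell
# Sketch of the BASE_TAG lookup. The heredoc is a stand-in for
# docker/Dockerfile.multi in a TensorRT-LLM checkout; its contents
# are illustrative.
cat > /tmp/Dockerfile.multi <<'EOF'
ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
ARG BASE_TAG=25.06-py3
EOF

# Print the recommended PyTorch container tag.
sed -n 's/^ARG BASE_TAG=//p' /tmp/Dockerfile.multi
```

Run the `sed` line against the real file on the branch matching your pinned `tensorrt-llm` release to get the tag to use.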

container/build.sh

Lines changed: 2 additions & 2 deletions
@@ -89,7 +89,7 @@ TENSORRTLLM_PIP_WHEEL_DIR="/tmp/trtllm_wheel/"
 # TensorRT-LLM commit to use for building the trtllm wheel if not provided.
 # Important Note: This commit is not used in our CI pipeline. See the CI
 # variables to learn how to run a pipeline with a specific commit.
-DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT="e81c50dbd2811ec858eccc2c71b5e7a330ff7e24"
+DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT="0c9430e5a530ba958fc9dca561a3ad865ad9f492"
 TRTLLM_COMMIT=""
 TRTLLM_USE_NIXL_KVCACHE_EXPERIMENTAL="0"
 TRTLLM_GIT_URL=""
@@ -98,7 +98,7 @@ TRTLLM_GIT_URL=""
 TENSORRTLLM_INDEX_URL="https://pypi.python.org/simple"
 # TODO: Remove the version specification from here and use the ai-dynamo[trtllm] package.
 # Need to update the Dockerfile.trtllm to use the ai-dynamo[trtllm] package.
-DEFAULT_TENSORRTLLM_PIP_WHEEL="tensorrt-llm==1.1.0rc3"
+DEFAULT_TENSORRTLLM_PIP_WHEEL="tensorrt-llm==1.1.0rc5"
 TENSORRTLLM_PIP_WHEEL=""
 
 
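The `DEFAULT_*` / empty-variable pairs in build.sh follow a common pin-with-override pattern: the empty variable is filled from user input, and the pinned default is used only when the caller supplies nothing. A sketch of how that resolution might look (the variable names follow the script above; the resolution line itself is an assumed implementation detail, not quoted from build.sh):

```shell
# Pin-with-override sketch. Variable names follow build.sh; the fallback
# expansion below is an assumed implementation detail, not quoted from it.
DEFAULT_TENSORRTLLM_PIP_WHEEL="tensorrt-llm==1.1.0rc5"
TENSORRTLLM_PIP_WHEEL=""   # would be set from a user-supplied option

# Fall back to the pinned release when no override was given.
TENSORRTLLM_PIP_WHEEL="${TENSORRTLLM_PIP_WHEEL:-$DEFAULT_TENSORRTLLM_PIP_WHEEL}"
echo "$TENSORRTLLM_PIP_WHEEL"
```

With no override set, the `${var:-default}` expansion leaves the pinned `1.1.0rc5` wheel spec in place, which is why bumping only the default line is enough to move the whole build.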
docs/guides/run_kvbm_in_trtllm.md

Lines changed: 3 additions & 7 deletions
@@ -27,7 +27,7 @@ To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest
 > - KVBM only supports TensorRT-LLM’s PyTorch backend.
 > - To enable disk cache offloading, you must first enable a CPU memory cache offloading.
 > - Disable partial reuse `enable_partial_reuse: false` in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
-> - KVBM requires TensorRT-LLM at commit ce580ce4f52af3ad0043a800b3f9469e1f1109f6 or newer.
+> - KVBM requires TensorRT-LLM v1.1.0rc5 or newer.
 > - Enabling KVBM metrics with TensorRT-LLM is still a work in progress.
 
 ## Quick Start
@@ -38,12 +38,8 @@ To use KVBM in TensorRT-LLM, you can follow the steps below:
 # start up etcd for KVBM leader/worker registration and discovery
 docker compose -f deploy/docker-compose.yml up -d
 
-# Build a container that includes TensorRT-LLM and KVBM. Note: KVBM integration is only available in TensorRT-LLM commit dcd110cfac07e577ce01343c455917832b0f3d5e or newer.
-# When building with the --tensorrtllm-commit option, you may notice that https://github.com keeps prompting for a username and password.
-# This happens because cloning TensorRT-LLM can hit GitHub’s rate limit.
-# To work around this, you can keep pressing "Enter" or "Return.".
-# Setting "export GIT_LFS_SKIP_SMUDGE=1" may also reduce the number of prompts.
-./container/build.sh --framework trtllm --tensorrtllm-commit dcd110cfac07e577ce01343c455917832b0f3d5e --enable-kvbm
+# Build a container that includes TensorRT-LLM and KVBM.
+./container/build.sh --framework trtllm --enable-kvbm
 
 # launch the container
 ./container/run.sh --framework trtllm -it --mount-workspace --use-nixl-gds

docs/support_matrix.md

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ If you are using a **GPU**, the following GPU models and architectures are suppo
 | **Build Dependency** | **Version** |
 | :------------------- | :------------------------------------------------------------------------------- |
 | **Base Container** | [25.03](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda-dl-base/tags) |
-| **TensorRT-LLM** | 1.1.0rc3 |
+| **TensorRT-LLM** | 1.1.0rc5 |
 | **NIXL** | 0.4.1 |
 | **vLLM** | 0.10.1.1 |
 | **SGLang** | 0.5.0rc2 |

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ Repository = "https://github.com/ai-dynamo/dynamo.git"
 [project.optional-dependencies]
 trtllm =[
     "uvloop",
-    "tensorrt-llm==1.1.0rc3",
+    "tensorrt-llm==1.1.0rc5",
 ]
 
 vllm = [
