docs/guides/run_kvbm_in_trtllm.md: 3 additions & 7 deletions
@@ -27,7 +27,7 @@ To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest
> - KVBM only supports TensorRT-LLM’s PyTorch backend.
> - To enable disk cache offloading, you must first enable a CPU memory cache offloading.
> - Disable partial reuse `enable_partial_reuse: false` in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
- > - KVBM requires TensorRT-LLM at commit ce580ce4f52af3ad0043a800b3f9469e1f1109f6 or newer.
+ > - KVBM requires TensorRT-LLM v1.1.0rc5 or newer.
> - Enabling KVBM metrics with TensorRT-LLM is still a work in progress.
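As a concrete illustration of the partial-reuse note above, the setting can be written into an LLM API config file, roughly as follows. This is a minimal sketch: only `kv_connector_config` and `enable_partial_reuse: false` come from the guide; the file name `llm_api_config.yaml` and the overall file layout are assumptions, so check your TensorRT-LLM version's LLM API reference for the exact schema.

```shell
# Write a minimal LLM API config that disables partial reuse for KVBM.
# Only the kv_connector_config / enable_partial_reuse pair is taken from
# the guide; the file name and surrounding layout are hypothetical.
cat > llm_api_config.yaml <<'EOF'
kv_connector_config:
  enable_partial_reuse: false
EOF

# Show the resulting config.
cat llm_api_config.yaml
```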
## Quick Start
@@ -38,12 +38,8 @@ To use KVBM in TensorRT-LLM, you can follow the steps below:
# start up etcd for KVBM leader/worker registration and discovery
docker compose -f deploy/docker-compose.yml up -d
- # Build a container that includes TensorRT-LLM and KVBM. Note: KVBM integration is only available in TensorRT-LLM commit dcd110cfac07e577ce01343c455917832b0f3d5e or newer.
- # When building with the --tensorrtllm-commit option, you may notice that https://github.com keeps prompting for a username and password.
- # This happens because cloning TensorRT-LLM can hit GitHub’s rate limit.
- # To work around this, you can keep pressing "Enter" or "Return".
- # Setting "export GIT_LFS_SKIP_SMUDGE=1" may also reduce the number of prompts.
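The workaround removed above can be sketched as a short shell snippet. Only `GIT_LFS_SKIP_SMUDGE=1` comes from the guide; the trailing comment about the build step is context only, since the exact build command is not shown in this diff.

```shell
# Skip Git LFS object downloads during clone, which reduces the number
# of GitHub credential prompts when the build clones TensorRT-LLM.
export GIT_LFS_SKIP_SMUDGE=1

# (Run the container build afterwards, e.g. with the --tensorrtllm-commit
# option mentioned in the guide.)
echo "GIT_LFS_SKIP_SMUDGE=$GIT_LFS_SKIP_SMUDGE"
```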