
Commit 3b6dbef

qimcis and PeaBrane authored
feat: Update docs to indicate need to use consistent hashing for KV events in backend engines (#2981)
Signed-off-by: PeaBrane <[email protected]> Co-authored-by: Yan Ru Pei <[email protected]>
1 parent 0006106 commit 3b6dbef

File tree

4 files changed: +19 −3 lines changed


benchmarks/router/run_engines.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -125,8 +125,8 @@ for i in $(seq 1 $NUM_WORKERS); do
            "${EXTRA_ARGS[@]}"
    else
        echo "[Worker-$i] Using GPUs: $GPU_DEVICES"
-       # Run vLLM engine (exec with env for proper syntax)
-       exec env CUDA_VISIBLE_DEVICES=$GPU_DEVICES python -m dynamo.vllm \
+       # Run vLLM engine with PYTHONHASHSEED=0 for deterministic event IDs in KV-aware routing
+       exec env PYTHONHASHSEED=0 CUDA_VISIBLE_DEVICES=$GPU_DEVICES python -m dynamo.vllm \
            --model "$MODEL_PATH" \
            --endpoint dyn://test.vllm.generate \
            --tensor-parallel-size $TENSOR_PARALLEL_SIZE \
```
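
For readers launching workers outside this benchmark script, a minimal sketch of the same idea (flags trimmed to those visible in the hunk above; GPU IDs and the model path are placeholders): every worker process needs the same hash seed so their Python-derived KV event IDs agree.

```bash
# Hypothetical manual launch mirroring the change above: each worker exports
# the same PYTHONHASHSEED so hash()-derived KV event IDs match across processes.
PYTHONHASHSEED=0 CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm \
    --model "$MODEL_PATH" --endpoint dyn://test.vllm.generate &
PYTHONHASHSEED=0 CUDA_VISIBLE_DEVICES=1 python -m dynamo.vllm \
    --model "$MODEL_PATH" --endpoint dyn://test.vllm.generate &
wait
```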

components/backends/sglang/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -237,4 +237,4 @@ We currently provide deployment examples for Kubernetes and SLURM.
 - **[Deploying Dynamo with SGLang on Kubernetes](deploy/README.md)**
 
 ## SLURM
-- **[Deploying Dynamo with SGLang on SLURM](slurm_jobs/README.md)**
+- **[Deploying Dynamo with SGLang on SLURM](slurm_jobs/README.md)**
```

components/backends/vllm/README.md

Lines changed: 12 additions & 0 deletions
````diff
@@ -168,6 +168,18 @@ See `args.py` for the full list of configuration options and their defaults.
 
 The [documentation](https://docs.vllm.ai/en/v0.9.2/configuration/serve_args.html?h=serve+arg) for the vLLM CLI args points to running 'vllm serve --help' to see what CLI args can be added. We use the same argument parser as vLLM.
 
+### Hashing Consistency for KV Events
+
+When using KV-aware routing, ensure deterministic hashing across processes to avoid radix tree mismatches. Choose one of the following:
+
+- Set `PYTHONHASHSEED=0` for all vLLM processes when relying on Python's builtin hashing for prefix caching.
+- If your vLLM version supports it, configure a deterministic prefix caching algorithm, for example:
+
+```bash
+vllm serve ... --enable-prefix-caching --prefix-caching-algo sha256
+```
+
+See the high-level notes in [KV Cache Routing](../../../docs/architecture/kv_cache_routing.md) on deterministic event IDs.
+
 ## Request Migration
 
 You can enable [request migration](../../../docs/architecture/request_migration.md) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
````
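
As a quick illustration of why the first option matters (this shows plain CPython behavior and is not part of the commit): without a fixed seed, CPython randomizes string hashes per process, so two workers would disagree on any `hash()`-derived ID.

```bash
# Each invocation is a fresh process; without a fixed seed the values differ.
python3 -c 'print(hash("same prefix tokens"))'
python3 -c 'print(hash("same prefix tokens"))'   # almost certainly a different number

# With the seed pinned, every process produces the same value.
PYTHONHASHSEED=0 python3 -c 'print(hash("same prefix tokens"))'
PYTHONHASHSEED=0 python3 -c 'print(hash("same prefix tokens"))'   # identical output
```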

docs/architecture/kv_cache_routing.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -203,6 +203,10 @@ The two types of events are:
 
 The publisher can be initialized and used through C bindings or Python bindings.
 
+### Deterministic Event IDs
+
+For KV-aware routing to work across multiple workers and restarts, engines must emit deterministic block identifiers in KV events. Ensure all workers use identical engine versions/configuration so that block IDs for the same token content remain consistent. If your engine relies on Python's builtin `hash()` for any event IDs, set `PYTHONHASHSEED=0`; otherwise this setting has no effect. The router recomputes local block hashes from tokens for matching, but parent/child links and removals depend on engine-provided IDs being stable.
+
 ### KVIndexer
 The KVIndexer builds and maintains a global view of cached blocks in a prefix tree. We modify the original prefix tree by also storing the worker id on each node. This is so we can return the number of matched blocks for each worker.
 
```
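
The paragraph added above leans on block IDs that any process can recompute. As an illustrative sketch only (not the engine's actual ID scheme), a chained content digest has exactly that property: the same tokens and the same parent always yield the same IDs, so parent/child links in KV events stay stable across workers and restarts.

```bash
# Illustrative only: a deterministic block-ID chain, where each block's ID is a
# digest over its parent's ID plus its own tokens. Any process recomputes the
# same chain, unlike values derived from an unseeded Python hash().
PARENT=$(echo -n "root:token_1,token_2,token_3,token_4" | sha256sum | cut -d' ' -f1)
CHILD=$(echo -n "${PARENT}:token_5,token_6,token_7,token_8" | sha256sum | cut -d' ' -f1)
echo "parent block id: $PARENT"
echo "child block id:  $CHILD"
```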
