[Doc] Add allocate_slots parameter docs (#29777)

maang-h · heheda12345 · web-flow · commit 5d91d2b292be · 2025-12-02T23:23:09.000Z
Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
diff --git a/vllm/v1/core/kv_cache_manager.py b/vllm/v1/core/kv_cache_manager.py
@@ -230,6 +230,9 @@ def allocate_slots(
             delay_cache_blocks: Whether to skip caching the blocks. This is
                 used by P/D when allocating blocks used in a KV transfer
                 which will complete in a future step.
+            num_encoder_tokens: The number of encoder tokens to allocate for
+                cross-attention in encoder-decoder models (e.g., Whisper).
+                For decoder-only models, this should be 0.
 
         Blocks layout:
         ```