Commit 271ce85
Automatically adjust VLLM_DECODE_BLOCK_BUCKET_MIN if it exceeds max_blocks (#432)
# What does this PR do?
During engine warmup, the max decode block bucket size
`VLLM_DECODE_BLOCK_BUCKET_MAX` is capped based on the available
`max_blocks`. However, the minimum bucket size
`VLLM_DECODE_BLOCK_BUCKET_MIN` was not similarly constrained, which
could lead to a configuration where `VLLM_DECODE_BLOCK_BUCKET_MIN` >
`VLLM_DECODE_BLOCK_BUCKET_MAX` (or even > `max_blocks`). This invalid
state causes a runtime error.
This PR ensures that `VLLM_DECODE_BLOCK_BUCKET_MIN` is automatically
clamped to `max_blocks` (and not greater than
`VLLM_DECODE_BLOCK_BUCKET_MAX`) during initialization, preventing
invalid bucket size configurations.
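
The clamping described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the function name `clamp_decode_block_buckets` and its parameters are hypothetical, and it assumes the three values are already available as plain integers.

```python
from typing import Tuple


def clamp_decode_block_buckets(
    bucket_min: int, bucket_max: int, max_blocks: int
) -> Tuple[int, int]:
    """Hypothetical helper: return (bucket_min, bucket_max) after clamping."""
    # Behaviour that already existed per the description: the max bucket
    # size is capped at the number of available blocks.
    bucket_max = min(bucket_max, max_blocks)
    # Behaviour this PR describes: the min bucket size may not exceed the
    # (already capped) max, and therefore may not exceed max_blocks either.
    bucket_min = min(bucket_min, bucket_max)
    return bucket_min, bucket_max


# Example: a min of 128 with only 64 blocks available is pulled down to 64.
adjusted_min, adjusted_max = clamp_decode_block_buckets(128, 4096, 64)
```

Clamping the max first and then clamping the min against the capped max keeps the invariant `bucket_min <= bucket_max <= max_blocks` with a single comparison each.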
Signed-off-by: Daniel Socek <[email protected]>
Co-authored-by: Michał Kuligowski <[email protected]>
1 parent f3f66f6 commit 271ce85
1 file changed, +6 -0: six lines inserted after original line 70 (new lines 71-76).