-
Notifications
You must be signed in to change notification settings - Fork 70
Automatically adjust VLLM_DECODE_BLOCK_BUCKET_MIN if it exceeds max_blocks #432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically adjust VLLM_DECODE_BLOCK_BUCKET_MIN if it exceeds max_blocks #432
Conversation
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
b04364f to
99bdd52
Compare
✅ CI PassedAll checks passed successfully against the following vllm commit: |
| logger().info( | ||
| f'VLLM_DECODE_BLOCK_BUCKET_MIN={decode_block_bucket_cfg[0]} is higher than max_blocks={max_blocks}. Your configuration VLLM_DECODE_BLOCK_BUCKET_MIN={decode_block_bucket_cfg[0]} will be overwritten to VLLM_DECODE_BLOCK_BUCKET_MIN={max_blocks - decode_block_bucket_cfg[1]}' | ||
| ) | ||
| decode_block_bucket_cfg[0] = max_blocks - decode_block_bucket_cfg[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need also to check that this value is >0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a clip
99bdd52 to
af4e4fb
Compare
🚧 CI BlockedThe main CI workflow was not started for the following reason:
|
…locks Signed-off-by: Daniel Socek <[email protected]>
af4e4fb to
a40ba66
Compare
|
@xuechendi @iboiko-habana can you help review this PR? Its a fix that unblocks certain validations |
| f'VLLM_DECODE_BLOCK_BUCKET_MAX={decode_block_bucket_cfg[2]} is higher than max_blocks={max_blocks}. Your configuration VLLM_DECODE_BLOCK_BUCKET_MAX={decode_block_bucket_cfg[2]} will be overwritten to VLLM_DECODE_BLOCK_BUCKET_MAX={max_blocks}' | ||
| ) | ||
| decode_block_bucket_cfg[2] = max_blocks | ||
| if decode_block_bucket_cfg[0] > max_blocks: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it indented under if decode_block_bucket_cfg[2] > max_blocks:?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only need to run this conditional check if max has been potentially modified and set to max_blocks. If indented to outer scope, it can work too but would unnecessarily spend cycles.
| ) | ||
| decode_block_bucket_cfg[2] = max_blocks | ||
| if decode_block_bucket_cfg[0] > max_blocks: | ||
| decode_block_bucket_min = max(1, max_blocks - decode_block_bucket_cfg[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we subtracting step from max blocks? We don't we just change it to max blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately if min is set to max_blocks it causes error. Setting to 1 step before max resolves the issue.
|
@adobrzyn @xuechendi can you help review this PR? We keep on having to apply this patch over and over again :) |
|
@kzawora-intel Hi Konrad, do you mind helping to review or assign a reviewer to this small patch? Thank you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
✅ CI PassedAll checks passed successfully against the following vllm commit: |
What does this PR do?
During engine warmup, the max decode block bucket size
VLLM_DECODE_BLOCK_BUCKET_MAXis capped based on the availablemax_blocks. However, the minimum bucket sizeVLLM_DECODE_BLOCK_BUCKET_MINwas not similarly constrained, which could lead to a configuration where VLLM_DECODE_BLOCK_BUCKET_MIN > VLLM_DECODE_BLOCK_BUCKET_MAX (or even >max_blocks). This invalid state causes runtime error.This PR ensures that
VLLM_DECODE_BLOCK_BUCKET_MINis automatically clamped tomax_blocks(and not greater thanVLLM_DECODE_BLOCK_BUCKET_MAX) during initialization, preventing invalid bucket size configurations.