adobrzyn
Collaborator

This PR enables reading buckets from a file. Files can be passed through the following user flags:
- VLLM_PROMPT_BUCKETING_FILE
- VLLM_DECODE_BUCKETING_FILE
Valid files should list each bucket on its own line, with values in this order:
batch_size, query_length, number_of_context_blocks OR batch_size, query_length/number_of_context_blocks
Any other line will be ignored.
It is possible to read buckets from a file for only one phase and use a different bucketing strategy for the other phase.
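
To illustrate the file format described above, here is a minimal sketch of how such a file could be parsed. It is not the code from this PR: the function names `parse_bucket_lines` and `read_buckets_from_env` are hypothetical, and the two-value form is read here as two comma-separated integers, which is only one interpretation of the description.

```python
import os
from typing import Iterable, List, Optional, Tuple


def parse_bucket_lines(lines: Iterable[str]) -> List[Tuple[int, ...]]:
    """Hypothetical helper: keep lines holding 2 or 3 comma-separated integers."""
    buckets = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) not in (2, 3):
            continue  # wrong number of fields: ignore the line
        try:
            values = tuple(int(p) for p in parts)
        except ValueError:
            continue  # non-integer field: ignore the line
        buckets.append(values)
    return buckets


def read_buckets_from_env(flag: str) -> Optional[List[Tuple[int, ...]]]:
    """Hypothetical helper: return buckets from the file named by `flag`.

    Returns None when the flag is unset, so the caller can fall back to
    another bucketing strategy for that phase.
    """
    path = os.environ.get(flag)
    if not path:
        return None
    with open(path) as f:
        return parse_bucket_lines(f)
```

With this sketch, `read_buckets_from_env("VLLM_PROMPT_BUCKETING_FILE")` and `read_buckets_from_env("VLLM_DECODE_BUCKETING_FILE")` can be called independently, so setting only one flag leaves the other phase on its usual strategy.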

Signed-off-by: Agata Dobrzyniewicz <[email protected]>