Introducing sampler warmup as separate warmup step #131
Signed-off-by: Krzysztof Smusz <[email protected]>
/run-gaudi-tests
Only codeowners can request to run Gaudi tests. Contact list: kzawora-intel, xuechendi, mswiniarsk, adobrzyn
Pull Request Overview
This PR introduces sampler warmup as a separate warmup step to reduce graph recompilations during model execution. The warmup runs the sampler over a range of batch sizes and temperature/top-p/top-k settings so that the sampler graphs are compiled ahead of time rather than during actual inference.
Key changes:
- Added separate sampler warmup phase before model graph warmup
- Refactored sampling code to extract common functionality into reusable methods
- Modified worker warmup conditions to prevent redundant warmups
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
| --- | --- |
| vllm_gaudi/v1/worker/hpu_worker.py | Added condition to prevent redundant model warmup when graphs are already compiled |
| vllm_gaudi/v1/worker/hpu_model_runner.py | Added comprehensive sampler warmup functionality and refactored sampling code |
Signed-off-by: Krzysztof Smusz <[email protected]>
Signed-off-by: Katarzyna Fojcik <[email protected]>
Warming up the sampler with different configurations removes the recompilations of the larger sampler graphs that otherwise occur during actual execution. In tests with example workloads and batch sizes, the only sampler recompilations left come from minor graphs, which have minimal influence on execution time.
The sampler warmup takes around 1-3 seconds, depending on the batch sizes of the buckets being warmed up.
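The idea of warming up the sampler over several configurations can be illustrated with a toy model. This is only a sketch, not the code from this PR: `SamplingConfig`, `ToySampler`, and `warmup_sampler` are hypothetical names, and the assumption that a graph is keyed by batch size plus which sampling features are active (greedy vs. random, top-p, top-k) is made up for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    # Hypothetical stand-in for per-request sampling settings.
    temperature: float
    top_p: float
    top_k: int


class ToySampler:
    """Pretends to compile one graph per distinct (batch size, feature set)."""

    def __init__(self):
        self.compiled = set()
        self.compilations = 0

    @staticmethod
    def _graph_key(batch_size, cfg):
        # Assumption for this sketch: only the set of active features changes
        # the graph, not the exact temperature/top-p/top-k values.
        return (batch_size, cfg.temperature == 0.0, cfg.top_p < 1.0, cfg.top_k > 0)

    def sample(self, batch_size, cfg):
        key = self._graph_key(batch_size, cfg)
        if key not in self.compiled:
            # A real backend would (re)compile a graph at this point.
            self.compiled.add(key)
            self.compilations += 1
        return key


def warmup_sampler(sampler, batch_sizes, configs):
    """Run every (batch size, config) pair once so later calls hit the cache."""
    for batch_size, cfg in itertools.product(batch_sizes, configs):
        sampler.sample(batch_size, cfg)
    return sampler.compilations


# Usage: warm up, then check that a request seen at run time compiles nothing new.
configs = [
    SamplingConfig(0.0, 1.0, 0),   # greedy
    SamplingConfig(1.0, 1.0, 0),   # random sampling, no filtering
    SamplingConfig(0.7, 0.9, 0),   # top-p only
    SamplingConfig(0.7, 1.0, 50),  # top-k only
    SamplingConfig(0.7, 0.9, 50),  # top-p and top-k
]
sampler = ToySampler()
compiled_during_warmup = warmup_sampler(sampler, [1, 2, 4], configs)
sampler.sample(2, SamplingConfig(0.5, 0.8, 20))
compiled_after_request = sampler.compilations
```

In this toy setup the run-time request reuses a graph compiled during warmup, which mirrors the effect described above. Warmup cost grows with the number of batch sizes times the number of configurations.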
Additionally, this change removes the case where the warmup method was called twice (visible as duplicated prints in the warmup phase, with the second pass listing no buckets to warm up because all of them had already been warmed up).
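A guard against the redundant second warmup could look roughly like the sketch below. The classes `ToyRunner` and `ToyWorker` and their methods are hypothetical and do not reflect the actual condition added in `hpu_worker.py`; the sketch only shows the general pattern of skipping warmup when nothing is left to warm up.

```python
class ToyRunner:
    """Hypothetical runner that tracks which buckets still need warmup."""

    def __init__(self, buckets):
        self.pending_buckets = list(buckets)
        self.warmed_buckets = []
        self.log = []

    def warmup_model(self):
        self.log.append(f"warming up {len(self.pending_buckets)} bucket(s)")
        self.warmed_buckets.extend(self.pending_buckets)
        self.pending_buckets.clear()


class ToyWorker:
    """Hypothetical worker that skips warmup when every bucket is done."""

    def __init__(self, runner):
        self.runner = runner

    def warm_up(self):
        # Guard: without this check a second call would print the warmup
        # banner again while having no buckets left to process.
        if not self.runner.pending_buckets:
            return False
        self.runner.warmup_model()
        return True


# Usage: the second call is a no-op and produces no duplicated log line.
worker = ToyWorker(ToyRunner(buckets=[(1, 128), (2, 128), (4, 256)]))
first_call = worker.warm_up()
second_call = worker.warm_up()
```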