[Fix] Add operations in `_dummy_run` to maintain synchronization with `_process_reqs`, resolving a service hang (#2454)
Conversation
Code Review
This pull request aims to fix a service hang when the batch size is smaller than the data parallelism (DP) size by adding a missing collective operation in `_dummy_run` to synchronize with `_process_reqs`. While adding the `get_dp_padding` call is the correct step to resolve the hang, the implementation introduces an inconsistency: the calculated padding is not applied to `num_tokens` in `_dummy_run`, whereas it is applied in `_process_reqs` when using ACL graphs. This can lead to a mismatch between the captured graph and runtime execution, potentially causing errors or incorrect behavior. My review includes a suggestion to address this inconsistency.
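To make the failure mode concrete, here is a minimal, hypothetical sketch (not the project's actual implementation) of why every DP rank has to enter the padding collective on every step. Names such as `dp_group` and the exact collective used are assumptions for illustration only.

```python
# Sketch only: illustrates the synchronization requirement, not vLLM Ascend's
# real get_dp_padding implementation.
import torch
import torch.distributed as dist


def get_dp_padding_sketch(num_tokens: int, dp_group) -> tuple[int, torch.Tensor]:
    """Gather token counts across DP ranks and pad every rank up to the max.

    Because this performs a collective, *every* DP rank must call it on every
    step, including ranks that only execute a dummy/idle run. If one rank
    skips the call, the other ranks block inside the collective forever.
    """
    world = dist.get_world_size(group=dp_group)
    local = torch.tensor([num_tokens], dtype=torch.int32)
    num_tokens_across_dp = torch.empty(world, dtype=torch.int32)
    dist.all_gather_into_tensor(num_tokens_across_dp, local, group=dp_group)
    num_pad = int(num_tokens_across_dp.max().item()) - num_tokens
    return num_pad, num_tokens_across_dp
```

If an idle rank runs `_dummy_run` without making this call while busy ranks call it from `_process_reqs`, the busy ranks block inside the collective indefinitely, which is the hang this PR resolves.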
```python
num_pad, num_tokens_across_dp_native = self.get_dp_padding(
    num_tokens)
# num_tokens += num_pad ## Uncomment this after TorchAir is removed
```
The addition of `get_dp_padding` correctly introduces the necessary collective call to prevent hangs. However, the calculated `num_pad` is not applied to `num_tokens` because the line `num_tokens += num_pad` is commented out. This creates an inconsistency with the `_process_reqs` method (line 1075), which does apply this padding when `use_aclgraph` is true. Since `_dummy_run` is used for capturing ACL graphs, this discrepancy can lead to a mismatch between the captured graph's expected input size and the actual input size at runtime, which is a critical issue. The padding should be applied to ensure consistency.
Suggested change:

```diff
 num_pad, num_tokens_across_dp_native = self.get_dp_padding(
     num_tokens)
-# num_tokens += num_pad ## Uncomment this after TorchAir is removed
+num_tokens += num_pad
```
Thanks for the fix, could you add an e2e test to prevent this from breaking again?
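A possible shape for such a regression test, sketched under assumptions: the model name, port, reliance on the `vllm serve` CLI with `--data-parallel-size`, and the `pytest-timeout` plugin are illustrative choices, not the project's existing test harness.

```python
# Sketch: start a server with DP > 1, send a single request so that
# batch size (1) < DP size (2), and assert it completes instead of hanging.
import subprocess
import time

import pytest
import requests

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model


@pytest.mark.timeout(600)
def test_single_request_with_dp2_does_not_hang():
    proc = subprocess.Popen(
        ["vllm", "serve", MODEL, "--data-parallel-size", "2", "--port", "8123"]
    )
    try:
        # Wait for the OpenAI-compatible server to come up.
        for _ in range(300):
            try:
                if requests.get("http://localhost:8123/health", timeout=1).ok:
                    break
            except requests.RequestException:
                pass
            time.sleep(1)
        # Before the fix, this request would never return.
        resp = requests.post(
            "http://localhost:8123/v1/completions",
            json={"model": MODEL, "prompt": "Hello", "max_tokens": 8},
            timeout=120,
        )
        assert resp.status_code == 200
    finally:
        proc.terminate()
        proc.wait(timeout=30)
```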
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Please make CI happy.
@wangxiyuan Ready, waiting for other fixes.
…implementation specific to TorchAir. Make sure the server does not hang when batch size < DP size. Signed-off-by: Yizhou Liu <[email protected]>
@MengqingCao Has the refactoring of the …
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #2454   +/-  ##
=======================================
  Coverage   78.04%   78.04%
=======================================
  Files         132      132
  Lines       17557    17557
=======================================
  Hits        13702    13702
  Misses       3855     3855
```
The refactor of …
LGTM except:
```python
num_pad, num_tokens_across_dp_native = self.get_dp_padding(num_tokens)
# num_tokens += num_pad ## Uncomment this after TorchAir is removed
```

```python
# Padding for DP (for TorchAir)
```
The `_get_forward_metadata_across_dp_and_pad` function is not only for TorchAir; this method can be rewritten in the TorchAir runner.
Sure, will figure out a way to merge these two paths.
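One way the two paths could be merged, sketched with hypothetical class, attribute, and method names (the real runners and their padding helpers may differ): keep the native DP padding in the base runner and let a TorchAir-specific runner override it with graph-batch-size padding.

```python
# Sketch only: illustrates the suggested split, not the actual vLLM Ascend
# class hierarchy.
class NPUModelRunnerSketch:
    """Base runner: generic DP padding -- pad each rank to the max across DP ranks."""

    def get_dp_padding(self, num_tokens: int,
                       num_tokens_across_dp: list[int]) -> int:
        # Every rank pads to the same maximum so collective ops and ACL graph
        # shapes line up across the DP group.
        return max(num_tokens_across_dp) - num_tokens


class NPUTorchairModelRunnerSketch(NPUModelRunnerSketch):
    """TorchAir runner: pad up to the nearest captured graph batch size instead."""

    graph_batch_sizes = [1, 2, 4, 8, 16, 32]  # illustrative values

    def get_dp_padding(self, num_tokens: int,
                       num_tokens_across_dp: list[int]) -> int:
        target = max(num_tokens_across_dp)
        # Round up to a captured graph batch size; fall back to the raw max
        # if it exceeds the largest captured size.
        padded = min((s for s in self.graph_batch_sizes if s >= target),
                     default=target)
        return padded - num_tokens
```

With an override like this, `_dummy_run` and `_process_reqs` would share one call site per backend, and the TorchAir-only commented-out line could be dropped.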
CI seems to have failed due to a flaky HCCL timeout; I retried, let's see the latest results:
We need to fix this issue first. For the pad logic, let's create an RFC to improve the performance.
What this PR does / why we need it?
Fixes a service hang when batch size < DP size.
Does this PR introduce any user-facing change?
None.
How was this patch tested?
After this change, serving in the DP case works as expected instead of hanging when batch size < DP size.