[Fix] Add operations in `_dummy_run` to maintain synchronization with `_process_reqs`, resolving a service hang (#2454)
Conversation
Code Review
This pull request aims to fix a service hang when the batch size is smaller than the data parallelism (DP) size by adding a missing collective operation in `_dummy_run` to synchronize with `_process_reqs`. While adding the `get_dp_padding` call is the correct step to resolve the hang, the implementation introduces an inconsistency: the calculated padding is not applied to `num_tokens` in `_dummy_run`, whereas it is applied in `_process_reqs` when using ACL graphs. This can lead to a mismatch between the captured graph and runtime execution, potentially causing errors or incorrect behavior. My review includes a suggestion to address this inconsistency.
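To make the failure mode concrete, here is a minimal, hypothetical sketch (not the project's actual implementation) of why every DP rank has to enter the padding collective on every step. Names such as `dp_group` and the exact collective used are assumptions for illustration only.

```python
# Sketch only: illustrates the synchronization requirement, not vLLM Ascend's
# real get_dp_padding implementation.
import torch
import torch.distributed as dist


def get_dp_padding_sketch(num_tokens: int, dp_group) -> tuple[int, torch.Tensor]:
    """Gather token counts across DP ranks and pad every rank up to the max.

    Because this performs a collective, *every* DP rank must call it on every
    step, including ranks that only execute a dummy/idle run. If one rank
    skips the call, the other ranks block inside the collective forever.
    """
    world = dist.get_world_size(group=dp_group)
    local = torch.tensor([num_tokens], dtype=torch.int32)
    num_tokens_across_dp = torch.empty(world, dtype=torch.int32)
    dist.all_gather_into_tensor(num_tokens_across_dp, local, group=dp_group)
    num_pad = int(num_tokens_across_dp.max().item()) - num_tokens
    return num_pad, num_tokens_across_dp
```

If an idle rank runs `_dummy_run` without making this call while busy ranks call it from `_process_reqs`, the busy ranks block inside the collective indefinitely, which is the hang this PR resolves.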
```python
num_pad, num_tokens_across_dp_native = self.get_dp_padding(
    num_tokens)
# num_tokens += num_pad ## Uncomment this after TorchAir is removed
```
The addition of `get_dp_padding` correctly introduces the necessary collective call to prevent hangs. However, the calculated `num_pad` is not applied to `num_tokens` because the line `num_tokens += num_pad` is commented out. This creates an inconsistency with the `_process_reqs` method (line 1075), which does apply this padding when `use_aclgraph` is true. Since `_dummy_run` is used for capturing ACL graphs, this discrepancy can lead to a mismatch between the captured graph's expected input size and the actual input size at runtime, which is a critical issue. The padding should be applied to ensure consistency.
Suggested change:

```diff
 num_pad, num_tokens_across_dp_native = self.get_dp_padding(
     num_tokens)
-# num_tokens += num_pad ## Uncomment this after TorchAir is removed
+num_tokens += num_pad
```
Thanks for the fix, could you add an e2e test to prevent this from breaking again?
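A possible shape for such a regression test, sketched under assumptions: the model name, port, reliance on the `vllm serve` CLI with `--data-parallel-size`, and the `pytest-timeout` plugin are illustrative choices, not the project's existing test harness.

```python
# Sketch: start a server with DP > 1, send a single request so that
# batch size (1) < DP size (2), and assert it completes instead of hanging.
import subprocess
import time

import pytest
import requests

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model


@pytest.mark.timeout(600)
def test_single_request_with_dp2_does_not_hang():
    proc = subprocess.Popen(
        ["vllm", "serve", MODEL, "--data-parallel-size", "2", "--port", "8123"]
    )
    try:
        # Wait for the OpenAI-compatible server to come up.
        for _ in range(300):
            try:
                if requests.get("http://localhost:8123/health", timeout=1).ok:
                    break
            except requests.RequestException:
                pass
            time.sleep(1)
        # Before the fix, this request would never return.
        resp = requests.post(
            "http://localhost:8123/v1/completions",
            json={"model": MODEL, "prompt": "Hello", "max_tokens": 8},
            timeout=120,
        )
        assert resp.status_code == 200
    finally:
        proc.terminate()
        proc.wait(timeout=30)
```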
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Please make CI happy.
@wangxiyuan Ready, waiting for other fixes.
…implementation specific to TorchAir. Make sure the server does not hang when batch size < DP size. Signed-off-by: Yizhou Liu <[email protected]>
@MengqingCao Has the refactoring of the …
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #2454   +/-  ##
=======================================
  Coverage   78.04%   78.04%
=======================================
  Files         132      132
  Lines       17557    17557
=======================================
  Hits        13702    13702
  Misses       3855     3855
```
The refactor of …
LGTM except:
```python
num_pad, num_tokens_across_dp_native = self.get_dp_padding(num_tokens)
# num_tokens += num_pad ## Uncomment this after TorchAir is removed
```

```python
# Padding for DP (for TorchAir)
```
The `_get_forward_metadata_across_dp_and_pad` function is not only for TorchAir; this method can be rewritten in the TorchAir runner.
Sure, will figure out a way to merge these two paths.
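One way the two paths could be merged, sketched with hypothetical class, attribute, and method names (the real runners and their padding helpers may differ): keep the native DP padding in the base runner and let a TorchAir-specific runner override it with graph-batch-size padding.

```python
# Sketch only: illustrates the suggested split, not the actual vLLM Ascend
# class hierarchy.
class NPUModelRunnerSketch:
    """Base runner: generic DP padding -- pad each rank to the max across DP ranks."""

    def get_dp_padding(self, num_tokens: int,
                       num_tokens_across_dp: list[int]) -> int:
        # Every rank pads to the same maximum so collective ops and ACL graph
        # shapes line up across the DP group.
        return max(num_tokens_across_dp) - num_tokens


class NPUTorchairModelRunnerSketch(NPUModelRunnerSketch):
    """TorchAir runner: pad up to the nearest captured graph batch size instead."""

    graph_batch_sizes = [1, 2, 4, 8, 16, 32]  # illustrative values

    def get_dp_padding(self, num_tokens: int,
                       num_tokens_across_dp: list[int]) -> int:
        target = max(num_tokens_across_dp)
        # Round up to a captured graph batch size; fall back to the raw max
        # if it exceeds the largest captured size.
        padded = min((s for s in self.graph_batch_sizes if s >= target),
                     default=target)
        return padded - num_tokens
```

With an override like this, `_dummy_run` and `_process_reqs` would share one call site per backend, and the TorchAir-only commented-out line could be dropped.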
CI seems to have failed due to a flaky HCCL timeout; I retried, let's see the latest results:
We need to fix this issue first. For the pad logic, let's create an RFC to improve the performance.
What this PR does / why we need it?
Fixes a service hang when batch size < DP size.
Does this PR introduce any user-facing change?
None.
How was this patch tested?
After this change, serving in the DP case works as expected instead of hanging when batch size < DP size.