
Conversation

Collaborator

@rjg-lyh commented Aug 19, 2025

What this PR does / why we need it?

This PR fixes bugs and refactors the cached mask generation logic. The cached mask is now pre-constructed and kept on the CPU instead of on the NPU device, and only the needed portion is moved to the device.
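For context, the pattern described above might look like the following minimal sketch in Python. Only the method name _update_attn_cache and the final .to(device, non_blocking=True) transfer are confirmed by the review thread below; the class name, mask values, and method signatures here are illustrative assumptions, not the actual vllm_ascend implementation.

import torch

class AttentionMaskBuilder:
    """Sketch of a CPU-cached causal mask: build once on the CPU, slice
    per request, and move only the needed slice to the device."""

    def __init__(self, max_seq_len: int, dtype: torch.dtype):
        self._seq_len_cached = max_seq_len
        self.dtype = dtype
        # Pre-construct the mask once on the CPU: zeros on and below the
        # diagonal, -inf strictly above it.
        self.attn_mask_cache = torch.triu(
            torch.full((max_seq_len, max_seq_len), float("-inf"),
                       dtype=dtype),
            diagonal=1)

    def _update_attn_cache(self, seq_len: int, dtype: torch.dtype) -> None:
        # Rebuild the CPU cache only when a longer sequence (or a new
        # dtype) arrives; no device memory is touched here.
        if seq_len > self._seq_len_cached or dtype != self.dtype:
            self._seq_len_cached = seq_len
            self.dtype = dtype
            self.attn_mask_cache = torch.triu(
                torch.full((seq_len, seq_len), float("-inf"), dtype=dtype),
                diagonal=1)

    def get_attn_mask(self, max_seq_len: int, dtype: torch.dtype,
                      device: torch.device) -> torch.Tensor:
        self._update_attn_cache(max_seq_len, dtype)
        # Slice on the CPU, then transfer just that view to the device.
        return (self.attn_mask_cache[:max_seq_len, :max_seq_len]
                .contiguous()
                .to(device, non_blocking=True))

Keeping the cache on the CPU means the device only ever holds the slice a given batch needs, which is where the reduced device memory pressure noted in the review below comes from.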

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with newly added and existing tests.


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request refactors the attention mask generation logic to keep the cached mask on the CPU and only move it to the device when needed. This is a good optimization that simplifies the logic and reduces device memory pressure. The changes in vllm_ascend/worker/model_runner_v1.py are consistent with this refactoring, correctly preparing and passing CPU tensors for mask creation.

I've found one critical issue: a typo in a method call within get_splitfuse_attn_mask that will cause an AttributeError at runtime. Please see the specific comment for details.

current_row += q_len

return attn_mask.to(device, non_blocking=True)
self.update_attn_cache(max_seq_len, dtype)
Contributor

critical

There's a typo in the method call here. The method is named _update_attn_cache (with a leading underscore), but it's being called as update_attn_cache. This will raise an AttributeError at runtime.

Suggested change
self.update_attn_cache(max_seq_len, dtype)
self._update_attn_cache(max_seq_len, dtype)
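As a side note, this class of typo is cheap to guard against in tests; a minimal sketch, assuming the hypothetical AttentionMaskBuilder from the sketch earlier in this thread:

import unittest

import torch

class TestAttnCacheWiring(unittest.TestCase):
    def test_public_path_reaches_private_cache_update(self):
        builder = AttentionMaskBuilder(max_seq_len=64, dtype=torch.float16)
        # Requesting a mask longer than the cache forces _update_attn_cache
        # to run, so a typo like self.update_attn_cache(...) surfaces as an
        # AttributeError here rather than at serving time.
        mask = builder.get_attn_mask(128, torch.float16, torch.device("cpu"))
        self.assertEqual(mask.shape, (128, 128))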

@rjg-lyh force-pushed the main branch 3 times, most recently from 0092f92 to 25b1903 on August 20, 2025 08:17

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@rjg-lyh force-pushed the main branch 2 times, most recently from 24ebd95 to bf6ffbe on August 24, 2025 07:22

codecov bot commented Aug 24, 2025

Codecov Report

❌ Patch coverage is 97.22222% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 77.96%. Comparing base (5d8ec28) to head (2effecc).
⚠️ Report is 12 commits behind head on main.

Files with missing lines               | Patch % | Lines
vllm_ascend/attention/attention_mask.py | 95.00%  | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2442      +/-   ##
==========================================
- Coverage   77.99%   77.96%   -0.03%     
==========================================
  Files         134      134              
  Lines       18498    18474      -24     
==========================================
- Hits        14427    14403      -24     
  Misses       4071     4071              
Flag      | Coverage Δ
unittests | 77.96% <97.22%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@rjg-lyh force-pushed the main branch 3 times, most recently from 86565ef to 4f01f49 on August 26, 2025 01:35
Collaborator

@MengqingCao left a comment


lgtm

@wangxiyuan
Collaborator

@ApsarasX please double-check the response; if it's fine, feel free to merge this.

@ApsarasX merged commit 2bfbf9b into vllm-project:main on Aug 27, 2025
25 checks passed
device=torch.device("cpu"),
)

def test_mask_value_cleanliness(self):
Collaborator Author

I have added this test here. @ApsarasX

Collaborator

I have added this test here. @ApsarasX

OK, I see.
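The body of test_mask_value_cleanliness is not visible in this excerpt. A plausible sketch of what such a check could verify, again assuming the hypothetical AttentionMaskBuilder from the sketch earlier in this thread:

import unittest

import torch

class TestMaskValues(unittest.TestCase):
    def test_mask_value_cleanliness(self):
        # Hypothetical body: the cached CPU mask should contain exactly
        # zeros on and below the diagonal and -inf strictly above it,
        # with no stale values left behind once the cache has grown.
        builder = AttentionMaskBuilder(max_seq_len=4, dtype=torch.float16)
        # Request a mask larger than the initial cache to force a rebuild.
        mask = builder.get_attn_mask(8, torch.float16, torch.device("cpu"))
        expected = torch.triu(
            torch.full((8, 8), float("-inf"), dtype=torch.float16),
            diagonal=1)
        self.assertTrue(torch.equal(mask, expected))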
