@MengqingCao MengqingCao commented Aug 19, 2025

What this PR does / why we need it?

Fix some CI issues and refactor the model runner.

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

CI passed with existing tests.

Potabk and others added 5 commits August 19, 2025 17:00
 * [AclGraph] Adapt aclgraph into new graph dispatcher arch

Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: MengqingCao <[email protected]>

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant refactoring of the model runner and attention mechanisms. The key changes include decoupling the attention metadata builders from the model runner, introducing a common attention metadata structure, and adding a generic ACL graph wrapper for graph capture and replay. These changes improve code modularity, maintainability, and align the codebase with a more modern architecture for handling graph-based execution. The tests have also been substantially improved to reflect these changes. Overall, this is a high-quality refactoring with no apparent critical issues.
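The generic graph wrapper described above follows the usual capture-once / replay pattern of device graph runtimes (ACL graphs on Ascend, analogous to CUDA graphs): the first call records the kernel sequence against static input buffers, and later calls copy fresh data into those buffers and replay the recorded graph. A rough, hypothetical sketch of that pattern — `FakeGraphRuntime` and `GraphWrapper` are illustrative names, not vllm-ascend APIs:

```python
class FakeGraphRuntime:
    """Stand-in for a device graph runtime (e.g. ACL or CUDA graphs)."""

    def __init__(self):
        self.captured = None

    def capture(self, fn, static_inputs):
        # Record the callable and its static buffers once; a real runtime
        # would trace the device kernels launched during this call.
        self.captured = (fn, static_inputs)

    def replay(self):
        # Re-run the recorded work against the (possibly updated) buffers.
        fn, static_inputs = self.captured
        return fn(*static_inputs)


class GraphWrapper:
    """Wraps a callable: the first call captures, later calls replay."""

    def __init__(self, runnable):
        self.runnable = runnable
        self.runtime = FakeGraphRuntime()
        self.static_inputs = None

    def __call__(self, *inputs):
        if self.static_inputs is None:
            # Capture phase: remember the input buffers and record the graph.
            self.static_inputs = list(inputs)
            self.runtime.capture(self.runnable, self.static_inputs)
            return self.runnable(*inputs)
        # Replay phase: copy new data into the static buffers, then replay.
        for buf, new in zip(self.static_inputs, inputs):
            buf[:] = new
        return self.runtime.replay()


wrapped = GraphWrapper(lambda x: [v * 2 for v in x])
print(wrapped([1, 2, 3]))  # capture + first run -> [2, 4, 6]
print(wrapped([4, 5, 6]))  # replay with updated inputs -> [8, 10, 12]
```

Decoupling this wrapper from the model runner is what lets the same capture/replay machinery serve different backends via a dispatcher, rather than being hard-wired into the runner.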

Signed-off-by: MengqingCao <[email protected]>

@wangxiyuan wangxiyuan left a comment


Known issue:

  1. Lint is disabled. - @MengqingCao
  2. Unit test failed. - @MengqingCao @Potabk
  3. LoRA test failed. - @paulyu12
  4. Multicard test is cancelled. - @MengqingCao
  5. DeepSeek R1 + quantization + TP8 failed. - @weiguihua2
  6. EP doesn't work with TP < 4 in torchair mode.

We'll fix them quickly in follow-up PRs.

Let's merge this to unblock other PRs.

@wangxiyuan wangxiyuan merged commit 1327f9b into vllm-project:main Aug 20, 2025
18 of 20 checks passed
@MengqingCao MengqingCao deleted the aclgraph4 branch August 20, 2025 01:09
@MengqingCao MengqingCao restored the aclgraph4 branch August 20, 2025 01:29
wangxiyuan pushed a commit that referenced this pull request Aug 20, 2025
### What this PR does / why we need it?
Add a lint check before running e2e tests. Follow-up to
#2445

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
N/A

Signed-off-by: MengqingCao <[email protected]>
wangxiyuan pushed a commit that referenced this pull request Aug 21, 2025
### What this PR does / why we need it?
This PR moves the current unified MLA backend to the torchair folder and removes
torchair-related code from attention/mla_v1.py (1.3k -> 0.9k lines).

 
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Ran eager mode with the MLA backend, and torchair mode with the code from before
[2445](#2445)


- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@f571ff8

Signed-off-by: linfeng-yuan <[email protected]>