Fix some CI issues and refactor model runner #2445
Conversation
Signed-off-by: wangli <[email protected]>
* [AclGraph] Adapt aclgraph into new graph dispatcher arch
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a significant refactoring of the model runner and attention mechanisms. The key changes include decoupling the attention metadata builders from the model runner, introducing a common attention metadata structure, and adding a generic ACL graph wrapper for graph capture and replay. These changes improve code modularity, maintainability, and align the codebase with a more modern architecture for handling graph-based execution. The tests have also been substantially improved to reflect these changes. Overall, this is a high-quality refactoring with no apparent critical issues.
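To illustrate the capture-and-replay wrapper idea mentioned in the review, here is a minimal, self-contained sketch. The class name `GraphWrapper` and its bucketing-by-batch-size strategy are hypothetical, not the implementation in this PR, and `torch.cuda.CUDAGraph` is used purely for illustration; the Ascend backend would rely on its ACL graph equivalent.

```python
from typing import Callable, Dict

import torch


class GraphWrapper:
    """Hypothetical sketch of a capture-and-replay wrapper around a callable.

    The first call for a given batch size captures the wrapped callable into a
    graph with static input/output buffers; later calls with the same batch
    size copy fresh data into those buffers and replay the graph instead of
    re-dispatching every kernel.
    """

    def __init__(self, runnable: Callable[[torch.Tensor], torch.Tensor]):
        self.runnable = runnable
        self._graphs: Dict[int, torch.cuda.CUDAGraph] = {}
        self._static_in: Dict[int, torch.Tensor] = {}
        self._static_out: Dict[int, torch.Tensor] = {}

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        bs = x.shape[0]
        if bs not in self._graphs:
            return self._capture(bs, x)
        # Replay path: refresh the captured static buffers, then replay.
        self._static_in[bs].copy_(x)
        self._graphs[bs].replay()
        return self._static_out[bs]

    def _capture(self, bs: int, x: torch.Tensor) -> torch.Tensor:
        static_in = x.clone()
        # Warm up once so lazy initialization does not happen during capture.
        self.runnable(static_in)
        graph = torch.cuda.CUDAGraph()  # Ascend would use the analogous ACL/NPU graph API
        with torch.cuda.graph(graph):
            static_out = self.runnable(static_in)
        self._graphs[bs] = graph
        self._static_in[bs] = static_in
        self._static_out[bs] = static_out
        return static_out


if __name__ == "__main__":
    model = torch.nn.Linear(16, 16).cuda()
    wrapped = GraphWrapper(model)
    for _ in range(3):
        out = wrapped(torch.randn(4, 16, device="cuda"))
    print(out.shape)
```

The key design point sketched here is that the wrapper, not the model runner, owns the captured graphs and their static buffers, which is what allows graph capture and replay to be handled generically across models.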
Signed-off-by: MengqingCao <[email protected]>
Known issues:
- Lint is disabled. - @MengqingCao
- Unit tests failed. - @MengqingCao @Potabk
- LoRA test failed. - @paulyu12
- Multicard test is cancelled. - @MengqingCao
- DS R1 + quantization + TP8 failed. - @weiguihua2
- EP doesn't work with TP < 4 in torchair mode.
We'll fix these quickly in follow-up PRs.
Let's merge this to unblock other PRs.
### What this PR does / why we need it?
Add a lint block before running e2e. Follow-up to #2445.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
N/A

Signed-off-by: MengqingCao <[email protected]>
### What this PR does / why we need it?
This PR moves the current unified MLA backend to the torchair folder and removes torchair-related code in attention/mla_v1.py (1.3k -> 0.9k lines).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Running eager mode with the MLA backend, and torchair mode with code before [2445](#2445).

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@f571ff8

Signed-off-by: linfeng-yuan <[email protected]>
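As a rough illustration of the layout after this split (the helper and both module paths below are hypothetical placeholders, not the exact names in the repo), backend selection can reduce to a lookup keyed on whether torchair mode is enabled, with the torchair package only imported when it is actually used:

```python
import importlib


def select_mla_backend_path(use_torchair: bool) -> str:
    """Hypothetical helper: return the import path of the MLA backend to load.

    After the split, torchair-specific MLA code lives under the torchair
    package, while attention/mla_v1.py keeps only the eager-mode code path.
    Both paths below are illustrative placeholders.
    """
    if use_torchair:
        return "vllm_ascend.torchair.torchair_mla.AscendMLATorchairBackend"
    return "vllm_ascend.attention.mla_v1.AscendMLABackend"


def load_backend_cls(path: str):
    """Resolve the backend class lazily from its dotted import path."""
    module_name, class_name = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)
```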
### What this PR does / why we need it?
Fix some CI issues and refactor the model runner.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with existing tests.