refact model runner v1 #2435
Conversation
Signed-off-by: weiguihua2 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Code Review
This pull request is a significant and well-executed refactoring of the model runner logic. The key changes, including separating the torchair-specific logic into NPUTorchairModelRunner and splitting input preparation from model execution in NPUModelRunner, greatly improve the code's structure, clarity, and maintainability. The introduction of AscendCommonAttentionMetadata is also a good step towards better-structured code.
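The sketch below illustrates the structure described above; it is not the PR's actual code. Method bodies, exact signatures, and the AscendCommonAttentionMetadata fields are assumptions, and _prepare_inputs is a hypothetical helper name. Only NPUModelRunner, NPUTorchairModelRunner, AscendCommonAttentionMetadata, and _generate_process_reqs_hidden_states come from the PR itself.

from dataclasses import dataclass
from typing import Any, Optional

import torch


@dataclass
class AscendCommonAttentionMetadata:
    # Illustrative fields only; the real dataclass in the PR may differ.
    seq_lens: torch.Tensor
    query_start_loc: torch.Tensor
    block_tables: Optional[torch.Tensor] = None


class NPUModelRunner:
    """Base runner: input preparation is separated from model execution."""

    def _prepare_inputs(self, scheduler_output: Any):
        # Build input_ids, positions, and attention metadata from the
        # scheduler output (details omitted in this sketch).
        raise NotImplementedError

    def _generate_process_reqs_hidden_states(self, input_ids, positions,
                                             attn_metadata,
                                             intermediate_tensors=None,
                                             inputs_embeds=None):
        # Default (non-torchair) forward pass.
        return self.model(
            input_ids=input_ids,
            positions=positions,
            kv_caches=self.kv_caches,
            attn_metadata=attn_metadata,
            intermediate_tensors=intermediate_tensors,
            inputs_embeds=inputs_embeds,
        )


class NPUTorchairModelRunner(NPUModelRunner):
    """Torchair-specific runner: overrides only the execution step."""

    def _generate_process_reqs_hidden_states(self, *args, **kwargs):
        # Torchair graph-mode execution would replace the default path here.
        raise NotImplementedError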
I've found one critical issue: the _generate_process_reqs_hidden_states method in the base NPUModelRunner is missing required arguments in its call to the model's forward method. This would likely cause a TypeError in non-torchair execution paths. Please see the detailed comment for the fix.
hidden_states = self.model(
    input_ids=input_ids,
    positions=positions,
    intermediate_tensors=intermediate_tensors,
    inputs_embeds=inputs_embeds,
)
The call to self.model() is missing the kv_caches and attn_metadata arguments, which the model's forward method requires. While they are available in the forward context for attention layers, the top-level model forward signature still needs them. This will likely cause a TypeError at runtime for non-torchair execution paths.
Suggested change:
-    hidden_states = self.model(
-        input_ids=input_ids,
-        positions=positions,
-        intermediate_tensors=intermediate_tensors,
-        inputs_embeds=inputs_embeds,
-    )
+    hidden_states = self.model(
+        input_ids=input_ids,
+        positions=positions,
+        kv_caches=self.kv_caches,
+        attn_metadata=attn_metadata,
+        intermediate_tensors=intermediate_tensors,
+        inputs_embeds=inputs_embeds,
+    )
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?