
Conversation

@JC-ut0 JC-ut0 commented Aug 19, 2025

What this PR does / why we need it?

Support MTP (multi-token prediction) speculative decoding in the disaggregated-prefill scenario.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • v0.9.1-dev branch
  • A3 hardware: TP16, and DP4 with TP4
  • A3 hardware: 4P1D disaggregated deployment (4 prefill instances, 1 decode instance)

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces two bug fixes to support MTP speculative decoding in a disaggregated prefill scenario. In vllm_ascend/attention/mla_v1.py, the calculation of actual_seq_lengths_q is corrected for torchair graph mode. In vllm_ascend/worker/model_runner_v1.py, the attention state is correctly set to SpecDecoding for deepseek_mtp in cases where it would have been misidentified as DecodeOnly. The changes are correct and well-targeted. I have no further suggestions.
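
To make the two fixes concrete, here is a minimal, self-contained Python sketch of the behavior the review describes. Everything below is illustrative: the helper names (`cumulative_query_lens`, `classify_attn_state`) and the exact stride arithmetic are assumptions inferred from the review summary, not code taken from the patch; only `actual_seq_lengths_q`, the `SpecDecoding`/`DecodeOnly` states, and the `deepseek_mtp` method name come from the discussion above.

```python
from enum import Enum, auto
from typing import List, Optional


class AscendAttentionState(Enum):
    """Stand-in for the attention-state enum in vllm_ascend; only the
    two states mentioned in the review are modeled here."""
    DecodeOnly = auto()
    SpecDecoding = auto()


def cumulative_query_lens(num_reqs: int, num_spec_tokens: int) -> List[int]:
    """Fix 1 (mla_v1.py, sketch): in torchair graph mode, each decode
    request contributes (1 + num_spec_tokens) query positions when MTP
    draft tokens are present, so the cumulative query lengths
    (actual_seq_lengths_q) must step by that stride instead of by 1."""
    stride = 1 + num_spec_tokens
    return [stride * (i + 1) for i in range(num_reqs)]


def classify_attn_state(spec_method: Optional[str]) -> AscendAttentionState:
    """Fix 2 (model_runner_v1.py, sketch): a batch with no prefills but
    with deepseek_mtp draft tokens was previously misclassified as
    DecodeOnly; it must be SpecDecoding so the backend builds
    speculative-decoding attention metadata."""
    if spec_method == "deepseek_mtp":
        return AscendAttentionState.SpecDecoding
    return AscendAttentionState.DecodeOnly


if __name__ == "__main__":
    # 4 decode requests with 1 MTP draft token each -> [2, 4, 6, 8];
    # a per-request stride of 1 would have produced [1, 2, 3, 4].
    assert cumulative_query_lens(4, 1) == [2, 4, 6, 8]
    assert classify_attn_state("deepseek_mtp") is AscendAttentionState.SpecDecoding
    assert classify_attn_state(None) is AscendAttentionState.DecodeOnly
```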

@JC-ut0 force-pushed the v0.9.1-dev branch 3 times, most recently from d02aff1 to ebe1b95 on August 20, 2025 at 07:52
@wangxiyuan
Collaborator

Please update the commit message.

@wangxiyuan wangxiyuan merged commit f64208b into vllm-project:v0.9.1-dev Aug 20, 2025
17 checks passed