[WIP] Simplified alternative padded-speculation acceptance rate fix #29845

LucasWilkinson · 2025-12-02T06:26:04Z

Alternative fix to: #26498

Simplify the padded drafter batch fix by adjusting seq_lens and seq_lens_cpu inside the drafting loop at token_index==0, rather than using complex mask calculations. This addresses the acceptance rate issue outlined in vllm-project#26191, where acceptance length (AL) drops by about 5% when long speculative sequences are used.
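To illustrate the idea, here is a minimal toy sketch, not the actual vLLM code. It assumes that in a padded drafter batch each request's sequence length initially counts positions for tokens that were rejected during verification, and that the correction is a one-time subtraction on the first draft step. The function name `toy_draft_loop` and the `num_rejected_tokens` input are hypothetical and used only for illustration; plain Python lists stand in for the device tensor `seq_lens` and its host mirror `seq_lens_cpu`.

```python
from typing import List, Tuple


def toy_draft_loop(
    seq_lens: List[int],
    num_rejected_tokens: List[int],
    num_draft_steps: int,
) -> List[Tuple[List[int], List[int]]]:
    """Toy model of a drafting loop (hypothetical, not vLLM's implementation).

    Returns the (seq_lens, seq_lens_cpu) pair used at each draft step.
    """
    seq_lens = list(seq_lens)      # stand-in for the device-side tensor
    seq_lens_cpu = list(seq_lens)  # stand-in for the host-side mirror
    per_step = []

    for token_index in range(num_draft_steps):
        if token_index == 0:
            # Assumed one-time correction: drop the padded positions that
            # belong to rejected tokens so they are not counted as context.
            seq_lens = [s - r for s, r in zip(seq_lens, num_rejected_tokens)]
            seq_lens_cpu = list(seq_lens)

        # Each draft step appends one new token per request.
        seq_lens = [s + 1 for s in seq_lens]
        seq_lens_cpu = [s + 1 for s in seq_lens_cpu]

        per_step.append((list(seq_lens), list(seq_lens_cpu)))

    return per_step


if __name__ == "__main__":
    # Two requests with padded lengths 10 and 12; the first had 2 rejections.
    for step, (dev, cpu) in enumerate(toy_draft_loop([10, 12], [2, 0], 3)):
        print(step, dev, cpu)
```

The point of the sketch is that the correction happens once, on the lengths themselves, so later steps only need the usual per-step increment and no per-token mask has to be computed.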

Co-authored-by: Benjamin Chislett <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]>

LucasWilkinson added 4 commits December 1, 2025 21:07

Update test_eagle.py for new prepare_inputs_padded return signature

1bf4289

Co-authored-by: Benjamin Chislett <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]>

wip

d924c9c

Signed-off-by: Lucas Wilkinson <[email protected]>

simplify

98930c7

Signed-off-by: Lucas Wilkinson <[email protected]>

mergify bot added speculative-decoding v1 labels Dec 2, 2025

simplify

f545487

Signed-off-by: Lucas Wilkinson <[email protected]>
