Conversation

@joerunde (Collaborator) commented Oct 28, 2025

Description

Upgrades vllm to 0.11.1, adding backwards compatibility code where necessary.

This PR:

  • Updates the default vllm install to 0.11.1
  • Retains the lower bound of 0.10.2
  • Adds a new entry in the backwards compatibility tests to maintain test coverage of 0.11.0
  • Changes the uv.lock settings to install vllm from source instead of from cuda wheels

There was one really fun change here where the type of sampled_token_ids changed, but was then changed back for 0.12.0.
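
For illustration, a version-tolerant accessor for that field might look roughly like the sketch below. This is not the actual PR code; the function name, and the assumption that the two shapes involved are a torch tensor vs. a list of per-request token-id lists, are mine.

# Hypothetical sketch only: normalizes sampled_token_ids across vllm versions.
import torch

def sampled_ids_as_lists(sampled_token_ids) -> list[list[int]]:
    """Return sampled token ids as list[list[int]] regardless of vllm version."""
    if isinstance(sampled_token_ids, torch.Tensor):
        # Some versions hand back a tensor of shape [num_requests, num_tokens]
        return sampled_token_ids.tolist()
    # Other versions already return a list of per-request token-id lists
    return sampled_token_ids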

TODO: There is still a problem with running quantized models. I'm not sure what's going on there, since neither the torch version nor the modeling code changed, but we're getting an error from torch 🤔

@github-actions commented:

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all of the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
@joerunde joerunde added the ready label Oct 28, 2025
@joerunde joerunde changed the title ✨ vllm main support for upcoming 0.111.1 release ✨ vllm main support for upcoming 0.11.1 release Nov 4, 2025
@joerunde joerunde requested a review from ckadner as a code owner December 4, 2025 20:29
@joerunde joerunde removed the ready label Dec 4, 2025
Signed-off-by: Joe Runde <[email protected]>
]

[tool.uv.sources]
vllm = { git = "https://github.com/vllm-project/vllm", rev = "v0.11.1" }
@joerunde (Collaborator, Author):

Installing vllm this way (with VLLM_TARGET_DEVICE=empty) leaves out extra cuda-only dependencies from the uv.lock, since the published vllm wheels on pypi are only built for cuda.

@joerunde joerunde changed the title ✨ vllm main support for upcoming 0.11.1 release ✨ vllm support for 0.11.1 release Dec 5, 2025
@tjohnson31415 (Collaborator) left a comment:

Somehow you make backwards compatibility elegant

Comment on lines 111 to 115
extra_args = {}
if "structured_output_request_ids" in dataclass_fields(SchedulerOutput):
extra_args["structured_output_request_ids"] = {}
if "grammar_bitmask" in dataclass_fields(SchedulerOutput):
extra_args["grammar_bitmask"] = None
Collaborator:

It looks like we could just import and use _get_extra_args() from the spyre_worker to reduce code duplication.

@joerunde (Collaborator, Author):

private imports!!

but yeah, for a test file that's probably fine
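
As a self-contained illustration of the pattern under discussion (not the actual _get_extra_args() from spyre_worker, whose signature isn't shown in this thread), a shared helper could look roughly like this:

# Illustrative sketch only; the real helper lives in spyre_worker and may differ.
from dataclasses import fields as dataclass_fields

def get_extra_args(cls, defaults):
    """Return only the defaults whose field exists on this vllm version's cls."""
    present = {f.name for f in dataclass_fields(cls)}
    return {name: value for name, value in defaults.items() if name in present}

# Usage mirroring the snippet above (SchedulerOutput is imported from vllm):
# extra_args = get_extra_args(SchedulerOutput, {
#     "structured_output_request_ids": {},
#     "grammar_bitmask": None,
# })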

) -> None:
"""Raises if this request is unsupported on this platform"""

# TODO: fix
Collaborator:

Is this a TODO for this PR to fix before merging?

@joerunde (Collaborator, Author):

oh, maybe 🤔

I think I put the TODO in because the lazy import was super ugly, but I do think the import has to stay lazy or we'll hit a circular import :(. The TODO here might just be to remove the TODO and replace it with a comment explaining why it's done this way.
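
For context, the pattern in question looks roughly like this; the module and function names below are placeholders, not the real vllm-spyre code:

# Placeholder names throughout; this only illustrates the lazy-import pattern.
def validate_request(request) -> None:
    """Raises if this request is unsupported on this platform"""
    # Deliberately imported inside the function: importing this module at the
    # top of the file would create a circular import, because that module also
    # imports this platform module when it loads.
    from some_package.platform_checks import check_request_supported  # placeholder
    check_request_supported(request)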

@tjohnson31415 (Collaborator) commented:

TODO: There is still a problem with running quantized models. I'm not sure what's going on there, since neither the torch version nor the modeling code changed, but we're getting an error from torch

This PR bumps fms-model-optimizer to 0.7.0 in uv.lock. I confirmed the quantized model tests fail after upgrading 0.6.0 -> 0.7.0. Installing fms-mo from main resolved the torch error in my dev pod.

@joerunde (Collaborator, Author) commented Dec 6, 2025

Alright @tjohnson31415, looks like we are 🟢 for now. Thanks for the fms-mo hint; I validated that fms-mo 0.7.0 still works on spyre and it's just the cpu execution that's broken. I've bumped here to the latest main commit, which appears to work fine on spyre as well.

Let's talk on Monday. Maybe we should get a new official fms-mo release instead of pinning a commit. I'm also not sure, given our current release cadence, whether we'd want to bump the actual vllm install to 0.11.1, or flip this around and just add a compatibility test for 0.11.1 while keeping the uv.lock at 0.11.0. Either way, we should run the currently-good set of spyre unit tests on this before merging.
