✨ vllm support for 0.11.1 release #546
base: main
Conversation
Signed-off-by: Joe Runde <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre. Now you are good to go 🚀
```toml
[tool.uv.sources]
vllm = { git = "https://github.com/vllm-project/vllm", rev = "v0.11.1" }
```
Installing vllm this way (with `VLLM_TARGET_DEVICE=empty`) keeps the extra cuda-only dependencies out of `uv.lock`, since the vllm wheels published on PyPI are built only for cuda.
tjohnson31415 left a comment
Somehow you make backwards compatibility elegant
```python
extra_args = {}
if "structured_output_request_ids" in dataclass_fields(SchedulerOutput):
    extra_args["structured_output_request_ids"] = {}
if "grammar_bitmask" in dataclass_fields(SchedulerOutput):
    extra_args["grammar_bitmask"] = None
```
It looks like we could just import and use `_get_extra_args()` from the spyre_worker to reduce code duplication.
private imports!!
but yeah, for a test file that's probably fine
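The version-gating pattern in the diff above can be sketched as a shared helper along the lines the review suggests. A minimal sketch, assuming the helper checks `dataclasses.fields` by name; the classes here are stand-ins, not vllm's actual `SchedulerOutput` definitions:

```python
from dataclasses import dataclass, fields


def get_extra_args(cls) -> dict:
    """Build kwargs only for the optional fields the installed vllm version defines."""
    field_names = {f.name for f in fields(cls)}
    extra_args = {}
    if "structured_output_request_ids" in field_names:
        extra_args["structured_output_request_ids"] = {}
    if "grammar_bitmask" in field_names:
        extra_args["grammar_bitmask"] = None
    return extra_args


# Stand-ins for two vllm versions' SchedulerOutput (illustrative only).
@dataclass
class NewSchedulerOutput:
    scheduled_new_reqs: list
    grammar_bitmask: object = None


@dataclass
class OldSchedulerOutput:
    scheduled_new_reqs: list


print(get_extra_args(NewSchedulerOutput))  # {'grammar_bitmask': None}
print(get_extra_args(OldSchedulerOutput))  # {}
```

A single helper like this lets both the worker and the tests construct a `SchedulerOutput` without either knowing which vllm version is installed.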
vllm_spyre/platform.py (outdated)
```python
) -> None:
    """Raises if this request is unsupported on this platform"""

    # TODO: fix
```
Is this a TODO for this PR to fix before merging?
oh- maybe 🤔
I think I put the TODO in because the lazy import was suuuper ugly, but I do think the import has to stay lazy or we'll hit a circular import :(. The TODO here might just be to remove the TODO and replace it with a comment explaining why it has to be this way.
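The lazy-import-to-break-a-cycle pattern discussed here can be sketched as follows; the function and module names are illustrative, not vllm-spyre's actual layout:

```python
def validate_request(max_tokens: int) -> None:
    """Raises if this request is unsupported on this platform.

    The import below is deferred to call time on purpose: a module-level
    import would form a cycle, because the imported module itself imports
    this platform module while it is loading.
    """
    # Lazy import: by the time a request is validated, both modules have
    # finished initializing, so the cycle never bites.
    from math import inf  # placeholder for the real (circular) import

    if not 0 < max_tokens < inf:
        raise ValueError("unsupported request: max_tokens out of range")


validate_request(16)  # accepted
try:
    validate_request(0)
except ValueError as e:
    print(e)  # unsupported request: max_tokens out of range
```

A comment like the docstring above, rather than a bare `# TODO: fix`, records why the import must stay inside the function.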
This PR bumps fms-model-optimizer to 0.7.0.
Alright @tjohnson31415, looks like we are 🟢 for now. Thanks for the fms-mo hint: I validated that fms-mo 0.7.0 still works on spyre, and it's just the cpu execution that's broken. I've bumped here to the latest main commit, which also appears to work fine on spyre. Let's talk on Monday. Maybe we should get a new official fms-mo release instead of pinning a commit. And given our current release cadence, I'm not entirely sure whether we'd want to bump the actual vllm install to 0.11.1, or flip this around: keep the uv.lock at 0.11.0 and just add a compatibility test for 0.11.1. Either way, we should run the currently-passing set of spyre unit tests on this before merging.
Description
This PR upgrades vllm to 0.11.1, adding backwards compatibility code where necessary.

There was one really fun change here where the type of `sampled_token_ids` changed, but was then changed back for 0.12.0.

TODO: There is still a problem with running quantized models. I'm not sure what's going on there, as neither the torch version nor the modeling code changed, but we're getting an error from torch 🤔
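One way to absorb a back-and-forth type change like the `sampled_token_ids` one is a small normalization shim. This is only a sketch under the assumption that the two shapes involved are a plain list of lists versus a 2-D array-like exposing `.tolist()` (e.g. a torch tensor); the exact types in the 0.11.1 change may differ:

```python
def as_token_id_lists(sampled_token_ids):
    """Normalize sampled_token_ids to list[list[int]].

    Hypothetical compatibility shim: accepts either a plain list of
    lists or any 2-D array-like with a .tolist() method, so callers
    never need to know which vllm version produced the value.
    """
    if hasattr(sampled_token_ids, "tolist"):
        return sampled_token_ids.tolist()
    return [list(row) for row in sampled_token_ids]


class FakeTensor:
    """Minimal stand-in for a tensor, for demonstration only."""

    def __init__(self, rows):
        self._rows = rows

    def tolist(self):
        return [list(r) for r in self._rows]


print(as_token_id_lists([[1, 2], [3]]))         # [[1, 2], [3]]
print(as_token_id_lists(FakeTensor([[4, 5]])))  # [[4, 5]]
```

With a shim like this, the rest of the backwards-compatibility code can be written once against the normalized shape.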