
Add vLLM x TorchAO integration workflow #2610


Open · wants to merge 18 commits into main

Conversation

@huydhn (Contributor) commented Jul 26, 2025

This adds a workflow to run vLLM x TorchAO tests with the latest vLLM main. The setup is vLLM main x PyTorch stable x FBGEMM stable x TorchAO PR/main commits.

If there is not enough H100 capacity and you observe queueing, please let me know. We might need to tweak the workflow to run on H100 only when pushing to main.
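
For reference, a minimal sketch of what such a workflow could look like; the runner label, test target, and install commands below are illustrative assumptions, not the actual workflow added in this PR:

```yaml
name: vLLM x TorchAO integration

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  test:
    # Hypothetical H100 runner label
    runs-on: linux.aws.h100
    steps:
      - uses: actions/checkout@v4
      - name: Run vLLM x TorchAO tests
        run: |
          set -eux
          # vLLM main x PyTorch stable x FBGEMM stable x TorchAO at this commit
          pip install fbgemm-gpu-genai   # FBGEMM stable
          pip install .                  # build TorchAO from the checked-out commit
          pytest tests/                  # illustrative test target
```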

Testing

https://github.com/pytorch/ao/actions/runs/16535494171/job/46769051375


pytorch-bot bot commented Jul 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2610

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a6b2aaa with merge base 5fe4ebd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the `CLA Signed` label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jul 26, 2025
@huydhn added the `topic: not user facing` label (use this tag if you don't want this PR to show up in release notes) Jul 26, 2025
huydhn added 11 commits July 25, 2025 17:38
@huydhn huydhn requested a review from jerryzh168 July 26, 2025 02:38
@huydhn huydhn marked this pull request as ready for review July 26, 2025 02:40
@huydhn huydhn changed the title Add vLLM x ao integration workflow Add vLLM x TorchAO integration workflow Jul 26, 2025
@huydhn (Contributor, Author) commented Jul 26, 2025

@jerryzh168 I ran the same logic to install fbgemm-gpu-genai from nightly and build TorchAO, but running pytest currently ends up with this error loading fbgemm: https://github.com/pytorch/ao/actions/runs/16535051236/job/46767869902?pr=2610#step:7:400. Do you know what I did wrong here?

For now, I need to install fbgemm-gpu-genai from stable to avoid rebuilding it. I think we could stay with that, but let me know if that's ok. With this setup we have: vLLM main x PyTorch stable x FBGEMM stable x TorchAO PR/main commits.
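
For illustration, the stable-for-everything install amounts to a step like this (a minimal sketch; the nightly index URL follows the standard PyTorch wheel layout, and availability of the stable wheel on the default index is an assumption):

```yaml
- name: Install FBGEMM GenAI (stable)
  run: |
    set -eux
    # Stable wheel is built against PyTorch stable, so nothing needs rebuilding
    pip install fbgemm-gpu-genai
    # The nightly route (not used here) would pull in a matching PyTorch nightly:
    # pip install --pre fbgemm-gpu-genai --index-url https://download.pytorch.org/whl/nightly/cu128
```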

@huydhn (Contributor, Author) commented Jul 26, 2025

@pytorchbot drci

huydhn added 3 commits July 25, 2025 20:46
@jerryzh168 (Contributor) commented:

Current torchao main doesn't work with fbgemm-gpu-genai stable, I think. Maybe just use stable for everything, and we'll make sure the next stable torchao release works with fbgemm-gpu-genai stable.

@huydhn (Contributor, Author) commented Jul 28, 2025

> Current torchao main doesn't work with fbgemm-gpu-genai stable, I think. Maybe just use stable for everything, and we'll make sure the next stable torchao release works with fbgemm-gpu-genai stable.

Interesting, the tests pass https://github.com/pytorch/ao/actions/runs/16537610493?pr=2610, which means that torchao main is working with vLLM main. Is that what we are looking for here? I guess if we need to test torchao against fbgemm-gpu-genai, it makes more sense to have another workflow that tests only these two?

@jerryzh168 (Contributor) commented:
> Current torchao main doesn't work with fbgemm-gpu-genai stable, I think. Maybe just use stable for everything, and we'll make sure the next stable torchao release works with fbgemm-gpu-genai stable.

> Interesting, the tests pass pytorch/ao/actions/runs/16537610493?pr=2610, which means that torchao main is working with vLLM main. Is that what we are looking for here? I guess if we need to test torchao against fbgemm-gpu-genai, it makes more sense to have another workflow that tests only these two?

OK, it could be because we haven't landed everything we want in main yet, and some of our new changes would require fbgemm main.

We do have tests covering torchao + fbgemm-gpu-genai (nightly) in our CI.

```yaml
on:
  push:
    branches:
      - main
```
A reviewer (Contributor) commented:

nit: nightly might be enough I think
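
If the trigger were narrowed as suggested, a sketch of a nightly schedule (the cron time is an arbitrary choice):

```yaml
on:
  schedule:
    # Daily run; exact time is arbitrary
    - cron: "0 7 * * *"
  # Keep a manual trigger for debugging
  workflow_dispatch:
```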

```bash
set -eux

# vLLM docker image is using CUDA 12.8 and python 3.12
pip install --pre fbgemm-gpu-genai --index-url https://download.pytorch.org/whl/cu128
```
A reviewer (Contributor) commented:

fbgemm-gpu-genai nightly would require pytorch nightly, I think, and with that we'd probably need vLLM to depend on torch nightly. Is that doable with docker, or is the docker image built with stable pytorch?
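
For context, pairing the two nightlies would look roughly like the step below; whether the vLLM docker image tolerates replacing its torch install is exactly the open question raised here:

```yaml
- name: Install matching nightlies
  run: |
    set -eux
    # fbgemm-gpu-genai nightly is built against torch nightly, so install both
    # from the nightly cu128 index to keep the pair consistent
    pip install --pre torch fbgemm-gpu-genai \
      --index-url https://download.pytorch.org/whl/nightly/cu128
```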
