


@wangshulei098 commented Oct 8, 2025

What type of PR is this?

feature

What this PR does / why we need it

When using LWS with gang scheduling, we sometimes need to set annotations on the PodGroup that the Volcano scheduler uses to control specific scheduling behaviors. I think this is a relatively common requirement. Some large-scale model training orchestration frameworks already support this; for example, with the PyTorch Operator, the PodGroup created for a PyTorchJob custom resource inherits certain annotations from the job.

Which issue(s) this PR fixes

Fixes #669
This allows specific annotations to be passed through from the LWS to its PodGroup, so users can flexibly control various scheduler features.
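A minimal Go sketch of the pass-through described above, assuming the LWS annotations to forward are selected by a key prefix. The prefix "volcano.sh/", the helper name filterAnnotationsByPrefix, and the example keys are all hypothetical; the PR text does not say which keys are forwarded or how InheritVolcanoAnnotations selects them.

```go
package main

import (
	"fmt"
	"strings"
)

// assumedVolcanoPrefix is a hypothetical key prefix chosen for illustration;
// the PR does not say which annotation keys are forwarded.
const assumedVolcanoPrefix = "volcano.sh/"

// filterAnnotationsByPrefix (hypothetical helper) returns a new map containing
// only the entries of src whose keys start with prefix. src is not modified,
// and a nil src yields an empty map.
func filterAnnotationsByPrefix(src map[string]string, prefix string) map[string]string {
	out := make(map[string]string)
	for key, value := range src {
		if strings.HasPrefix(key, prefix) {
			out[key] = value
		}
	}
	return out
}

func main() {
	// Annotations as they might appear on a LeaderWorkerSet (example keys only).
	lwsAnnotations := map[string]string{
		"volcano.sh/example-setting": "true",
		"example.com/unrelated":      "not forwarded",
	}

	// Only the prefixed key would be copied onto the PodGroup.
	podGroupAnnotations := filterAnnotationsByPrefix(lwsAnnotations, assumedVolcanoPrefix)
	fmt.Println(podGroupAnnotations)
}
```

Returning a fresh map keeps the LeaderWorkerSet's own metadata untouched when the result is later attached to the PodGroup.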

Special notes for your reviewer

Does this PR introduce a user-facing change?


@k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) Oct 8, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wangshulei098
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot requested review from ahg-g and yankay October 8, 2025 08:52
@k8s-ci-robot
Contributor

Welcome @wangshulei098!

It looks like this is your first PR to kubernetes-sigs/lws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/lws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Oct 8, 2025
@k8s-ci-robot
Contributor

Hi @wangshulei098. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) Oct 8, 2025

netlify bot commented Oct 8, 2025

Deploy Preview for kubernetes-sigs-lws canceled.

🔨 Latest commit: 09b0c00
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-lws/deploys/68ecbe8c4117750008fb1d4d

@Edwinhr716
Contributor

/ok-to-test

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label Oct 9, 2025
@wangshulei098
Author

/retest

@wangshulei098
Author

@ahg-g @yankay Could you help me check why the pull-lws-test-e2e check is failing? I think it might be caused by a version issue in the CI environment. Thanks!

@Edwinhr716
Contributor

Created #673; I will take a look.

@Edwinhr716
Copy link
Contributor

/retest

@wangshulei098 requested a review from ahg-g October 11, 2025 07:16
@yankay
Copy link
Member

yankay commented Oct 13, 2025

Hi @JesseStutler,

What do you think about this change? :-)

return totalResources
}

func InheritVolcanoAnnotations(lws *leaderworkerset.LeaderWorkerSet) map[string]string {
Member

@yankay Oct 13, 2025


Hi @wangshulei098

I feel this part of the code doesn't belong in "utils.go", since that file provides general-purpose utility functions. It might be more appropriate to place it in "volcano_provider.go".

Also, would it be better to add a test case, plus documentation on the following page? https://github.com/kubernetes-sigs/lws/blob/main/docs/examples/sample/gang-scheduling/README.md
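Following the reviewer's suggestion, the helper might sit next to the Volcano provider rather than in the general-purpose "utils.go". The Go sketch below is illustrative only: lwsStub and volcanoProviderStub are hypothetical stand-ins (not the real leaderworkerset types or the actual code in "volcano_provider.go"), prefix-based selection is an assumption, and main runs a few table-driven cases of the kind the reviewer asks for.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// lwsStub is a hypothetical stand-in carrying only the metadata this sketch
// needs; it is not the real leaderworkerset.LeaderWorkerSet type.
type lwsStub struct {
	Annotations map[string]string
}

// volcanoProviderStub is a hypothetical stand-in for the Volcano-specific
// provider; inheritPrefix (an assumption) selects which keys are copied.
type volcanoProviderStub struct {
	inheritPrefix string
}

// inheritAnnotations copies matching LWS annotations into a fresh map meant for
// the PodGroup. It returns nil when nothing matches, so a caller can leave the
// PodGroup's annotations unset.
func (p volcanoProviderStub) inheritAnnotations(lws *lwsStub) map[string]string {
	var out map[string]string
	for key, value := range lws.Annotations {
		if !strings.HasPrefix(key, p.inheritPrefix) {
			continue
		}
		if out == nil {
			out = make(map[string]string)
		}
		out[key] = value
	}
	return out
}

func main() {
	p := volcanoProviderStub{inheritPrefix: "volcano.sh/"}

	// Table-driven cases, the usual Go testing style, run here from main.
	cases := []struct {
		name string
		in   map[string]string
		want map[string]string
	}{
		{name: "no annotations", in: nil, want: nil},
		{name: "no matching keys", in: map[string]string{"example.com/a": "x"}, want: nil},
		{
			name: "matching key copied",
			in:   map[string]string{"volcano.sh/b": "y", "example.com/a": "x"},
			want: map[string]string{"volcano.sh/b": "y"},
		},
	}

	for _, tc := range cases {
		got := p.inheritAnnotations(&lwsStub{Annotations: tc.in})
		fmt.Printf("%s: pass=%v\n", tc.name, reflect.DeepEqual(got, tc.want))
	}
}
```

In the repository, cases like these would more naturally live in a _test.go file using the testing package; main is used here only so the sketch runs on its own.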

Author


Okay, I'll add test cases and usage documentation.

@k8s-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the size/S label Oct 13, 2025
@wangshulei098 requested a review from yankay October 13, 2025 07:19
@JesseStutler
Contributor

HI @JesseStutler

How do you think about the change :-)

I think it's very useful. Users can't configure the PodGroup directly, so inheriting annotations from the LWS is the most direct way.

@wangshulei098 force-pushed the main branch 3 times, most recently from 3da42dd to 3a95885, October 13, 2025 08:46
Co-authored-by: Abdullah Gharaibeh <[email protected]>
@yankay
Member

yankay commented Oct 13, 2025

Thanks @wangshulei098
/lgtm

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Oct 13, 2025
@wangshulei098 marked this pull request as draft October 15, 2025 11:29
@k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Oct 15, 2025
@wangshulei098 marked this pull request as ready for review October 15, 2025 11:30
@k8s-ci-robot removed the do-not-merge/work-in-progress label Oct 15, 2025

Labels

cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
lgtm ("Looks good to me", indicates that a PR is ready to be merged)
ok-to-test (indicates a non-member PR verified by an org member that is safe to test)
size/M (denotes a PR that changes 30-99 lines, ignoring generated files)


Development

Successfully merging this pull request may close these issues.

Enable certain features of the scheduler using annotations

6 participants