use containerd 2.0 in presubmit scalability jobs #35073
We have a single project with this much quota, so any 5k-node jobs must not conflict with each other in scheduling.
cc @kubernetes/sig-scalability / @kubernetes/sig-scalability-leads
FYI @kubernetes/sig-k8s-infra-leads @kubernetes/sig-testing-leads
The other question would be: should we just use containerd 2.x on the scale jobs? SIG Node would probably know this best actually ... @SergeyKanzhelev or @dims most likely.
Given there is no justification for this, we should probably not add this.
@BenTheElder a recent run of the 5k GCE job shows that we are using I'd recommend starting with a clone of
Here we are overriding it for pull-kubernetes-e2e-gce instead of the version in the COS image I think:
There's an open question whether we should do it by phasing over via additional jobs (as in this PR and @dims's suggestion) or not, but the issue is there have been some scaling gaps between the legacy 1.x we are still using in these scale jobs and the 2.x we are using in most of CI now. We should be aligning these to 2.x eventually; the question is the approach. x-ref: test-infra/config/jobs/kubernetes/sig-scalability/sig-scalability-presubmit-jobs.yaml (lines 88 to 89 at 52f5173)
This problem currently extends to the 100 node jobs, which are not blocking but informing. The default blocking job is on 2.x.
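As a hedged sketch of the "additional jobs" option discussed above: a cloned, informing-only variant of an existing presubmit could carry the overrides while the current job stays untouched. Everything below except the two KUBE_COS_INSTALL_* flags (which come from this PR's diff) is an illustrative placeholder, not a real job definition:

```yaml
# Hypothetical cloned presubmit; the job name and all fields other than the
# two --env flags are illustrative placeholders, not actual config.
presubmits:
  kubernetes/kubernetes:
  - name: pull-kubernetes-e2e-gce-100-performance-containerd-2  # hypothetical name
    optional: true     # informing, not blocking, while containerd 2.x is evaluated
    always_run: false
    spec:
      containers:
      - args:
        - --env=KUBE_COS_INSTALL_CONTAINERD_VERSION=v2.0.5
        - --env=KUBE_COS_INSTALL_RUNC_VERSION=v1.2.1
```

The existing job would keep its current containerd until the clone proves out, at which point the override moves over and the clone is deleted.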
... as @AnishShah reminded me when asking questions out of band ... that's because that job is on Ubuntu; we can't install over the system containerd on COS. We usually defer to @SergeyKanzhelev and SIG Node for COS support & bump timing in the GCE jobs.
/test pull-kubernetes-e2e-gce-100-performance
@AnishShah: The specified target(s) for the /test command could not be found.
The following commands are available to trigger optional jobs:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I modified
/assign @SergeyKanzhelev
You have to do that in the repo it is defined for (so kubernetes/kubernetes), and you won't be able to do that before merge.
@@ -43,6 +43,8 @@ presubmits:
   - --cluster=
   - --env=HEAPSTER_MACHINE_TYPE=e2-standard-8
   - --env=KUBEMARK_APISERVER_TEST_ARGS=--max-requests-inflight=80 --max-mutating-requests-inflight=0 --profiling --contention-profiling
+  - --env=KUBE_COS_INSTALL_CONTAINERD_VERSION=v2.0.5
+  - --env=KUBE_COS_INSTALL_RUNC_VERSION=v1.2.1
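For context, a minimal sketch of how an install script might turn these env vars into release download URLs. The variable names come from the diff above; the URL layout follows containerd's and runc's GitHub release naming, and the actual kube-up logic may differ:

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual kube-up code: derive download URLs for
# the pinned containerd/runc versions set via the job's --env flags.
set -euo pipefail

KUBE_COS_INSTALL_CONTAINERD_VERSION="${KUBE_COS_INSTALL_CONTAINERD_VERSION:-v2.0.5}"
KUBE_COS_INSTALL_RUNC_VERSION="${KUBE_COS_INSTALL_RUNC_VERSION:-v1.2.1}"

# containerd release tarball names drop the leading "v" from the tag.
containerd_ver="${KUBE_COS_INSTALL_CONTAINERD_VERSION#v}"
containerd_url="https://github.com/containerd/containerd/releases/download/v${containerd_ver}/containerd-${containerd_ver}-linux-amd64.tar.gz"
# runc publishes a bare binary named by architecture.
runc_url="https://github.com/opencontainers/runc/releases/download/${KUBE_COS_INSTALL_RUNC_VERSION}/runc.amd64"

echo "${containerd_url}"
echo "${runc_url}"
```

Because the env vars default when unset, a job that omits the flags keeps whatever the image ships, which is consistent with the override being opt-in per job.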
I think this is OK since this job is not blocking anyhow, but we should be prepared to roll back.
How are we going to maintain and update it? What's the plan for bumping those versions?
We manually bump the containerd version in other sig-node jobs as well.
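Since these pins will need manual bumps, a hedged sketch of a one-off audit helper that collects every pinned version in one place. The sample input mirrors this PR's diff; in practice you would feed it the real job config YAML files:

```shell
#!/usr/bin/env bash
# Hypothetical maintenance aid: extract pinned containerd/runc versions from
# job config text so version bumps can be reviewed in one place.
set -euo pipefail

# Sample input mirroring this PR's diff; replace with the real config files.
sample_config='- --env=KUBE_COS_INSTALL_CONTAINERD_VERSION=v2.0.5
- --env=KUBE_COS_INSTALL_RUNC_VERSION=v1.2.1'

pins="$(printf '%s\n' "${sample_config}" | grep -oE 'KUBE_COS_INSTALL_[A-Z]+_VERSION=v[0-9.]+' | sort)"
echo "${pins}"
```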
/lgtm
/approve
I think we generally try to run master tests on the COS family closest to the latest release, with the corresponding containerd version. That will best match the EOL of 1.34, and it is easiest for maintainers and users in terms of unification.
I think bumping containerd as a first step is OK.
I think if we are open to rolling back right away if an issue is detected, the effort of temporarily duplicating this expensive job is not worth it. If there are no immediate concerns from @dims, please unhold.
/unhold
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: AnishShah, SergeyKanzhelev, upodroid. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@AnishShah: Updated the
This change is to check pod startup latency with containerd 2.0.
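As an illustration of the metric this PR is probing, pod startup latency is essentially the gap between pod creation and readiness. This is a hedged sketch only; the real scalability jobs measure it with their own tooling, and the timestamps below are made up:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compute startup latency from two RFC3339 timestamps,
# e.g. a pod's .metadata.creationTimestamp and its Ready condition's
# lastTransitionTime. The values here are illustrative only.
set -euo pipefail

created="2024-05-01T10:00:00Z"
ready="2024-05-01T10:00:04Z"

# GNU date: convert each RFC3339 timestamp to epoch seconds and subtract.
latency=$(( $(date -u -d "${ready}" +%s) - $(date -u -d "${created}" +%s) ))
echo "pod startup latency: ${latency}s"
```

A regression in this number across many pods on the 100-node or 5k-node jobs is the kind of signal that would trigger the rollback discussed above.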