@everettraven everettraven commented Jun 12, 2025

This PR adds the tests necessary to promote the ExternalOIDC and ExternalOIDCWithUIDAndExtraClaimMappings feature-gates on OpenShift.

It adds a new test suite for tests that exercise the external OIDC provider authentication mode.
Some tests are intentionally marked as skipped because the functionality they cover does not exist yet. The skeletons are included so that they can be updated easily once that functionality is implemented.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2025
openshift-ci bot commented Jun 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@everettraven everettraven changed the title WIP: Add tests for ExternalOIDC CNTRLPLANE-945: WIP: Add tests for ExternalOIDC Jun 12, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 12, 2025
openshift-ci-robot commented Jun 12, 2025

@everettraven: This pull request references CNTRLPLANE-945 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.


@everettraven
Contributor Author

/test all

2 similar comments
@everettraven
Contributor Author

/test all

@everettraven
Contributor Author

/test all

openshift-trt bot commented Jul 2, 2025

Job Failure Risk Analysis for sha: 5d8b755

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade Medium
Job run should complete before timeout
This test has passed 96.51% of 3980 runs on release 4.20 [Overall] in the last week.

@everettraven
Contributor Author

/test all

openshift-trt bot commented Jul 10, 2025

Job Failure Risk Analysis for sha: 81377a4

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Medium
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
Potential external regression detected for High Risk Test analysis

Open Bugs
etcdMembersDown should not fire on healthy etcd scaling event
pull-ci-openshift-origin-main-e2e-gcp-disruptive Medium
[bz-Etcd] clusteroperator/etcd should not change condition/Available
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/cache-kube-api apiserver/kube-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/oauth-api apiserver/oauth-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/cache-oauth-api apiserver/oauth-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
Showing 4 of 5 test results

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2025
@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Jul 10, 2025
@everettraven everettraven force-pushed the feature/external-oidc-tests branch from 937d61b to f70c43a Compare July 11, 2025 12:07
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 11, 2025
@everettraven everettraven force-pushed the feature/external-oidc-tests branch from efa4dee to effd33f Compare July 15, 2025 16:44
}

func (c *CLI) UserConfig() *rest.Config {
if c.token != "" {
Contributor Author

@everettraven everettraven Jul 15, 2025

Note for reviewers - this change was added because whenever a token was set here, getting the user config would attempt to pull the config from an empty config path.

This resulted in a consistent error so I updated this to handle the case where a token is explicitly set.
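
As an illustration of the guard described in this note, here is a minimal, self-contained sketch. The `config` and `cli` types, the `loadFromPath` helper, and the host value are hypothetical stand-ins; this is not the actual `exutil.CLI` or client-go `rest.Config` code, only the shape of the check.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical stand-in for a client configuration.
type config struct {
	Host        string
	BearerToken string
}

// cli is a hypothetical stand-in for the test CLI wrapper.
type cli struct {
	host       string
	token      string
	configPath string
}

// loadFromPath simulates reading a kubeconfig; an empty path is an error.
func loadFromPath(path string) (*config, error) {
	if path == "" {
		return nil, errors.New("empty config path")
	}
	return &config{Host: "loaded-from:" + path}, nil
}

// userConfig returns a token-based config when a token is explicitly set
// and only falls back to the config path otherwise.
func (c *cli) userConfig() (*config, error) {
	if c.token != "" {
		return &config{Host: c.host, BearerToken: c.token}, nil
	}
	return loadFromPath(c.configPath)
}

func main() {
	withToken := &cli{host: "https://api.example.test:6443", token: "abc"}
	cfg, err := withToken.userConfig()
	fmt.Println(cfg, err)

	// Without a token and without a config path, the fallback fails.
	_, err = (&cli{}).userConfig()
	fmt.Println(err)
}
```

Without the early return, the token case would go through `loadFromPath` with an empty path and fail every time, which matches the consistent error mentioned above.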

Contributor

Do we have a job that exercises this path? This seems to be the only change that might impact the common path, so I want to double-check.

Contributor Author

@everettraven everettraven Jul 22, 2025

It looks like there are a few references in tests that I would expect to run in various jobs: https://github.com/search?q=repo%3Aopenshift/origin%20UserConfig&type=code (planning to search for an explicit job and update here)

If it feels too risky to make this change here, I could try to come up with some other approach.

Contributor Author

@everettraven everettraven Jul 22, 2025

Looks like some oauthserver and etcd tests use the UserConfig() method.

Specifically,

var _ = g.Describe("[sig-api-machinery] API data in etcd", func() {
    defer g.GinkgoRecover()
    cli := exutil.NewCLIWithPodSecurityLevel("etcd-storage-path", psapi.LevelBaseline)
    adminCLI := cli.AsAdmin()
    g.It("should be stored at the correct location and version for all resources [Serial]", func() {
        controlPlaneTopology, err := exutil.GetControlPlaneTopology(adminCLI)
        o.Expect(err).NotTo(o.HaveOccurred())
        if *controlPlaneTopology == configv1.ExternalTopologyMode {
            e2eskipper.Skipf("External clusters run etcd outside of the cluster. Etcd cannot be accessed directly from within the cluster")
        }
        etcdClientCreater := &etcdPortForwardClient{kubeClient: adminCLI.AdminKubeClient()}
        defer etcdClientCreater.closeAll()
        // for the cleaning mechanism (cli.TeardownProject being invoked in g.AfterEach)
        // we need to use the original client, AsAdmin replaces the instances and thus
        // the newly created objects won't get pruned after the test finishes
        etcdUser := cli.CreateUser("test-etcd-storage-path")
        err = adminCLI.Run("adm", "policy", "add-cluster-role-to-user").Args("cluster-admin", etcdUser.Name, "--rolebinding-name", etcdUser.Name).Execute()
        // make sure the clusterrolebinding also gets removed
        cli.AddExplicitResourceToDelete(rbacv1.SchemeGroupVersion.WithResource("clusterrolebindings"), "", etcdUser.Name)
        o.Expect(err).NotTo(o.HaveOccurred())
        adminCLI.ChangeUser(etcdUser.Name)
        testEtcd3StoragePath(g.GinkgoT(2), adminCLI, etcdClientCreater.getEtcdClient)
    })
})
looks like it is run as part of the standard OCP serial conformance suite - https://sippy.dptools.openshift.org/sippy-ng/tests/4.20/analysis?test=%5Bsig-api-machinery%5D%20API%20data%20in%20etcd%20should%20be%20stored%20at%20the%20correct%20location%20and%20version%20for%20all%20resources%20%5BSerial%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D&filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22%5Bsig-api-machinery%5D%20API%20data%20in%20etcd%20should%20be%20stored%20at%20the%20correct%20location%20and%20version%20for%20all%20resources%20%5BSerial%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fserial%5D%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22never-stable%22%7D%2C%7B%22columnField%22%3A%22variants%22%2C%22not%22%3Atrue%2C%22operatorValue%22%3A%22contains%22%2C%22value%22%3A%22aggregated%22%7D%5D%2C%22linkOperator%22%3A%22and%22%7D

Because it calls

adminCLI.ChangeUser(etcdUser.Name)
it should end up eventually triggering this code path because that call ends up using the same underlying call that I use in my test - GetClientConfigForUser():
clientConfig := c.GetClientConfigForUser(name)

openshift-ci-robot commented Jul 15, 2025

@everettraven: This pull request references CNTRLPLANE-945 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.


@everettraven everettraven changed the title CNTRLPLANE-945: WIP: Add tests for ExternalOIDC CNTRLPLANE-945: Add tests for ExternalOIDC and ExternalOIDCWithUIDAndExtraClaimMappings features Jul 15, 2025
@everettraven everettraven marked this pull request as ready for review July 15, 2025 16:59
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 15, 2025
@openshift-ci openshift-ci bot requested review from deads2k and ibihim July 15, 2025 17:02
openshift-trt bot commented Jul 15, 2025

Job Failure Risk Analysis for sha: effd33f

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2 Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-publicnet-2of2 Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-proxy Medium
Job run should complete before timeout
This test has passed 80.54% of 3648 runs on release 4.20 [Overall] in the last week.

Member

@liouk liouk left a comment

We could move the keycloak_*.go files into their own package; it'll make it clearer to reuse functionality in other tests if ever needed.

Looks great in general, just a few comments. I won't block the PR on the nits :)

This test suite runs tests to validate cluster behavior when cluster authentication is configured to use an external OIDC provider.
`),
Qualifiers: []string{
`name.contains("[Suite:openshift/auth/external-oidc") && !name.contains("[Skipped]")`,
Member

Suggested change
- `name.contains("[Suite:openshift/auth/external-oidc") && !name.contains("[Skipped]")`,
+ `name.contains("[Suite:openshift/auth/external-oidc]") && !name.contains("[Skipped]")`,

Contributor Author

I had intentionally left off the end bracket because it seems to be the pattern other test suites follow.

Presumably this is so you can run sub-suites as part of this suite (i.e., if I did something like [Suite:openshift/auth/external-oidc/some-sub-thing], the test with this "tag" would still run as part of the openshift/auth/external-oidc test suite).

If we think this is unnecessary and that we should deviate from existing convention, I can add the closing bracket.
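
The effect of leaving off the closing bracket can be sketched in plain Go. This is only an approximation: the real qualifier is an expression evaluated by the test framework, and `strings.Contains` stands in for `name.contains(...)` here. The test names and the `inSuite` helper are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// inSuite approximates the qualifier: the name must contain the suite tag
// and must not be marked as skipped.
func inSuite(testName, suiteTag string) bool {
	return strings.Contains(testName, suiteTag) && !strings.Contains(testName, "[Skipped]")
}

func main() {
	open := "[Suite:openshift/auth/external-oidc"    // no closing bracket
	closed := "[Suite:openshift/auth/external-oidc]" // with closing bracket

	names := []string{
		"test A [Suite:openshift/auth/external-oidc]",
		"test B [Suite:openshift/auth/external-oidc/some-sub-thing]",
		"test C [Suite:openshift/auth/external-oidc] [Skipped]",
	}
	for _, n := range names {
		fmt.Printf("%q open=%t closed=%t\n", n, inSuite(n, open), inSuite(n, closed))
	}
}
```

With the open form, tests A and B both qualify. With the closed form, only A does, because the sub-suite tag continues with `/` where the closed form expects `]`. Test C is excluded either way because of `[Skipped]`.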

Member

Right, that's a good point -- I just thought this was a typo, but this makes sense 👍


o.Expect(apiServerArgs["authentication-token-webhook-config-file"]).To(o.BeNil(), "authentication-token-webhook-config-file argument should not be specified when OIDC authentication is configured")
o.Expect(apiServerArgs["authentication-token-webhook-version"]).To(o.BeNil(), "authentication-token-webhook-version argument should not be specified when OIDC authentication is configured")
o.Expect(apiServerArgs["authConfig"]).To(o.BeNil(), "authConfig argument should not be specified when OIDC authentication is configured")
Member

authConfig is not under apiServerArguments, it's on the same level.

Contributor Author

Good catch. I think what I meant here is authentication-config - will update

Member

I think authConfig is still a valid oauth related arg to validate missing -- it basically contains the path for the oauth metadata file. It's just in the same level as apiServerArguments instead of part of it.
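
A small sketch of why the level matters for the assertion. The JSON below is an illustrative fragment, not a real cluster config, and the `kasConfig` type and the `oauthMetadataFile` field name are assumptions modeled on the discussion above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kasConfig models only the two fields discussed here (assumed shape).
type kasConfig struct {
	APIServerArguments map[string][]string `json:"apiServerArguments"`
	AuthConfig         *struct {
		OAuthMetadataFile string `json:"oauthMetadataFile"`
	} `json:"authConfig"`
}

// parse decodes a raw config document into kasConfig.
func parse(raw string) (kasConfig, error) {
	var cfg kasConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	raw := `{
	  "apiServerArguments": {"authentication-token-webhook-version": ["v1"]},
	  "authConfig": {"oauthMetadataFile": "/path/to/oauthMetadata"}
	}`
	cfg, err := parse(raw)
	if err != nil {
		panic(err)
	}
	// Looking up "authConfig" among the arguments never finds anything.
	_, nested := cfg.APIServerArguments["authConfig"]
	fmt.Println("found under apiServerArguments:", nested)
	fmt.Println("set at the top level:", cfg.AuthConfig != nil)
}
```

Under this shape, a check like `apiServerArgs["authConfig"]` is nil whether or not OAuth is configured, so a "should be nil" assertion passes vacuously and a "should not be nil" assertion can never pass. Validating the top-level field avoids both problems.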

copiedOC := *oc
tokenOC := copiedOC.WithToken(keycloakCli.AccessToken())

_, err = tokenOC.KubeClient().AuthorizationV1().SelfSubjectAccessReviews().Create(context.TODO(), &authzv1.SelfSubjectAccessReview{
Member

@liouk liouk Jul 16, 2025

nit: Why not kill two birds with one stone with the next SSR test? Proves both token acceptance and cluster identity mapping. We could still split it into two tests that validate these two things.

Also, why not also use an SSR to prove the previous oauth token test as well?

Contributor Author

> nit: Why not kill two birds with one stone with the next SSR test? Proves both token acceptance and cluster identity mapping. We could still split it into two tests that validate these two things.

Fair question. We can certainly perform a shared SSR.

> Also, why not also use an SSR to prove the previous oauth token test as well?

Certainly can. I think I just went with the GET on Pods because it was a quick and easy way to say "we are now unauthorized to get something we were previously authorized to get".

I think an SSR is much more reliable, so I'll switch it to use that.

copiedOC := *oc
tokenOC := copiedOC.WithToken(keycloakCli.AccessToken())

_, err = tokenOC.KubeClient().AuthorizationV1().SelfSubjectAccessReviews().Create(context.TODO(), &authzv1.SelfSubjectAccessReview{
Member

nit: In the spirit of previous comments, maybe an SSR is sufficient?


o.Expect(apiServerArgs["authentication-token-webhook-config-file"]).NotTo(o.BeNil(), "authentication-token-webhook-config-file argument should be specified when OIDC authentication is not configured")
o.Expect(apiServerArgs["authentication-token-webhook-version"]).NotTo(o.BeNil(), "authentication-token-webhook-version argument should be specified when OIDC authentication is not configured")
o.Expect(apiServerArgs["authConfig"]).NotTo(o.BeNil(), "authConfig argument should be specified when OIDC authentication is not configured")
Member

Same as before, authConfig isn't part of apiServerArguments.

@everettraven
Contributor Author

> We could move the keycloak_*.go files into their own package; it'll make it clearer to reuse functionality in other tests if ever needed.

Is it likely that we use this beyond testing authentication? IMO we can extract this whenever we have a use case to re-use it. For now, I imagine we only care to use it for authentication related tests.

@everettraven everettraven force-pushed the feature/external-oidc-tests branch from effd33f to 68dad7b Compare July 16, 2025 13:16
@everettraven
Contributor Author

/hold

botched rebase

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 16, 2025
@everettraven everettraven force-pushed the feature/external-oidc-tests branch from 0b3f84e to e72069c Compare July 22, 2025 17:41
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2025
@everettraven
Contributor Author

Build issues were due to a need to rebase and properly pick up changes from an o/api bump I missed. Should be fixed now.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2025
@everettraven
Contributor Author

/retest

1 similar comment
@everettraven
Contributor Author

/retest

@xueqzhan
Contributor

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2025
openshift-trt bot commented Jul 24, 2025

Job Failure Risk Analysis for sha: e72069c

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (30) are below the historical average (305): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Medium
[sig-architecture] platform pods in ns/openshift-etcd should not exit an excessive amount of times
Potential external regression detected for High Risk Test analysis

Open Bugs
etcd platform pod exist test failing on etcd-scaling jobs
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (107) are below the historical average (339): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling High
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
This test has passed 99.93% of 4157 runs on release 4.20 [Overall] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time

openshift-trt bot commented Jul 24, 2025

Job Failure Risk Analysis for sha: e72069c

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (30) are below the historical average (220): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Medium
[sig-architecture] platform pods in ns/openshift-etcd should not exit an excessive amount of times
Potential external regression detected for High Risk Test analysis

Open Bugs
etcd platform pod exist test failing on etcd-scaling jobs
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (107) are below the historical average (230): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling High
[bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
This test has passed 99.93% of 4157 runs on release 4.20 [Overall] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time

Comment on lines 49 to 51
group := map[string]interface{}{
"name": name,
}
Contributor

Nit:

Alternatives:

Suggested change
- group := map[string]interface{}{
-     "name": name,
- }
+ group := struct {
+     Name string `json:"name"`
+ }{
+     Name: name,
+ }
Suggested change
- group := map[string]interface{}{
-     "name": name,
- }
+ group := map[string]any{
+     "name": name,
+ }

Applies to similar code in this file.
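
For reference, a minimal sketch showing that the map form and the anonymous-struct form serialize to the same request body. The function names are hypothetical and only the `name` field from the snippet is modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalMap builds the payload from an untyped map.
func marshalMap(name string) (string, error) {
	b, err := json.Marshal(map[string]any{"name": name})
	return string(b), err
}

// marshalStruct builds the same payload from an anonymous struct, so a
// misspelled field name becomes a compile error instead of a silent typo.
func marshalStruct(name string) (string, error) {
	group := struct {
		Name string `json:"name"`
	}{Name: name}
	b, err := json.Marshal(group)
	return string(b), err
}

func main() {
	m, _ := marshalMap("admins")
	s, _ := marshalStruct("admins")
	fmt.Println(m)
	fmt.Println(s)
	fmt.Println("identical:", m == s)
}
```

The difference is therefore purely about type safety and readability in the test helper, not about what is sent over the wire.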

Comment on lines 110 to 116
data := url.Values{
"username": []string{username},
"password": []string{password},
"grant_type": []string{"password"},
"client_id": []string{clientID},
"scope": []string{"openid"},
}
Contributor

Nit:

Suggested change
- data := url.Values{
-     "username": []string{username},
-     "password": []string{password},
-     "grant_type": []string{"password"},
-     "client_id": []string{clientID},
-     "scope": []string{"openid"},
- }
+ data := url.Values{}
+ data.Set("username", username)
+ data.Set("password", password)
+ data.Set("grant_type", "password")
+ data.Set("client_id", clientID)
+ data.Set("scope", "openid")
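
A quick sketch confirming that both ways of building the form encode identically. The helper names and sample values are hypothetical; `url.Values.Encode` sorts by key, so construction order does not matter.

```go
package main

import (
	"fmt"
	"net/url"
)

// literalForm builds the form with a composite literal.
func literalForm(username, password, clientID string) url.Values {
	return url.Values{
		"username":   []string{username},
		"password":   []string{password},
		"grant_type": []string{"password"},
		"client_id":  []string{clientID},
		"scope":      []string{"openid"},
	}
}

// setForm builds the same form with Set calls.
func setForm(username, password, clientID string) url.Values {
	data := url.Values{}
	data.Set("username", username)
	data.Set("password", password)
	data.Set("grant_type", "password")
	data.Set("client_id", clientID)
	data.Set("scope", "openid")
	return data
}

func main() {
	a := literalForm("alice", "s3cret", "my-client").Encode()
	b := setForm("alice", "s3cret", "my-client").Encode()
	fmt.Println(a)
	fmt.Println("identical:", a == b)
}
```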

Comment on lines 128 to 136
respBodyData, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("reading response data: %w", err)
}

err = json.Unmarshal(respBodyData, &respBody)
if err != nil {
return fmt.Errorf("unmarshalling response body %s: %w", respBodyData, err)
}
Contributor

@ibihim ibihim Jul 24, 2025

This streams the response without buffering the entire body first:

Suggested change
- respBodyData, err := io.ReadAll(resp.Body)
- if err != nil {
-     return fmt.Errorf("reading response data: %w", err)
- }
- err = json.Unmarshal(respBodyData, &respBody)
- if err != nil {
-     return fmt.Errorf("unmarshalling response body %s: %w", respBodyData, err)
- }
+ if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
+     return fmt.Errorf("unmarshalling response data: %w", err)
+ }

Applies to other occurrences as well. Only thing that I feel strongly about. But I would not block the PR as this is not production code.
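
A self-contained sketch of the decoder-based approach, using a `strings.Reader` in place of `resp.Body`. The `tokenResponse` type, the helper names, and the sample payload are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// tokenResponse models only the fields this sketch reads.
type tokenResponse struct {
	AccessToken string `json:"access_token"`
	IDToken     string `json:"id_token"`
}

// decodeToken decodes directly from the reader instead of calling
// io.ReadAll followed by json.Unmarshal.
func decodeToken(r io.Reader) (tokenResponse, error) {
	var tr tokenResponse
	if err := json.NewDecoder(r).Decode(&tr); err != nil {
		return tr, fmt.Errorf("unmarshalling response data: %w", err)
	}
	return tr, nil
}

// decodeTokenString is a convenience wrapper for string input.
func decodeTokenString(s string) (tokenResponse, error) {
	return decodeToken(strings.NewReader(s))
}

func main() {
	tr, err := decodeTokenString(`{"access_token":"aaa","id_token":"bbb"}`)
	fmt.Println(tr.AccessToken, tr.IDToken, err)

	_, err = decodeTokenString(`not json`)
	fmt.Println(err)
}
```

One trade-off worth noting: the original error message included the raw response body, which the decoder-based version no longer has at hand. For debugging a test helper, that may or may not matter.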

}
defer resp.Body.Close()

respBody := map[string]interface{}{}
Contributor

nit: I am a strong believer in anonymous structs for type-safety 😄

  var tokenResponse struct {
      AccessToken string `json:"access_token"`
      IDToken     string `json:"id_token"`
      TokenType   string `json:"token_type,omitempty"`
      ExpiresIn   int    `json:"expires_in,omitempty"`
      Scope       string `json:"scope,omitempty"`
  }

Comment on lines 75 to 88
user := map[string]interface{}{
"username": username,
"email": fmt.Sprintf("%[email protected]", username),
"enabled": true,
"emailVerified": true,
"groups": groups,
"credentials": []map[string]interface{}{
{
"temporary": false,
"type": "password",
"value": password,
},
},
}
Contributor

Nit:

  user := struct {
      Username      string `json:"username"`
      Email         string `json:"email"`
      Enabled       bool   `json:"enabled"`
      EmailVerified bool   `json:"emailVerified"`
      Groups        []string `json:"groups"`
      Credentials   []struct {
          Temporary bool   `json:"temporary"`
          Type      string `json:"type"`
          Value     string `json:"value"`
      } `json:"credentials"`
  }{
      Username:      username,
      Email:         fmt.Sprintf("%[email protected]", username),
      Enabled:       true,
      EmailVerified: true,
      Groups:        groups,
      Credentials: []struct {
          Temporary bool   `json:"temporary"`
          Type      string `json:"type"`
          Value     string `json:"value"`
      }{
          {
              Temporary: false,
              Type:      "password",
              Value:     password,
          },
      },
  }

tokenURL := *kc.adminURL
tokenURL.Path = fmt.Sprintf("/realms/%s/protocol/openid-connect/token", kc.realm)

resp, err := kc.DoRequest(http.MethodPost, tokenURL.String(), "application/x-www-form-urlencoded", false, bytes.NewBuffer([]byte(data.Encode())))
Contributor

Nit:

Maybe something like this, would be cool:

    kc.client.PostForm(tokenURL.String(), data)
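
For context, a runnable sketch of what `PostForm` does, assuming `kc.client` is a standard `*http.Client` (the snippet above does not confirm this). A local `httptest` server stands in for the Keycloak token endpoint and echoes back what it received. `postFormDemo` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// postFormDemo posts a form to a throwaway local server and returns
// "<content type>|<grant_type>" as seen by the server.
func postFormDemo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := r.ParseForm(); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "%s|%s", r.Header.Get("Content-Type"), r.PostForm.Get("grant_type"))
	}))
	defer srv.Close()

	data := url.Values{}
	data.Set("grant_type", "password")

	// PostForm URL-encodes the values and sets the Content-Type header.
	resp, err := srv.Client().PostForm(srv.URL, data)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := postFormDemo()
	fmt.Println(out, err)
}
```

`PostForm` offers no hook for extra request headers, so any request that needs them would still have to go through a helper such as `DoRequest`.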

Contributor Author

We only post form data for this one type of request. If we find ourselves repeating this logic quite frequently I could see value in a common method.

}
cleanups = append(cleanups, cleanup)

return cleanups, waitForKeycloakAvailable(ctx, client, namespace)
Contributor

hehe, let me talk you through the wonderful world of functional programming, the next time we drink a coffee together... 😄

Nit: a builder pattern would be cool (like we use it frequently in library-go) and a dedicated package (looks pretty interesting to import for kube-rbac-proxy and OIDC testing 😛)

Contributor Author

A builder pattern feels a bit overkill for now. If we have use cases for a common library here, let's take that as a follow-up action to create it and use it across all the places necessary?

keycloakCli, err = keycloakClientFor(kcURL)
o.Expect(err).NotTo(o.HaveOccurred(), "should not encounter an error creating a keycloak client")

// First authenticate as the admin keyloak user so we can add new groups and users
Contributor

Nit:

Suggested change
- // First authenticate as the admin keyloak user so we can add new groups and users
+ // First authenticate as the admin keycloak user so we can add new groups and users

}
}

return errors.Join(errs...)
Contributor

Nit:

errors.NewAggregate (from k8s.io/apimachinery/pkg/util/errors) is popular and widely used across our codebases. The same package also has some nice filtering capabilities:

errors.FilterOut(errors.NewAggregate(errs), apierrors.IsNotFound)
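For comparison, a standard-library-only sketch of the same idea: join the cleanup errors with errors.Join, but first drop the ones a predicate says are ignorable. `filterJoin`, `isNotExist`, and `sampleErrs` are hypothetical helpers written for this illustration, and `fs.ErrNotExist` merely stands in for a "not found" API error.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// filterJoin drops nil errors and any error matched by one of the ignore
// predicates, then joins whatever is left. It returns nil if nothing remains.
func filterJoin(errs []error, ignore ...func(error) bool) error {
	var kept []error
	for _, err := range errs {
		if err == nil {
			continue
		}
		skip := false
		for _, match := range ignore {
			if match(err) {
				skip = true
				break
			}
		}
		if !skip {
			kept = append(kept, err)
		}
	}
	// errors.Join returns nil when it is given no non-nil errors.
	return errors.Join(kept...)
}

// isNotExist plays the role that apierrors.IsNotFound plays in the comment above.
func isNotExist(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

// sampleErrs returns made-up cleanup errors for the demonstration.
func sampleErrs() []error {
	return []error{
		nil,
		fmt.Errorf("deleting group: %w", fs.ErrNotExist),
		errors.New("deleting user: connection refused"),
	}
}

func main() {
	errs := sampleErrs()
	fmt.Println(filterJoin(errs, isNotExist))            // only the connection error remains
	fmt.Println(filterJoin(errs[:2], isNotExist) == nil) // everything was filtered out
}
```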

func waitForRollout(ctx context.Context, client *exutil.CLI) {
kasCli := client.AdminOperatorClient().OperatorV1().KubeAPIServers()

// First wait for KAS to flip to progessing
Contributor

Nit:

Suggested change
- // First wait for KAS to flip to progessing
+ // First wait for KAS to flip to progressing
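The two-phase wait that the comment describes (first see the operator go progressing, then see it settle) can be sketched with the standard library alone. This is not the PR's implementation: `pollUntil`, `waitForRolloutSketch`, `fakeProgressing`, and `runSketch` are hypothetical names, and the status source is a fake function rather than the KubeAPIServer operator client.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollUntil calls check immediately and then once per interval until it
// reports true, returns an error, or ctx is done.
func pollUntil(ctx context.Context, interval time.Duration, check func() (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// waitForRolloutSketch waits for progressing to report true, then for it to
// report false again. Skipping the first phase could return before the
// rollout has even started.
func waitForRolloutSketch(ctx context.Context, interval time.Duration, progressing func() (bool, error)) error {
	if err := pollUntil(ctx, interval, progressing); err != nil {
		return fmt.Errorf("waiting for rollout to start: %w", err)
	}
	notProgressing := func() (bool, error) {
		p, err := progressing()
		return !p, err
	}
	if err := pollUntil(ctx, interval, notProgressing); err != nil {
		return fmt.Errorf("waiting for rollout to finish: %w", err)
	}
	return nil
}

// fakeProgressing replays the given states in order and then keeps
// returning the last one.
func fakeProgressing(states ...bool) func() (bool, error) {
	i := 0
	return func() (bool, error) {
		s := states[len(states)-1]
		if i < len(states) {
			s = states[i]
		}
		i++
		return s, nil
	}
}

// runSketch runs the two-phase wait against a fake status source with a short timeout.
func runSketch(states ...bool) error {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	return waitForRolloutSketch(ctx, time.Millisecond, fakeProgressing(states...))
}

func main() {
	fmt.Println(runSketch(false, true, true, false)) // rollout starts, then finishes
	fmt.Println(runSketch(false))                    // rollout never starts, so the wait times out
}
```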

@openshift-trt
openshift-trt bot commented Jul 24, 2025

Job Failure Risk Analysis for sha: bca8317

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (106) are below the historical average (202): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
---
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
[CI] e2e-openstack-ovn-etcd-scaling job permanent fails at many openshift-test tests
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (107) are below the historical average (213): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Medium
[sig-network] pods should successfully create sandboxes by adding pod to network
This test has passed 97.91% of 4311 runs on release 4.20 [Overall] in the last week.

Open Bugs
Component Readiness: pods should successfully create sandboxes by adding pod to network: expected pod UID "aa853924-c6c6-45b7-be56-e059960bc3c6" but got "ab26e0dc-d736-4945-aa02-91fa3f066cdc" from Kube API
"[sig-network] pods should successfully create sandboxes by adding pod to network" fails often on compact CI jobs

@everettraven
Contributor Author

/retest

@openshift-trt
openshift-trt bot commented Jul 25, 2025

Job Failure Risk Analysis for sha: bca8317

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (106) are below the historical average (145): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (106) are below the historical average (169): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
[CI] e2e-openstack-ovn-etcd-scaling job permanent fails at many openshift-test tests
---
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (104) are below the historical average (2856): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout IncompleteTests
Tests for this run (104) are below the historical average (1425): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-virtualmedia IncompleteTests
Tests for this run (104) are below the historical average (2613): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-trt
openshift-trt bot commented Jul 25, 2025

Job Failure Risk Analysis for sha: bca8317

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive IncompleteTests
Tests for this run (106) are below the historical average (143): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (106) are below the historical average (169): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
[CI] e2e-openstack-ovn-etcd-scaling job permanent fails at many openshift-test tests
---
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (104) are below the historical average (2864): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout IncompleteTests
Tests for this run (104) are below the historical average (1432): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-virtualmedia IncompleteTests
Tests for this run (104) are below the historical average (2624): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@ibihim
Contributor

ibihim commented Jul 28, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 28, 2025
@openshift-ci
Contributor

openshift-ci bot commented Jul 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: everettraven, ibihim, liouk, xueqzhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@everettraven
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Jul 28, 2025

@everettraven: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-e2e-gcp | 0b3f84e | link | false | /test okd-e2e-gcp |
| ci/prow/e2e-aws-ovn-serial-publicnet-1of2 | 0b3f84e | link | false | /test e2e-aws-ovn-serial-publicnet-1of2 |
| ci/prow/e2e-gcp-disruptive | bca8317 | link | false | /test e2e-gcp-disruptive |
| ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway | bca8317 | link | false | /test e2e-metal-ipi-ovn-dualstack-local-gateway |
| ci/prow/e2e-metal-ipi-ovn | bca8317 | link | false | /test e2e-metal-ipi-ovn |
| ci/prow/e2e-openstack-ovn | bca8317 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-gcp-fips-serial-2of2 | bca8317 | link | false | /test e2e-gcp-fips-serial-2of2 |
| ci/prow/e2e-aws-ovn-single-node-upgrade | bca8317 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-aws-ovn-etcd-scaling | bca8317 | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | bca8317 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-gcp-ovn-etcd-scaling | bca8317 | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/e2e-metal-ipi-ovn-dualstack | bca8317 | link | false | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/e2e-gcp-ovn-techpreview-serial-2of2 | bca8317 | link | false | /test e2e-gcp-ovn-techpreview-serial-2of2 |
| ci/prow/e2e-aws-disruptive | bca8317 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-azure-ovn-upgrade | bca8317 | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-azure-ovn-etcd-scaling | bca8317 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-openstack-serial | bca8317 | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-metal-ipi-virtualmedia | bca8317 | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | bca8317 | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout |
| ci/prow/e2e-gcp-fips-serial-1of2 | bca8317 | link | false | /test e2e-gcp-fips-serial-1of2 |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | bca8317 | link | false | /test e2e-vsphere-ovn-etcd-scaling |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-trt
openshift-trt bot commented Jul 28, 2025

Job Failure Risk Analysis for sha: bca8317

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive Low
Job run should complete before timeout
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:hidden Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
---
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-arch] events should not repeat pathologically for ns/openshift-multus
Potential external regression detected for High Risk Test analysis
---
[sig-arch] events should not repeat pathologically for ns/openshift-network-diagnostics
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout IncompleteTests
Tests for this run (104) are below the historical average (1409): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-virtualmedia IncompleteTests
Tests for this run (104) are below the historical average (2611): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 48d56f2 and 2 for PR HEAD bca8317 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD debace2 and 2 for PR HEAD bca8317 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 5ef42f5 into openshift:main Jul 29, 2025
38 of 57 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.20.0-202507290444.p0.g5ef42f5.assembly.stream.el9.
All builds following this will include this PR.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. vendor-update Touching vendor dir or related files

7 participants