

mtulio
Contributor

mtulio commented Jul 15, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes a leaked security group (SG) when a Service of type LoadBalancer (CLB) is updated to add the BYO SG annotation (service.beta.kubernetes.io/aws-load-balancer-security-groups). The annotation replaces all SGs attached to the load balancer, but the SG created by the controller was left behind: its rules were not removed and the SG itself was not deleted.
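At its core, the cleanup decision is a set difference: the SGs currently attached to the CLB that the controller created, minus the SGs the user lists in the annotation (a comma-separated list of SG IDs). The following is a minimal Go sketch, assuming the caller already knows which attached SG IDs are controller-owned. The helper names `parseBYOSecurityGroups` and `securityGroupsToCleanup` are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBYOSecurityGroups (hypothetical helper) splits the comma-separated
// annotation value into a set of SG IDs, trimming whitespace and skipping empty entries.
func parseBYOSecurityGroups(annotation string) map[string]bool {
	byo := make(map[string]bool)
	for _, id := range strings.Split(annotation, ",") {
		id = strings.TrimSpace(id)
		if id != "" {
			byo[id] = true
		}
	}
	return byo
}

// securityGroupsToCleanup (hypothetical helper) returns the SGs attached to the
// load balancer that the controller created and that the user did not list in the
// annotation. These are the SGs that would leak if they were only detached.
// Assumption: controllerOwned is computed elsewhere (e.g. from ownership tags).
func securityGroupsToCleanup(attached []string, annotation string, controllerOwned map[string]bool) []string {
	byo := parseBYOSecurityGroups(annotation)
	var cleanup []string
	for _, sg := range attached {
		if controllerOwned[sg] && !byo[sg] {
			cleanup = append(cleanup, sg)
		}
	}
	return cleanup
}

func main() {
	attached := []string{"sg-managed", "sg-shared"}
	owned := map[string]bool{"sg-managed": true}
	// Only the controller-owned SG that is absent from the BYO list is selected.
	fmt.Println(securityGroupsToCleanup(attached, "sg-user1, sg-user2", owned))
}
```

In a real controller, deletion would presumably run only after the load balancer has been switched to the BYO groups and any ingress rules that reference the managed SG (for example, on node security groups) have been revoked. EC2 refuses to delete a security group that is still in use or that another group in the same VPC still references.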

Which issue(s) this PR fixes:

Fixes #1208

Special notes for your reviewer:

The logic was extracted into isolated functions specifically to:

  • improve code maintainability
  • make the logic easier to unit test
  • allow reusing the logic once NLB with SG is supported
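Isolated, side-effect-free helpers are what make table-driven unit tests cheap to write. The sketch below is an illustration only. It tests a hypothetical ownership predicate, `isControllerOwnedSG`, and assumes that controller-created SGs carry the tag `kubernetes.io/cluster/<cluster-id>` with value `owned`. The actual check in the PR may differ.

```go
package main

import "fmt"

// isControllerOwnedSG (hypothetical helper) reports whether an SG's tags mark it
// as owned by the given cluster.
// Assumption: ownership is recorded with the kubernetes.io/cluster/<cluster-id>=owned tag.
func isControllerOwnedSG(tags map[string]string, clusterID string) bool {
	return tags["kubernetes.io/cluster/"+clusterID] == "owned"
}

func main() {
	// Table-driven cases: each row is one scenario the predicate must handle.
	cases := []struct {
		name string
		tags map[string]string
		want bool
	}{
		{"owned by this cluster", map[string]string{"kubernetes.io/cluster/demo": "owned"}, true},
		{"shared with this cluster", map[string]string{"kubernetes.io/cluster/demo": "shared"}, false},
		{"owned by another cluster", map[string]string{"kubernetes.io/cluster/other": "owned"}, false},
		{"untagged user SG", nil, false},
	}
	for _, c := range cases {
		got := isControllerOwnedSG(c.tags, "demo")
		fmt.Printf("%-26s got=%v want=%v\n", c.name, got, c.want)
	}
}
```

In the repository, a table like this would normally live in a `_test.go` file and use `testing.T` with `t.Run` per row. A `main` function is used here only to keep the sketch runnable on its own.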

The unit tests and function documentation comments were written with assistance from Cursor AI (model claude-4-sonnet): AIA HAb SeCeNc Hin R v1.0

Does this PR introduce a user-facing change?:

Fixed security group leak when updating Classic Load Balancer services with `service.beta.kubernetes.io/aws-load-balancer-security-groups` annotation. Controller-managed security groups are now properly cleaned up when switching to user-specified security groups.

@k8s-ci-robot added the release-note, do-not-merge/work-in-progress, kind/bug, and cncf-cla: yes labels Jul 15, 2025
@k8s-ci-robot requested review from hakman and kishorj July 15, 2025 17:23
@k8s-ci-robot added the needs-triage label Jul 15, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Hi @mtulio. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-ok-to-test and size/XXL labels Jul 15, 2025
@mtulio changed the title from "Fix 1208 byosg update" to "fix leak managed/owned security group on Service update with BYO SG"
@mtulio force-pushed the fix-1208-byosg-update branch from 03f9775 to 83c92f2 July 15, 2025 20:41
@elmiko
Contributor

elmiko commented Jul 15, 2025

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Jul 15, 2025
@mtulio force-pushed the fix-1208-byosg-update branch from 83c92f2 to 23ba0b3 July 15, 2025 21:11
@mtulio
Contributor Author

mtulio commented Jul 15, 2025

/test all

@mtulio force-pushed the fix-1208-byosg-update branch from 23ba0b3 to 0fec46d July 15, 2025 21:55
@mtulio
Contributor Author

mtulio commented Jul 15, 2025

Fixing docstrings and the unit tests that failed because of the previous unexpected behavior:

/test all

@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the do-not-merge/release-note-label-needed label and removed the release-note label Jul 16, 2025
@mtulio force-pushed the fix-1208-byosg-update branch from 0fec46d to 1907542 July 16, 2025 03:34
@mtulio
Contributor Author

mtulio commented Jul 16, 2025

/test pull-cloud-provider-aws-e2e-kubetest2

@mtulio
Contributor Author

mtulio commented Jul 16, 2025

/test all

@mtulio
Contributor Author

mtulio commented Jul 16, 2025

I can't find a connection between the failures in pull-cloud-provider-aws-e2e-kubetest2 and the changes in this PR.

I am converting this to a regular PR to request reviews while we check whether this is a CI flake.

PTAL?
/assign @kmala @elmiko @JoelSpeed

@mtulio
Contributor Author

mtulio commented Aug 1, 2025

/test pull-cloud-provider-aws-e2e

Introduce unit tests for functions added to validate Service update
to BYO Security Group annotations from a managed SG state.
@mtulio force-pushed the fix-1208-byosg-update branch from 1adf385 to 6af2646 August 1, 2025 20:53
@mtulio
Contributor Author

mtulio commented Aug 1, 2025

/test pull-cloud-provider-aws-e2e

@mtulio
Contributor Author

mtulio commented Aug 1, 2025

Unrelated failure in the lb-internal tests, which are expected to fail (hairpinning). Checking whether it was a flake:

/test pull-cloud-provider-aws-e2e

@mtulio marked this pull request as ready for review August 2, 2025 22:41
@k8s-ci-robot removed the do-not-merge/work-in-progress label Aug 2, 2025
@k8s-ci-robot requested a review from olemarkus August 2, 2025 22:41
@mtulio
Contributor Author

mtulio commented Aug 2, 2025

Review comments addressed, a new e2e test added, and e2e passing on CI. This PR is ready for review. PTAL? Thanks

@mtulio
Contributor Author

mtulio commented Aug 4, 2025

The e2e test "[cloud-provider-aws-e2e] loadbalancer CLB with managed Security Group must update to BYO Security Group" was added. Here are the logs of the step that performs the Service update and checks: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/cloud-provider-aws/1209/pull-cloud-provider-aws-e2e/1952377003214639104#1:build-log.txt%3A2140-2167

Investigating whether the failure in "AWS Cloud Provider End-to-End Tests: [It] [cloud-provider-aws-e2e] loadbalancer CLB internal should be reachable with hairpinning traffic" is related to the e2e updates.

Converting to draft while I increase the debug output of the CLB internal test. It looks like it's failing to retrieve pod information; checking whether this is related to the service account. Once I have more information and have isolated the issue, I will mark it ready again.

@mtulio marked this pull request as draft August 4, 2025 20:27
@k8s-ci-robot added the do-not-merge/work-in-progress label Aug 4, 2025
@mtulio force-pushed the fix-1208-byosg-update branch from 6af2646 to edd4a11 August 4, 2025 20:32
@mtulio
Contributor Author

mtulio commented Aug 4, 2025

/test pull-cloud-provider-aws-e2e

@mtulio
Contributor Author

mtulio commented Aug 5, 2025

FAIL: expected managed security group "sg-0a917e4b14501635f" removed by controller, got "sg-0a917e4b14501635f"

Checking whether I need to enhance the controller to update the SG:

/test pull-cloud-provider-aws-e2e

Introduce a BYO Security Group (SG) update scenario for Service CLB to validate
the SG leak when a user creates a Service CLB with the default SG and
later updates it to a user-provided SG.

kubernetes#1208
@mtulio force-pushed the fix-1208-byosg-update branch from edd4a11 to f0b38b6 August 5, 2025 19:44
@mtulio
Contributor Author

mtulio commented Aug 5, 2025

Increasing verbosity:

/test pull-cloud-provider-aws-e2e

@mtulio marked this pull request as ready for review August 6, 2025 01:47
@k8s-ci-robot removed the do-not-merge/work-in-progress label Aug 6, 2025
@k8s-ci-robot requested review from dims and nckturner August 6, 2025 01:47
@mtulio
Contributor Author

mtulio commented Aug 6, 2025

The e2e job is green.

I am also keeping the e2e output more verbose for test network failures, which makes it easier for devs to troubleshoot internal connectivity / internal LB issues from the CI logs. LMK if that makes sense.

Converting to a regular PR. PTAL? Thanks

@k8s-ci-robot
Contributor

@mtulio: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-cloud-provider-aws-e2e | f0b38b6 | link | true | /test pull-cloud-provider-aws-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot added the needs-rebase label Aug 21, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mtulio marked this pull request as draft August 21, 2025 14:32
@k8s-ci-robot added the do-not-merge/work-in-progress label Aug 21, 2025
@mtulio
Contributor Author

mtulio commented Aug 21, 2025

Converting to draft until I come back to rebase it and investigate the e2e failures more deeply.

@mtulio
Contributor Author

mtulio commented Oct 2, 2025

FWIW, interim update: this PR is still alive and the bug still needs to be fixed. The proposed approach could also be reused in the BYO SG logic for NLBs. I am planning to return to it next week to rebase, incorporate the recent Service NLB and e2e updates, and ask for a final review.

Labels
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/release-note-label-needed: Indicates that a PR should not merge because it's missing one of the release note labels.
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Managed security group leak after annotation service.beta.kubernetes.io/aws-load-balancer-security-groups added to existing service
5 participants