Skip to content

Conversation

mtulio
Copy link
Contributor

@mtulio mtulio commented Jun 11, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR introduces managed security group support for Service type-LoadBalancer NLB through a global cloud-config option. Key changes:

  • Managed Security Groups: Introduces NLBSecurityGroupMode: Managed cloud-config to automatically provision Security Groups for new NLBs
  • Automatic Lifecycle Management: Controller creates, manages, and deletes security groups with proper AWS tagging
  • Backward Compatibility: Existing Service NLBs without security groups remain unchanged (no retrofitting)
  • BYO Security Group Limitation: Explicitly blocks service.beta.kubernetes.io/aws-load-balancer-security-groups and service.beta.kubernetes.io/aws-load-balancer-extra-security-groups annotations for NLB

Rationale for BYO SG Limitation:

BYO Security Group support for NLB is intentionally blocked in this release due to:

  1. AWS API complexity with NLB security group management mixing with the managed SG feature
  2. Security group leak bug (see #1208 / #1209) logic will be reused in NLB BYO SG feature

Which issue(s) this PR fixes:
Refs #1151

Special notes for your reviewer:

Future Work:
Managed SG is important to empower users to bypass managed SG, and enhance security control boundaries with user-managed security groups on NLBs. BYO Security Group support for NLB will be considered in a future release after:

  • The managed SG be merged with NLB SG management logic
  • Resolution of security group leak bug (#1209)
  • Comprehensive field testing of managed SG functionality
  • Community feedback on managed SG approach

The changes introducing the global, opt-in, cloud-config for enabling Security Group (SG) on NLB creation (similar CLB), is for users/administrators who intentionally wants to enforce SG across all new services - following AWS recommendations, and ALBC defaults. This won't change the default CCM behavior if the configuration isn't added.

Done checklist:

  • introduce cloud-config to signalize the controller to always provision Security Group on Service type-LoadBalancer NLB
  • Create Security Group when NLB is created, and global config is added
  • Delete Security Group when NLB is removed
  • Update Service port and ensure SG rules are updated
  • Review SG rules logic
  • Elaborate an e2e scenario that allows to update the cloud-config to exercise NLB+SG feature
  • Resolve pending question related to the approach
  • Create User Documentation for the feature

Related changes isolated from this PR:

https://github.com/kubernetes/cloud-provider-aws/pull/1207
https://github.com/kubernetes/cloud-provider-aws/issues/1206
https://github.com/kubernetes/cloud-provider-aws/pull/1163
https://github.com/kubernetes/cloud-provider-aws/pull/1205
https://github.com/kubernetes/cloud-provider-aws/pull/1159
https://github.com/kubernetes/cloud-provider-aws/pull/1153
https://github.com/kubernetes/cloud-provider-aws/pull/1215

Does this PR introduce a user-facing change?:

Add opt-in NLB Security Group provisioning support. Introduces support for provisioning Security Groups to Service type-LoadBalancer NLB by default through global cloud-config (`NLBSecurityGroupMode: Managed`). The managed security group approach is opt-in and includes automatic life cycle management of Security Groups and rules.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 11, 2025
@k8s-ci-robot k8s-ci-robot requested review from hakman and nckturner June 11, 2025 21:42
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 11, 2025
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 11, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @mtulio. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mtulio mtulio changed the title Splat 2253 nlb sg config Introduce Security Group provisioning for NLB Jun 11, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 11, 2025
@mtulio mtulio changed the title Introduce Security Group provisioning for NLB Introduce opt-in Security Group provisioning for NLB Jun 11, 2025
Copy link
Contributor Author

@mtulio mtulio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost done. A few pending items in TODO, and raising questions in key points from a broad group.

@mtulio mtulio changed the title Introduce opt-in Security Group provisioning for NLB Introduce opt-in NLB provisioning with Security Groups Jun 11, 2025
@mtulio
Copy link
Contributor Author

mtulio commented Jun 11, 2025

Hey @kmala, I still have some items to review in this PR, but tests are passing locally. Would you mind stamping ok-to-test to validate if there is no local addiction from my env? Thanks

@kmala
Copy link
Member

kmala commented Jun 11, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 11, 2025
@mtulio mtulio force-pushed the SPLAT-2253-nlb-sg-config branch 2 times, most recently from e1e47cf to 1e8e527 Compare June 13, 2025 02:13
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 13, 2025
@mtulio
Copy link
Contributor Author

mtulio commented Jun 13, 2025

/test all

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2025
@mtulio mtulio force-pushed the SPLAT-2253-nlb-sg-config branch from 1e8e527 to 9a0b0cc Compare June 18, 2025 20:10
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2025
@mtulio mtulio changed the title Introduce opt-in NLB provisioning with Security Groups Introduce config opt-in NLB provisioning with Security Groups Jun 18, 2025
@mtulio
Copy link
Contributor Author

mtulio commented Sep 4, 2025

@mtulio demonstrated this change for me today, in general things work as expected. we did notice some odd behavior when removing the annotation, a new lb was created, but we think the bug is not related to this code change.

/lgtm

Thanks for peer review, Mike. I just recorded the steps in the issue (already existed in controller) #1254 . cc @kmala

@mtulio
Copy link
Contributor Author

mtulio commented Sep 4, 2025

Hey @kmala , would you mind taking a look at this PR, please? Thanks!

@JoelSpeed
Copy link
Contributor

Also gone through this today

/lgtm

Thanks @mtulio

Ensure the Security Group IDs is added on NLB load balancer creation.

Additionally, this is fixing the BYO SG update scenario by detecting the
replaced SG on CLB and delete it when it is owned by controller. The
same behavior will be implemented in the BYO SG scenario for NLB too.
Introduce hasClusterTagOwned() to validate if a resource has the
kubernetes cluster tag (`kubernetes.io/cluster/clusterID`) with
value `owned`, so it can quickly used when ensuring states to
cloud resources managed by controller, such as SG deletions, etc.
Introduce the documentation to use the feature Service type-LoadBalancer
with Security Group by opt-in through the cloud-config.
Introduce the NLB Security Group Mode configuration
(NLBSecurityGroupMode) to make the controller creates the Security Group
by default when provisioning Service type-LoadBalancer NLB.

This configuration is opt-in and global to the cluster.
@mtulio mtulio force-pushed the SPLAT-2253-nlb-sg-config branch from c6017ec to fba07ed Compare September 8, 2025 18:20
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2025
@mtulio mtulio force-pushed the SPLAT-2253-nlb-sg-config branch from fba07ed to e55c57f Compare September 8, 2025 18:23
@kmala
Copy link
Member

kmala commented Sep 8, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2025
Ensure the Security Group is managed when creating
a service type-LoadBalancer NLB object, considering the
global configuration to manage SGs in NLBs:
NLBSecurityGroupMode=NLBSecurityGroupModeManaged
Ensure annotation matches feature NLB with Security Groups by preventing
standard controller BYO SG annotations due existing controller
limitations.
Ensure unit tests on EnsureLoadBalancer, including case to test NLB with
security group by changing the cloud-config.
@mtulio mtulio force-pushed the SPLAT-2253-nlb-sg-config branch from e55c57f to 53a8527 Compare September 8, 2025 19:28
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2025
@mtulio
Copy link
Contributor Author

mtulio commented Sep 8, 2025

@mtulio: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-aws-check e55c57f link true /test pull-cloud-provider-aws-check
pull-cloud-provider-aws-test e55c57f link true /test pull-cloud-provider-aws-test
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

failures caused by missing unit test from recent changes which moved BYO SG annotations to dedicated validation functions, as well unit test failure caused by sdk v2 bump. Both are fixed in my last commit, additionally I moved BYO SG unit tests to the validation functions.

Awaiting new CI signals.

@mtulio
Copy link
Contributor Author

mtulio commented Sep 8, 2025

CI is green! Hey @elmiko @kmala are you ok to go here?

@kmala
Copy link
Member

kmala commented Sep 9, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 9, 2025
@JoelSpeed
Copy link
Contributor

/approve

Based on myself, @elmiko and @kmala having reviewed

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 9, 2025
@k8s-ci-robot k8s-ci-robot merged commit 5e18243 into kubernetes:master Sep 9, 2025
11 checks passed
@mtulio
Copy link
Contributor Author

mtulio commented Sep 9, 2025

Thanks @elmiko @kmala @JoelSpeed for your support on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants