@bryan-cox
Contributor

@bryan-cox bryan-cox commented Oct 24, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR implements support for configuring availability zones on Azure load balancers to enable zone-redundant configurations for high availability.

Azure load balancers can be configured as zone-redundant to ensure high availability across multiple availability zones within a region. This feature allows users to specify availability zones (1, 2, 3) on load balancers, which are then set on the frontend IP configurations.

Key changes:

  • Added AvailabilityZones field to LoadBalancerSpec API
  • Implemented service layer to set zones on frontend IP configurations
  • Added webhook validation to enforce Azure's zone immutability requirement
  • Included comprehensive documentation with examples and migration guidance
  • Added unit tests and E2E tests

Which issue(s) this PR fixes:
Fixes #5709

Special notes for your reviewer:

This implementation follows Azure's zone redundancy model:

  • For internal load balancers: zones are set directly on frontend IP configurations
  • For public load balancers: zones should be set on associated public IP addresses (documented)
  • Zones are immutable after creation per Azure platform requirements
  • Webhook validation prevents invalid zone modifications

The E2E test is optional and creates a cluster with zone-redundant load balancers to verify the feature works end-to-end in Azure.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Add support for zone-redundant load balancers. Users can now configure availability zones on load balancers (APIServerLB, NodeOutboundLB, ControlPlaneOutboundLB) to enable zone-redundant configurations for high availability across multiple availability zones.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. kind/feature Categorizes issue or PR as related to a new feature. labels Oct 24, 2025
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 24, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jont828 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 24, 2025
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 7572b39 to 67685f0 Compare October 24, 2025 17:04
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 24, 2025
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 67685f0 to 2e5d373 Compare October 24, 2025 17:10
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 24, 2025
Add support for configuring availability zones on load balancers to enable
zone-redundant configurations for high availability.

For internal load balancers, zones are set directly on the frontend IP
configuration. For public load balancers, zones should be set on the
associated public IP addresses.

The field supports up to 3 zones and uses a set list type to prevent
duplicates. Zones are immutable after creation per Azure platform
requirements.

Implement the service layer changes to support zone-redundant load balancers:

- Update LBSpec to include AvailabilityZones field
- Modify getFrontendIPConfigs to set zones on frontend IP configurations
- Update all four load balancer specs in cluster scope to pass zones:
  - APIServerLB
  - NodeOutboundLB
  - ControlPlaneOutboundLB
  - ControlPlaneInternalLB

Zones are converted from []string to []*string for Azure SDK compatibility
and applied to frontend IP configurations for zone redundancy.
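
The conversion described above can be sketched as follows. This is a minimal illustration, not the PR's code: `frontendIPConfig` and `toPtrSlice` are hypothetical names, and the struct is a trimmed stand-in for the Azure SDK's frontend IP configuration type.

```go
package main

import "fmt"

// frontendIPConfig is a hypothetical, trimmed-down stand-in for the Azure SDK's
// frontend IP configuration type; only the fields needed here are modeled.
type frontendIPConfig struct {
	Name  string
	Zones []*string
}

// toPtrSlice converts []string to []*string. Each pointer refers to its own
// copy, so later changes to the input slice do not leak into the result.
func toPtrSlice(in []string) []*string {
	if len(in) == 0 {
		return nil // leave Zones unset when no zones were requested
	}
	out := make([]*string, 0, len(in))
	for _, z := range in {
		z := z // per-iteration copy (required before Go 1.22)
		out = append(out, &z)
	}
	return out
}

func main() {
	cfg := frontendIPConfig{
		Name:  "example-frontend", // illustrative name
		Zones: toPtrSlice([]string{"1", "2", "3"}),
	}
	for _, z := range cfg.Zones {
		fmt.Println(*z)
	}
}
```

The Azure SDK for Go also ships a generic `to.SliceOfPtrs` helper in `sdk/azcore/to`; whether this PR uses it or a hand-written loop is not stated here.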

Add webhook validation to enforce Azure's requirement that availability
zones cannot be changed after a load balancer is created.

The validation checks all three load balancer types:
- APIServerLB
- NodeOutboundLB
- ControlPlaneOutboundLB

Any attempt to modify zones on an existing load balancer will be rejected
at admission time with a clear error message, preventing users from
attempting operations that would fail at the Azure API level.
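
A check of this kind might look roughly like the sketch below. The function name, signature, error wording, and field path are assumptions for illustration; the sketch also assumes that reordering the same zones should not count as a change, since the field is described as a set.

```go
package main

import (
	"fmt"
	"slices"
)

// validateZonesImmutable returns an error when the zones requested in an
// update differ from the zones already recorded on the load balancer.
// Order is ignored (assumption: the field has set semantics).
// The name and signature are hypothetical, not taken from the PR.
func validateZonesImmutable(fieldPath string, oldZones, newZones []string) error {
	o := slices.Clone(oldZones) // sort copies so the caller's slices stay untouched
	n := slices.Clone(newZones)
	slices.Sort(o)
	slices.Sort(n)
	if !slices.Equal(o, n) {
		return fmt.Errorf("%s: field is immutable (old: %v, new: %v)", fieldPath, oldZones, newZones)
	}
	return nil
}

func main() {
	// Illustrative field path; the real path depends on the API type layout.
	path := "spec.networkSpec.apiServerLB.availabilityZones"
	fmt.Println(validateZonesImmutable(path, []string{"1", "2"}, []string{"2", "1"}))
	fmt.Println(validateZonesImmutable(path, []string{"1", "2"}, []string{"1", "3"}))
}
```

Running the same comparison once per load balancer field in the update webhook would surface the error at admission time instead of at the Azure API.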

Add comprehensive unit tests for zone-redundant load balancer functionality:

- Add test fixture (fakeInternalAPILBSpecWithZones) with zone configuration
- Add test case to verify zones are correctly set on frontend IP configs
- Validate that zones array contains all expected zone values (1, 2, 3)
- Ensure zones are properly converted to Azure SDK format

The tests verify that the service layer correctly translates the API spec
into Azure SDK structures with zones on frontend IP configurations.

Update generated CRD manifests to include the availabilityZones field
on LoadBalancerSpec with proper validation:

- Type: array of strings
- List type: set (prevents duplicates)
- Max items: 3 (Azure supports up to 3 zones per region)

This is the result of running 'make generate-manifests' after adding
the AvailabilityZones field to the API types.
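
For context, a field that generates this schema could be declared along the lines below. This is a sketch under assumptions: the struct is trimmed to the one new field, the doc comment is illustrative, and `checkZones` is a hypothetical helper that only mirrors in plain Go what the generated schema is described as enforcing.

```go
package main

import "fmt"

// LoadBalancerSpec is a trimmed, illustrative version of the API type;
// all other fields are omitted.
type LoadBalancerSpec struct {
	// AvailabilityZones lists the availability zones for the load balancer.
	// +listType=set
	// +kubebuilder:validation:MaxItems=3
	// +optional
	AvailabilityZones []string `json:"availabilityZones,omitempty"`
}

// checkZones is a hypothetical helper that mirrors the schema constraints:
// at most three entries and no duplicates.
func checkZones(zones []string) error {
	if len(zones) > 3 {
		return fmt.Errorf("availabilityZones: at most 3 items allowed, got %d", len(zones))
	}
	seen := make(map[string]struct{}, len(zones))
	for _, z := range zones {
		if _, dup := seen[z]; dup {
			return fmt.Errorf("availabilityZones: duplicate value %q", z)
		}
		seen[z] = struct{}{}
	}
	return nil
}

func main() {
	spec := LoadBalancerSpec{AvailabilityZones: []string{"1", "2", "3"}}
	fmt.Println(checkZones(spec.AvailabilityZones))
	fmt.Println(checkZones([]string{"1", "1"}))
}
```

In a real cluster the API server enforces these constraints from the CRD schema, so no Go-side check like `checkZones` is needed for them.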

Add comprehensive documentation for zone-redundant load balancer feature:

- Explain Azure zone redundancy concepts for load balancers
- Provide configuration examples for all load balancer types:
  - Internal load balancers (API server)
  - Public load balancers
  - Node outbound load balancers
  - Control plane outbound load balancers
- Include complete highly available cluster example
- Document important considerations:
  - Immutability of zones after creation
  - Region support requirements
  - Standard SKU requirement
  - Backend pool placement best practices
- Provide migration guidance for existing clusters
- Add troubleshooting section
- Document best practices

Add dedicated end-to-end test to verify zone-redundant load balancer
functionality in real Azure environments.

The test:
- Creates a cluster with zone-redundant load balancers configured
- Uses the apiserver-ilb flavor with zones set to 1,2,3
- Verifies zones are correctly set in AzureCluster spec
- Validates Azure resources have zones on frontend IP configurations
- Tests all three load balancer types:
  - API Server Load Balancer (internal)
  - Node Outbound Load Balancer
  - Control Plane Outbound Load Balancer

This is an optional test that validates the complete feature works
end-to-end by creating actual Azure infrastructure and verifying
the zone configuration.
@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 2e5d373 to e29bc2e Compare October 24, 2025 17:13
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Oct 24, 2025
@codecov

codecov bot commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 51.35135% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.58%. Comparing base (e22796f) to head (36f5c8f).
⚠️ Report is 16 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| api/v1beta1/azurecluster_webhook.go | 33.33% | 15 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5944      +/-   ##
==========================================
- Coverage   44.60%   44.58%   -0.03%     
==========================================
  Files         279      279              
  Lines       25132    25177      +45     
==========================================
+ Hits        11209    11224      +15     
- Misses      13110    13137      +27     
- Partials      813      816       +3     


@jackfrancis
Contributor

@bryan-cox can you add this new functionality to the existing E2E scenario for a private cluster, which ships with an internal LB? E.g.:

$ git diff templates/flavors/private/patches/private-lb.yaml
diff --git a/templates/flavors/private/patches/private-lb.yaml b/templates/flavors/private/patches/private-lb.yaml
index 76e1539df..a2933e299 100644
--- a/templates/flavors/private/patches/private-lb.yaml
+++ b/templates/flavors/private/patches/private-lb.yaml
@@ -7,6 +7,10 @@ spec:
     apiServerLB:
       name: ${CLUSTER_NAME}-internal-lb
       type: Internal
+      availabilityZones:
+        - "1"
+        - "2"
+        - "3"
     nodeOutboundLB:
       frontendIPsCount: 1
     controlPlaneOutboundLB:

After applying the changes above to the template partial, render the updated templates with kustomize by running make generate-flavors from the git root directory.

cc @nojnhuh @mboersma

Updates the private cluster flavor to include availability zones
(1, 2, 3) on the API server internal load balancer for improved
high availability and resilience.

Updates kubernetes-sigs#5709

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Remove separate zone-redundant LB test
- Add API server LB zone verification inside AzurePrivateClusterSpec
- Verify zones in both AzureCluster spec and Azure infrastructure

Updates kubernetes-sigs#5709

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@jackfrancis
Contributor

/test pull-cluster-api-provider-azure-e2e-optional

@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from e29bc2e to 3b77777 Compare October 27, 2025 20:20
@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@bryan-cox
Contributor Author

/retest

@bryan-cox bryan-cox force-pushed the issue-5709-lb-zone-redundancy branch from 3b77777 to 6fa7de7 Compare October 28, 2025 10:34
Updates the private cluster flavor to include availability zones
(1, 2, 3) on the API server internal load balancer for improved
high availability and resilience.

The private cluster E2E test is marked [OPTIONAL] so it will be
skipped in regions that don't support 3 availability zones.

Updates kubernetes-sigs#5709
@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@k8s-ci-robot
Contributor

k8s-ci-robot commented Oct 28, 2025

@bryan-cox: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cluster-api-provider-azure-capi-e2e | 67685f0 | link | false | /test pull-cluster-api-provider-azure-capi-e2e |
| pull-cluster-api-provider-azure-e2e-optional | 36f5c8f | link | false | /test pull-cluster-api-provider-azure-e2e-optional |



Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

Load balancers are not zone redundant and can't be configured as such

3 participants