Add control and data plane HPA #3492
Conversation
Hi @nowjean! Welcome to the project! 🎉 Thanks for opening this pull request!
✅ All required contributors have signed the F5 CLA for this PR. Thank you!
I have hereby read the F5 CLA and agree to its terms
Thank you for your contribution to the project. Please run make generate-all.
I’ve completed 'make generate-all'. Could you please review my PR?
Force-pushed from 172c009 to d081d68.
So this only affects the control plane, correct? We probably want to support this for the nginx data plane as well (seems like that would be the more beneficial use case). In order to configure deployment options for the data plane, it requires a bit more work, specifically in our APIs and the code itself. The NginxProxy CRD holds the deployment configuration for the nginx data plane, which the control plane uses to configure the data plane when deploying it. Here is a simple example of how we add a new field to the API to allow for configuring these types of deployment fields: #3319.
I'd also love a more descriptive PR title, as well as a release note in the description so we can include this feature in our release notes :)
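For context, here is a minimal sketch of how such a deployment-level autoscaling field could be added to the NginxProxy API, loosely following the pattern from #3319. The type names, field set, and comments below are illustrative assumptions, not the project's actual API:

```go
// Sketch only: the exact type names and field set are assumptions.

// DeploymentSpec is the configuration for the nginx data plane Deployment.
type DeploymentSpec struct {
	// Replicas is the number of nginx Pods to deploy.
	//
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`

	// Autoscaling configures a HorizontalPodAutoscaler for the Deployment
	// that the control plane provisions for each Gateway.
	//
	// +optional
	Autoscaling *AutoscalingSpec `json:"autoscaling,omitempty"`
}

// AutoscalingSpec defines the HPA settings for the data plane.
type AutoscalingSpec struct {
	// Enabled turns autoscaling on or off.
	Enabled bool `json:"enabled"`

	// MinReplicas is the lower bound for the HPA.
	//
	// +optional
	MinReplicas *int32 `json:"minReplicas,omitempty"`

	// MaxReplicas is the upper bound for the HPA.
	MaxReplicas int32 `json:"maxReplicas"`
}
```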
@sjberman Yes, this PR only affects the control plane. Can we also implement HPA for the data plane? AFAIK, the data plane Deployment is created by the NginxProxy CRD, and its name depends on the Gateway, while an HPA only targets a Deployment with a fixed name.
So, I think we can't implement HPA via the Helm chart, especially since data plane and control plane pods are now separated in 2.0.
@nowjean I updated my comment with a description on how it can be implemented on the data plane side. Glad we're on the same page :)
Will manually test this PR for both control plane and data plane when we have all the changes :)
@sjberman @salonichf5 I've pushed my changes to this PR. From my testing, the code correctly applies HPA to both the control plane and data plane.
Testing applying these HPAs for control plane and data plane pods.
values.yaml
HPA details
Needed to install the metrics server (enabling insecure TLS) to get resource memory metrics. Should this be communicated to end users, i.e., that additional fields need to be set if they want scaling to be active?
values.yaml
I saw the HPA get configured for the control plane pod, but I couldn't see one configured for the data plane pod. Events and logs from the nginx deployment looked normal.
The NginxProxy resource reflects the resources value but not the autoscaling configuration.
So, those are a couple of observations.
What am I doing wrong in terms of testing? @sjberman @nowjean
@salonichf5 @sjberman Thanks for testing! Please refer to the guide below and review my PR again. I've patched the Makefile generate-crds target.
This turns off the descriptions in the generated CRDs, because of the new nginxProxy manifest file.
(In my case, I had to upgrade my runc version to build the NGF docker images.)
End users can create multiple Gateways, and each one needs its own HPA, so the logic now lives in the Gateway resource. Plus, I'm not sure about this part:
Normally, we assume that end users already have the Metrics Server running if they're using HPA or similar features. But maybe it's worth adding a note in the docs to avoid confusion.
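To illustrate the per-Gateway approach described above, here is a rough sketch of building an HPA that targets the Gateway-specific Deployment. The function signature and package layout are assumptions; the real provisioner would attach its own labels, annotations, and owner references:

```go
package provisioner

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildHPA is a hypothetical helper: one HPA per Gateway, named after the
// Deployment that the control plane provisions for that Gateway.
func buildHPA(deploymentName, namespace string, minReplicas *int32, maxReplicas int32) *autoscalingv2.HorizontalPodAutoscaler {
	return &autoscalingv2.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{
			Name:      deploymentName,
			Namespace: namespace,
		},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			// The scale target is the Gateway-specific Deployment, so its
			// name does not need to be fixed ahead of time in the Helm chart.
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       deploymentName,
			},
			MinReplicas: minReplicas,
			MaxReplicas: maxReplicas,
		},
	}
}
```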
Force-pushed from c57e992 to e8399d9.
@sjberman I rebased onto the upstream main branch and resolved the conflicts.
and then,
It was a super complicated task for me. Anyway, I think the hpa branch was rebased successfully and all conflicts were resolved.
@@ -460,6 +466,49 @@ type DaemonSetSpec struct {
	Patches []Patch `json:"patches,omitempty"`
}

// +kubebuilder:validation:XValidation:message="at least one metric must be specified when autoscaling is enabled",rule="!self.enabled || (has(self.targetCPUUtilizationPercentage) || has(self.targetMemoryUtilizationPercentage) || (has(self.autoscalingTemplate) && size(self.autoscalingTemplate) > 0))"
// +kubebuilder:validation:XValidation:message="minReplicas must be less than or equal to maxReplicas",rule="self.minReplicas <= self.maxReplicas"
// +kubebuilder:validation:XValidation:message="CPU utilization must be between 1 and 100",rule="!has(self.targetCPUUtilizationPercentage) || (self.targetCPUUtilizationPercentage >= 1 && self.targetCPUUtilizationPercentage <= 100)"
This doesn't need to use CEL validation. You can just set minimum and maximum directly on the field. See other int values as examples in here.
This still needs to be fixed (and the comment below).
To be specific, we just need to remove the validation for the values that specify max/min, and convert those to max/min kubebuilder tags. The other validation for "at least one metric" can stay.
// +kubebuilder:validation:XValidation:message="at least one metric must be specified when autoscaling is enabled",rule="!self.enabled || (has(self.targetCPUUtilizationPercentage) || has(self.targetMemoryUtilizationPercentage) || (has(self.autoscalingTemplate) && size(self.autoscalingTemplate) > 0))"
// +kubebuilder:validation:XValidation:message="minReplicas must be less than or equal to maxReplicas",rule="self.minReplicas <= self.maxReplicas"
// +kubebuilder:validation:XValidation:message="CPU utilization must be between 1 and 100",rule="!has(self.targetCPUUtilizationPercentage) || (self.targetCPUUtilizationPercentage >= 1 && self.targetCPUUtilizationPercentage <= 100)"
// +kubebuilder:validation:XValidation:message="memory utilization must be between 1 and 100",rule="!has(self.targetMemoryUtilizationPercentage) || (self.targetMemoryUtilizationPercentage >= 1 && self.targetMemoryUtilizationPercentage <= 100)"
Same comment
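In other words, something along these lines, using the field names already visible in this diff (the doc comments are illustrative):

```go
// The 1-100 bounds move onto the fields as kubebuilder markers, so the CEL
// XValidation rules for those ranges can be dropped. The "at least one
// metric" rule on the parent struct stays as is.

// TargetCPUUtilizationPercentage is the target average CPU utilization.
//
// +optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=100
TargetCPUUtilizationPercentage *int32 `json:"targetCPUUtilizationPercentage,omitempty"`

// TargetMemoryUtilizationPercentage is the target average memory utilization.
//
// +optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=100
TargetMemoryUtilizationPercentage *int32 `json:"targetMemoryUtilizationPercentage,omitempty"`
```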
@nowjean We're looking at doing a release in the next couple of weeks, and would love to include these changes. If your schedule is too busy to focus on this, that's totally okay. Let us know, and we can also look into taking over the remaining pieces of this work.
@sjberman Cool, I think 2 or 3 weeks should do it. I'm on it!
@nowjean Thanks, we would need this merged in by the end of next week. If that's not enough time for you, let me know and we can take over.
I have pushed the feature. Please check it. It seems to be working correctly in my cluster.
Sorry to butt in on this PR, but I just found this effort and have to say I really like what I'm seeing. Keda itself likes to create and manage the HPA so it can set the scale target etc. by itself. Is there a way to incorporate a change, either here or as a follow-up, that would basically remove the necessity to provide the original number of replicas and turn off the NginxProxy-led scaling, so we can utilize Keda for it? If that is not an option, I believe we can utilize the option to transfer ownership of the HPA to Keda to still make it work, as long as we know the name of the HPA.
@michasHL Also, the NGINX Ingress Controller Helm chart provides options for either HPA or KEDA. Additionally, KEDA requires a separate controller to be installed, whereas HPA is built into Kubernetes by default. This means KEDA is not a drop-in replacement for HPA. Therefore, I think if we need KEDA-specific features, we should implement them in a follow-up after this PR is merged.
Still need #3492 (comment) and #3492 (comment) to be addressed.
Also need a rebase on main (and any conflicts fixed). Almost done, nice work!
@@ -615,7 +615,24 @@ func (p *NginxProvisioner) buildNginxDeployment(
}

if deploymentCfg.Replicas != nil { |
Even if this replicas field is nil (which happens when a user doesn't specify anything in the NginxProxy resource for this field), we should still perform the HPA logic you added below.
So basically you'll have something like:
var replicas *int32
if deploymentCfg.Replicas != nil {
	replicas = deploymentCfg.Replicas
}

if isAutoscalingEnabled {
	// HPA logic from below, which may overwrite replicas
	...
}

deployment.Spec.Replicas = replicas
if err == nil && hpa.Status.DesiredReplicas > 0 {
	// overwrite with HPA's desiredReplicas
	replicas = helpers.GetPointer(hpa.Status.DesiredReplicas)
}
Can we also add a unit test for this case?
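For the unit test, here is a rough sketch of the kind of case that could cover this behavior. resolveReplicas is a hypothetical stand-in for the replica-resolution logic in buildNginxDeployment; the real test would exercise that function with a fake client that already has an HPA reporting a desired replica count:

```go
package provisioner

import "testing"

// resolveReplicas is a hypothetical helper mirroring the logic discussed
// above: the HPA's desired replicas, when reported, win over the configured
// replica count.
func resolveReplicas(configured *int32, hpaDesired int32) *int32 {
	replicas := configured
	if hpaDesired > 0 {
		replicas = &hpaDesired
	}
	return replicas
}

func TestResolveReplicas(t *testing.T) {
	configured := int32(2)

	// HPA has scaled up: its desired count should be used.
	if got := resolveReplicas(&configured, 5); got == nil || *got != 5 {
		t.Fatalf("expected 5 replicas, got %v", got)
	}

	// HPA has not reported anything yet: fall back to the configured value.
	if got := resolveReplicas(&configured, 0); got == nil || *got != 2 {
		t.Fatalf("expected 2 replicas, got %v", got)
	}
}
```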
Absolutely, didn't want to diminish any work you've already done on this. Lucky for us, Keda introduced the concept of taking ownership of an existing HPA as long as we know the name of it. So, I think our use case will be covered by what you already have. Keep up the good work, and I'm looking forward to seeing this change in the next release :)
@michasHL By owning the HPA, does KEDA need to directly edit its config? Just want to clarify, since that would require some thought on how to implement: right now, every resource we provision is owned by our control plane and can't be edited out of band (meaning that KEDA wouldn't be able to edit the HPA).
Proposed changes
Problem: I want NGF to work with a HorizontalPodAutoscaler
Solution: Add HPA for the deployments
Testing: I've deployed to my AKS cluster and checked that the HPA works correctly.
Closes #3447
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.