Description
What steps did you take and what happened?
I spawned an AWS EKS workload cluster using CAPA v2.8.3 with the following feature gates set:
--feature-gates=EKS=true,EKSEnableIAM=true,EKSAllowAddRoles=false,EKSFargate=false,MachinePool=false,EventBridgeInstanceState=false,AutoControllerIdentityCreator=true,BootstrapFormatIgnition=false,ExternalResourceGC=false
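These gates are passed as arguments to the capa-controller-manager Deployment. As a sketch (assuming the default capa-system namespace and deployment name from a standard clusterctl install), the running configuration can be confirmed with:

```sh
# Print the args the CAPA manager container was actually started with
kubectl -n capa-system get deployment capa-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[0].args}'
```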
Every 15 minutes or so (it varies; sometimes it goes up to 20 minutes), I get the following messages from the MachineHealthCheck controller:
E0616 07:31:16.533327 1 machinehealthcheck_controller.go:221] "Error creating remote cluster cache" err="error getting client: connection to the workload cluster is down" controller="machinehealthcheck" controllerGroup="cluster.x-k8s.io" controllerKind="MachineHealthCheck" MachineHealthCheck="flux-system/t00-use1-eks-test" namespace="flux-system" name="t00-use1-eks-test" reconcileID="420722f1-5b72-4829-9340-8a9c27537fd4" Cluster="flux-system/t00-use1-eks-test"
E0616 07:31:16.535055 1 machineset_controller.go:1218] "Unable to retrieve Node status" err="error getting client: connection to the workload cluster is down" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" MachineSet="flux-system/t00-use1-eks-test-us-east-1a-md0-f6f27" namespace="flux-system" name="t00-use1-eks-test-us-east-1a-md0-f6f27" reconcileID="4c4135df-6d55-4590-8751-b4af5d8b8982" Cluster="flux-system/t00-use1-eks-test" MachineDeployment="flux-system/t00-use1-eks-test-us-east-1a-md0" Machine="flux-system/t00-use1-eks-test-us-east-1a-md0-f6f27-gpm5b" Node=""
E0616 07:31:16.564003 1 machineset_controller.go:1218] "Unable to retrieve Node status" err="error getting client: connection to the workload cluster is down" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" MachineSet="flux-system/t00-use1-eks-test-us-east-1a-md0-f6f27" namespace="flux-system" name="t00-use1-eks-test-us-east-1a-md0-f6f27" reconcileID="e5a1e000-b16f-4d8a-9c75-6af4c7be8b9f" Cluster="flux-system/t00-use1-eks-test" MachineDeployment="flux-system/t00-use1-eks-test-us-east-1a-md0" Machine="flux-system/t00-use1-eks-test-us-east-1a-md0-f6f27-gpm5b" Node=""
E0616 07:31:16.614978 1 machineset_controller.go:1218] "Unable to retrieve Node status" err="error getting client: connection to the workload cluster is down" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" MachineSet="flux-system/t00-use1-eks-test-us-east-1a-md0-f6f27" namespace="flux-system" name="t00-use1-eks-test-us-east-1a-md0-f6f27" reconcileID="f0952489-d550-40bc-bbe5-fb1bfb72bbd1" Cluster="flux-system/t00-use1-eks-test" MachineDeployment="flux-system/t00-use1-eks-test-us-east-1a-md0" Machine="flux-system/t00-use1-eks-test-us-east-1a-md0-f6f27-gpm5b" Node=""
E0616 07:31:16.628004 1 machineset_controller.go:1218] "Unable to retrieve Node status" err="error getting client: connection to the workload cluster is down" controller="machineset" controllerGroup="cluster.x-k8s.io" controllerKind="MachineSet" MachineSet="flux-system/t00-use1-eks-test-us-east-1a-md0-f6f27" namespace="flux-system" name="t00-use1-eks-test-us-east-1a-md0-f6f27" reconcileID="8c2fd112-b159-438e-9c7e-e2b9ab38191a" Cluster="flux-system/t00-use1-eks-test" MachineDeployment="flux-system/t00-use1-eks-test-us-east-1a-md0" Machine="flux-system/t00-use1-eks-test-us-east-1a-md0-f6f27-gpm5b" Node=""
As far as I can tell, there is no operational impact.
I did not see this error with v1.7.9; it seems to have appeared starting with v1.8.0.
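For anyone trying to reproduce this, a rough way to check whether the workload cluster API server is actually unreachable while the errors are being logged is to pull the kubeconfig from the management cluster and query it directly (a sketch, assuming the standard kubeconfig secret created by CAPA is present and usable):

```sh
# Fetch the workload cluster kubeconfig from the management cluster
clusterctl get kubeconfig t00-use1-eks-test -n flux-system > /tmp/t00-use1-eks-test.kubeconfig

# If these succeed while the controller still logs the error, connectivity itself is fine
kubectl --kubeconfig /tmp/t00-use1-eks-test.kubeconfig get nodes
```

In my case the workload cluster stayed reachable the whole time, which is why I suspect the cache/token handling rather than actual connectivity.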
What did you expect to happen?
If this is a real problem with the token renewal logic or the updated cluster cache component, it should be fixed.
If it is not, the message should be suppressed.
Cluster API version
v1.9.8
Kubernetes version
No response
Anything else you would like to add?
I tried playing with the sync-period flag on both capa-controller-manager and capi-controller-manager, without any luck.
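Concretely, what I mean by adjusting sync-period (a sketch; flag placement and namespaces assumed from a default clusterctl install):

```sh
# CAPI core controller: add or change --sync-period in the manager container args
kubectl -n capi-system edit deployment capi-controller-manager
#   - --sync-period=10m

# CAPA controller: same flag on its manager container
kubectl -n capa-system edit deployment capa-controller-manager
#   - --sync-period=10m
```

Changing the value only shifts how often the errors appear; it does not make them go away.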
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.