Description
Every time we delete an Ingress of type ALB, and the security group (SG) is also managed by the AWS Load Balancer Controller, we get a couple of errors saying that there was a timeout deleting the SG.
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"info","ts":"2025-07-22T12:02:59Z","logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-xxxxxxxxxxxxxxxxxx"}
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"error","ts":"2025-07-22T12:03:10Z","msg":"Reconciler error","controller":"ingress","object":{"name":"XXXXXX"},"namespace":"","name":"XXXXXXXX","reconcileID":"b281e993-dfa6-451e-863c-bd7a6f0e7fe2","error":"failed to delete securityGroup: timed out waiting for the condition"}
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"error","ts":"2025-07-22T12:03:21Z","msg":"Reconciler error","controller":"ingress","object":{"name":"XXXXX"},"namespace":"","name":"XXXXXXXXX","reconcileID":"e83ee3c8-2e10-42ac-9119-650364208340","error":"failed to delete securityGroup: timed out waiting for the condition"}
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"info","ts":"2025-07-22T12:03:23Z","logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-xxxxxxxxxxxxxxxxxx"}
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"info","ts":"2025-07-22T12:03:29Z","logger":"controllers.ingress","msg":"deleted securityGroup","securityGroupID":"sg-xxxxxxxxxxxxxxxxxx"}
Looking at the CloudTrail logs, the security group cannot be deleted for a while because it still has dependent resources. Shortly afterwards the SG is deleted; I assume AWS is waiting for the load balancer to be completely deleted.
It usually takes about 30s to delete the SG, and we get a timeout error roughly every 10s.
My question is: is this normal behavior?
If it is, why is it logged as an error and not a warning?
Can we increase the timeout for resource deletion? I checked the documentation but did not find a way to do this.
We want to create alerts based on the Prometheus reconcile error rate metric, but this behavior will produce a lot of false positives.