Time out while deleting Security Group #4272

@andreambrosioBS

Description

Every time we delete an ingress of type alb whose security group (SG) is also managed by the load balancer controller, we get a couple of errors saying that there was a timeout deleting the SG.

aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"info","ts":"2025-07-22T12:02:59Z","logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-xxxxxxxxxxxxxxxxxx"}
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"error","ts":"2025-07-22T12:03:10Z","msg":"Reconciler error","controller":"ingress","object":{"name":"XXXXXX"},"namespace":"","name":"XXXXXXXX","reconcileID":"b281e993-dfa6-451e-863c-bd7a6f0e7fe2","error":"failed to delete securityGroup: timed out waiting for the condition"}
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"error","ts":"2025-07-22T12:03:21Z","msg":"Reconciler error","controller":"ingress","object":{"name":"XXXXX"},"namespace":"","name":"XXXXXXXXX","reconcileID":"e83ee3c8-2e10-42ac-9119-650364208340","error":"failed to delete securityGroup: timed out waiting for the condition"}
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"info","ts":"2025-07-22T12:03:23Z","logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-xxxxxxxxxxxxxxxxxx"}
aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller {"level":"info","ts":"2025-07-22T12:03:29Z","logger":"controllers.ingress","msg":"deleted securityGroup","securityGroupID":"sg-xxxxxxxxxxxxxxxxxx"}

Looking at the CloudTrail logs, the security group cannot be deleted for some time because it still has dependent resources. Shortly afterwards the SG is deleted; I guess AWS is waiting for the LB to be completely deleted.
It usually takes about 30s for the SG to be deleted, and in the meantime we get a timeout error roughly every 10s.
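
To illustrate the arithmetic above, here is a minimal simulation, not the controller's actual code. It assumes each reconcile retries the delete until a fixed per-attempt timeout expires, and that the SG becomes deletable about 30s after the first attempt. The names `delete_with_retry`, `simulate` and `DependencyViolation` (presumably the EC2 error code behind the "dependent resources" entries in CloudTrail) are made up for the sketch, and the real backoff between reconciles is ignored.

```python
import time
from typing import Callable


class DependencyViolation(Exception):
    """Hypothetical stand-in for the error returned while the SG is still in use."""


def delete_with_retry(delete: Callable[[], None], timeout: float, interval: float,
                      clock=time.monotonic, sleep=time.sleep) -> bool:
    """Retry delete() every `interval` seconds; give up once `timeout` has elapsed."""
    deadline = clock() + timeout
    while True:
        try:
            delete()
            return True
        except DependencyViolation:
            if clock() + interval > deadline:
                return False  # corresponds to one "timed out waiting for the condition"
            sleep(interval)


def simulate(timeout: float, clears_at: float = 30.0, interval: float = 2.0) -> int:
    """Count timed-out reconciles before the delete succeeds, using a fake clock."""
    now = [0.0]

    def clock() -> float:
        return now[0]

    def sleep(seconds: float) -> None:
        now[0] += seconds

    def delete() -> None:
        # Assumption: the dependency (the LB being torn down) clears at `clears_at`.
        if now[0] < clears_at:
            raise DependencyViolation()

    failures = 0
    while not delete_with_retry(delete, timeout, interval, clock, sleep):
        failures += 1
    return failures


print(simulate(timeout=10))  # two timed-out reconciles, as in the logs above
print(simulate(timeout=60))  # no reconcile errors if the wait outlasts the dependency
```

Under these assumptions, a 10s wait against a dependency that takes about 30s to clear gives two reconcile errors before the delete succeeds, which matches the log excerpt. A wait longer than the teardown time would give none.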

My question is: is this normal behavior?
If it is, why is this an error and not a warning?
Can we increase the timeout value for resource deletion? I checked the documentation but did not find any way to do this.

We want to create some alerts based on the Prometheus reconcile error rate metric, but these timeouts will produce a lot of false positives.
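
One possible stopgap, if alerting from the controller logs rather than from the metric is an option, is to recognise this specific transient error and exclude it. The sketch below only assumes the log line format shown above (pod name, container name, then a JSON object); `is_transient_sg_timeout` is a hypothetical helper, not part of the controller.

```python
import json

# Error text copied from the log lines above.
TRANSIENT_ERROR = "failed to delete securityGroup: timed out waiting for the condition"


def is_transient_sg_timeout(line: str) -> bool:
    """Return True if a log line is the SG deletion timeout seen during ingress deletion."""
    start = line.find("{")
    if start < 0:
        return False  # no JSON payload on this line
    try:
        entry = json.loads(line[start:])
    except json.JSONDecodeError:
        return False
    return entry.get("level") == "error" and entry.get("error") == TRANSIENT_ERROR


sample = (
    'aws-load-balancer-controller-548d884fcf-2qt7s aws-load-balancer-controller '
    '{"level":"error","ts":"2025-07-22T12:03:10Z","msg":"Reconciler error",'
    '"controller":"ingress","error":"failed to delete securityGroup: '
    'timed out waiting for the condition"}'
)
print(is_transient_sg_timeout(sample))
```

This does not change the reconcile error metric itself, so it only helps for log-based alerts.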
