Description
What happened:
I have two Kratos services that call each other over gRPC. The gRPC connection is established through a domain name that resolves to multiple IPs, which is how load balancing is done. When one of the IPs behind this domain name goes offline, the health check in the Kratos gRPC client makes the other service inaccessible for a short period of time, even though it is in fact still accessible.
What you expected to happen:
Even if one IP behind the domain name goes offline, the connection should not be marked unavailable. In this setup the client usually connects through a single domain name, so when Kratos gRPC load balancing determines that there is only one node, could load balancing be skipped altogether (which would also avoid the health check)?
How to reproduce it (as minimally and precisely as possible):
Start two Kratos services that communicate over gRPC, with the endpoint set to a domain name that points to multiple IPs. Keep sending requests to Kratos, then take one of the IPs behind the domain name offline at random. The issue reproduces at that point.
Anything else we need to know?:
This is caused by the health check in the gRPC load balancing. Could load balancing be bypassed when there is only one node, or could users be allowed to choose the strategy themselves?
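To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not Kratos code: `Node`, `Pick`, and `ErrNoAvailable` are hypothetical names invented for illustration, and the round-robin step is only a placeholder for whatever strategy the real balancer uses. The point is the single-node case, where the node is returned without looking at its health state.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for a resolved endpoint (not a Kratos type).
type Node struct {
	Addr    string
	Healthy bool // result of the most recent health check
}

// ErrNoAvailable is a hypothetical error returned when no node can be picked.
var ErrNoAvailable = errors.New("no available node")

// Pick chooses a node for the next request.
// With exactly one node it is returned as-is, so a failed health check
// cannot make the only endpoint unavailable.
func Pick(nodes []Node, next uint) (Node, error) {
	switch len(nodes) {
	case 0:
		return Node{}, ErrNoAvailable
	case 1:
		// Single endpoint (e.g. one domain name): skip balancing and health filtering.
		return nodes[0], nil
	}

	// Multiple nodes: keep only the healthy ones, then round-robin over them.
	healthy := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Healthy {
			healthy = append(healthy, n)
		}
	}
	if len(healthy) == 0 {
		return Node{}, ErrNoAvailable
	}
	return healthy[next%uint(len(healthy))], nil
}

func main() {
	// One node that the health check currently considers unhealthy.
	single := []Node{{Addr: "api.example.internal:9000", Healthy: false}}
	n, err := Pick(single, 0)
	fmt.Println(n.Addr, err)
}
```

With this shape, a domain-name endpoint that the client sees as one node would keep receiving requests, and the failover between the IPs behind it would be left to DNS or to whatever sits in front of them.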
Environment:
- Kratos version (use `kratos -v`): 2.x
- Go version (use `go version`): 1.23.x
- OS (e.g: `cat /etc/os-release`): macOS
- Others: