[BUG] The problem of health check in gRPC service-to-service invocation #3669

Open
@KenyeeC

Description

What happened:

I have two Kratos services that call each other over gRPC. The gRPC connection is established through a domain name that resolves to multiple IPs, which is how load balancing is done. When one of those IPs goes offline, the health check in the Kratos gRPC client makes the other service unreachable for a short period, even though it is in fact still accessible.

What you expected to happen:

Even if one IP behind the domain name goes offline, the connection should not become unavailable. In this setup the client usually connects through a single domain name, so could the Kratos gRPC load balancer skip load balancing when it sees only one node? That would also avoid the health check.

How to reproduce it (as minimally and precisely as possible):

Start two Kratos services that communicate over gRPC, with the endpoint set to a domain name that points to multiple IPs. Keep sending requests to Kratos, then take one of the IPs behind the domain name offline at random. The issue then reproduces.

Anything else we need to know?:

This is caused by the health check in the gRPC load balancing. Could load balancing be bypassed when there is only one node, or could users be allowed to choose the strategy themselves?
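
To make the "let users choose the strategy" idea concrete, here is a minimal sketch of building a gRPC service config JSON string with a selectable load-balancing policy and an optional `healthCheckConfig`. `BuildServiceConfig` is a hypothetical helper, not part of Kratos or grpc-go. It only uses the standard library and covers just the two service config fields relevant to this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lbPolicy is one entry of "loadBalancingConfig": a single-key object
// mapping the policy name to its (here empty) configuration.
type lbPolicy map[string]struct{}

// healthCheck mirrors the "healthCheckConfig" object of a gRPC service config.
type healthCheck struct {
	ServiceName string `json:"serviceName"`
}

// serviceConfig models only the two fields discussed in this issue.
type serviceConfig struct {
	LoadBalancingConfig []lbPolicy   `json:"loadBalancingConfig"`
	HealthCheckConfig   *healthCheck `json:"healthCheckConfig,omitempty"`
}

// BuildServiceConfig is a hypothetical helper: it returns a service config
// JSON string for the given policy name (for example "round_robin" or
// "pick_first") and leaves out healthCheckConfig when withHealthCheck is false.
func BuildServiceConfig(policy string, withHealthCheck bool) (string, error) {
	if policy == "" {
		return "", fmt.Errorf("load balancing policy must not be empty")
	}
	cfg := serviceConfig{LoadBalancingConfig: []lbPolicy{{policy: {}}}}
	if withHealthCheck {
		// An empty service name asks about the server's overall health.
		cfg.HealthCheckConfig = &healthCheck{ServiceName: ""}
	}
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Health checking disabled, policy chosen by the caller.
	cfg, err := BuildServiceConfig("round_robin", false)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg)
}
```

In grpc-go, a string like this can be passed to `grpc.WithDefaultServiceConfig`, and the Kratos gRPC client accepts extra dial options through `WithOptions`. Whether a user-supplied service config takes precedence over the one Kratos sets internally is an assumption here and would need to be checked against the Kratos version in use.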

Environment:

  • Kratos version (use kratos -v): 2.x
  • Go version (use go version): 1.23.x
  • OS (e.g: cat /etc/os-release): mac
  • Others:

Labels: bug (Something isn't working)