grpc: Move channel to TRANSIENT_FAILURE on resolver creation failure #8643
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #8643 +/- ##
==========================================
- Coverage 79.45% 78.50% -0.95%
==========================================
Files 415 415
Lines 41339 41556 +217
==========================================
- Hits 32844 32623 -221
- Misses 6621 6843 +222
- Partials 1874 2090 +216
Is it possible to maintain the existing behavior for
And it seems like we need a regression test for the case where
Also, will the channel still go back into idle mode correctly upon the idle timeout being reached? We might want to have a test for this, too.
I remember us discussing (with Eric) the option of
And if we have a utility for the
What is the benefit of transitioning back to idle in this case?
That would be a large API change that I'd rather not mess with unless it's part of #6472. This effectively does exactly that when a resolver returns an error, so I think it's completely equivalent in functionality.
Consistency, I guess? The resolver shouldn't be doing non-static things while building, so it should presumably continue to fail to build no matter what. But we also don't document this, really, so it's also possible users are doing weird things in custom resolvers, and having a later opportunity to succeed might be important to them.
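To make the "later opportunity to succeed" point concrete, here is a minimal sketch of the retry-after-idle idea. It is a toy model, not grpc-go code: `toyChannel`, `onIdleTimeout`, and `flakyBuilder` are hypothetical names, and the model assumes that the idle timeout returns the channel to IDLE and that the builder only runs again when the channel leaves IDLE.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a simplified stand-in for a gRPC connectivity state.
type State string

const (
	Idle             State = "IDLE"
	Connecting       State = "CONNECTING"
	TransientFailure State = "TRANSIENT_FAILURE"
)

// toyChannel is a hypothetical model for illustration only;
// it is not grpc-go's ClientConn.
type toyChannel struct {
	state State
	build func() error // stands in for the resolver builder's Build call
}

func newToyChannel(build func() error) *toyChannel {
	return &toyChannel{state: Idle, build: build}
}

// exitIdle runs the builder only when the channel is idle.
// A build failure moves the channel to TRANSIENT_FAILURE.
func (c *toyChannel) exitIdle() {
	if c.state != Idle {
		return
	}
	if err := c.build(); err != nil {
		c.state = TransientFailure
		return
	}
	c.state = Connecting
}

// onIdleTimeout models the idle timer firing: the channel drops
// back to IDLE, so the next exitIdle call runs the builder again.
func (c *toyChannel) onIdleTimeout() { c.state = Idle }

// flakyBuilder returns a build function that fails on its first
// call only, plus a pointer to its call counter.
func flakyBuilder() (func() error, *int) {
	calls := 0
	return func() error {
		calls++
		if calls == 1 {
			return errors.New("config not loaded yet")
		}
		return nil
	}, &calls
}

func main() {
	build, calls := flakyBuilder()
	ch := newToyChannel(build)

	ch.exitIdle() // first build attempt fails
	fmt.Println(ch.state, *calls)

	ch.exitIdle() // no retry while in TRANSIENT_FAILURE
	fmt.Println(ch.state, *calls)

	ch.onIdleTimeout() // idle timer fires
	ch.exitIdle()      // second build attempt succeeds
	fmt.Println(ch.state, *calls)
}
```

In this model a builder that always fails statically would simply land in TRANSIENT_FAILURE again after each idle cycle, so returning to IDLE only matters for custom resolvers whose `Build` outcome can change over time.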
ddf60e3 to a14e7c5
After investigating, I confirmed that RPCs already fail with the correct error message. The connection is attempted when the stream creation function calls exitIdle, as seen here:
Lines 180 to 186 in 2d92271
Therefore, no changes are needed to propagate the error. While cc.Connect itself cannot return an error due to its signature, any resolver build failure is correctly surfaced to the user during the next RPC.
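The flow described in this comment can be sketched with a small toy model. This is not the grpc-go implementation: `toyChannel`, `newStream`, and `errBuild` are hypothetical names, and the error wrapping is an assumption made for illustration. The sketch only shows the shape of the argument: `Connect` has no error return, so the build error is stored on the channel and handed to the caller by the next stream creation.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a simplified stand-in for a gRPC connectivity state.
type State string

const (
	Idle             State = "IDLE"
	Connecting       State = "CONNECTING"
	TransientFailure State = "TRANSIENT_FAILURE"
)

// errBuild is a hypothetical resolver build error.
var errBuild = errors.New("resolver build failed: bad target")

// toyChannel is a hypothetical model for illustration only;
// it is not grpc-go's ClientConn.
type toyChannel struct {
	state    State
	build    func() error // stands in for the resolver builder's Build call
	buildErr error        // last build error, kept so RPCs can report it
}

// exitIdle builds the resolver if the channel is idle. On failure it
// records the error and moves the channel to TRANSIENT_FAILURE.
func (c *toyChannel) exitIdle() {
	if c.state != Idle {
		return
	}
	if err := c.build(); err != nil {
		c.buildErr = err
		c.state = TransientFailure
		return
	}
	c.state = Connecting
}

// Connect has no error return in this model, so a build failure
// is visible only through the channel state.
func (c *toyChannel) Connect() { c.exitIdle() }

// newStream models stream creation: it leaves idle first, then fails
// the RPC with the stored build error if the channel is unhealthy.
func (c *toyChannel) newStream() error {
	c.exitIdle()
	if c.state == TransientFailure {
		return fmt.Errorf("unavailable: %w", c.buildErr)
	}
	return nil
}

func main() {
	ch := &toyChannel{state: Idle, build: func() error { return errBuild }}

	err := ch.newStream()
	fmt.Println(ch.state)                 // TRANSIENT_FAILURE
	fmt.Println(err)                      // unavailable: resolver build failed: bad target
	fmt.Println(errors.Is(err, errBuild)) // true
}
```

Wrapping with `%w` keeps the builder's error reachable through `errors.Is`, which is one way a caller could tell a configuration problem apart from an ordinary connection failure in this model.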
When a resolver fails to be created, the current behavior is inconsistent and makes debugging difficult (see discussion on #8602 (comment)):
- Dial: Immediately returns an error.
- NewClient: Logs an INFO message only when an RPC is attempted, while the channel remains in an IDLE state. This can hide the underlying configuration issue.
This change unifies the behavior by transitioning the channel to TRANSIENT_FAILURE upon a resolver creation failure. As a result, RPCs will fail immediately with the specific error produced by the resolver builder, making the problem easier to diagnose.
RELEASE NOTES:
- TRANSIENT_FAILURE instead of staying IDLE.
- Dial function no longer returns an error on resolver creation failure; instead, the channel is created and enters the TRANSIENT_FAILURE state.