mcp/streamable: add resumability for the Streamable transport #133
base: main
891161c to abc4c9d
First preliminary review. With Rob out, I want to really understand this. That may take a day or two.
5a2d5aa to 9d77544
9d77544 to 881274b
This CL implements a retry mechanism to resume SSE streams to recover from network failures.
881274b to 649f399
}

// The stream was interrupted or ended by the server. Attempt to reconnect.
newResp, reconnectErr := s.reconnect(lastEventID)
`err` is fine.
s.mu.Lock()
s.err = reconnectErr
s.mu.Unlock()
s.Close()
Could handleSSE return an error so we can propagate the error from Close?
// handleSSE manages the entire lifecycle of an SSE connection. It processes
// an incoming Server-Sent Events stream and automatically handles reconnection
// logic if the stream breaks.
func (s *streamableClientConn) handleSSE(initialResp *http.Response) {
This looks very clean!
case <-s.done:
	return nil, fmt.Errorf("connection closed by client during reconnect")
case <-time.After(calculateReconnectDelay(s.ReconnectOptions, attempt)):
	resp, reconnectErr := s.establishSSE(lastEventID)
`err` again.
	return
case s.incoming <- evt.Data:

if !isResumable(resp) {
I think we may also want to try resuming if the error from establishSSE is non-nil. For example, if the network is partitioned, that might manifest as a timeout error instead of an HTTP response. But we can leave that for a later PR.
serverClosed.Add(1)
close(serverReadyToKillProxy)
// Wait for the test to kill the proxy before sending the rest.
serverClosed.Wait()
You'd wait for the channel to be closed here (`<-serverClosed`).
proxyAddr := proxy.Listener.Addr().String() // Get the address to restart it later.

// 3. Configure the client to connect to the proxy with default options.
clientTransport := NewStreamableClientTransport(proxy.URL, &StreamableClientTransportOptions{
I think `nil` should give the same behavior as this: retry should be the default, since it doesn't cost anything. Not sure what the default MaxRetries should be, but 5 seems OK.
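A sketch of how nil options could resolve to retry-by-default, under stated assumptions: the types and `resolveReconnect` are hypothetical stand-ins rather than the PR's definitions, 5 is the value floated in the comment above, and treating a zero `MaxRetries` as "use the default" is an extra assumption made here.

```go
package main

import "fmt"

// defaultMaxRetries uses the value suggested in the review comment.
const defaultMaxRetries = 5

// reconnectOptions and transportOptions are hypothetical stand-ins for the
// PR's option structs.
type reconnectOptions struct {
	MaxRetries int
}

type transportOptions struct {
	Reconnect *reconnectOptions
}

// resolveReconnect returns the effective settings. Nil options, a nil
// Reconnect field, and a non-positive MaxRetries all fall back to the default.
func resolveReconnect(opts *transportOptions) reconnectOptions {
	resolved := reconnectOptions{MaxRetries: defaultMaxRetries}
	if opts != nil && opts.Reconnect != nil && opts.Reconnect.MaxRetries > 0 {
		resolved.MaxRetries = opts.Reconnect.MaxRetries
	}
	return resolved
}

func main() {
	fmt.Println(resolveReconnect(nil).MaxRetries)                 // 5
	fmt.Println(resolveReconnect(&transportOptions{}).MaxRetries) // 5
	custom := &transportOptions{Reconnect: &reconnectOptions{MaxRetries: 2}}
	fmt.Println(resolveReconnect(custom).MaxRetries) // 2
}
```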
}

// Perform handshake.
initReq := &jsonrpc.Request{ID: jsonrpc2.Int64ID(100), Method: "initialize", Params: mustMarshal(t, &InitializeParams{})}
Why does this have to happen by hand?
I guess what I mean is, can you write this test with a `Client` instead of at the transport level? If you need to reach into the transport for something, maybe you could add test hooks to the transport code. That may ultimately be cleaner than rewriting the init protocol.
t.Log("--- Killing proxy to simulate network failure ---")
proxy.CloseClientConnections()
proxy.Close()
serverClosed.Done()
Here you'd close the channel.
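Taken together, the two channel suggestions replace the `sync.WaitGroup` used as a one-shot signal with a plain `chan struct{}`: the server side blocks on a receive, and the test side closes the channel once the proxy is down. A self-contained sketch of that handshake follows. The channel names mirror the test, while `runHandshake` and the recorded strings are hypothetical.

```go
package main

import "fmt"

// runHandshake models the ordering between the fake server and the test.
// It returns the events in the order they happened.
func runHandshake() []string {
	serverReadyToKillProxy := make(chan struct{})
	serverClosed := make(chan struct{}) // closed by the test once the proxy is down
	finished := make(chan struct{})

	// events needs no mutex: every append is ordered by a channel operation.
	var events []string

	go func() {
		defer close(finished)
		events = append(events, "server: sent first events")
		close(serverReadyToKillProxy)
		// Wait for the test to kill the proxy before sending the rest.
		<-serverClosed
		events = append(events, "server: sent remaining events")
	}()

	<-serverReadyToKillProxy
	events = append(events, "test: killed proxy")
	close(serverClosed) // takes the place of serverClosed.Done()
	<-finished
	return events
}

func main() {
	for _, e := range runHandshake() {
		fmt.Println(e)
	}
}
```

Closing a channel releases every receiver at once and cannot be unbalanced the way `Add`/`Done` pairs can, which is why it suits a signal that fires exactly once.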
if err != nil {
	t.Fatalf("Failed to read notification: %v", err)
}
if req, ok := msg.(*jsonrpc.Request); ok && req.Method == "notifications/progress" {
Should it also be an error if you see a non-notification?
This CL implements a retry mechanism to resume SSE streams to recover from network failures.
For #10
I referenced