
mcp/streamable: add resumability for the Streamable transport #133


Open
wants to merge 1 commit into base: main



samthanawalla commented Jul 14, 2025

samthanawalla requested a review from jba July 14, 2025 19:29
samthanawalla marked this pull request as ready for review July 14, 2025 19:29

jba left a comment


First preliminary review. With Rob out, I want to really understand this. That may take a day or two.

samthanawalla force-pushed the resumability branch 2 times, most recently from 5a2d5aa to 9d77544 July 15, 2025 16:43
This CL implements a retry mechanism that resumes interrupted SSE streams, so the client can recover from network failures.
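For context, the MCP Streamable HTTP transport specification lets a server attach an `id` to SSE events, and a client resumes a broken stream by sending an HTTP GET to the MCP endpoint with a `Last-Event-ID` header. A minimal sketch of building that request follows; `newResumeRequest` and the URL are hypothetical and not taken from this PR.

```go
package main

import (
	"fmt"
	"net/http"
)

// newResumeRequest is a hypothetical helper: it builds the GET request a
// Streamable HTTP client can send to ask the server to replay events that
// came after lastEventID.
func newResumeRequest(endpoint, lastEventID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "text/event-stream")
	// Only ask for a replay if the client actually saw an event ID.
	if lastEventID != "" {
		req.Header.Set("Last-Event-ID", lastEventID)
	}
	return req, nil
}

func main() {
	req, err := newResumeRequest("http://localhost:8080/mcp", "42")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.Header.Get("Last-Event-ID")) // prints: GET 42
}
```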
}

// The stream was interrupted or ended by the server. Attempt to reconnect.
newResp, reconnectErr := s.reconnect(lastEventID)

Just err is fine here; the reconnectErr name isn't needed.
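The reconnect(lastEventID) call depends on knowing the ID of the last event the client fully received. A minimal sketch of tracking it, assuming a hypothetical `readEvents` helper that parses an in-memory stream (the PR's own SSE scanner is not shown here): per the SSE rules, an `id:` line only updates a buffer, and the ID counts as received once a blank line dispatches the event.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// event is a hypothetical, simplified SSE event.
type event struct {
	ID   string
	Data string
}

// readEvents parses a simplified SSE stream and returns the dispatched events
// plus the last event ID, which is what a client would pass when reconnecting.
// It ignores "event:", "retry:", and comment lines for brevity.
func readEvents(stream string) ([]event, string) {
	var (
		events      []event
		dataLines   []string
		idBuffer    string // persists across events, as in the SSE spec
		lastEventID string
	)
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			// A blank line dispatches the pending event; only now does its
			// ID count as received.
			lastEventID = idBuffer
			if len(dataLines) > 0 {
				events = append(events, event{ID: idBuffer, Data: strings.Join(dataLines, "\n")})
			}
			dataLines = nil
			continue
		}
		field, value, _ := strings.Cut(line, ":")
		value = strings.TrimPrefix(value, " ")
		switch field {
		case "id":
			idBuffer = value
		case "data":
			dataLines = append(dataLines, value)
		}
	}
	// A trailing event with no blank line after it was cut off and is dropped.
	return events, lastEventID
}

func main() {
	stream := "id: 1\ndata: hello\n\nid: 2\ndata: world\n\nid: 3\ndata: cut off"
	events, last := readEvents(stream)
	fmt.Println(len(events), last) // prints: 2 2
}
```

The third event is cut off mid-stream, so the client would resume from ID 2, and a server that supports redelivery could replay event 3.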

s.mu.Lock()
s.err = reconnectErr
s.mu.Unlock()
s.Close()

Could handleSSE return an error so we can propagate the error from Close?

// handleSSE manages the entire lifecycle of an SSE connection. It processes
// an incoming Server-Sent Events stream and automatically handles reconnection
// logic if the stream breaks.
func (s *streamableClientConn) handleSSE(initialResp *http.Response) {

This looks very clean!

case <-s.done:
return nil, fmt.Errorf("connection closed by client during reconnect")
case <-time.After(calculateReconnectDelay(s.ReconnectOptions, attempt)):
resp, reconnectErr := s.establishSSE(lastEventID)

Same here: err instead of reconnectErr.

return
case s.incoming <- evt.Data:

if !isResumable(resp) {

I think we may also want to try resuming if the error from establishSSE is non-nil. For example, if the network is partitioned, that might manifest as a timeout error instead of an HTTP response. But we can leave that for a later PR.

serverClosed.Add(1)
close(serverReadyToKillProxy)
// Wait for the test to kill the proxy before sending the rest.
serverClosed.Wait()

You'd wait for the channel to be closed here (<-serverClosed) instead of calling serverClosed.Wait().

proxyAddr := proxy.Listener.Addr().String() // Get the address to restart it later.

// 3. Configure the client to connect to the proxy with default options.
clientTransport := NewStreamableClientTransport(proxy.URL, &StreamableClientTransportOptions{

I think nil should give the same behavior as this: retry should be the default, since it doesn't cost anything. Not sure what the default MaxRetries should be, but 5 seems OK.
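A minimal sketch of making nil equivalent to the default options, assuming a hypothetical `transportOptions` type with a single `MaxRetries` field and the suggested default of 5:

```go
package main

import "fmt"

// defaultMaxRetries follows the reviewer's suggestion of 5.
const defaultMaxRetries = 5

// transportOptions is a hypothetical stand-in for StreamableClientTransportOptions.
type transportOptions struct {
	// MaxRetries: 0 means "use the default"; a negative value disables retries.
	MaxRetries int
}

// withDefaults resolves the options a transport would actually use, so that
// passing nil behaves the same as passing default options.
func withDefaults(opts *transportOptions) transportOptions {
	var o transportOptions
	if opts != nil {
		o = *opts // copy, so the caller's struct is not modified
	}
	switch {
	case o.MaxRetries == 0:
		o.MaxRetries = defaultMaxRetries
	case o.MaxRetries < 0:
		o.MaxRetries = 0
	}
	return o
}

func main() {
	fmt.Println(withDefaults(nil).MaxRetries)                               // prints: 5
	fmt.Println(withDefaults(&transportOptions{MaxRetries: 2}).MaxRetries)  // prints: 2
	fmt.Println(withDefaults(&transportOptions{MaxRetries: -1}).MaxRetries) // prints: 0
}
```

Because Go can't tell an unset int from an explicit 0, the sketch treats 0 as "use the default" and reserves negative values for "never retry"; that convention is an assumption, not something the PR settles.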

}

// Perform handshake.
initReq := &jsonrpc.Request{ID: jsonrpc2.Int64ID(100), Method: "initialize", Params: mustMarshal(t, &InitializeParams{})}

Why does this have to happen by hand?
I guess what I mean is, can you write this test with a Client instead of at the transport level? If you need to reach into the transport for something, maybe you could add test hooks to the transport code. That may ultimately be cleaner than rewriting the init protocol.
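As an illustration of the test-hook idea: a package-level function variable that production code leaves nil and tests can set, a pattern Go's standard library uses internally (for example, unexported `testHook...` variables in `net/http`). `testHookStreamBroken` and `onStreamBroken` are hypothetical names.

```go
package main

import "fmt"

// testHookStreamBroken is a hypothetical test hook. Production code leaves it
// nil; a test can set it to observe (or synchronize with) a broken stream.
var testHookStreamBroken func(lastEventID string)

// onStreamBroken stands in for the point in the transport where the SSE
// stream has failed and reconnection is about to start.
func onStreamBroken(lastEventID string) {
	if hook := testHookStreamBroken; hook != nil {
		hook(lastEventID)
	}
	// Reconnection logic would follow here.
}

func main() {
	var seen string
	testHookStreamBroken = func(id string) { seen = id }
	defer func() { testHookStreamBroken = nil }()
	onStreamBroken("7")
	fmt.Println("hook saw last event ID", seen)
}
```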

t.Log("--- Killing proxy to simulate network failure ---")
proxy.CloseClientConnections()
proxy.Close()
serverClosed.Done()

Here you'd close the channel instead of calling serverClosed.Done().
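Replacing the `sync.WaitGroup` with a channel that the test closes could look like this self-contained sketch, with plain goroutines standing in for the server handler and the test body (the step names are illustrative, not from the PR):

```go
package main

import "fmt"

// simulate mirrors the test's coordination using channels instead of a
// sync.WaitGroup: the handler signals that it is ready, then blocks until the
// test closes serverClosed after killing the proxy.
func simulate() []string {
	var steps []string // ordered by the channel operations below, so no data race
	serverReadyToKillProxy := make(chan struct{})
	serverClosed := make(chan struct{})
	handlerDone := make(chan struct{})

	// Stand-in for the server handler.
	go func() {
		defer close(handlerDone)
		steps = append(steps, "sent first notification")
		close(serverReadyToKillProxy)
		<-serverClosed // wait for the test to kill the proxy
		steps = append(steps, "sent remaining notifications")
	}()

	// Stand-in for the test body.
	<-serverReadyToKillProxy
	steps = append(steps, "killed proxy")
	close(serverClosed)

	<-handlerDone
	return steps
}

func main() {
	for _, s := range simulate() {
		fmt.Println(s)
	}
}
```

Closing a channel releases every receiver at once and needs no Add/Done bookkeeping; the only rule is that it must be closed exactly once.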

if err != nil {
t.Fatalf("Failed to read notification: %v", err)
}
if req, ok := msg.(*jsonrpc.Request); ok && req.Method == "notifications/progress" {

Should it also be an error if you see a non-notification?
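If the answer is yes, the check might look like this sketch. It uses hypothetical `request`/`response` types instead of the real jsonrpc package, and models a notification as a request without an ID, as in JSON-RPC 2.0.

```go
package main

import "fmt"

// Hypothetical message types; the real test uses the jsonrpc package.
type request struct {
	Method string
	HasID  bool // JSON-RPC notifications carry no ID
}

type response struct{}

// expectProgressNotification fails on anything other than a
// notifications/progress notification, rather than silently skipping it.
func expectProgressNotification(msg any) error {
	req, ok := msg.(*request)
	if !ok {
		return fmt.Errorf("got %T, want a notification", msg)
	}
	if req.HasID {
		return fmt.Errorf("got request %q with an ID, want a notification", req.Method)
	}
	if req.Method != "notifications/progress" {
		return fmt.Errorf("got notification %q, want notifications/progress", req.Method)
	}
	return nil
}

func main() {
	fmt.Println(expectProgressNotification(&request{Method: "notifications/progress"}))
	fmt.Println(expectProgressNotification(&response{}))
}
```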
