[mysql] add SSH keepalive #3494
Conversation
Pull Request Overview
This PR adds SSH keepalive functionality to the MySQL connector to detect and handle connection failures. When the SSH tunnel becomes unresponsive, the system can now automatically close the database connection and log the failure.
- Implements SSH keepalive mechanism with configurable interval (15 seconds)
- Adds connection monitoring and automatic cleanup on SSH tunnel failure
- Enhances MySQL connector to respond to SSH keepalive failures
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| flow/connectors/utils/ssh.go | Adds keepalive channels, goroutine with periodic SSH requests, and channel management |
| flow/connectors/mysql/mysql.go | Integrates SSH keepalive monitoring into MySQL connector's connection management loop |
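The mysql.go side of the change is not quoted in the review below, so here is a hedged sketch of how a connection-management loop might react to the keepalive channel; the function signature, callback, and logging are illustrative assumptions, not the PR's actual code.

```go
package mysql

import (
	"context"
	"log/slog"
)

// watchKeepalive is an illustrative consumer of the tunnel's keepalive
// channel: it blocks until the channel reports a failure (or the context
// ends), then closes the database connection via the supplied callback.
func watchKeepalive(ctx context.Context, keepalive <-chan struct{}, logger *slog.Logger, closeConn func() error) {
	select {
	case <-ctx.Done():
		return
	case <-keepalive:
		logger.Error("SSH keepalive failed, closing MySQL connection")
		if err := closeConn(); err != nil {
			logger.Error("failed to close MySQL connection", slog.Any("error", err))
		}
	}
}
```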
flow/connectors/utils/ssh.go
Outdated
```go
_, _, err := tunnel.Client.SendRequest("keepalive@openssh.com", true, nil)
if err != nil {
	logger.Error("Failed to send keep alive", slog.Any("error", err))
	// don't close it, otherwise a select can read from channel forever until we error out
```
Selecting on a closed channel returns immediately.
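For reference, a tiny self-contained illustration of why that matters for the keepalive channel (this is generic Go behaviour, not code from the PR):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	done := make(chan struct{})
	close(done)

	// Receiving from a closed channel never blocks, so this select fires the
	// done case immediately on every loop iteration instead of waiting.
	for i := 0; i < 3; i++ {
		select {
		case <-done:
			fmt.Println("closed channel is always ready")
		case <-time.After(time.Second):
			fmt.Println("never reached")
		}
	}
}
```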
Just clarified the comment; I don't want this behaviour, as it'll keep trying to close the MySQL connection while the rest of the system catches up.
You could close it, then nil out the field in the tunnel so later calls to GetKeepaliveChan return nil; then the calling code can check that if the function returns nil, the connection is down.
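A minimal sketch of that suggestion, assuming a simplified SSHTunnel with only the keepalive field and a mutex; the real struct, locking, and method names in flow/connectors/utils/ssh.go may differ.

```go
package utils

import "sync"

// SSHTunnel is pared down to the fields relevant to the keepalive channel.
type SSHTunnel struct {
	mu        sync.Mutex
	keepalive chan struct{}
}

// markKeepaliveFailed closes the channel exactly once and nils the field, so
// an in-flight select wakes up and later callers see nil instead.
func (t *SSHTunnel) markKeepaliveFailed() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.keepalive != nil {
		close(t.keepalive)
		t.keepalive = nil
	}
}

// GetKeepaliveChan returns nil once the keepalive has failed.
func (t *SSHTunnel) GetKeepaliveChan() <-chan struct{} {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.keepalive
}
```

Callers would then treat `tunnel.GetKeepaliveChan() == nil` as "the tunnel is down" and skip selecting on it.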
How we can autotest this:
- ToxiProxy + SSH + MySQL peer
  - an integration test doing a ping (succeeds), starting a `SELECT SLEEP(60)` query, sleeping for a few seconds so it starts, then `proxy.Disable()`; the query should return with an error in less than 60 seconds total. If the keepalive interval is configurable, the test can be shorter than 60s. #3493 adds a toxi for Postgres as an example
  - an integration test doing a ping (succeeds), starting a `SELECT SLEEP(10)` with a very short keepalive, which finishes successfully
- Unit tests for the channel logic, with a mock SSH client (a test sketch follows this list):
  - multiple GetKeepaliveChan calls return the same channel
  - keepalive stops on Close
  - keepalive stops on context cancel
  - nil client returns nil channel
  - no goroutine leak after multiple start/stop cycles (aka many qrep partitions)
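As an illustration of the first unit test, a hedged sketch assuming a hypothetical `newTestTunnel` helper wired to a mock SSH client; the real constructor and mock wiring in flow/connectors/utils would replace it.

```go
package utils_test

import "testing"

// TestGetKeepaliveChanReturnsSameChannel checks that repeated calls hand back
// the same channel instance rather than spawning a new keepalive goroutine.
func TestGetKeepaliveChanReturnsSameChannel(t *testing.T) {
	tunnel := newTestTunnel(t) // hypothetical helper backed by a mock SSH client
	defer tunnel.Close()

	first := tunnel.GetKeepaliveChan()
	second := tunnel.GetKeepaliveChan()
	if first != second {
		t.Fatal("expected GetKeepaliveChan to return the same channel on repeated calls")
	}
}
```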
Opus is great for both the Docker stuff and the mock stuff.
flow/connectors/utils/ssh.go
Outdated
```go
for {
	select {
	case <-ticker.C:
		_, _, err := tunnel.Client.SendRequest("keepalive@openssh.com", true, nil)
```
What happens when the receiver is a black hole? I don't think we're setting a net.Conn deadline (different from the Dialer deadline) explicitly anywhere, so this might hang indefinitely (our reference extends the deadline on every request). Alternatively, we can run a time.After concurrently.
Unlike the reference, we don't have a handle on the underlying net.Conn, and getting one is non-trivial because the MySQL connector can create multiple connections with the dialer we provide.
We can revisit this if we see the failure mode in practice.
You'd have a time.After goroutine that closes the connection if the request stays blocked for too long.
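A minimal sketch of that watchdog idea against the golang.org/x/crypto/ssh client; the helper name, the timeout value, and the choice to close the whole client on timeout are assumptions, not the PR's implementation.

```go
package utils

import (
	"fmt"
	"time"

	"golang.org/x/crypto/ssh"
)

// sendKeepaliveWithTimeout races the blocking SendRequest against a timer so
// a black-hole receiver can't stall the keepalive loop forever.
func sendKeepaliveWithTimeout(client *ssh.Client, timeout time.Duration) error {
	done := make(chan error, 1)
	go func() {
		_, _, err := client.SendRequest("keepalive@openssh.com", true, nil)
		done <- err
	}()

	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		// The request is stuck; closing the client unblocks it and tears the
		// tunnel down so the rest of the system can react.
		_ = client.Close()
		return fmt.Errorf("ssh keepalive timed out after %s", timeout)
	}
}
```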
Force-pushed 5205997 to b0a3aa3, 2adcd84 to 8902e56, and 8e1ac30 to e45b4d1.