-
Notifications
You must be signed in to change notification settings - Fork 1.5k
feat(mysql): add compression #4111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
abonander
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have more nits later, but there's already quite a bit here to address.
| ) -> Self { | ||
| match compression { | ||
| #[cfg(feature = "compression")] | ||
| Some(c) if c.is_supported(&capabilities) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be helpful to emit a debug log here if the user enabled compression but the server didn't support it.
| name = "mysql" | ||
| path = "tests/mysql/mysql.rs" | ||
| required-features = ["mysql"] | ||
| required-features = ["mysql", "compression"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of requiring the compression feature, you should either just #[cfg] out those tests or create a new test target.
| zstd = { version = "0.13.3", optional = true, default-features = false, features = ["zdict_builder"] } | ||
| flate2 = { version = "1.1.5", optional = true, default-features = false, features = ["rust_backend", "zlib"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be better if these were under separate features. Most of the time, the user should know which compression algorithms their server supports and it can be pretty annoying to deal with building dependencies you don't need.
| #[cfg(all( | ||
| not(any( | ||
| mariadb = "verylatest", | ||
| mariadb = "10_6", | ||
| mariadb = "10_11", | ||
| mariadb = "11_4", | ||
| mariadb = "11_8", | ||
| )), | ||
| feature = "mysql" | ||
| ))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can actually just be #[cfg(not(mariadb))].
The file itself requires feature = "mysql" and specifying individual versions is going to be too difficult to keep up to date.
| #[cfg(all( | ||
| not(any( | ||
| mariadb = "verylatest", | ||
| mariadb = "10_6", | ||
| mariadb = "10_11", | ||
| mariadb = "11_4", | ||
| mariadb = "11_8", | ||
| )), | ||
| feature = "mysql" | ||
| ))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
| /// let options = MySqlConnectOptions::new() | ||
| /// .compression(Compression::Zlib.fast()); | ||
| /// ``` | ||
| pub fn compression(mut self, compression: CompressionConfig) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably important to specify that the compression level setting only takes effect on the server for Zstd.
For zlib compression, this only affects outgoing packets; the server hardcodes a compression level of 6: https://github.com/mysql/mysql-server/blob/056a391cdc1af9b17b5415aee243483d1bac532d/sql/auth/sql_authentication.cc#L3238
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other thing is that this API design doesn't allow the connection to fall back to Zlib compression if the server doesn't support Zstd.
If the application developer doesn't know if the server supports Zstd or not, they might prefer the connection to fall back to Zlib if at all possible (especially if it's getting built anyway).
Thus, it might be better to allow the user to set the compression levels for Zlib and Zstd separately (or disable one or the other entirely).
| let compressed_payload: CompressedPacket<Bytes> = buffered_socket | ||
| .read_with(compressed_payload_length, compressed_context) | ||
| .await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something we need to be careful with is that compressing or decompressing a sufficiently large packet at high compression levels could block long enough to cause noticeable hiccups with the executor.
This is something we've already run into with SHA-256 hashing during authentication (albeit, in Postgres): #4006 (comment)
We need to yield roughly every 10-100 microseconds to keep the runtime happy.
If we look at some results from lzbench (which Zstd links to on their own website): https://github.com/inikep/lzbench?tab=readme-ov-file#benchmarks
The numbers are highly dependent on algorithm compression level, of course, but worst case is Zstd at compression level 22 which gives a paltry 2MB/s. That means in 100 microseconds it only compresses 200 bytes. And that's also with some brutal diminishing returns on compression ratio, so I'd hesitate to ever recommend it for anything but offline compression of files for archival.
So depending on the compression settings, actually, it may not even take a very large packet at all to trigger a noticeable hiccup. (Yes, those results came from a server processor with lower clock speeds than you can get on a desktop, but that's kind of our primary target anyway.)
Zlib's max compression level does 10 MB/s and that also seems to correspond roughly to Zstd at level 15, so we could just yield every 1 KB which would be 100 microseconds.
At lower compression levels that could technically be yielding too often, but even at 100 MB/s that's yielding every 10 microseconds, which is the low end of our target range anyway.
Alternatively, we could check the elapsed time every 1KB and yield if it's been more than 100 microseconds. Or we could just hardcode a lookup table based on compression algorithm and level.
Decompression is significantly faster across the board, so there we could yield every 10 or 100 KB instead.
And I would probably make it 1/10/100 kebibytes (multiples of 1024 bytes) for alignment purposes even though I've been using KB/MB here since that's what the benchmark results are listed in.
| pub fn default(self) -> CompressionConfig { | ||
| match self { | ||
| Compression::Zlib => CompressionConfig(self, 5), | ||
| Compression::Zstd => CompressionConfig(self, 11), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MySQL Server defaults to 6 and 3, respectively: https://github.com/mysql/mysql-server/blob/056a391cdc1af9b17b5415aee243483d1bac532d/mysys/my_compress.cc#L354-L357
I don't know what performance testing they did to arrive at those numbers, but it's worth assuming they chose those for a reason.
| pub fn best(self) -> CompressionConfig { | ||
| match self { | ||
| Compression::Zlib => CompressionConfig(self, 9), | ||
| Compression::Zstd => CompressionConfig(self, 22), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per https://github.com/launchbadge/sqlx/pull/4111/files#r2553775951
The performance compared to compression ratio for level 22 is so bad that it should probably never be used in a real-time or interactive application.
At the very least, I would warn the user that there's a significant speed tradeoff here. At this level, it's highly likely that the compression itself becomes a bottleneck instead of the network link. Then you're just wasting time and CPU cycles.
Does your PR solve an issue?
Fixes #186.
This PR adds support for traffic compression in MySQL & MariaDB.
Data can now be compressed using either zstd or zlib, when the server supports the selected algorithm.
MariaDB supports zstd only.
Compression can be enabled by adding a parameter to the connection URL:
compression=<algorithm>:<level>For example:
Or programmatically:
mysql-compressionfeature flag.Is this a breaking change?
No. This is not a breaking change.
It extends the existing MySQL connection settings by introducing a new optional compression parameter.